A Tree for TEA (tea-tree)


#1

With reference to the discussion on updating nodes in a tree:

This is what I propose to implement. I would be interested to hear your thoughts before getting stuck in to writing it:

module TeaTree exposing (..)

It will be a multiway Tree implementation, not a binary tree.

Every node will be marked with a unique id, so that re-walking the tree from a Path can be confirmed as correct. Walking a Path will produce a Maybe.

walkPath : Path -> Tree a -> Maybe (Zipper a)

The Tree will only be able to be manipulated through TreeExplorer or Zipper.

The Tree implementation data structure will not be exposed; it will be opaque.

This is so that all work on the Tree can carry around a Focus context; in particular, a current id stamp used to label all nodes with unique references. Attaching and removing this Focus context will happen within the implementation and not something that the caller has to remember and pass around.

Walking the zipper context back to the root will produce a Tree and a Path.

walkToRoot : Zipper a -> ( Tree a, Path )

The Path and Tree can be recombined to recover a previous position in the tree.

walkPath : Path -> Tree a -> Maybe (Zipper a)

This allows events to be tagged with Paths which describe a return to a previously visited position within a tree, without capturing any other data associated with that node. This is to circumvent the stale data issue when a user is interacting with a tree.

The contents of nodes in the tree will be held in an Array Id a. Ids will be assigned sequentially as Ints. This will allow mapping by id without re-walking a Path possible.

mapNode : Path -> (a -> a) -> Tree a -> Tree a

It will be necessary to re-walk paths when adding new nodes into the tree, as this is the only situation when fresh ids will need to be generated. Moving or deleting nodes will also require Paths to be re-walked.

This more complex scheme may allow a more efficient return to a single position within a Tree than using indexedMap. This should be tested; for example could a tree with hundreds of visible nodes have many of them animated in parallel or would there be too much garbage collection churn when running this data structure? What about Trees that have nodes continually added and removed? but perhaps that is too hard a problem to solve for now.


Update on Tea Trees
#2

A tree for tea should be a tree-tea. You missed an easy pun. :wink:


#3

I think tea-tree is more in keeping with the arborial theme.


#4

Not only that, but tea tree is a real tree whose oil is used in (some traditional but mostly non-traditional) medicine (Wikipedia).


I would not worry about efficiency for now: First make it correct, then maintainable, then efficient. And the common implementation of a Rose tree:

type RoseTree a = RoseTree a (List (RoseTree a))

is able to take advantage of the compiler’s cleverness with regard to lists.
Benchmarking can only happen after an initial implementation; do not second-guess yourself from the get-go :slight_smile:.


#5

Fair point. I was wondering if putting the node data in an Array is going to be worth it in terms of efficiency, after all an Array in Elm is already implemented as a tree. It seems likely that the tree used to back Array would be faster as it has been cleverly optimized, but as you say, can leave that off for now.

It does make a difference to the API, as I want to expose a mapNode (renamed it to udpateDatum) like this:

updateDatum : (a -> a) -> Path -> Tree a -> Tree a

and really, that API signature is only being chosen to allow a potentially more efficient implementation. Still, its not a bad API? and I can implement in the first instance by walking the path.


#6

The only difference between your earlier function signature and this one is the order of the first two parameters; even if one order would in practice be faster to recurse on, this does not have an impact on the order your user-facing public function have to have. Personally I find the former version (where Path is the first argument) more natural to read.

Side note about the implementation of Array: I believe that this makes Arrays as fast as a tree (in that case, IIRC a fixed-branching-size base-32 tree); it would not make your trees faster.


#7

I didn’t re-order them to make it faster - I know that makes no difference.

What I am trying to say is that this signature was included to allow a more efficient implementation, the alternative would be:

updateDatum : (a -> a) -> Zipper a -> Zipper a

which means the caller must first walk a Zipper to the right location. This signature allows the Path to take a short-cut, by using the id of the datum more directly:

updateDatum : (a -> a) -> Path -> Tree a -> Tree a

I put the function a -> a first as then it becomes slightly easier to write functions for particular update operations that are then applied to specific paths and trees:

togglePath = updateDatum toggleDatum

and so on. It is quite common with map functions to put the transformation function first.


#8

What does this tree have to do with the elm architecture?
I mean this would also make sense in a language like Haskell or ML without any relation to TEA. Or am I mistaken?


#9

The answer to that is found in this discussion:

It is designed to work nicely with the Elm update cycle, where deltas to the Model are passed as Msgs. The specific problems being addressed are, how to avoid capturing stale state in the Msgs, how to be certain when re-visiting a node that the right one is being re-visited, and how to do this efficiently. The first problem is solved by only passing a Path with the Msgs, that described where in the tree an update is to be applied without capturing the state at that node. The other problems are solved by assigning unique ids to all nodes that act like references.

This tree implementation is only needed because multiple Msgs can be triggered against the same instance of the Model. It would be more normal to use a rose-tree in situations where that is not the case.


#10

Hi @rupert !

I’m writing a desktop open-source application in Elm and Typescript for HTML prototyping (i’ll publish beta version in next months, it has cca 6k lines of code before refactoring now) and the core is multi-way tree of elements.

I’ve uploaded gist with my solution (it seems to share some principles like the ones you suggest) and a screenshot for context - https://gist.github.com/MartinKavik/8a7b650698d70b28e4a2b9bb6e603ddb. it’s quite ugly code and I assume that you are more experienced functional and Elm developer, but you can use it as a “real world” example, because I’m using every functions in the rest of the app (and some more ones which are waiting for refactoring and importing to LTree.elm).

My main pain points:

  • Encoding and decoding to JSON (I was fighting serialization of recursive types first, but it seems to be more practical to flatten the tree and decode it as objects associated by “foreign” keys)
  • My “primary key” for nodes is called label and it is string, because I’m using it as UUID. The pain is to generate new UUIDs (e.g. when you want to duplicate a part of the certain tree), because I have to pass a seed for Random generator.
  • Building a tree/forest during JSON decoding or from downloaded data from the server (GraphQL) which have a flat structure (primary & foreign keys).
  • Testing - especially more complex operations like branch duplication or updating each node in a subtree.

So I would be happy to integrate your library instead of mine to the core if it will be possible. And I can help you with testing or with something else when you decide to implement it.

Thanks!


#11

Thanks, it is encouraging to see another solution and learn about your motivations.

I am using Int as the type of the ids on the nodes, but you are using String so that you can have UUIDs. Even the Int way requires remembering the last id generated, so that a +1 can be applied to generate the next one - with UUIDs, you must carry around the Seed for this purpose. In that sense, both ways of generating the ids carry an equal amount of ‘pain’.

Do you need String ids? I could quite easily allow the id stepping function to be a parameter, and the id type to also be a parameter.

type alias NextId a b = a -> (b, a)

nextIntId : NextId Int Int
nextIntId n = (n + 1, n + 1)

nextStringId : NextId Random.Seed String
nextStringId seed = Random.step Uuid.stringGenerator seed

Once the stepping function is established (when creating a new tree), it will be handled internally within my implementation, so the caller does not have to do the id generation.

I am going hard-code to Int ids though - as this makes using Array to hold the node data possible, using the Int ids for faster lookup and update operations. Also, these ids will be internal only, the module signature will not expose them. Nothing to stop you putting your own String ids in the data nodes in the tree though.

Here is my implementation so far:


#12

Hi @rupert, thank you for your work on this. The sample implementation makes for interesting reading.

I have a question, which you might or might not be able to answer. (Because I’m not sure if it’s a genuinely difficult problem or whether there’s an obvious solution that I’m missing.) I’ve developed an elm app that’s basically a tree editor. One of the operations that the app supports (and which seems like it would be common in any sort of editor for a tree-like structure) is to move a node from one place in the tree to another. So the Msg that comes to the app is something like MoveNode from to, where from and to are paths in the tree. The fly in the ointment is that inserting or deleting a node might change the structure of the tree so that the paths are no longer valid.

When I was figuring out how to implement this, I looked at an article about multi-zippers. I thought that this might be a would be a way of implementing the functionality that’s correct in principle, but I haven’t had the time to do it. (This editor is just a tool for me to do my primary job, so “mostly works now” is more important than “works correctly and elegantly in the general case”. It’s a very complicated algorithm (at least for me), so I fear it would take too long for me to nail down the implementation).

Like you, I implemented paths as a list of integers. What I wound up doing was writing code to compare from and to to see if deletion at from would affect the validity of to, and if so to modify the to value for the insertion. This eventually worked, but it was a bug-ridden road to get there. The code is also difficult to understand at a glance (even for me, who wrote it). I’ve pasted it below (warning: it’s ugly; inevitably some support functions are missing but I have added additional comments and I hope the remaining omitted material is obvious). It’s a particular requirement of my app that movement must not alter the linear order of the nodes; that validation is mixed up in this code as well (but wouldn’t be needed in the general case).

I like your concept of the tree datatype as a combination of a tree of IDs plus a mapping of IDs to datums. I think if I were to reimplement this code from scratch, I would probably adopt your structure and implement node movement by:

  1. Creating a new ID whose datum is a copy of from's datum
  2. Inserting this ID into the tree at to
  3. Deleting the ID of to from the mapping, and from the tree

In step 3, we no longer need the path to to to be valid, because we would walk the whole tree and use equality to find where to delete the to ID. This is not terribly efficient (since we have to walk the whole tree, instead of just the paths from the root to from and to. But it has the advantage of being simple and obviously correct. (If speed turned out to be a problem in practice, I have a feeling I could leverage some facts about the shape of the trees in my application domain to optimize the algorithm, at the cost of generality.)

So my question, which has turned into more of a musing, is whether you have thought about how to implement node movement in your library, and if so what conclusions you have drawn.

-- Return the new tree and also the path to the moved node
-- (It will be shown as "selected" in the interface)
-- The "to" argument is the intended new parent node;
-- depending on the direction of movt we will insert the
-- moved node either as the first or last child of this node.
-- Result is a custom type which is isomorphic to 
-- (elm core) Result String a, with a few more bells and
-- whistles not used in this fn; the R.(succeed,fail) =
-- core's Ok and Err constructors
moveTo : Path -> Path -> Tree -> Result (Tree, Path)
moveTo from to tree =
    if isOnlyChildAt tree from
    then R.fail "Can't move only child"
    else
        let
            -- Split the path into the common prefix and the differing suffixes
            -- For reasons, provide the full suffix as well as the head::tail
            -- of each list (head = sib, tail = tail, whole suffix = frag)
            { common, sibFrom, sibTo, tailFrom, tailTo, fragFrom, fragTo } = Path.splitCommon from to
        in
            case (sibFrom, sibTo) of
                (Nothing, _) -> R.fail "Can't move to own child"
                (Just sFrom, Nothing) ->
                    -- Movement to own parent
                    case (allFirst (Path.childPath sFrom common) tailFrom tree,
                              allLast (Path.childPath sFrom common) tailFrom tree) of
                        (True, False) ->
                            -- Leftward
                            let
                                toPath = Path.childPath sFrom common
                            in
                                R.succeed <| (performMove from toPath tree, toPath)
                        (False, True) ->
                            let
                                toPath = Path.childPath (sFrom + 1) common
                            in
                                R.succeed <| (performMove from toPath tree, toPath)
                        (False, False) -> R.fail "can't move from the middle"
                        otherwise -> R.fail "should never happen"
                (Just sFrom, Just sTo) ->
                    case sFrom - sTo of
                        -1 ->
                            -- Rightward
                            case (allFirst (Path.childPath sTo common) tailTo tree,
                                      allLast (Path.childPath sFrom common) tailFrom tree) of
                                (True, True) ->
                                    let
                                        adjPath1 = case Path.isFragEmpty tailFrom of
                                                       True -> Path.join (Path.childPath sFrom common) tailTo |> Debug.log "adjusted"
                                                       False -> Path.join common fragTo
                                        adjPath = adjPath1 |> Path.childPath 0
                                    in
                                        R.succeed <| (performMove from adjPath tree, adjPath)
                                otherwise -> R.fail "can't move to/from the middle"
                        1 ->
                            -- Leftward
                            case (allFirst (Path.childPath sFrom common) tailFrom tree,
                                      allLast (Path.childPath sTo common) tailTo tree) of
                                (True, True) ->
                                    let
                                        nKids = get to tree |>
                                                children.get |>
                                                Array.length
                                        adjPath1 = Path.join common fragTo
                                        adjPath = Path.childPath nKids adjPath1
                                    in
                                        R.succeed <| (performMove from adjPath tree, adjPath)
                                otherwise -> R.fail "can't move to/from the middle"

                        otherwise -> R.fail "can't move from non-adjacent siblings"

performMove : Path -> Path -> Tree -> Tree
performMove from to tree =
    let
        moved = get from tree
    in
        deleteAt from tree |>
        insertAt to moved

#13

Thanks Aaron for documenting your use-case. I will reply soon, as I have a few thoughts. On holiday at the moment though and the beach calls me…

Latest code for tea-tree is now in a github repo:


#14

Aaron, the beach was lovely…

Here is what I was thinking about - look at this tree editing demo here:

https://peterszerzo.github.io/elm-arborist/

In terms of the underlying events that this application is using, the event to remove a child tree and the event to re-attach the child tree in a different place are separate. I am guessing it proceeds roughly like this:

  1. Find path into tree where child tree to be moved is located.
  2. Remove this child tree (put it somewhere else in the Model).
  3. Find path into the tree where child tree is to be re-inserted.
  4. Re-insert the child tree.

If it is done this way, the second path can never be invalid as it is only built after the child tree has been removed.

Could you also do this? Or do you necessarily need to perform the move as a single operation on the tree?

The tea-tree I am writing will support multiple updates to the nodes of a single version of the tree, so long as they are not structural changes (add, delete, move). However, allowing structural changes against a single version can probably be done, somehow, I don’t know exactly how yet, but like you did it, or with some inspiration from the paper you linked. I am just focusing on my simpler use case to begin with.


#15

:palm_tree::tropical_drink::grinning:

Thanks for your response, and I’m glad you enjoyed your vacation.

In elm-arborist, it is only possible to insert a new node as the rightmost child of its parent. This applies also to the new child “inserted” by a movement operation. As a consequence, path invalidation can never occur. The implementation of movement, like the one I suggested in my previous post, also contains an operation that walks the whole tree (here). But I think this is not actually necessary for elm-arborist.

In slightly more detail, movement in arborist is always implemented as a swapping of two nodes. In the case we are interested in, the code first inserts a new dummy node in as a rightmost daughter of the destination node. Then it swaps the to-be-moved node with this dummy. Finally it walks the tree, deleting dummies as it goes. Movement in our sense could be more straightforwardly implemented by separating it from the “swap” operation (i.e. swapping the datum of two nodes while leaving their respective children in place).

This suggests another way of implementing node movement in terms of two primitive, path-invalidation safe operations: movement-to-rightmost-child, followed by reordering the children of the destination parent into the desired order (which could be any order).


#16

Very clever. I think that will be worth looking into, thanks.


#17

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.