How best to approach text annotation (overlapping highlights of text)?

I’m relatively new to Elm, and I’m trying to make an engine to display text annotations. For a given text, I have character offsets that describe the regions of the text that are highlighted and annotated. For example, (251, 542, "This part is especially interesting!") and (300, 642, "and also this part!") represent two overlapping regions of text.

How should I approach this problem? I know that overlapping tags aren’t allowed in HTML, in the DOM, or in any kind of tree structure. So one solution might be to wrap the highlighted text in <span>s that never cross any other tag, so that this:

<p>Here is some text to be annotated. And now the paragraph ends. </p>

<p>And another paragraph begins, with some <em>emphasized</em> text.</p>

becomes this:

<p><span class="highlight">Here is some text to be annotated. And now the paragraph ends. </span></p>

<p><span class="highlight">And another paragraph begins, with some </span><em><span class="highlight">emphasized</span></em><span class="highlight"> text.</span></p>

But this is already getting complex, and I wouldn’t know how to do it in Elm. Would I:

  • treat the HTML as a string and find the text at the given character offsets,
  • then find all the elements within those offsets,
  • then wrap their inner HTML in <span>s?

Alternatively, maybe there’s a way to draw rectangles over text, corresponding to a region of text, such that the rectangles can overlap?

Or has anyone successfully integrated something like annotator (GitHub - openannotation/annotator: Annotation tools for the web. Select text, images, or (nearly) anything else, and add your notes.) into an Elm project?

I have a feeling that Elm would be a good tool for this, but I’m at a loss for how to even begin thinking about it.

Quoting the question: “Alternatively, maybe there’s a way to draw rectangles over text, corresponding to a region of text, such that the rectangles can overlap?”

I tend to think that is more complex than it sounds at first. The layout and position of a piece of text are quite unpredictable (they depend on many factors, including screen size and fonts). Also, if you ever need to do anything with the annotated text besides drawing a colored rectangle over it, you would have a difficult time. In general, I think this approach would create a lot of other problems, possibly harder than your original one.

I had a similar problem while working on a rich text editor with elm-rte. When text styles intersected (e.g. bold and italic), I did something like this:


<p>
  c<strong>u </strong><em><strong>te</strong>ologie reformată</em> și
</p>

So in this case the <strong> is split into one chunk outside the <em> and one inside it. The same logic extends to any case of overlapping tags, but I admit it can be quite tricky to think about it this way.
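To make the splitting idea concrete, here is a minimal Elm sketch (not the elm-rte code). It assumes the text has already been cut into segments, each paired with the list of tags active over it; `renderSegments`, `render` and `spanWhile` are hypothetical names. Consecutive segments that share their outermost tag are grouped under one element and the function recurses inside it, so the output is always properly nested. Sorting each tag list first (here alphabetically, so `em` wraps `strong`) keeps the nesting order consistent, which is what reproduces the markup above.

```elm
module Main exposing (..)

import Html exposing (Html)


-- Split a list at the first element that fails the predicate.
spanWhile : (a -> Bool) -> List a -> ( List a, List a )
spanWhile predicate list =
    case list of
        x :: xs ->
            if predicate x then
                let
                    ( taken, rest ) =
                        spanWhile predicate xs
                in
                ( x :: taken, rest )

            else
                ( [], list )

        [] ->
            ( [], [] )


-- Render (active tags, text) segments as properly nested nodes.
-- `element` and `text` are parameters so the same logic can build Html or a String.
renderSegments :
    (String -> List node -> node)
    -> (String -> node)
    -> List ( List String, String )
    -> List node
renderSegments element text segments =
    render element text (List.map (Tuple.mapFirst List.sort) segments)


render : (String -> List node -> node) -> (String -> node) -> List ( List String, String ) -> List node
render element text segments =
    case segments of
        [] ->
            []

        ( [], chunk ) :: rest ->
            text chunk :: render element text rest

        ( tag :: _, _ ) :: _ ->
            let
                -- All following segments whose outermost tag is the same one.
                ( group, rest ) =
                    spanWhile (\( tags, _ ) -> List.head tags == Just tag) segments

                -- Strip that tag and render the remainder inside a single element.
                inner =
                    List.map (Tuple.mapFirst (List.drop 1)) group
            in
            element tag (render element text inner) :: render element text rest


example : List ( List String, String )
example =
    [ ( [], "c" )
    , ( [ "strong" ], "u " )
    , ( [ "strong", "em" ], "te" )
    , ( [ "em" ], "ologie reformată" )
    , ( [], " și" )
    ]


main : Html msg
main =
    Html.p [] (renderSegments (\tag children -> Html.node tag [] children) Html.text example)
```

Passing `element` and `text` in as functions is only a convenience: with `Html.node` and `Html.text` it produces real Html, and with plain string concatenation it produces the `c<strong>u </strong><em><strong>te</strong>ologie reformată</em> și` markup, which makes it easy to check.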

I found this to be a reliable way to deal with overlapping “annotations”, though I did get a lot of help from elm-rte in managing a program that works with this kind of DOM. I think it’s simpler in your case if you only need to transform a String into Html. In my case, the complexity came from the fact that the text was editable by the user, so I also had to handle the other direction, from the DOM back to my data.

Good luck!

I put together an example that uses different background colors on <span>s to show where multiple highlights overlap. It does make a few simplifying assumptions, though:

  • Annotations are assigned to a paragraph, and annotations cannot spread across multiple paragraphs.
  • Plain text only, no rich text.
  • It doesn’t actually show the annotation text, only the highlights.

The basic approach is to turn the Annotations for a paragraph into a list of “breakpoints”, or places where the styling changes (breakpointsForAnnotation). It then recurses down that list of style change breakpoints, making a <span> element for the current set of style information (makeSpans, makeSpansHelper).
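The example itself isn’t reproduced here, so the following is a hypothetical re-creation of the approach as described, not the actual code: `breakpointsForAnnotation`, `makeSpans` and `makeSpansHelper` borrow the names from the post, but their bodies are illustrative guesses. It assumes annotations are `{ start, end, note }` records with offsets in the same units as `String.slice` (UTF-16 code units), and it identifies each annotation by its list index so that two annotations with the same note stay distinct. The background gets darker as more highlights overlap.

```elm
module Main exposing (..)

import Html exposing (Html)
import Html.Attributes


type alias Annotation =
    { start : Int, end : Int, note : String }


-- A style change: annotation number `id` begins or ends at this offset.
type Change
    = Begin Int
    | End Int


breakpointsForAnnotation : Int -> Annotation -> List ( Int, Change )
breakpointsForAnnotation id annotation =
    [ ( annotation.start, Begin id ), ( annotation.end, End id ) ]


-- Cut the text at every breakpoint; each chunk carries the ids active over it.
makeSpans : String -> List Annotation -> List ( List Int, String )
makeSpans text annotations =
    annotations
        |> List.indexedMap breakpointsForAnnotation
        |> List.concat
        |> List.sortBy Tuple.first
        |> makeSpansHelper text 0 []


makeSpansHelper : String -> Int -> List Int -> List ( Int, Change ) -> List ( List Int, String )
makeSpansHelper text position active breakpoints =
    case breakpoints of
        [] ->
            emit active (String.dropLeft position text) []

        ( offset, change ) :: rest ->
            emit active
                (String.slice position offset text)
                (makeSpansHelper text offset (applyChange change active) rest)


applyChange : Change -> List Int -> List Int
applyChange change active =
    case change of
        Begin id ->
            active ++ [ id ]

        End id ->
            List.filter (\other -> other /= id) active


-- Skip empty chunks (several breakpoints can share one offset).
emit : List Int -> String -> List ( List Int, String ) -> List ( List Int, String )
emit active chunk rest =
    if String.isEmpty chunk then
        rest

    else
        ( active, chunk ) :: rest


-- One <span> per chunk; more overlapping highlights give a darker background.
viewParagraph : String -> List Annotation -> Html msg
viewParagraph text annotations =
    Html.p []
        (List.map
            (\( active, chunk ) ->
                Html.span
                    [ Html.Attributes.style "background-color" (shade (List.length active)) ]
                    [ Html.text chunk ]
            )
            (makeSpans text annotations)
        )


shade : Int -> String
shade count =
    "rgba(255, 200, 0, " ++ String.fromFloat (min 1 (0.3 * toFloat count)) ++ ")"


main : Html msg
main =
    viewParagraph "abcdefghij"
        [ { start = 2, end = 6, note = "first" }, { start = 4, end = 8, note = "second" } ]
```

For the two overlapping ranges in `main`, the chunks come out as "ab" (no highlight), "cd" (first), "ef" (both), "gh" (second) and "ij" (none), so the middle span is the darkest.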

Getting rid of the “each annotation is only in one paragraph” restriction could be done by using List.Extra.mapAccuml instead of List.indexedMap inside viewParagraphs, letting viewParagraph pass some sort of state record from paragraph to paragraph (e.g. how many characters of the total text have passed so far, and what the current style state is).
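A hedged sketch of carrying that state from paragraph to paragraph. It assumes (hypothetically) that annotation offsets are counted over the paragraphs concatenated with no separator characters; if the source text separates paragraphs with something like "\n\n", add its length to the running offset. It threads the offset with `List.foldl`, the same accumulate-while-mapping pattern that `List.Extra.mapAccuml` packages up, and clips each annotation to the paragraphs it touches, shifted to paragraph-local offsets, so a per-paragraph renderer like the viewParagraph described above can stay unchanged. `localAnnotations` and `splitByParagraph` are hypothetical names.

```elm
module Main exposing (..)


type alias Annotation =
    { start : Int, end : Int, note : String }


-- Keep the annotations that overlap this paragraph, clipped and shifted
-- so their offsets are relative to the start of the paragraph.
localAnnotations : Int -> String -> List Annotation -> List Annotation
localAnnotations paragraphStart paragraph annotations =
    let
        paragraphEnd =
            paragraphStart + String.length paragraph
    in
    annotations
        |> List.filter (\a -> a.start < paragraphEnd && a.end > paragraphStart)
        |> List.map
            (\a ->
                { a
                    | start = max a.start paragraphStart - paragraphStart
                    , end = min a.end paragraphEnd - paragraphStart
                }
            )


-- Thread the running character offset through the paragraphs.
splitByParagraph : List String -> List Annotation -> List ( String, List Annotation )
splitByParagraph paragraphs annotations =
    paragraphs
        |> List.foldl
            (\paragraph ( offset, done ) ->
                ( offset + String.length paragraph
                , ( paragraph, localAnnotations offset paragraph annotations ) :: done
                )
            )
            ( 0, [] )
        |> Tuple.second
        |> List.reverse
```

With the paragraphs "Hello " and "world", an annotation covering offsets 3 to 8 becomes 3 to 6 in the first paragraph and 0 to 2 in the second.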

Incorporating rich text into this really depends on how the rich text is represented in the Elm code. After all, Html.text "<strong>Hello</strong>" shows all 22 characters of that instead of the one word in bold. In theory, things like “marking text as bold” can be represented as “style change breakpoints”, although you have to be somewhat careful with how you keep track of your style change state so that things like <strong><strong>Hello, </strong>world!</strong> work correctly.
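One way to get the nested <strong><strong>Hello, </strong>world!</strong> case right is to track a nesting depth per style instead of an on/off flag, so the inner </strong> only decrements the count and “world!” stays bold. The sketch below assumes the markup has already been tokenized into hypothetical `Open`, `Close` and `Text` tokens; only `Dict` and `Maybe` from elm/core are used.

```elm
module Main exposing (..)

import Dict exposing (Dict)


type Token
    = Open String
    | Close String
    | Text String


-- Pair each text chunk with the styles whose nesting depth is above zero.
styleRuns : List Token -> List ( List String, String )
styleRuns tokens =
    tokens
        |> List.foldl step ( Dict.empty, [] )
        |> Tuple.second
        |> List.reverse


step : Token -> ( Dict String Int, List ( List String, String ) ) -> ( Dict String Int, List ( List String, String ) )
step token ( depths, runs ) =
    case token of
        Open style ->
            ( Dict.update style (\depth -> Just (Maybe.withDefault 0 depth + 1)) depths, runs )

        Close style ->
            -- Closing a style that is not open is ignored; reaching depth 0 removes the key.
            ( Dict.update style (Maybe.andThen decrement) depths, runs )

        Text chunk ->
            ( depths, ( Dict.keys depths, chunk ) :: runs )


decrement : Int -> Maybe Int
decrement depth =
    if depth <= 1 then
        Nothing

    else
        Just (depth - 1)
```

With a plain set of active styles, the first </strong> would remove "strong" and “world!” would render unbolded; the counter is what keeps the two closings separate.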

If most of the style information (bold, italics, paragraph breaks, etc.) is in the text in the form of tags, then you will probably need some sort of parser to turn it into an AST or other intermediate representation, like the ones in jxxcarlson/meenylatex, mweiss/elm-rte-toolkit, jxxcarlson/elm-markdown, dillonkearns/elm-markdown or dillonkearns/elm-review-html-to-elm. If the AST can then be flattened into one big block of text plus a series of style change breakpoints, the strategy above still works from that point. Alternatively, you could carefully fold your annotations into the AST (a single annotation could become multiple AST entries, depending on how much of the existing AST it spans) and then just render the AST.
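To illustrate the flattening step, here is a minimal sketch with a toy AST. `Plain`, `Styled` and `StyleRange` are hypothetical and are not the node types of any of the libraries listed. It walks the tree once, appending text and recording a `{ start, end, style }` range for each styled node; those ranges have the same shape as annotations, so both can be merged into a single list of style change breakpoints.

```elm
module Main exposing (..)


type Node
    = Plain String
    | Styled String (List Node)


type alias StyleRange =
    { start : Int, end : Int, style : String }


-- Flatten a tree into one string plus the range each styled node covers.
flatten : List Node -> ( String, List StyleRange )
flatten nodes =
    List.foldl flattenNode ( "", [] ) nodes


flattenNode : Node -> ( String, List StyleRange ) -> ( String, List StyleRange )
flattenNode node ( text, ranges ) =
    case node of
        Plain chunk ->
            ( text ++ chunk, ranges )

        Styled style children ->
            let
                -- Offsets use String.length, the same units String.slice uses.
                start =
                    String.length text

                ( newText, newRanges ) =
                    List.foldl flattenNode ( text, ranges ) children
            in
            ( newText, newRanges ++ [ { start = start, end = String.length newText, style = style } ] )
```

For `[ Plain "Hi ", Styled "em" [ Plain "there ", Styled "strong" [ Plain "you" ] ], Plain "!" ]` this yields the text "Hi there you!" with a strong range from 9 to 12 and an em range from 3 to 12. Repeated `++` on strings is quadratic, which is fine for a sketch but worth revisiting for long documents.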

I hope that this was helpful, or sparked some other useful ideas. Good luck!
