Hi everyone! I want some feedback about a well-known problem.
Context
Some Chrome extension breaks the Virtual DOM system and it causes infinite runtime errors. I recently found this is a very serious problem because Sentry notifies runtime errors every day (from the production app, of course).
In the discussions above, @evancz requests more information before fixing it. 12
Before suggesting fixes based on whatever, I think there is information to gather:
Make a list of the browser extensions that are known to cause problems.
Figure out how they are they modifying the <body> exactly. Adding at top? Adding at bottom? Adding indiscriminately somewhere in the DOM?
Sort the list of extensions by how they work.
Figure out how many people use these extensions, so we can weigh the importance in practice. Maybe certain categories of problem only show up in weird ones, but one category is super common and reasonable to work around in virtual-dom.
From there, it will be much easier to find a fix that makes sense for the actual reality of the situation.
Can folks in this thread work on gathering this information and come back with the lessons?
I think the best answer would be done by this format. (Edit: The latest version is here)
This test covers various patterns of breaking DOM.
How to break? (insert, remove, etc.)
How to initialize the app? (Browser.application or Browser.element)
Where in the Virtual DOM is dangerous to update? (child, next element, attribute, etc.)
After this problem is fixed, all the test cases should be green. Also, you can try some of the tests online. Turn on and off your extensions to see how results change.
Any feedback about this project will also be welcome.
Maybe you could add a <style> node test inside the test page, because Dark Reader extension (1 763 020 users) for example will add its own style node just after an existing one if it is found in the DOM tree, leading to runtime exceptions.
A known workaround is to wrap style nodes into their own div.
Google translate (Chrome builtin extension, > 10 000 000 users, most outside US) is also known to break some VDOM sites because it adds some <font> elements.
For example the update button of your test page:
I have not yet be able to reproduce the runtime error with your test page though, but it should not be hard with a few modifications. I know however that it can be disabled with <meta name="google" content="notranslate"> in the <head>.
From my experience in this community, Evan tends to want a lot of information rather than suggestions. I guess he already has all the possible solutions and their trade-offs in his head. So let’s concentrate on gathering facts here.
but one category is super common and reasonable to work around in virtual-dom.
So that’s why I’m suggesting to add something like a root cause, or fix. For example for your ChromeVox you describe a work-around, but it’s not clear why that work-around is needed or why it works.
So what about adding a column that describes to root cause, i.e. what does Elm do/not do that causes this problem? So for ChromeVox I would add: “taking over ”. This way we explicitly mention the category, something Evan asked for.
I’m not sure I fully get what you mean, but if the “root cause” mean something like “this variable in this line is potentially undefined, so…”, it’s relatively easy to find (I believe). It should be clearer once someone starts debugging hard.
For example for your ChromeVox you describe a work-around, but it’s not clear why that work-around is needed or why it works.
It avoids conflicting Elm and Chrome extension by separating the areas they control. Elm will treat a dummy element as the <body> element.
I believe that the root cause is always the same, messing-up with elements controlled by the virtual DOM, usually adding children elements, leading to invalid modifications during an update after diffing. That’s why the most important factor is where is the DOM modification done:
in the body itself or in another node?
at the begining of the node, in the middle or at the end?
Maybe we could add the kind of modification done, but I suspect that most of the time when it breaks elm, it is due to some added nodes (are there some agressive extensions that remove nodes?).
The potential most interesting fix(es) would have to be evaluated once we have enough data.
AFAICT this is exactly what the “patch to output” does, it moves elm to a div inside the body instead of letting it taking the whole body (for an application), so when it is in the table, it means that not replacing the body is a work-around.
Thank you for bringing this up once again. This is the number one problem in Elm! I mean how can I convince my team to invest in Elm if many popular extensions will break the app? Sooner or later. My arguments about easy refactoring and zero runtime errors will make me look silly in this context. This is the show stopper for the whole language.
When we upgraded our SPA sites to 0.19 and Browser.application we saw an increase of virtual dom related run time errors in our log tool (New Relic).
This happened in all browsers, not only Chrome.
This was due to people (not us) using Google Tag Manager to add 3rd party scripts (chats, marketing, re-targeting etc) to the <body> tag. This is a very common practice in e-commerce at least (which is what we do) and not something we have control over.
We rolled back to using Browser.element and ports+js for navigation and history and that made the errors go away.
Yes. The cases of Browser.application have been discussed a lot, but I didn’t know the use of analytics tools can be uncontrollable. It’s good to know for me.
I finally found potential way to fix this problem. This patch should make elm/virtual-dom work safely with extensions (still WIP though). It works as follows:
Mark the DOM node as created_by_elm
If unknown nodes have been inserted, skip them.
If existing nodes have been removed, re-create them by old vdom.
If existing nodes have been replaced with unknown nodes, re-create them by old vdom.
Since the elm/virtual-dom is the core of all Elm apps, enough tests should be done. Now I’m writing more tests (example output), but it’s still not enough.
If you find any downside of this fix, please reply in this thread.
(More information about extensions is still welcome too!)
Wouldn’t adding a mount node argument to Browser.application's init function on the javascript side solve this?
Like Browser.element has?
It could even be optional so that existing apps don’t break. If node: is not set the Elm.Main.init it just uses <body> like today.
For our sites, switching to Browser.element did solve it.
<body>
<div id="elm"></div>
<script>
var app = Elm.Main.init({
node: document.getElementById('elm') // optional, uses body if not set
});
</script>
</body>
I think this fix causes a big loss because it behaves (almost) as usual if there is no interruption. If there is, it re-creates DOM once but not more than once unless the extension continuously interrupts.
Anyway, it is worth trying. There might be a mistake.