Elm SPA caused all our pages to be removed from Google Search

We converted a Drupal site and used Elm for the front-end. However, as far as we can tell, Google does not see our site as an SPA, so it only sees and indexes the index.html.

After a while all our pages were removed from Google Search, and traffic has dropped by 90%. See here what Google returns as indexed.

We have provided a sitemap, but that didn’t help (we had it in place at launch). From poking around in Google Search Console, it appears Google only sees the index.html content, which contains nothing; it does not launch a browser and wait for the content to be rendered.

We have decided to prerender our pages, as clearly Google does not recognise our SPA. We had done a ton of research before launch, and everything we read said Google does handle SPAs, but in our case it clearly does not.

Is there any help the Elm community can give to see why Google does not recognise our SPA?

I’m not sure if this is the problem in this case, but the normal Googlebot and the Chromium-based Googlebot that runs JS are different, and the Chromium one used to run with a delay compared to the non-JS bot.

If you transitioned from the old site to the new one, then maybe the Chromium Googlebot still hasn’t had time to crawl the site with JS?

There may also be a wide variety of other problems with the site; I’d start by ruling out the basics first.

Google does index JS websites; see for example package.elm-lang.org.

The site has been up for months, and pages are frequently crawled. But Google just sees the index.html; it definitely does not fire up any kind of real browser.

If you want to go down the pre-rendering approach, it might be worth looking at either lucamug/elm-starter on GitHub (an Elm-based bootstrapper for Elm applications) or https://elm-pages.com/, though I don’t know which would work better for your situation.

Usually Google has no problem indexing SPAs (search Google for site:https://package.elm-lang.org, for example).

Running Lighthouse, I see that it often returns the error “robots.txt is not valid. Lighthouse was unable to download a robots.txt file”. I see that robots.txt is there, but I wonder why Lighthouse sometimes fails to fetch it.
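For comparison, a minimal valid robots.txt that allows crawling and points at the sitemap looks like this (example.com stands in for the real domain):

```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

If the server occasionally answers the /robots.txt request with an error or a timeout, Lighthouse (and Googlebot) can report the file as unavailable even though it exists.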

Also, the sitemap.xml file is very large; I don’t know if this could be an issue too.
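Size on its own only becomes a hard limit at Google’s documented maximums of 50,000 URLs or 50 MB uncompressed per sitemap file; past that, the sitemap has to be split and referenced from a sitemap index, roughly like this (URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>
```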

What is Google Search Console telling you? Does it show that it has read the 7710 entries you have in the sitemap?

In any case, this seems related to the concept of an SPA per se; I don’t think Elm is the cause, despite what the title suggests.


You’re sure package.elm-lang.org does not use server-side rendering?

Sitemap is fine.

It may not be Elm, but these are actually 3 sites, and none of them is indexed. And I have a fourth, completely unrelated site I built 2 years ago that also never got properly indexed. As indexing wasn’t important for that one, I hadn’t investigated it.

I have no clue what’s going on, but either way, people should be aware of the issue.


(package.elm-lang.org does not use server-side rendering; it’s plain old Elm!)


Clicking on “repeat the search with the omitted results included”, it seems that Google indexed 42,700 results from the website, but all of them have the same title and no description, so Google consolidates them into 6 pages.

I am not an SEO expert, but I would suggest changing, for each page, both the <title> and the <meta> description, using an Elm port, so that the Googlebot can differentiate the pages. It is also a good way to influence how the Google search result looks. This should be done as quickly as possible, to be safe. I also see that the real content of the page arrives with some delay.
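To illustrate the port idea, here is a minimal sketch (the module name, port name, and the `page.summary` field are invented for the example; with `Browser.application` the <title> needs no port, since it is the `title` field of the `Document` returned by `view`):

```elm
port module Seo exposing (setMetaDescription)

-- Hypothetical port: pushes the current page's summary into
-- <meta name="description"> whenever the route changes.
port setMetaDescription : String -> Cmd msg

{- On the JavaScript side, subscribe to the port and patch the DOM
   (this assumes index.html already contains an empty
   <meta name="description" content=""> tag):

   app.ports.setMetaDescription.subscribe(function (text) {
     document
       .querySelector('meta[name="description"]')
       .setAttribute('content', text);
   });
-}
```

In `update` you would then return something like `( model, setMetaDescription page.summary )` on every route change.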

Also, multiple URLs should not serve the same content; where they do, it is better to point the duplicates at one URL with a canonical link tag (see the example below). Google doesn’t like duplicated content.
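For reference, the canonical marker is a link element in the <head>, not a meta tag; the URL below is a placeholder, and in an SPA it could be updated through the same kind of port as the description:

```html
<link rel="canonical" href="https://example.com/canonical-page-url" />
```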

If possible, it would also be beneficial to reduce the number of assets for each page. Right now a typical page seems to have around 113 requests, 1.5 MB transferred, 6.8 MB of resources, finish at 7.4 s, load at 3.18 s. I noted several calls to the same script, https://platform.twitter.com/widgets.js; I wonder if this can be improved as well.


I haven’t tried it yet, but Netlify also has a prerendering feature now.

The node path isn’t used; those are old Drupal paths, which indeed were reindexed with no title. We have fixed that, but it doesn’t appear to have changed anything. You will notice that the current state has proper titles, canonical tags, etc., and even for reindexed pages, when we look at what Google says it has cached/loaded, it’s just the index.html.

Thanks for pointing out the performance issues. We had measured that without Twitter, but Google sees the Twitter widget, and that indeed seems to hurt performance badly. I’ll see if I can avoid loading it for the Googlebot.

Do you send a CSP (Content-Security-Policy) header?

My “webserver”, CouchDB, started adding a CSP response header after an update, and Chromium did not render my SPA anymore, while Firefox did. It may not be related, but as Google uses Chrome for executing JS, it may be worth checking.
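For example, a restrictive policy like the one below (the value is only an illustration) stops the browser from executing inline scripts and scripts from other origins, which can leave an SPA rendering a blank page:

```
Content-Security-Policy: default-src 'self'
```

The response headers are easy to check in the Network tab of the browser dev tools.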

Unfortunately I cannot offer you a solution; I just want to tell you that I manage an Elm SPA e-commerce site that Google has no problem indexing.


No, we’re not sending any CSP header.