Your problems writing Elm code with AI/LLM

Hi,
The “Elm cookbook by AI” thread highlighted the poor quality of AI-written Elm code.

As it happens, I am starting a master’s thesis, and the prospective subject is code generation. The aim is to complement LLM generation with symbolic methods (namely constraint programming) to improve the code quality in languages that have a tiny corpus to train from.

Elm is a good candidate as a language that is relatively niche, syntactically simple, pure functional, and with great tooling to assess quality like the compiler or elm-review.

To help me assess the potential depth of such a project, I would like you to share your pain points when using LLMs to generate Elm code. Do you often encounter (from low-level to more subjective):

  • Syntactically incorrect code?
  • Code that doesn’t typecheck?
  • Hallucinated core or lib functions?
  • Unidiomatic code?
  • Any other thing that frustrates you?

I am also interested in the relative performance of different models, especially self-hosted ones, so if you have used GPT-4, Gemma, llama.cpp, etc., please share as well!

Thanks in advance.

I mostly use Copilot for my AI still, and based on that:

  • Syntactically incorrect code?

Yes, it most often assumes I’m writing Haskell when I start writing a new module.

  • Code that doesn’t typecheck?

Sometimes, though I can’t say with what frequency.

  • Hallucinated core or lib functions?

Quite a bit.

  • Unidiomatic code?

Sometimes, though I couldn’t say how often.

  • Any other thing that frustrates you?

I find that non-Copilot AI tools tend to be even worse regardless of the language. I do use them sometimes, but always in a situation where I know roughly what I need and can work out the kinks. Every language I’ve tried, including popular ones like JS, still has issues with code that’s just broken or not valid syntax.

Thank you for sharing!

Generating syntactically correct code seems like low-hanging fruit; good to know that it hasn’t been picked yet :slight_smile:

When using ChatGPT (3.5) for Elm, I don’t expect any helpful responses anymore. I just check it from time to time to see if something changed, or I use it for rubberducking. Again, with very low expectations.
My biggest gripe with it is hallucinated libs/functions and impure functions. Like it will straight up use Random.int as a function that doesn’t take anything and returns an Int. Usually, when I present it with a problem it will abstract the actual complicated part into some lib function that either doesn’t exist or it expects me to write it myself.
It consistently takes less time to write the solution myself from scratch than to lead ChatGPT to the right answer.
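
For reference, here is a minimal sketch of how `Random.int` is actually typed in the elm/random package: it takes two bounds and returns a `Generator Int`, which then has to be run with `Random.generate` or `Random.step`. The module name, the `Msg` type and the function names are made up for illustration.

```elm
module RandomExample exposing (Msg(..), rollDie, rollWithSeed)

import Random


type Msg
    = Rolled Int


-- Random.int takes the two bounds and returns a Generator, not an Int.
dieGenerator : Random.Generator Int
dieGenerator =
    Random.int 1 6


-- In an application, the runtime produces the value and
-- delivers it back to `update` wrapped in a message.
rollDie : Cmd Msg
rollDie =
    Random.generate Rolled dieGenerator


-- Without commands, a seed has to be threaded through explicitly.
rollWithSeed : Random.Seed -> ( Int, Random.Seed )
rollWithSeed seed =
    Random.step dieGenerator seed
```

Code that treats `Random.int` as a plain `Int` is rejected by the compiler, which is the kind of error described above.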

I also use GitHub Copilot for Elm. I find that it does a good job when I already have a bunch of code (when I’m not writing stuff from scratch). Elm code is pretty “repetitive” and a lot about “just follow the pattern”, which is where Copilot shines. Many times where my instinct is to copy-paste some similar code, and then adjust it, Copilot now often just suggests the end result immediately. For example, today I wrote code for a “previous” button. Copilot then suggested the correct code for the “next” button. I also needed a map4 for a custom type. I already had map, map2 and map3. Copilot suggested a correct map4 before I wrote a single character. It’s also pretty good at helping with refactoring code, like changing a case of on a Maybe into a Maybe.unwrap (or the other way around).
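
To illustrate the kind of “follow the pattern” completion described above, here is a sketch using a hypothetical `Field` type (an assumption, not the poster’s actual custom type). Each function builds on the previous one, so `map4` follows mechanically once `map`, `map2` and `map3` exist.

```elm
module Field exposing (Field(..), map, map2, map3, map4)

-- Hypothetical custom type, used only for illustration.


type Field a
    = Empty
    | Filled a


map : (a -> b) -> Field a -> Field b
map f field =
    case field of
        Empty ->
            Empty

        Filled a ->
            Filled (f a)


map2 : (a -> b -> c) -> Field a -> Field b -> Field c
map2 f fa fb =
    case ( fa, fb ) of
        ( Filled a, Filled b ) ->
            Filled (f a b)

        _ ->
            Empty


-- Apply the partially applied function held in the first Field
-- to the value held in the next one.
map3 : (a -> b -> c -> d) -> Field a -> Field b -> Field c -> Field d
map3 f fa fb fc =
    map2 (<|) (map2 f fa fb) fc


-- Same shape as map3, one argument longer.
map4 : (a -> b -> c -> d -> e) -> Field a -> Field b -> Field c -> Field d -> Field e
map4 f fa fb fc fd =
    map2 (<|) (map3 f fa fb fc) fd
```

Because `map4` differs from `map3` only by one extra argument, it is exactly the sort of code a completion tool can infer from the surrounding file.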

I really like that way of using Copilot. Sometimes I can imagine exactly all of the code I need to write for a feature (“it’ll be like that feature in most places, except …”), and then using Copilot feels a bit like pouring my thoughts directly into the code without having to type it all out. It feels a bit faster and less energy draining than doing it all by hand.

The biggest quirk I notice in Elm is that Copilot really likes to close lists. If I’m in the middle of a list of HTML attributes, Copilot often suggests the correct attribute I wanted to add, but also decides to close the list with a ] even though I’m clearly in the middle of the list. I have not noticed that problem when writing TypeScript.

Another thing I noticed is that Copilot is not as good when your types are in one file and the code is in another. Then I often think “that’s a good guess, Copilot, but if you had looked in the other file you would have seen that it’s called phoneNumber not phone, and that there are three fields in this record, not just two”.

I’ve also noticed that it sometimes helps to add parameters to a function before trying to use them. When I write code manually, I often realize that I don’t have the data I need available, so I need to add it as a parameter. Copilot sometimes likes to “invent” those values instead. For example, instead of suggesting a to-be-defined customer variable, it can come up with Customer.emptyCustomer.
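
A small sketch of the “add the parameter first” approach, assuming a hypothetical `Customer` record and view function (the field names are illustrative, not taken from a real codebase):

```elm
module CustomerView exposing (Customer, viewContact)

import Html exposing (Html, div, text)


-- Hypothetical record with three fields.
type alias Customer =
    { name : String
    , email : String
    , phoneNumber : String
    }


-- Declaring `customer` as a parameter up front means there is a
-- real value in scope, so nothing has to be invented in the body.
viewContact : Customer -> Html msg
viewContact customer =
    div []
        [ text customer.name
        , text " / "
        , text customer.phoneNumber
        ]
```

With the signature written before the body, a suggestion such as `Customer.emptyCustomer` has less room to appear, since the needed value is already named.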

Same issues here. It also has a hard time using certain patterns I tell it to use, or remembering that I use Elm UI.

Side note: did you check out elm-dev (mdgriffith/elm-dev on GitHub)? It uses the Elm compiler to give you more info about your code. As a context feeder, it might be very useful. Using elm-dev to make an Elm-specific co-pilot is a side idea of mine, so I’m curious to know more about your thesis as it grows!

Thanks for reminding me about elm-dev, definitely something I need to check out!

This would be absolutely awesome, Axel. Thanks for pursuing it!

I’ve been using Copilot for a while and like the way it cuts down on boilerplate. However (maybe it’s just my suspicious mind), it doesn’t appear to be 100% honest.

I’ve just started a new project, and 5 minutes ago I started hard-coding the client’s address into fields of a newly created Address type. Copilot autogenerated the postCode value for me, exactly correct.

So I asked Copilot: Where did you get the postCode from?

This is its response:

"""
As an AI, I don’t have access to any external databases or specific codebases to “get” a postCode from. If a postCode is mentioned in a code example I provide, it’s likely a placeholder or example value. In a real-world application, postCodes (or postal codes) would typically come from user input or be retrieved from a database or API.
"""

The post code Copilot provided is clearly not a placeholder or example value; it is my client’s exact post code. I have just created the Address module, and the client’s address does not exist anywhere in my codebase, so it couldn’t have already ‘learned’ it. I was retrieving the address for the first time from an internet search engine.

Something similar happened in a previous project just after I started using Copilot, but as it was something new to me I put it down to ‘user error’.

This particular ‘incident’ is not user error - AFAIK :grinning: :rage:
