Is Elm a great language for AI to code in?

I am writing an article about what makes a good language for AI to program in, and why, and I am also trying to make the case that Elm is a strong candidate. There is definitely a big overlap between what is good for humans to program in and what is good for AI to program in; I think that is because compilers exist to enforce structure, accuracy, efficiency, reliability, safety, correctness, and so on.

Here are 10 qualities that I think a language can be evaluated against:

  1. Simple, precise, mostly deterministic semantics
  2. Strong static guarantees (types/effects/ownership/contracts)
  3. Regular, orthogonal design
  4. Memory safety & safe concurrency by default
  5. Declarative sublanguages/DSLs where possible
  6. Structured, machine-readable diagnostics + good messages
  7. Partial/incomplete programs (typed holes, stubs)
  8. Strong tooling APIs (fast checks, LSP, introspection)
  9. Readable, refactorable code by construction
  10. Smooth spectrum from spec to implementation (properties, contracts, proofs)

My evaluation of Elm and some other languages is below.

Elm is a ‘minor’ language compared to these, and that puts it at a disadvantage in terms of available code to train on, but I really do think it has great qualities in this area. Another criticism might be: if AI makes working with difficult languages like C++ easy, does having a better language even matter?

Do you agree with my assessment, and with my choice of evaluation criteria?

Would you like me to assess any other languages against these 10 criteria?

Summary

  • Elm: Very close to “ideal for AI” on most dimensions; small, pure, strongly typed, with excellent errors.
  • C++: Powerful but hostile on most of these criteria; good tooling via Clang, but language semantics and safety are poor for AI-generated code.
  • Java: Solid middle ground: safe, regular, tool-rich, but less expressive statically than ideal.
  • JavaScript: Needs TypeScript and tooling to approach your ideal; core language is dynamic and quirky.
  • Python: Friendly and tool-rich, but dynamic typing and runtime binding make strong static guarantees and precise feedback harder.

Elm

  1. Semantics - Strong
    Pure, expression-oriented, no null, no exceptions, managed effects, simple predictable semantics.

  2. Static guarantees - Strong
    Hindley-Milner types, no null/undefined, ADTs, exhaustive pattern matching; no general effect system (though effects are visible in types as Cmd/Sub/Task) and no ownership, but very safe.

  3. Regular design - Strong
    Small core, curated ecosystem, limited feature set; very little legacy or weird corner cases.

  4. Memory safety & concurrency - Strong
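    Compiles to JS, no raw memory access; concurrency is tightly controlled, since all effects (commands, subscriptions, tasks, ports) are managed by the runtime and there is no shared mutable state.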

  5. Declarative DSLs - Strong
    Entire UI model is declarative; HTML, styling, architecture are declarative DSLs embedded in Elm.

  6. Diagnostics - Strong (human), Medium (structured)
    Famous for excellent human-friendly errors. The compiler can also emit JSON (elm make --report=json), but machine-oriented output is less emphasized than in, e.g., Rust.

  7. Partial programs - Medium
    Good compiler guidance when things are missing, but no explicit “typed holes” in the sense of Agda/Idris; Debug.todo works as a stub that type-checks anywhere (and is rejected by elm make --optimize). Still: very friendly to incremental edits.

  8. Tooling APIs - Medium
    Has language server support and fast compilation, but tooling is less extensive and introspective than in, say, the Rust or Java ecosystems.

  9. Readability/refactorability - Strong
    Near-universal formatting via elm-format (a community tool, not enforced by the compiler), simple module system, no user-defined overloading or operators (since 0.19); code is usually very uniform.

  10. Spec to implementation spectrum - Medium
    Strong types and pattern matching help, but little in the way of built-in contracts/proofs beyond the type system and tests.
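To make items 1 and 2 a bit more concrete, here is a small sketch. I've written it in Rust so it runs standalone, and because Rust has close equivalents of the Elm features in question: `Option` plays the role of Elm's `Maybe`, `Result` the role of Elm's `Result`, and an `enum` the role of an Elm custom type, with `match` required to be exhaustive. `Shape`, `area`, `find_user`, and `parse_age` are made-up examples, not from any library.

```rust
// An Elm custom type such as `type Shape = Circle Float | Rect Float Float`
// corresponds to a Rust enum.
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// `match` must cover every variant: deleting an arm is a compile error,
// much like Elm's missing-patterns error.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

// No null: absence is an explicit `Option`, like Elm's `Maybe`.
fn find_user(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("alice"),
        _ => None,
    }
}

// No exceptions for expected failures: errors are ordinary values,
// like Elm's `Result`.
fn parse_age(input: &str) -> Result<u8, String> {
    input
        .trim()
        .parse::<u8>()
        .map_err(|e| format!("bad age {input:?}: {e}"))
}

fn main() {
    println!("{}", area(&Shape::Rect { width: 2.0, height: 3.0 }));
    println!("{:.2}", area(&Shape::Circle { radius: 1.0 }));
    // The caller has to decide what happens when there is no user.
    println!("{}", find_user(2).unwrap_or("anonymous"));
    match parse_age("42") {
        Ok(age) => println!("age {age}"),
        Err(message) => println!("{message}"),
    }
}
```

The useful property for an AI here is that forgetting a case (a missing `match` arm, an unhandled `None`) shows up as a compile error with a precise location rather than as a runtime crash.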


C++

  1. Semantics - Weak
    Complex, decades of accreted features, UB everywhere; tricky evaluation order and aliasing rules.

  2. Static guarantees - Medium
    Strong types and templates, but limited by UB and unsafe constructs; no native ownership/effects in the language (RAII helps but is not enforced by the type system).

  3. Regular design - Weak
    Many overlapping features and paradigms; historic baggage; “there are many ways to do it.”

  4. Memory safety & concurrency - Weak
    Raw pointers, manual memory, data races easy to express; safe subsets exist by convention, not by language design.

  5. Declarative DSLs - Medium
    Template metaprogramming enables embedded DSLs, but often in very complex ways; not designed for declarativity first.

  6. Diagnostics - Medium
    Modern compilers (Clang, GCC, MSVC) give better errors and machine-readable formats, but template errors are still notorious; structured diagnostics exist but are compiler-specific.

  7. Partial programs - Weak/Medium
    Compilers cope with syntax/type errors but no explicit “holes”; error recovery exists but not designed as an interactive, typed-hole experience.

  8. Tooling APIs - Strong (via Clang, etc.)
    Clang/LLVM and related tooling provide rich introspection; the language itself doesn’t define tooling APIs, but the ecosystem is strong.

  9. Readability/refactorability - Medium (highly style-dependent)
    Possible to write very readable C++, but language allows highly complex, non-obvious constructs; refactoring relies heavily on external tools and discipline.

  10. Spec to implementation spectrum - Weak/Medium
    Contracts are only now arriving in the standard (adopted for C++26 after being pulled from C++20), plus external tools (static analyzers, formal methods), but specs are not a central design goal.
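For contrast with item 1: in C++, overflowing a signed `int` is undefined behavior, so the optimizer may assume it never happens. In safe Rust every overflow behavior is defined, and the standard `i32` methods make the chosen behavior explicit in the code. A minimal sketch (`overflow_cases` is just a made-up helper name):

```rust
// In C++, `INT_MAX + 1` on a signed int is undefined behavior.
// Rust makes the choice explicit with standard integer methods.
fn overflow_cases(x: i32) -> (Option<i32>, i32, i32) {
    (
        x.checked_add(1),    // None on overflow, instead of silent corruption
        x.wrapping_add(1),   // two's-complement wraparound, when intended
        x.saturating_add(1), // clamp at i32::MAX
    )
}

fn main() {
    // A plain `x + 1` is also defined: it panics in debug builds and wraps
    // in release builds (unless overflow checks are turned on).
    let (checked, wrapped, saturated) = overflow_cases(i32::MAX);
    println!("{checked:?} {wrapped} {saturated}");
}
```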


Java

  1. Semantics - Medium/Strong
    Deterministic, well-specified, no UB in the C++ sense; but lots of legacy quirks and a large standard library.

  2. Static guarantees - Medium
    Nominal OO types, generics, null is pervasive; no effects system or ownership; the type system is helpful but not fully sound (null, covariant arrays, and erased generics all rely on runtime checks).

  3. Regular design - Medium
    Core language relatively simple, but Java 8+ added lambdas/streams/etc.; still more regular than C++.

  4. Memory safety & concurrency - Medium
    Memory-safe (no raw pointers), but data-race safety not enforced; concurrency primitives are low-level.

  5. Declarative DSLs - Medium
    Streams, annotations, builder-style APIs allow semi-declarative code, but language itself is largely imperative/OOP.

  6. Diagnostics - Strong
    Good compiler errors, IDEs provide structured feedback; build tools and LSP support stable machine-readable diagnostics.

  7. Partial programs - Medium
    IDEs plus compiler handle incomplete code well, but no notion of typed holes as language constructs.

  8. Tooling APIs - Strong
    Rich reflection, mature IDEs, LSP support, incremental compilation; excellent introspection and project tooling.

  9. Readability/refactorability - Strong
    Verbose but regular; strong IDE refactoring support; canonical style converges on readable, explicit code.

  10. Spec to implementation spectrum - Medium
    JML and similar tools exist, as do annotations and validation frameworks, but none are integrated deeply into the language core.


JavaScript

(Plain JS, not TypeScript.)

  1. Semantics - Weak
    Dynamic, many historical quirks (==, this, coercions), event loop semantics, subtle edge cases.

  2. Static guarantees - Weak
    Dynamic types, no static checking beyond linters; TypeScript exists precisely to fix this.

  3. Regular design - Weak/Medium
    Modern JS is more regular, but legacy features and multiple paradigms coexist; many “gotchas”.

  4. Memory safety & concurrency - Medium
    Memory-safe (no pointer arithmetic); shared-memory data races are only possible via SharedArrayBuffer and workers, so they are rare in typical browser JS; the async model is single-threaded but subtle.

  5. Declarative DSLs - Medium/Strong (via ecosystem)
    React/JSX, functional style, array combinators make a lot of UI/data-flow declarative; this is more library-level than language-level.

  6. Diagnostics - Medium
    Runtime errors often decent; static diagnostics depend on linters/TypeScript; machine-readable error formats exist but fragmented.

  7. Partial programs - Medium
    Tools (IDEs, browsers) handle incremental code reasonably well, but the language doesn’t have holes/typed feedback.

  8. Tooling APIs - Strong (ecosystem)
    Language servers, AST tools (Babel, ESLint), bundlers; excellent introspection through external tooling.

  9. Readability/refactorability - Medium
    Very style- and framework-dependent; you can write clean or very messy JS; refactoring relies heavily on TS/IDEs.

  10. Spec to implementation spectrum - Weak/Medium
    Test frameworks and schema validators help; no built-in contract/property language; most “spec” lives in tests and documentation.


Python

  1. Semantics - Medium
    Mostly simple and consistent at the surface, but dynamic features, metaprogramming, and import system quirks exist; still far friendlier than C++/JS.

  2. Static guarantees - Weak/Medium
    Dynamic by design; type hints + mypy/pyright improve things but are optional and unsound in many real-world uses.

  3. Regular design - Medium
    Core language is relatively small and consistent; some historical warts (Python 2 legacy, metaclasses, etc.).

  4. Memory safety & concurrency - Medium
    Memory-safe from the programmer’s view; GIL simplifies some concurrency concerns but is a performance and design constraint; no static race checking.

  5. Declarative DSLs - Strong (via ecosystem)
    Libraries (SQLAlchemy, Pandas, TensorFlow, etc.) offer many declarative/DSL-like APIs; again, mostly library-level.

  6. Diagnostics - Medium/Strong
    Tracebacks are clear; newer versions add better error messages; type checkers give structured diagnostics; machine-readability via tools is good.

  7. Partial programs - Medium
    REPL culture and notebooks support incremental development; static analysis on incomplete programs is less robust than in strongly typed languages.

  8. Tooling APIs - Strong
    Rich introspection (reflection, inspect), language servers, static analyzers; good ecosystem for tools.

  9. Readability/refactorability - Strong (by culture)
    “There should be one obvious way”; enforced indentation; common style via PEP 8; dynamic nature still makes some large-scale refactors risky.

  10. Spec to implementation spectrum - Medium
    Property-based testing (Hypothesis), contract libraries, and type hints help, but there is nothing like a built-in, enforced spec language.

Possibly relevant: something that was shared at work a month ago, “Why Elixir is the best language for AI” (Dashbit Blog).

One concern I have regarding AI’s ability to work with Elm code (although this is conjectural rather than empirical) is that Elm tends to have large files (see “The Life of a File”), and that could pollute an AI’s context window if it reads a whole file. It may be possible to make Elm easier for an AI to work with by providing a command-line tool that prints a single definition from a file, so that it doesn’t need to read the whole thing.
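The command-line tool idea could start very small. Here is a rough sketch in Rust, assuming a hypothetical tool called `elm-def` invoked as `elm-def src/Main.elm view`. It does not parse Elm; it relies on the convention (which elm-format output follows) that top-level declarations start in column 0 and everything belonging to them is indented. Doc comments, multi-line string literals, and other column-0 lines inside a declaration are not handled, so treat it as a starting point rather than a real implementation.

```rust
use std::{env, fs, process};

// Name declared by a top-level line, e.g. `view : ...`, `view model =`,
// `type Msg = ...`, `type alias Model = ...`, or `port send : ...`.
fn declared_name(first_line: &str) -> Option<&str> {
    let mut words = first_line
        .split(|c: char| c.is_whitespace() || c == ':')
        .filter(|w| !w.is_empty());
    match words.next()? {
        "type" => match words.next()? {
            "alias" => words.next(),
            name => Some(name),
        },
        "port" => words.next(),
        name => Some(name),
    }
}

// Heuristic: a top-level declaration starts at column 0 and owns every
// following indented or blank line. Returns the type annotation and the
// definition (when both exist) for `name`, or None if nothing matches.
fn extract_definition(source: &str, name: &str) -> Option<String> {
    let mut chunks: Vec<Vec<&str>> = Vec::new();
    for line in source.lines() {
        let at_column_zero = line.starts_with(|c: char| !c.is_whitespace());
        if at_column_zero || chunks.is_empty() {
            chunks.push(vec![line]);
        } else if let Some(chunk) = chunks.last_mut() {
            chunk.push(line);
        }
    }
    let matches: Vec<String> = chunks
        .iter()
        .filter(|chunk| declared_name(chunk[0]) == Some(name))
        .map(|chunk| chunk.join("\n").trim_end().to_string())
        .collect();
    if matches.is_empty() {
        None
    } else {
        Some(matches.join("\n"))
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: elm-def <File.elm> <name>");
        process::exit(2);
    }
    let source = fs::read_to_string(&args[1]).unwrap_or_else(|err| {
        eprintln!("cannot read {}: {err}", args[1]);
        process::exit(1);
    });
    match extract_definition(&source, &args[2]) {
        Some(text) => println!("{text}"),
        None => {
            eprintln!("no top-level declaration named {:?}", args[2]);
            process::exit(1);
        }
    }
}
```

An agent could then ask for just `update` or `Model` instead of loading the whole module into its context; a more robust version would reuse a real Elm parser rather than this line-based heuristic.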

Elixir

  1. Semantics - Strong
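    Functional, immutable data, pattern matching, and BEAM’s process model give it simple, well-defined semantics with no UB in user code; each process runs sequentially and deterministically, though message interleaving between processes is not deterministic.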

  2. Static guarantees - Weak
    Dynamically typed with optional typespecs + Dialyzer; useful for documentation and some checks but far weaker and less sound than ML/Rust-style static typing.

  3. Regular design - Strong
    Small, coherent core (modules, functions, pattern matching, processes, macros); most complexity is library-level and the core language stays uniform and orthogonal.

  4. Memory safety & concurrency - Strong
    Memory-safe via GC on BEAM; actor-style concurrency with isolated processes and message passing avoids shared-memory data races by construction.

  5. Declarative DSLs - Strong
    Macros and quoting make it very good for DSLs; frameworks like Phoenix and Ecto lean heavily on declarative routing, queries, schemas, and configurations.

  6. Diagnostics - Medium
    Runtime errors and stack traces are clear and helpful; ElixirLS/language tooling expose diagnostics, but there’s less emphasis on highly structured, versioned error codes than in Rust/TypeScript.

  7. Partial programs - Medium
    Great REPL (IEx), live reload, and a dynamic runtime make it easy to work with incomplete systems, but there’s no notion of typed holes or rich static feedback on partial terms.

  8. Tooling APIs - Strong
    mix, Hex, ElixirLS (LSP), and BEAM introspection provide rich, scriptable tooling and fast feedback loops attractive for AI-driven workflows.

  9. Readability/refactorability - Strong
    Pipeline operator, clear conventions, a built-in formatter (mix format), and functional style make Elixir code generally very readable and modular, though dynamic typing limits fully automatic refactors.

  10. Spec to implementation spectrum - Medium
    Typespecs, @behaviour, docs, and property-based testing offer a light spec layer, but there’s no built-in contract or proof system tightly integrated with the language core.
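Item 4’s point, that message passing avoids shared-memory data races, isn’t unique to the BEAM. As a rough analogue (Rust’s standard `std::sync::mpsc` channels, not Elixir processes), each worker below owns its data outright and reports back only by sending a message. It has none of the BEAM’s lightweight processes, supervision, or distribution; `parallel_sums` is just an illustrative name.

```rust
use std::sync::mpsc;
use std::thread;

// Each worker thread owns its chunk outright and reports back by message,
// loosely like an Elixir process doing `send(parent, {:sum, index, sum})`.
fn parallel_sums(chunks: Vec<Vec<u64>>) -> Vec<u64> {
    let (tx, rx) = mpsc::channel::<(usize, u64)>();
    for (index, chunk) in chunks.into_iter().enumerate() {
        let tx = tx.clone();
        // `move` transfers ownership of `chunk` into the thread; no other
        // thread can touch it, so there is no shared memory to race on.
        thread::spawn(move || {
            let sum: u64 = chunk.iter().sum();
            tx.send((index, sum)).expect("receiver should still be alive");
        });
    }
    // Drop our own sender so the receiving side ends once every worker is done.
    drop(tx);

    // Messages arrive in whatever order the workers finish, so sort by index.
    let mut results: Vec<(usize, u64)> = rx.into_iter().collect();
    results.sort_by_key(|&(index, _)| index);
    results.into_iter().map(|(_, sum)| sum).collect()
}

fn main() {
    println!("{:?}", parallel_sums(vec![vec![1, 2, 3], vec![10, 20], vec![]]));
}
```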

A minor note: the Elixir LSP is now Expert, and the others are considered deprecated. My employer actually employs someone to work on Expert! They’ve put a TON of work into it, and it’s really impressive how much has been done in the past year or two. However, it does have limited APIs: it was pointed out to me a few weeks ago that it cannot rename functions. Supposedly that’s a technical limitation, though I don’t know all the details (I can share links if there’s interest).

Yes, this study is also entirely conjectural.

It’s interesting, though, that the Elixir article @wolfadex linked takes an empirical approach, running some kind of benchmark across AI models and languages.

Rust is a strong contender too, and probably the best of the major languages.

Rust

  1. Semantics - Strong
    Well-specified, mostly deterministic semantics; the safe subset rules out undefined behavior, and the boundary with unsafe is explicit.

  2. Static guarantees - Strong
    Rich static types with ownership/borrowing and lifetimes, algebraic data types (enums), traits, and pattern matching; strong guarantees about memory safety and aliasing in safe code.

  3. Regular design - Medium/Strong
    Modern, mostly orthogonal core, but lifetimes, traits, and generics introduce real complexity compared to simpler ML-style languages.

  4. Memory safety & concurrency - Strong
    Safe Rust prevents data races and most memory bugs by construction; low-level unsafety is confined to unsafe blocks with clear syntactic fences.

  5. Declarative DSLs - Medium
    Macros, traits, and builder patterns allow embedded DSLs and declarative styles, though they’re heavier than in pure functional or macro-heavy languages like Haskell or Elixir.

  6. Diagnostics - Strong (human + structured)
    Excellent compiler errors with suggestions, spans, and notes, plus structured, machine-readable diagnostics (error codes, JSON output) and tight IDE integration via rust-analyzer.

  7. Partial programs - Medium
    Good error recovery and helpful messages when code is incomplete, and todo!()/unimplemented!() work as stubs that type-check anywhere, but there are no first-class typed holes as in Agda/Idris/Haskell; the experience is “close but not explicit.”

  8. Tooling APIs - Strong
    rustc plus rust-analyzer, Cargo, Clippy, Miri, and stable compiler flags/JSON outputs provide rich introspection, fast incremental checking, and strong LSP-based tooling.

  9. Readability/refactorability - Medium/Strong
    Clear idioms, enforced formatting (rustfmt), and a culture that values explicitness and safety, but advanced features (lifetimes, complex generics) can make some code hard to read and refactor.

  10. Spec to implementation spectrum - Medium/Strong
    Types, traits, and pattern matching encode a lot of invariants; external tools (Prusti, Creusot, etc.) bring formal verification, though they’re not yet mainstream parts of typical Rust workflows.
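A small sketch of items 4 and 7 for Rust: shared mutable state has to go through a synchronization type such as `Mutex`, and `todo!()` works as a stub that type-checks as any type but panics if it is ever reached. `count_in_parallel` and `checksum` are illustrative names only.

```rust
use std::sync::Mutex;
use std::thread;

// Item 4: several threads update one counter. Because the threads share it,
// it must be wrapped in a synchronization type such as `Mutex`.
fn count_in_parallel(threads: usize, increments: usize) -> usize {
    let counter = Mutex::new(0usize);
    // Scoped threads may borrow `counter`: they are all joined before
    // `thread::scope` returns.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });
    counter.into_inner().unwrap()
}

// Item 7: `todo!()` has type `!`, so this stub compiles with any return
// type and the rest of the program can still be type-checked.
// It panics only if actually called.
#[allow(dead_code)]
fn checksum(_data: &[u8]) -> u32 {
    todo!("pick a checksum algorithm")
}

fn main() {
    println!("{}", count_in_parallel(4, 1000));
}
```

Replacing the `Mutex` with a plain `let mut counter = 0;` and incrementing it from the spawned closures is rejected at compile time (each closure would need its own mutable borrow), so the race never makes it to runtime.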