I am writing an article about what makes a good language for AI to program in, and why, and I am also trying to make the case that Elm is a strong candidate. There is a big overlap between what makes a language good for humans and what makes it good for AI. I think the reason is that compilers exist to enforce structure, accuracy, efficiency, reliability, safety and correctness, and those checks help any author of code, human or machine.
Here are 10 qualities that I think a language can be evaluated against:
- Simple, precise, mostly deterministic semantics
- Strong static guarantees (types/effects/ownership/contracts)
- Regular, orthogonal design
- Memory safety & safe concurrency by default
- Declarative sublanguages/DSLs where possible
- Structured, machine-readable diagnostics + good messages
- Partial/incomplete programs (typed holes, stubs)
- Strong tooling APIs (fast checks, LSP, introspection)
- Readable, refactorable code by construction
- Smooth spectrum from spec to implementation (properties, contracts, proofs)
My evaluation of Elm and some other languages against these criteria is below.
Elm is a ‘minor’ language compared to the others, which puts it at a disadvantage in terms of available code to train on, but I really do think it has great qualities in this area. Another criticism might be: if AI makes working with difficult languages like C++ easy, does having a better language even matter?
Do you agree with my assessment, and with my choice of evaluation criteria?
Would you like me to assess any other languages against these 10 criteria?
Summary
- Elm: Very close to “ideal for AI” on most dimensions; small, pure, strongly typed, with excellent errors.
- C++: Powerful but hostile on most of these criteria; good tooling via Clang, but language semantics and safety are poor for AI-generated code.
- Java: Solid middle ground: safe, regular, tool-rich, but less expressive statically than ideal.
- JavaScript: Needs TypeScript and tooling to approach the ideal; core language is dynamic and quirky.
- Python: Friendly and tool-rich, but dynamic typing and runtime binding make strong static guarantees and precise feedback harder.
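To make the summary easier to compare, the verbal ratings from the per-language sections below can be tallied. This is only a rough sketch: the numeric weights (Weak = 1, Medium = 2, Strong = 3, split ratings averaged) and the equal weighting of all 10 criteria are assumptions, not part of the assessment itself.

```python
# Numeric weights for the verbal ratings. This mapping is an assumption.
WEIGHTS = {"Weak": 1.0, "Medium": 2.0, "Strong": 3.0}


def score(rating: str) -> float:
    """Average the weights of a rating such as 'Strong' or 'Weak/Medium'."""
    parts = rating.split("/")
    return sum(WEIGHTS[p] for p in parts) / len(parts)


# Ratings copied from the per-language sections, in the order of the 10 criteria.
# Elm's diagnostics rating, "Strong (human), Medium (structured)", is entered
# as "Medium/Strong", which is a simplification.
RATINGS = {
    "Elm": ["Strong", "Strong", "Strong", "Strong", "Strong",
            "Medium/Strong", "Medium", "Medium", "Strong", "Medium"],
    "C++": ["Weak", "Medium", "Weak", "Weak", "Medium",
            "Medium", "Weak/Medium", "Strong", "Medium", "Weak/Medium"],
    "Java": ["Medium/Strong", "Medium", "Medium", "Medium", "Medium",
             "Strong", "Medium", "Strong", "Strong", "Medium"],
    "JavaScript": ["Weak", "Weak", "Weak/Medium", "Medium", "Medium/Strong",
                   "Medium", "Medium", "Strong", "Medium", "Weak/Medium"],
    "Python": ["Medium", "Weak/Medium", "Medium", "Medium", "Strong",
               "Medium/Strong", "Medium", "Strong", "Strong", "Medium"],
}


def totals(ratings: dict) -> dict:
    """Sum the per-criterion scores for each language (maximum 30)."""
    return {lang: sum(score(r) for r in rs) for lang, rs in ratings.items()}


if __name__ == "__main__":
    ranked = sorted(totals(RATINGS).items(), key=lambda kv: -kv[1])
    for lang, total in ranked:
        print(f"{lang:<10} {total:>4.1f} / 30")
```

Under this mapping the totals come out as Elm 26.5, Java 23.5, Python 23.0, JavaScript 18.5 and C++ 17.0 out of 30. A different weighting of the criteria could change the gaps, so the numbers are a reading aid, not a verdict.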
Elm
- Semantics - Strong
  Pure, expression-oriented, no null, no exceptions, managed effects, simple predictable semantics.
- Static guarantees - Strong
  Hindley-Milner types, no null/undefined, ADTs, exhaustive pattern matching; no effects system or ownership, but very safe.
- Regular design - Strong
  Small core, curated ecosystem, limited feature set; very little legacy and few weird corner cases.
- Memory safety & concurrency - Strong
  Compiles to JS, no raw memory; Elm’s concurrency model (commands/subscriptions, tasks, ports) is tightly controlled.
- Declarative DSLs - Strong
  Entire UI model is declarative; HTML, styling, architecture are declarative DSLs embedded in Elm.
- Diagnostics - Strong (human), Medium (structured)
  Famous for excellent human-friendly errors. The structured/machine side exists via tooling but is not as emphasized as in e.g. Rust.
- Partial programs - Medium
  Good compiler guidance when things are missing, but no explicit “typed holes” in the sense of Agda/Idris. Still very friendly to incremental edits.
- Tooling APIs - Medium
  Has language server support and fast compilation, but less extensive/introspective than, say, Rust/Java ecosystems.
- Readability/refactorability - Strong
  Enforced formatting, simple module system, no overloading or operator madness; code is usually very uniform.
- Spec to implementation spectrum - Medium
  Strong types and pattern matching help, but little in the way of built-in contracts/proofs beyond the type system and tests.
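To make "ADTs plus exhaustive pattern matching" concrete, here is a minimal sketch written in Python rather than Elm. The `PageState` type and its three variants are hypothetical names chosen for illustration. In Elm the compiler rejects a `case` expression that misses a variant; in Python the closest equivalent is an idiom that only an optional static checker (mypy, pyright) enforces.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class Loading:
    pass


@dataclass(frozen=True)
class Loaded:
    items: tuple


@dataclass(frozen=True)
class Failed:
    reason: str


# Hypothetical page state, standing in for an Elm custom type with three variants.
PageState = Union[Loading, Loaded, Failed]


def assert_never(value: NoReturn) -> NoReturn:
    """Reached only if a variant was not handled by the caller."""
    raise AssertionError(f"unhandled variant: {value!r}")


def view(state: PageState) -> str:
    """Render every variant; a static checker flags the last line if one is missing."""
    if isinstance(state, Loading):
        return "Loading..."
    if isinstance(state, Loaded):
        return f"{len(state.items)} items"
    if isinstance(state, Failed):
        return f"Error: {state.reason}"
    assert_never(state)


if __name__ == "__main__":
    print(view(Loaded(("a", "b"))))
```

If a fourth variant is added to `PageState` and `view` is not updated, a type checker reports an error at the `assert_never` call, but the program still runs until that variant shows up. That gap between "checked if you opt in" and "checked always" is the difference this criterion is meant to capture.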
C++
- Semantics - Weak
  Complex, decades of accreted features, pervasive undefined behavior (UB); tricky evaluation order and aliasing rules.
- Static guarantees - Medium
  Strong types and templates, but limited by UB and unsafe constructs; no native ownership/effects in the language (RAII helps but is not enforced by the type system).
- Regular design - Weak
  Many overlapping features and paradigms; historic baggage; “there are many ways to do it.”
- Memory safety & concurrency - Weak
  Raw pointers, manual memory, data races easy to express; safe subsets exist by convention, not by language design.
- Declarative DSLs - Medium
  Template metaprogramming enables embedded DSLs, but often in very complex ways; not designed for declarativity first.
- Diagnostics - Medium
  Modern compilers (Clang, GCC, MSVC) give better errors and machine-readable formats, but template errors are still notorious; structured diagnostics exist but are compiler-specific.
- Partial programs - Weak/Medium
  Compilers cope with syntax/type errors but there are no explicit “holes”; error recovery exists but is not designed as an interactive, typed-hole experience.
- Tooling APIs - Strong (via Clang, etc.)
  Clang/LLVM and related tooling provide rich introspection; the language itself doesn’t define APIs, but the ecosystem is strong.
- Readability/refactorability - Medium (highly style-dependent)
  Possible to write very readable C++, but the language allows highly complex, non-obvious constructs; refactoring relies heavily on external tools and discipline.
- Spec to implementation spectrum - Weak/Medium
  Contracts have been proposed for newer standards (dropped from C++20, targeted again for C++26), and external tools exist (static analyzers, formal methods), but this is not a central design goal.
Java
- Semantics - Medium/Strong
  Deterministic, well-specified, no UB in the C++ sense; but lots of legacy quirks and a large standard library.
- Static guarantees - Medium
  Nominal OO types, generics, null is pervasive; no effects system or ownership. The type system is helpful, though not fully sound (e.g. covariant arrays are only checked at runtime).
- Regular design - Medium
  Core language relatively simple, but Java 8+ added lambdas/streams/etc.; still more regular than C++.
- Memory safety & concurrency - Medium
  Memory-safe (no raw pointers), but data-race safety is not enforced; the core concurrency primitives (threads, locks, synchronized) are low-level.
- Declarative DSLs - Medium
  Streams, annotations, builder-style APIs allow semi-declarative code, but the language itself is largely imperative/OOP.
- Diagnostics - Strong
  Good compiler errors, IDEs provide structured feedback; build tools and LSP support stable machine-readable diagnostics.
- Partial programs - Medium
  IDEs plus compiler handle incomplete code well, but no notion of typed holes as language constructs.
- Tooling APIs - Strong
  Rich reflection, mature IDEs, LSP support, incremental compilation; excellent introspection and project tooling.
- Readability/refactorability - Strong
  Verbose but regular; strong IDE refactoring support; canonical style converges on readable, explicit code.
- Spec to implementation spectrum - Medium
  JML and similar tools exist, as do annotations and frameworks for validation, but none are integrated deeply into the language core.
JavaScript
(Plain JS, not TypeScript.)
- Semantics - Weak
  Dynamic, many historical quirks (==, this, coercions), event loop semantics, subtle edge cases.
- Static guarantees - Weak
  Dynamic types, no static checking beyond linters; TypeScript exists precisely to fix this.
- Regular design - Weak/Medium
  Modern JS is more regular, but legacy features and multiple paradigms coexist; many “gotchas”.
- Memory safety & concurrency - Medium
  Memory-safe (no pointer arithmetic), and data races are rare in typical browser JS because shared memory only arises with workers and SharedArrayBuffer; the async model is single-threaded but its ordering is subtle.
- Declarative DSLs - Medium/Strong (via ecosystem)
  React/JSX, functional style, array combinators make a lot of UI/data-flow declarative; this is more library-level than language-level.
- Diagnostics - Medium
  Runtime errors are often decent; static diagnostics depend on linters/TypeScript; machine-readable error formats exist but are fragmented.
- Partial programs - Medium
  Tools (IDEs, browsers) handle incremental code reasonably well, but the language doesn’t have holes/typed feedback.
- Tooling APIs - Strong (ecosystem)
  Language servers, AST tools (Babel, ESLint), bundlers; excellent introspection through external tooling.
- Readability/refactorability - Medium
  Very style- and framework-dependent; you can write clean or very messy JS; refactoring relies heavily on TS/IDEs.
- Spec to implementation spectrum - Weak/Medium
  Test frameworks and schema validators help; no built-in contract/property language; most “spec” lives in tests and documentation.
Python
- Semantics - Medium
  Mostly simple and consistent at the surface, but dynamic features, metaprogramming, and import system quirks exist; still far friendlier than C++/JS.
- Static guarantees - Weak/Medium
  Dynamic by design; type hints + mypy/pyright improve things but are optional and unsound in many real-world uses.
- Regular design - Medium
  Core language is relatively small and consistent; some historical warts (Python 2 legacy, metaclasses, etc.).
- Memory safety & concurrency - Medium
  Memory-safe from the programmer’s view; the GIL simplifies some concurrency concerns but is a performance and design constraint; no static race checking.
- Declarative DSLs - Strong (via ecosystem)
  Libraries (SQLAlchemy, Pandas, TensorFlow, etc.) offer many declarative/DSL-like APIs; again, mostly library-level.
- Diagnostics - Medium/Strong
  Tracebacks are clear; newer versions add better error messages; type checkers give structured diagnostics; machine-readability via tools is good.
- Partial programs - Medium
  REPL culture and notebooks support incremental development; static analysis on incomplete programs is less robust than in statically typed languages.
- Tooling APIs - Strong
  Rich introspection (runtime reflection, the inspect module), language servers, static analyzers; good ecosystem for tools.
- Readability/refactorability - Strong (by culture)
  “There should be one obvious way”; enforced indentation; common style via PEP 8; dynamic nature still makes some large-scale refactors risky.
- Spec to implementation spectrum - Medium
  Property-based testing (Hypothesis), contracts libraries, type hints; but nothing like a built-in, enforced spec language.
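The point about optional type hints can be shown in a few lines. This is a minimal sketch with a hypothetical `double` function: the annotation says int in, int out, but the interpreter does not check it, so a wrong call runs and quietly returns a value of the wrong type.

```python
def double(x: int) -> int:
    """Annotated as int -> int, but the annotation is not enforced at runtime."""
    return x * 2


if __name__ == "__main__":
    print(double(21))
    # A static checker such as mypy or pyright would reject this call,
    # but the interpreter runs it and repeats the string instead.
    print(double("ab"))
```

For AI-generated code this matters because the feedback arrives only if a type checker is part of the loop; without one, the mistake surfaces later as wrong data and not as an error at the call site.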