I’m trying to figure out why function equality is considered hard (or undecidable).
Is it because of an overly aggressive definition of equality? For example, I can understand why it would be difficult to determine that two implementations of a function were identical without comparing the generated code or ASTs or whatever and that this would become even worse (impossible) in the presence of commutative and associative operations, etc. After all, the halting problem is essentially asking whether a particular function is equivalent to the function that fails to halt.
But this seems like an overly aggressive definition of equality in a programming context even in a context with referential transparency. It seems like all a language with referential transparency needs to provide is:

A set of “primitive” values for which equality is defined. For example, equality for integers is numerical equality for integers.

Equality needs to be reflexive, symmetric, and transitive.

A guarantee that if
fn1
“equals”fn2
andarg1
“equals”arg2
, thenfn1 arg1
“equals”fn2 arg2
. In other words, enough to make substitution work since function application is our only method of combining items.
Now, think of functions in terms of “templates” and “instantiations”. A function template includes its code and its references to other global values. A function template essentially corresponds to a particular piece of source code and gains its identity for equality purposes from that relationship. A function instance is created from a function template by binding any lexicallyscoped free variables not tied to globals (those having been handled at the template level). Given this construction, just like other data structures we now have a DAG terminating in primitive values at the leaves. Function templates are primitives and we don’t attempt to identify them as equal beyond source code identity. Function instances reference templates and lexically bound values which have to exist before the instance is created and hence cannot create cycles.
Am I missing something? I can understand where Elm’s actual implementation of functions might make this harder to check. I also see where it might be desirable to have a more expansive definition of equality that would recognize more values as equal to one another, but I would then ask about the reasoning that drives that trade off in terms of issues like performance and inability to test function equality.