Extracting type metadata from Elm code?

The problem: Given a big, hairy, often-changing Route custom type, and a pair of functions Route -> String and String -> Maybe Route, how can we be sure that (almost) every Route turned into a string and back into Route produces the same value that we started with?


Route looks like:

module Route exposing (..)

import UserRoutes exposing (UserRoutes)
import TaskRoutes exposing (TaskRoutes)

type Route
    = Home
    | Settings
    | About
    | User String UserRoutes
    | Task Int TaskRoutes
    -- ...dozens more...

Is there some tool I can point at Route.elm, say “give me the structure of the Route type”, and have it follow the module imports, so that it returns an enumerable list of properties? Something like:

{
  type: "CustomType",
  name: "Route",
  members: [
    {
      name: "Home",
    },
    {
      name: "Settings",
    },
    {
      name: "About",
    },
    {
      name: "User",
      args: [
        {
          type: "String",
        },
        {
          type: "CustomType",
          name: "UserRoutes",
          members: [
            {
              name: "Summary",
            },
            {
              name: "Activity",
            },
            // ... etc ...
          ],
        }
      ],
    },
    {
      name: "Task",
      args: [
        {
          type: "Int",
        },
        {
          type: "CustomType",
          name: "TaskRoutes",
          members: [
            {
              name: "Active",
            },
            {
              name: "Overdue",
            },
            // ... etc ...
          ],
        }
      ],
    },
    // ... etc ...
  ]
}

I’m thinking that I’d use that to generate a RouteTest.elm file that uses default (or fuzzed) values for the Ints and Strings, which might look like:

suite : Test
suite =
    describe "routes"
        [ test "should parse generated cases" <|
            \_ ->
                let
                    data =
                        [ Home
                        , Settings
                        , About
                        , User "a" UserRoutes.Summary
                        , User "a" UserRoutes.Activity
                        , Task 1 TaskRoutes.Active
                        , Task 1 TaskRoutes.Overdue
                        -- ... plus the rest ...
                        ]

                    actual =
                        data |> List.map (routeToString >> routeFromString)

                    expected =
                        data |> List.map Just
                in
                actual |> Expect.equalLists expected
        ]

I also encountered this situation (almost everyone writing this kind of Route module does!).

What I do for now is write (by hand) a routeFuzzer : Fuzzer Route that randomly generates all possible routes, and then I only have one test:

fuzz routeFuzzer
        "Test route serialization/parsing matching"
        (\route ->
            ("http://domain.com" ++ Route.toString route)
                |> Url.fromString
                |> Maybe.andThen Route.parse
                |> Expect.equal (Just route)
        )

This is a bit painful to write, but if you change something in your route structure, the fuzzer won’t compile, so you cannot miss it. The only situation where you have to be very careful is when you add a new route (but this doesn’t happen very often).
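
For anyone who hasn’t written one, a hand-written fuzzer along these lines might look like the sketch below. It assumes the Route type from the first post (redeclared here so the module stands alone) and the elm-explorations/test package. The restricted usernameFuzzer and the 0..9999 id range are illustrative assumptions, chosen so that generated values survive a trip through a URL path.

```elm
module RouteFuzzer exposing (Route(..), TaskRoutes(..), UserRoutes(..), routeFuzzer)

import Fuzz exposing (Fuzzer)


-- Types redeclared from the first post so this sketch is self-contained.


type Route
    = Home
    | Settings
    | About
    | User String UserRoutes
    | Task Int TaskRoutes


type UserRoutes
    = Summary
    | Activity


type TaskRoutes
    = Active
    | Overdue


userRoutesFuzzer : Fuzzer UserRoutes
userRoutesFuzzer =
    Fuzz.oneOf [ Fuzz.constant Summary, Fuzz.constant Activity ]


taskRoutesFuzzer : Fuzzer TaskRoutes
taskRoutesFuzzer =
    Fuzz.oneOf [ Fuzz.constant Active, Fuzz.constant Overdue ]


-- Assumption: usernames are limited to a few URL-safe samples, because an
-- arbitrary String (containing "/" or "?") would not round-trip as a path segment.
usernameFuzzer : Fuzzer String
usernameFuzzer =
    Fuzz.oneOf (List.map Fuzz.constant [ "a", "alice", "bob-42" ])


-- Adding a variant to Route does NOT break this fuzzer; changing the
-- arguments of an existing variant does.
routeFuzzer : Fuzzer Route
routeFuzzer =
    Fuzz.oneOf
        [ Fuzz.constant Home
        , Fuzz.constant Settings
        , Fuzz.constant About
        , Fuzz.map2 User usernameFuzzer userRoutesFuzzer
        , Fuzz.map2 Task (Fuzz.intRange 0 9999) taskRoutesFuzzer
        ]
```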

Maybe https://package.elm-lang.org/packages/MartinSStewart/elm-serialize/1.2.1/ can help, but I’ve never used it. @MartinS could maybe give more details.

If you want to store data in a url and the format doesn’t matter (i.e. you’re okay with domain.com/<nonsense text>) then elm-serialize could work. In this case it looks like the url needs to have user readable paths so this wouldn’t work.

Edit: Codecs (generating both the encoder and decoder from a single function) could be a useful approach to handling urls though. I haven’t heard of any attempts to do this. Not sure if that’s because no one has bothered or if people have tried and it didn’t work.
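
To make the “single definition” idea concrete for URLs, here is a minimal sketch for the simplest case only: a custom type whose variants take no arguments. The module name, the UserRoute type and the userRouteSlugs list are hypothetical, and this is not how elm-serialize works internally; it only shows an encoder and a Url.Parser parser derived from one list.

```elm
module SlugCodec exposing (UserRoute(..), userRouteParser, userRouteToString)

import Url.Parser as UP exposing (Parser)


type UserRoute
    = Summary
    | Activity


-- The single source of truth: each variant paired with its path segment.
userRouteSlugs : List ( UserRoute, String )
userRouteSlugs =
    [ ( Summary, "summary" )
    , ( Activity, "activity" )
    ]


-- Encoder: look the variant up in the list.
-- Returns Nothing if a variant was never added to userRouteSlugs.
userRouteToString : UserRoute -> Maybe String
userRouteToString route =
    userRouteSlugs
        |> List.filter (\( candidate, _ ) -> candidate == route)
        |> List.head
        |> Maybe.map Tuple.second


-- Decoder: one `s slug` parser per entry, combined with oneOf.
userRouteParser : Parser (UserRoute -> a) a
userRouteParser =
    userRouteSlugs
        |> List.map (\( route, slug ) -> UP.map route (UP.s slug))
        |> UP.oneOf
```

The two directions cannot disagree about a slug, but nothing forces every variant to appear in the list, so a missing entry only shows up as a Nothing at encode time.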

Thanks sebbes! I had tried to go the Fuzzer route, but not being able to automatically catch and test new variants seemed like a deal-breaker.

I hadn’t considered the codec approach… keeping the encoder and decoder right next to each other should make it obvious when one’s been missed, and hopefully also when they’re not symmetric (one of them isn’t correct)!

I had a go at a sample codec-style Route:

module RouteCodec exposing (..)

import Url exposing (Url)
import Url.Parser as UP exposing ((</>), Parser, s)


type Route
    = Home
    | Settings
    | User String UserRoute
    | Task Int TaskRoute


type UserRoute
    = Summary
    | Activity


type TaskRoute
    = Active Bool
    | Overdue


type alias Codec a b c =
    { encode : c -> String
    , decode : Parser (a -> b) b
    }


routeToString : Route -> String
routeToString route =
    case route of
        Home ->
            routeCodecs.home.encode ()

        Settings ->
            routeCodecs.settings.encode ()

        User username userRoute ->
            routeCodecs.user.encode ( username, userRoute )

        Task id taskRoute ->
            routeCodecs.task.encode ( id, taskRoute )


routeFromString : String -> Maybe Route
routeFromString string =
    string
        |> (++) "https://x.y/"
        |> Url.fromString
        |> Maybe.andThen (UP.parse routeParser)


routeParser : Parser (Route -> a) a
routeParser =
    UP.oneOf
        [ routeCodecs.home.decode
        , routeCodecs.settings.decode
        , routeCodecs.user.decode
        , routeCodecs.task.decode
        ]


routeCodec : Codec Route a Route
routeCodec =
    { encode = routeToString
    , decode = routeParser
    }


routeCodecs =
    let
        home : Codec Route a ()
        home =
            { encode = always "home"
            , decode = s "home" |> UP.map Home
            }

        settings : Codec Route a ()
        settings =
            { encode = always "settings"
            , decode = s "settings" |> UP.map Settings
            }

        user : Codec Route a ( String, UserRoute )
        user =
            { encode =
                \( username, route ) ->
                    "user/" ++ username ++ "/" ++ userRouteCodec.encode route
            , decode =
                (s "user" </> UP.string </> userRouteCodec.decode)
                    |> UP.map User
            }

        task : Codec Route a ( Int, TaskRoute )
        task =
            { encode =
                \( taskId, route ) ->
                    ("task/" ++ String.fromInt taskId ++ "/")
                        ++ taskRouteCodec.encode route
            , decode =
                (s "task" </> UP.int </> taskRouteCodec.decode)
                    |> UP.map Task
            }
    in
    { home = home
    , settings = settings
    , user = user
    , task = task
    }



-------------------------------------------------------------- ↓ UserRoutes.elm


userRouteCodec : Codec UserRoute a UserRoute
userRouteCodec =
    { encode = userRouteToString
    , decode = userRouteParser
    }


userRouteToString : UserRoute -> String
userRouteToString route =
    case route of
        Summary ->
            userRouteCodecs.summary.encode ()

        Activity ->
            userRouteCodecs.activity.encode ()


userRouteParser : Parser (UserRoute -> a) a
userRouteParser =
    UP.oneOf
        [ userRouteCodecs.summary.decode
        , userRouteCodecs.activity.decode
        ]


userRouteCodecs =
    let
        summary : Codec UserRoute a ()
        summary =
            { encode = always "summary"
            , decode = s "summary" |> UP.map Summary
            }

        activity : Codec UserRoute a ()
        activity =
            { encode = always "activity"
            , decode = s "activity" |> UP.map Activity
            }
    in
    { summary = summary
    , activity = activity
    }



-------------------------------------------------------------- ↓ TaskRoutes.elm


taskRouteCodec : Codec TaskRoute a TaskRoute
taskRouteCodec =
    { encode = taskRouteToString
    , decode = taskRouteParser
    }


taskRouteToString : TaskRoute -> String
taskRouteToString route =
    case route of
        Active bool ->
            taskRouteCodecs.active.encode bool

        Overdue ->
            taskRouteCodecs.overdue.encode ()


taskRouteParser : Parser (TaskRoute -> a) a
taskRouteParser =
    UP.oneOf
        [ taskRouteCodecs.active.decode
        , taskRouteCodecs.overdue.decode
        ]


taskRouteCodecs =
    let
        active : Codec TaskRoute a Bool
        active =
            { encode =
                \bool ->
                    if bool then
                        "active"

                    else
                        "inactive"
            , decode =
                UP.oneOf
                    [ s "active" |> UP.map (Active True)
                    , s "inactive" |> UP.map (Active False)
                    ]
            }

        overdue : Codec TaskRoute a ()
        overdue =
            { encode = always "overdue"
            , decode = s "overdue" |> UP.map Overdue
            }
    in
    { active = active
    , overdue = overdue
    }

(gist version here, might be easier to read)

I may have missed something fundamental about Codecs, though… does a record with encode and decode properties meet the definition, or should it be constructed some other way?

Martin’s idea with Codecs is to safely build the decoder and encoder from small composable blocks. If you build a Codec using only his lib, you have the GUARANTEE that the decoder and encoder are in sync (except, I think, for custom types, where the two definitions are separate; so those are the only codecs where you have to be very careful)!

It is a bit stronger than keeping encoder/decoder “close enough” (where you have to be very careful at each codec definition).

So the idea would be to write a new codec package for symmetrically parsing and building URLs?

I’d love to have this! But since I am a bit lazy, I didn’t write this lib, I wrote fuzzers by hand instead… My other dream would be a tool automagically writing those fuzzers!

Ah, thanks. So, the same limitation affects elm-serialize:

semaphoreCodec : S.Codec Semaphore
semaphoreCodec =
    S.custom
        (\redEncoder yellowEncoder greenEncoder value ->
            case value of
                Red i s b ->
                    redEncoder i s b

                Yellow f ->
                    yellowEncoder f

                Green ->
                    greenEncoder
        )
        -- Note that removing a variant, inserting a variant before an existing one, or swapping two variants will prevent you from decoding any data you've previously encoded.
        |> S.variant3 Red S.int S.string S.bool
        |> S.variant1 Yellow S.float
        |> S.variant0 Green
        -- It's safe to add new variants here later though
        |> S.finishCustom

In fact, even the relatively simple “codec-style Route” code I posted earlier had a bug - routeParser didn’t handle the User and Task routes… I wasn’t careful enough!

(aside: that we don’t have to be so careful with languages like Rust and Elm is a big part of their appeal; features that require “you have to be careful!” compel me to seek alternatives)

So it seems fundamental that within Elm, there’s no way to statically enforce the symmetry of encoder/decoder pairs. From outside Elm, there seem to be two options:

Generate the Routes, code, and tests from data external to the Elm code

  • Would have to write the Route type(s) indirectly, using the generator tool rather than plain Elm.
  • Existing editor tooling wouldn’t help while writing the routes?
  • Would probably be specific to Route-like structures; might not be applicable to general symmetry problems.
  • Hard to convert an existing codebase to using the generated code

Extract the Route type information from existing Elm code, generate the tests from that

  • Lets us keep writing Routes in plain Elm, like usual
  • Should be low-effort to set up; tell it where to find the Type and the two functions, and what to name the test, that’s it?
  • Easy to tack onto existing codebases
  • May be harder to “extract” the Type from existing code; e.g., crossing module boundaries, nested types (elm-syntax gives the AST, but that may be too low-level… would we need to recreate a subset of the compiler?)
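
As a rough sketch of the single-module part of that extraction, assuming the stil4m/elm-syntax package (7.x API): variantNames is a hypothetical helper that parses one source file and lists the constructors of a named custom type with their argument counts. It does not follow imports or resolve nested types, which is exactly the hard part mentioned above.

```elm
module ExtractVariants exposing (variantNames)

import Elm.Parser
import Elm.Processing
import Elm.Syntax.Declaration exposing (Declaration(..))
import Elm.Syntax.Node as Node


{-| Given a type name and the source text of one Elm module, return each
constructor's name and how many arguments it takes. Returns an empty list
when the source does not parse or the type is not declared in it.
-}
variantNames : String -> String -> List ( String, Int )
variantNames typeName source =
    case Elm.Parser.parse source of
        Err _ ->
            []

        Ok rawFile ->
            Elm.Processing.process Elm.Processing.init rawFile
                |> .declarations
                |> List.concatMap
                    (\declaration ->
                        case Node.value declaration of
                            CustomTypeDeclaration customType ->
                                if Node.value customType.name == typeName then
                                    customType.constructors
                                        |> List.map Node.value
                                        |> List.map
                                            (\constructor ->
                                                ( Node.value constructor.name
                                                , List.length constructor.arguments
                                                )
                                            )

                                else
                                    []

                            _ ->
                                []
                    )
```

A generator built on this would still need to look up UserRoutes and TaskRoutes in their own files by reading the import list, and to decide what values to substitute for String and Int arguments.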

Both approaches might need some way to limit unbounded types like Int, String (and types that wrap those)… how much could we do for the general case? How much would the user need to tell the generators about how to generate their own unbounded types?

About “extracting the Route type”, my idea is even a bit more general: for a given type T in a module M, have a fuzzer that generates values of this type using all the functions available in M.

This way you can easily check invariants about T and this “parsing issue” for routes is then a trivial sub-issue of that.

Generating unbounded types is the role of the fuzzers (there is even a complete “theory” about how to “shrink” values).
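
As a small illustration of a fuzzer covering an unbounded type, here is a sketch of a round-trip property on Int using only core functions and elm-explorations/test. The module and test names are made up for the example.

```elm
module IntRoundTripTest exposing (suite)

import Expect
import Fuzz
import Test exposing (Test, fuzz)


-- Fuzz.int supplies many generated Ints; if the property ever failed,
-- the test runner would shrink the input towards a simpler failing value.
suite : Test
suite =
    fuzz Fuzz.int "Int survives String.fromInt >> String.toInt" <|
        \n ->
            n
                |> String.fromInt
                |> String.toInt
                |> Expect.equal (Just n)
```

The route test has the same shape, with routeToString and routeFromString in place of the Int conversions.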

Warning: trying to have a “perfect solution” sometimes leads to a nightmare. We don’t launch space rockets, we don’t need 100% safety. Roughly speaking, I’d say with Elm we have “95% safety” (compared with “50% safety” in JS), which is quite enough for web sites.

Honestly, my hand-written fuzzers work fine; adding a totally new route is pretty rare. However, modifying the url parameters is pretty frequent and is well caught! So, yes, sometimes I could have some url-related bug (ergo “5% unsafe”), but I don’t need to mess with an additional tool or abstraction.

Incidentally, I did try to do just that only 2 days ago.

It would certainly be possible with an API similar to elm-codec, but coming up with an API that is similar to Url.Parser seemed very difficult.

Elm-codec is essentially similar to e.g. Json.Decode. But Url.Parser works differently.

-- hypothetical syntax
infix <*> = D.andMap

decode : D.Decoder Person
decode =
    (D.map Person <*> D.field "name" D.string) <*> D.field "age" D.int

route : Parser (Person -> a) a
route =
     Parser.map Person (Parser.string </> Parser.int)

They both have a map, but with very different types. Both have some form of combination operator, D.andMap and </>. But you can see how the brackets associate differently.

While Json.Decode works like a basic Applicative, Url.Parser seems to have some kind of continuation-style API. I haven’t seen such an API used like that. I’d love to see more examples of it.
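
For comparison, here is a version of the two definitions that should compile against elm/json and elm/url as they are. It is only a sketch: D.map2 stands in for the hypothetical <*> operator above, and the Person record is assumed to have a name and an age.

```elm
module PersonExample exposing (Person, decode, route)

import Json.Decode as D
import Url.Parser as Parser exposing ((</>), Parser)


type alias Person =
    { name : String
    , age : Int
    }


-- Applicative style: the constructor comes first and each field decoder
-- supplies one argument.
decode : D.Decoder Person
decode =
    D.map2 Person
        (D.field "name" D.string)
        (D.field "age" D.int)


-- Url.Parser style: the segments are combined first with </>, giving a
-- Parser (String -> Int -> b) b, and only then does map hand both
-- captured values to the constructor.
route : Parser (Person -> a) a
route =
    Parser.map Person (Parser.string </> Parser.int)
```

Here Parser.map has the type a -> Parser a b -> Parser (b -> c) c, which is where the continuation-like shape shows up: the constructor is an ordinary value fed into the parser, not something the parser is lifted over.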

And yeah. I couldn’t easily build an API that feels similar to Url.Parser but achieves something like what elm-codec improves over elm/json.

You have type-nerd-sniped me. I think it might be possible to do an elm-url-codec, but I’ll have to check the types closely

Work in progress, this works-ish but it will need more complex types to transform oneOf into the custom/variant style from elm-codec

https://ellie-app.com/9WkhnsWqzgDa1

Thanks miniBill, that’s very interesting! The oneOf problem is a stumper…

Disclaimer: I’m not sure the API is general enough but LGTM: https://ellie-app.com/9WCmVb7gHswa1

EDIT: wrong link

P.S.: https://www.youtube.com/watch?v=CduA0TULnow

keen!
I’m not sure how to add a custom type route though… here’s where I got up to: https://ellie-app.com/9WDZnrhbJpta1

type Route
    = Blog Int
    | Article String Int
    | User UserRoute -- this is the new route
    | Home


type UserRoute
    = About
    | Settings


route : UrlCodec (Route -> a) a Route
route =
    adt
        (\fhome fblog farticle fuser value ->
            case value of
                Home ->
                    fhome

                Blog i ->
                    fblog i

                Article se i ->
                    farticle se i

                User user_route ->
                    fuser user_route
        )
        |> variant0 Home
        |> variant1 Blog (comb (s "blog") int)
        |> variant2 Article string int
        |> variant1 User (comb (s "user") userRoute) -- using a `variant1` feels right, here...
        |> buildCustom


userRoute : UrlCodec (UserRoute -> a) a UserRoute
userRoute =
    adt
        (\fabout fsettings value ->
            case value of
                About ->
                    fabout

                Settings ->
                    fsettings
        )
        |> variant0 About
        --|> variant1 Settings userSettings
        |> variant0 Settings -- but neither `variant0` nor `variant1` feels right here?
        |> buildCustom

Indeed! The type for variant0 wasn’t general enough!

https://ellie-app.com/9WK46jQ8QHta1

@miniBill awesome job! Any chance of making it into a library? :slight_smile:

I’m considering it.
Especially because I finally managed to have the API I wanted all along:
https://ellie-app.com/9Z34Xrhwm7Xa1

You can tell this is the “correct” API because the types are finally symmetric and clean.

Nontrivial insights I had to find:

  1. The type of prettyPrinter inside the UrlCodec / how to implement int;
  2. the implementation of prettyPrinter inside variant.

The rest is basically all blindly solving the type puzzle.
