How to understand OCaml's GADTs.

OCaml has this weird feature of its type system called "Generalized Algebraic Data Types", or GADTs, for short. Experts typically use this feature to haze newcomers to the language (unintentionally, of course). This page aims to fix this problem by enabling the reader to be the expert inflicting the suffering, rather than the sufferer.

The first step I want to take is to re-frame your understanding of "normal" variant constructors: constructors are just plain old functions. In OCaml, this is a bit of a lie, but it's not too far off; we can't pass constructors around as first-class functions, but building something with them looks pretty similar to calling a function.

Here's the option type in OCaml.

type 'a option =
  | None
  | Some of 'a

This is the syntax we would usually use, but there is an alternate syntax that lets you write the constructor as if it were a function.

type 'a option =
  | None : 'a option
  | Some : 'a -> 'a option

That's really not too far from what you often see in OCaml module signatures.

type 'a option

val none : 'a option
val some : 'a -> 'a option

What I just showed you was a translation from presenting option as a list of constructors to presenting it as a list of functions. You can go the other way too! Consider this subset of the List module.

type 'a list

val empty : 'a list
val cons : 'a -> 'a list -> 'a list
val map : 'a list -> f:('a -> 'b) -> 'b list

Translating this back to constructors is not immediately straightforward because constructors are only allowed to take a single parameter, so let's start with the intermediate step of translating these curried functions into functions that take a tuple of arguments.

type 'a list

val empty : 'a list
val cons : 'a * 'a list -> 'a list
val map : 'a list * ('a -> 'b) -> 'b list

Now it should be easy to translate these functions to constructors.

type 'a list =
  | Empty : 'a list
  | Cons : 'a * 'a list -> 'a list
  | Map : 'a list * ('a -> 'b) -> 'b list

It's perfectly understandable if the Map constructor is a bit baffling. We're used to List.map being a function that does something, but here we've just packaged it up as some data. On the other hand, if you're comfortable with functions being first-class values, then this shouldn't bother you at all; Map is just an ordinary constructor that contains two pieces of sub-data.

One way to look at constructors is that they are functions that don't do anything in the moment, but then later you can look at them to see what parameters they were called with. We can pattern match on the fancy list type and eliminate the Map constructor by actually doing the mapping.

let rec do_maps : type a. (a -> b) -> a list -> b list =
  fun f xs ->
    match xs with
    | Empty -> Empty
    | Cons (x, xs) -> Cons (f x, do_maps f xs)
    | Map (xs, g) -> do_maps (fun x -> f (g x)) xs

Don't worry if this looks foreign; it's a really weird function. In fact, it's even weirder than you probably realize; the type inference algorithm is hiding some of the nuance of this code. The key point of intrigue is in what type the xs variable has in the pattern Map (xs, g); it's worth pondering for a minute or two before reading on.

The answer is that the compiler has to make up a temporary abstract type for the items of the list, so xs has a type like $1 list. The $1 is the type of the list items when Map was originally called, but the compiler doesn't remember what the type is, so the body of that match case has to treat it as an abstract type. In other words, we know there exists a type for the items of that list, but we don't know what it is. Weird, huh?

Normal variants are more limited than GADTs.

This is the section of the post where we start talking about GADTs. Just kidding! We've been talking about GADTs this whole time; did you notice? GADTs are not a new kind of thing, but rather an extension of the abilities of an existing thing, namely variants.

The view that constructors are like functions is helpful for seeing the extra power that GADTs provide. The ordinary of-syntax for variants requires any type variables in the constructor to be matched by type variables in the type itself. For example, here is some invalid code in which we try to define the fancy list type with of-syntax.

type 'a list =
  | Empty
  | Cons of 'a * 'a list
  | Map of 'x list * ('x -> 'a)

The Map constructor here is illegal because it mentions a 'x variable that wasn't mentioned in list's type parameters.

Another limitation on constructor "functions" enforced by of-syntax is that the result of the function is always the type itself, with its unadorned type variables. To illustrate this, let us try to translate the zip function to a constructor.

val zip : 'a list -> 'b list -> ('a * 'b) list

Doing it with the new syntax we learned is easy.

  | Zip : 'a list * 'b list -> ('a * 'b) list

It should be similarly easy to do this with of-syntax, right? After all, all the type variables in the parameters do show up in the result.

  | Zip of 'a list * 'b list

Wait, but how do we say that the result is ('a * 'b) list? That's exactly the problem. We might try to add it in the type declaration.

type ('a * 'b) list =
  | Zip of 'a list * 'b list

But this is not allowed.

Why are GADTs scary?

I think the idea that constructors are like functions is pretty approachable. So why do people find GADTs so scary? One reason is that people often given scary explanations of GADTS, sadly. However, another reason is that they are legitimately more complicated to use. It turns out that the limitations afforded by the of-syntax of normal variants prevents weird type situations (like with do_maps) from arising.

let rec do_maps : type a. (a -> b) -> a list -> b list =
  fun f xs ->
    match xs with
    | Empty -> Empty
    | Cons (x, xs) -> Cons (f x, do_maps f xs)
    | Map (xs, g) -> do_maps (fun x -> f (g x)) xs

We already mentioned that the compiler has to make up some types in the middle of this function. However, we didn't address another crucial piece of this code.

let rec do_maps : type a. (a -> b) -> a list -> b list =
  ...

While this may look like any other type annotation, it's actually a special kind that allows this function to do "polymorphic recursion". The reason is that the recursive calls in do_maps may have to run on lists of many different types, due to the flexibility added by the Map constructor.

Understanding every part of this example is unnecessary to grasp the point that GADTs do make things more complicated.

When should I use GADTs?

This is the wrong question. You shouldn't be choosing between normal variants and GADTs. Instead, think of constructors as functions whose calls you want to inspect later. You'll know you need a GADT if the type signatures of those functions violate the rules of OCaml's of-syntax. (the compiler will tell you when that happens)

Fortunately, most variants don't need the extra power. It would be sad if they did, since several things get worse with GADTs. For instance, you need more type annotations to make the compiler happy. Related to this, OCaml's ppx_deriving extensions don't work because they don't have access to type information, so they can't generate the proper type information required by generated code.

To me, the bottom line is that you should use them if they are useful, with a slight bias toward avoiding them, since they tend to increase code noise.