OCaml has this weird feature of its type system called "Generalized Algebraic Data Types", or GADTs, for short. Experts typically use this feature to haze newcomers to the language (unintentionally, of course). This page aims to fix this problem by enabling the reader to be the expert inflicting the suffering, rather than the sufferer.
The first step I want to take is to re-frame your understanding of "normal" variant constructors: constructors are just plain old functions. In OCaml, this is a bit of a lie, but it's not too far off; we can't pass constructors around as first-class functions, but building something with them looks pretty similar to calling a function.
Here's the option
type in OCaml.
type 'aoption = | None | Someof 'a
This is the syntax we would usually use, but there is an alternate syntax that lets you write the constructor as if it were a function.
type 'aoption = | None : 'aoption | Some : 'a -> 'aoption
That's really not too far from what you often see in OCaml module signatures.
type 'aoption val none : 'aoption val some : 'a -> 'aoption
What I just showed you was a translation from presenting
option
as a list of constructors to presenting it as a list of
functions. You can go the other way too! Consider this subset of the
List
module.
type 'alist val empty : 'alist val cons : 'a -> 'alist -> 'alist val map : 'alist -> f:('a -> 'b) -> 'blist
Translating this back to constructors is not immediately straightforward because constructors are only allowed to take a single parameter, so let's start with the intermediate step of translating these curried functions into functions that take a tuple of arguments.
type 'a listval empty : 'alist val cons : 'a * 'alist -> 'alist val map : 'alist * ('a -> 'b) -> 'blist
Now it should be easy to translate these functions to constructors.
type 'alist = | Empty : 'alist | Cons : 'a * 'alist -> 'alist | Map : 'alist * ('a -> 'b) -> 'blist
It's perfectly understandable if the Map
constructor is a
bit baffling. We're used to List.map
being a function that
does something, but here we've just packaged it up as some data.
On the other hand, if you're comfortable with functions being first-class
values, then this shouldn't bother you at all; Map
is just an
ordinary constructor that contains two pieces of sub-data.
One way to look at constructors is that they are functions that don't do
anything in the moment, but then later you can look at them to see what
parameters they were called with. We can pattern match on the fancy list
type and eliminate the Map
constructor by actually doing the
mapping.
let rec do_maps :type a. (a -> b) -> alist -> blist =fun f xs ->match xswith | Empty -> Empty | Cons (x, xs) -> Cons (f x, do_maps f xs) | Map (xs, g) -> do_maps (fun x -> f (g x)) xs
Don't worry if this looks foreign; it's a really weird function. In
fact, it's even weirder than you probably realize; the type inference
algorithm is hiding some of the nuance of this code. The key point of
intrigue is in what type the xs
variable has in the pattern
Map (xs, g)
; it's worth pondering for a minute or two before
reading on.
The answer is that the compiler has to make up a temporary abstract type
for the items of the list, so xs
has a type like $1
.
The $1
is the type of the list items when
Map
was originally called, but the compiler doesn't remember
what the type is, so the body of that match case has to treat it as an
abstract type. In other words, we know there exists a type for the
items of that list, but we don't know what it is. Weird, huh?
This is the section of the post where we start talking about GADTs. Just kidding! We've been talking about GADTs this whole time; did you notice? GADTs are not a new kind of thing, but rather an extension of the abilities of an existing thing, namely variants.
The view that constructors are like functions is helpful for seeing the
extra power that GADTs provide. The ordinary
-syntax for
variants requires any type variables in the constructor to be matched by
type variables in the type itself. For example, here is some invalid code
in which we try to define the fancy list type with
-syntax.
type 'alist = | Empty | Consof 'a * 'alist | Mapof 'xlist * ('x -> 'a)
The Map
constructor here is illegal because it mentions a
'x
variable that wasn't mentioned in
's type
parameters.
Another limitation on constructor "functions" enforced by
-syntax is that the result of the function is always the
type itself, with its unadorned type variables. To illustrate this, let us
try to translate the zip
function to a constructor.
val zip : 'alist -> 'blist -> ('a * 'b)list
Doing it with the new syntax we learned is easy.
| Zip : 'alist * 'blist -> ('a * 'b)list
It should be similarly easy to do this with
-syntax, right?
After all, all the type variables in the parameters do show up in
the result.
| Zipof 'alist * 'blist
Wait, but how do we say that the result is ('a * 'b) list
?
That's exactly the problem. We might try to add it in the type declaration.
type ('a * 'b)list = | Zipof 'alist * 'blist
But this is not allowed.
I think the idea that constructors are like functions is pretty
approachable. So why do people find GADTs so scary? One reason is that
people often given scary explanations of GADTS, sadly. However, another
reason is that they are legitimately more complicated to use. It turns out
that the limitations afforded by the
-syntax of normal
variants prevents weird type situations (like with do_maps
)
from arising.
let rec do_maps :type a. (a -> b) -> alist -> blist =fun f xs ->match xswith | Empty -> Empty | Cons (x, xs) -> Cons (f x, do_maps f xs) | Map (xs, g) -> do_maps (fun x -> f (g x)) xs
We already mentioned that the compiler has to make up some types in the middle of this function. However, we didn't address another crucial piece of this code.
let rec do_maps :type a. (a -> b) -> alist -> blist = ...
While this may look like any other type annotation, it's actually a special
kind that allows this function to do "polymorphic recursion". The reason is
that the recursive calls in do_maps
may have to run on lists
of many different types, due to the flexibility added by the
Map
constructor.
Understanding every part of this example is unnecessary to grasp the point that GADTs do make things more complicated.
This is the wrong question. You shouldn't be choosing between normal
variants and GADTs. Instead, think of constructors as functions whose calls
you want to inspect later. You'll know you need a GADT if the type
signatures of those functions violate the rules of OCaml's
-syntax. (the compiler will tell you when that happens)
Fortunately, most variants don't need the extra power. It would be sad
if they did, since several things get worse with GADTs. For instance, you
need more type annotations to make the compiler happy. Related to this,
OCaml's ppx_deriving
extensions don't work because they don't
have access to type information, so they can't generate the proper type
information required by generated code.