Your task is to write a library to work with an HTTP API of a service provider you use. A common pattern in these payloads is to use a string that is documented to be like an enumeration:
{ ...
"detailed_result": "INVALID_TOKEN"
...
}
The values of the "detailed_result" key belong to a documented set, and each value has some meaning attached to it.
You might be tempted, after reading their documentation, to enumerate each of the values in a sum-type:
data DetailedResult
  = InvalidToken
  | AddressMatchFailed
  ...
  deriving (Bounded, Eq, Enum, Show)
We can then use the aeson library to derive the JSON parser and
serializer for us. This will work until the service provider decides
to add new values to the pseudo-enumeration. It is not a true
enumeration but an open set of string constants, one that the service
provider might add new members to without informing you.
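To make the failure concrete, here is a minimal sketch that uses only base. The `parseDetailedResult` function is a hypothetical hand-written stand-in for the derived parser, and `CARD_EXPIRED` in the note below is an invented example of a value the provider might add later.

```haskell
-- A closed enumeration: only the values the documentation listed on the
-- day we wrote the library.
data DetailedResult
  = InvalidToken
  | AddressMatchFailed
  deriving (Bounded, Eq, Enum, Show)

-- Hypothetical stand-in for a derived parser: anything outside the
-- closed set is rejected.
parseDetailedResult :: String -> Either String DetailedResult
parseDetailedResult s = case s of
  "INVALID_TOKEN"        -> Right InvalidToken
  "ADDRESS_MATCH_FAILED" -> Right AddressMatchFailed
  _                      -> Left ("unknown detailed_result: " ++ s)
```

With this parser, `parseDetailedResult "CARD_EXPIRED"` is a `Left`. In a real JSON decoder that failure would typically reject the whole payload, not just the one field.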
In this post I will show you three ways we can handle this situation in Haskell. We want to make sure that our code only operates on valid values of this string. But we also want to make sure that we can accept new values without causing a parse failure. It’s also worth noting that we do trust this system to a degree and so storing unrecognized values is okay as long as we don’t use them.
The Cop Out: Avoid Types
This is the easiest solution. Don’t make it an enumeration or try to
parse it at all. Treat the value as Text
and match on the known
values at run-time:
case detailedResult of
  "INVALID_TOKEN" -> _
  "ADDRESS_MATCH_FAILED" -> _
  _ -> error "Unrecognized detailed_result"
The trade-off here is that we will not be able to lean on the type checker to help us. We will have to document the valid, recognized values somewhere. Programmers using our code will have to know to look those values up… and we had better be careful to check for spelling errors in the string literals we’re matching against.
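One way to soften the spelling risk, sketched here under my own assumptions, is to spell each recognized value exactly once as a named constant and compare against those. The names `invalidToken`, `addressMatchFailed` and `describeResult` are hypothetical, and the returned strings are placeholders for real handling.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- The recognized wire values, each written out in one place only.
invalidToken, addressMatchFailed :: Text
invalidToken       = "INVALID_TOKEN"
addressMatchFailed = "ADDRESS_MATCH_FAILED"

-- Hypothetical handler: compares against the named constants and
-- returns Nothing for any value we do not recognize.
describeResult :: Text -> Maybe String
describeResult detailedResult
  | detailedResult == invalidToken       = Just "Handling INVALID_TOKEN"
  | detailedResult == addressMatchFailed = Just "Handling ADDRESS_MATCH_FAILED"
  | otherwise                            = Nothing
```

A misspelled constant name is now a compile error, although the type checker still cannot tell us when a handler forgets one of the values.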
The Open-Ended Sum Type
This approach has many of the benefits of encoding an enumeration in a sum type with the addition of a constructor that holds any unrecognized value.
data DetailedResult
  = InvalidToken
  | AddressMatchFailed
  ...
  | UnrecognizedDetailedResult Text
  deriving (Eq, Show)
We will have to write the JSON instances by hand here.
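As a rough sketch of what those hand-written instances might look like, assuming the aeson and text packages are available: the parser never fails on a string, it just falls back to the catch-all constructor. Only two known constructors are shown, and `decodeResult` is a hypothetical helper added for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
  (FromJSON (..), Result (..), ToJSON (..), Value (String), fromJSON, withText)
import Data.Text (Text)

data DetailedResult
  = InvalidToken
  | AddressMatchFailed
  | UnrecognizedDetailedResult Text
  deriving (Eq, Show)

-- Any JSON string parses; unknown strings are kept in the catch-all.
instance FromJSON DetailedResult where
  parseJSON = withText "DetailedResult" $ \t -> pure $ case t of
    "INVALID_TOKEN"        -> InvalidToken
    "ADDRESS_MATCH_FAILED" -> AddressMatchFailed
    _                      -> UnrecognizedDetailedResult t

-- Serializing gives back the original wire value, recognized or not.
instance ToJSON DetailedResult where
  toJSON r = String $ case r of
    InvalidToken                 -> "INVALID_TOKEN"
    AddressMatchFailed           -> "ADDRESS_MATCH_FAILED"
    UnrecognizedDetailedResult t -> t

-- Hypothetical helper: decode from an already-parsed JSON value.
decodeResult :: Value -> Result DetailedResult
decodeResult = fromJSON
```

Keeping the original text inside the unrecognized constructor means a value we don't understand can still be stored or passed along unchanged.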
All that is required of code that uses this type is to avoid using
the value carried by UnrecognizedDetailedResult when pattern matching.
processResult :: DetailedResult -> IO ()
processResult detailedResult = case detailedResult of
  InvalidToken -> putStrLn "Handling INVALID_TOKEN"
  AddressMatchFailed -> putStrLn "Handling ADDRESS_MATCH_FAILED"
  ...
  UnrecognizedDetailedResult _ ->
    -- Do not use the value of `UnrecognizedDetailedResult`...
    putStrLn "Skipping unrecognized result"
This approach allows us to lean more on the type system. It gives us the benefit of enumerating the valid, recognized values as constructors, which will play nicely with our tooling. However, the trade-off is that we will always have to handle the unrecognized case in our pattern matches. We also have to avoid using the unrecognized value by convention, since the type system will not prevent any callers from using it.
The Phantom Parameter
This solution uses type-level machinery. We want to be able to add a tag to our structure which will tell the type checker whether the value is recognized. This way we can write functions that only accept recognized values and get a type error if we make a mistake.
First we will need to use some language extensions:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE EmptyDataDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
Let’s define some empty types to represent whether a value is recognized or not:
data Recognized deriving (Eq, Show)
data Unrecognized deriving (Eq, Show)
We need the EmptyDataDeriving
extension for these
definitions. Normally you can’t derive stock instances for types with
no constructors. We may want to be able to use these instances, at
least when our code is under development, so we use this handy
extension.
We will use these types as “tags” to tell the type system which values we consider “recognized”.
Next we will define a generalized algebraic data type or GADT which will take a type parameter:
data DetailedResult a where
  InvalidToken :: DetailedResult Recognized
  AddressMatchFailed :: DetailedResult Recognized
  UnrecognizedResult :: String -> DetailedResult Unrecognized
deriving instance Show a => Show (DetailedResult a)
Using a GADT allows us to “set” the type of a
with our
constructors. When I say that we will be tagging the structure with
a type, this is what I mean. The type parameter here will be used as a
tag by us to tell the type checker whether our function cares about
Recognized
values or Unrecognized
values. This means that we can
write explicit types such as:
processDetailedResult :: DetailedResult Recognized -> IO ()
processDetailedResult _ = putStrLn "processing valid detailed result"
And we will get a type error if we try to pass in an unrecognized value:
ghci> processDetailedResult (UnrecognizedResult "WAT")
<interactive>:184:24-48: error: [GHC-83865]
• Couldn't match type ‘Unrecognized’ with ‘Recognized’
Expected: DetailedResult Recognized
Actual: DetailedResult Unrecognized
...
It also means that the body of processDetailedResult
doesn’t have to
handle unrecognized values because it will never accept them by
definition.
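To see this in action, here is a small self-contained sketch. The `wireName` function is a hypothetical helper, and the types are repeated (without the Show instances) so the block compiles on its own. With incomplete-pattern warnings switched on, GHC accepts the two-clause definition as exhaustive.

```haskell
{-# LANGUAGE GADTs #-}
{-# OPTIONS_GHC -Wincomplete-patterns #-}

data Recognized
data Unrecognized

data DetailedResult a where
  InvalidToken       :: DetailedResult Recognized
  AddressMatchFailed :: DetailedResult Recognized
  UnrecognizedResult :: String -> DetailedResult Unrecognized

-- Hypothetical helper: no clause for UnrecognizedResult is needed,
-- because that constructor can never have type DetailedResult Recognized.
wireName :: DetailedResult Recognized -> String
wireName InvalidToken       = "INVALID_TOKEN"
wireName AddressMatchFailed = "ADDRESS_MATCH_FAILED"
```

If we later add a third recognized constructor and forget to extend `wireName`, the same warning flag will point at the missing clause.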
But why not use this type parameter on a regular sum-type?
-- Note the addition of the `a` type parameter...
data DetailedResult' a
  = InvalidToken'
  | AddressMatchFailed'
  ...
  | UnrecognizedResult' Text
  deriving (Eq, Show)
Well, that’s because we can’t “set” the type of a here. It is implicitly
quantified as “for all types a”. This means that if we try to block
unrecognized values from being processed with:
processDetailedResult' :: DetailedResult' Recognized -> IO ()
processDetailedResult' _ = putStrLn "Processing recognized result..."
The fact that the type signature says DetailedResult' Recognized
doesn’t mean that only recognized values will be accepted:
ghci> processDetailedResult' (UnrecognizedResult' "WAT")
Processing recognized result...
It turns out the type of UnrecognizedResult' "WAT" is
DetailedResult' a, which fits the argument type when we apply
processDetailedResult'. Every constructor has this type, and so each
one will match the signature.
When we use a GADT, however, we plug this “for all types” hole by
setting the type of the type parameter depending on the constructor
used in the GADT. This allows the compiler to work out what type a
is. As a consequence, it can use this information when type checking
calls to processDetailedResult.
Now let us write a basic text parser. It should give you an idea of how to write the JSON instances for this type:
fromText :: Text -> DetailedResult ???
This isn’t going to work. What type do we need for ??? here? If we try
a polymorphic type variable, we will get a type error as soon as we
try to return a Recognized constructor such as InvalidToken, because
Recognized is only one type and a is implicitly defined as “for all
types,” as in:
fromText :: forall a. Text -> DetailedResult a
Recognized is only one type out of all types. There’s one more special
type we need to make this work:
data Some t where
  Some :: Show a => t a -> Some t
It might not be clear to you what this is useful for if you don’t have
a strong grasp of GADTs and pattern matching yet. That’s okay! What
this type is doing is filling in for our ???
type. This type is
telling callers of our function that they will get some
DetailedResult
and they will have to figure out which one they have.
So we can change our type signature and fill in the definition like so:
fromText :: String -> Some DetailedResult
fromText s = case s of
  "INVALID_TOKEN" -> Some InvalidToken
  "ADDRESS_MATCH_FAILED" -> Some AddressMatchFailed
  ...
  _ -> Some $ UnrecognizedResult s
Callers can figure out which DetailedResult
they have by using
pattern matching.
Before we demonstrate its use let’s add a helpful Show instance for
Some DetailedResult:
-- This is why we need the FlexibleInstances extension
deriving instance Show (Some DetailedResult)
This means we can use fromText
like so:
ghci> fromText "INVALID_TOKEN"
Some InvalidToken
And we can determine which kind of DetailedResult
we have received
like this:
case fromText "INVALID_TOKEN" of
  Some (UnrecognizedResult _) ->
    putStrLn "Cannot process unrecognized result"
  Some r@InvalidToken -> processDetailedResult r
  Some r@AddressMatchFailed -> processDetailedResult r
Did you notice the symmetry between the arrow in the Some
constructor and the pattern match above? Pay attention to the “shapes”
of the expressions: t a -> Some t and Some InvalidToken -> _. Even
though the a is on the left side of the arrow in the GADT definition,
we can match on its value on the left side of the pattern match. It
turns out this notion is generally useful and there’s the some library
for working with such types.
The benefit here is that we no longer have to write functions that always handle the case of unrecognized values. Instead we can write functions that only accept recognized values.
The trade-off here is that we need to express more of what we want to the type system, which requires a little more code and effort.
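If most callers only care about recognized values, the pattern match can be done once in a small helper. This is a sketch under my own assumptions: `recognized` is a hypothetical name, and the `Show` constraint on `Some` is dropped here so the block stays short and compiles on its own.

```haskell
{-# LANGUAGE GADTs #-}

data Recognized
data Unrecognized

data DetailedResult a where
  InvalidToken       :: DetailedResult Recognized
  AddressMatchFailed :: DetailedResult Recognized
  UnrecognizedResult :: String -> DetailedResult Unrecognized

-- Simplified: no Show constraint, unlike the definition in the post.
data Some t where
  Some :: t a -> Some t

fromText :: String -> Some DetailedResult
fromText s = case s of
  "INVALID_TOKEN"        -> Some InvalidToken
  "ADDRESS_MATCH_FAILED" -> Some AddressMatchFailed
  _                      -> Some (UnrecognizedResult s)

-- Hypothetical helper: matching on the constructor tells the type
-- checker which tag we have, so `r` can be returned at the
-- Recognized type in the first two branches.
recognized :: Some DetailedResult -> Maybe (DetailedResult Recognized)
recognized (Some r) = case r of
  InvalidToken         -> Just r
  AddressMatchFailed   -> Just r
  UnrecognizedResult _ -> Nothing
```

Callers can then write `recognized (fromText input)` and deal with an ordinary `Maybe`, leaving the existential wrapper at the edge of the library.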
The Symbolic Approach
We can go even further with the type system in Haskell. With some more
extensions and libraries it’s possible for us to write code that will
promote our "detailed_result" values to type-level Symbols… so long as
the values are valid Haskell symbols as well.
However, the benefits of doing so don’t remove any of the trade-offs listed in the previous section, and they make the code significantly more difficult to read, as it will rely on inference to determine which value is being matched instead of terms.
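To give a rough flavour of what this could look like, here is one possible sketch that uses only base. It is an illustration under my own assumptions rather than a worked-out design: the `Result` wrapper, `wireName` and `handleInvalidToken` are all hypothetical names.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Hypothetical wrapper: the wire value lives only in the type.
data Result (name :: Symbol) = Result

-- Recover the wire value from the type alone.
wireName :: forall name. KnownSymbol name => Result name -> String
wireName _ = symbolVal (Proxy :: Proxy name)

-- A handler that accepts exactly one wire value.
handleInvalidToken :: Result "INVALID_TOKEN" -> String
handleInvalidToken r = "Handling " ++ wireName r
```

Notice that in `handleInvalidToken Result` nothing at the term level says which value is being handled. That information comes entirely from inference, which is the readability cost mentioned above.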
I may discuss this approach in a future post.
Conclusion
Use the approach that is sufficient for the task at hand.
I would go for the Cop Out for a task-oriented script where the code isn’t going to be shared or re-used. A simple comment enumerating the values or a URL where the reader can find them is good enough. When it’s more important to get the job done and out of the way this will be my preferred approach.
However if I intend to write a library to interact with this service that will be shared and re-used in many places I would prefer the Phantom Parameter. The added benefit of tagging the values that are recognized means that we can eliminate the need to handle unrecognized values in every function while still getting the benefit of exhaustive pattern matching and support from the type checker.
If I am working with a team that is predominantly junior-to-intermediate Haskell developers that will be maintaining this library I might consider the Open-Ended Sum Type since it requires a little less Haskell knowledge to get going with and is still type safe. We may have to be careful about handling unrecognized values but hopefully this can be caught by code review and testing.
All in all, how much you leverage the type system is up to you. Happy hacking!