Azavea’s GeoTrellis library has been developed using the Scala language. Scala provides some elements of both functional and object-oriented approaches to programming. We selected it because it provided support for the functional approach, but as a hybrid between the two approaches, the core language is sometimes frustrating, particularly for people who have experience with “pure” functional languages. Over the past few years, two libraries, Scalaz and Cats, have been developed to bring more purely functional abstractions to the language. The GeoTrellis team recently considered the question: Should we use Scalaz or Cats?
Scalaz and Cats are libraries which provide Functional Programming constructs for Scala.
The move to adopt one or the other stems from a desire to reduce boilerplate and simplify
our API using community-understood Functional Programming concepts.
After a thorough research period that compared the two libraries in depth (something that
had apparently not been done before in the community), GeoTrellis has decided
to use the Cats library.
Below I’ll describe the reasons for our decision, and lay out some recommendations
for usage should other Scala teams wish to tread a similar path.
1 The Decision
As the one who did the research, I got to know both libraries and both communities
fairly well. When it came down to my own vote, I was conflicted on which to choose.
The two libraries have similar APIs and comparable performance, and I found the
contributors to both libraries to be welcoming, hard-working, and intelligent.
I’ve seen where both libraries came from, where they are today, and where they’re going.
I have my own inner thoughts about the future of Functional Programming in Scala,
but as of today GeoTrellis is going with Cats for one major reason: Discoverability.
We could likely keep usage of Cats hidden in our internals, but more than likely
some of it will trickle up to the user-facing API. For instance, here is a new Layer
typeclass that we’re considering for GeoTrellis:
@typeclass trait Layer[F[_]] extends Functor[F] { ... }
Should a user investigate Layer in our Scaladocs, they would see Functor.
As the GeoTrellis authors, it’s then our responsibility to make sure that curious users
have immediate access to supplementary resources, should they want
to learn more. While both Scalaz and Cats have a wealth of learning materials,
we found that Cats has more approachable documentation “up front”.
Fortunately, even with things like Functor visible on top-level symbols, we’re
confident that the introduction of Cats and Simulacrum typeclasses will
greatly simplify GeoTrellis for both users and developers.
2 Usage Recommendations
2.1 Eq and Show
The typeclasses Eq and Show can supply immediate type safety guarantees
to Scala code. Eq exposes the type-safe equality operator ===:
scala> import cats.implicits._
scala> 1 === 1
res0: Boolean = true
scala> List(1,2,3) === List(2,3,4)
res1: Boolean = false
Unlike vanilla Scala’s ==, which can compare values of any two types for equality
(even when doing so is meaningless), Eq.=== will only compile if used on two values
of the same type, and only if that type has an Eq instance. Enforcing this has two benefits:
- No sneaky bugs from accidental comparisons of meaningless things.
- A 20% performance improvement over == for some structures!
Show exposes .show, a type-safe variant of .toString. While .toString
can be used on any type (even when doing so is meaningless), only types
for which stringification is meaningful have an instance of Show.
The benefit of this is primarily in avoiding subtle bugs.
Code quality analysis tools like Codacy consider usage of == and .toString
to be bad practice, and can potentially fail your CI if they catch you using them.
2.2 Semigroup and Monoid
These are things that are “fundamentally combinable”. If, like Int under addition,
you have a type whose combining operation is associative:
/* Arithmetic */
1 + (5 + 7) == (1 + 5) + 7
/* Your type */
a <> (b <> c) == (a <> b) <> c
then your type is a Semigroup. (The <> here is Haskell’s notation; Cats spells
the combining operator |+|.) If your type also has some analogue
to 0 under addition:
/* Arithmetic */
x + 0 == x
/* Your type */
x <> zeroishThing == x
then your type is also a Monoid! By defining instances of Semigroup and
Monoid for your type, you can take advantage of a number of “free” operations
that are mathematically guaranteed to behave sanely.
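To make “free operations” concrete, here is a minimal sketch in plain Scala with no Cats dependency. The Semigroup and Monoid traits below are hand-rolled stand-ins for the Cats typeclasses, and Pair and combineAll are names chosen only for this illustration. The point is that combineAll is written once and then works for any type that has a Monoid instance.

```scala
object MonoidSketch {
  // Hand-rolled stand-ins for illustration only; Cats ships its own
  // Semigroup and Monoid typeclasses.
  trait Semigroup[A] {
    def combine(x: A, y: A): A // expected to be associative
  }
  trait Monoid[A] extends Semigroup[A] {
    def empty: A // expected to satisfy combine(x, empty) == x
  }

  // Int under addition, as in the arithmetic examples above.
  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    def combine(x: Int, y: Int): Int = x + y
    def empty: Int = 0
  }

  // A hypothetical domain type, combined field by field.
  final case class Pair(a: Int, b: Int)
  implicit val pairMonoid: Monoid[Pair] = new Monoid[Pair] {
    def combine(x: Pair, y: Pair): Pair = Pair(x.a + y.a, x.b + y.b)
    def empty: Pair = Pair(0, 0)
  }

  // A "free" operation: defined once, usable with every Monoid instance.
  def combineAll[A](xs: List[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.empty)(m.combine)
}
```

With `import MonoidSketch._` in scope, `combineAll(List(1, 2, 3))` gives 6, `combineAll(List(Pair(1, 2), Pair(3, 4)))` gives Pair(4, 6), and an empty list of Pair falls back to Pair(0, 0).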
2.3 Functor
Many Scala types have a .map method. If you’ve ever done:
val foo: Option[Int] = Some(1)
foo.map(_ + 1) // Some(2)
val bar: List[Int] = List(1, 2, 3)
bar.map(_ + 1) // List(2, 3, 4)
then you’ve taken advantage of the fact that Option and List are both Functors.
Most “mappable” things are a Functor. By being honest about this behaviour and giving
our own types instances of Functor too, we can write clean, generic code, and also
utilize more interesting typeclasses that rely on Functor.
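As a rough illustration of “clean, generic code”, the sketch below uses a hand-rolled Functor trait in plain Scala rather than cats.Functor itself. Grid and increment are hypothetical names introduced for this example. Once Grid has an instance, the single increment function works on it exactly as it does on Option and List.

```scala
object FunctorSketch {
  // Hand-rolled stand-in for illustration; Cats provides cats.Functor.
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }
  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }

  // A hypothetical container of our own, given its own instance.
  final case class Grid[A](cells: Vector[A])
  implicit val gridFunctor: Functor[Grid] = new Functor[Grid] {
    def map[A, B](fa: Grid[A])(f: A => B): Grid[B] = Grid(fa.cells.map(f))
  }

  // Generic code: one definition for every F that has a Functor instance.
  def increment[F[_]](fa: F[Int])(implicit F: Functor[F]): F[Int] =
    F.map(fa)(_ + 1)
}
```

With `import FunctorSketch._`, `increment(Option(1))` gives Some(2), `increment(List(1, 2, 3))` gives List(2, 3, 4), and `increment(Grid(Vector(1, 2)))` gives Grid(Vector(2, 3)).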
2.4 Foldable
If your type is a Functor, it’s almost certainly a Foldable too. Foldable
generalizes the idea of foldLeft and foldRight by using Monoid. It says
“if you give me a container full of Monoid things, I can crush them down
sanely into a single value”. My favourite operation is .fold (also aliased
as .combineAll):
val foo: List[Int] = List(1, 2, 3)
foo.combineAll // 6
val bar: List[String] = List("My", "cat", "is", "named", "Jack")
bar.combineAll // "MycatisnamedJack"
val baz: List[Option[Int]] = List(Some(1), Some(2))
baz.combineAll // Some(3)
val boof: List[Option[Int]] = List(Some(1), Some(2), None, Some(4))
boof.combineAll // Some(7), since None is the identity of the Option Monoid
2.5 Traversable
If your type is both a Functor and a Foldable, it’s almost certainly a
Traversable too (Cats names this typeclass Traverse). It exposes .traverse
and .sequence, two invaluable methods for handling “effects”.
.sequence “flips” nested effects:
val foo: List[Option[Int]] = List(Some(1), Some(2), Some(3))
foo.sequence // Some(List(1, 2, 3))
val bar: List[Option[Int]] = List(Some(1), Some(2), None, Some(3))
bar.sequence // None
.traverse accomplishes something similar, but is .map-like:
val foo: List[Int] = List(2, 4, 6)
val bar: List[Int] = List(2, 5, 6)
val f: Int => Option[String] = { n => if (n % 2 == 0) Some(n.show) else None }
foo.traverse(f) // Some(List("2", "4", "6"))
bar.traverse(f) // None
Note: If you ever see foo.map(f).sequence in code, this can always be replaced
with foo.traverse(f), which is more efficient because it avoids building the
intermediate collection.
All of these examples used List and Option, but of course there are many other
combinations. Most of the vanilla Scala collections can be used and combined
in this way.
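To show what these two operations do under the hood, here is a minimal plain-Scala sketch specialised to List and Option. It is not how Cats implements them (Cats works over any suitable pair of typeclasses), and the names traverse, sequence and evenToString are defined locally for this illustration. It also shows that sequence is just traverse with the identity function, which is why map followed by sequence can be collapsed into one traverse.

```scala
object TraverseSketch {
  // Specialised to List and Option for illustration only.
  // Walks the list once; the result is None if f returns None for any element.
  def traverse[A, B](xs: List[A])(f: A => Option[B]): Option[List[B]] =
    xs.foldRight(Option(List.empty[B])) { (a, acc) =>
      for { b <- f(a); bs <- acc } yield b :: bs
    }

  // sequence is traverse with the identity function.
  def sequence[A](xs: List[Option[A]]): Option[List[A]] =
    traverse[Option[A], A](xs)(x => x)

  // Same shape as the f used in the text, with toString in place of .show.
  val evenToString: Int => Option[String] =
    n => if (n % 2 == 0) Some(n.toString) else None
}
```

With `import TraverseSketch._`, `traverse(List(2, 4, 6))(evenToString)` gives Some(List("2", "4", "6")), `traverse(List(2, 5, 6))(evenToString)` gives None, and `sequence(xs.map(evenToString))` agrees with `traverse(xs)(evenToString)` for both lists.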
2.6 IO
IO is not a typeclass, it’s a normal data type. Its power comes from
segregation of side-effects, which are usually allowed anywhere in Scala.
One of Scala’s weaknesses is that it’s not referentially transparent. Any
function/method in Scala can perform input/output or mutate global state.
This means that all uses of vals/vars are assignments and not declarations
of mathematical equality:
def foo: Int
val x: Int = foo
Here x and foo are not referentially transparent (they are not equivalent in
the mathematical sense; one cannot be replaced with the other at use-sites).
This means the following two lines are not the same:
val a: Int = x + x
val b: Int = foo + foo
Even if a == b! Why? Scala allows side-effects to be performed anywhere, so
foo could be defined as:
def foo: Int = { println("hi!"); 1 }
Looking at the exposed API and not the code, the user has no idea what lurks
under the covers of foo. If we now draw back the curtains on our a-b example
above, we see:
// Also prints "hi!" once, the first time x is evaluated.
val a: Int = 1 + 1
val b: Int = { println("hi!"); 1 } + { println("hi!"); 1 }
The real-world effects of this are two-fold:
- Optimization/inlining becomes hard for the compiler, since you can’t guarantee
behaviour of any functions ahead of time.
- Users and devs can’t trust APIs: there’s no way to know what a function really does
until you look right at the code, which is a huge failure of abstraction and generally
a waste of people’s time.
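The difference between x + x and foo + foo can be observed directly by making the side effect countable. In this plain-Scala sketch, the calls counter is a hypothetical stand-in for the println in foo, and viaVal and viaDef are names introduced only for the demonstration.

```scala
object SubstitutionDemo {
  // A counter standing in for any observable side effect (hypothetical).
  var calls: Int = 0

  // Same shape as the foo in the text, with a counter instead of println.
  def foo: Int = { calls += 1; 1 }

  // Binds foo's result to a val first: the side effect runs once.
  def viaVal(): Int = { val x = foo; x + x }

  // Substitutes foo at each use-site: the side effect runs twice.
  def viaDef(): Int = foo + foo
}
```

Both methods return 2, but after viaVal() the counter has advanced by 1, while after viaDef() it has advanced by 2. The results are equal and the programs are not.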
The IO type from cats-effect helps with this. It asks us to be honest about which
parts of our code are effectful and which aren’t:
/* Read some runtime configuration. Application secret keys, maybe? */
def readConf(path: String): IO[Conf] = { ... }
/* Activate your database */
def initDB(conf: Conf): IO[DBHandle] = { ... }
/* Perform some query */
def lookup(h: DBHandle, query: Query): IO[Foo] = { ... }
/* Some pure transformation. No IO! */
def transform(foo: Foo): Foo = { ... }
def work(args: Array[String]): IO[Foo] = {
  val path: String = ??? // from args somehow
  val query: Query = ???
  /* (>>=) is the canonical alias for flatMap */
  readConf(path) >>= initDB >>= { lookup(_, query).map(transform) }
  /* Equivalent:
  for {
    conf <- readConf(path)
    hand <- initDB(conf)
    foo  <- lookup(hand, query)
  } yield transform(foo)
  */
}
def main(args: Array[String]): Unit = {
  work(args).attempt.unsafeRunSync match {
    case Left(err)  => ... // handle the error safely
    case Right(foo) => println(s"Success: ${foo.show}")
  }
}
If all side-effects are contained to things marked with IO, then we know
that strange runtime errors could never come from pure functions like
transform. Luckily, IO also catches exceptions for us similar to Try,
and lets us handle them gracefully as seen in main.
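For intuition about why wrapping effects in a data type helps, here is a deliberately tiny toy model in plain Scala. It is not cats-effect’s implementation, and it omits nearly everything the real IO offers. The IO class, the effects counter and program are all defined here purely for illustration. The idea it shows is that building an IO value performs nothing; effects only happen when the value is explicitly run, and attempt turns a thrown exception into a Left.

```scala
object IOSketch {
  import scala.util.control.NonFatal

  // Toy stand-in: an IO is just a description of a deferred computation.
  final class IO[A](val unsafeRunSync: () => A) {
    def map[B](f: A => B): IO[B] =
      new IO[B](() => f(unsafeRunSync()))
    def flatMap[B](f: A => IO[B]): IO[B] =
      new IO[B](() => f(unsafeRunSync()).unsafeRunSync())
    // Captures non-fatal exceptions as a value instead of throwing.
    def attempt: IO[Either[Throwable, A]] =
      new IO[Either[Throwable, A]](() =>
        try Right(unsafeRunSync())
        catch { case NonFatal(e) => Left(e) }
      )
  }
  object IO {
    // The by-name parameter keeps the body from running at construction.
    def apply[A](body: => A): IO[A] = new IO[A](() => body)
  }

  // A counter standing in for real side effects (hypothetical).
  var effects: Int = 0

  // Composing IO values runs nothing yet.
  val program: IO[Int] =
    IO { effects += 1; 20 }
      .flatMap(n => IO { effects += 1; n + 1 })
      .map(_ * 2)
}
```

After `program` is defined, `effects` is still 0. Calling `program.unsafeRunSync()` returns 42 and leaves `effects` at 2. An action such as `IO[Int](throw new RuntimeException("boom")).attempt.unsafeRunSync()` yields a Left instead of throwing.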
Unlike Haskell, Scala does not force usage of IO in applications. So,
its usage would have to be a “best practice” on the team. To help with this,
a future version of the scalafix linter will ban usage of side-effectful
functions like println in methods that do not return an IO.
Allow me to be frank: if we find ourselves having thoughts like “escape hatches
are unavoidable in real code” or “oh, it couldn’t hurt to just slip some
innocent file reading in here…” we must stop ourselves. We are being lazy and
our design is almost certainly incorrect. For the sake of the future sanity of
both us and our colleagues, we must take a step back and rework things to use
the IO type. The result will be much cleaner, I promise you. Likewise, if we
hear colleagues utter the sentiments above, we should douse them in Holy Water
and show them this article.
2.7 Defining Typeclass Instances
When defining a “typeclass instance” for your type, please do so in that type’s
companion object:
import cats._
case class Pair(a: Int, b: Int)
object Pair {
  /* via the "kittens" library */
  implicit val pairEq: Eq[Pair] = derive.eq[Pair]
  implicit val pairSemi: Semigroup[Pair] = new Semigroup[Pair] {
    def combine(x: Pair, y: Pair): Pair = Pair(x.a + y.a, x.b + y.b)
  }
}
Not doing so is called writing “Orphan Instances”, which are a source of great
import confusion. Languages that have first-class typeclass support throw
compiler warnings when you write orphans, so please believe me that it’s an
anti-pattern.
The kittens library can be used to automatically derive instances for the standard
typeclasses.