Notes on pattern matching in Scala

2015-09-30

Geckos

This is a write-up of a quick presentation for summarizing Scala's pattern matching facilities. I'll walk through a couple of examples and try to point out interesting options and some lessons I've learned.

Background

Pattern matching has many meanings to different people, even if we limit the context to programming languages – so what do I mean? I don't mean regular expression matching. I also don't mean splitting function definitions up based on patterns like you can in Haskell and many other languages:

fac 0 = 1
fac n = n * fac n - 1

For me, pattern matching is a useful tool that allows binding functionality to data structures in a declarative way. This comes back to a difference that I see between the two paradigms of object-oriented and functional programming: The former aims to keep data and functionality close together, while the latter separates them as orthogonal concerns. Pattern matching is most common in functional languages and can be seen as supportive glue between data and functionality.

Basics

Let's look at a short example for pattern matching in Scala:

val target = Option("Peter")
target match {
  case Some(n) ⇒ println(s"yay, found $n!")
  case None    ⇒ println(":(")
}

Will print:

yay, found Peter!

From a high-level the match functionality looks like a switch statement: We have a couple of clauses with a pattern that acts as a guard on the left and associated functionality on the right side. One difference to note is that we can bind a value to a name and refer to it via the name on the right side.

In Scala there is no need to be complete when you list the case statements, the following works just fine:

val target = Option("Hans") target match { case Some(n) ⇒ println(s"yay, found $n!") }

Will print:

yay, found Hans!

But keep in mind that all of this ends up being matched at runtime, and there is no guarantee that the clauses include patterns that match a given value. The following will fail at runtime with a MatchError:

val target = Option.empty[String]
target match {
  case Some(n) ⇒ println(s"yay, found $n!")
}

But the compiler tries to warn you about this:

warning: match may not be exhaustive.
It would fail on the following input: None

But let's keep matching values. We used n to bind a string value in a previous example. Lower case identifiers can be seen as wildcard patterns, they match anything:

val target = Option("Hans")
target match {
  case p ⇒ println(s"yay, got value $p!")
}

This will print:

yay, got value Some(Hans)!

Really, anything:

null match {
  case p ⇒ println(s"yay, got a strange value: $p!")
}

Will print:

yay, got a strange value: null!

So if lower case identifiers are wildcard patterns, how can I refer back to a value that is bound outside the match statement? There are at least two ways: The first just escapes the naming convention by using an upper case letter to start the identifier:

val target = Option("Hans")
val FirstName = "Hans"
target match {
  case Some(FirstName) ⇒ println(s"yay, found $FirstName!")
}

Will cheerfully print:

yay, found Hans!

Where the following would run through the second clause:

val target = Option("Hans")
val FirstName = "Peter"
target match {
  case Some(FirstName) ⇒ println(s"yay, found $FirstName!")
  case wildcard        ⇒ println(s"unsure, found: $wildcard")
}

And print:

unsure, found: Some(Hans)

An alternative approach is to add markup to the name to tell the parser that you mean to refer to an identifier rather than declaring a wildcard pattern. You can do this by adding backticks around the name:

val target = Option("Hans")
val firstName = "Hans"
target match {
  case Some(`firstName`) ⇒ println(s"yay, found $firstName!")
}

Will print:

yay, found Hans!

Now that we know that we can bind values to names via wildcard patterns a follow up question, that @archevel brought up, is whether one could refer back to a name and what the semantics would be in that case. So for example, what happens here:

val target = Tuple2("Hans", "Hans")
target match {
  case Tuple2(n, n) ⇒ println(s"yay, found $n!")
}

The Scala compiler won't allow this and tells us that we can't re-use names:

error: n is already defined as value n

Typecase

Another basic pattern is a simple type check:

val x: Any = 21
x match {
  case i: Int ⇒ println(s"got an int smaller than ${i + 1}")
  case _      ⇒ println("not sure what i got")
}

Which will print:

got an int smaller than 22

I haven't used this pattern much, but one of the benefits you can see is that it combines an isInstanceOf with an asInstanceOf in a concise manner. If you do require a cast, this way you don't forget the check to prevent a ClassCastException. For more details consider the follow excerpt from a scalac -Xprint:patmat invocation:

val x: Any = 21;
  {
    case <synthetic> val x1: Any = x;
    case5(){
      if (x1.isInstanceOf[Int])
        {
          <synthetic> val x2: Int = (x1.asInstanceOf[Int]: Int);
          matchEnd4(scala.this.Predef.println(scala.StringContext.apply("got an int smaller than ", "").s(x2.+(1))))
        }
      else
        case6()
    };
    case6(){
      matchEnd4(scala.this.Predef.println("not sure what i got"))
    };
    matchEnd4(x: Unit){
      x
    }
  }

Deep destructuring

One of the features that Scala's pattern matching facilities enable is deep destructuring of data structures. In our first example we destructured a Some value and "extracted" the contained string:

val target = Option("Hans")
target match {
  case Some(n) ⇒ println(s"yay, found $n!")
}

This works for arbitrarily nested structures. Consider the following example:

case class Person(firstName: String, lastName: String, address: Address)
case class Address(
  street: String,
  houseNumber: String,
  city: String,
  postCode: String,
  country: Country
)
case class Country(name: String, code: String)

val hans = Person(
  "Hans",
  "Schmitt",
  Address(
    "Queen St",
    "220",
    "Auckland",
    "1010",
    Country("New Zealand", "NZ")
  )
)

hans match {
  case Person(fn, _, addr @ Address(_, _, _, _, Country(_, "NZ"))) ⇒
     println(s"yay, found a Kiwi named $fn from at $addr!")
}

Will print out:

yay, found a Kiwi named Hans living at Address(Queen St,220,Auckland,1010,Country(New Zealand,NZ))!

The destructuring allows us to easily access sub-parts of a nested data structure. This means that we can selectively pick values that we're interested in and also restrict the pattern (cf. the "NZ" literal in the above example).

Two more things to point out about the example above: _ is an anonymous wildcard pattern and matches anything, while the @ allows us to bind the value a pattern matched to a name. In the above example I use it to access the address on the right side while also destructuring and matching on sub-parts.

Disjunctions and guards

Another feature that can be helpful to concisely express a pattern are disjunctions:

val target = Option("Hans")
target match {
  case Some(n @ ("Peter" | "Hans")) ⇒
    println(s"yay, found $n!")
}

This will print the following:

yay, found Hans!

But would also find "Peter". More commonly, I've seen this solved via a pattern guard:

val target = Option("Hans")
target match {
  case Some(n) if n == "Peter" || n == "Hans" ⇒
    println(s"yay, found $n!")
}

There you escape the pattern and can define a predicate as in a regular if statement. But in the pattern guard you can even drop the parenthesis ;)

List destructuring

Scala has additional helpers to match and destructure List values:

List(1, 3, 5) match {
  case Nil ⇒
    println("empty list!")
  case head :: tail ⇒
    println(s"head was $head, and tail $tail")
}

You can split a list into its head and tail via :: in a pattern and match on an empty list via the Nil value. This allows for concise definitions of base cases when working on lists:

def qs(lst: List[Int]): List[Int] = {
  lst match {
    case Nil | _ :: Nil ⇒
      lst
    case lst ⇒
      val p = lst(lst.size / 2)
      lst.partition(_ <= p) match {
        case (Nil, sorted) ⇒ sorted
        case (sorted, Nil) ⇒ sorted
        case (lte, gt)     ⇒ qs(lte) ++ qs(gt)
      }
  }
}

println(qs(List(3, 1, 4, 1, 2, 6, 5, 3, 5, 9)))

Which reconstructs a sorted list and prints:

List(1, 1, 2, 3, 4, 5, 3, 5, 6, 9)

The first clause uses a disjunction to match both the empty list as well as lists with a single element. Destructuring into head and tail of a list also often comes up with recursive definitions over lists.

Partial functions

Sometimes you can save some line noise by using a partial function. So instead of explicitly matching on the single argument to the closure that we pass to foreach in the following example:

val people = List(Option("Hans"), None, Option("Peter"))
people foreach { p ⇒
  p match {
    case Some(fn) ⇒ println(s"yay, found $fn!")
    case None     ⇒ println(s":(")
  }
}

You can pass a partial function that allows you to match and destructure in a more concise fashion:

val people = List(Option("Hans"), None, Option("Peter"))
people foreach {
  case Some(fn) ⇒ println(s"yay, found $fn!")
  case None     ⇒ println(s":(")
}

And both print:

yay, found Hans! :( yay, found Peter!

But keep in mind that not every receiver of a partial function knows what to do in case it isn't defined for a given input:

val people = List(Option("Hans"), None, Option("Peter"))
people foreach {
  case Some(fn) ⇒ println(s"yay, found $fn!")
}

Which will throw a MatchError at runtime.

Extractors

Case classes allow for easy matching and destructuring, but not every value can be used as easily in a match statement. Consider the following (silly) example:

import scala.concurrent._
import ExecutionContext.Implicits.global

Future.successful(3) match {
  case Future(3) ⇒ println("yay??")
  case wildcard  ⇒ println(s"caught a: $wildcard")
}

The compiler won't let that pass and complains:

error: object Future is not a case class, nor does it have an unapply/unapplySeq member

So it seems we're missing an unapply method. Let's add that. It's called an extractor and is Scala's way of adding pattern match functionality where you can't use a case class:

import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util._

object MatchFuture {
  def unapply[T](v: Future[T]): Option[Try[T]] = {
    v.value
  }
}

Future.successful(3) match {
  case MatchFuture(Success(3)) ⇒ println("yay!!")
  case wildcard                ⇒ println(s"caught a: $wildcard")
}

Which happily prints:

yay!!

An extractor generally is a method that accepts a value and returns an Option[T] to indicate whether a given value matches this pattern. This is also where the destructuring takes place: We can transform a given value as we want. In the above example we convert a Future[T] into a Try[T].

But futures are values that are (often) computed on a different thread. So what happens if we get lazy and sleep in our future computation:

import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util._

object MatchFuture {
  def unapply[T](v: Future[T]): Option[Try[T]] = {
    v.value
  }
}

Future { println("Let's sleep!"); Thread.sleep(100); 3 } match {
  case MatchFuture(Success(3)) ⇒ println("yay!!")
  case wildcard                ⇒ println(s"caught a: $wildcard")
}

This will print the following:

Let's sleep!
caught a: scala.concurrent.impl.Promise$DefaultPromise@76cec888

So we start to sleep and the value call to the future value will produce a None value because the value hasn't been computed yet. So instead we fall to the second clause and find an unfinished value.

To push this little example a bit further, you could add a blocking extractor:

import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util._

object MatchFutureBlocking {
  def unapply[T](v: Future[T]): Option[T] = {
    Try(Await.result(v, 1.seconds)).toOption
  }
}

Future { println("Let's sleep!"); Thread.sleep(100); 3 } match {
  case MatchFutureBlocking(3) ⇒ println("yay!!")
  case wildcard               ⇒ println(s"caught a: $wildcard")
}

In this case we calmly await the future value to be computed successfully and print:

Let's sleep!
yay!!

But keep in mind that the toOption call will silently hide failures that happen as the future value is being computed:

import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util._

object MatchFutureBlocking {
  def unapply[T](v: Future[T]): Option[T] = {
    Try(Await.result(v, 1.seconds)).toOption
  }
}

Future.failed[Int](new RuntimeException("blablubb")) match {
  case MatchFutureBlocking(3) ⇒ println("yay??")
  case wildcard ⇒ println(s"caught a: $wildcard")
}

This will silently fall to the second clause and print:

caught a: scala.concurrent.impl.Promise$KeptPromise@3084c696

So my example is a bit silly, but if you're interested in more useful extractors have a look at NonFatal or Duration. Matching Objects With Patterns is a nice start for more details as well.

Happy hacking!