Friday, July 25, 2008

A Short Road to TCPoly(morphism) in Scala

In my previous post, I talked about adding rich methods to Java Collections with the help of traits. If you go back and look at that post, you will see that I stayed well away from rich methods of a certain kind: those that return a new Collection when they are applied to an existing Collection. Examples of such methods are: map, flatMap, and filter.

Why did I not talk about methods of this nature? Well - because they would have led me down a short and slippery slope to TCPoly (aka Type Constructor Polymorphism )!

I'm going to navigate down that slope today...

But first, a quick recap. Here's a trait that we saw in the previous post:
trait RichIterableOps[A] {

// required method
def iterator: Iterator[A]

def foreach(f: A => Unit) = {
val iter
= iterator
while (iter.hasNext) f(iter.next)
}

def foldLeft[B](seed: B)(f: (B, A) => B) = {
var result = seed
foreach(e => result = f(result, e))
result
}
}
The job of this trait is to add rich operations to an Iterable, which is anything that has an iterator method; Java Collection classes are Iterables.

Let's say that we now want to add a map method to this trait:
  def map[B](f: A => B): ??? = { /* todo */}
Right away, I'm faced with the choice of trying to figure out the return type of the map method. Some requirements for this return type are:
  • Methods available on the original Collection type that mixes in this trait should be available via the return type. So, for example, if we mix the trait into a Set, and then call map on this Set, the methods of the Set interface/type should be available to us via the return type of map.
  • We should be able to chain rich methods.
  • Equality should work in a natural fashion for the enriched Collection.
Let me capture these requirements as (JUnit-4) tests:
class TestRichCollections {
val richSet = new RichHashSet[Int]

@Before
def setupSet: Unit = {
richSet.add(1); richSet.add(2); richSet.add(3)
}

@Test
def testFoldForSet = {
assertEquals(6, richSet.foldLeft(0)((x,y) => x+y))
}

@Test
def testOriginalCollectionOpForSet = {
assertTrue(richSet.map(e => 2*e).contains(4))
}

@Test
def testChainedRichOpForSet = {
assertTrue(richSet.map(e => 2*e).map(e => 2*e).contains(8))
}

@Test
def testEqualityForSet = {
val expected = new HashSet[Int]
expected.add(2); expected.add(4); expected.add(6)
assertEquals(expected, richSet.map(e => 2*e))
}
}
So, based on the set of requirements before us, what can we say about the return type of RichIterableOps.map?

It seems pretty clear that the return type of map should be:
  • a subtype of RichIterableOps: to allow chaining of rich methods.
  • a subtype of the Collection into which RichIterableOps is mixed: to make the methods of that Collection available via the return type.
A potential solution is to get Collections that mix in this trait to redefine the map method, to make it return the appropriate type. But this is highly sub-optimal, because it forces every mixing-in Collection to do substantial extra work. Plus, if such a Collection is accessed via a supertype variable, the redefined return type is not available via this variable without type-casting.

Ideally, I want to capture the logic of the map method right within the RichIterableOps trait. I also want to be able to say that the return type of map is a Container with certain constraints; subclasses should be able to specify the exact type of the Container.

This is precisely what TCPoly allows me to do.

So what exactly is TCPoly? To give a clear definition of that term, I first need to define a couple of other terms:

Parametric Polymorphism: a mechanism that allows us to abstract over types using type paramaters, so that a single piece of code can work with multiple types. An example of this is:
class List[A] {}
In the above code snippet, class List employs parametric polymorphism to abstract over the type of its members, and makes use of a type parameter: A.
This mechanism is known by the name Generics in the Java world.

Type Constructor: A type that is used to construct another (proper) type. In the above code snippet, List is a type constructor. It can be used, for example, to construct proper types like List[Int] and List[String]. A type constructor is also a higher-kinded type. Type constructors cannot have instances; only proper types can have instances.

Given these definitions, here, finally, is the definition of TCPoly:

Type Constructor Polymorphism: a form of Parametric Polymorphism which allows Type Constructors as type parameters. TCPoly also allows Type Constructors as abstract type members.

As far as the definition of TcPoly is concerned, that is it! But let me dig a little deeper...

Without TcPoly, the following is invalid:
class C1[A[_]]
Here, I'm trying to say that C1 takes a type parameter that itself needs a type parameter to denote a proper type.

TCPoly enables this kind of definition.

So how does that help me with giving RichIterableOps.map an appropriate return type? Here's a first attempt at answering that question:
import java.util.Set
import java.util.Collection
import java.util.HashSet
import java.util.LinkedHashSet
import java.util.Iterator

// a builder for containers of type C with element A
trait Builder[C[_], A] {
def add(el: A): Boolean
def build(): C[A]

def using(op: Builder[C, A] => Unit): C[A] = {
op(this); build()
}
}

trait RichIterableOps[A, Iterable[_], Container[X] <: RichIterableOps[X, Iterable, Container] with Iterable[X]] {

def iterator: Iterator[A]
def builder[T]: Builder[Container, T]

def foreach(f: A => Unit) = {
val iter = iterator
while (iter.hasNext) f(iter.next)
}

def foldLeft[B](seed: B)(f: (B, A) => B) = {
var result = seed
foreach(e => result = f(result, e))
result
}

def map[B](f: A => B): Container[B] = builder.using { b =>
foreach(e => b.add(f(e)))
}
}

class RichHashSet[A] extends HashSet[A] with RichIterableOps[A, HashSet, RichHashSet] {
def builder[T] = new RichHashSet[T] with Builder[RichHashSet, T] {def build() = this}
}

The important points in the above code fragement are:
  • RichIterableOps now uses a Type Constructor, Container[_], as a type parameter. Container[_] stands for the return type of the map method. The ability to do this, as enabled by TCPoly, is the key to the whole problem (of returning an appropriate type from map).
  • RichIterableOps uses another Type Constructor, Iterable[_], as a type parameter. This type parameter stands for the Collection/Iterable into which RichIterableOps is mixed, and is used to enforce the property that the Container should be a subtype of the mixing-in Collection; this is show next.
  • Bounds on the Container capture our requirements for the return type of map:
    Container[X] <: RichIterableOps[X, Iterable, Container] with Iterable[X].
  • map now returns the Container:
    def map[B](f: A => B): Container[B]
  • Subclasses specify the precise type of the Container:
    class RichHashSet[A] extends HashSet[A] with RichIterableOps[A, HashSet, RichHashSet]
  • RichIterableOps makes use of a builder to create the Container; subclasses supply an appropriate builder:
    class RichHashSet[A] extends HashSet[A] with RichIterableOps[A, HashSet, RichHashSet] {
    def builder[T] = new RichHashSet[T] with Builder[RichHashSet, T] {def build() = this}
    }
The code above, which employs TCPoly and uses Type Constructors as type parameters, allows us to do exactly what we want to do with the return type of map (and other higher order functions that return Collections). But it has one drawback: it is a little difficult to understand because of the use of a self-reference for specifying the bounds of the Container. Let me re-write the code using an abstract type member as opposed to a type parameter:
trait RichIterableOps[A, Iterable[_]] {

type Container[X] <: RichIterableOps[X, Iterable] with Iterable[X]

def iterator: Iterator[A]
def builder[T]: Builder[Container, T]

def foreach(f: A => Unit) = {
val iter = iterator
while (iter.hasNext) f(iter.next)
}

def foldLeft[B](seed: B)(f: (B, A) => B) = {
var result = seed
foreach(e => result = f(result, e))
result
}

def map[B](f: A => B): Container[B] = builder.using { b =>
foreach(e => b.add(f(e)))
}
}

class RichHashSet[A] extends HashSet[A] with RichIterableOps[A, HashSet] {
type Container[X] = RichHashSet[X]
def builder[T] = new RichHashSet[T] with Builder[Container, T] {def build() = this}
}

This version definitely looks easier to read. Also, given that RichIterableOps is designed to be mixed-in with a Collection into a subclass, the use of an abstract type member (which forces the use of a subclass) is not a burden.

The key difference here from the previous code fragment is that Container[_] is now an abstract type member as opposed to a type parameter.

Let me run my tests to make sure that the code does what it is supposed to do:



That looks good.

RichIterableOps should also work with Collections other than Sets. Let me try it with an ArrayList:
class RichArrayList[A] extends ArrayList[A] with RichIterableOps[A, ArrayList] {
type Container[X] = RichArrayList[X]
def builder[T] = new RichArrayList[T] with Builder[Container, T] {def build() = this}
}
The tests for this are:
class TestRichCollections {
// [snip]

val richList = new RichArrayList[Int]

@Before
def setupList: Unit = {
richList.add(1); richList.add(2); richList.add(3)
}

@Test
def testFoldForList = {
assertEquals(6, richList.foldLeft(0)((x,y) => x+y))
}

@Test
def testOriginalCollectionOpForList = {
assertTrue(richList.map(e => 2*e).contains(4))
}

@Test
def testChainedRichOpForList = {
assertTrue(richList.map(e => 2*e).map(e => 2*e).contains(8))
}

@Test
def testEqualityForList = {
val expected = new ArrayList[Int]
expected.add(2); expected.add(4); expected.add(6)
assertEquals(expected, richList.map(e => 2*e))
}
}
Let me run the full set of tests:



That's exactly what we wanted!

Conclusion


The availability of good Data-Structures/Collections is an important aspect of the 'power' of a programming language. As we saw in this post, TCPoly quickly becomes useful when we want to provide powerful Collection operations like map, flatMap, and filter in the context of a statically typed and parametrically polymorphic language like Scala. Without TCPoly, Collections in Scala would have to resort to code duplication and type-casting to support such operations. We saw this problem when I tried to add the map method to existing Java Collections via a trait.

TCPoly provides an elegant solution to this problem. Using TCPoly, we saw how I was able to extend RichIterableOps with a map method that abstracts over its return type. We saw this method working without any problem, and without any code duplication, in the context of two very different types of Java Collections: a HashSet and an ArrayList.

Relevant Reads:
Adriaan Moors's TCPoly Page

No comments: