scala 2.8 arrays - scala-lang.org


Scala 2.8 Arrays

Martin Odersky, EPFL

October 1, 2009

The Problem

Arrays have turned out to be one of the trickiest concepts to get right in Scala. This has mostly to do with very hard constraints that clash with what’s desirable. On the one hand, we want to use arrays for interoperation with Java, which means that they need to have the same representation as in Java. This low-level representation is also useful to get high performance out of arrays. But on the other hand, arrays in Java are severely limited.

First, there’s actually not a single array type representation in Java but nine different ones: one representation for arrays of reference type and another eight for arrays of each of the primitive types byte, char, short, int, long, float, double, and boolean. There is no common type for these different representations which is more specific than just java.lang.Object, even though there are some reflective methods to deal with arrays of arbitrary type in java.lang.reflect.Array. Second, there’s no way to create an array of a generic type; only monomorphic array creations are allowed. Third, the only operations supported by arrays are indexing, updating, and getting the length.

Contrast this with what we would like to have in Scala: Arrays should slot into the collections hierarchy, supporting the hundred or so methods that are defined on sequences. And they should certainly be generic, so that one can create an Array[T] where T is a type variable.
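To make the "nine representations" point concrete, here is a minimal sketch in modern Scala (2.13 or 3 assumed; `JavaArrayReps` and its helpers are hypothetical names). Arrays of different element types have different JVM classes, and code that wants to accept any of them can only type its parameter as `AnyRef` and fall back on `java.lang.reflect.Array`:

```scala
import java.lang.reflect.{Array => JArray}

// Hypothetical helpers: the only static type shared by all array representations is AnyRef.
object JavaArrayReps {
  // Name of the JVM element class: "int", "double", "java.lang.String", ...
  def componentName(arr: AnyRef): String =
    arr.getClass.getComponentType.getName

  // Reflective access works on every representation, but hands back boxed values.
  def firstElement(arr: AnyRef): AnyRef = JArray.get(arr, 0)

  def length(arr: AnyRef): Int = JArray.getLength(arr)

  def main(args: Array[String]): Unit = {
    val ints: Array[Int] = Array(1, 2, 3)        // int[] on the JVM
    val doubles: Array[Double] = Array(1.5)      // double[] on the JVM
    val strings: Array[String] = Array("a", "b") // String[] on the JVM
    for (arr <- List[AnyRef](ints, doubles, strings))
      println(s"${componentName(arr)} length=${length(arr)} first=${firstElement(arr)}")
  }
}
```

Note that `JArray.get` returns boxed values such as `java.lang.Integer`, so even this reflective route pays for boxing on primitive arrays.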

The Past

How can these desirable properties be combined with the representation restrictions imposed by Java interoperability and performance? There’s no easy answer, and I believe we got it wrong the first time when we designed Scala. The Scala language up to 2.7.x “magically” wrapped and unwrapped arrays when required in a process called boxing and unboxing, similarly to what is done to treat primitive numeric types as objects. “Magically” means: the compiler generated code to do so based on the static types of expressions. Additional magic made generic array creation work. An expression like new Array[T] where T is a type parameter was converted to new BoxedAnyArray[T]. BoxedAnyArray was a special wrapper class which changed its representation depending on the type of the concrete Java array to which it was cast. This scheme worked well enough for most programs, but the implementation “leaked” for certain combinations of type tests and type casts, as well as for observing uninitialized arrays. It also could lead to unexpectedly low performance. Some of the problems have been described by David MacIver [Mac08b] and Matt Malone [Mal09].

Boxed arrays were also unsound when combined with covariant collections. In summary, the old array implementation technique was problematic because it was a leaky abstraction, and one complicated enough that it would have been very tedious to specify where the leaks were to be expected.

Exploring the Solution Space

The obvious way to reduce the amount of magic needed for arrays is to have two representations: one which corresponds closely to a Java array and another which forms an integral part of Scala’s collection hierarchy. Implicit conversions can be used to transparently convert between the two representations. This is the gist of the array refactoring proposal of David MacIver (with contributions by Stepan Koltsov) [Mac08a]. The main problem with this proposal, as I see it, is that it would force programmers to choose the kind of array to work with. The choice would not be clear-cut: the Java-like arrays would be fast and interoperable, whereas the Scala native arrays would support a much nicer set of operations. With a choice like this, one would expect different components and libraries to make different decisions, which would result in incompatibilities and brittle, complex code. MacIver and Koltsov introduce some compiler magic to alleviate this. They propose to automatically split a method taking an array as an argument into two overloaded versions: one taking a Java array and one taking a generic Scala array. I believe this would solve some of the more egregious plumbing issues, but it would simply hide the problem a bit better, not solve it.

A similar idea, but with a slightly different slant, is to “dress up” native arrays with an implicit conversion that integrates them into Scala’s collection hierarchy. This is similar to what was done with the String to RichString conversion in pre-2.8 Scala. The difference from the MacIver/Koltsov proposal is that one would not normally refer to Scala native arrays in user code, just as one rarely referred to RichString in Scala. One would only rely on the implicit conversion to add the necessary methods and traits to Java arrays. Unfortunately, the String/RichString experience has shown that this is also problematic. In particular, in pre-2.8 versions of Scala, one had the non-intuitive property that

"abc".reverse.reverse == "abc" , yet

"abc" != "abc".reverse.reverse !

The problem here was that the reverse method was inherited from class Seq, where it was defined to return another Seq. Since strings are not sequences, the only feasible type reverse could return when called on a String was RichString. But then the equals method on Strings, which is inherited from Java, would not recognize that a String could be equal to a RichString.
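The asymmetry can be reproduced with a hypothetical wrapper class `RichStr` that mimics the relevant behaviour (an illustration only, not the actual pre-2.8 RichString source). The wrapper's `equals` accepts a plain String, but `String.equals`, inherited from Java, knows nothing about the wrapper. The sketch calls `equals` directly to keep the comparison semantics identical across Scala 2 and 3:

```scala
// Hypothetical stand-in for the pre-2.8 RichString wrapper; not the historical class.
final class RichStr(val self: String) {
  // Like the Seq-derived reverse of that era, reverse returns the wrapper type.
  def reverse: RichStr = new RichStr(self.reverse)

  // The wrapper is willing to compare equal to a plain String...
  override def equals(other: Any): Boolean = other match {
    case s: String  => s == self
    case r: RichStr => r.self == self
    case _          => false
  }
  override def hashCode(): Int = self.hashCode()
}

object RichStrDemo {
  val twiceReversed: RichStr = new RichStr("abc").reverse.reverse

  // ...but java.lang.String.equals only accepts other Strings, so equality is not symmetric.
  val wrapperSide: Boolean = twiceReversed.equals("abc") // true
  val stringSide: Boolean = "abc".equals(twiceReversed)  // false
}
```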


2.8 Collections

The new scheme of Scala 2.8 solves the problems with both arrays and strings. It makes critical use of the new 2.8 collections framework, which accompanies collection traits such as Seq with implementation traits that abstract over the representation of the collection. For instance, in addition to trait Seq there is now a trait

trait SeqLike[+Elem, +Repr] { ... }

That trait is parameterized with a representation type Repr. No assumptions need to be made about this representation type; in particular, it is not required to be a subtype of Seq. Methods such as reverse in trait SeqLike will return values of the representation type Repr rather than Seq. The Seq trait then inherits all its essential operations from SeqLike, instantiating the Repr parameter to Seq[Elem].

trait Seq[+Elem] extends ... with SeqLike[Elem, Seq[Elem]] { ... }

A similar split into base trait and implementation trait applies to most other kinds of collections, including Traversable, Iterable, and Vector.
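The representation-type idea can be illustrated with a small hypothetical hierarchy (`MiniSeqLike`, `MiniSeq`, `VecSeq` and `StrOps` are invented names, and variance annotations are omitted to keep the sketch short). Operations are written once in the implementation trait and return Repr; one instance fixes Repr to the sequence trait itself, another fixes it to plain String:

```scala
// Hypothetical miniature of the SeqLike/Seq split; not the 2.8 library code.
trait MiniSeqLike[Elem, Repr] {
  def elements: Vector[Elem]
  // Each concrete collection says how to rebuild its own representation.
  protected def build(xs: Vector[Elem]): Repr
  // Operations are defined once and return Repr, not MiniSeq.
  def reverse: Repr = build(elements.reverse)
  def filter(p: Elem => Boolean): Repr = build(elements.filter(p))
}

// The interface trait fixes Repr to itself, as Seq does with SeqLike.
trait MiniSeq[Elem] extends MiniSeqLike[Elem, MiniSeq[Elem]]

final case class VecSeq[Elem](elements: Vector[Elem]) extends MiniSeq[Elem] {
  protected def build(xs: Vector[Elem]): MiniSeq[Elem] = VecSeq(xs)
}

// A representation that is not a MiniSeq at all: operations hand back plain Strings.
final class StrOps(s: String) extends MiniSeqLike[Char, String] {
  def elements: Vector[Char] = s.toVector
  protected def build(xs: Vector[Char]): String = xs.mkString
}
```

Because `StrOps` fixes Repr to String, `new StrOps("abc").reverse` is statically a String, so reversing twice gives a genuine String rather than a wrapper, which is the property the pre-2.8 RichString lacked.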

Integrating Arrays

We can integrate arrays into this collection framework using two implicit conversions. The first conversion maps an Array[T] to an object of type ArrayOps, which is a subtype of type VectorLike[T, Array[T]]. Using this conversion, all sequence operations are available for arrays at the natural types. In particular, methods will yield arrays instead of ArrayOps values as their results. Because the results of these implicit conversions are so short-lived, modern VMs can eliminate them altogether using escape analysis, so we expect the calling overhead for these added methods to be essentially zero.
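The same design is still visible in current Scala. A minimal sketch, assuming Scala 2.13 or later (`ArrayOpsDemo` is a hypothetical name; the explicit call assumes the 2.13 standard library, where the conversion for Int arrays is `Predef.intArrayOps`). Sequence methods on an array come from the conversion to ArrayOps, and their results are arrays again:

```scala
object ArrayOpsDemo {
  val xs: Array[Int] = Array(3, 1, 2)

  // Sequence methods come from an implicit conversion to ArrayOps,
  // but their results are plain arrays, not wrapper objects.
  val reversed: Array[Int] = xs.reverse
  val doubled: Array[Int] = xs.map(_ * 2)
  val odds: Array[Int] = xs.filter(_ % 2 == 1)

  // The same call with the conversion written out explicitly.
  val explicitReversed: Array[Int] = Predef.intArrayOps(xs).reverse
}
```

The operations do not modify `xs`; each one allocates a fresh array of the same element type.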

So far so good. But what if we need to convert an array to a real Seq, not just call a Seq method on it? For this there is another implicit conversion, which takes an array and converts it into a WrappedArray. WrappedArrays are mutable Vectors that implement all vector operations in terms of a given Java array. The difference between a WrappedArray and an ArrayOps object is apparent in the type of methods like reverse: invoked on a WrappedArray, reverse again returns a WrappedArray, but invoked on an ArrayOps object, it returns an Array. The conversion from Array to WrappedArray is invertible: a dual implicit conversion goes from WrappedArray to Array. WrappedArray and ArrayOps both inherit from an implementation trait ArrayLike. This avoids duplicating code between ArrayOps and WrappedArray; all operations are factored out into the common ArrayLike trait.
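The wrapping conversion can be observed directly. This sketch assumes Scala 2.13 or later, where WrappedArray has been superseded by `scala.collection.mutable.ArraySeq`, so the class names differ from the 2.8 ones; `WrappedArrayDemo` is a hypothetical name:

```scala
import scala.collection.mutable

object WrappedArrayDemo {
  // Asking for a Seq triggers the lower-priority wrapping conversion.
  def asSeq(xs: Array[Int]): collection.Seq[Int] = xs

  // reverse on the wrapper yields another Seq, not an Array.
  def reversedSeq(xs: Array[Int]): collection.Seq[Int] = asSeq(xs).reverse

  // Wrapping does not copy: writes to the array show through the wrapper.
  def sharesStorage(): Boolean = {
    val arr = Array(1, 2, 3)
    val wrapped: mutable.ArraySeq[Int] = mutable.ArraySeq.make(arr)
    arr(0) = 42
    wrapped(0) == 42
  }
}
```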

Avoiding Ambiguities

So now that we have two implicit conversions from Array to ArrayLike values, how does one choose between them, and how does one avoid ambiguities? The trick is to make use of a generalization of overloading and implicit resolution in Scala 2.8. Previously, the most specific overloaded method or implicit conversion would be chosen based solely on the method’s argument types. There was an additional clause which said that the most specific method could not be defined in a proper superclass of any of the other alternatives. This scheme has been replaced in Scala 2.8 by the following, more liberal one: when comparing two different applicable alternatives of an overloaded method or of an implicit, each alternative gets one point for having more specific argument types, and another point for being defined in a proper subclass. An alternative “wins” over another if it gets a greater number of points in these two comparisons. This means in particular that if alternatives have identical argument types, the one which is defined in a subclass wins.

Applied to arrays, this means that we can prioritize the conversion from Array to ArrayOps over the conversion from Array to WrappedArray by placing the former in the standard Predef object and the latter in a class LowPriorityImplicits, which is inherited from Predef. This way, calling a sequence method will always invoke the conversion to ArrayOps. The conversion to WrappedArray will only be invoked when an array needs to be converted to a sequence.
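The prioritization trick can be reproduced in user code. A minimal sketch with hypothetical classes `OpsView` and `SeqView` standing in for ArrayOps and WrappedArray: the preferred conversion sits in an object that inherits the fallback from a trait, mirroring the Predef/LowPriorityImplicits arrangement described above.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for ArrayOps and WrappedArray; both offer a method named `kind`.
final class OpsView(val xs: Array[Int]) { def kind: String = "ops" }
final class SeqView(val xs: Array[Int]) { def kind: String = "seq" }

// The lower-priority conversion lives in a base trait...
trait LowPriorityConversions {
  implicit def toSeqView(xs: Array[Int]): SeqView = new SeqView(xs)
}

// ...and the preferred one in the object that inherits from it.
object Conversions extends LowPriorityConversions {
  implicit def toOpsView(xs: Array[Int]): OpsView = new OpsView(xs)
}

object PriorityDemo {
  import Conversions._

  val arr: Array[Int] = Array(1, 2, 3)
  // Both conversions supply `kind` and take the same argument type;
  // toOpsView wins because it is defined in a subclass of LowPriorityConversions.
  val viaMethod: String = arr.kind
  // When a SeqView is explicitly expected, only toSeqView fits.
  val asSeqView: SeqView = arr
}
```

Without the inheritance (both conversions in the same object), the call `arr.kind` would be rejected as ambiguous.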

Integrating Strings

Essentially the same technique is applied to strings. There are two implicit conversions: the first, which goes from String to StringOps, adds useful methods to class String. The second, which goes from String to WrappedString, converts strings to sequences.
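The string side can be checked the same way. This sketch assumes Scala 2.13 or later, where the two conversions are still `augmentString` (to StringOps) and `wrapString` (to WrappedString); `StringDemo` is a hypothetical name. Unlike the pre-2.8 RichString situation, double reversal now yields a real String, so equality holds in both directions:

```scala
import scala.collection.immutable.WrappedString

object StringDemo {
  // StringOps methods return Strings, so double reversal gives back a real String.
  val twice: String = "abc".reverse.reverse
  val symmetric: Boolean = twice == "abc" && "abc" == twice

  // Asking for a Seq[Char] triggers the lower-priority conversion to WrappedString.
  val chars: Seq[Char] = "abc"
  val isWrapped: Boolean = chars.isInstanceOf[WrappedString]
}
```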

Generic Array Creation and Manifests

That’s almost everything. The only remaining question is how to implement generic array creation. Unlike Java, Scala allows an instance creation new Array[T] where T is a type parameter. How can this be implemented, given the fact that there does not exist a uniform array representation in Java? The only way to do this is to require additional runtime information which describes the type T. Scala 2.8 has a new mechanism for this, which is called a Manifest. An object of type Manifest[T] provides complete information about the type T. Manifest values are typically passed in implicit parameters, and the compiler knows how to construct them for statically known types T. There also exists a weaker form named ClassManifest, which can be constructed from knowing just the top-level class of a type, without necessarily knowing all its argument types. It is this type of runtime information that’s required for array creation.
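Since Scala 2.10, ClassManifest has been superseded by `scala.reflect.ClassTag`, so a sketch that compiles with current compilers uses ClassTag instead. It shows that only the top-level class is recorded and that this is enough to allocate an array of the right representation (`ClassTagDemo` and `elementClass` are hypothetical names):

```scala
import scala.reflect.{ClassTag, classTag}

object ClassTagDemo {
  // Only the top-level class survives: List[String] and List[Int] share one runtime class.
  val listClass: Class[_] = classTag[List[String]].runtimeClass

  // For Int the tag points at the JVM primitive class `int`...
  val intClass: Class[_] = classTag[Int].runtimeClass

  // ...which is what is needed to allocate the right array representation (int[]).
  val ints: Array[Int] = classTag[Int].newArray(3)

  // Generic code receives the tag as an implicit parameter.
  def elementClass[T](implicit tag: ClassTag[T]): Class[_] = tag.runtimeClass
}
```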

Here’s an example. Consider the method tabulate, which forms an array from the results of applying a given function f to a range of numbers from 0 until a given length. Up to Scala 2.7, tabulate could be written as follows:

def tabulate[T](len: Int, f: Int => T) = {
  val xs = new Array[T](len)
  for (i <- 0 until len) xs(i) = f(i)
  xs
}

In Scala 2.8 this is no longer possible, because runtime information is necessary to create the right representation of Array[T]. One needs to provide this information by passing a ClassManifest[T] into the method as an implicit parameter:

def tabulate[T](len: Int, f: Int => T)(implicit m: ClassManifest[T]) = {
  val xs = new Array[T](len)
  for (i <- 0 until len) xs(i) = f(i)
  xs
}

As a shorthand form, a context bound (footnote 1) can be used on the type parameter T instead, giving:

def tabulate[T: ClassManifest](len: Int, f: Int => T) = {
  val xs = new Array[T](len)
  for (i <- 0 until len) xs(i) = f(i)
  xs
}

When calling tabulate on a type such as Int, or String, or List[T], the Scala compiler can create a class manifest to pass as implicit argument to tabulate. When calling tabulate on another type parameter, one needs to propagate the requirement of a class manifest using another implicit parameter or context bound. For instance:

def tabTen[T: ClassManifest](f: Int => T) = tabulate(10, f)

The move away from boxing and to class manifests is bound to break some existing code that generated generic arrays, as in the first version of tabulate above. Usually, the necessary changes simply involve adding a context bound to some type parameter.
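For readers on Scala 2.10 or later, here is the context-bound version of tabulate and tabTen with ClassTag in place of ClassManifest. This is a sketch under that assumption, not the 2.8 library code; the wrapping object `Tabulate` is hypothetical:

```scala
import scala.reflect.ClassTag

object Tabulate {
  // Same shape as the context-bound tabulate, with ClassTag in place of ClassManifest.
  def tabulate[T: ClassTag](len: Int, f: Int => T): Array[T] = {
    val xs = new Array[T](len) // the implicit ClassTag selects int[], String[], ...
    for (i <- 0 until len) xs(i) = f(i)
    xs
  }

  // A generic caller must pass the requirement on with its own context bound;
  // dropping `: ClassTag` here would make the call to tabulate fail to compile.
  def tabTen[T: ClassTag](f: Int => T): Array[T] = tabulate(10, f)
}
```

With this definition, `tabulate(3, (i: Int) => i)` produces an Array[Int] backed by a JVM int[], while a function returning String produces a String[], with no boxing wrapper involved.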

Class GenericArray

For the case where generic array creation is needed but adding manifests is not feasible, Scala 2.8 offers an alternative version of arrays in the GenericArray class. This class is defined in package scala.collection.mutable along the following lines.

class GenericArray[T](length: Int) extends Vector[T] {
  val array: Array[AnyRef] = new Array[AnyRef](length)
  ...
  // all vector operations defined in terms of ‘array’
}

1 Generally, a type parameter with a context bound is of the form [T: Bound]; it is expanded to a plain type parameter T together with an implicit parameter of type Bound[T].

Unlike normal arrays, GenericArrays can be created without a class manifest because they have a uniform representation: all their elements are stored in an Array[AnyRef], which corresponds to an Object[] array in Java. The addition of GenericArray to the Scala collection library does demand a choice from the programmer: should one pick a normal array or a generic array? This choice is easily answered, however. Whenever a class manifest for the element type can easily be produced, it’s better to pick a normal array, because it tends to be faster, is more compact, and has better interoperability with Java. Only when producing a class manifest is infeasible should one revert to a GenericArray. The only place where GenericArray is used in Scala’s current collection framework is in the sortWith method of class Seq. A call xs.sortWith(f) first converts its receiver xs to a GenericArray, passes the resulting array to a Java sorting method defined in java.util.Arrays, and converts the sorted array back to the same type of Seq as xs. Since the conversion to an array is a mere implementation detail of sortWith, we felt that it was unreasonable to demand a class manifest for the element type of the sequence. Hence the choice of a GenericArray.
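A class in the spirit of GenericArray can be sketched on top of the current collections. `UniformArray`, `sortWithJava` and `sortedWith` are hypothetical names, not library API, and the sketch assumes Scala 2.13 or later, where implementing `apply`, `update` and `length` is enough to obtain the mutable indexed-sequence operations. Because the storage is always an `Array[AnyRef]`, no class tag is needed, which is also what lets `sortedWith` copy any sequence into an array and call `java.util.Arrays.sort`:

```scala
import scala.collection.mutable

// Hypothetical sketch in the spirit of GenericArray (not the library class).
final class UniformArray[T](val length: Int) extends mutable.IndexedSeq[T] {
  // Every element type shares one runtime representation: Object[].
  private val array: Array[AnyRef] = new Array[AnyRef](length)

  def apply(i: Int): T = array(i).asInstanceOf[T]
  def update(i: Int, elem: T): Unit = array(i) = elem.asInstanceOf[AnyRef]

  // Sort the backing Object[] in place with java.util.Arrays and a comparator built from `lt`.
  def sortWithJava(lt: (T, T) => Boolean): this.type = {
    java.util.Arrays.sort(array, new java.util.Comparator[AnyRef] {
      def compare(a: AnyRef, b: AnyRef): Int = {
        val x = a.asInstanceOf[T]
        val y = b.asInstanceOf[T]
        if (lt(x, y)) -1 else if (lt(y, x)) 1 else 0
      }
    })
    this
  }
}

object UniformArray {
  // No ClassTag bound: T can be any type, including an abstract type parameter.
  def from[T](xs: Iterable[T]): UniformArray[T] = {
    val arr = new UniformArray[T](xs.size)
    var i = 0
    for (x <- xs) { arr(i) = x; i += 1 }
    arr
  }

  // Mirrors the described sortWith: copy into Object[] storage, sort, copy back.
  def sortedWith[T](xs: Seq[T])(lt: (T, T) => Boolean): List[T] =
    from(xs).sortWithJava(lt).toList
}
```

Primitive elements such as Int are stored boxed in this scheme, which is the price paid for not needing a class tag.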

Conclusion

In summary, the new Scala collection framework resolves some long-standing problems with arrays and with strings. It removes a considerable amount of compiler magic and avoids several pitfalls which existed in the previous implementation. It relies on three new features of the Scala language that should be generally useful in the construction of libraries and frameworks: First, the generalization of overloading and implicit resolution allows one to prioritize some implicits over others. Second, manifests provide type information at run time that would otherwise be lost through erasure. Third, context bounds are a convenient shorthand for certain forms of implicit arguments. These three language features will be described in more detail in separate notes.

References

[Mac08a] David MacIver. Refactoring scala.array. Pre-SIP (Scala Improvement Proposal), October 2008. http://www.drmaciver.com/repos/scala-arrays/sip-arrays.xhtml.

[Mac08b] David MacIver. Scala arrays. Blog, June 2008. http://www.drmaciver.com/2008/06/scala-arrays.

[Mal09] Matt Malone. The mystery of the parameterized array. Blog, August 2009. http://oldfashionedsoftware.com/2009/08/05/the-mystery-of-the-parameterized-array.
