-
Notifications
You must be signed in to change notification settings - Fork 70
Add groupBy #108
Conversation
m.foreach { case (k, v) => | ||
result = result + ((k, v.view)) | ||
} | ||
result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the implementation of groupBy
in View
: each group is represented as an ArrayBuffer
, and eventually we get view
s of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable overall, but a few questions and issues to fix or at least consider.
val m = mutable.Map.empty[K, ArrayBuffer[A]] | ||
for (elem <- coll) { | ||
val key = f(elem) | ||
val bldr = m.getOrElseUpdate(key, ArrayBuffer.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a normal ArrayBuffer
allocates too much memory. If there isn't much redundancy you could end up increasing the memory usage by about 5x (!). I am not sure we have a great alternative data structure right now, but we should think about one. At the very least I think we should have an array builder and a view of the built array so at least we can reclaim all the extra memory after construction is complete.
def empty: BitSet = new BitSet1(0L) | ||
|
||
def newBuilder(): Builder[Int, BitSet] = | ||
new ImmutableBuilder[Int, BitSet](empty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a performance bug waiting to happen. Larger immutabe bitsets should absolutely not be built this way; they should be built with a mutable bitset, and upon asking for the result the array should be handed over and/or copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, indeed, definitely!
with MapOps[K, V, HashMap, HashMap[K, V]] | ||
with Serializable { | ||
with MapOps[K, V, HashMap, HashMap[K, V]] | ||
with Buildable[(K, V), HashMap[K, V]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't MapOps[K, V, HashMap, HashMap[K, V]]
imply Buildable[(K, V), HashMap[K, V]]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the decision of extending Buildable
can only be made at the leaf level (for each concrete collection type). For instance, Seq
does not imply Buildable
, which is good because we don’t want LazyList
to implement Buildable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's just because someone might write MapOps[K, V, Map, Map[K, V]]
that we don't want to be Buildable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Map
it probably makes sense to extend Buildable
, just like for Set
(which I am working on at the moment)
with SetOps[A, HashSet, HashSet[A]] | ||
with Serializable { | ||
with SetOps[A, HashSet, HashSet[A]] | ||
with Buildable[A, HashSet[A]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about SetOps
implying Buildable
@@ -124,5 +127,10 @@ object ListSet extends IterableFactory[ListSet] { | |||
|
|||
def empty[A]: ListSet[A] = EmptyListSet.asInstanceOf[ListSet[A]] | |||
|
|||
def newBuilder[A](): Builder[A, ListSet[A]] = | |||
new ImmutableBuilder[A, ListSet[A]](empty) { | |||
def add(elem: A): this.type = { elems = elems + elem; this } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way to push this implementation up into the trait without losing performance? There's a lot of repetition of this pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make ImmutableBuilder[A, C]
take as parameter a (C, A) => C
function defining how to add an element to a given collection. Then at use site that would look like the following:
new ImmutableBuilder[A, ListSet[A]](empty, _ + _)
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not much better, and we want to be really careful about per-element overhead. I was thinking about pushing it up into a SetFactory
which extends IterableFactory
. Not sure if there will be performance consequences, though.
@@ -12,7 +12,7 @@ import scala.Predef.{assert, intWrapper} | |||
|
|||
/** Concrete collection type: ListBuffer */ | |||
class ListBuffer[A] | |||
extends Seq[A] | |||
extends GrowableSeq[A] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to separately say this is growable, buildable, and a builder? Also, can't every mutable collection be treated as its own builder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to separately say this is growable, buildable, and a builder?
Indeed we could probably assume that Growable
imply Buildable
. I see no example where we would have the former without also having the latter.
I decided to put the method that returns the builder in the companion object because I think that could be useful for other purposes (see #97).
can't every mutable collection be treated as its own builder?
That’s a good question. Actually, @odersky initially designed mutable collections to be their own builders. I thought that it would make the hierarchy slightly more complicated (that’s yet another type that shows up in the linearization) and also I think the public result
method of builders makes little sense on a concrete collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to separately say this is growable, buildable
Actually, in the current design Growable[A]
takes just one type parameter (the type of the elements). If we want to make Growable[A]
imply Buildable[A, C]
then we need to add the C
type parameter to Growable
and on all its subclasses. A consequence is that this C
should be supplied by a XxxOps
trait, which means that we would have a GrowableIterableOps
branch in the hierarchy that would be refined for Set
, Map
and Seq
. That would be a lot of additional traits :(
Or, could we assume that all mutable collections are Growable
? (in such a case we could make mutable.IterableOps
extend Buildable
and get rid of the mutable.Iterable
/ mutable.GrowableIterable
distinction)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot that Arrays are not growable. I guess there is a distinction there to maintain.
@Ichoran can you summarize what needs to be changed? |
The implementation is based on builders, as in the current collections.
I rebased the PR and implemented |
@julienrf - I'll try to check it this evening. |
Changes made, so original review not relevant.
Hrm, not sure if there's a way to "unreview". Dismissing the review doesn't seem to be what I wanted. |
Superseded by #111. |
Summary
groupBy
method toIterable
Buildable
(copied from the current collections), and custom implementations inView
andLazyList
getOrElse
method toMap
getOrElseUpdate
method tomutable.Map
newBuilder()
method to appropriate collection factoriesDiscussion
The current (I mean, in the current collections) implementation of
groupBy
relies on an availablenewBuilder
method.In our strawman we didn’t have such builders so I wanted to experiment with different implementations.
I compared three implementations:
empty
and acons
method to build the groupsGrowable
to build the groupsI compared the implementations with collections (
List
,ImmutableArray
,HashSet
andArrayBuffer
) of various sizes (from 1 to 7,312,102), containing elements of typeLong
. The benchmark calls thegroupBy
method with the following functionx => x % 5
. (So, the result is aMap
of 5 groups of equivalent size)The benchmarks show that for immutable collections with more than 7 elements it’s faster to go with the current implementation.
For mutable collections, the approach based on
Growable
s has the same performance as the current implementation, and it has even slightly better performance on small collections (less than 4 elements).Therefore, I decided to keep the same implementation as in the current collections (based on builders). We could override the implementation of mutable collections if you think it is worth it…
For reference, here are the different
groupBy
implementations, followed by the code of the benchmarks and the charts.