Possible Visions for Mahout 1.0
DESCRIPTION
These are the slides that we used to ignite the conversation with the audience at Hadoop Summit EU. Come over to the Mahout dev list to be part of the ongoing conversation.
TRANSCRIPT
© 2014 MapR Technologies 1
What’s Coming in Mahout 1.0?
Ted Dunning, Chief Application Architect, MapR Technologies
A typical encounter with a potential Mahout user
Which leads us to the Mahout 1.0 vision
Example: Cooccurrence Analysis
How often do items co-occur?
// load distributed matrix
val A = drmFromHDFS(...)
// compute co-occurrences
val C = A.t %*% A
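To make the product concrete, here is a minimal plain-Scala sketch of what `A.t %*% A` computes on a tiny in-memory matrix. It does not use Mahout; the `Cooccurrence` object and its methods are hypothetical names chosen for illustration, and it assumes rows of A are users and columns are items.

```scala
// Illustrative only: a small in-memory stand-in for the distributed product.
// Assumption: rows of A are users, columns are items, entries are 0/1 interactions.
object Cooccurrence {
  type Matrix = Vector[Vector[Double]]

  // Naive dense matrix product, adequate for a toy example.
  def multiply(a: Matrix, b: Matrix): Matrix = {
    val bColumns = b.transpose
    a.map(row => bColumns.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // C = A^T * A: entry (i, j) counts the users who interacted with both item i and item j.
  def cooccurrences(a: Matrix): Matrix = multiply(a.transpose, a)
}

object CooccurrenceDemo extends App {
  val a = Vector(
    Vector(1.0, 1.0, 0.0), // user 0 touched items 0 and 1
    Vector(1.0, 0.0, 1.0), // user 1 touched items 0 and 2
    Vector(1.0, 1.0, 0.0)  // user 2 touched items 0 and 1
  )
  Cooccurrence.cooccurrences(a).foreach(row => println(row.mkString(" ")))
}
```

The diagonal of C holds the number of interactions per item, and each off-diagonal entry holds the co-occurrence count for a pair of items.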
Under the covers:
Optimizer rewrites the matrix multiplication and transpose operations to a TransposeSelf operator
Optimizer chooses from two physical operators for TransposeSelf
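The rewrite described above can be sketched with a toy expression tree. This is not Mahout's optimizer: the `Expr` case classes and the `rewrite` function are hypothetical, and the row-wise outer-product evaluation is only one plausible physical strategy, shown because it avoids materializing the transpose. The slides do not say which two physical operators the optimizer chooses between.

```scala
// Illustrative only: a toy logical plan and a single rewrite rule.
object RewriteSketch {
  sealed trait Expr
  case class Load(name: String) extends Expr
  case class Transpose(child: Expr) extends Expr
  case class Times(left: Expr, right: Expr) extends Expr
  case class TransposeSelf(child: Expr) extends Expr

  // Replace the pattern (X^T * X) with a single TransposeSelf(X) node.
  def rewrite(e: Expr): Expr = e match {
    case Times(Transpose(a), b) if a == b => TransposeSelf(rewrite(a))
    case Times(l, r)                      => Times(rewrite(l), rewrite(r))
    case Transpose(c)                     => Transpose(rewrite(c))
    case TransposeSelf(c)                 => TransposeSelf(rewrite(c))
    case l: Load                          => l
  }

  // One possible way to evaluate TransposeSelf: sum the outer product of each
  // row with itself, so the transposed matrix is never built.
  def transposeSelf(rows: Seq[Array[Double]]): Array[Array[Double]] = {
    val n = rows.headOption.map(_.length).getOrElse(0)
    val c = Array.fill(n, n)(0.0)
    for (row <- rows; i <- 0 until n if row(i) != 0.0; j <- 0 until n)
      c(i)(j) += row(i) * row(j)
    c
  }
}
```

Because each row contributes independently, a row-partitioned matrix can compute partial sums per partition and add them up, which is why fusing the two operations into one operator is attractive.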
Which items co-occur anomalously?
// compute & broadcast number of interactions per item
val numInteractions = drmBroadcast(A.colSums)
// create indicator matrix
val I = C.mapBlock() {
  case (keys, block) =>
    // allocate sparse block of indicator matrix
    val indicatorBlock = sparse(block.nrow, block.ncol)
    // compute indicators with loglikelihood ratio test
    for (row <- block)
      indicatorBlock(row.index, ::) = computeLLR(row, numInteractions)
    keys -> indicatorBlock
}
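The slide does not show the body of `computeLLR`. As a rough sketch of the underlying statistic, the following plain-Scala code computes the log-likelihood ratio for one pair of items from a 2x2 contingency table, using the entropy-based formulation. The `LlrSketch` object, its method names, and the `total` parameter (the overall number of users) are assumptions for illustration and are not taken from the slides.

```scala
// Illustrative only: log-likelihood ratio (G^2) test on a 2x2 contingency table.
object LlrSketch {
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized entropy of a list of counts.
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // k11: both events, k12: first only, k21: second only, k22: neither.
  def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb floating-point round-off.
    math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
  }

  // Build the table for items a and b from a co-occurrence count (an entry of C),
  // the per-item interaction counts (column sums of A), and the assumed user total.
  def itemPairLlr(both: Long, countA: Long, countB: Long, total: Long): Double =
    logLikelihoodRatio(both, countA - both, countB - both, total - countA - countB + both)
}
```

A score near zero means the pair co-occurs about as often as chance would predict; large scores flag anomalous co-occurrence, which is what the indicator matrix is meant to keep.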
Runtime
• prototype on Apache Spark
  – fast and expressive cluster computing system
  – general computation graphs, in-memory primitives, rich API, interactive shell
• future: add Stratosphere
  – project proposed to Apache Incubator recently
  – similar to Apache Spark, adds data flow optimization and efficient out-of-core execution
How Does This Apply?
How Can I Start?
Q & A
@ted_dunning @mapr maprtech
Engage with us!
MapR
maprtech
mapr-technologies