Possible Visions for Mahout 1.0


Upload: ted-dunning

Post on 26-Jan-2015





DESCRIPTION

These are the slides that we used to ignite the conversation with the audience at Hadoop Summit EU. Come over to the Mahout dev list to be part of the ongoing conversation.

TRANSCRIPT

Page 1: Possible Visions for Mahout 1.0

© 2014 MapR Technologies 1

What’s Coming in Mahout 1.0?

Ted Dunning, Chief Application Architect, MapR Technologies

Page 2: Possible Visions for Mahout 1.0


Page 3: Possible Visions for Mahout 1.0


Page 4: Possible Visions for Mahout 1.0


A typical encounter with a potential Mahout user

Page 5: Possible Visions for Mahout 1.0


Which leads us to the Mahout 1.0 vision

Page 6: Possible Visions for Mahout 1.0


Page 7: Possible Visions for Mahout 1.0


Page 8: Possible Visions for Mahout 1.0


Page 9: Possible Visions for Mahout 1.0


Example: Cooccurrence Analysis

Page 10: Possible Visions for Mahout 1.0


How often do items co-occur?

// load distributed matrix
val A = drmFromHDFS(...)

// compute co-occurrences
val C = A.t %*% A
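To make the meaning of `A.t %*% A` concrete, here is a minimal plain-Scala sketch on a tiny in-memory matrix. It does not use Mahout; the data and the helper name `cooccurrence` are hypothetical and chosen only for illustration. Rows are assumed to be users and columns items, so entry (i, j) of the result counts the users who interacted with both item i and item j.

```scala
// Toy interaction matrix (hypothetical data): rows are users, columns are items.
val A: Vector[Vector[Int]] = Vector(
  Vector(1, 1, 0), // user 0 interacted with items 0 and 1
  Vector(1, 1, 1), // user 1 interacted with all three items
  Vector(0, 1, 1)  // user 2 interacted with items 1 and 2
)

// C = A^T A: entry (i, j) sums A(u)(i) * A(u)(j) over all users u.
def cooccurrence(a: Vector[Vector[Int]]): Vector[Vector[Int]] = {
  val numItems = a.head.length
  Vector.tabulate(numItems, numItems) { (i, j) =>
    a.map(row => row(i) * row(j)).sum
  }
}

val C = cooccurrence(A)
```

In this toy data `C(0)(2)` is 1, because only user 1 interacted with both item 0 and item 2, while the diagonal entry `C(1)(1)` is 3, the number of users who interacted with item 1.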

Page 11: Possible Visions for Mahout 1.0


How often do items co-occur?

// load distributed matrix
val A = drmFromHDFS(...)

// compute co-occurrences
val C = A.t %*% A

Under the covers:

Optimizer rewrites the matrix multiplication and transpose operations to a TransposeSelf operator

Optimizer chooses from two physical operators for TransposeSelf
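The slide does not describe the two physical operators. As an illustration of why a fused operator can be cheaper than a transpose followed by a multiplication, the plain-Scala sketch below uses the identity that A^T A equals the sum, over the rows a of A, of the outer product of a with itself. This is an assumption about one possible strategy, not a description of Mahout's implementation, and the function name `transposeSelf` is only borrowed from the slide's operator name.

```scala
// Accumulate A^T A one row at a time: each row contributes its outer product
// with itself, so the transpose of A is never materialized.
def transposeSelf(rows: Iterator[Array[Double]], numCols: Int): Array[Array[Double]] = {
  val acc = Array.ofDim[Double](numCols, numCols)
  // Skip zero entries of the row: they contribute nothing to the outer product.
  for (row <- rows; i <- 0 until numCols if row(i) != 0.0; j <- 0 until numCols)
    acc(i)(j) += row(i) * row(j)
  acc
}
```

Because the rows are consumed through an iterator, each partition of a row-partitioned matrix could compute its own partial sum, and the partial sums could then be added together. That only works well when the number of columns is small enough for the accumulator to fit in memory.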

Page 12: Possible Visions for Mahout 1.0


Which items co-occur anomalously?

// compute & broadcast number of interactions per item
val numInteractions = drmBroadcast(A.colSums)

// create indicator matrix
val I = C.mapBlock() {
  case (keys, block) =>
    // allocate sparse block of indicator matrix
    val indicatorBlock = sparse(block.nrow, block.ncol)
    // compute indicators with loglikelihood ratio test
    for (row <- block)
      indicatorBlock(row.index, ::) = computeLLR(row, numInteractions)
    keys -> indicatorBlock
}
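The slide leaves `computeLLR` undefined. The plain-Scala sketch below shows one common way to score a single item pair with the log-likelihood ratio (G²) test on a 2x2 contingency table. The function names, and the way the table is built from the co-occurrence count, the per-item interaction counts and the number of users, are assumptions made for illustration rather than the code behind the slide.

```scala
// x * ln(x), with the usual convention that 0 * ln(0) = 0.
def xLogX(x: Long): Double = if (x == 0L) 0.0 else x * math.log(x.toDouble)

// N times the Shannon entropy (in nats) of the counts, where N is their sum.
def entropy(counts: Long*): Double =
  xLogX(counts.sum) - counts.map(xLogX).sum

// Log-likelihood ratio (G^2) for the 2x2 table [[k11, k12], [k21, k22]].
def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
  val rowEntropy = entropy(k11 + k12, k21 + k22)
  val colEntropy = entropy(k11 + k21, k12 + k22)
  val matEntropy = entropy(k11, k12, k21, k22)
  // Clamp at zero to absorb tiny negative values from floating-point round-off.
  math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
}

// Hypothetical helper: build the table for items i and j from
//   cooccur  = users who interacted with both items (an entry of C),
//   countI/J = users who interacted with item i / item j (column sums of A),
//   numUsers = total number of users (rows of A).
def llrForPair(cooccur: Long, countI: Long, countJ: Long, numUsers: Long): Double =
  logLikelihoodRatio(
    cooccur,                               // both i and j
    countI - cooccur,                      // i but not j
    countJ - cooccur,                      // j but not i
    numUsers - countI - countJ + cooccur)  // neither
```

A score near zero means the pair co-occurs about as often as independence would predict. Large scores mark co-occurrence that deviates strongly from independence, which is what the indicator matrix is meant to capture.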

Page 13: Possible Visions for Mahout 1.0


Runtime

• prototype on Apache Spark
  – fast and expressive cluster computing system
  – general computation graphs, in-memory primitives, rich API, interactive shell

• future: add Stratosphere
  – project proposed to Apache Incubator recently
  – similar to Apache Spark, adds data flow optimization and efficient out-of-core execution

Page 14: Possible Visions for Mahout 1.0


Page 15: Possible Visions for Mahout 1.0


Page 16: Possible Visions for Mahout 1.0


How Does This Apply?

Page 17: Possible Visions for Mahout 1.0


How Can I Start?

Page 18: Possible Visions for Mahout 1.0


Q & A

@ted_dunning @mapr maprtech

[email protected]

Engage with us!

MapR

maprtech

mapr-technologies

Page 19: Possible Visions for Mahout 1.0
