DFW Big Data Talk on Mahout Recommenders
DESCRIPTION
This talk focused on how to build recommenders using new technology and capabilities from Mahout. The key point is that recommenders can be built much more easily than you might expect.
TRANSCRIPT
![Page 1: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/1.jpg)
1©MapR Technologies 2013- Confidential
Introduction to Mahout, and How to Build a Recommender
![Page 2: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/2.jpg)
Me, Us
Ted Dunning, Chief Application Architect, MapR
– Committer and PMC member: Mahout, ZooKeeper, Drill
– Bought the beer at the first HUG
MapR
– Distributes more open source components for Hadoop
– Adds major technology for performance, HA, industry-standard APIs
Tonight
– Hash tag: #dfwbd #mapr
– See also: @ApacheMahout @ApacheDrill @ted_dunning and @mapR
![Page 3: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/3.jpg)
Requested Topic For Tonight
What is Mahout? What makes it different? How can big data technology solve impossible problems? How is big data affecting the world?
![Page 4: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/4.jpg)
Also
What is MapR? What is MapR doing? How does MapR’s technology work? How are customers making use of MapR? How can anyone make use of MapR to solve problems?
![Page 5: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/5.jpg)
Oh … Also This
Detailed breakdown of a live machine learning system running with Mahout on MapR
With code examples
![Page 7: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/7.jpg)
I may have to summarize
just a bit
![Page 8: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/8.jpg)
Part 1: 5 minutes of math
![Page 9: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/9.jpg)
Part 2: 12 minutes: I want a pony
![Page 10: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/10.jpg)
Part 3: A working example
![Page 12: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/12.jpg)
What Does Machine Learning Look Like?
$O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small k, high quality
$O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger k, looser quality
But tonight we’re going to show you how to keep it simple yet powerful…
![Page 13: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/13.jpg)
Comparison of Three Main ML Topics
Recommendation:
– Involves observing interactions between people taking action (users) and items, as input data for the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include movie, music or map-based restaurant choices, and suggesting sale items for e-stores or via cash-register receipts
![Page 14: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/14.jpg)
![Page 15: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/15.jpg)
![Page 16: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/16.jpg)
Part 1: A bit of math
(the math of bits)
![Page 17: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/17.jpg)
Mahout Math
Goals are:
– basic linear algebra
– statistical sampling
– good clustering
– decent speed
– extensibility, especially for sparse data
But not:
– totally badass speed
– a comprehensive set of algorithms
– optimization, root finders, quadrature
![Page 18: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/18.jpg)
Matrices and Vectors
At the core:
– DenseVector, RandomAccessSparseVector
– DenseMatrix, SparseRowMatrix
Highly composable API
Important ideas:
– view*, assign and aggregate
– iteration
m.viewDiagonal().assign(v)
![Page 19: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/19.jpg)
19©MapR Technologies 2013- Confidential
Assign? View?
Why assign?
– Copying is the major cost for naïve matrix packages
– In-place operations are critical to reasonable performance
– Many kinds of updates are required, so a functional style is very helpful
Why view?
– In-place operations are often required for blocks, rows, columns or diagonals
– With views, we need #assign + #views methods
– Without views, we need #assign × #views methods
Synergies
– With both views and assign, many loops become a single line
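The #assign + #views argument can be seen in a few lines of plain Java. In this sketch (hypothetical names `Vec` and `DenseSquare`, not Mahout's classes), `assign` and `zSum` are written once on an interface, and each view only supplies `get` and `set` over the same backing array, so adding a view never means writing another update method. Writes through a view land directly in the matrix, with no copy:

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical minimal vector: implementors supply only size/get/set.
interface Vec {
    int size();

    double get(int i);

    void set(int i, double v);

    // One in-place update method shared by every view; returns this for chaining.
    default Vec assign(DoubleUnaryOperator f) {
        for (int i = 0; i < size(); i++) {
            set(i, f.applyAsDouble(get(i)));
        }
        return this;
    }

    // One aggregation method, likewise shared by every view.
    default double zSum() {
        double sum = 0;
        for (int i = 0; i < size(); i++) {
            sum += get(i);
        }
        return sum;
    }
}

// Hypothetical dense square matrix whose views write straight through to data.
final class DenseSquare {
    final double[][] data;

    DenseSquare(double[][] data) {
        this.data = data;
    }

    // Diagonal view: no copy, element i maps to data[i][i].
    Vec viewDiagonal() {
        return new Vec() {
            public int size() { return data.length; }
            public double get(int i) { return data[i][i]; }
            public void set(int i, double v) { data[i][i] = v; }
        };
    }

    // Row view: element i maps to data[r][i].
    Vec viewRow(int r) {
        return new Vec() {
            public int size() { return data[r].length; }
            public double get(int i) { return data[r][i]; }
            public void set(int i, double v) { data[r][i] = v; }
        };
    }
}
```

For example, `new DenseSquare(new double[][] {{1, 2}, {3, 4}}).viewDiagonal().assign(x -> 0.0)` zeroes the two diagonal cells of the backing array and leaves the off-diagonal cells alone.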
![Page 20: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/20.jpg)
Examples
double alpha; a.assign(alpha);
a.assign(b, Functions.chain(Functions.plus(beta), Functions.times(alpha)));
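The second line composes two scalar functions and applies the result elementwise. Below is a plain-Java analogue using `java.util.function.DoubleUnaryOperator`. It assumes `chain(g, h)` means g(h(x)), the convention of the Colt library that Mahout's math code descends from, so `chain(plus(beta), times(alpha))` maps each element x to alpha·x + beta. `ChainSketch` and its methods are hypothetical stand-ins, not Mahout's API:

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical plain-Java analogue of composing scalar functions for assign.
final class ChainSketch {
    // x -> x + beta
    static DoubleUnaryOperator plus(double beta) {
        return x -> x + beta;
    }

    // x -> x * alpha
    static DoubleUnaryOperator times(double alpha) {
        return x -> x * alpha;
    }

    // chain(g, h) builds x -> g(h(x)).
    static DoubleUnaryOperator chain(DoubleUnaryOperator g, DoubleUnaryOperator h) {
        return g.compose(h);
    }

    // In-place: a[i] = f(b[i]) for every i, with no temporary array.
    static void assign(double[] a, double[] b, DoubleUnaryOperator f) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = f.applyAsDouble(b[i]);
        }
    }

    // Fill every element with a constant, like the first line's a.assign(alpha).
    static void assign(double[] a, double value) {
        Arrays.fill(a, value);
    }
}
```

With b = {1, 2, 3}, alpha = 2 and beta = 1, `assign(a, b, chain(plus(1), times(2)))` fills a with {3, 5, 7}.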
![Page 24: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/24.jpg)
Examples
The trace of a matrix
Set diagonal to zero
Set diagonal to negative of row sums excluding the diagonal
m.viewDiagonal().zSum()
m.viewDiagonal().assign(0)
Vector diag = m.viewDiagonal().assign(0);
diag.assign(m.rowSums().assign(Functions.NEGATE));
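For comparison, here is what the three one-liners compute, written as explicit loops over a plain `double[][]` without Mahout (`DiagonalExamples` is a hypothetical helper). Zeroing the diagonal before summing is what makes the row sums exclude it:

```java
// Hypothetical plain-array versions of the three Mahout one-liners.
final class DiagonalExamples {
    // Trace: sum of the diagonal, like m.viewDiagonal().zSum().
    static double trace(double[][] m) {
        double sum = 0;
        for (int i = 0; i < m.length; i++) {
            sum += m[i][i];
        }
        return sum;
    }

    // Zero the diagonal in place, like m.viewDiagonal().assign(0).
    static void zeroDiagonal(double[][] m) {
        for (int i = 0; i < m.length; i++) {
            m[i][i] = 0;
        }
    }

    // Set each diagonal entry to minus the sum of the other entries in its row.
    static void negativeRowSumsOnDiagonal(double[][] m) {
        zeroDiagonal(m); // so the row sums below exclude the diagonal
        double[] rowSums = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (double v : m[i]) {
                rowSums[i] += v;
            }
        }
        for (int i = 0; i < m.length; i++) {
            m[i][i] = -rowSums[i];
        }
    }
}
```

After `negativeRowSumsOnDiagonal`, every row of the matrix sums to zero.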
![Page 25: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/25.jpg)
Clustering and Such
Streaming k-means and ball k-means
– streaming reduces very large data to a cluster sketch
– ball k-means is a high quality k-means implementation
– the cluster sketch is also usable for other applications
– single machine threaded and map-reduce versions available
SVD and friends
– stochastic SVD has in-memory, single machine out-of-core and map-reduce versions
– good for reducing very large sparse matrices to tall skinny dense ones
Spectral clustering
– based on SVD, allows massive dimensional clustering
![Page 26: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/26.jpg)
Mahout Math Summary
Matrices, Vectors
– views
– in-place assignment
– aggregations
– iterations
Functions
– lots built-in
– cooperate with sparse vector optimizations
Sampling
– abstract samplers
– samplers as functions
Other stuff … clustering, SVD
![Page 27: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/27.jpg)
Part 2: How recommenders work
(I still want a pony)
![Page 28: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/28.jpg)
Recommendations
Behavior of a crowd helps us understand what individuals will do
![Page 30: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/30.jpg)
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
![Page 31: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/31.jpg)
Recommendations
What else would Bob like?
![Page 32: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/32.jpg)
Recommendations
What if everybody gets a pony?
Now what does Bob want?
![Page 33: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/33.jpg)
Log Files
[Figure: a log in which each event records a user (Alice, Bob or Charles) and a thing]
![Page 34: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/34.jpg)
Log Files

| User | Thing |
|------|-------|
| u1   | t1    |
| u3   | t2    |
| u2   | t3    |
| u1   | t4    |
| u3   | t3    |
| u2   | t3    |
| u1   | t1    |
![Page 35: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/35.jpg)
Log Files and Dimensions
[Figure: the same log, with its user column (u1 Alice, u3 Bob, u2 Charles) and thing column (t1 to t4) becoming the two dimensions, Users and Things, of a matrix]
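Turning the log into a history matrix is mostly bookkeeping: each distinct user and thing gets an integer id (like the u1…u3 and t1…t4 labels), and each (user, thing) event sets one cell. A plain-Java sketch with hypothetical class names, assigning 0-based ids in order of first appearance and storing each user's row sparsely as a set of thing ids:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: turns (user, thing) log events into a sparse history matrix.
final class LogToHistory {

    // Assigns dense 0-based ids in order of first appearance.
    static final class Dictionary {
        final Map<String, Integer> ids = new HashMap<>();
        final List<String> names = new ArrayList<>();

        int idOf(String name) {
            Integer id = ids.get(name);
            if (id == null) {
                id = names.size();
                ids.put(name, id);
                names.add(name);
            }
            return id;
        }
    }

    final Dictionary users = new Dictionary();
    final Dictionary things = new Dictionary();
    // history.get(u) holds the ids of the things user u has interacted with.
    final List<TreeSet<Integer>> history = new ArrayList<>();

    void add(String user, String thing) {
        int u = users.idOf(user);
        int t = things.idOf(thing);
        while (history.size() <= u) {
            history.add(new TreeSet<>());
        }
        // A set, so repeated events collapse into a single check mark.
        history.get(u).add(t);
    }
}
```

Feeding in the gift story (Alice: apple, puppy, pony; Bob: apple, pony; Charles: bicycle, pony) yields three rows holding three, two and two things, the same shape as the history matrix.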
![Page 36: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/36.jpg)
History Matrix
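
|         | apple | puppy | bicycle | pony |
|---------|-------|-------|---------|------|
| Alice   | ✔     | ✔     |         | ✔    |
| Bob     | ✔     |       |         | ✔    |
| Charles |       |       | ✔       | ✔    |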
![Page 37: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/37.jpg)
Cooccurrence Matrix
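
|         | apple | puppy | bicycle | pony |
|---------|-------|-------|---------|------|
| apple   |       | 1     |         | 2    |
| puppy   | 1     |       |         | 1    |
| bicycle |       |       |         | 1    |
| pony    | 2     | 1     | 1       |      |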
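Counting cooccurrence amounts to computing $A^T A$ for the 0/1 history matrix A and reading off its off-diagonal cells. Below is a dense plain-Java sketch (a hypothetical helper; a production system would use sparse data and Mahout's jobs instead). For the example, things are assumed to be numbered apple = 0, puppy = 1, bicycle = 2, pony = 3:

```java
// Hypothetical helper: raw cooccurrence counts from a 0/1 history matrix.
final class Cooccurrence {
    // history[u][t] is true when user u has thing t.
    // Returns counts[i][j] = number of users having both i and j (i != j),
    // i.e. the off-diagonal part of A^T A when A is the 0/1 history matrix.
    static int[][] cooccurrence(boolean[][] history, int numThings) {
        int[][] counts = new int[numThings][numThings];
        for (boolean[] row : history) {
            for (int i = 0; i < numThings; i++) {
                if (!row[i]) {
                    continue;
                }
                for (int j = 0; j < numThings; j++) {
                    if (j != i && row[j]) {
                        counts[i][j]++;
                    }
                }
            }
        }
        return counts;
    }
}
```

For the gift story (Alice: apple, puppy, pony; Bob: apple, pony; Charles: bicycle, pony), `counts[0][3]` is 2 because Alice and Bob both have an apple and a pony, and every other nonzero cell is 1.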
![Page 39: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/39.jpg)
Indicator Matrix
✔
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
![Page 40: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/40.jpg)
Problems with Raw Cooccurrence
Very popular items co-occur with everything
– Welcome document
– Elevator music
That isn’t interesting
– We want anomalous cooccurrence
![Page 41: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/41.jpg)
Recommendation Basics
Cooccurrence

|        | t3 | not t3 |
|--------|----|--------|
| t1     | 2  | 1      |
| not t1 | 1  | 1      |
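The four cells need no extra pass over the data: given the number of users who did both items, each item's own count, and the total number of users, the rest follows by subtraction. Here is a hypothetical helper, where A is the row item (t1) and B the column item (t3):

```java
// Hypothetical helper: 2x2 contingency table from cooccurrence and item counts.
final class Contingency {
    // Returns {k11, k12, k21, k22} in row-major order:
    //   k11 = A and B, k12 = A but not B, k21 = B but not A, k22 = neither.
    static long[] table(long both, long countA, long countB, long total) {
        long k11 = both;
        long k12 = countA - both;
        long k21 = countB - both;
        long k22 = total - countA - countB + both;
        if (k12 < 0 || k21 < 0 || k22 < 0) {
            throw new IllegalArgumentException("inconsistent counts");
        }
        return new long[] {k11, k12, k21, k22};
    }
}
```

With both = 2, count(t1) = 3, count(t3) = 3 and 5 users in total (the row and column sums of the table above), it returns {2, 1, 1, 1}.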
![Page 42: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/42.jpg)
Spot the Anomaly
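Root LLR is roughly like standard deviations

Each case is a 2×2 table with columns A / not A and rows B / not B:

| Case | A and B | not A, B | A, not B | neither | Root LLR |
|------|--------:|---------:|---------:|--------:|---------:|
| 1    | 13      | 1,000    | 1,000    | 100,000 | 0.44     |
| 2    | 1       | 0        | 0        | 2       | 0.98     |
| 3    | 1       | 0        | 0        | 10,000  | 2.26     |
| 4    | 10      | 0        | 0        | 100,000 | 7.15     |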
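These scores can be computed from the four cells alone. Below is a standalone plain-Java sketch of the G² log-likelihood ratio in its entropy form, modeled on (not copied from) the formulation used by Mahout's `LogLikelihood` class, plus a signed square root. Cells follow the tables above: k11 = B and A, k12 = B and not A, k21 = not B and A, k22 = neither. Depending on normalization, the absolute numbers need not match the slide's exactly; the ordering of the four cases is the point:

```java
// Hypothetical standalone G^2 (log-likelihood ratio) for a 2x2 table.
final class Llr {
    // x * ln(x), with the usual convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a set of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long k : counts) {
            sum += k;
            parts += xLogX(k);
        }
        return xLogX(sum) - parts;
    }

    // G^2 for the table [[k11, k12], [k21, k22]].
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        double g = rowEntropy + columnEntropy - matrixEntropy;
        // Clamp tiny negative values caused by floating-point round-off.
        return g < 0 ? 0.0 : 2.0 * g;
    }

    // Signed root: negative when A is rarer among B rows than among not-B rows.
    static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        double pAGivenB = (double) k11 / (k11 + k12);
        double pAGivenNotB = (double) k21 / (k21 + k22);
        return pAGivenB < pAGivenNotB ? -root : root;
    }
}
```

On the four cases above, this ranks 13 / 1,000 / 1,000 / 100,000 lowest and 10 / 0 / 0 / 100,000 highest, the same order as the slide's scores.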
![Page 43: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/43.jpg)
A Quick Simplification
Users who do h (a vector of things a user has done): $A h$
– A translates things into users
Also do r: $r = A^T (A h)$
– User-centric recommendations: $r = A^T (A h)$ (the transpose translates back to things)
– Item-centric recommendations: $r = (A^T A) h$ (change the order of operations)
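Both orders give the same vector because matrix multiplication is associative, $A^T (A h) = (A^T A) h$; the practical difference is that $A^T A$ can be computed offline, once. Here is a small dense plain-Java check of that identity on the gift story, with hypothetical helper names and Bob's history (apple, pony) as h:

```java
// Hypothetical dense helpers: rows of A are users, columns are things.
final class RecommendOrder {
    // y = A x : things -> users
    static double[] times(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int u = 0; u < a.length; u++) {
            for (int t = 0; t < x.length; t++) {
                y[u] += a[u][t] * x[t];
            }
        }
        return y;
    }

    // y = A^T x : users -> things
    static double[] transposeTimes(double[][] a, double[] x) {
        int things = a[0].length;
        double[] y = new double[things];
        for (int u = 0; u < a.length; u++) {
            for (int t = 0; t < things; t++) {
                y[t] += a[u][t] * x[u];
            }
        }
        return y;
    }

    // A^T A : thing-by-thing cooccurrence, diagonal included.
    static double[][] gram(double[][] a) {
        int things = a[0].length;
        double[][] g = new double[things][things];
        for (double[] row : a) {
            for (int i = 0; i < things; i++) {
                for (int j = 0; j < things; j++) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    // User-centric: r = A^T (A h), touching all user data per request.
    static double[] userCentric(double[][] a, double[] h) {
        return transposeTimes(a, times(a, h));
    }

    // Item-centric: r = (A^T A) h, with A^T A precomputed.
    static double[] itemCentric(double[][] ata, double[] h) {
        return times(ata, h);
    }
}
```

With things ordered (apple, puppy, bicycle, pony), both calls return {4, 2, 1, 5}; after dropping the apple and pony Bob already has, the puppy outranks the bicycle.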
![Page 44: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/44.jpg)
Symmetry Gives Cross Recommendations
Conventional recommendations with off-line learning
Cross recommendations
![Page 45: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/45.jpg)
For example
Users enter queries (A)
– (actor = user, item = query)
Users view videos (B)
– (actor = user, item = video)
$A^T A$ gives query recommendations
– “did you mean to ask for”
$B^T B$ gives video recommendations
– “you might like these videos”
![Page 46: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/46.jpg)
The punch-line
$B^T A$ recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
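Here is a toy sketch of the punch-line, assuming both logs share the same users as rows: A is users × queries and B is users × videos, so $B^T A$ is videos × queries, and multiplying it by a query indicator vector h scores every video. The names and data are hypothetical. As with $A^T A$, a real system would keep only the anomalous (high-LLR) entries rather than raw counts:

```java
// Hypothetical toy: cross recommendation B^T A over logs that share users.
final class CrossRecommend {
    // a[u][q]: user u entered query q; b[u][v]: user u viewed video v.
    // Returns bta[v][q] = number of users who both viewed v and entered q.
    static int[][] crossCooccurrence(boolean[][] b, boolean[][] a) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("A and B must have the same users (rows)");
        }
        int videos = b[0].length;
        int queries = a[0].length;
        int[][] bta = new int[videos][queries];
        for (int u = 0; u < a.length; u++) {
            for (int v = 0; v < videos; v++) {
                if (!b[u][v]) {
                    continue;
                }
                for (int q = 0; q < queries; q++) {
                    if (a[u][q]) {
                        bta[v][q]++;
                    }
                }
            }
        }
        return bta;
    }

    // Scores every video for a query vector h (one entry per query): (B^T A) h.
    static int[] scoreVideos(int[][] bta, int[] h) {
        int[] scores = new int[bta.length];
        for (int v = 0; v < bta.length; v++) {
            for (int q = 0; q < h.length; q++) {
                scores[v] += bta[v][q] * h[q];
            }
        }
        return scores;
    }
}
```

For instance, if users 0 and 1 enter query 0, user 2 enters query 1, user 0 views video 0, user 1 views videos 0 and 1, and user 2 views video 2, then query 0 scores the three videos 2, 1 and 0.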
![Page 47: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/47.jpg)
Real-life example
Query: “Paco de Lucia”
Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
![Page 48: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/48.jpg)
Real-life example
![Page 49: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/49.jpg)
Hypothetical Example
Want a navigational ontology?
Just put labels on a web page with traffic
– This gives A = users × label clicks
Remember viewing history
– This gives B = users × items
Cross recommend
– $B^T A$ = label to item mapping
After several users click, results are whatever users think they should be
![Page 50: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/50.jpg)
Nice. But we can do better?
![Page 51: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/51.jpg)
[Figure: one matrix, users × things]
![Page 52: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/52.jpg)
[Figure: users × thing type 1 and users × thing type 2]
![Page 53: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/53.jpg)
![Page 54: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/54.jpg)
Part 3: What about that worked example?
![Page 55: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/55.jpg)
http://bit.ly/18vbbaT
![Page 56: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/56.jpg)
Analyze with Map-Reduce
[Figure: complete history, cooccurrence (Mahout), item meta-data, Solr indexing, index shards]
![Page 57: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/57.jpg)
Deploy with Conventional Search System
[Figure: user history, web tier, Solr search, item meta-data, index shards]
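At serving time the recommender is an ordinary search query: the user's recent history is matched against the indicators field of each item document, like the t4 (puppy) document earlier whose indicators field holds t1. `IndicatorSearch` below is a hypothetical in-memory stand-in for that Solr query. It scores by plain overlap count instead of Solr's relevance ranking, only to show the data flow:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for querying an "indicators" field.
final class IndicatorSearch {
    // item id -> indicator ids, computed offline (cooccurrence + LLR filtering).
    private final Map<String, Set<String>> indicatorField = new LinkedHashMap<>();

    void index(String itemId, Set<String> indicators) {
        indicatorField.put(itemId, new HashSet<>(indicators));
    }

    // The query is the user's history; an item's score is how many history
    // entries appear in its indicators field. Real engines weight matches;
    // plain overlap keeps the sketch short.
    List<String> recommend(Set<String> history, int limit) {
        Map<String, Integer> scores = new HashMap<>();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : indicatorField.entrySet()) {
            if (history.contains(doc.getKey())) {
                continue; // do not recommend what the user already has
            }
            int score = 0;
            for (String item : history) {
                if (doc.getValue().contains(item)) {
                    score++;
                }
            }
            if (score > 0) {
                scores.put(doc.getKey(), score);
                hits.add(doc.getKey());
            }
        }
        // Highest score first; List.sort is stable, so ties keep index order.
        hits.sort((x, y) -> Integer.compare(scores.get(y), scores.get(x)));
        return hits.size() > limit ? new ArrayList<>(hits.subList(0, limit)) : hits;
    }
}
```

Indexing t4 with indicators {t1} and querying with a history of {t1} returns [t4].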
![Page 58: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/58.jpg)
Objective Results
At a very large credit card company
History is all transactions
Development time to minimum viable product about 4 months
General release 2-3 months later
Search-based recs at least equal in quality to other techniques
![Page 59: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/59.jpg)
Summary
Input: Multiple kinds of behavior on one set of things
Output: Recommendations for one kind of behavior with a different set of things
Cross recommendation is a special case
![Page 61: DFW Big Data talk on Mahout Recommenders](https://reader035.vdocuments.site/reader035/viewer/2022062513/554f5bc8b4c905b9508b541c/html5/thumbnails/61.jpg)
Me, Us
Ted Dunning, Chief Application Architect, MapR
– Committer and PMC member: Mahout, ZooKeeper, Drill
– Bought the beer at the first HUG
– tdunning@{apache.org,maprtech.com}
MapR
– Distributes more open source components for Hadoop
– Adds major technology for performance, HA, industry-standard APIs
Tonight
– Hash tag: #dfwbd #mapr
– See also: @ApacheMahout @ApacheDrill @ted_dunning and @mapR