Large-Scale Recommender Systems on Just a PC

Large-Scale Recommender Systems on Just a PC
LSRS 2013 keynote (RecSys '13, Hong Kong)
Aapo Kyrölä, Ph.D. candidate @ CMU
http://www.cs.cmu.edu/~akyrola
Twitter: @kyrpov
Big Data – small machine


DESCRIPTION

My keynote at the Large-Scale Recommender Systems (LSRS) workshop at RecSys 2013.

TRANSCRIPT

1. Large-Scale Recommender Systems on Just a PC. LSRS 2013 keynote (RecSys '13, Hong Kong). Aapo Kyrölä, Ph.D. candidate @ CMU. http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov. Big Data – small machine.

2. My Background. Academic: 5th-year Ph.D. @ Carnegie Mellon. Advisors: Guy Blelloch, Carlos Guestrin (UW). 2009–2012: Shotgun, a parallel L1-regularized regression solver (ICML 2011); internships at MSR Asia (2011) and Twitter (2012). Startup entrepreneur: Habbo, founded in 2000.

3. Outline of this talk: 1. Why single-computer computing? 2. Introduction to graph computation and GraphChi. 3. Recommender systems with GraphChi. 4. Future directions & conclusion.

4. Large-Scale Recommender Systems on Just a PC. Why on a single machine? Can't we just use the cloud?

5. Why use a cluster? Two reasons: 1. One computer cannot handle my problem in a reasonable time. 2. I need to solve the problem very fast.

6. Why use a cluster? The two reasons, revisited: 1. One computer cannot handle my problem in a reasonable time → our work expands the space of feasible (graph) problems on one machine: our experiments use the same graphs, or bigger ones, as previous papers on distributed graph computation (and we can do the Twitter graph on a laptop); most data is not that big. 2. I need to solve the problem very fast → our work raises the bar on the performance required of a complicated distributed system.

7. Benefits of single-machine systems (assuming one can handle your big problems): 1. Programmer productivity: global state; can use real data for development. 2. Inexpensive to install and administer; less power. 3. Scalability.

8. Efficient Scaling. [Figure: with a distributed graph system, doubling from 6 to 12 machines gives (significantly) less than 2x throughput; replicating a single-computer system capable of big tasks gives exactly 2x throughput with 2x machines.]

9. GRAPH COMPUTATION AND GRAPHCHI

10. Why graphs for recommender systems? Graph = matrix: edge(u, v) = M[u, v]. Note: always sparse graphs. An intuitive, human-understandable representation: easy to visualize and explain. Unifies collaborative filtering (typically matrix-based) with recommendation in social networks. Random-walk algorithms. Local view → vertex-centric computation.

11. Vertex-Centric Computational Model. Graph G = (V, E) with directed edges e = (source, destination). Each edge and vertex is associated with a value (a user-defined type); vertex and edge values can be modified (structure modification is also supported).

12. Vertex-centric Programming. "Think like a vertex." Popularized by the Pregel and GraphLab projects. The programmer writes only an update function, MyFunc(vertex) { /* modify neighborhood */ } (a minimal sketch appears after slide 20 below).

13. What is GraphChi? ["Both in OSDI '12!" – the rest of the slide is a figure.]

14. The Main Challenge of Disk-based Graph Computation: Random Access. [Slides 15–17 are missing from this transcript; only a fragment survives: "… 20B edges (Gupta et al. 2013)".]

18. GraphChi is Open Source. C++ and Java versions on GitHub: http://github.com/graphchi. The Java version has a Hadoop/Pig wrapper, if you really, really want to use Hadoop.

19. RECSYS MODEL TRAINING WITH GRAPHCHI

20. Overview of Recommender Systems for GraphChi: the collaborative filtering toolkit (next slide); link prediction in large networks; random-walk-based approaches (Twitter), talk on Wednesday.
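Slides 11–12 describe the vertex-centric model only in words. A minimal C++ sketch of the shape of such an update function follows; the types and names here are hypothetical illustrations, not GraphChi's actual API.

```cpp
// Minimal sketch of the "think like a vertex" model from slides 11-12.
// Hypothetical types and names, NOT GraphChi's actual API; the point is
// only the shape of an update function that sees one vertex and its edges.
#include <cstddef>
#include <vector>

struct Edge {
    std::size_t neighbor;  // id of the vertex at the other end
    float value;           // user-defined edge data (e.g., a rating)
};

struct Vertex {
    float value;                 // user-defined vertex data
    std::vector<Edge> in_edges;  // streamed from disk by the engine
    std::vector<Edge> out_edges;
};

// The programmer supplies only this: read the neighborhood, then update
// the vertex (and possibly its edges). The engine calls it for every
// vertex, in parallel, loading edge values from disk as needed.
void my_update(Vertex& v) {
    if (v.in_edges.empty()) return;
    float sum = 0.0f;
    for (const Edge& e : v.in_edges) sum += e.value;
    v.value = sum / static_cast<float>(v.in_edges.size());  // e.g., neighbor average
}
```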
21. GraphChi's Collaborative Filtering Toolkit. Developed by Danny Bickson (CMU / GraphLab Inc.). Includes: Alternating Least Squares (ALS), Sparse-ALS, SVD++, LibFM (factorization machines), GenSGD, item-similarity-based methods, PMF, CliMF (contributed by Mark Levy), and more. See Danny's blog for more information: http://bickson.blogspot.com/2012/12/collaborativefiltering-with-graphchi.html. Note: in the C++ version; a Java version is in development by a CMU team.

22. TWO EXAMPLES: ALS AND ITEM-BASED CF

23. Example: Alternating Least Squares Matrix Factorization (ALS). Task: predict ratings for items (movies) by users. Model: a latent factor model (see next slide). Reference: Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan: "Large-Scale Parallel Collaborative Filtering for the Netflix Prize" (2008).

24. ALS: user–item bipartite graph. [Figure: users and movies (Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita) as vertices, ratings as edges; each vertex carries a latent factor vector, and a user's rating of a movie is modeled as a dot product.]

25. ALS: GraphChi implementation. The update function handles one vertex at a time (user or movie). For each user: estimate latent(user) by minimizing the least-squares error of the dot-product-predicted ratings. GraphChi executes the update function for each vertex (in parallel) and loads the edges (ratings) from disk. Latent factors are kept in memory, which needs O(V) memory; if the factors don't fit in memory, they can be replicated to the edges and thus stored on disk. Scales to very large problems!

26. ALS: Performance. Matrix factorization (alternating least squares) on Netflix (99M edges), D = 20. [Chart: runtime in minutes, 0–12, GraphChi (Mac Mini) vs. GraphLab v1 (8 cores).] Remark: Netflix is not a big problem, but GraphChi will scale at most linearly with input size (ALS is CPU-bound, so it should be sub-linear in the number of ratings).
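Slide 25's per-user update can be made concrete: with the item factors Q held fixed, the user vector minimizing sum_i (r_ui - p_u . q_i)^2 + LAMBDA |p_u|^2 solves the normal equations (sum_i q_i q_i^T + LAMBDA I) p_u = sum_i r_ui q_i. Below is a minimal, self-contained C++ sketch of one such update; it is an illustration, not the GraphChi toolkit's API, and D, LAMBDA, and all names are assumptions.

```cpp
// Minimal sketch of the per-user ALS update from slide 25: with item
// factors fixed, each user vector is the solution of a small D x D
// least-squares problem over that user's ratings. Self-contained
// illustration; NOT the GraphChi toolkit API. LAMBDA is an assumed value.
#include <cmath>
#include <utility>
#include <vector>

constexpr int D = 20;            // latent dimension (slide 26 uses D = 20)
constexpr double LAMBDA = 0.065; // regularization strength (assumption)

struct Rating { int item; double value; };

// Solve the dense D x D system A x = b by Gaussian elimination with
// partial pivoting; A and b are taken by value and overwritten.
std::vector<double> solve(std::vector<std::vector<double>> A,
                          std::vector<double> b) {
    for (int col = 0; col < D; ++col) {
        int piv = col;
        for (int r = col + 1; r < D; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
        std::swap(A[col], A[piv]);
        std::swap(b[col], b[piv]);
        for (int r = col + 1; r < D; ++r) {
            double f = A[r][col] / A[col][col];
            for (int c = col; c < D; ++c) A[r][c] -= f * A[col][c];
            b[r] -= f * b[col];
        }
    }
    std::vector<double> x(D);
    for (int r = D - 1; r >= 0; --r) {
        double s = b[r];
        for (int c = r + 1; c < D; ++c) s -= A[r][c] * x[c];
        x[r] = s / A[r][r];
    }
    return x;
}

// One update for a single user vertex: minimize
//   sum_i (r_ui - p_u . q_i)^2 + LAMBDA * |p_u|^2
// by solving (sum_i q_i q_i^T + LAMBDA * I) p_u = sum_i r_ui q_i.
// In GraphChi, the ratings arrive as the vertex's edges, loaded from disk.
std::vector<double> update_user(const std::vector<Rating>& ratings,
                                const std::vector<std::vector<double>>& Q) {
    std::vector<std::vector<double>> A(D, std::vector<double>(D, 0.0));
    std::vector<double> b(D, 0.0);
    for (int d = 0; d < D; ++d) A[d][d] = LAMBDA;
    for (const Rating& r : ratings) {
        const std::vector<double>& q = Q[r.item];  // fixed item factor
        for (int i = 0; i < D; ++i) {
            b[i] += r.value * q[i];
            for (int j = 0; j < D; ++j) A[i][j] += q[i] * q[j];
        }
    }
    return solve(std::move(A), std::move(b));
}
```

Alternating these solves over all users, then (symmetrically) over all items, is one ALS iteration; since the system is only D x D per vertex, the work per vertex is tiny and the cost is dominated by streaming the ratings, which is why the edges can stay on disk.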
27. Example: Item-Based CF. Task: compute a similarity score (e.g., Jaccard) for each movie pair that has at least one viewer in common. Output the top-K similar items for each item to a file, or create an edge between X and Y containing the similarity. Problem: enumerating all pairs takes too much time.

28. Solution: enumerate all triangles of the graph. New problem: how do we enumerate triangles if the graph does not fit in RAM? [Figure: the movie graph from slide 24.]

29. Enumerating Triangles (Item-CF). Triangles containing edge (u, v) = intersection(neighbors(u), neighbors(v)). An iterative, memory-efficient solution follows on the next slide.

30. Algorithm: let the "pivots" be a subset of the vertices, and load all neighbor lists (adjacency lists) of the pivots into RAM. Now use GraphChi to load all vertices from disk, one by one, and compare their adjacency lists to the pivots' adjacency lists (similar to a merge). Repeat with a new subset of pivots.

31. Triangle Counting Performance. twitter-2010 (1.5B edges): GraphChi (Mac Mini) vs. Hadoop (1,636 machines). [Chart: runtime in minutes, 0–450.]

32. FUTURE DIRECTIONS & FINAL REMARKS

33. Single-Machine Computing in Production? GraphChi supports incremental computation with dynamic graphs: it can keep running indefinitely, adding new edges to the graph, for a constantly fresh model. However, this requires engineering not included in the toolkit. Compare this to a cluster-based system (such as Hadoop) that needs to compute from scratch.

34. Unified Recsys Platform for GraphChi? Working with master's students at CMU. Goal: the ability to easily compare different algorithms and parameters. Unified input and output; a general, programmable API (not just file-based). Evaluation process: several evaluation metrics; cross-validation and held-out data; run many algorithm instances in parallel on the same graph. Java. Scalable from the get-go.

35. [Figure: input pipeline. A DataDescriptor (data definition: column1: categorical, column2: real, column3: key, column4: categorical) and the input data are mapped by an algorithm's input descriptor, map(input: DataDescriptor), through the GraphChi preprocessor (plus aux data) into GraphChi input.]

36. [Figure: training pipeline. From disk, the GraphChi input and aux data feed the Algorithm X, Y, and Z training programs; held-out (test) data feeds the Algorithm X predictor; the pipeline emits training metrics and test-quality metrics.]

37. Recent Developments: Disk-based Graph Computation. Two disk-based graph computation systems were published recently: TurboGraph (KDD '13) and X-Stream (SOSP '13, in October). They show significantly better performance than GraphChi on many problems and avoid preprocessing (sharding). But GraphChi can do some computations that X-Stream cannot (triangle counting and related), and TurboGraph requires an SSD. A hot research area!

38. Do You Need GraphChi, or Any System? Heck, for many algorithms you can just mmap() over your (binary) adjacency list / sparse matrix and write a for-loop (a sketch follows at the end of the transcript). See Lin, Chau, Kang: "Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC" (Big Data '13). It is obviously good to have a common API, and some algorithms need more advanced solutions (like GraphChi, X-Stream, TurboGraph). Beware of the hype!

39. Conclusion. Very large recommender algorithms can now be run on just your PC or laptop, with additional performance from multi-core parallelism. Great for productivity; scale by replicating. In general, good single-machine scalability requires care with data structures and memory management: this is natural in C/C++, while in Java (etc.) it needs low-level byte massaging. Frameworks like GraphChi hide the low-level details. More work is needed to productize the current work.

40. Thank you! Aapo Kyrölä, Ph.D. candidate @ CMU, soon to graduate! (Currently visiting UW.) http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov
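The sketch referenced from slide 38: a minimal C++ example of mmap()-ing a binary edge list and computing something with a plain for-loop. The file format (consecutive uint32_t source/destination pairs) and the file name are assumptions for illustration; this is not code from the talk or from the Lin, Chau, Kang paper.

```cpp
// Minimal sketch of slide 38's point: for many algorithms you can mmap()
// a binary edge list and write a for-loop, no framework needed.
// Assumed file format: consecutive uint32_t (src, dst) pairs.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const char* path = "edges.bin";  // assumed input file name
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    size_t n_edges = static_cast<size_t>(st.st_size) / (2 * sizeof(uint32_t));

    // Map the whole file read-only; the OS pages edges in (and evicts them)
    // as the loop streams over them, so the graph never has to fit in RAM.
    void* map = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }
    const uint32_t* edges = static_cast<const uint32_t*>(map);

    // Example "for-loop algorithm": the out-degree of every vertex.
    std::vector<uint32_t> degree;
    for (size_t e = 0; e < n_edges; ++e) {
        uint32_t src = edges[2 * e];  // edges[2 * e + 1] is the destination
        if (src >= degree.size()) degree.resize(src + 1, 0);
        ++degree[src];
    }
    std::printf("%zu edges, %zu vertices seen\n", n_edges, degree.size());

    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```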