Apache Mahout: Distributed Matrix Math for Machine Learning
About Me
• Senior Director of Data Science at Lucidworks (Apache Solr/Lucene, Fusion search tools)
• Formerly Chief Data Scientist, Technical Lead of Data Science Practice at Accenture
• Committer and PMC Member, Apache Mahout
• On Twitter @akm
• Email at [email protected], [email protected]
• Adversarial Learning podcast with @joelgrus at http://adversariallearning.com
Apache Mahout: Recent Trends in 0.12/0.13
• Simplify and improve performance of distributed matrix-math programming
• Provide flexible computation options for software and hardware
• Enable easier and quicker new algorithm development
• Allow polyglot programming and plotting in notebooks via Apache Zeppelin
Introduction to Apache Mahout
Apache Mahout is an environment for creating scalable, performant, machine-learning applications
Apache Mahout provides:
• Mathematically expressive Scala DSL
• A collection of pre-canned math and statistics algorithms
• Interchangeable distributed engines
• Interchangeable native solvers (JVM, CPU, GPU, CUDA, or custom)
Feature Highlights in Recent Releases
• v 0.13.1, Soon — CUDA Solvers, Apache Spark 2.1/Scala 2.11 support
• New web site platform, May 2017 — Moved from ASF CMS system to Markdown and Jekyll; allows documentation pull requests to be merged in and published automatically
• v 0.13.0, Apr 2017 — GPU/CPU Solvers, algorithm framework
• v 0.12.2, Nov 2016 — Apache Zeppelin integration for notebooks and visualization
• v 0.12.0, Apr 2016 — Apache Flink backend support
• New Mahout book, Feb 2016 — ‘Apache Mahout: Beyond MapReduce’ by Dmitriy Lyubimov and Andrew Palumbo
• v 0.10.0, Apr 2015 — Mahout-Samsara vector-math DSL, MapReduce jobs soft-deprecated, Spark backend support
Topic Overview
• Mahout-Samsara: Declarative, R-like, domain-specific language (DSL) for matrix math
• Backend-agnostic programming
• Apache Zeppelin notebooks
• Algorithm development framework (modeled after scikit-learn)
• Solve on available CPU cores, single or multiple GPUs, or in the JVM
• Next steps, and how to get involved
Mahout-Samsara
MapReduce is dead; long live the little clip-art blue man!
Mahout-Samsara
• Mahout-Samsara is an easy-to-use domain-specific language (DSL) for large-scale machine learning on distributed systems like Apache Spark and Flink
• Uses Scala as programming/scripting environment
• Algebraic expression optimizer for distributed linear algebra
• Provides a translation layer to distributed engines
• Supports Spark RDDs and Flink DataSets
• System-agnostic, R-like DSL; actual formula from (d)spca:
val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)
Mahout-Samsara
• Mahout-Samsara computes C = A’A via row-outer-product formulation:
• Executes in a single pass over row-partitioned A
Example of an algebraic optimization
• Logical optimization
• Optimizer rewrites plan to use logical operator for Transpose-Times-Self matrix multiplication
• Single pass: multiply partitioned rows by themselves as transposed columns
• Computation of A’A:
val C = A.t %*% A
• Naïve execution
• 1st pass: transpose A (requires repartitioning of A)
• 2nd pass: multiply result with A (expensive, potentially requires repartitioning again)
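The one-pass rewrite rests on the row-outer-product identity: for a matrix A with rows aᵢ, A'A = Σᵢ aᵢaᵢᵀ, so each partition can contribute the outer products of its own rows and the partial sums are simply added. A plain-Scala sketch of the idea (illustrative only, not Mahout's actual distributed implementation):

```scala
// Illustrative sketch of the Transpose-Times-Self rewrite on in-core arrays.
object TransposeTimesSelf {
  type Matrix = Array[Array[Double]]

  // Naive: materialize A', then multiply. In a distributed setting this
  // costs two passes and a repartition of A.
  def naive(a: Matrix): Matrix = {
    val n = a(0).length
    val at = Array.tabulate(n, a.length)((i, j) => a(j)(i))
    Array.tabulate(n, n)((i, j) => at(i).indices.map(k => at(i)(k) * a(k)(j)).sum)
  }

  // Rewritten: accumulate the outer product of each row in a single pass
  // over row-partitioned A; per-partition partials would be summed.
  def rowOuter(a: Matrix): Matrix = {
    val n = a(0).length
    val c = Array.fill(n, n)(0.0)
    for (row <- a; i <- 0 until n; j <- 0 until n) c(i)(j) += row(i) * row(j)
    c
  }
}
```

Both paths produce the same C; the optimizer's job is to pick the second plan automatically when it sees `A.t %*% A`.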
Backend-Agnostic Programming
Apache Zeppelin Notebooks
Apache Zeppelin Notebooks
• Notebooks for polyglot programming with all types of data
• Plotting with R and Python off of computed data from other tools in the same notebook
• Share variables between interpreters
• For more: https://zeppelin.apache.org
• Mahout interpreter for Zeppelin released June 2016
• Post by Trevor Grant on how to use it at https://rawkintrevo.org/2016/05/19/visualizing-apache-mahout-in-r-via-apache-zeppelin-incubating
• https://mahout.apache.org/docs/0.13.1-SNAPSHOT/tutorials/misc/mahout-in-zeppelin/
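Once the interpreter is bound, a notebook paragraph mixes the Samsara DSL with Zeppelin directives. A hypothetical paragraph sketch (the `%sparkMahout` interpreter name and exact setup may differ by installation; `dense` and `drmParallelize` are standard Samsara calls):

```
%sparkMahout
// Build a small in-core matrix and distribute it as a DRM.
val inCoreA = dense((1, 2), (3, 4), (5, 6))
val drmA = drmParallelize(inCoreA)
// Transpose-times-self; the optimizer rewrites this to a single pass.
val drmAtA = drmA.t %*% drmA
val result = drmAtA.collect
```

The collected result can then be handed to an R or Python paragraph in the same notebook for plotting.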
Apache Zeppelin Notebooks
Add the Mahout Interpreter
Apache Zeppelin Notebooks
Add the Mahout Interpreter, click “Create”
Apache Zeppelin Notebooks
Example usage
Apache Zeppelin Notebooks
Hand results to R for plotting
Algorithm Development Framework
• Patterned after R and Python (scikit-learn) APIs
• A fitter populates a Model, which contains the parameter estimates, fit statistics, and a summary, and has a predict() method
• https://rawkintrevo.org/2017/05/02/introducing-pre-canned-algorithms-apache-mahout
• https://mahout.apache.org/docs/0.13.1-SNAPSHOT/tutorials/misc/contributing-algos
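The fitter/model split can be sketched in plain Scala. This is a hedged toy stand-in, not the actual Mahout API: a `fit()` call returns a model object holding the parameter estimates, a summary, and a `predict()` method, here for simple linear regression in place of Mahout's distributed fitters.

```scala
// Toy fitter/model pattern: fit() estimates parameters, the returned
// Model carries them plus a summary and a predict() method.
case class RegressionModel(beta0: Double, beta1: Double) {
  def predict(x: Seq[Double]): Seq[Double] = x.map(beta0 + beta1 * _)
  def summary: String = f"y = $beta0%.3f + $beta1%.3f * x"
}

object SimpleLinearRegression {
  // Ordinary least squares for a single feature.
  def fit(x: Seq[Double], y: Seq[Double]): RegressionModel = {
    val (mx, my) = (x.sum / x.size, y.sum / y.size)
    val beta1 = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum /
                x.map(a => (a - mx) * (a - mx)).sum
    RegressionModel(my - beta1 * mx, beta1)
  }
}
```

The scikit-learn parallel: `SimpleLinearRegression.fit(...)` plays the role of `estimator.fit(X, y)`, and the returned model's `predict()` mirrors `estimator.predict(X)`.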
Solve on CPU, GPU, or JVM
Current architecture with native CPU and GPU support and unreleased jCUDA bindings
Solve on CPU, GPU, or JVM
Initial benchmarking on latest release
• Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000, density = 0.2, with 5 runs
  Mahout JVM sparse multiplication time: 1501 ms
  Mahout jCUDA sparse multiplication time: 49 ms
  ~30x speedup
• Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000, density = 0.02, with 5 runs
  Mahout JVM sparse multiplication time: 34 ms
  Mahout jCUDA sparse multiplication time: 4 ms
  ~8.5x speedup
• Sparse MMul at geometry of 1000 x 1000 %*% 1000 x 1000, density = 0.002, with 5 runs
  Mahout JVM sparse multiplication time: 1 ms
  Mahout jCUDA sparse multiplication time: 1 ms
  No speedup
Solve on CPU, GPU, or JVM
• jCUDA work is still in a branch; it will be merged to master in the next couple of months
• Currently the modes of compute are JVM, CPU (using all available cores), and single GPU
• Multi-GPU is next priority
• Currently multiplication takes place in different solvers based on matrix shape (banding, triangularity, etc.)
• Directing location for data and compute based on shape and density is another priority
• Watch this space for other speedups
Next steps
How to Use Mahout and Get Involved
Web: https://mahout.apache.org
Source code, PRs welcome: https://github.com/apache/mahout
Mailing lists: https://mahout.apache.org/community/mailing-lists.html
Download, install, embed: https://mahout.apache.org/downloads.html
Thank You
Q&A
https://mahout.apache.org
https://github.com/apache/mahout