apache mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · gsoc @...
TRANSCRIPT
![Page 1: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/1.jpg)
DRAFT
Apache Mahout
Bringing Machine Learning to Industrial Strength
Presented by: Isabel Drost(neofonie GmbH)
![Page 2: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/2.jpg)
DRAFTProblem Setting● Huge amounts of data at our fingertips.
● Need means to deal with all the data.
Mail archives
Search engine logs
News articlesInformation on proteins
Source control logs
Social network graphsTraffic data
Corporate job postings
Wiki collaboration data
Pictures tagged with topics
Tagged and rated videos
Web pages
![Page 3: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/3.jpg)
DRAFTProblem Setting● Nature generates data. ● Archimedes generates
model.
Density of ObjectDensity of Fluid
=.
WeightWeight−Apparent immersed weight
![Page 4: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/4.jpg)
DRAFTProblem Setting● Nature generates data. ● ML generates models.
![Page 5: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/5.jpg)
DRAFTWhere is ML used already?● Search result clustering.
● “Did you mean” feature.
● Auto completion.
● Language detection.
● Analysis of tags.
![Page 6: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/6.jpg)
DRAFTOnce upon a time● How it all began:
– Summer 2007: Crazy developers needed scalable ML.
– Mailing list and wiki followed quickly.
● Contacted people – from research.
– from related Apache projects.
● Rather large community even before project start.
● 25.01.2008: Project Mahout launched.
![Page 7: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/7.jpg)
DRAFTWho we are
Grant IngersollLucene PMC
Ted DunningThe Veoh guy
Isabel Drost(that would be myself)Otis Gospodetnic – Lucene
Erik Hatcher – Lucene (among others)
Dawid WeissCarrot2
Karl WettinLucene
Jeff EastmanWelcome!
![Page 8: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/8.jpg)
DRAFTOur Mission● Build learning algorithms that are scalable.
● Context:
Hadoop – parallelization
Hama – matrix support Lucene – provides the use cases
![Page 9: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/9.jpg)
DRAFTInitial contributions● kMeans implementation.
– Started with nonparallel version.
– Ported to Hadoop already.
● Matrix computation package.– Building block of many machine learning algorithms.
– Together with Hama towards parallel matrices.
![Page 10: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/10.jpg)
DRAFTInitial Contributions● Work on Naïve Bayes, Perceptron, PLSI/EM.
● Integrate Taste– Collaborative filtering project at sourceforge.
– Item based recommendation.
– User based recommendation.
![Page 11: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/11.jpg)
DRAFTGSoC @ Mahout● “Implementing Logistic Regression in Mahout”
● “Codename Mahout.GA for mahoutmachinelearning”
● “The Implementation of Support Vector Machine Algorithm at Hadoop Platform“
● “Application to participate in Mahout”
● “Mahout application (Neural Networks)”
● “DeCoDe A smart code search engine based on lucene to show how Mahout work.”
● “Applying for mahout machine learning (Neural Networks)”
● “Implementation of the Principal Compenents Analysis algorithm for Mahout“
● “MAHOUT Naive Bayes implementation“
![Page 12: Apache Mahoutarchive.apachecon.com/eu2008/program/materials/mahout... · 2011-09-20 · GSoC @ Mahout DRAFT “Implementing Logistic Regression in Mahout” “Codename Mahout.GA](https://reader033.vdocuments.site/reader033/viewer/2022042203/5ea44987c370600c391878c8/html5/thumbnails/12.jpg)
DRAFTConclusions● This is just the beginning: Even the logo is a draft :)
● High demand for scalable machine learning.
● We need you – in case you have:– A good deal of enthusiasm.
– Solid mathematical knowledge to understand ML papers.
– Either proficient in or willing to learn about Hadoop.
– Or: A lot of data and want to know what to learn from it.
● mahout[email protected] mahout[email protected]