big data 2013-05-23
Big Data from an AI perspective.
Abstract
• Short wrap up of Big Data history
• But, what’s new? Why are we here?
• What can we do now (from our couch)?
Thursday, 23 May 2013
Who am I... to talk about this?
• Ardent interest, B.Sc. in IT
  Maths (I particularly recommend discrete maths for Big Data!)
  Computational Linguistics
  AI stuff
  Thesis on data mining Unix system logs for surveillance
• M.Sc. degree in Artificial Intelligence (AI)
  Thesis on artificial life: «Incrementally Evolving a Dynamic Neural Network for Tactile-olfactory Insect Navigation»
  Nature is packed with Big Data
• Intern at CERN
  Developing the search engine
  Indexing (and making sense) of > 6 million documents
Mini-history
• Before ~2000: Save just the stuff that could prove useful. Query/filter/select data to present it.
• After ~2000: Just store everything; it’s cheap and we can look into it later. OLAP automates «looking».
• Gartner 2012: «Big data are high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.» (phew!)
Neo: Do you always look at it encoded?
Cypher: Well, you have to (...) there's way too
much information to decode the Matrix. You get
used to it. I — I don't even see the code. All I see is
blonde, brunette, redhead... Hey, you want a drink?
Too much data
What’s new, then?
• Data capacity has doubled every 3-4 years since the 1980s!
• We used to have a small amount of interesting data
• Now we have tons of boring stuff!!
• We must handle it so that we «don’t even see the code»
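To get a feel for what "doubling every 3-4 years" adds up to, here is a small back-of-the-envelope sketch. It assumes steady exponential growth starting in 1980 and running to the year of this talk (2013); the function name `growth_factor` and those start and end years are illustrative assumptions, not figures from the slides.

```python
def growth_factor(years, doubling_time):
    """Factor by which a quantity grows if it doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)

# Assumption: count from 1980 to the year of the talk.
years = 2013 - 1980

fast = growth_factor(years, 3)  # doubling every 3 years
slow = growth_factor(years, 4)  # doubling every 4 years

print(f"Doubling every 3 years: about {fast:,.0f} times more capacity")
print(f"Doubling every 4 years: about {slow:,.0f} times more capacity")
```

Under these assumptions the capacity grows by a factor somewhere between a few hundred and a couple of thousand, which is why "just store everything" became practical.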
What’s new, then?
• We used algorithms such as Apriori and ID3 for log analysis. Fine for 40 MB of data per day.
• In artificial life, there could easily be this amount of data ... per minute.
• Google processed ~24 PB of data per day in 2009.
• Your 1.4 kg brain can interpret this slide instantly.
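For readers who have not met Apriori, the following is a minimal sketch of the idea applied to log data: find sets of events that frequently occur together. The event names and the grouping of log lines into time windows are made up for illustration, and this is a plain textbook version, not the code used in the thesis mentioned earlier.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical log events grouped per time window.
log_windows = [
    {"login_failed", "sudo", "ssh"},
    {"login_failed", "ssh"},
    {"cron", "sudo"},
    {"login_failed", "ssh", "cron"},
]

frequent = apriori(log_windows, min_support=3)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

Every pass rescans all transactions, which is fine for tens of megabytes per day but becomes the bottleneck once the data arrives at that rate per minute.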
This is new
• Your brain cells each solve one little problem, tell 10 other cells about the result, and those tell 10 others ... you get it (fast!)
• Google distributes their computing ... somewhat like your brain.
• They called it MapReduce.
[Diagram: Map and Reduce phases spread across nodes, Node 1 ... Node n]
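The MapReduce idea can be sketched in a few lines on a single machine. The example below counts words: each "node" maps its own chunk of text to (word, 1) pairs, the pairs are grouped by word, and a reduce step sums each group. The function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are illustrative only; this is not Google's or Hadoop's API, and a real cluster would run the map and reduce calls in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one node's chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def map_reduce(chunks):
    # Each chunk stands in for the data held by one node.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

chunks = ["big data is big", "data is everywhere"]
counts = map_reduce(chunks)
print(counts)
```

Because each map call only sees its own chunk and each reduce call only sees one key, neither step needs to know about the rest of the data, which is what makes the work easy to spread over many nodes.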
You have it at home
• Free MapReduce-alikes (Hadoop) are cheap to run in the cloud.
• MySQL is probably not a good choice for Big Data analysis.
• There are free NoSQL databases (Cassandra, Berkeley DB, MongoDB, and more) available.
• Lots of data is freely available to play with. Analyze in the cloud.
• «The Matrix is everywhere. It is all around us. Even now, in this very room.»
That’s it
• Data is growing.
• More information, but harder to find among all the garbage.
• Free software exists. You can make sense of your data too.
• Unleash hidden knowledge and work smarter!