big data 2013-05-23

Big Data. Øyvin Halfdan Thuv, CTO, Whitefox AS. e: [email protected] t: @oyvinht. Thursday 23 May 2013

Upload: oyvin-halfdan-thuv

Post on 18-Dec-2014


Category:

Technology



DESCRIPTION

Big Data from an AI perspective.

TRANSCRIPT

Page 1: Big data 2013-05-23

Big Data
Øyvin Halfdan Thuv, CTO, Whitefox AS

e: [email protected]
t: @oyvinht

Thursday 23 May 2013

Page 2: Big data 2013-05-23

Abstract

• A short recap of Big Data history

• But what’s new? Why are we here?

• What can we do now (from our couch)?


Page 3: Big data 2013-05-23

Who am I... to talk about this?

• Ardent interest, B.Sc. in IT
  - Maths (I particularly recommend discrete maths for Big Data!)
  - Computational Linguistics
  - AI stuff
  - Thesis on data mining Unix system logs for surveillance

• M.Sc. degree in Artificial Intelligence (AI)
  - Thesis on artificial life: «Incrementally Evolving a Dynamic Neural Network for Tactile-olfactory Insect Navigation»
  - Nature is packed with Big Data

• Intern at CERN
  - Developing the search engine
  - Indexing (and making sense) of > 6 million documents


Page 4: Big data 2013-05-23

Mini-history

• Before ~2000: Save just the stuff that could prove useful. Query/filter/select data to present it.

• After ~2000: Just store everything; it’s cheap and we can look into it later. OLAP automates «looking».

• Gartner 2012: «Big data are high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.» (phew!)

Neo: Do you always look at it encoded?

Cypher: Well, you have to (...) there's way too much information to decode the Matrix. You get used to it. I... I don't even see the code. All I see is blonde, brunette, redhead... Hey, you want a drink?

Too much data


Page 5: Big data 2013-05-23

What’s new, then?

• Data capacity has doubled every 3-4 years since the 1980s!

• We used to have a small amount of interesting data

• Now we have tons of boring stuff!

• We must handle it so that we «don’t even see the code»
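As a rough back-of-the-envelope sketch of what the doubling rate on this slide implies: the 1980 to 2013 span and the helper name `growth_factor` are assumptions chosen for illustration, not figures from the talk.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return how many times capacity multiplies after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    span = 2013 - 1980  # assumed span: from 1980 to the year of the talk
    for period in (3, 4):
        factor = growth_factor(span, period)
        print(f"doubling every {period} years -> about x{factor:,.0f}")
```

Under these assumptions, capacity grows by a factor somewhere between roughly 300 and 2,000 over the period, which is why "store everything" became practical.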


Page 6: Big data 2013-05-23

What’s new, then?

• We used algorithms such as Apriori and ID3 for log analysis. Fine for 40 MB of data per day.

• In artificial life, there could easily be this amount of data... per minute.

• Google processed ~24 PB of data per day in 2009.

• Your 1.4 kg brain can interpret this slide instantly.
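To give a feel for the kind of algorithm mentioned in the first bullet, here is a minimal sketch of Apriori-style frequent itemset mining. It is not the speaker's implementation; the event names and the `apriori` function signature are hypothetical, and each "transaction" is assumed to be the set of events seen in one log session.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its k-item subsets were frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    # Hypothetical log sessions, each reduced to a set of event types.
    sessions = [
        {"login_failed", "sudo", "su_root"},
        {"login_failed", "sudo"},
        {"login_ok", "sudo"},
        {"login_failed", "sudo", "su_root"},
    ]
    for itemset, count in apriori(sessions, min_support=3).items():
        print(sorted(itemset), count)
```

Every level rescans all transactions, which is acceptable for tens of megabytes per day but becomes the bottleneck as volumes grow.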


Page 7: Big data 2013-05-23

This is new

• Your brain cells solve one little problem each, they tell 10 other cells about the result, and then those tell 10 others... you get it (fast!)

• Google distributes their computing... somewhat like your brain.

• They called it MapReduce.

[Diagram: Node 1 ... Node n, with Map and Reduce phases]
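The idea on this slide can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps using word counting as the task; it is not Google's implementation, and the function names are made up for the example. On a real cluster, each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    """Run map over every chunk, then reduce each group of values."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Each string stands in for the data held by one node.
    chunks = ["big data is big", "data is everywhere"]
    print(map_reduce(chunks))
```

Because each map call only sees its own chunk and each reduce call only sees one key, both phases can be spread across many machines without coordination beyond the shuffle.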


Page 8: Big data 2013-05-23

You have it at home

• Free MapReduce-alikes (e.g. Hadoop) are cheap to run in the cloud.

• MySQL is probably not a good choice for Big Data analysis.

• There are free NoSQL databases (Cassandra, Berkeley DB, MongoDB, and more) available.

• Lots of data is freely available to play with. Analyze it in the cloud.

• «The Matrix is everywhere. It is all around us. Even now, in this very room.»


Page 9: Big data 2013-05-23

That’s it

• Data is growing.

• There is more information, but it is harder to find among all the garbage.

• Free software exists. You can make sense of your data too.

• Unleash hidden knowledge and work smarter!
