
Distributed Data Management, Summer Semester 2015

TU Kaiserslautern

Prof. Dr.-Ing. Sebastian Michel

Databases and Information Systems Group (AG DBIS)

http://dbis.informatik.uni-kl.de/

The Big Data Era: Some Numbers

Estimated Size of Data

• Google: 15 000 PB (= 15 Exabytes)

• Facebook: 300 PB

• Ebay: 90 PB

• Spotify: 10 PB

Data Processed per Day

• Google: 100 PB

• Ebay: 100 PB

• NSA: 29 PB

• Facebook: 600 TB

• Twitter: 100 TB

• Spotify: 2.2 TB

Units

• MB = 10^6 Bytes

• GB = 10^9 Bytes

• TB (Terabyte) = 10^12 Bytes

• PB (Petabyte) = 10^15 Bytes

• EB (Exabyte) = 10^18 Bytes

Distributed Data Management, SoSe 2015, S. Michel 2

What Does Data Look Like?

• Not necessarily what you are used to from database lectures: usually not nicely structured (BCNF or 3NF) relations with known schema information.

• But:

– Twitter Tweets

– Server Access Logs

– Web Pages

– Web Graph

– Huge CSV files in general (e.g., holding a “relation”)


{"created_at":"Wed Jan 21 15:21:04 +0000 2015","id":557920823764586496,"id_str":"557920823764586496","text":"#TulsaAirport #Oklahoma Jan 21 08:53 Temperature 37\u00b0F clouds Wind NW 7 km\/h Humidity 85% .. http:\/\/t.co\/SnC8ST3gQC","source":"\u003ca href=\"http:\/\/www.woweather.com\/USA\/TulsaIAP.htm\" rel=\"nofollow\"\u003eupdate weather tulsa\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":255167921,"id_str":"255167921","name":"Weather Tulsa","screen_name":"wo_tulsa","location":"Tulsa","url":"http:\/\/itunes.apple.com\/app\/weatheronline\/id299504833?mt=8","description":"Weather Tulsa\n\nhttp:\/\/www.woweather.com\/USA\/Tulsa.htm","protected":false,"verified":false,"followers_count":111,"friends_count":60,"listed_count":5,"favourites_count":0,"statuses_count":33805,"created_at":"Sun Feb 20 20:31:42 +0000 2011","utc_offset":7200,"time_zone":"Athens","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1249942071\/WO-20px-linien_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1249942071\/WO-20px-linien_normal.png","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"TulsaAirport","indices":[0,13]},{"text":"Oklahoma","indices":[14,23]}],"trends":[],"urls":[{"url":"http:\/\/t.co\/SnC8ST3gQC","expanded_url":"http:\/\/bit.ly\/188eNcw","display_url":"bit.ly\/188eNcw","indices":[93,115]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1421853664710"}

{"created_at":"Wed Jan 21 15:21:04 +0000 2015","id":557920823877464064,"id_str":"557920823877464064","text":"Anime episode updated: Kyoukai no Kanata: Mini Theater # 6 ( http:\/\/t.co\/kjEPWveEHM ) #MalUpdater","source":"\u003ca href=\"http:\/\/www.malupdater.com\" rel=\"nofollow\"\u003eMal Updater\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1049083842,"id_str":"1049083842","name":"OriginGenesis",


Big Data


The BIG Data Challenge: The 4 Vs

• Volume

– Lots of data

• Velocity

– Changing / growing data

• Variety

– Heterogeneity

• Veracity

– True or not?


Addressed in this lecture

According to Gartner and others.

Showcase: Critical Volume

• Assume you have 10 TB of data on disk

• Now, run some analysis on it

• With a 100 MB/s disk, reading alone takes

– 100 000 seconds

– about 1 667 minutes

– about 28 hours


Need to do something about it
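The arithmetic behind this showcase can be reproduced in a few lines. The following is only a back-of-the-envelope sketch: the `machines` parameter and the assumption that the data is spread evenly over disks that are all read in parallel are idealizations added here for illustration.

```python
# Back-of-the-envelope read times, using decimal units as on the slides.
MB = 10**6
TB = 10**12

def read_seconds(data_bytes: int, bytes_per_second: int, machines: int = 1) -> float:
    """Time to scan the data once, assuming it is spread evenly over
    `machines` disks that are all read in parallel (an idealization)."""
    return data_bytes / (bytes_per_second * machines)

single = read_seconds(10 * TB, 100 * MB)
print(f"1 machine: {single:.0f} s = {single / 60:.0f} min = {single / 3600:.1f} h")

for n in (10, 100, 1000):
    print(f"{n} machines: {read_seconds(10 * TB, 100 * MB, n):.0f} s")
```

Under these assumptions the scan time shrinks linearly with the number of machines, which is the motivation for scale-out.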

Image sources: http://flickr.com/photos/jurvetson/157722937/ and http://www.google.com/about/datacenter

Scale-out

• Many machines (hundreds, thousands)

• As opposed to scale-up, where one very powerful (single) server is used


Data Centers


source: http://www.google.com/about/datacenters/inside/index.html

Hardware Failures

• Lots of machines (commodity hardware)

=> failure is not an exception but very common

• P[machine fails today] = 1/365

• n machines: P[failure of at least 1 machine] = 1 - (1 - P[machine fails today])^n

– for n=1: 0.0027

– for n=10: 0.02706

– for n=100: 0.240

– for n=1000: 0.936

– for n=10 000: ~ 1.0
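The formula can be evaluated directly. This is a minimal sketch that assumes machine failures are independent of each other, which the formula implicitly requires; the function name is chosen here for illustration.

```python
def p_at_least_one_failure(n: int, p_single: float = 1 / 365) -> float:
    """P[at least one of n machines fails today], assuming independent failures."""
    return 1 - (1 - p_single) ** n

for n in (1, 10, 100, 1000, 10_000):
    print(f"n={n:>6}: {p_at_least_one_failure(n):.4f}")
```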


source: google.com

Fallacies of Distributed Computing

1. The network is reliable

2. Latency is zero

3. Bandwidth is infinite

4. The network is secure

5. Topology doesn't change

6. There is one administrator

7. Transport cost is zero

8. The network is homogeneous


source: Peter Deutsch and others at Sun

Failure Handling & Recovery

• Hardware failures happen virtually at any time

• Algorithms/infrastructures have to compensate for that

• Replication of data, logging of state, also redundancy in task execution


Cost Numbers (=> Complex Cost Model)

• L1 cache reference 0.5 ns

• L2 cache reference 7 ns

• Main memory reference 100 ns

• Compress 1K bytes with Zippy 10,000 ns

• Send 2K bytes over 1 Gbps network 20,000 ns

• Read 1 MB sequentially from memory 250,000 ns

• Round trip within same datacenter 500,000 ns

• Disk seek 10,000,000 ns

• Read 1 MB sequentially from network 10,000,000 ns

• Read 1 MB sequentially from disk 30,000,000 ns

• Send packet CA->Netherlands->CA 150,000,000 ns


Numbers source: Jeff Dean

1 ns = 10^-6 ms
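To get a feeling for these numbers, one can scale the per-MB costs up to larger reads. The sketch below simply multiplies the listed values; linear scaling is an assumption made here, and it ignores seeks, caching, and contention.

```python
# Selected cost numbers from the list above, in nanoseconds.
COST_NS = {
    "main_memory_reference": 100,
    "read_1mb_memory": 250_000,
    "disk_seek": 10_000_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
}

def sequential_read_seconds(megabytes: int, ns_per_mb: int) -> float:
    """Estimated time for a sequential read, assuming cost grows linearly with size."""
    return megabytes * ns_per_mb / 1e9

for medium in ("memory", "network", "disk"):
    seconds = sequential_read_seconds(1000, COST_NS[f"read_1mb_{medium}"])
    print(f"1 GB sequentially from {medium}: {seconds} s")

# One disk seek costs as much as this many main memory references:
seek_vs_memory = COST_NS["disk_seek"] // COST_NS["main_memory_reference"]
print(seek_vs_memory)
```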

What you will learn in this Lecture

• Most of the lecture is on processing big data

– MapReduce, NoSQL, Cloud computing

• Will operate on state of the art research results and tools

• Middle way between pure systems/tools discussion and learning how to build algorithms on top of them (see Joins over MR, n-grams, etc.)

• But also basic, fundamental techniques, like consistent hashing, PageRank, Bloom filters


Lecture Contents (Tentative)

• MapReduce

– Fundamentals

– Various algorithms on top of it

• NoSQL approaches

– E.g., Key/Value Stores

– And techniques/theory behind them (e.g., CAP theorem, BASE)

• (Distributed) Data Stream Processing

• Cloud Computing and Big Data in general


Prerequisites

• Successfully attended the information systems lecture or a similar database lecture.

• And knowledge of standard math/CS topics, e.g., probability theory and Java/C++ coding.

• Working with systems/tools requires the will to dive into APIs and installation procedures


People

• Lecturer:

– Prof. Sebastian Michel

– smichel (at) cs.uni-kl.de

• Teaching Assistants:

– MSc. Evica Milchevski and MSc. Kiril Panev

– milchevski (at) cs.uni-kl.de and panev (at) cs.uni-kl.de




Organization & Regulations

Lecture:

Thursday

15:30 – 17:00

Room 42-110 (with at least one exception)

Exercise:

Tuesday (bi-weekly)

15:30 - 17:00

Room 46-210 (again, with one exception)

First session: April 28


Lecture Organization

• Pretty new Lecture

• On topics that are often brand new.

• Later topics are still tentative.

• Please provide feedback. E.g., too slow / too fast? Important topics you want to have covered?


Exercises

• Assignment sheet, every two weeks

• Mixture of:

– Practical: Implementation (e.g., Map Reduce)

– Practical: Algorithms on “paper”

– Theory: Where appropriate (show that …)

– Brief Essay: Explain the difference between x and y (short summary)

• Need to successfully participate to be admitted to final exam

• Regulations on next slides


Regulations for Admission to Exam

• Successful participation in exercise sessions

• There will be 6 exercise sheets

• Each comprises 3 mandatory assignments

• No handing in of solutions, instead:

– Tutor asks at beginning of TA session to mark on a sheet the assignments you have solved and can present


Name            | Assignment 1 | Assignment 2 | Assignment 3
John Doe        |              |              |
Britney Clinton |              |              |
…               |              |              |

Regulations for Admission to Exam (2)

• Each mark is equivalent to one point

• You need to obtain 13 points throughout the semester to get admitted to the exam

• A full point is given if the solution is correct or close to it

• Zero points are given if the solution proves to be largely incorrect

• Zero points on the entire sheet will be given if you marked an assignment as solved but it is obvious you didn’t really do it (-> cheating)


Exam

• Written or oral exam at the end of teaching period in semester (last week or week thereafter)

• Everything mentioned in the lecture or exercises is relevant for the exam, unless explicitly stated otherwise.

• We assume you actively participated in the exercises to be prepared.


Note on Credit Points and Work

• Lecture is worth 4 ECTS points

• Each point is assumed to describe 30 hours of work

• 4 x 30h = 120h

• 14 weeks, makes around 9h of work each week


Registration

• If not done already, please register through the KIS system

• Registration is closing on May 10, 2015

• Without registration, no marks in TA session possible, hence, no exam qualification.


Note on Amazon Grant

• We are grateful for having obtained a grant from Amazon for using (some) of their web services (AWS)


http://aws.amazon.com

• $100 credit for your AWS account.

• To get an AWS account, register with a credit card.

• Send us an email with your registered email address.

• First come, first served (as the number of vouchers is limited).

Note on Amazon Grant (2)

• Due to restrictions in terms of #vouchers and availability of credit cards, there won’t be any mandatory assignment on AWS.

• Just see it as a possibility to get to know AWS if you want to (and have a credit card).

• Local installation of relevant tools like MapReduce and NoSQL stores is possible anyway.

• Check out this virtual machine with lots of stuff already installed: http://hortonworks.com/hdp/downloads/



Literature (Books)

• Pramodkumar J. Sadalage, Martin Fowler. NoSQL Distilled. Addison Wesley, 2012.

• Eric Redmond, Jim R. Wilson. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement.

• Stefan Edlich et al. NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken. Carl Hanser Verlag, 2011. (in German)

• Tom White. Hadoop: The Definitive Guide. O’Reilly, 2012.


Literature (Books) (Cont’d)

Books on standard database topics

• R. Elmasri, S. B. Navathe. Fundamentals of Database Systems. Addison Wesley, 2006.

• R. Ramakrishnan, J. Gehrke. Database Management Systems. McGraw-Hill, 2002.

• H. Garcia-Molina, J. D. Ullman, J. Widom. Database Systems: The Complete Book. Prentice Hall, 2008.


Specific Literature

• Specific literature will be given throughout the lecture.

• Primarily by pointers to original research articles


MAPREDUCE (MR)


MR Motivation: Word Count


The Elwedritsch is a cryptid or mythical creature that supposedly inhabits the Palatinate of Germany. It is described as being a chicken-like creature with antlers. It also has scales instead of feathers. However, it is said that their wings are of little use. That is why they live mainly in underbrush and under vines. Sometimes Elwetritschen are depicted with antlers of a stag and their beaks often appear to be very long. In the second half of the 20th century, artists increasingly portrayed Elwetritschen as female by adding breasts. Elwetritschen supposedly originate from crossbreeding chickens, ducks, and geese with mythical wood creatures such as goblins and elves. Being a fowl, they naturally lay eggs, which as a result of descending from forest spirits, grow during breeding season. Eggs in various sizes are artistically depicted at the Elwetritschenbrunnen in Neustadt an der Weinstraße. Geographical Distribution: The area in which tales of the Elwetritsch are spread expands from the Palatinate Forest in the west of Germany towards the east across the Upper Rhine Plain to the southern parts of the Odenwald. The mythical creature also appears in the north of Baden-Württemberg. In the Main-Tauber-Kreis, where they are known as “Ilwedridsche”, the children are told that at night the creatures sleep in the crowns of the willow trees standing next to the river Tauber. In Neustadt an der Weinstraße, which is said to be the “capital” of the Elwetritsches, there is an Elwetritsche-fountain, created by Gernot Rumpf. Other sources consider Dahn in the southwestern Palatinate, which also has an Elwetritsche-fountain, Erfweiler or other villages as secret capitals of these creatures. The idea is very similar to the "snipe hunt." The Elwetritsch is supposedly very shy, but also very curious. A hunting party consists of a "Fänger" (catcher), equipped with a big potato sack and a lantern, and the "Treiber" (beaters).
The catcher is led into the woods where the Elwetritsch is supposed to live, instructed to wait in a clearing with his sack and lantern, while the beaters go off, supposedly to flush out the Elwetritsch. The light of the lantern is said to be attractive to the curious creature, so it will come to investigate and will then be caught by the catcher. While he waits, everyone heads back to the pub or wherever the party had previously assembled, to wait for the catcher to realize he has been fooled.

Imagine this file is several TB or PB in size!


Imagine this file is several TB or PB in size but chunked up and spread across many machines!

MR: Scale-out Architecture

• Many machines (hundreds, thousands)

• Data is spread across machines

• Processing tasks initiated (ideally) where data resides



Screenshot of the HDFS (Hadoop/MR) filesystem UI, showing info on a large file of Twitter tweets/updates, stored in 509 blocks (chunks) over several machines.

Map and Reduce: Key Idea

• Spread task of processing data on machines

• According to map and reduce rules/functions

• No need to deal with node failures, load balancing, etc.; the system takes care of this.

• Map phase: Data is distributed to a number of machines. Output is partitioned (grouped) by a key (e.g., a term)

• Reduce: For each key-group, data is aggregated (reduced)


Map Reduce from High Level


[Figure: DATA is fed into several MAP tasks, which produce Intermediate Results; several REDUCE tasks turn these into Results]

Brief History of MapReduce

• First described in an article in 2004.

– MapReduce paradigm and how it is used at Google (Google File System, etc.)

– Paper by J. Dean and S. Ghemawat in 2004.

• Many MapReduce implementations

• Hadoop is arguably the most prominent one

• Will look at MR in general and Hadoop specifically


Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004: 137-150

Architectural Issues

• Data lies in a distributed file system

• Block based, big chunks (usually 64MB or 128MB)

• Chunks are replicated and distributed over machines

• If possible, data processing is moved to data hosting machines.
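As a rough illustration of block-based storage, the sketch below computes how many blocks a file occupies and places replicas on nodes. The replication factor of 3 and the round-robin `toy_placement` function are assumptions made for this example; they do not describe the placement policy of any particular file system.

```python
MIB = 2**20

def num_blocks(file_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size blocks needed; the last block may be partially filled."""
    return -(-file_bytes // block_bytes)  # ceiling division on integers

def toy_placement(block_id: int, num_nodes: int, replication: int = 3) -> list[int]:
    """Hypothetical round-robin replica placement, for illustration only."""
    return [(block_id + r) % num_nodes for r in range(replication)]

one_tb = 10**12
print(num_blocks(one_tb, 64 * MIB), num_blocks(one_tb, 128 * MIB))

# Replicas of the first few blocks on a toy cluster of 5 nodes.
for block_id in range(5):
    print(block_id, toy_placement(block_id, num_nodes=5))
```

Because every block lives on several nodes, a map task for that block can be started on any of them, and the loss of a single node does not lose the block.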


Functional Programming: Map

Expression: map

Of type: (a -> b) -> [a] -> [b]

Definition:

• map f [] = []

• map f (x:xs) = f x : map f xs

Example (using Hugs98 Haskell):

• map (\x-> x*x) [1,2,3,4]


[1,4,9,16]

[Figure: Map, with f applied to each list element]

Observation:

Execution of function f can be done fully in parallel!

Then: Output is aggregated (reduced).
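The observation about parallelism can be tried out locally. The sketch below is an illustration in Python rather than Haskell: it applies the same function once sequentially and once through a thread pool, which stands in for "many machines" here. The pool size of 4 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x: int) -> int:
    return x * x

data = [1, 2, 3, 4]

# Sequential map, as in the Haskell definition.
sequential = list(map(square, data))

# Each application of square is independent of the others, so the calls
# can be handed to separate workers; results come back in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))

print(sequential, parallel)
```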


Functional Programming: Reduce (aka. fold)

Expression : foldl (note: there is also foldr=right)

Of type : (a -> b -> a) -> a -> [b] -> a

Definition:

• foldl f z [] = z

• foldl f z (x:xs) = foldl f (f z x) xs

Example:

• foldl (+) 0 [1,2,3,4,5]


15
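For readers more familiar with Python, the following sketch mirrors the foldl definition with an explicit loop and compares it with `functools.reduce` from the standard library. The subtraction example is added here to show that the fold proceeds from the left.

```python
from functools import reduce
import operator

def foldl(f, z, xs):
    """Left fold: f(...f(f(z, x1), x2)..., xn)."""
    acc = z
    for x in xs:
        acc = f(acc, x)
    return acc

total = foldl(operator.add, 0, [1, 2, 3, 4, 5])
same_total = reduce(operator.add, [1, 2, 3, 4, 5], 0)

# With a non-commutative function the order matters: ((0 - 1) - 2) - 3
left_difference = foldl(operator.sub, 0, [1, 2, 3])

print(total, same_total, left_difference)
```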

Note on “Functional Programming”

• What was commonly restricted to functional programming languages is becoming more and more “standard”

• Python, Ruby, Scala (Java++), Clojure, C#, C++(11)

• Example, in Ruby:

[1,2,3,4,5].map{|x| x**2 } => [1, 4, 9, 16, 25]

[1,2,3,4,5].inject(0){|x,a| x+a} => 15


Going Distributed: Key Principle

• Many data chunks

• Map function on each of the chunks

• Map process outputs data with keys

=> Partitions based on keys

• Aggregate (fold/reduce) mapped data per key

• E.g., count the number of occurrences of each term in a set of documents.
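How keys lead to partitions can be sketched as follows. Taking a hash of the key modulo the number of reduce tasks is one common way to do this and is used here as an assumption; the function names, the use of CRC32, and the sample pairs are illustrative only.

```python
import zlib
from collections import defaultdict

def partition(key: str, num_reducers: int) -> int:
    """Assign a key to a reducer; the same key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(mapped_pairs, num_reducers: int):
    """Group (key, value) pairs per reducer and, within a reducer, per key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets

# Output of some hypothetical map tasks.
map_outputs = [("ring", 1), ("one", 1), ("ring", 1), ("them", 1), ("one", 1)]
buckets = shuffle(map_outputs, num_reducers=3)

# Each reducer aggregates only the keys in its own bucket.
totals = {key: sum(values) for bucket in buckets for key, values in bucket.items()}
print(totals)
```

Since all values of a key end up in exactly one bucket, each reducer can compute its aggregates without talking to the others.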


Map Reduce from High Level


[Figure: DATA is fed into several MAP tasks, which produce Intermediate Results; several REDUCE tasks turn these into Results]

Map and Reduce: Types

• Map: (k1, v1) → list(k2, v2)

• Reduce: (k2, list(v2)) → list(k3, v3)

• For instance:

– k1= document identifier

– v1= document content

– k2= term

– v2=count


– k3= term

– v3= final count

Keys allow grouping data to machines/tasks.
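The type signatures can be made concrete with a tiny single-process driver. This is a sketch for experimenting with map and reduce functions, not how a real MapReduce system is implemented; `run_mapreduce` and the sample documents are made up for this example.

```python
from collections import defaultdict
from typing import Callable, Iterable, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")

def run_mapreduce(
    records: Iterable[tuple[K1, V1]],
    mapper: Callable[[K1, V1], Iterable[tuple[K2, V2]]],
    reducer: Callable[[K2, list[V2]], Iterable[tuple[K3, V3]]],
) -> list[tuple[K3, V3]]:
    # Map: (k1, v1) -> list(k2, v2), then group all v2 by k2.
    groups: dict[K2, list[V2]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    # Reduce: (k2, list(v2)) -> list(k3, v3), one call per key.
    output: list[tuple[K3, V3]] = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output

# k1 = document identifier, v1 = content, k2 = term, v2 = count.
documents = [("d1", "a b a"), ("d2", "b c")]
result = run_mapreduce(
    documents,
    mapper=lambda doc_id, text: [(term, 1) for term in text.split()],
    reducer=lambda term, counts: [(term, sum(counts))],
)
print(result)
```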

Move Computation to Data

• Data is stored in a distributed file system (for Google: GFS=Google File System)

• Large chunks (blocks)

• Master node of GFS knows locations

• Can/should! initiate computation at such nodes


[Figure: blocks stored on nodes]

Computation (Workflow)

• A master node controls computation

– this is where you submit your job (task) to

– computes necessary map and reduce tasks

– selects and activates worker nodes

• Worker node

– for map: selected, if possible, close to the data

– for reduce: consumes intermediate results and creates the final output


Example: Grep

• Given: file

• Want: all lines that contain certain pattern

• Map(String key, String value)
    if value.contains(pattern):
        emit(value, "")

This is a map only task (no reducer; no grouping by key): output is written directly to distributed file system
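A local, single-process version of this map-only job might look as follows. The sample lines and the choice of the line number as key are assumptions made for the example; a plain substring test stands in for `contains`.

```python
def grep_map(key, value, pattern):
    """Map function: emit the line itself if it contains the pattern."""
    if pattern in value:
        yield (value, "")

lines = ["error: disk full", "all good", "warning: error rate high"]

# No reduce phase: the map outputs are the final result.
matches = [
    pair
    for line_number, line in enumerate(lines)
    for pair in grep_map(line_number, line, "error")
]
print(matches)
```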


MapReduce: Example Map + Count

• Data Part 1

– “One ring to rule them all, one ring to find them,”

• Data Part 2

– “One ring to bring them all and in the darkness bind them.”


Map Line to Terms and Counts

Line 1:

{"one"=>["1", "1"], "ring"=>["1", "1"], "to"=>["1", "1"], "rule"=>["1"], "them"=>["1", "1"], "all"=>["1"], "find"=>["1"]}

Line 2:

{"one"=>["1"], "ring"=>["1"], "to"=>["1"], "bring"=>["1"], "them"=>["1", "1"], "all"=>["1"], "and"=>["1"], "in"=>["1"], "the"=>["1"], "darkness"=>["1"], "bind"=>["1"]}

Group by Term


Line 1: {"one"=>["1", "1"], "ring"=>["1", "1"], …

Line 2: {"one"=>["1"], "ring"=>["1"], …

Grouped: {"one"=>[["1", "1"], ["1"]], "ring"=>[["1", "1"], ["1"]], …

Sum Up


Grouped: {"one"=>[["1", "1"], ["1"]], "ring"=>[["1", "1"], ["1"]], …

Summed: {"one"=>["3"], "ring"=>["3"], …
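The three steps (map each part, group by term, sum up) can be replayed on the two data parts. This sketch uses integers instead of the strings "1" shown above and lowercases the text and drops punctuation, which are tokenization choices made for the example.

```python
import re
from collections import defaultdict

parts = [
    "One ring to rule them all, one ring to find them,",
    "One ring to bring them all and in the darkness bind them.",
]

def map_part(text):
    """Map: term -> list of 1s, one entry per occurrence in this part."""
    out = defaultdict(list)
    for term in re.findall(r"[a-z]+", text.lower()):
        out[term].append(1)
    return dict(out)

mapped = [map_part(part) for part in parts]

# Group by term: collect the per-part lists for every term.
grouped = defaultdict(list)
for part_output in mapped:
    for term, ones in part_output.items():
        grouped[term].append(ones)

# Sum up: one final count per term.
counts = {term: sum(sum(ones) for ones in lists) for term, lists in grouped.items()}

print(grouped["one"], counts["one"], counts["them"])
```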

Example: Wordcount

Map(String key, String value)
    for each word w in value:
        emit(w, 1)

Reduce(String key, Iterator values)
    int result = 0
    for each v in values:
        result += v
    emit(key, result)


Note: it also depends on the context in which you want to count, e.g.,

- overall occurrences of a word in the collection

- or number of documents in which the word occurs

- or number of sentences in the collection where the word occurs

- …
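The difference between the first two counting contexts comes down to what the map function emits. In the sketch below, which uses made-up documents and function names, the reducer is identical in both cases; only the mapper changes.

```python
from collections import Counter

docs = {"d1": "to be or not to be", "d2": "to do"}

def map_occurrences(doc_id, text):
    """One pair per occurrence: counts overall occurrences."""
    return [(word, 1) for word in text.split()]

def map_document_frequency(doc_id, text):
    """One pair per distinct word in the document: counts documents."""
    return [(word, 1) for word in set(text.split())]

def reduce_all(pairs):
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return totals

occurrences = reduce_all(
    pair for doc_id, text in docs.items() for pair in map_occurrences(doc_id, text)
)
doc_freq = reduce_all(
    pair for doc_id, text in docs.items() for pair in map_document_frequency(doc_id, text)
)
print(occurrences["to"], doc_freq["to"])
```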

Example: Inverted Index

• Given: set of documents

• Want: for each term A, the list of ids of the documents in which A occurs

• How can this be computed in MapReduce?


A → D61, D12, D43, D49
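One possible answer, sketched locally: the mapper emits a (term, document id) pair for every distinct term of a document, and the reducer collects the ids per term. The documents and function names below are invented for the example, and sorting the ids is a design choice, not a requirement.

```python
from collections import defaultdict

docs = {"D1": "big data", "D2": "data stream", "D3": "big data stream"}

def map_doc(doc_id, text):
    """Emit (term, doc_id) once per distinct term in the document."""
    for term in set(text.split()):
        yield term, doc_id

def reduce_term(term, doc_ids):
    """Collect all document ids of a term into one posting list."""
    return term, sorted(doc_ids)

# Local stand-in for the grouping by key between map and reduce.
postings = defaultdict(list)
for doc_id, text in docs.items():
    for term, emitted_id in map_doc(doc_id, text):
        postings[term].append(emitted_id)

index = dict(reduce_term(term, ids) for term, ids in postings.items())
print(index)
```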

Example: Inverted Index

• Why useful?

– Consider Google-style query: A B C

– How to find relevant documents? Parse through all? No.

– Which documents are relevant for the result? Check (pre-computed inv. index):


A → D61, D12, D43, D49

B → D31, D52, D61, D49

C → D43, D61, D98, D31
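With the pre-computed index, answering the query only requires looking at three posting lists. The sketch below assumes conjunctive semantics (a document must contain all query terms), which the slide does not state explicitly; ranking is ignored.

```python
index = {
    "A": ["D61", "D12", "D43", "D49"],
    "B": ["D31", "D52", "D61", "D49"],
    "C": ["D43", "D61", "D98", "D31"],
}

def conjunctive_query(index, terms):
    """Documents containing all query terms (intersection of posting lists)."""
    posting_sets = [set(index.get(term, [])) for term in terms]
    if not posting_sets:
        return []
    return sorted(set.intersection(*posting_sets))

print(conjunctive_query(index, ["A", "B", "C"]))
print(conjunctive_query(index, ["A", "B"]))
```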

Example: Co-occurrences

• Given: text file

• Want: for terms a, b, how often do a and b occur close together, e.g., within a sentence?

• That is, output = ([a,b], count)

• How can this be computed?


Example: Co-occurrences (Cont’d)

• Solution 1: pairs approach

– mapper for string s:

• for all term pairs (a,b) in s: emit({a,b}, 1)

– reducer just aggregates counts

• Solution 2: “stripes” approach

– mapper for string s:

• for each term a in s: collect all t_i that co-occur with a

• emit (a, {t_1, t_2, …, t_n})

– reducer aggregates
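Both approaches can be compared on a toy input. In this sketch a sentence is a whitespace-separated string, each unordered pair is counted at most once per sentence, and a stripe is represented as a `Counter`; all of these are choices made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

sentences = ["a b c", "a b"]

def map_pairs(sentence):
    """Pairs approach: one ((a, b), 1) record per unordered term pair."""
    terms = sorted(set(sentence.split()))
    for a, b in combinations(terms, 2):
        yield (a, b), 1

def map_stripes(sentence):
    """Stripes approach: one record per term, holding all its neighbours."""
    terms = set(sentence.split())
    for a in terms:
        yield a, Counter(t for t in terms if t != a)

# Reducer for pairs: sum the counts per pair.
pair_counts = Counter()
for sentence in sentences:
    for pair, n in map_pairs(sentence):
        pair_counts[pair] += n

# Reducer for stripes: merge the stripes per term element-wise.
stripes = defaultdict(Counter)
for sentence in sentences:
    for term, stripe in map_stripes(sentence):
        stripes[term].update(stripe)

print(pair_counts[("a", "b")], stripes["a"])
```

The pairs mapper emits many small records, while the stripes mapper emits fewer but larger ones; both yield the same co-occurrence counts.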


Code: WordCount in Hadoop (Excerpt)

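// Needs: java.io.IOException, java.util.StringTokenizer, org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.Mapper
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused output objects: the constant count 1 and the current word.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // Emit (word, 1) for every whitespace-separated token of the line.
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}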


Code: WordCount in Hadoop (Excerpt)

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum up all counts emitted for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}


Source: http://wiki.apache.org/hadoop/WordCount


Additional Combiner

• Map phase might output large amounts of data that could be reduced already locally

• Worthwhile because network bandwidth is often the limiting factor

• Works for functions like: max(1,2,6,2,1,9) = max(max(1,2,6), max(2,1,9))

• Add combiner to be run on map output.

• Usually, same as reducer (code)

• Not a replacement of reducer (as it sees only local information!)
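The effect of a combiner can be seen by counting how many records would leave a map task with and without it. The sketch below is a local illustration with a simplified tokenizer; in the word count case the combiner performs the same summation as the reducer, but only over the output of one map task.

```python
from collections import Counter

def map_words(text):
    """Map: one (word, 1) record per token."""
    return [(word, 1) for word in text.lower().replace(",", "").split()]

def combine(pairs):
    """Combiner: pre-aggregate the records of a single map task locally."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return list(local.items())

map_output = map_words("One ring to rule them all, one ring to find them,")
combined = combine(map_output)

# Fewer records have to be sent over the network after combining.
print(len(map_output), len(combined))

# The max example from above: combining partial maxima gives the global maximum.
print(max(max(1, 2, 6), max(2, 1, 9)) == max(1, 2, 6, 2, 1, 9))
```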


Combiner Caveats

• Note that some aggregates can’t be computed locally:

– e.g., output if sum(value) > threshold. Why? The combiner can’t decide whether the threshold is crossed because it sees only local information.

• Note: this application still makes a good case for a combiner, but it should just sum up the local values and not “prune” based on the threshold. So, it is different from the final reducer.

– if the aggregation function is not associative “((x*y)*z = x*(y*z))” and commutative “(x*y = y*x)”

– also problematic: average (but this can be fixed: the reducer then also needs to know the number of items)
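Both caveats can be checked with small numbers. In this sketch the values, the threshold, and the function names are made up; it shows why averaging local averages goes wrong, how passing (sum, count) pairs fixes it, and why pruning by threshold must wait until the reducer.

```python
# Values of one key, located on two different map nodes.
chunks = [[1, 2, 6], [9]]

# Wrong: average locally, then average the local averages.
naive = sum(sum(chunk) / len(chunk) for chunk in chunks) / len(chunks)

def combine_avg(values):
    """Combiner: emit a partial (sum, count) instead of a local average."""
    return (sum(values), len(values))

def reduce_avg(partials):
    """Reducer: add up partial sums and counts, divide once at the end."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

correct = reduce_avg([combine_avg(chunk) for chunk in chunks])
print(naive, correct)

# Threshold: each local sum is below the threshold, the global sum is not.
threshold = 5
local_sums = [3, 4]
pruned_locally = [s for s in local_sums if s > threshold]
passes_globally = sum(local_sums) > threshold
print(pruned_locally, passes_globally)
```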

