Primers CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Some content adapted from Dr. Kalpakis’s CMSC 621 slides



Agenda

• Distributed Computing
  – Evolution of Computing Infrastructure
  – Networking Infrastructure
  – Properties of Distributed Systems
  – Example System Architectures
• Java
• Linux


EVOLUTION OF COMPUTING INFRASTRUCTURE


Mainframe – 50s to 70s

• Custom hardware
• Custom low-level specialized code
• Very expensive solutions


Client/Server – 80s to 00s

• IT-led architectures
• More portable solutions
• Scalable solutions based on demand
• Reign of the Enterprise Data Warehouse


Cloud – 00s to Today

• Consumer-grade infrastructure
• Growing IaaS and PaaS markets
• Data revolution
• Focus on applications rather than infrastructure


Where does Hadoop fit?

• A piece of your data infrastructure
  – Can crunch data for analytics
  – Can expose data for web applications
• Exploration of raw data
• Augments today’s infrastructure
• IMO, a big toolbox that can do a bit of everything


NETWORKING INFRASTRUCTURE


Single Server

[Figure: a single server with HDDs, CPUs, RAM, and NICs. Scaling up means faster CPUs and bigger storage within that server; scaling out means adding more servers.]


Local-Area Network (LAN)

[Figure: two racks, each holding four servers (HDDs, CPUs, RAM, NICs), connected by a local-area network; a gateway links the LAN to the WAN.]


Wide Area Network (WAN)

[Figure: a wide-area network linking sites in London, England; Beijing, China; and New York, NY.]


PROPERTIES OF DISTRIBUTED SYSTEMS


Distributed Systems

• The development of low-cost, powerful microprocessors, together with the invention of high-speed networks, enables us to construct computer systems by connecting a large number of computers

• A distributed system is a collection of independent computers that appears to its users as a single coherent system.


Properties of Distributed Systems

• Reliability
• Scalability
• Availability
• Efficiency
• CAP Theorem


Reliability

• Can the system deliver services in the face of several component failures?
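One common way to keep delivering service despite component failures is to replicate a service and fail over between replicas. The Java sketch below assumes that a failed component simply throws an exception; `FailoverClient` and `Replica` are hypothetical names, not part of any library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client that tries each replica in turn until one succeeds,
// so the service survives as long as at least one replica is healthy.
class FailoverClient {
    private final List<Replica> replicas;

    FailoverClient(List<Replica> replicas) {
        this.replicas = replicas;
    }

    String call(String request) {
        for (Replica r : replicas) {
            try {
                return r.handle(request);   // the first healthy replica answers
            } catch (Exception e) {
                // This replica failed; move on to the next one.
            }
        }
        throw new IllegalStateException("all replicas failed");
    }

    public static void main(String[] args) {
        Replica down = req -> { throw new Exception("disk failure"); };
        Replica up = req -> "ok:" + req;
        FailoverClient client = new FailoverClient(Arrays.asList(down, down, up));
        System.out.println(client.call("ping"));   // prints ok:ping
    }
}

// Hypothetical replica: returns a response, or throws to simulate a failed component.
interface Replica {
    String handle(String request) throws Exception;
}
```

Trying replicas in order is the simplest policy; production systems usually add timeouts and health checks so that a slow replica is treated like a failed one.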


Scalability

• Can the system scale to support a growing number of tasks?
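A rough illustration of scaling out: split one job into chunks and hand them to a pool of workers, so more work can be absorbed by adding workers. This sketch assumes threads in one JVM stand in for separate servers; `ScaleOutSum` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Scale-out sketch: split one job (summing 1..n) into chunks and hand the
// chunks to a pool of workers. Threads stand in for servers here.
class ScaleOutSum {
    static long sum(long n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = (n + workers - 1) / workers;      // ceiling division
            for (int w = 0; w < workers; w++) {
                long lo = w * chunk + 1;
                long hi = Math.min(n, (w + 1) * chunk);
                parts.add(pool.submit(() -> {              // one task per chunk
                    long s = 0;
                    for (long i = lo; i <= hi; i++) s += i;
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // combine partial sums
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Same answer with 1 or 8 workers; only the degree of parallelism changes.
        System.out.println(sum(1_000_000, 1));   // prints 500000500000
        System.out.println(sum(1_000_000, 8));   // prints 500000500000
    }
}
```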


Availability

• How much latency is imposed on the system when a failure occurs?


Efficiency

• How efficient is the system, in terms of latency and throughput?
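Latency and throughput can be measured separately: latency is the time one request takes, throughput is how many requests complete per second. A minimal sketch, assuming a local `Runnable` stands in for a request; `EfficiencyProbe` and `Stats` are hypothetical names.

```java
import java.util.Arrays;

// Sketch of measuring latency (time per request) and throughput
// (requests completed per second) for a stand-in request handler.
class EfficiencyProbe {
    static final class Stats {
        final long medianNs;
        final long p99Ns;
        final double requestsPerSec;

        Stats(long medianNs, long p99Ns, double requestsPerSec) {
            this.medianNs = medianNs;
            this.p99Ns = p99Ns;
            this.requestsPerSec = requestsPerSec;
        }
    }

    // Runs handler `requests` times (requests must be >= 1).
    static Stats measure(int requests, Runnable handler) {
        long[] latencies = new long[requests];
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            long t0 = System.nanoTime();
            handler.run();
            latencies[i] = System.nanoTime() - t0;     // latency of one request
        }
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        Arrays.sort(latencies);
        return new Stats(latencies[requests / 2],
                         latencies[(int) (requests * 0.99)],
                         requests / elapsedSec);         // throughput
    }

    public static void main(String[] args) {
        // Hypothetical "request": some arbitrary CPU work.
        Runnable work = () -> Math.sqrt(Math.random());
        Stats s = measure(100_000, work);
        System.out.printf("median %d ns, p99 %d ns, %.0f req/s%n",
                          s.medianNs, s.p99Ns, s.requestsPerSec);
    }
}
```

Numbers from a simple loop like this are only rough, since JIT warm-up and garbage collection skew them; dedicated benchmark harnesses exist for careful measurement.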


CAP Theorem

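• Consistent: every read sees the most recent write
• Available: every request receives a response
• Partition tolerant: the system keeps working even when the network splits nodes into groups that cannot communicate

• A distributed system cannot guarantee all three at once; because network partitions can always happen, the practical trade-off is between Consistency and Availability during a partition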
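The trade-off can be seen in a toy model: two replicas of one value, and a network link that may be cut. This is a deliberately simplified sketch, not how any real database works; `CapDemo`, `Cluster`, and the `CP`/`AP` modes are hypothetical names.

```java
// Toy model of the CAP trade-off: one value replicated on nodes A and B.
// Clients write to A; A forwards to B unless the network is partitioned.
// All names are hypothetical; real systems are far more involved.
class CapDemo {
    enum Mode { CP, AP }

    static class Cluster {
        final Mode mode;
        String a = "v0";            // value held by replica A
        String b = "v0";            // value held by replica B
        boolean partitioned = false;

        Cluster(Mode mode) { this.mode = mode; }

        void write(String value) {
            if (partitioned && mode == Mode.CP) {
                // Choose consistency: reject the write rather than let A and B diverge.
                throw new IllegalStateException("unavailable during partition");
            }
            a = value;
            if (!partitioned) {
                b = value;          // replication succeeds
            }
            // Partitioned in AP mode: A accepted the write, B still holds the old value.
        }

        String readFromB() { return b; }
    }

    public static void main(String[] args) {
        Cluster ap = new Cluster(Mode.AP);
        ap.partitioned = true;
        ap.write("v1");                              // accepted (available)
        System.out.println(ap.readFromB());          // prints v0 (stale, not consistent)

        Cluster cp = new Cluster(Mode.CP);
        cp.partitioned = true;
        try {
            cp.write("v1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());      // prints unavailable during partition
        }
    }
}
```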


Stateful vs. Stateless

• Whether or not a distributed system saves its state to an attached storage device so that it can recover after a failure
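A minimal sketch of the stateful side: a counter that saves its state to a local file after every update, so a restarted instance recovers where the old one stopped. The class name `CheckpointedCounter` and the plain-text file format are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stateful service: a counter that checkpoints its state to disk
// after every update, so a restarted instance can pick up where it left off.
class CheckpointedCounter {
    private final Path checkpoint;
    private long count;

    CheckpointedCounter(Path checkpoint) {
        this.checkpoint = checkpoint;
        this.count = recover();                 // reload saved state, if any
    }

    long increment() {
        count++;
        save();
        return count;
    }

    private long recover() {
        try {
            if (!Files.exists(checkpoint)) return 0;   // fresh start
            String s = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8).trim();
            return Long.parseLong(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void save() {
        try {
            Files.write(checkpoint, Long.toString(count).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("counter", ".ckpt");
        Files.delete(file);                     // start with no checkpoint
        CheckpointedCounter c1 = new CheckpointedCounter(file);
        c1.increment();
        c1.increment();
        // Simulate a crash and restart: a new instance reloads the saved state.
        CheckpointedCounter c2 = new CheckpointedCounter(file);
        System.out.println(c2.increment());     // prints 3
        Files.deleteIfExists(file);
    }
}
```

A stateless service would keep no count between requests, so it would have nothing to recover after a restart. Writing to a temporary file and then renaming it would make the checkpoint safer against a crash in the middle of a write.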


EXAMPLE SYSTEM ARCHITECTURES


Simple Client/Server
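In the simplest client/server arrangement, a client opens a connection, sends a request, and waits for the server's reply. A minimal TCP sketch using the standard `java.net` sockets; the one-line "echo" protocol and the class name `EchoDemo` are assumptions for illustration, and both sides run in one JVM.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Simple client/server sketch over TCP: the client sends one line,
// the server replies with one line, and the connection closes.
class EchoDemo {
    // Server side: accept one connection on a background thread and echo one line.
    static Thread startServer(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                out.println("echo: " + in.readLine());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    // Client side: connect, send a request, and return the server's reply.
    static String request(int port, String message) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            out.println(message);
            return in.readLine();
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so println() sends the line immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // port 0: let the OS pick a free port
            Thread serverThread = startServer(server);
            System.out.println(request(server.getLocalPort(), "hello"));   // prints echo: hello
            serverThread.join();
        }
    }
}
```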


Multi-Tiered Client/Server
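A multi-tiered design separates presentation, application logic, and data storage, with each tier talking only to the tier directly below it. In this sketch the tiers are plain objects in one JVM rather than separate servers, and `DataTier`, `AppTier`, and `WebTier` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Multi-tiered sketch: each tier talks only to the tier directly below it.
// In a real deployment each tier would usually run on its own server(s);
// here they are plain objects in one JVM. All names are hypothetical.
class ThreeTierDemo {
    // Data tier: stores records (stand-in for a database server).
    static class DataTier {
        private final Map<String, Integer> stock = new HashMap<>();
        Integer lookup(String item) { return stock.get(item); }
        void store(String item, int qty) { stock.put(item, qty); }
    }

    // Application tier: business logic, no knowledge of presentation.
    static class AppTier {
        private final DataTier data;
        AppTier(DataTier data) { this.data = data; }
        boolean inStock(String item) {
            Integer qty = data.lookup(item);
            return qty != null && qty > 0;
        }
    }

    // Presentation tier: formats results for the client.
    static class WebTier {
        private final AppTier app;
        WebTier(AppTier app) { this.app = app; }
        String render(String item) {
            return item + (app.inStock(item) ? " is in stock" : " is sold out");
        }
    }

    public static void main(String[] args) {
        DataTier db = new DataTier();
        db.store("widget", 3);
        db.store("gadget", 0);
        WebTier web = new WebTier(new AppTier(db));
        System.out.println(web.render("widget"));   // prints widget is in stock
        System.out.println(web.render("gadget"));   // prints gadget is sold out
    }
}
```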


Round-Robin Client/Server
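In a round-robin arrangement, requests are handed to a set of equivalent servers in rotation, spreading the load. A minimal dispatcher sketch; `RoundRobin` and the server names are hypothetical, and an `AtomicInteger` is assumed so the dispatcher is safe to call from many threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin sketch: a dispatcher hands each new request to the next server
// in a fixed list, wrapping around at the end. Server names are hypothetical.
class RoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobin(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = servers;
    }

    // Thread-safe: getAndIncrement gives each caller a distinct ticket;
    // floorMod keeps the index valid even after the counter overflows.
    String pick() {
        int ticket = next.getAndIncrement();
        return servers.get(Math.floorMod(ticket, servers.size()));
    }

    public static void main(String[] args) {
        RoundRobin lb = new RoundRobin(Arrays.asList("server-1", "server-2", "server-3"));
        for (int i = 0; i < 5; i++) {
            System.out.println("request " + i + " -> " + lb.pick());
        }
    }
}
```

With three servers, requests 0 through 4 go to server-1, server-2, server-3, server-1, server-2. Round robin ignores how busy each server is, which keeps it simple but can overload a slow server.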


Java

• An object-oriented, class-based programming language designed for code reuse and portability

• Programs compile to bytecode that can run on any Java Virtual Machine (JVM)

• Memory is managed for you and automatically cleaned up by the JVM’s garbage collector

• Syntax is similar to C++
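To make the bytecode and garbage-collection points concrete, here is a minimal program. `javac` and `java` are the standard JDK tools; the class name `Hello` and its `greeting` helper are just for illustration.

```java
// Hello.java: `javac Hello.java` compiles this source to bytecode (Hello.class);
// `java Hello` then runs that bytecode on whichever JVM is installed.
class Hello {
    // Builds a new String object on each call; nothing ever frees it by hand.
    static String greeting(int i) {
        return "Hello, JVM #" + i;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(greeting(i));
            // The String from greeting(i) is now unreachable; the garbage
            // collector reclaims it automatically (no free() or delete as in C/C++).
        }
    }
}
```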


public class Animal {
    // Member variables
    protected int age = 0;
    protected String species = null;

    public Animal() { }

    public Animal(int a, String s) {
        setAge(a);
        setSpecies(s);
    }

    public String getSpecies() { return species; }

    public void setSpecies(String s) { this.species = s; }

    public int getAge() { return age; }

    public void setAge(int a) { this.age = a; }

    @Override
    public String toString() { return age + " " + species; }
}


// Inherits all the public/protected items from Animal
public class Human extends Animal {
    // Additional member variables
    private String name = null;

    public Human(String name, int age) {
        super(age, "Human");
        setName(name);
    }

    public String getName() { return name; }

    public void setName(String n) { this.name = n; }

    @Override
    public String toString() { return name + " " + super.toString(); }
}


// Main class to be executed
public class Main {
    public static void main(String[] args) {
        Animal a = new Animal();
        a.setAge(10);
        a.setSpecies("Hiphopopotamus");
        System.out.println(a);

        a = new Human("Adam", 85);
        System.out.println(a);
    }
}

Output:
10 Hiphopopotamus
Adam 85 Human


// Generic class (Java's counterpart to C++ templates)
public class Pair<FIRST, SECOND> {
    public FIRST first;
    public SECOND second;

    @Override
    public String toString() { return first + ":" + second; }
}

public class Main {
    public static void main(String[] args) {
        Pair<Integer, String> p1 = new Pair<Integer, String>();
        p1.first = 10;
        p1.second = "Rhymenocerous";
        System.out.println(p1);

        Pair<Human, String> p2 = new Pair<Human, String>();
        p2.first = new Human("Adam", 85);
        p2.second = "Hiphopopotamus";
        System.out.println(p2);
    }
}

Output:
10:Rhymenocerous
Adam 85 Human:Hiphopopotamus


101’d

• We have only scratched the surface of Java
• Java also includes interfaces, abstract classes, and lots of libraries for data structures, networking, multi-threading, etc.

• We will be using Eclipse and Maven in this class
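The bullets above mention interfaces and abstract classes without showing them. A minimal sketch of both, plus one library collection; `Speaker`, `Pet`, `Dog`, and `Cat` are made-up names for illustration, unrelated to the course's own code.

```java
import java.util.ArrayList;
import java.util.List;

class PetDemo {
    public static void main(String[] args) {
        List<Speaker> speakers = new ArrayList<>();   // a library data structure
        speakers.add(new Dog("Rex"));
        speakers.add(new Cat("Tom"));
        for (Speaker s : speakers) {
            System.out.println(s.speak());            // prints Rex says woof, then Tom says meow
        }
    }
}

// Interface: a contract with no instance state; any class can implement it.
interface Speaker {
    String speak();
}

// Abstract class: shared state and code, but it cannot be instantiated directly.
abstract class Pet implements Speaker {
    protected final String name;

    Pet(String name) { this.name = name; }

    // Each subclass must supply its own sound; the greeting logic is shared.
    protected abstract String sound();

    @Override
    public String speak() { return name + " says " + sound(); }
}

class Dog extends Pet {
    Dog(String name) { super(name); }
    @Override
    protected String sound() { return "woof"; }
}

class Cat extends Pet {
    Cat(String name) { super(name); }
    @Override
    protected String sound() { return "meow"; }
}
```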


Let’s look at Maven

• sorry


LINUX


Linux Reference

• A free and open-source operating system
• In this course, we live in Eclipse and the command line
• Mastery of 'vi' gets you +4 charisma

http://www.ibm.com/developerworks/library/l-lpic1-v3-103-1/
http://www.linuxdevcenter.com/excerpt/LinuxPG_quickref/linux.pdf