2012 09-08-josug-jeff


Upload: xu-zheng

Post on 14-May-2015


TRANSCRIPT

Page 1: 2012 09-08-josug-jeff

2012.09.08 OpenStack Japan

Zheng Xu

openstack: Open source software to build public and private clouds.

Hadoop on OpenStack Swift

OSC 2012 Tokyo

- Experiment of using swift as storage for Apache Hadoop

Page 2:

Self introduction

● Software designer (engineer) for embedded systems and web systems (60% hobby, 40% job).

● Main areas: OpenStack, Linux, web browsers, HTML, EPUB, OSS

● Contact
  ● @xz911
  ● https://www.facebook.com/xuzheng2001

Page 3:

Abstract

● These slides introduce how to use OpenStack Swift as the storage service for Apache Hadoop in place of HDFS (the storage service of the Hadoop project).

● These slides are based on http://bigdatacraft.com/archives/349; many thanks to Constantine Peresypkin and David Gruzman for providing their idea and implementation.

Page 4:

Agenda

● OpenStack Swift

● Apache Hadoop and HDFS

● Experiment of replacing HDFS by OpenStack Swift

Page 5:

What is OpenStack and Swift

From http://www.openstack.org/

Page 6:

What is OpenStack and Swift

[Architecture diagram] User Application talks to the Proxy Servers over http; the proxies fan out to the Account Servers, Container Servers, and Object Servers.

Page 7:

What is OpenStack and Swift

● Open source, written in Python
● Diversity
● Swift can be a part of OpenStack or an individual service by itself.
● Zones, devices, partitions, and replicas
● No SPOF
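The "zones, devices, partitions, and replicas" design can be sketched in a few lines. This is a simplified illustration of how a Swift-style ring hashes an object path to a fixed partition and then places replicas on devices in distinct zones; it is not Swift's actual ring code, and the device list is hypothetical.

```python
import hashlib

PART_POWER = 8           # 2**8 = 256 partitions (real clusters use far more)
REPLICAS = 3

# Hypothetical device list: (device name, zone)
DEVICES = [("sdb1", 1), ("sdc1", 2), ("sdd1", 3), ("sde1", 4)]

def partition(account, container, obj):
    """Hash the object path to one of 2**PART_POWER partitions."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)

def replica_devices(part):
    """Pick REPLICAS devices for a partition, each from a different zone."""
    return [DEVICES[(part + i) % len(DEVICES)] for i in range(REPLICAS)]

part = partition("AUTH_test", "test_container", "core-site.xml")
devs = replica_devices(part)
# Three replicas land on devices in three different zones -> no SPOF.
```

Because the partition is derived purely from the object path, any proxy can locate the replicas without a central metadata server.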

Page 8:

Agenda

● OpenStack Swift

● Apache Hadoop and HDFS

● Experiment of replacing HDFS by OpenStack Swift

Page 9:

Apache Hadoop and HDFS

From http://hadoop.apache.org/

Page 10:

Apache Hadoop and HDFS

[Architecture diagram] User Application → Hive / Map-Reduce → HDFS (one Name Node, three Data Nodes)

Page 11:

Agenda

● OpenStack Swift

● Apache Hadoop and HDFS

● Experiment of replacing HDFS by OpenStack Swift

Page 12:

Experiment (Concept)

[Concept diagram] Standard Hadoop: User Application → Hive / Map-Reduce → Name Node + Data Nodes (HDFS)

Page 13:

Experiment (Concept)

[Concept diagram] Swift replaces HDFS: User Application → Hive / Map-Reduce → Data Nodes, each using java-cloudfiles to reach Swift over http (no Name Node).

Page 14:

Experiment (Software)

● Swift v1.6
  ● https://github.com/openstack/swift.git (r21616cf, Jul 25)
● Java client: java-cloudfiles
  ● https://github.com/rackspace/java-cloudfiles (r0807fa6, Jun 4)
● Apache Hadoop 1.0.3
● Swift fs for Apache Hadoop (just part of the following source code)
  ● https://github.com/Dazo-org/hadoop-common.git (branch-0.20-security-205.swift)

Page 15:

Experiment (infra)

[Infrastructure diagram] Two hosts: 192.168.0.9 and 192.168.0.4

Page 16:

Experiment (install swift)

● Install Swift based on http://docs.openstack.org/developer/swift/development_saio.html
● Do not forget to set bind_ip in proxy-server.conf
  ● 192.168.0.9 in my case
● Suppose we have the username "test:tester" with the password "testing"; the account name is AUTH_test, and some containers have been created following the steps in the URL above.
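For reference, the v1.0 auth handshake that java-cloudfiles performs against this setup is a single GET carrying the user and key as headers; a successful response returns X-Storage-Url and X-Auth-Token. The sketch below only builds the request (it does not contact a server).

```python
def build_auth_request(auth_url, username, password):
    """Build the Swift v1.0 auth request: a GET carrying user and key headers.
    The response supplies X-Storage-Url and X-Auth-Token, which the client
    then attaches to every storage request."""
    return {
        "method": "GET",
        "url": auth_url,
        "headers": {
            "X-Storage-User": username,  # e.g. "test:tester"
            "X-Auth-Key": password,      # e.g. "testing"
        },
    }

req = build_auth_request("http://192.168.0.9:8080/auth/v1.0",
                         "test:tester", "testing")
```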

Page 17:

Experiment (cloudfiles)

● Run "ant compile"
● Change cloudfiles.properties to the following:

# Auth info

auth_url=http://192.168.0.9:8080/auth/v1.0

auth_token_name=X-Auth-Token

#auth_user_header=X-Storage-User

#auth_pass_header=X-Storage-Pass

# user properties

username=test:tester

password=testing

# cloudfs properties

version=v1

connection_timeout=15000
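The properties format above is plain key=value lines with # comments. As a sanity check for the file you edit, here is a minimal parser (an illustration only, not the loader java-cloudfiles actually uses):

```python
def parse_properties(text):
    """Parse simple key=value properties, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

sample = """\
# Auth info
auth_url=http://192.168.0.9:8080/auth/v1.0
auth_token_name=X-Auth-Token
username=test:tester
password=testing
version=v1
connection_timeout=15000
"""
props = parse_properties(sample)
```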

Page 18:

Experiment (cloudfiles)

● Connect cloudfiles to Swift (this step is optional)
● Change cloudfiles.sh as follows and run it to test the connection with Swift:

#!/bin/sh

export CLASSPATH=lib/httpcore-4.1.4.jar:lib/commons-cli-1.1.jar:lib/httpclient-4.1.3.jar:lib/commons-lang-2.4.jar:lib/junit.jar:lib/commons-codec-1.3.jar:lib/commons-io-1.4.jar:lib/commons-logging-1.1.1.jar:lib/log4j-1.2.15.jar:dist/java-cloudfiles.jar:.

java com.rackspacecloud.client.cloudfiles.sample.FilesCli $@

Page 19:

Experiment (cloudfiles)

● Packaging java-cloudfiles into a jar file for Apache Hadoop (clone java-cloudfiles to ~/java-cloudfiles)
● We need to put *.properties into java-cloudfiles.jar:

$ ant package

$ cd cloudfiles/dist

$ cp ../*.properties .

$ rm java-cloudfiles.jar

$ jar cvf java-cloudfiles.jar ./*
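A jar is just a zip archive, which is why rebuilding it with `jar cvf` over a directory that contains the properties is enough to put them on the classpath. The same repackaging can be sketched in Python (illustrative only; the temporary files stand in for the real jar and properties, and the slide's `jar` commands are what was actually used):

```python
import os
import tempfile
import zipfile

def repackage_jar(jar_path, extra_files):
    """Append extra files (e.g. *.properties) to a jar; a jar is a zip
    archive, so entries at the archive root are visible on the classpath."""
    with zipfile.ZipFile(jar_path, "a") as jar:
        for path in extra_files:
            jar.write(path, arcname=os.path.basename(path))

# Demo with stand-ins for the real jar and cloudfiles.properties.
tmp = tempfile.mkdtemp()
jar = os.path.join(tmp, "java-cloudfiles.jar")
prop = os.path.join(tmp, "cloudfiles.properties")
with zipfile.ZipFile(jar, "w") as z:
    z.writestr("com/rackspacecloud/Placeholder.class", b"")
with open(prop, "w") as f:
    f.write("username=test:tester\n")
repackage_jar(jar, [prop])
names = zipfile.ZipFile(jar).namelist()
```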

Page 20:

Experiment (hadoop)

● Prepare
  ● Download Hadoop to ~/hadoop-1.0.3 (the newest stable version of stock Hadoop) and git clone https://github.com/Dazo-org/hadoop-common.git to ~/hadoop-common (older Hadoop source code with the Swift fs plugin)
  ● At ~/hadoop-1.0.3 (copy java-cloudfiles and related libraries into Hadoop's lib folder):
    – cd lib; cp ~/java-cloudfiles/cloudfiles/dist/java-cloudfiles.jar .
    – cp ~/java-cloudfiles/lib/httpc* .

Page 21:

Experiment (setting hadoop)

● ./hadoop-1.0.3/src/core/core-default.xml
● Add the following so that Hadoop maps the "swift://" URI scheme to the SwiftFileSystem class:

<property>
  <name>fs.swift.impl</name>
  <value>org.apache.hadoop.fs.swift.SwiftFileSystem</value>
  <description>The FileSystem for swift: uris.</description>
</property>
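This works because Hadoop resolves a filesystem implementation by looking up fs.&lt;scheme&gt;.impl in its configuration. A toy version of that lookup (not Hadoop's actual code; the hdfs entry is shown only for contrast):

```python
from urllib.parse import urlparse

CONF = {
    "fs.swift.impl": "org.apache.hadoop.fs.swift.SwiftFileSystem",
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
}

def filesystem_class(uri, conf):
    """Map a URI like swift://host/path to the configured FileSystem class."""
    scheme = urlparse(uri).scheme
    key = f"fs.{scheme}.impl"
    if key not in conf:
        raise ValueError(f"No FileSystem configured for scheme: {scheme}")
    return conf[key]

cls = filesystem_class("swift://192.168.0.9:8080/v1/AUTH_test", CONF)
```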

Page 22:

Experiment (hadoop)

● Copy the implementation of the Swift fs into hadoop 1.0.3 and build:
  ● cp -R ../hadoop-common/src/core/org/apache/hadoop/fs/swift ./src/core/org/apache/hadoop/fs
  ● ant

Page 23:

Experiment (hadoop setting)

● ./conf/core-site.xml (part 1)
● Add the following property, for example:

<property>

<name>fs.swift.userName</name>

<value>test:tester</value>

</property>

Page 24:

Experiment (hadoop setting)

● ./conf/core-site.xml (part 2)
● Add the following properties, for example:

<property>

<name>fs.swift.userPassword</name>

<value>testing</value>

</property>

<property>

<name>fs.swift.acccountname</name>

<value>AUTH_test</value>

</property>

Page 25:

Experiment (hadoop setting)

● ./conf/core-site.xml (part 3)
● Add the following properties, for example:

<property>

<name>fs.swift.authUrl</name>

<value>http://192.168.0.9:8080/auth/v1.0</value>

</property>

<property>

<name>fs.default.name</name>

<value>swift://192.168.0.9:8080/v1/AUTH_test</value>

</property>
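The three fragments above can also be generated in one pass; below is a small helper that renders a dict as Hadoop-style configuration XML (an illustration only; hand-editing conf/core-site.xml as shown works just as well).

```python
from xml.etree import ElementTree as ET

def core_site_xml(props):
    """Render a dict of settings as Hadoop <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

xml = core_site_xml({
    "fs.swift.userName": "test:tester",
    "fs.swift.userPassword": "testing",
    "fs.swift.acccountname": "AUTH_test",  # key spelled as in the slides
    "fs.swift.authUrl": "http://192.168.0.9:8080/auth/v1.0",
    "fs.default.name": "swift://192.168.0.9:8080/v1/AUTH_test",
})
```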

Page 26:

Experiment (check swift fs)

● At this point, we should be able to list account information via the following command:
  ● ./bin/hadoop fs -ls /
  ● or ./bin/hadoop fs -put ./conf/core-site.xml /test_container/core-site.xml (test_container is a test container created after Swift was installed)
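Under the hood, a path like /test_container/core-site.xml on the swift:// filesystem corresponds to a container and an object under the account's storage URL. A sketch of that mapping (my reading of the scheme, not the plugin's actual code):

```python
def swift_object_url(storage_url, fs_path):
    """Map a filesystem path /<container>/<object...> to a Swift object URL."""
    parts = fs_path.strip("/").split("/", 1)
    if len(parts) < 2:
        raise ValueError("path must name a container and an object")
    container, obj = parts
    return f"{storage_url}/{container}/{obj}"

url = swift_object_url("http://192.168.0.9:8080/v1/AUTH_test",
                       "/test_container/core-site.xml")
```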

Page 27:

Finally

● We installed Swift as the storage service for Hadoop
● We built the original java-cloudfiles and created packages for Hadoop
● We copied the fs.swift plugin from https://github.com/Dazo-org/hadoop-common.git into the new Hadoop source tree and built Hadoop
● We set up Hadoop's core-site.xml to connect to Swift via java-cloudfiles

Page 28:

Thank you for listening.