Exploring Hadoop, NetApp FAS NFS Connector for Hadoop

Exploring Hadoop

백승용, 2016/09/09

© 2016 NetApp, Inc. All rights reserved.

Agenda

1. Hadoop overview

2. Basic Hadoop configuration – 3-node setup

3. Hadoop sample tests (MapReduce): WordCount, TeraGen, TeraSort, TeraValidate

4. NetApp Hadoop connector configuration



Hadoop Overview


Hadoop Overview: What Is Apache Hadoop


What Is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:

Hadoop Common: The common utilities that support the other Hadoop modules.

Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

Hadoop YARN: A framework for job scheduling and cluster resource management.

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Hadoop Overview: Apache Hadoop-Related Projects


Other Hadoop-related projects at Apache include:

Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner.

Avro™: A data serialization system.

Cassandra™: A scalable multi-master database with no single points of failure.

Chukwa™: A data collection system for managing large distributed systems.

HBase™: A scalable, distributed database that supports structured data storage for large tables.

Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.

Mahout™: A scalable machine learning and data mining library.

Pig™: A high-level data-flow language and execution framework for parallel computation.

Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.

Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.

ZooKeeper™: A high-performance coordination service for distributed applications.

http://incubator.apache.org/ambari/

http://avro.apache.org/

http://cassandra.apache.org/

http://incubator.apache.org/chukwa/

http://hbase.apache.org/

http://hive.apache.org/

http://mahout.apache.org/

http://pig.apache.org/

http://spark.incubator.apache.org/

http://tez.incubator.apache.org/

http://zookeeper.apache.org/

Big Data Platform?


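Layers: data analysis and visualization; data storage, processing, and management; data collection, integration, and cleansing

Technologies that analyze data and turn it into a form users can consume

Analysis, visualization

R, SAS, SPSS, Tableau, Fusion Tables, Gephi, Tag Cloud, etc.

Mining, algorithms

Text mining, opinion mining, reality mining, clustering, graph mining, SNS analysis, machine learning, Mahout, NLTK, OpenNLP, BoilerPipe, WEKA, etc.

Technologies that store integrated data and handle its distributed processing and management

NoSQL

HBase, DynamoDB, MongoDB, CouchDB, Cassandra, Hypertable, Riak, Redis, Voldemort

Processing (distributed, batch, real-time, etc.), management

Hadoop (MapReduce), Ambari, Spark, Storm, ZooKeeper, Pig, Hive, mrjob, Azkaban, Oozie, Solr, Elasticsearch, Cascading, Cascalog, etc.

File systems

HDFS, S3, NFS, GPFS, etc.

Technologies that integrate structured, semi-structured, and unstructured source data into the analytics system and shape it into a form that is easy to analyze

Flume, Chukwa, Scribe – log collection

Sqoop (SQL to Hadoop) – data transfer between RDBMSs and Hadoop/NoSQL stores

Nutch – web crawling

Kafka – message delivery and collection

OpenRefine – large-scale data cleansing

Thrift – cross-language data serialization and RPC

Avro – data serialization, etc.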

Cloudera: CDH – Cloudera Hadoop


CDH is the most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH delivers the core elements of Hadoop – scalable storage and distributed computing – along with a Web-based user interface and vital enterprise capabilities. CDH is Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL and interactive search, and role-based access controls.

Hortonworks: HDP – Hortonworks Data Platform


HDP is the industry's only true secure, enterprise-ready open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers robust analytics that accelerate decision making and innovation.

MapR: MapR Converged Data Platform


The MapR Converged Data Platform integrates Hadoop and Spark with real-time database capabilities, global event streaming, and scalable enterprise storage to power a new generation of big data applications. The MapR Platform delivers enterprise grade security, reliability, and real-time performance while dramatically lowering both hardware and operational costs of your most important applications and data.

Hadoop Ecosystem


https://www.mapr.com/products/open-source-engines

http://blrunner.com/99

Basic Hadoop Configuration


Basic Hadoop Configuration: Target Architecture – 3-Node Setup


hostname: hadoop01

OS: CentOS 7

VM memory: 1GB

IP: 192.168.2.191

Role: Master NameNode, DataNode

Virtual Machine

hostname: hadoop02

OS: CentOS 7


IP: 192.168.2.192

Role: Secondary NameNode, DataNode

Virtual Machine

hostname: hadoop03

OS: CentOS 7


IP: 192.168.2.193

Role: DataNode

Virtual Machine

DOT: 8.3.2

SVM: hadoop

NFS IP: 192.168.2.194

ONTAP Simulator

Test environment: VMware Workstation

Basic Hadoop Configuration: Setup Procedure Overview


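1. Install the OS – Basic Server with GUI, plus Java

2. Add a user – can be done during OS installation (add the hadoop user)

(3. Install Java) – the CentOS default OpenJDK, or a separate installation (optional)

4. Configure the OS environment – /etc/hosts, .bashrc, SSH access, disable the firewall

5. Install Hadoop – on the NameNode

6. Edit the Hadoop configuration files – on the NameNode

7. Copy Hadoop – from the NameNode to the DataNodes

8. Create HDFS and check the result – on the NameNode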

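Basic Hadoop Configuration: 1. Installing the OS and 2. Adding a User


1. In a VMware or physical-server environment, install “Basic Server with GUI” together with Java.

2. Add the user during or after OS installation.

# useradd hadoop
# passwd hadoop

3. If some Java packages are missing, install them manually.

# rpm -qa | grep java
# yum install java-1.8.0-openjdk-devel.x86_64

4. Update the OS if needed.

# yum update

5. On VMware, you can clone the VM. After cloning, change only the hostname and IP address, referring to “Appendix 1. Other Linux Settings”.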

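Basic Hadoop Configuration: 3. Installing Java


1. Hadoop is a Java-based framework, so Java must be installed.

2. Hadoop has been tested with Oracle Java and with OpenJDK, which usually ships with Linux.

3. If you select the Java packages during CentOS installation, Java is installed by default.

4. If needed, install Oracle Java separately and then set the OS environment variables.

5. Hadoop 2.6.x and earlier still support Java 6, whereas Hadoop 2.7.x and later require Java 7 or later.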

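Basic Hadoop Configuration: 4. OS Environment Setup – /etc/hosts, $HOME/.bashrc


1. Edit the following configuration files:
/etc/hosts – as root
$HOME/.bashrc – as the hadoop user

2. See the attached files in “Appendix 2. OS Environment Settings and Hadoop Configuration Files”.

3. Do this on all three nodes.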
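The /etc/hosts and .bashrc edits can be scripted so the same step is repeatable on all three nodes. This is a minimal sketch under our own assumptions: `append_once`, `configure_hosts`, and `configure_bashrc` are hypothetical helper names, and the entries are the ones listed in Appendix 2. The target file is an argument, so you can try it on a scratch file before touching /etc/hosts.

```shell
#!/usr/bin/env bash
# Sketch: apply the Appendix 2 /etc/hosts and .bashrc entries idempotently.
# All function names here are hypothetical helpers, not Hadoop tools.

# Append a line to a file only if that exact line is not already present,
# so the script can be re-run safely.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Cluster host entries (on a real node: run as root against /etc/hosts).
configure_hosts() {
  local hosts_file="$1"
  append_once "$hosts_file" "192.168.2.191 hadoop01"
  append_once "$hosts_file" "192.168.2.192 hadoop02"
  append_once "$hosts_file" "192.168.2.193 hadoop03"
}

# Hadoop environment variables (run as hadoop against $HOME/.bashrc).
# Single quotes keep $PATH and $HADOOP_HOME unexpanded in the written file.
configure_bashrc() {
  local bashrc_file="$1"
  append_once "$bashrc_file" 'export JAVA_HOME=/usr/lib/jvm/java'
  append_once "$bashrc_file" 'export HADOOP_HOME=/home/hadoop/hadoop'
  append_once "$bashrc_file" 'export HADOOP_COMMON_HOME=$HADOOP_HOME'
  append_once "$bashrc_file" 'export HADOOP_HDFS_HOME=$HADOOP_HOME'
  append_once "$bashrc_file" 'export HADOOP_MAPRED_HOME=$HADOOP_HOME'
  append_once "$bashrc_file" 'export HADOOP_YARN_HOME=$HADOOP_HOME'
  append_once "$bashrc_file" 'export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin'
}
```

On each node, run `configure_hosts /etc/hosts` as root and `configure_bashrc "$HOME/.bashrc"` as the hadoop user, then open a new shell or `source ~/.bashrc`. Because each line is appended only once, re-running the script does not create duplicate entries.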

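Basic Hadoop Configuration: 4. OS Environment Setup – SSH Access


[hadoop@hadoop01 ~]$ ssh-keygen -t rsa   (keep pressing Enter; when asked for a passphrase, just press Enter)

This creates id_rsa and id_rsa.pub in the $HOME/.ssh directory.

[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop01   (answer yes, then enter the password once when prompted)

[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop02

[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop03

As a result of ssh-copy-id, an authorized_keys file is created in $HOME/.ssh on each node.

Repeat the same steps from every node to every other node.

The setup is correct if the command below logs you in immediately, without asking for a password: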

[hadoop@hadoop01 ~]$ ssh hadoop01
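The key distribution can also be looped over the node list. This is a sketch, not part of the original procedure: `NODES`, the `run` wrapper, and the `DRY_RUN` switch are assumptions added so the commands can be reviewed before they are executed. `ssh-keygen -N ""` gives the same empty passphrase as pressing Enter at the prompt, and `-o BatchMode=yes` makes `ssh` fail instead of asking for a password, which is exactly the condition being checked.

```shell
#!/usr/bin/env bash
# Sketch: one key per node, copied to every node, then a passwordless check.
# NODES, run and DRY_RUN are assumptions of this sketch.
NODES="hadoop01 hadoop02 hadoop03"

# With DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

distribute_keys() {
  local node
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run mkdir -p -m 700 "$HOME/.ssh"
    # -N "" = empty passphrase, -f = key file location
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  for node in $NODES; do
    run ssh-copy-id "hadoop@$node"   # asks for the password once per node
  done
}

verify_passwordless() {
  local node failed=0
  for node in $NODES; do
    # BatchMode=yes: fail instead of prompting, so a missing key is an error.
    if ! run ssh -o BatchMode=yes "hadoop@$node" hostname; then
      echo "passwordless login to $node FAILED" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Run `DRY_RUN=1 distribute_keys` first to see the commands, then `distribute_keys` followed by `verify_passwordless` as the hadoop user on each node in turn.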



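Basic Hadoop Configuration: 4. OS Environment Setup – Disabling the Firewall


1. Check the firewall status and disable it as follows:
[root@hadoop01 ~]# systemctl status firewalld
[root@hadoop01 ~]# systemctl stop firewalld
[root@hadoop01 ~]# systemctl disable firewalld

2. Run this on all three nodes.

3. While firewalld is running, the Hadoop cluster cannot be formed, because the ports Hadoop uses are blocked.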

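Basic Hadoop Configuration: 5. Installing Hadoop


1. Install Hadoop on the NameNode.

2. Hadoop is not installed as a package; you only need to extract the archive.

3. Download the latest binary release from the Hadoop website (http://hadoop.apache.org/releases.html), upload it to the NameNode with scp or a similar tool, and then extract it and create the link as in step 4.

4. Alternatively, download it directly on the NameNode with wget, then extract it and create a link:
[hadoop@hadoop01 ~]$ wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
[hadoop@hadoop01 ~]$ tar xvzf hadoop-2.7.2.tar.gz
[hadoop@hadoop01 ~]$ ln -s hadoop-2.7.2 hadoop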
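The extract-and-link part of step 4 can be wrapped in a small function. A sketch, assuming the tarball unpacks into a top-level `hadoop-<version>/` directory as `hadoop-2.7.2.tar.gz` does; `install_hadoop` is a hypothetical name. It uses `ln -sfn` instead of `ln -s`, so re-running it (for example after downloading a newer release) replaces the `hadoop` link rather than failing or creating a new link inside the old directory.

```shell
#!/usr/bin/env bash
# Sketch: unpack a Hadoop release tarball and (re)point the "hadoop" symlink.
# install_hadoop is a hypothetical helper name.
install_hadoop() {
  local tarball="$1" dest="${2:-$HOME}" version_dir
  version_dir=$(basename "$tarball" .tar.gz)      # e.g. hadoop-2.7.2
  tar -xzf "$tarball" -C "$dest" || return 1
  [ -d "$dest/$version_dir" ] || { echo "unexpected layout in $tarball" >&2; return 1; }
  # -n: if "hadoop" is already a symlink to a directory, replace the link
  # itself instead of creating a new link inside that directory.
  ln -sfn "$version_dir" "$dest/hadoop"
}
```

After the `wget` in step 4, `install_hadoop hadoop-2.7.2.tar.gz "$HOME"` produces the same `~/hadoop -> hadoop-2.7.2` link as the manual commands.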


Basic Hadoop Configuration: 6. Editing the Hadoop Configuration Files and 7. Copying Hadoop


1. Edit the following configuration files (in Hadoop 2.7.x, mapred-site.xml does not exist by default; create it from mapred-site.xml.template):
/home/hadoop/hadoop/etc/hadoop/core-site.xml
/home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
/home/hadoop/hadoop/etc/hadoop/mapred-site.xml
/home/hadoop/hadoop/etc/hadoop/yarn-site.xml
/home/hadoop/hadoop/etc/hadoop/slaves

2. See the attached files in “Appendix 2. OS Environment Settings and Hadoop Configuration Files”.

3. Copy the Hadoop binaries from the NameNode to the DataNodes (from hadoop01 to hadoop02 and hadoop03):
[hadoop@hadoop01 ~]$ scp -r hadoop-2.7.2 hadoop@hadoop02:~
[hadoop@hadoop01 ~]$ scp -r hadoop-2.7.2 hadoop@hadoop03:~

[hadoop@hadoop01 ~]$ ssh hadoop@hadoop02 "ln -s hadoop-2.7.2 hadoop"
[hadoop@hadoop01 ~]$ ssh hadoop@hadoop03 "ln -s hadoop-2.7.2 hadoop"
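Instead of editing each file by hand, the five files can be generated from the values in Appendix 2. This sketch writes complete files with minimal content, replacing the shipped files (including their license headers); `write_hadoop_conf` is a hypothetical helper, and the target directory is an argument so the output can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch: write the Appendix 2 Hadoop configuration files into a directory.
# write_hadoop_conf is a hypothetical helper; each file is overwritten in full.
write_hadoop_conf() {
  local conf="$1"
  mkdir -p "$conf" || return 1

  # core-site.xml: default file system = HDFS NameNode on hadoop01
  cat > "$conf/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://hadoop01:9000</value></property>
</configuration>
EOF

  # hdfs-site.xml: local storage paths, Secondary NameNode, replication factor
  cat > "$conf/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.namenode.name.dir</name><value>file:///home/hadoop/hdfs/namenode</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///home/hadoop/hdfs/datanode</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>hadoop02:50090</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
EOF

  # mapred-site.xml: run MapReduce jobs on YARN
  cat > "$conf/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF

  # yarn-site.xml: ResourceManager host and the MapReduce shuffle service
  cat > "$conf/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>hadoop01</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
EOF

  # slaves: hosts that run a DataNode and a NodeManager
  printf '%s\n' hadoop01 hadoop02 hadoop03 > "$conf/slaves"
}
```

On the NameNode, run `write_hadoop_conf /home/hadoop/hadoop/etc/hadoop` before step 3; the `scp -r hadoop-2.7.2` in step 3 then carries the same files to hadoop02 and hadoop03.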

Basic Hadoop Configuration: 8. Creating HDFS and Checking the Result


1. Format the Hadoop NameNode:
[hadoop@hadoop01 ~]$ hdfs namenode -format

2. Start the Hadoop cluster:
[hadoop@hadoop01 ~]$ start-dfs.sh && start-yarn.sh
[hadoop@hadoop01 ~]$ mr-jobhistory-daemon.sh start historyserver   (optional, for checking job history)

3. Check the cluster and the file system:
[hadoop@hadoop01 ~]$ hdfs dfsadmin -report
[hadoop@hadoop01 ~]$ hadoop fs -df -h
[hadoop@hadoop01 ~]$ jps   (run this on each node to see which roles that node is running)
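`jps` lists the running Java processes, so its output can be compared against the roles in the target architecture. The expected daemon lists below are our reading of the Appendix 2 configuration (NameNode and ResourceManager on hadoop01, SecondaryNameNode on hadoop02, a DataNode and NodeManager on every host in `slaves`), not output captured from this cluster. `expected_daemons` and `check_daemons` are hypothetical helpers, and the optional JobHistoryServer is deliberately not required.

```shell
#!/usr/bin/env bash
# Sketch: compare jps output with the daemons each node should run under the
# Appendix 2 configuration. The expected lists are assumptions, not Hadoop output.
expected_daemons() {
  case "$1" in
    hadoop01) echo "NameNode DataNode ResourceManager NodeManager" ;;
    hadoop02) echo "SecondaryNameNode DataNode NodeManager" ;;
    hadoop03) echo "DataNode NodeManager" ;;
    *) return 1 ;;
  esac
}

# Reads jps output ("<pid> <MainClass>" per line) on stdin and reports any
# expected daemon that is missing. Returns 0 only if all are running.
check_daemons() {
  local host="$1" expected running d missing=0
  expected=$(expected_daemons "$host") || { echo "unknown host: $host" >&2; return 2; }
  running=$(awk '{ print $2 }')          # keep only the class-name column
  for d in $expected; do
    if ! printf '%s\n' "$running" | grep -qxF "$d"; then
      echo "$host: $d is not running" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `for n in hadoop01 hadoop02 hadoop03; do ssh "$n" jps | check_daemons "$n"; done`, assuming `jps` is on the PATH of non-interactive SSH sessions.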



3. Check the cluster and the file system (continued)

NameNode web UI: http://192.168.2.191:50070

ResourceManager web UI: http://192.168.2.191:8088

Job History web UI: http://192.168.2.191:19888/jobhistory


Hadoop Sample Tests (MapReduce)

WordCount, TeraGen, TeraSort, TeraValidate


Hadoop Sample Tests (MapReduce): WordCount Example


[hadoop@hadoop01 ~]$ hadoop fs -mkdir /wc_input

[hadoop@hadoop01 ~]$ cd hadoop ; ls

[hadoop@hadoop01 hadoop]$ cat LICENSE.txt

[hadoop@hadoop01 hadoop]$ hadoop fs -copyFromLocal LICENSE.txt /wc_input

[hadoop@hadoop01 hadoop]$ hadoop fs -ls / ; hadoop fs -ls /wc_input

[hadoop@hadoop01 hadoop]$ cd $HOME/hadoop/share/hadoop/mapreduce

[hadoop@hadoop01 mapreduce]$ pwd ; ls

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar   (lists the available MapReduce examples)

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /wc_input /wc_output

[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /wc_output

[hadoop@hadoop01 mapreduce]$ hadoop fs -cat /wc_output/part-r-00000

Job progress and results can be checked in the ResourceManager and Job History web UIs, or with the yarn command.
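To sanity-check the WordCount result, the same count can be computed locally with standard text tools. This is an approximation, not Hadoop's code: the example WordCount splits lines on whitespace with Java's `StringTokenizer`, and `tr -s '[:space:]'` does nearly the same, while `LC_ALL=C sort` orders words by byte value, which should match the key order in `part-r-00000` when a single reducer is used. `local_wordcount` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: approximate the examples-jar WordCount locally and print
# "word<TAB>count" lines in byte order, like part-r-00000.
local_wordcount() {
  tr -s '[:space:]' '\n' |       # one whitespace-separated token per line
    awk 'NF' |                   # drop the empty line left by leading whitespace
    LC_ALL=C sort |              # byte order, as Hadoop sorts Text keys
    uniq -c |                    # "<count> <word>"
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, save the job output with `hadoop fs -cat /wc_output/part-r-00000 > hdfs_counts.txt`, run `local_wordcount < LICENSE.txt > local_counts.txt`, and compare the two files with `diff`.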

Hadoop Sample Tests (MapReduce): What HDFS Really Looks Like


Hadoop Sample Tests (MapReduce): TeraGen, TeraSort, TeraValidate


1. TeraGen, TeraSort, and TeraValidate are general-purpose benchmark tools included with stock Hadoop:

teragen: Generate data for the terasort

terasort: Run the terasort

teravalidate: Checking results of terasort

2. They are used for the performance tests in NetApp FAS NFS Connector for Hadoop (TR-4382).

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teragen 10000 /teragen

[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /teragen

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar terasort /teragen /terasort

[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /terasort

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teravalidate /terasort /teravalidate

[hadoop@hadoop01 mapreduce]$ hadoop fs -ls /teravalidate

Job progress and results can be checked in the ResourceManager and Job History web UIs, or with the yarn command.
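TeraGen's first argument is a row count, not a size. TeraGen writes fixed-length 100-byte rows, so the data volume is simply rows × 100. The two helpers below (hypothetical names) make that conversion explicit.

```shell
#!/usr/bin/env bash
# Sketch: convert between TeraGen row counts and data volume.
# TeraGen rows are a fixed 100 bytes each.
ROW_BYTES=100

teragen_bytes() {      # rows -> bytes
  echo $(( $1 * ROW_BYTES ))
}

teragen_rows_for() {   # target bytes -> rows (rounded down)
  echo $(( $1 / ROW_BYTES ))
}
```

By this arithmetic, `teragen 10000` above produces 1,000,000 bytes (about 1 MB), the NFS run with `teragen 100` produces 10,000 bytes, and a full 1 TB (10^12 bytes) TeraSort would need `teragen_rows_for 1000000000000`, that is, 10,000,000,000 rows.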

NetApp Hadoop Connector Configuration


NetApp scalable Hadoop deployment


Network Architecture: E-Series and FAS Arrays for Hadoop

Hadoop in the Enterprise


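Big data analytics is becoming mainstream, but many challenges stand in the way, and it is difficult to apply Hadoop directly to the analytics systems already in operation.

1. Enterprises have storage and compute imbalance. Every enterprise environment has its own imbalance between compute and storage, yet a typical Hadoop system cannot separate the two. A decoupled design lets compute and storage scale independently.

2. Enterprises have existing hardware. Most enterprises already own hardware and data, and using that data for analytics normally requires migrating it into the analytics system.

With a decoupled design, however, the existing hardware and data can be analyzed in place.

3. Analytics storage JBOD is not efficient. JBOD-based Hadoop storage relies on inefficient three-way replication for availability and performance. NetApp storage achieves this more efficiently with RAID-DP.

4. Analytics storage JBOD lacks data management. File systems used by analytics systems such as Hadoop lack features such as deduplication, high availability, and disaster recovery. With the NetApp Hadoop connector, analytics capacity can be added independently of the DataNodes, data on existing storage systems can be processed in a distributed way, native HDFS and NetApp NFS can be used at the same time, and technologies such as Snapshot and FlexClone can be applied.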

Benefits and Use Cases, Deployment Options, Ease of Deployment


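The NetApp NFS Connector for Hadoop uses a decoupled design and can offer more functionality than a conventional Hadoop system.

1. Analyzing data on enterprise storage. – The connector can analyze data on existing storage directly, without first collecting it into a separate analytics store. In other words, a single storage system can serve both operational and analytics data.

2. Cross-data-center deployments. – Because the design is decoupled, data can be stored in a distributed way and compute and storage can scale independently. NetApp solutions such as NPS (NetApp Private Storage) also make cloud computing resources available.

3. Analyze data on existing NFS storage.

4. Build testing and QA environments by using clones of existing data. – With FlexClone, a data set for another purpose can be created instantly and put to use.

5. Leverage storage-level caching for iterative machine learning algorithms. – Iterative algorithms such as machine learning are cache-friendly, so FlashCache can accelerate them.

6. Use a backup site for analytics. – With NPS, cloud resources can be used for analytics.

7. Deployment Options – Both forms, HDFS+NFS and NFS only, are supported.

8. Ease of Deployment – The connector is easy to deploy: you only copy the JAR files and edit the configuration files.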

Technical Advantages


1. The connector works with Apache Hadoop, Apache Spark, Apache HBase, and Tachyon.

2. No changes are needed to existing applications.

3. No changes are needed to existing deployments; only configuration files are modified (core-site.xml, hbase-site.xml, and so on).

4. Data storage can be modified and upgraded nondestructively by using clustered Data ONTAP.

5. The connector supports the latest networks (10GbE) and multiple NFS connections.

6. The connector enables high-availability and nondisruptive operations by using clustered Data ONTAP.

NetApp NFS Connector for Hadoop plugs into Apache Hadoop


1. Connection Pool

Can use multiple nodes and multiple links

2. File Handle Cache

Uses LRU caching

3. NFS InputStream – read operations from the Hadoop nodes

Large sequential reads. – nfsReadSizeBits

Multiple outstanding I/Os.

Prefetching. – nfsSplitSizeBits

4. NFS OutputStream – write operations from the Hadoop nodes

write buffer – nfsWriteSizeBits

all write requests only when the output stream is closed

5. Authentication

none or UNIX – nfsAuthScheme
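The `*SizeBits` parameters read most naturally as powers of two. Assuming a value n means 2^n bytes (an assumption based on the parameter names and on the `-v3-tcp-max-read-size` of 1048576 bytes, which equals 2^20, set in the FAS options step), the nfs-mapping.json values of 20 and 30 correspond to 1 MiB reads and writes and 1 GiB splits. A small shell sketch of the conversion, with hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch, assuming an nfs*SizeBits value n means 2^n bytes.
bits_to_bytes() {
  echo $(( 1 << $1 ))
}

# Human-readable form for the sizes used in nfs-mapping.json.
describe_bits() {
  local n="$1" bytes
  bytes=$(bits_to_bytes "$n")
  if [ "$n" -ge 30 ]; then
    echo "$bytes bytes ($(( bytes >> 30 )) GiB)"
  elif [ "$n" -ge 20 ]; then
    echo "$bytes bytes ($(( bytes >> 20 )) MiB)"
  else
    echo "$bytes bytes"
  fi
}

# The values used in this document's nfs-mapping.json.
for param in nfsReadSizeBits=20 nfsWriteSizeBits=20 nfsSplitSizeBits=30; do
  echo "${param%%=*}: $(describe_bits "${param#*=}")"
done
```

Under that assumption, the loop prints 1048576 bytes (1 MiB) for the read and write sizes and 1073741824 bytes (1 GiB) for the split size.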

NetApp Hadoop Connector Configuration: Downloading and Copying NetApp FAS NFS Connector for Hadoop


1. Download NetApp FAS NFS Connector for Hadoop:

https://github.com/NetApp/NetApp-Hadoop-NFS-Connector/releases

hadoop-nfs-connector-1.0.6.jar

hadoop-nfs-3.0.0-SNAPSHOT.jar

2. Upload and copy the jar files to all nodes.

Important: check the Hadoop classpath first, and copy the jars into a directory that is on it.

[hadoop@hadoop01 ~]$ hadoop classpath

[hadoop@hadoop01 common]$ pwd

/home/hadoop/hadoop/share/hadoop/common

[hadoop@hadoop01 common]$ scp hadoop-nfs-3.0.0-SNAPSHOT.jar hadoop-nfs-connector-1.0.6.jar hadoop@hadoop02:/home/hadoop/hadoop/share/hadoop/common
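The classpath check and the copy can be combined, so the jars only go to a directory Hadoop actually loads. In this sketch, `on_classpath` and `copy_connector_jars` are hypothetical helpers and `DRY_RUN=1` only prints the commands. The classpath string is passed in (typically from `hadoop classpath`), and a directory counts as present if it appears either bare or as a `<dir>/*` wildcard entry, which is how Hadoop 2.x lists its jar directories.

```shell
#!/usr/bin/env bash
# Sketch: verify the target directory is on the Hadoop classpath, then copy
# the connector jars to other nodes. Helper names and DRY_RUN are hypothetical.
JARS="hadoop-nfs-3.0.0-SNAPSHOT.jar hadoop-nfs-connector-1.0.6.jar"

run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# $1 = classpath string, $2 = directory. Succeeds if "$2" or "$2/*" is an entry.
on_classpath() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -qxF -e "$2" -e "$2/*"
}

# $1 = destination directory, remaining args = node names.
# Run from the directory that contains the jars.
copy_connector_jars() {
  local dir="$1" node
  shift
  for node in "$@"; do
    # $JARS is deliberately unquoted so each jar name is a separate argument.
    run scp $JARS "hadoop@$node:$dir"
  done
}
```

For example, from `/home/hadoop/hadoop/share/hadoop/common`: `on_classpath "$(hadoop classpath)" "$PWD" && copy_connector_jars "$PWD" hadoop02 hadoop03`.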


NetApp Hadoop Connector Configuration: Changing FAS Storage Options


Cluster832::> vserver nfs modify -vserver hadoop -nfs-rootonly disabled

Cluster832 ::> vserver nfs modify -vserver hadoop -mount-rootonly disabled

Cluster832 ::> set advanced

Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.

Do you want to continue? {y|n}: y

Cluster832 ::*> vserver nfs modify -vserver hadoop -v3-tcp-max-read-size 1048576

Cluster832 ::*> vserver nfs modify -vserver hadoop -v3-tcp-max-write-size 65536

Cluster832 ::*>

In Data ONTAP 8.3.2, the -v3-tcp-max-read-size and -v3-tcp-max-write-size options are marked as DEPRECATED:

(DEPRECATED)-NFSv3 TCP Maximum Read Size (bytes): 1048576

(DEPRECATED)-NFSv3 TCP Maximum Write Size (bytes): 65536

NetApp Hadoop Connector Configuration: Writing the NFS Configuration File and Editing the Hadoop Configuration File


1. Write nfs-mapping.json and upload it to the Hadoop configuration directory on all nodes:

/home/hadoop/hadoop/etc/hadoop

2. Edit core-site.xml on all nodes, adding the NetApp Hadoop connector section to the Hadoop configuration.

nfs-mapping.json used in this test:

{
  "spaces": [
    {
      "name": "DOT832",
      "uri": "nfs://192.168.2.194:2049/",
      "options": {
        "nfsExportPath": "/hadoop",
        "nfsReadSizeBits": 20,
        "nfsWriteSizeBits": 20,
        "nfsSplitSizeBits": 30,
        "nfsAuthScheme": "AUTH_SYS",
        "nfsUsername": "root",
        "nfsGroupname": "root",
        "nfsUid": 0,
        "nfsGid": 0,
        "nfsPort": 2049,
        "nfsMountPort": -1,
        "nfsRpcbindPort": 111
      },
      "endpoints": [
        {
          "host": "nfs://192.168.2.194:2049/",
          "exportPath": "/hadoop",
          "path": "/"
        }
      ]
    }
  ]
}
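A syntax error in nfs-mapping.json (a trailing comma before `]` is an easy one to make) is cheaper to catch before the file is copied to every node. This sketch assumes python3 is installed and uses its standard `json.tool` module as the validator; `validate_json`, `distribute_mapping`, and the `DRY_RUN` switch are hypothetical, and the destination is the configuration directory named in step 1.

```shell
#!/usr/bin/env bash
# Sketch: validate nfs-mapping.json, then copy it to every node.
# Assumes python3 is available; helper names and DRY_RUN are hypothetical.
CONF_DIR=/home/hadoop/hadoop/etc/hadoop
NODES="hadoop01 hadoop02 hadoop03"

run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# json.tool prints the parse error and exits non-zero on invalid JSON.
validate_json() {
  python3 -m json.tool "$1" > /dev/null
}

distribute_mapping() {
  local file="$1" node
  validate_json "$file" || { echo "$file: not valid JSON, not copied" >&2; return 1; }
  for node in $NODES; do
    run scp "$file" "hadoop@$node:$CONF_DIR/nfs-mapping.json"
  done
}
```

Running `DRY_RUN=1 distribute_mapping nfs-mapping.json` shows the copy commands without executing them. An invalid file stops the copy before any node receives it.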

NetApp Hadoop Connector Configuration: Uploading Data to the NFS Volume


[hadoop@hadoop01 hadoop]$ pwd

/home/hadoop/hadoop

[hadoop@hadoop01 hadoop]$ ls

[hadoop@hadoop01 hadoop]$ hadoop fs -ls nfs://192.168.2.194:2049/


Store with ep Endpoint: host=nfs://192.168.2.194:2049/ export=/hadoop path=/ has fsId 2147888298

Found 1 items

drwxrwxrwx - 0 0 4096 2016-08-31 11:05 nfs://192.168.2.194:2049/.snapshot

[hadoop@hadoop01 hadoop]$ hadoop fs -copyFromLocal *.txt nfs://192.168.2.194:2049/

Store with ep Endpoint: host=nfs://192.168.2.194:2049/ export=/hadoop path=/ has fsId 2147888298

16/08/31 11:45:35 WARN stream.NFSBufferedOutputStream: Flushing a closed stream. Check your code.

16/08/31 11:45:35 INFO stream.NFSBufferedOutputStream: STREAMSTATSstreamStatistics:

STREAMSTATS name: class org.apache.hadoop.fs.nfs.stream.NFSBufferedInputStream/LICENSE.txt._COPYING_

STREAMSTATS streamID: 1

STREAMSTATS ====OutputStream Statistics====

……………………….. (output omitted) …………………………..

[hadoop@hadoop01 hadoop]$ hadoop fs -ls nfs://192.168.2.194:2049/

NetApp Hadoop Connector Configuration: Running TeraGen, TeraSort, and TeraValidate on the NFS Volume


[hadoop@hadoop01 mapreduce]$ pwd

/home/hadoop/hadoop/share/hadoop/mapreduce

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teragen 100 nfs://192.168.2.194:2049/teragen

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar terasort nfs://192.168.2.194:2049/teragen nfs://192.168.2.194:2049/terasort

[hadoop@hadoop01 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar teravalidate nfs://192.168.2.194:2049/terasort nfs://192.168.2.194:2049/teravalidate

[hadoop@hadoop01 mapreduce]$ hadoop fs -ls nfs://192.168.2.194:2049/
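The three benchmark steps differ only in the base path, so the HDFS run in the sample-test section and the NFS run above can share one driver. A sketch under stated assumptions: `terasuite`, `EXAMPLES_JAR`, and `DRY_RUN` are hypothetical names, and the `&&` chain stops at the first failing job because each step reads the previous step's output.

```shell
#!/usr/bin/env bash
# Sketch: run TeraGen -> TeraSort -> TeraValidate against one base URI.
# terasuite, EXAMPLES_JAR and DRY_RUN are assumptions of this sketch.
EXAMPLES_JAR="${EXAMPLES_JAR:-$HOME/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar}"

run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# $1 = TeraGen row count, $2 = base URI ("" for the default file system).
terasuite() {
  local rows="$1" base="${2%/}"   # drop one trailing slash to avoid "//"
  run hadoop jar "$EXAMPLES_JAR" teragen "$rows" "$base/teragen" &&
  run hadoop jar "$EXAMPLES_JAR" terasort "$base/teragen" "$base/terasort" &&
  run hadoop jar "$EXAMPLES_JAR" teravalidate "$base/terasort" "$base/teravalidate"
}
```

`terasuite 10000 ""` corresponds to the HDFS run and `terasuite 100 nfs://192.168.2.194:2049/` to the NFS run. These example jobs normally refuse to write into an existing output directory, so remove earlier results (for example with `hadoop fs -rm -r`) before re-running.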


Thank you.

Appendix 1. Other Linux Settings: Setting the Hostname


Appendix 1. Other Linux Settings: Changing the IP Address


Appendix 2. OS Environment Settings and Hadoop Configuration Files


1. /home/hadoop/.bashrc

2. /etc/hosts

3. /home/hadoop/hadoop/etc/hadoop/core-site.xml

4. /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml

5. /home/hadoop/hadoop/etc/hadoop/mapred-site.xml

6. /home/hadoop/hadoop/etc/hadoop/yarn-site.xml

7. /home/hadoop/hadoop/etc/hadoop/slaves



1. /home/hadoop/.bashrc

export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

2. /etc/hosts

192.168.2.191 hadoop01
192.168.2.192 hadoop02
192.168.2.193 hadoop03

3. /home/hadoop/hadoop/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
</configuration>

4. /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

5. /home/hadoop/hadoop/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

6. /home/hadoop/hadoop/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

7. /home/hadoop/hadoop/etc/hadoop/slaves

hadoop01
hadoop02
hadoop03

