1st Korea Community Day Presentation: Bigdata
DESCRIPTION
Bigdata Platform, Hadoop, Hive, ...
![Page 1: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/1.jpg)
Cloud Computing
BigData
2011.12
- - 2.0 .
![Page 2: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/2.jpg)
![Page 3: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/3.jpg)
www.gruter.com
SDS, NHN
www.jaso.co.kr
www.cloudata.org
www.cloumon.org
www.twitter.com/babokim
www.facebook.com/babokim
![Page 4: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/4.jpg)
BigData Definition(1)
What is BigData?
Big Data (BD)
DB (McKinsey, 2011)
DB (IDC, 2011)
SNS, M2M
Economist (2010.05)
Gartner (2011.03)
Information silo
McKinsey (2011.05)
Big Data (KT)
![Page 5: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/5.jpg)
BigData Definition(2)
Very large, distributed aggregations of loosely structured data
- Petabytes/exabytes of data
- Millions/billions of people
- Billions/trillions of records
- Loosely-structured and often distributed data
- Flat schemas with few complex interrelationships
- Often involving time-stamped events
- Often made up of incomplete data
- Often including connections between data elements that must be probabilistically inferred
Applications that involve Big-data can be:
- Transactional (e.g., Facebook, PhotoBox), or
- Analytic (e.g., ClickFox, Merced Applications)
http://wikibon.org/wiki/v/Enterprise_Big-data
![Page 6: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/6.jpg)
Big-data Analytics Complements Data Warehouse
Traditional Data Warehouse
- Complete record from transactional system
- All data centralized
- Analytics designed against stable environment
- Many reports run on a production basis
Big-data Analytic Environment
- Data from many sources inside and outside of the organization (including traditional DW)
- Data often physically distributed
- Need to iterate on the solution to test/improve models
- Large-memory analytics also part of iteration
- Every iteration usually requires complete reload of information
http://wikibon.org/wiki/v/Enterprise_Big-data
![Page 7: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/7.jpg)
Facebook Social plug-in
Feedback
process over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds.
Analytic
Transactional
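As a rough check on the figures quoted above: 20 billion events spread evenly over one day comes to a little over 230,000 events per second, so the "200,000 events per second" on the slide reads as a rounded figure. The sketch below assumes a uniform rate over a 24-hour day, which real traffic will not follow; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(events_per_day: int) -> float:
    """Average events per second, assuming events are spread evenly over one day."""
    return events_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(20_000_000_000)
print(f"{rate:,.0f} events/sec")  # prints "231,481 events/sec"
```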
![Page 8: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/8.jpg)
BigData
Diagram: BigData processing flow
Stages: Collecting, Store, Analysis, Reporting/Searching
Collection sources and tools: SNS, Robot, RSS Reader, OpenAPI, ETL, Data Aggregator
Storage: Repository (DBMS, NoSQL), Index
Analysis: Clustering, Classification, Sentiment Analysis, Indexing
User Define Query Script
![Page 9: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/9.jpg)
Twitter: BackType
Distribute tweets randomly on multiple queues
Workers share the load of schemifying tweets
Workers schemify tweets and append to Hadoop
Workers choose queue to enqueue to using hash/mod of URL
All updates for same URL guaranteed to go to same worker
Workers update statistics on URLs by incrementing counters in Cassandra
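The hash/mod routing described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the queue count, the function names, and the use of CRC32 as the hash are all hypothetical, and an in-memory `Counter` stands in for the Cassandra counters. The point it shows is that the same URL always maps to the same queue, so a single worker owns all counter updates for that URL.

```python
import zlib
from collections import Counter, defaultdict

NUM_QUEUES = 4  # hypothetical number of queues/workers


def queue_for(url: str, num_queues: int = NUM_QUEUES) -> int:
    """Pick a queue by hash/mod of the URL.

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings, which is randomized per interpreter run.
    """
    return zlib.crc32(url.encode("utf-8")) % num_queues


def route(urls, num_queues: int = NUM_QUEUES):
    """Enqueue each URL on the queue chosen by queue_for()."""
    queues = defaultdict(list)
    for url in urls:
        queues[queue_for(url, num_queues)].append(url)
    return queues


def count_per_worker(queues):
    """Each worker counts only the URLs in its own queue (stand-in for Cassandra counters)."""
    return {queue_id: Counter(urls) for queue_id, urls in queues.items()}


urls = [
    "http://a.example/1",
    "http://b.example/2",
    "http://a.example/1",
    "http://c.example/3",
    "http://a.example/1",
]
queues = route(urls)
counts = count_per_worker(queues)
owner = queue_for("http://a.example/1")
print(owner, counts[owner]["http://a.example/1"])
```

Because every update for a given URL lands on one worker, that worker can increment its counters without coordinating with the others.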
![Page 10: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/10.jpg)
BigData
Architectural Requirements
Scalability
- Scale-out
- Elasticity
Reliability
Flexibility
- Easy to add analysis rules
- Support for various data formats
Latency
- Real time, Near Real time, Batch
High Throughput
- Global web scale traffic
Hadoop
Component
IBM, HP, Oracle
BI/DW
![Page 11: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/11.jpg)
BigData
Collector: Flume, Scribe, Chukwa
File system: Hadoop FileSystem, MogileFS
NoSQL: Cloudata, HBase, Cassandra
Search: Katta, ElasticSearch
Real-time analysis (count, sum aggregation): S4, Storm
Batch analysis: Hadoop MapReduce (Hive, Pig), Giraph, GoldenOrb
Mining and statistics (Cluster, Classification): Mahout, R
Management: ZooKeeper, HUE, Cloumon
Serialization: Thrift, Avro, ProtoBuf
![Page 12: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/12.jpg)
Hadoop Ecosystem
http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/
![Page 13: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/13.jpg)
Software Stack
Diagram: software stack components
Interface: Web, Phone, Pad
Data Visualization
Analysis Job: Mining Lib (Mahout), Statistics Lib (R)
Search (ElasticSearch)
Batch Analysis: Data Analysis Platform (hadoop), Script Language (Hive, Pig), Job Workflow Engine (oozie, cascade)
(Near) Real-time Analysis: Real-time Analysis Platform, CEP Engine (Esper), Rule Management
Data Store: File System (HadoopFS), NoSQL (Cloudata, HBase, Cassandra)
Aggregator
Collector (flume, scribe, chukwa)
Management: Monitoring (cloumon), Cluster Management (ZooKeeper)
![Page 14: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/14.jpg)
Chukwa (Yahoo)
Hadoop FileSystem, HDFS, MapReduce
Scribe (Facebook)
(thrift), Hadoop JNI
Flume (Cloudera)
Hadoop, HBase, Search Engine
Diagram components: Application Server (log, Log4j), Agent (local, Temp Log), Collector #1, Collector #2, Centralized Storage (HDFS)
![Page 15: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/15.jpg)
- Esper Event
- Gruter ClouStream, Yahoo S4, Twitter Storm, Facebook Puma
ClouStream
Puma
![Page 16: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/16.jpg)
Hadoop File System
BigData de facto standard
x86
NameNode SPOF (Single Point Of Failure)
![Page 17: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/17.jpg)
MapReduce
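The slide text for the MapReduce model did not survive extraction, so the following is a minimal single-process sketch of the general map, shuffle, and reduce phases using word count as the example. It is plain Python, not the Hadoop API; the function names and sample input are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = word_count(["big data big cluster", "data store"])
print(result)  # {'big': 2, 'data': 2, 'cluster': 1, 'store': 1}
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; here all three phases run in one process purely to show the data flow.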
![Page 18: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/18.jpg)
Hadoop MapReduce
MapReduce
Hadoop FileSystem
DB, FTP Server
FIFO, Fair, Capacity (job schedulers)
MapReduce (streaming)
![Page 19: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/19.jpg)
Script Language
Hive
-- Create a table partitioned by the ds column
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
-- Loading into a partitioned table requires naming the target partition
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
-- Join two tables and write the result into a third (pokes and events are assumed to exist)
hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
Pig
-- Paths must be quoted; Canonicalize is a user-defined function
Visits = load '/data/visits' as (user, url, time);
Visits = foreach Visits generate user, Canonicalize(url), time;
Pages = load '/data/pages' as (url, pagerank);
VP = join Visits by url, Pages by url;
UserVisits = group VP by user;
UserPageranks = foreach UserVisits generate user, AVG(VP.pagerank) as avgpr;
GoodUsers = filter UserPageranks by avgpr > 0.5;
store GoodUsers into '/data/good_users';
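To make the dataflow of the Pig script easier to follow, here is a small in-memory Python sketch of what it computes: join visits to pages by URL, average the pagerank per user, and keep users whose average exceeds 0.5. The sample data, the `canonicalize` stand-in for the `Canonicalize` UDF, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def canonicalize(url: str) -> str:
    """Hypothetical stand-in for the Canonicalize UDF: lowercase, drop trailing slash."""
    return url.lower().rstrip("/")


# Made-up sample rows: (user, url, time) and url -> pagerank
visits = [
    ("amy", "http://A.example/", "08:00"),
    ("amy", "http://b.example", "08:05"),
    ("bob", "http://c.example", "09:00"),
]
pages = {"http://a.example": 0.9, "http://b.example": 0.7, "http://c.example": 0.2}


def good_users(visits, pages, threshold=0.5):
    """Join visits to pages by URL, average pagerank per user, filter by threshold."""
    ranks_by_user = defaultdict(list)
    for user, url, _time in visits:
        url = canonicalize(url)
        if url in pages:  # inner join on url
            ranks_by_user[user].append(pages[url])
    averages = {user: mean(ranks) for user, ranks in ranks_by_user.items()}
    return {user: avg for user, avg in averages.items() if avg > threshold}


result = good_users(visits, pages)
print(sorted(result))  # ['amy']
```

Each Pig statement corresponds to one step here: the two `load` lines are the sample data, `join` is the dictionary lookup, `group` plus `AVG` is the per-user mean, and `filter` is the final threshold test.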
![Page 20: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/20.jpg)
Next Generation Hadoop(0.23)
YARN(Next MapReduce Framework)
HDFS Federation
![Page 21: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/21.jpg)
NoSQL
Scale-out
Key/value, Document, Simple Column
Schema Free
Big Data x86
Eventually consistent / BASE (not ACID)
Simple API
Twitter: Cassandra, HBase, Hadoop, Scribe, FlockDB, Redis
Facebook: Cassandra, HBase, Hadoop, Scribe, Hive
Netflix: Amazon SimpleDB, Cassandra
Digg: Cassandra
SimpleGeo: Cassandra
StumbleUpon: HBase, OpenTSDB
Yahoo!: Hadoop, HBase, PNUTS
Rackspace: Cassandra
DAUM: MongoDB
NCSoft: Cassandra
CAP (Brewer's Conjecture)
![Page 22: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/22.jpg)
NoSQL: Cloudata/HBase
Distributed Data Storage
Semi-structured data store (not file system)
Google Bigtable clone
Data Model, Architecture, Features
Open source
http://www.cloudata.org
Goal
500 nodes
300 GB/node, petabytes
Create, drop, modify table schema
Single row operation
Multi row operation: like, between
Scanner, Direct Uploader, MapReduce Adapter
Automatic table split & re-assignment
(Hadoop)
Failover
![Page 23: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/23.jpg)
![Page 24: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/24.jpg)
seenal.com
![Page 25: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/25.jpg)
![Page 26: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/26.jpg)
![Page 27: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/27.jpg)
10.29
![Page 28: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/28.jpg)
![Page 29: 제1회 Korea Community Day 발표자료 Bigdata](https://reader034.vdocuments.site/reader034/viewer/2022052618/54c6d5d94a7959395d8b4574/html5/thumbnails/29.jpg)