hadoop and subsystems in livedoor #hcj11f
TRANSCRIPT
![Page 1: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/1.jpg)
Hadoop and Subsystems
in livedoor
Hadoop Conference Japan 2011 Fall
2011/09/26 tagomoris
Monday, September 26, 2011
![Page 2: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/2.jpg)
![Page 3: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/3.jpg)
we are hiring!
![Page 4: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/4.jpg)
what's livedoor?
![Page 5: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/5.jpg)
![Page 6: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/6.jpg)
large scale web services
2800+ servers
3200+ hosts
530+ web servers
![Page 7: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/7.jpg)
20 Aug 2009
http://www.amazon.co.jp/dp/4797354364
![Page 8: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/8.jpg)
Aug 2011
15 Gbps (10 Gbps + CDN 5 Gbps)
![Page 9: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/9.jpg)
Hadoop in livedoor
• 10 nodes (1+9)
• 36 cores, 32TB HDFS
• CDH3b2
• with libhdfs, fuse-hdfs
• Hive 0.6.0 (community package)
![Page 10: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/10.jpg)
Hadoop in livedoor
data mining
reporting
page views, unique users,
traffic amount per page,
...
![Page 11: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/11.jpg)
super large scale
'sed | grep | wc' with
Hadoop Streaming + Hive
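
A minimal sketch of the 'grep | wc' idea as a Hadoop Streaming job. The jobs in this talk are written in Perl; Python is used here only for illustration. The log pattern, the "matched" key, and the "map"/"reduce" argument convention are assumptions, not taken from the slides.

```python
import re
import sys

# Hypothetical pattern: requests for paths under /archives/ in access-log lines.
PATTERN = re.compile(r'"GET /archives/')


def mapper(lines):
    """Emit one 'key<TAB>1' record per matching log line (the 'grep' step)."""
    for line in lines:
        if PATTERN.search(line):
            yield "matched\t1"


def reducer(records):
    """Sum the counts per key (the 'wc -l' step). Expects records sorted by key."""
    current, total = None, 0
    for record in records:
        key, value = record.rstrip("\n").split("\t", 1)
        if current is not None and key != current:
            yield f"{current}\t{total}"
            total = 0
        current = key
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"


# Runs only when a step name is passed, e.g. "script.py map" or "script.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Both steps read stdin and write stdout, so the same script can be given as the mapper and the reducer command of a streaming job, or tested locally with a shell pipe through sort.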
![Page 12: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/12.jpg)
httpd logs
from 96 servers (apache / nginx)
580GB/day (raw)
![Page 13: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/13.jpg)
overview
[Diagram: web servers, scribe, hadoop streaming (perl), and hive; stages labeled load and insert; schedules labeled hourly, on demand, and hourly/daily]
![Page 14: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/14.jpg)
topics
• log delivery network with scribe
• and 'scribeline'
• hive client web application 'shib'
![Page 15: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/15.jpg)
overview
[Diagram: web servers, scribe, hadoop streaming (perl), and hive; stages labeled load and insert; schedules labeled hourly, on demand, and hourly/daily]
![Page 16: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/16.jpg)
scribe
log delivery daemon
based on Thrift
scalable, reliable
supports HDFS
https://github.com/facebook/scribe
![Page 17: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/17.jpg)
scribe nodes
[Diagram: six servers; scribed on the deliver, central, and backup nodes; destinations HDFS, disk (backup), and disk (archive)]
![Page 18: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/18.jpg)
deliver node traffic
![Page 19: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/19.jpg)
scribe nodes
[Diagram: six servers; scribed on the deliver, central, and backup nodes; destinations HDFS, disk (backup), and disk (archive)]
![Page 20: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/20.jpg)
what we want
from scribe agent
• easy to deploy
• works w/o any httpd configurations
• delivery target failover/takeback
• lightweight (without JVM)
• stable
![Page 21: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/21.jpg)
scribe nodes
[Diagram: six servers, with scribeline added; scribed on the deliver, central, and backup nodes; destinations HDFS, disk (backup), and disk (archive)]
![Page 22: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/22.jpg)
scribeline
log delivery agent tool
python 2.4, thrift
easy to set up and start/stop
works without any httpd configurations
works with logrotate-ed log files
automatic delivery target failover/takeback
https://github.com/tagomoris/scribe_line
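
One way to picture "failover/takeback" is the sketch below. It is not scribeline's actual code: the class name, the `is_alive` health check, and the retry interval are all hypothetical. The agent prefers the primary target, switches to the secondary when the primary is down, and returns to the primary once a periodic re-check succeeds.

```python
import time


class TargetSelector:
    """Prefer the primary target, fail over to the secondary,
    and take back the primary once it is reachable again."""

    def __init__(self, primary, secondary, is_alive,
                 retry_interval=60.0, clock=time.monotonic):
        self.primary = primary
        self.secondary = secondary
        self.is_alive = is_alive              # hypothetical check: target -> bool
        self.retry_interval = retry_interval  # seconds between primary re-checks
        self.clock = clock
        self.current = primary
        self.last_primary_check = clock()

    def choose(self):
        now = self.clock()
        if self.current == self.primary:
            if not self.is_alive(self.primary):
                # Failover: the primary is down, switch to the secondary.
                self.current = self.secondary
                self.last_primary_check = now
        elif now - self.last_primary_check >= self.retry_interval:
            # Takeback: re-check the primary only once per retry interval.
            self.last_primary_check = now
            if self.is_alive(self.primary):
                self.current = self.primary
        return self.current
```

Calling `choose()` before each send keeps the decision in one place; the injected clock makes the timing testable without sleeping.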
![Page 23: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/23.jpg)
how to set up scribeline
in livedoor
1. yum install scribeline
   (tar xzf && cd && sudo make install)
2. vi /etc/scribeline.conf
   blog /var/log/httpd/access_log
   blogimg /var/log/nginx/access_log
3. /etc/init.d/scribeline start
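
The two config lines in step 2 look like "category, then log file path". A rough sketch of reading such a file follows. The format is inferred only from those two example lines, and the function name and the handling of blank lines and '#' comments are assumptions rather than scribeline's documented behavior.

```python
def parse_scribeline_conf(text):
    """Parse lines of the assumed form '<category> <log file path>'."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skipping blank lines and '#' comments is a convenience assumed here.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(
                f"line {lineno}: expected '<category> <path>', got {raw!r}"
            )
        category, path = parts
        entries.append((category, path))
    return entries


# The two entries shown on the slide.
CONF = """\
blog /var/log/httpd/access_log
blogimg /var/log/nginx/access_log
"""

for category, path in parse_scribeline_conf(CONF):
    print(category, "<-", path)
```

Each pair maps one scribe category to one log file to be tailed.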
![Page 24: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/24.jpg)
scribe nodes
[Diagram: six servers; scribed on the deliver, central, and backup nodes; destinations HDFS, disk (backup), and disk (archive)]
![Page 25: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/25.jpg)
overview
[Diagram: web servers, scribe, hadoop streaming (perl), and hive; stages labeled load and insert; schedules labeled hourly, on demand, and hourly/daily]
![Page 26: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/26.jpg)
what we want
about hive client
• easy to experiment
• from PC on our desks
• result caching
• protection against data loss
• friendly look & feel
![Page 27: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/27.jpg)
shib
hive client web application
node.js, thrift, kyoto tycoon
query history browser
query editor, based on copy&paste
result caching & download tsv/csv
filter INSERT/DROP/CREATE ...
https://github.com/tagomoris/shib
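
The statement filter can be sketched as below. shib itself is a node.js application, so this Python version is only an illustration of the idea, not its implementation. The blocked list covers just the three statement types named on the slide (which ends with "..."), and the function names are hypothetical.

```python
import re

# Only the statement types named on the slide; the real list may be longer.
BLOCKED = ("INSERT", "DROP", "CREATE")


def first_keyword(query):
    """Return the first keyword, upper-cased, skipping '--' comment lines."""
    lines = [l for l in query.splitlines() if not l.strip().startswith("--")]
    match = re.match(r"\s*([A-Za-z]+)", "\n".join(lines))
    return match.group(1).upper() if match else ""


def is_allowed(query):
    """Reject empty queries and queries starting with a blocked statement type."""
    keyword = first_keyword(query)
    return bool(keyword) and keyword not in BLOCKED


print(is_allowed("SELECT count(*) FROM access_log"))
print(is_allowed("DROP TABLE access_log"))
```

Checking only the leading keyword is a simplification; it guards against accidental data loss from the editor, not against a determined user.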
![Page 28: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/28.jpg)
![Page 29: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/29.jpg)
shib system overview
[Diagram: browser, shib, KT storage, hive server, and hadoop cluster]
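
Result caching in front of the hive server could look roughly like this. It is a sketch under stated assumptions: a plain dict stands in for the KT (Kyoto Tycoon) storage, `run_query` is a hypothetical callable that executes a query, and keying by a digest of the whitespace-normalized query text is a choice made here, not something the slides specify.

```python
import hashlib


class ResultCache:
    """Cache query results keyed by a digest of the normalized query text."""

    def __init__(self, run_query, store=None):
        self.run_query = run_query  # hypothetical callable: query text -> rows
        # A dict stands in for the key-value storage behind the web application.
        self.store = {} if store is None else store

    @staticmethod
    def key(query):
        # Collapse whitespace so reformatted copies of a query share one entry.
        # This also collapses whitespace inside string literals (a simplification).
        normalized = " ".join(query.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def fetch(self, query):
        k = self.key(query)
        if k not in self.store:
            # Cache miss: run the query once and keep the result.
            self.store[k] = self.run_query(query)
        return self.store[k]
```

A repeated query is then answered from storage without starting another MapReduce job.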
![Page 30: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/30.jpg)
what shib cannot do now
• access control
• graph & chart
• hive 0.7.0+ features support
• database, authentication and ...
• mapreduce status notification
![Page 31: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/31.jpg)
what we are trying now
• New cluster
  • more nodes
  • CDH3b2 + Hive 0.6.0 -> CDH3u1
• New tools
  • Hoop (instead of fuse-hdfs)
  • Any stream processing framework
![Page 32: Hadoop and subsystems in livedoor #Hcj11f](https://reader034.vdocuments.site/reader034/viewer/2022051412/54b767a04a795957768b460b/html5/thumbnails/32.jpg)
thanks!