Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience
黃振修 (Chris Huang), SPN Proactive Cloud Malware Interception Architect
About Me
• SPN proactive cloud malware interception architect
• SPN Hadoop computing infrastructure architect
• Speaker, Hadoop in Taiwan 2013
• Active member of Hadoop.TW
9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 2
The Journey to Big Data
Yesterday
~40 Hadoop nodes
~15 Service/user accounts
3 Teams
<50 TB storage
<100 Jobs per day
Today
~200 Hadoop nodes
~130 Service/user accounts
11 Teams
~500 TB storage
>16000 Jobs per day
1 MapReduce job submitted every 5.4 seconds
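The 5.4-second figure is simply the daily job count spread evenly over one day (86,400 s / 16,000 jobs). A minimal Python check, assuming jobs arrive at an even rate (the function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def seconds_between_events(events_per_day: int) -> float:
    """Average interval between events, assuming they are spread evenly over a day."""
    return SECONDS_PER_DAY / events_per_day


print(seconds_between_events(16_000))  # → 5.4
```

Since the slide reports more than 16,000 jobs per day, 5.4 s is an upper bound on the average interval between submissions.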
Why?
Raw Data → Actionable Intelligence
Collaboration in the underground
Internet threats are growing explosively
New Unique Malware Discovered
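All kinds of malware variants, spam, and downloads from unknown sources: these Internet threats evade detection by traditional security systems and keep growing explosively, posing a serious security risk.
1M unique malware samples every month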
Reality Check
Chart: New unique threats per hour (worldwide estimate*), 2007 to 2011; values shown: 2,400 and 12,600 (2011), i.e. a NEW threat every 0.28 seconds
Chart: Threats found in enterprises (real-world data from 150+ assessments*): network worms, IRC bots, data-stealing malware, targeting malware; values shown: 42%, 56%, 77%, 100%; axes: complexity, danger
Dangerous risks, skyrocketing volume, avoiding detection
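Assuming the 12,600 value is the 2011 per-hour estimate, the per-threat interval works out as follows (the slide's 0.28 s appears to be this value truncated, rather than rounded, to two decimals):

```latex
\frac{3600\ \text{s/hour}}{12\,600\ \text{new threats/hour}} \approx 0.286\ \text{s per new threat}
```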
• 52% of companies failed to report or remediate a cyber breach in 2011 (SAIC, 2011)
• Two new pieces of malware are created every second (Trend Micro, 2012)
• A cyber intrusion occurs every 5 minutes (US-CERT, 2012)
The traditional approach is no longer sufficient!
Big Data Exploration
A new approach to cyber threat solutions
Web Crawler
Trend Micro Endpoint Protection
Trend Micro Mail Protection
Trend Micro Web Protection
Honeypot
CDN / xSP
Researcher Intelligence
3+ Billion Worldwide Sensors
SPN: Smart Protection Network
Collects
Protects
Identifies
Big Data Analytics (data mining, machine learning, modeling, correlation)
Daily stats:
• 7.2 TB of data correlated
• 1B IP addresses
• 90K malicious threats identified
• 100M+ good files
SPN High-Level Architecture
Data sourcing: CDN/xSP logs, honeypot, SPN feedback
Receiver
Trend Message Exchange (message bus)
Hadoop Distributed File System (HDFS)
HBase, MapReduce
Ad-hoc query (Pig)
Oozie
Service delivery:
MySPN Platform
SolrCloud
API Server / Portal
Service Platform
APP 1, APP 2
MySPN Ecosystem
Diagram labels: Portal & API (single entry point); SPN Infrastructure; APT KB Service; TopCVE Service; APT KB, VE DB, FB Logs, Census; MySPN Marketplace; Service Platform; SSO; Monitor SDK; Dashboard; Service Catalog; Profile Alert; Dispatcher; application tiles (All My, Guard, Threat Connect, New App); roles (Trender, Customer, OPS / RD team); solution workflow (Need, Develop, Implement, Operate, Publish, Access, Login), backed by a data catalogue
SPN Solution Architecture
File
Web / URL
Domain
IP
File Reputation Service
Email Reputation Service
Web Reputation Service
Customer Smart Protection
Community Intelligence (feedback loop)
Sourcing → Processing & Analysis → Validate & Create Solution → Quality Assurance → Solution Distribution → Solution Adoption
SPN Correlation
Big Data Case Study
Internet, Web Server, SPN Cloud
1. Intercept URL
4. Access page
Case Study: Web Reputation Services
200K+ new URLs created every day
8+ billion URLs processed daily
Pipeline stages: user traffic / sourcing (CDN vendor), rating server for known threats, unknown & prefilter, page download, threat analysis
• 8 billion URLs/day
• 4.8 billion URLs/day (40% filtered)
• 860 million URLs/day (82% filtered)
• 25,000 malicious URLs/day (99.98% filtered)
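The funnel percentages can be checked against the daily volumes. This short Python sketch (the helper name `percent_filtered` is illustrative) computes the share removed between consecutive volumes: it reproduces the 40% and 82% figures, while the last step comes out near 99.997%, so the slide's 99.98% is presumably measured against a base the slide does not show.

```python
def percent_filtered(volumes):
    """Percentage of items removed between each pair of consecutive stage volumes."""
    return [100.0 * (1 - after / before) for before, after in zip(volumes, volumes[1:])]


# Daily volumes from the slide: 8B URLs in, 4.8B, 860M, 25,000 malicious out.
daily = [8_000_000_000, 4_800_000_000, 860_000_000, 25_000]

for pct in percent_filtered(daily):
    print(f"{pct:.3f}% filtered")  # → 40.000%, 82.083%, 99.997% filtered
```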
Trend Micro Products / Technology
CDN cache, high-throughput web service, Hadoop cluster, web crawling, machine learning, data mining
Technology, process, operation
Block a malicious URL within 15 minutes of it going online!
WRS Architecture Overview
Big Data Lessons Learned
How to Scale?
• Store data unstructured first
• If you really need structured data:
– Use Google Protocol Buffers, or
– JSON strings
• Purify (clean) your data before processing
• Leverage HBase more
– Design row keys carefully to prevent hot-spotting
• Use MapReduce to build Lucene indexes
• Leverage SolrCloud for complex real-time use cases
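A minimal Python sketch of three of these tips applied to log records, under stated assumptions: the tab-separated input layout (timestamp, URL, client IP), the helper names, and the 16-bucket salt are all hypothetical, not SPN's actual schema. Malformed lines are dropped before processing, each surviving record is kept as a JSON string, and the HBase row key is prefixed with a hash-derived bucket so that time-ordered writes spread across several key ranges instead of hot-spotting a single region.

```python
import hashlib
import json
from typing import Optional

NUM_BUCKETS = 16  # hypothetical; often sized to the number of regions / region servers


def clean_record(line: str) -> Optional[dict]:
    """Purify one raw log line; return None for records that should be dropped.

    Assumed (hypothetical) layout: "<epoch_seconds>\t<url>\t<client_ip>".
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None  # wrong number of fields
    ts, url, ip = (p.strip() for p in parts)
    if not ts.isdigit() or not url.lower().startswith(("http://", "https://")):
        return None  # non-numeric timestamp or unsupported URL scheme
    return {"ts": int(ts), "url": url, "ip": ip}


def to_json(record: dict) -> str:
    """Serialize a record as a compact, key-sorted JSON string (one line per record)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def salted_row_key(url: str, ts: int) -> str:
    """Build a row key '<bucket>|<zero-padded ts>|<url>'.

    Without the bucket prefix, keys led by a timestamp all land at the end of the
    key space (one hot region). The bucket comes from a hash of the URL, so the
    same URL always maps to the same bucket while writes spread over NUM_BUCKETS.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    return f"{bucket:02d}|{ts:010d}|{url}"


raw = [
    "1378771200\thttp://example.com/a\t10.0.0.1",
    "garbage line",
    "1378771201\tftp://example.com/b\t10.0.0.2",
]
for line in raw:
    rec = clean_record(line)
    if rec is not None:
        print(salted_row_key(rec["url"], rec["ts"]), to_json(rec))
```

The trade-off of salting is that a time-range scan must now issue one scan per bucket (16 here) and merge the results.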
Our Learning
• Have a clear strategy first
• Start small, scale quickly
• Choose the right solution for the right problem
Q&A
Big Challenge, Big Opportunity
Thank You