Apache Eagle @ IEEE International Conference
![Page 1: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/1.jpg)
EAGLE: User Profile-based Anomaly Detection for Securing Hadoop Clusters
01 NOV, 2015
CHAITALI GUPTA, RANJAN SINHA, YONG ZHANG
![Page 2: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/2.jpg)
Outline
Why EAGLE?
Architecture of EAGLE
User Profiles in EAGLE
Experiments
Performance Results
Future Work
![Page 3: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/3.jpg)
Big Data @ eBay
800M Listings*
159M Global Active Buyers *
*Q3 2015 data
7 Hadoop Clusters*
800M HDFS operations (single cluster)*
120 PB Data*
![Page 4: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/4.jpg)
Motivation
Who is accessing the data?
What data are they accessing?
Is someone trying to access data that they don’t have access to?
Are there any anomalous access patterns?
Is there a security threat?
How can we monitor and get notified during, or prior to, an anomalous event?
![Page 5: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/5.jpg)
ARCHITECTURE
[Figure: EAGLE architecture diagram. Labeled components: Data Collector; Kafka; HDFS, Audit, Security; Stream Processing Engine; Metadata Manager; Data Stores; Remediation Engine (Apache Ranger, custom module); Machine Learning Module; Machine Learning Training Module; Real Time Alert Dashboard; HDFS Archive; Admin Console. Labeled flows: Alerts; Activities; Policy, Thresholds, User properties; ML Thresholds; Insights; Metadata Management. Roles: Security Analyst, Security Engineer.]
![Page 6: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/6.jpg)
USER PROFILE ALGORITHMS
Density Estimation
• Compute mean and standard deviation
• Compute probability density estimation
• Flag an anomaly if the probability density falls below the minimum probability density seen in the training set
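The three steps above can be sketched as follows. This is a minimal illustration, not EAGLE's implementation: it assumes each feature follows an independent one-dimensional Gaussian, and the function names (`fit_gaussian`, `density`, `train_profile`, `is_anomaly`) and the sample data are hypothetical.

```python
import math

def fit_gaussian(rows):
    """Per-feature mean and standard deviation of the training rows."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(max(math.sqrt(var), 1e-9))  # guard against zero spread
    return means, stds

def density(row, means, stds):
    """Product of independent 1-D Gaussian densities (an assumption)."""
    p = 1.0
    for x, mu, sigma in zip(row, means, stds):
        z = (x - mu) / sigma
        p *= math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return p

def train_profile(rows):
    """Fit the model and record the lowest density seen in training."""
    means, stds = fit_gaussian(rows)
    threshold = min(density(r, means, stds) for r in rows)
    return means, stds, threshold

def is_anomaly(row, profile):
    """Anomalous when the density is below the training minimum."""
    means, stds, threshold = profile
    return density(row, means, stds) < threshold

# Hypothetical per-minute operation counts for one user.
train = [[10, 2], [12, 3], [11, 2], [9, 1], [10, 3]]
profile = train_profile(train)
```

With many features the product of densities can underflow, so a practical version would likely sum log-densities instead.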
![Page 7: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/7.jpg)
USER PROFILE ALGORITHMS…
Eigen Value Decomposition
• Compute mean and variance
• Compute Eigen Vectors and determine Principal Components
• Normal data points lie near first few principal components
• Abnormal data points lie further from first few principal components and closer to later components
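The idea above can be sketched with a small example. This is an illustration under stated assumptions, not EAGLE's code: the eigenvectors are found by power iteration with deflation (a stand-in for a full eigenvalue decomposition), the anomaly score is the distance from a point to the subspace of the first k components, and the threshold (largest training score) is a hypothetical choice.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return mu, [[r[j] - mu[j] for j in range(d)] for r in rows]

def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=500):
    """First k eigenvectors via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Remove the found component so the next pass finds the following one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

def residual_distance(row, mu, comps):
    """Distance from the point to the span of the given components."""
    x = [a - b for a, b in zip(row, mu)]
    for v in comps:
        proj = sum(a * b for a, b in zip(x, v))
        x = [a - proj * b for a, b in zip(x, v)]
    return math.sqrt(sum(a * a for a in x))

# Hypothetical training data lying close to one direction.
train = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 8.0], [5, 9.9]]
mu, centered = center(train)
comps = top_components(covariance(centered), k=1)
threshold = max(residual_distance(r, mu, comps) for r in train)
```

A new point whose `residual_distance` exceeds `threshold` lies far from the first principal component and would be treated as abnormal in this sketch.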
![Page 8: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/8.jpg)
USER PROFILE ARCHITECTURE
![Page 9: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/9.jpg)
EXPERIMENTAL METHODOLOGY
User Population
• 1,500 eBay users accessing Hadoop clusters
Features
• HDFS operation frequencies aggregated across one-minute intervals
• Examples: command frequencies, time of the job
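The one-minute aggregation of command frequencies could look like the sketch below. The event tuple layout `(epoch_seconds, user, command)`, the function name `minute_features`, and the command names are assumptions chosen for illustration; the slides do not specify the audit-log schema.

```python
from collections import Counter, defaultdict

def minute_features(events, commands):
    """Count each command per (user, minute) and emit fixed-order vectors.

    events: iterable of (epoch_seconds, user, command) tuples (hypothetical).
    commands: ordered list of command names that define the vector layout.
    """
    counts = defaultdict(Counter)
    for ts, user, cmd in events:
        minute = int(ts) // 60  # bucket index of the one-minute interval
        counts[(user, minute)][cmd] += 1
    return {key: [c[cmd] for cmd in commands] for key, c in counts.items()}

# Illustrative command names and events.
commands = ["open", "delete", "rename"]
events = [
    (0, "alice", "open"),
    (10, "alice", "open"),
    (59, "alice", "delete"),
    (60, "alice", "rename"),
    (5, "bob", "open"),
]
features = minute_features(events, commands)
```

Each resulting vector is one training or test row for a user profile; commands missing from an interval count as zero.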
![Page 10: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/10.jpg)
EXPERIMENTAL METHODOLOGY…
Determine users who are behaviorally different
• Compute the Mahalanobis distance between users' data, using the mean and standard deviation
• Compute clusters
• For each user, use the behaviorally different users as the cross-validation set
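A rough sketch of the selection step follows. It assumes a diagonal-covariance form of the Mahalanobis distance (per-feature mean and standard deviation only) and replaces the clustering step with a simple distance cutoff; both are simplifications, and the names `mahalanobis_diag`, `different_users`, and `min_distance` are hypothetical.

```python
import math

def mahalanobis_diag(x, means, stds):
    """Mahalanobis distance assuming uncorrelated features."""
    return math.sqrt(sum(((a - m) / s) ** 2
                         for a, m, s in zip(x, means, stds)))

def different_users(target, profiles, min_distance):
    """Users whose mean vector is far from the target user's distribution.

    profiles: {user: (means, stds)}, one entry per user (hypothetical layout).
    """
    t_means, t_stds = profiles[target]
    return sorted(
        user for user, (means, _) in profiles.items()
        if user != target
        and mahalanobis_diag(means, t_means, t_stds) >= min_distance
    )

# Illustrative profiles: per-feature (means, stds) for three users.
profiles = {
    "alice": ([10.0, 2.0], [1.0, 1.0]),
    "bob": ([11.0, 2.0], [1.0, 1.0]),
    "carol": ([50.0, 30.0], [5.0, 5.0]),
}
cross_validation_users = different_users("alice", profiles, min_distance=3.0)
```

Data from the selected users can then stand in for anomalous activity when evaluating the target user's profile.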
![Page 11: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/11.jpg)
PERFORMANCE RESULTS
Sensitivity
![Page 12: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/12.jpg)
FUTURE WORK
• Apache incubation releases
• Twitter feed: https://twitter.com/theapacheeagle
• Extend to Hive, HBase, Pig and other big data technologies
• Explore alternative algorithms
• Consider more features
![Page 13: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/13.jpg)
APACHE EAGLE - OPEN SOURCE
Eagle Site: http://goeagle.io
Tech Blog: http://www.ebaytechblog.com
GitHub Repo: https://github.com/eBay/Eagle
Apache Incubator Project: Oct 26, 2015
![Page 14: Apache Eagle @ IEEE International Conference](https://reader036.vdocuments.site/reader036/viewer/2022062412/589b9f6f1a28abd63e8b5dc9/html5/thumbnails/14.jpg)
Thank You!