A Deep Learning Approach for IP Hijack Detection Based on ASN Embedding
TAL SHAPIRA & YUVAL SHAVITT
8/18/20
School of Electrical Engineering
NetAI 2020
Intro - Autonomous Systems
- Autonomous System (AS): a collection of physical networks glued together using IP that shares a unified administrative routing policy and has been assigned a number (ASN, 32 bits)
  - ISP internal networks: Verizon – 701, 702, 703 …; Level 3 – 3356, 3549 …
  - Campus networks: University of Delaware – 2, MIT – 3
  - Corporate networks: Intel – 4983
  - Content providers: Google – 15169, 16591 …; Facebook – 32934, 63293 …
- The Border Gateway Protocol (BGP) coordinates inter-AS routing in the Internet
  - BGP update messages list the entire AS path to reach an IP address prefix (AP)
  - BGP is a policy-based routing protocol
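To make the AS-path idea concrete, here is a minimal Python sketch that parses an AS path and reads off its origin AS (the last ASN on the path, i.e., the AS that announced the prefix). It assumes the path arrives as a space-separated string of plain ASNs, as many BGP dump tools print it, with no AS_SET segments; the function names and the example path are hypothetical. Consecutive repeats caused by AS-path prepending are collapsed.

```python
from typing import List


def parse_as_path(path_str: str) -> List[int]:
    """Parse a space-separated AS path such as '701 3356 15169'.

    Assumes plain ASNs only (no AS_SET braces) and collapses consecutive
    repeats caused by AS-path prepending.
    """
    asns: List[int] = []
    for token in path_str.split():
        asn = int(token)
        if not 0 <= asn < 2**32:  # ASNs are 32-bit unsigned integers
            raise ValueError(f"invalid ASN: {token}")
        if not asns or asns[-1] != asn:  # drop prepended duplicates
            asns.append(asn)
    return asns


def origin_as(path_str: str) -> int:
    """The origin AS is the last hop of the path: the AS announcing the prefix."""
    return parse_as_path(path_str)[-1]


# Hypothetical path ending at Google's ASN, with the origin prepended twice.
path = "701 3356 15169 15169 15169"
print(parse_as_path(path))  # [701, 3356, 15169]
print(origin_as(path))      # 15169
```

A prefix hijack shows up in exactly this field: the path ends in an origin AS that is not the prefix's legitimate owner.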
Intro – IP Hijack
- Prefix hijacking in a nutshell: an AS other than the legitimate owner originates the prefix
- More than 40% of network operators reported that their organization had been the victim of a hijack in the past
- What’s to stop someone else?
  - BGP does not verify that the originating AS is authorized to announce the prefix
  - Registries of prefix ownership are inaccurate
- How is it done?
  - Sub-prefix hijack (e.g., announcing 1.1.1.0/24 instead of 1.1.0.0/22); the more specific prefix wins longest-prefix matching
  - Path shortening (BGP may choose a path based on its cost and length)
  - Adding a legitimate AS at the end of the path, so that it is hard to tell that the AS path is bogus
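Sub-prefix hijacks work because routers forward packets along the longest (most specific) matching prefix. The sketch below uses only Python's standard `ipaddress` module on a hypothetical two-entry routing table: a legitimate /22 and a hijacker's /24 inside it. The origin ASNs come from the documentation range (RFC 5398) and are placeholders; `lookup` is a hypothetical helper, not a router API.

```python
import ipaddress
from typing import Optional

# Toy routing table: prefix -> origin AS (hypothetical documentation ASNs).
routes = {
    ipaddress.ip_network("1.1.0.0/22"): 64500,  # legitimate announcement
    ipaddress.ip_network("1.1.1.0/24"): 64511,  # hijacker's more-specific prefix
}


def lookup(dst: str) -> Optional[int]:
    """Return the origin AS of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # most specific wins
    return routes[best]


print(lookup("1.1.1.1"))  # 64511: the /24 wins, traffic goes to the hijacker
print(lookup("1.1.2.1"))  # 64500: outside the /24, still reaches the owner
print(lookup("8.8.8.8"))  # None: no matching route in this toy table
```

Because the /24 is strictly more specific, the legitimate /22 announcement cannot win it back by path length or policy; only the addresses outside the /24 keep flowing to the owner.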
Example – April 2010, China Telecom
Previous Approaches
- Prevention solutions (or proactive solutions):
  - Based on cryptographic authentication: RPKI [1] and BGPsec [2]
  - Operators are reluctant to deploy them due to technical and financial costs
- Detection solutions, classified by the type of information they use:
  - Control-plane approaches [3] (passive solutions): based on a distributed set of BGP monitors and route collectors
  - Data-plane approaches [4]: rely only on real-time data-plane information obtained from multiple sensors that perform active probing (pings/traceroutes)
  - Hybrid approaches [5]
- Most previous detection solutions rely on:
  - Feature engineering + an ML algorithm [6]
  - Heuristic assumptions (e.g., valley-free (VF) routing)
1. [Geoff Huston and Randy Bush 2011] “Securing BGP and SIDR” (RPKI)
2. [Matt Lepinski and K. Sriram 2017] “BGPsec Protocol Specification”
3. [Sermpezis et al. 2018] “ARTEMIS: Neutralizing BGP Hijacking within a Minute”
4. [Zhang et al. 2008] “iSPY: Detecting IP Prefix Hijacking on My Own”
5. [Schlamp et al. 2016] “HEAP: Reliable Assessment of BGP Hijacking Attacks”
6. [Cho et al. 2019] “BGP Hijacking Classification”
‘Valley Free’ Routing
- Routing rules:
  - A provider accepts everything
  - A peer accepts only traffic for its own customers
- Path properties:
  - Up, then down
  - No up-down-up
  - At most one peer-to-peer (P2P) step, and only at the top
[Gao 2001] “On inferring autonomous system relationships in the Internet”
[Figure: a valid path and an invalid path over ASes A, B, C, D]
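The path properties above can be checked mechanically once link relationships are known. The following is a minimal sketch, assuming relationships are stored in a dictionary keyed by directed AS pairs (a hypothetical representation; in practice they would be inferred, e.g., with the algorithm of [Gao 2001]) and that the path is listed in the direction traffic flows. The helper names and the topology in the example are hypothetical and do not reproduce the slide's figure.

```python
from typing import Dict, List, Tuple

# Relationship of the directed link (a, b), seen from a:
#   "c2p": a is a customer of b (an "up" step)
#   "p2c": a is a provider of b (a "down" step)
#   "p2p": a and b are peers (a flat step)
Relationships = Dict[Tuple[str, str], str]


def add_customer(rel: Relationships, customer: str, provider: str) -> None:
    rel[(customer, provider)] = "c2p"
    rel[(provider, customer)] = "p2c"


def add_peers(rel: Relationships, a: str, b: str) -> None:
    rel[(a, b)] = rel[(b, a)] = "p2p"


def is_valley_free(path: List[str], rel: Relationships) -> bool:
    """Up, then down, with at most one P2P step, and only at the top.

    Raises KeyError if a link on the path has no known relationship.
    """
    past_top = False  # set once we take a P2P or P2C step
    for a, b in zip(path, path[1:]):
        step = rel[(a, b)]
        if step == "c2p":
            if past_top:      # going up again after the top: a valley
                return False
        elif step == "p2p":
            if past_top:      # a second P2P step, or P2P after going down
                return False
            past_top = True
        elif step == "p2c":
            past_top = True
        else:
            raise ValueError(f"unknown relationship: {step!r}")
    return True


# Hypothetical topology: A and E are customers of B, D and E are customers
# of C, B peers with C, and C peers with F.
rel: Relationships = {}
add_customer(rel, "A", "B")
add_customer(rel, "E", "B")
add_customer(rel, "D", "C")
add_customer(rel, "E", "C")
add_peers(rel, "B", "C")
add_peers(rel, "C", "F")

print(is_valley_free(["A", "B", "C", "D"], rel))  # True: up, flat, down
print(is_valley_free(["B", "E", "C"], rel))       # False: down, then up (E is a valley)
print(is_valley_free(["B", "C", "F"], rel))       # False: two P2P steps
```

This is the kind of heuristic check that VF-based detectors rely on; the approach in these slides deliberately avoids it.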
Motivation
- Use an assumption-free method for IP hijack detection
  - Our method is based only on BGP announcements (i.e., AS-level routes)
  - We introduce the first end-to-end deep learning approach to this problem
- Our goal is a generic approach based on ASN embedding
  - We aim to learn dense representations of ASNs from BGP routes
  - We then apply machine/deep learning techniques to these representations
Method
- An end-to-end deep learning approach:
  - First stage: BGP2Vec, ASN embedding
  - Second stage: IP hijack detection using LSTM networks
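In the second stage, an AS path is read as a sequence of ASN embedding vectors. The slides do not give the layer sizes or the exact architecture, so the following is only a generic forward pass of one LSTM layer in plain Python, with random untrained weights, hypothetical dimensions, and a hypothetical class name `LSTMLayer`. A real model would be built in a deep learning framework and would end in a classification layer over the GREEN/RED classes.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMLayer:
    """Forward pass of one LSTM layer; weights are random (untrained)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input, forget, output, and candidate.
        self.params: Dict[str, Tuple[Matrix, Matrix, Vector]] = {
            gate: (rand(hidden_dim, input_dim), rand(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")
        }

    def _pre(self, gate: str, x: Vector, h: Vector) -> Vector:
        """Pre-activation W x + U h + b for one gate."""
        W, U, b = self.params[gate]
        return [wx + uh + bk for wx, uh, bk in zip(matvec(W, x), matvec(U, h), b)]

    def run(self, xs: List[Vector]) -> Vector:
        """Read a sequence of input vectors and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            i = [sigmoid(z) for z in self._pre("i", x, h)]   # input gate
            f = [sigmoid(z) for z in self._pre("f", x, h)]   # forget gate
            o = [sigmoid(z) for z in self._pre("o", x, h)]   # output gate
            g = [math.tanh(z) for z in self._pre("g", x, h)]  # candidate cell
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


# Hypothetical 4-dimensional embeddings for a 3-hop AS path (not BGP2Vec output).
path_embeddings = [[0.1, -0.2, 0.3, 0.0], [0.05, 0.4, -0.1, 0.2], [-0.3, 0.1, 0.0, 0.1]]
layer = LSTMLayer(input_dim=4, hidden_dim=8)
h_final = layer.run(path_embeddings)
print(len(h_final))  # 8
```

The final hidden state summarizes the whole path in order, which is why a recurrent layer suits variable-length AS paths.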
Datasets
- Route Views [1] BGP announcements (RV), collected in March 2018:
  - 3,600,000 BGP paths
  - 62,525 ASes
  - 113,400 undirected AS links
- Labeled BGP routes:
  - Approximately 2,648,900 standard routes (‘GREEN’) and 47,800 hijacked routes (‘RED’)
  - The labels were generated by a combination of VF algorithms [2] and manual work
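The two classes are highly imbalanced. A quick calculation from the approximate counts above shows why the false-alarm rate matters alongside accuracy: a trivial classifier that labels every route GREEN would already be about 98% accurate.

```python
green = 2_648_900  # standard routes (approximate count from the slide)
red = 47_800       # hijacked routes (approximate count from the slide)

total = green + red
red_share = red / total
# Accuracy of a trivial classifier that labels every route GREEN:
all_green_accuracy = green / total

print(total)                        # 2696700
print(f"{red_share:.2%}")           # 1.77%
print(f"{all_green_accuracy:.2%}")  # 98.23%
```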
1. University of Oregon, Route Views Project, http://www.routeviews.org, March 2018
2. [Shavitt et al. 2009] “Near-Deterministic Inference of AS Relationships” (ND-ToR)
BGP2Vec [1] – ASN Embedding
- Based on word embedding (Word2Vec [2]), broadly used in NLP
- Embedding: representing discrete variables as continuous vectors
- An ASN is characterized by its context, i.e., its neighboring ASNs
- V = 62,525 (number of ASNs), N = 32 (embedding dimension)
[Figure: an example with V = 4]
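The slides do not state BGP2Vec's context window or whether it uses the skip-gram or CBOW variant of Word2Vec. Assuming skip-gram, the training data is a list of (target, context) pairs: each AS path plays the role of a sentence and each ASN the role of a word. The function name, window size, and paths below are hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(paths: List[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (target ASN, context ASN) training pairs from AS paths.

    Every ASN at most `window` hops away from the target is one of its contexts.
    """
    pairs: List[Tuple[str, str]] = []
    for path in paths:
        for i, target in enumerate(path):
            lo, hi = max(0, i - window), min(len(path), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, path[j]))
    return pairs


# Hypothetical AS paths over a vocabulary of V = 4 ASNs labeled A to D.
paths = [["A", "B", "C"], ["D", "B", "C"]]
print(skipgram_pairs(paths, window=1))
# [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B'), ('D', 'B'), ('B', 'D'), ('B', 'C'), ('C', 'B')]
```

A and D never appear on the same path, but both have B as a context, so training on such pairs pushes their vectors closer together; this is the sense in which an ASN is characterized by its neighbors. The pairs would then train a V × N embedding matrix whose rows are the ASN vectors.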
1. [Shapira and Shavitt 2020] “Unveiling the Type of Relationship Between Autonomous Systems Using Deep Learning” (BGP2Vec)
2. [Mikolov et al. 2013] “Distributed Representations of Words and Phrases and their Compositionality” (Word2Vec)
Exploration of ASN Embeddings
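One common way to explore learned embeddings is to query each vector's nearest neighbors by cosine similarity; the transcript does not say which exploration method this slide used. The sketch below assumes embeddings are held in a dictionary from AS label to vector; the vectors, labels, and function names are hypothetical and are not real BGP2Vec output.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(query: str, emb: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k entries whose vectors are most cosine-similar to `query`."""
    scores = [(asn, cosine(emb[query], vec)) for asn, vec in emb.items() if asn != query]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


# Hypothetical 3-dimensional vectors.
emb = {
    "AS_A": [0.9, 0.1, 0.0],
    "AS_B": [0.8, 0.2, 0.1],
    "AS_C": [0.0, 0.1, 0.9],
    "AS_D": [0.1, 0.0, 1.0],
}
print([asn for asn, _ in nearest("AS_A", emb)])  # ['AS_B', 'AS_D']
```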
Training Neural Networks (IP Hijack)
- An LSTM neural network composed of five layers
- Categorical cross-entropy loss function
- The Adam gradient-based optimizer with default hyper-parameters
- We build and run our network using
- We train our network on our labeled dataset
  - We use 20% of the samples as a test set
  - We train for 10 epochs
- Inference time: 0.1 milliseconds on a single Intel CPU
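The loss function and optimizer named above are standard components. The following plain-Python sketch shows categorical cross-entropy on a softmax output and Adam updates with the default hyper-parameters from the original Adam paper (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8; some frameworks use a slightly different ε). To stay short, it optimizes two free logits directly as a stand-in for the network's output layer; the GREEN/RED class order is an assumption, and this is an illustration rather than the authors' training code.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def categorical_cross_entropy(y_true: List[float], y_pred: List[float], eps: float = 1e-12) -> float:
    """-sum_k y_k * log(p_k) for a one-hot target y and predicted probabilities p."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


class Adam:
    """Adam with the default hyper-parameters of Kingma and Ba."""

    def __init__(self, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: Optional[List[float]] = None  # first-moment estimates
        self.v: Optional[List[float]] = None  # second-moment estimates
        self.t = 0

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if self.m is None or self.v is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        new_params = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


y = [1.0, 0.0]  # one-hot label; assumed order: index 0 = GREEN, index 1 = RED
logits = [0.0, 0.0]
opt = Adam()
for _ in range(3):
    p = softmax(logits)
    grads = [pk - yk for pk, yk in zip(p, y)]  # dL/dlogits for softmax + cross-entropy
    logits = opt.step(logits, grads)
print(categorical_cross_entropy(y, softmax(logits)) < math.log(2))  # True
```

The gradient `p - y` is the well-known simplification for a softmax output paired with categorical cross-entropy, which is one reason this loss is the usual choice for a classification head.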
Experiments and Results – IP Hijack
- IP hijack detection based on ASN embedding:
  - 99.99% accuracy, 0.00% false alarms (FA)
  - 50% of our apparent misclassifications were actually errors in the labeled dataset, i.e., our prediction was right and the label was wrong
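For scale, a back-of-the-envelope computation, assuming the 20% test set is drawn from the full labeled dataset of about 2.7 million routes (the slides do not say how it was sampled) and taking 99.99% at face value: the reported accuracy corresponds to only a few dozen misclassified routes.

```python
labeled_routes = 2_648_900 + 47_800        # approximate GREEN + RED counts
test_size = round(0.20 * labeled_routes)   # 20% of the samples held out for testing
implied_errors = test_size * (1 - 0.9999)  # misclassifications at exactly 99.99% accuracy

print(test_size)              # 539340
print(round(implied_errors))  # 54
```

Since 99.99% is itself a rounded figure, anything from roughly 27 to 81 errors would be reported the same way; per the slide, about half of those misclassifications were labeling errors.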
Results on BGP data for Ground Truth Events
- The dataset [1] contains 70 events from February 2008 to July 2018, with an average of 669 AS paths per event
- We correctly classified all the events within 2 years of our training data, i.e., 2/3 of all the valid events
1. [Cho et al. 2019] “BGP hijacking classification”, TMA
Summary
- A novel approach for ASN embedding using deep learning (BGP2Vec)
  - Unsupervised method
  - Based only on BGP announcements, without any side information
  - A building block for many problems
- Achieves excellent results for IP hijack detection
  - Without any assumptions (no ‘VF’)
  - 99.99% accuracy with 0.00% FA on our own proprietary dataset
  - Although our method was trained on a dataset from March 2018, we correctly classified 2/3 of past events and all recent hijack events
- As far as we know, we are the first to employ deep learning for this problem
Thank you for your attention. Questions?
https://www.eng.tau.ac.il/~shavitt/
https://talshapira.github.io/