A Deep Learning Approach for IP Hijack Detection Based on ASN Embedding

TAL SHAPIRA & YUVAL SHAVITT

8/18/20


School of Electrical Engineering

NetAI 2020

Intro - Autonomous Systems

- Autonomous System (AS): a collection of physical networks glued together using IP that has a unified administrative routing policy and has been assigned a number (ASN, 32 bits).
  - ISP internal networks: Verizon – 701, 702, 703 …; Level3 – 3356, 3549 …
  - Campus networks: University of Delaware – 2, MIT – 3
  - Corporate networks: Intel – 4983
  - Content providers: Google – 15169, 16591 …; Facebook – 32934, 63293 …
- Border Gateway Protocol (BGP) coordinates inter-AS routing in the Internet
  - BGP routing update messages list the entire AS path to reach an IP address prefix (AP)
  - Policy-based routing protocol



Intro – IP Hijack

- Prefix hijacking in a nutshell: another AS originates the prefix
- More than 40% of network operators reported that their organization had been a victim of a hijack in the past
- What’s to stop someone else?
  - BGP does not verify that the AS is authorized
  - Registries of prefix ownership are inaccurate
- How to?
  - Sub-prefix hijack (e.g. 1.1.1.1/24 instead of 1.1.1.1/22)
  - Path shortening (BGP may choose a path based on cost and length)
  - Add a legitimate AS at the end of the path (and therefore it’s hard to tell that the AS path is bogus)
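
To make the sub-prefix case concrete, here is a minimal sketch using Python's standard `ipaddress` module. It is not the detection method of this talk (which avoids registry lookups); the function name `check_announcement`, the `expected_origins` table, and the ASNs (taken from the range reserved for documentation) are all assumptions for illustration. The prefixes are written in canonical form (1.1.0.0/22 and 1.1.1.0/24).

```python
import ipaddress


def check_announcement(prefix, origin_asn, expected_origins):
    """Classify one announcement against a table of expected origins.

    expected_origins maps a prefix string to the ASN assumed to own it.
    This table is hypothetical; real registries can be inaccurate.
    """
    announced = ipaddress.ip_network(prefix)
    for known_prefix, owner in expected_origins.items():
        known = ipaddress.ip_network(known_prefix)
        # Skip entries of the other IP version and entries with a matching origin.
        if announced.version != known.version or origin_asn == owner:
            continue
        if announced == known:
            return "same prefix, different origin"
        if announced.subnet_of(known):
            return "sub-prefix, different origin"
    return "no mismatch"


# Documentation-range ASNs, not real networks.
expected = {"1.1.0.0/22": 64500}
print(check_announcement("1.1.1.0/24", 64511, expected))
print(check_announcement("1.1.0.0/22", 64500, expected))
```

A more specific /24 wins over the covering /22 under longest-prefix matching, which is why the sub-prefix case attracts traffic even when the legitimate announcement is still present.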



Example – April 2010, China Telecom

Previous Approaches

- Prevention solutions (or reactive solutions):
  - Based on cryptographic authentication: RPKI [1] and BGPsec [2]
  - Operators are reluctant to deploy them due to technical and financial costs
- Detection solutions, by the type of information used:
  - Control-plane approaches [3] (passive solutions): based on a distributed set of BGP monitors and route collectors
  - Data-plane approaches [4]: rely only on real-time data-plane information obtained from multiple sensors that perform active probing (pings/traceroutes)
  - Hybrid approaches [5]
- Most previous detection solutions rely on:
  - Feature engineering + ML algorithm [6]
  - Heuristic assumptions (e.g. VF)

1. [Geoff Huston and Randy Bush 2011] “Securing BGP and SIDR” (RPKI)
2. [Matt Lepinski and K Sriram 2017] “BGPSEC protocol specification”
3. [Sermpezis et al. 2018] “ARTEMIS: Neutralizing BGP Hijacking within a Minute”
4. [Zhang et al. 2008] “Ispy: detecting IP prefix hijacking on my own”
5. [Schlamp et al. 2016] “HEAP: reliable assessment of BGP hijacking attacks”
6. [Fontugne et al. 2019] “BGP hijacking classification”


‘Valley Free’ Routing

- Routing rules:
  - Provider accepts everything
  - Peer only if it is for its customers
- Path properties:
  - Up then down
  - No up-down-up
  - At most 1 P2P step (and only at the top)
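
The path properties above can be sketched as a small check. This is an illustrative sketch, not code from the talk: it assumes each hop of a path has already been labeled as "up" (customer to provider), "peer" (P2P), or "down" (provider to customer), and the name `is_valley_free` is hypothetical.

```python
def is_valley_free(steps):
    """Return True if the hop labels follow: up*, at most one peer, down*.

    steps is a list of "up", "peer", or "down" labels, one per AS link,
    assumed to come from some AS-relationship inference done elsewhere.
    """
    phase = 0  # 0: still climbing, 1: crossed the single peer link, 2: descending
    for step in steps:
        if step == "up":
            if phase != 0:
                return False  # climbing again after the top is a valley
        elif step == "peer":
            if phase != 0:
                return False  # a second peer link, or a peer link below the top
            phase = 1
        elif step == "down":
            phase = 2
        else:
            raise ValueError(f"unknown step label: {step!r}")
    return True


print(is_valley_free(["up", "up", "peer", "down"]))
print(is_valley_free(["up", "down", "up"]))
```

The first call describes a path that climbs, crosses one peer link at the top, and descends; the second contains the up-down-up pattern that the rules forbid.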



[Gao 2001] “On inferring autonomous system relationships in the Internet”

[Figure: example valid and invalid paths; node labels A, B, C, D]


Motivation

- Using an assumption-free method for IP hijack detection
  - Our method is based only on BGP announcements (or AS-level routes)
  - We introduce the first end-to-end deep learning approach
- Our goal is to use a generic approach based on ASN embedding
  - We aim to learn dense representations of ASNs from BGP routes
  - Apply machine/deep learning techniques based on the representations



Method

- An end-to-end deep learning approach:
  - First stage – BGP2Vec – ASN embedding
  - Second stage – IP hijack detection using LSTM networks



Datasets

- Route Views [1] BGP announcements (RV), collected in March 2018
  - 3,600,000 BGP paths
  - 62,525 ASes
  - 113,400 undirected AS links
- Labeled BGP routes:
  - Approximately 2,648,900 standard routes (‘GREEN’) and 47,800 hijacked routes (‘RED’)
  - The labeling was generated by a combination of VF algorithms [2] and manual work
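
As a rough illustration of how AS and undirected-link counts like those above can be derived from a set of AS paths, here is a minimal sketch. The paths and ASNs are made up (documentation-range ASNs), the function name `as_graph_stats` is hypothetical, and collapsing consecutive repeats (AS-path prepending) is an assumption about preprocessing, not something stated in the talk.

```python
def as_graph_stats(paths):
    """Count distinct ASes and distinct undirected AS links in a list of AS paths."""
    ases, links = set(), set()
    for path in paths:
        # Assumption: collapse consecutive repeats, so that a prepended AS
        # does not produce a self-link.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        ases.update(hops)
        for a, b in zip(hops, hops[1:]):
            links.add(frozenset((a, b)))  # frozenset makes the link undirected
    return len(ases), len(links)


# Made-up paths for illustration only.
toy_paths = [
    [64500, 64501, 64502],
    [64501, 64500, 64503],
    [64500, 64500, 64501, 64504],
]
print(as_graph_stats(toy_paths))  # (5, 4)
```

The link between 64500 and 64501 appears in all three toy paths, in both directions, but is counted once.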



1. University of Oregon, Route Views Project, http://www.routeviews.org, March 2018
2. [Shavitt et al. 2009] “Near-Deterministic Inference of AS Relationships” (ND-ToR)


BGP2Vec [1] – ASN Embedding

- Based on word embedding (Word2Vec [2]), broadly used in NLP
- Embedding = representing discrete variables as continuous vectors
- An ASN is characterized by its context, i.e., neighboring ASNs
- V = 62,525 (number of ASNs), N = 32 (embedding dimension)
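
The idea that an ASN is characterized by its neighbors can be sketched by treating each AS path as a "sentence" and generating skip-gram style (target, context) pairs, which is the kind of training input Word2Vec consumes. This is a hedged sketch: the function name `skipgram_pairs`, the window size, and the toy path are assumptions, and the talk does not state which Word2Vec variant or window was used.

```python
def skipgram_pairs(paths, window=2):
    """Build an ASN vocabulary and (target_index, context_index) pairs.

    Each AS path plays the role of a sentence; ASNs within `window`
    positions of a target ASN are its context.
    """
    vocab = {}
    pairs = []
    for path in paths:
        for asn in path:
            vocab.setdefault(asn, len(vocab))  # assign indices in first-seen order
        for i, target in enumerate(path):
            lo, hi = max(0, i - window), min(len(path), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((vocab[target], vocab[path[j]]))
    return vocab, pairs


# One made-up path with four documentation-range ASNs, so V = 4 here.
vocab, pairs = skipgram_pairs([[64500, 64501, 64502, 64503]], window=1)
print(len(vocab))
print(pairs)
```

Training on such pairs yields a V × N matrix whose rows are the ASN vectors; with the figures on the slide that would be 62,525 rows of 32 values each.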

[Figure: an example with V = 4]

1. [Shapira and Shavitt 2020] “Unveiling the Type of Relationship Between Autonomous Systems Using Deep Learning” (BGP2Vec)
2. [Mikolov et al. 2013] “Distributed representations of words and phrases and their compositionality” (Word2Vec)


Exploration of ASN Embeddings



Training Neural Networks (IP Hijack)

- An LSTM neural network comprising five layers
- Categorical cross-entropy loss function
- The Adam gradient-based optimizer with default hyper-parameters
- We build and run our network using [framework name not recovered from the slide]
- We train our network on our labeled dataset
- We use 20% of the samples as a test set
- We train the network for 10 epochs
- Inference time: 0.1 milliseconds on a single Intel CPU
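
For readers unfamiliar with the loss named above, the following sketch computes a softmax over two output scores and the categorical cross-entropy against a one-hot label. It uses only the standard library and is not the authors' network; the two-class layout (index 0 for ‘GREEN’, index 1 for ‘RED’) and the example scores are assumptions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, one_hot):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t > 0)


# Hypothetical scores for one route: the model favors class 0 ('GREEN').
probs = softmax([2.0, 0.0])
loss_if_green = categorical_cross_entropy(probs, [1, 0])
loss_if_red = categorical_cross_entropy(probs, [0, 1])
print(probs, loss_if_green, loss_if_red)
```

The loss is small when the true class is the one the model favors and large otherwise, which is the signal the optimizer follows during training.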



Experiments and Results – IP Hijack

- IP hijack detection based on ASN embedding
  - 99.99% accuracy, 0.00% false alarms (FA)
- In 50% of our misclassified predictions the label itself was wrong, i.e., we found errors in the labeled dataset
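
The two reported metrics can be computed from per-route labels and predictions as sketched below. The definition of the false-alarm rate used here (legitimate routes flagged as hijacked, divided by all legitimate routes) is an assumption, since the slide does not define FA; the function name and the toy labels are made up and unrelated to the reported results.

```python
def accuracy_and_false_alarm(y_true, y_pred, positive="RED"):
    """Return (accuracy, false-alarm rate) for a binary labeling.

    Assumed definition: false-alarm rate = FP / (FP + TN), i.e. the share
    of non-hijacked routes that were flagged as hijacked.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    false_alarm = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, false_alarm


# Toy labels for illustration only.
y_true = ["GREEN"] * 8 + ["RED"] * 2
y_pred = ["GREEN"] * 7 + ["RED"] + ["RED", "GREEN"]
print(accuracy_and_false_alarm(y_true, y_pred))  # (0.8, 0.125)
```

With a class imbalance like the one in the labeled dataset, accuracy alone says little, which is why a false-alarm figure is reported alongside it.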



Results on BGP data for Ground Truth Events

- The dataset [1] contains 70 events from February 2008 to July 2018
  - with an average of 669 AS paths per event
- We correctly classified all the events within 2 years of our training data, or 2/3 of all the valid events

1. [Cho et al. 2019] “BGP hijacking classification”, TMA


Summary

- A novel approach for ASN embedding using deep learning (BGP2Vec)
  - Unsupervised method
  - Based only on BGP announcements, without any side information
  - A building block for many problems
- Achieves excellent results for IP hijack detection
  - Without any assumptions (no ‘VF’)
  - 99.99% accuracy with 0.00% FA on our own proprietary dataset
  - Although our method was trained with a dataset from March 2018, we correctly classified 2/3 of past events, and all recent hijack events
  - As far as we know, we are the first to employ deep learning for this problem.



Thank you for your attention. Questions?


https://www.eng.tau.ac.il/~shavitt/

https://talshapira.github.io/
