VASTA Unified Platform for Interactive Network Forensics
Matthias Vallentin1,2 Vern Paxson1,2 Robin Sommer2,3
1UC Berkeley
2International Computer Science Institute (ICSI)
3Lawrence Berkeley National Laboratory (LBNL)
March 17, 2016
USENIX NSDI
1 / 28
Omnipresent Data Breaches
2 / 28
Breach Timeline
Compromise Forensics
Detection
Time
3 / 28
Breach Timeline
Compromise
Detection
Time
3 / 28
Breach Timeline
Compromise
Detection
Time
?
3 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
Organization
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Network Forensics — Characteristics
Interactive data explorationI Iterative query refinement
I High-dimensional search
Disparate data accessI Temporal
I Spatial
Massive data volumesI 50–100K events/sec
I 10s TBs/day
?
4 / 28
Log Example — Bro Connection Log
#separator \x09#set_separator ,#empty_field (empty)#unset_field -#path conn#open 2016-01-06-15-28-58#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_..#types time string addr port addr port enum string interval count count string bool bool count string1258531.. Cz7SRx3.. 192.168.1.102 68 192.168.1.1 67 udp dhcp 0.163820 301 300 SF - - 0 Dd 1 329 1 328 (empty)1258531.. CTeURV1.. 192.168.1.103 137 192.168.1.255 137 udp dns 3.780125 350 0 S0 - - 0 D 7 546 0 0 (empty)1258531.. CUAVTq1.. 192.168.1.102 137 192.168.1.255 137 udp dns 3.748647 350 0 S0 - - 0 D 7 546 0 0 (empty)1258531.. CYoxAZ2.. 192.168.1.103 138 192.168.1.255 138 udp - 46.725380 560 0 S0 - - 0 D 3 644 0 0 (empty)1258531.. CvabDq2.. 192.168.1.102 138 192.168.1.255 138 udp - 2.248589 348 0 S0 - - 0 D 2 404 0 0 (empty)1258531.. CViJEOm.. 192.168.1.104 137 192.168.1.255 137 udp dns 3.748893 350 0 S0 - - 0 D 7 546 0 0 (empty)1258531.. CSC2Hd4.. 192.168.1.104 138 192.168.1.255 138 udp - 59.052898 549 0 S0 - - 0 D 3 633 0 0 (empty)1258531.. Cd3RNm1.. 192.168.1.103 68 192.168.1.1 67 udp dhcp 0.044779 303 300 SF - - 0 Dd 1 331 1 328 (empty)1258531.. CEwuIl2.. 192.168.1.102 138 192.168.1.255 138 udp - - - - S0 - - 0 D 1 229 0 0 (empty)1258532.. CXxLc94.. 192.168.1.104 68 192.168.1.1 67 udp dhcp 0.002103 311 300 SF - - 0 Dd 1 339 1 328 (empty)1258532.. CIFDQJV.. 192.168.1.102 1170 192.168.1.1 53 udp dns 0.068511 36 215 SF - - 0 Dd 1 64 1 243 (empty)1258532.. CXFISh5.. 192.168.1.104 1174 192.168.1.1 53 udp dns 0.170962 36 215 SF - - 0 Dd 1 64 1 243 (empty)1258532.. CQJw4C3.. 192.168.1.1 5353 224.0.0.251 5353 udp dns 0.100381 273 0 S0 - - 0 D 2 329 0 0 (empty)1258532.. ClfEd43.. fe80::219:e3ff:fee7:5d23 5353 ff02::fb 5353 udp dns 0.100371 273 0 S0 - - 0 D 2 369 0 01258532.. C67zf02.. 192.168.1.103 137 192.168.1.255 137 udp dns 3.873818 350 0 S0 - - 0 D 7 546 0 0 (empty)1258532.. CG1FKF1.. 192.168.1.102 137 192.168.1.255 137 udp dns 3.748891 350 0 S0 - - 0 D 7 546 0 0 (empty)1258532.. CNFkeF2.. 192.168.1.103 138 192.168.1.255 138 udp - 2.257840 348 0 S0 - - 0 D 2 404 0 0 (empty)1258532.. Cq4eis4.. 192.168.1.102 1173 192.168.1.1 53 udp dns 0.000267 33 497 SF - - 0 Dd 1 61 1 525 (empty)1258532.. CHpqv31.. 192.168.1.102 138 192.168.1.255 138 udp - 2.248843 348 0 S0 - - 0 D 2 404 0 0 (empty)1258532.. CFoJjT3.. 192.168.1.1 5353 224.0.0.251 5353 udp dns 0.099824 273 0 S0 - - 0 D 2 329 0 0 (empty)1258532.. Cc3Ayyz.. fe80::219:e3ff:fee7:5d23 5353 ff02::fb 5353 udp dns 0.099813 273 0 S0 - - 0 D 2 369 0 0
5 / 28
Existing Solutions
MapReduce (Hadoop)
3 Scalability
7 Batch-oriented: no iterative, exploratory analysis
In-Memory Cluster Computing (Spark)
3 E�cient & complex analysis
7 Thrashing when working set does not fit in aggregate memory
6 / 28
Existing Solutions
MapReduce (Hadoop)
3 Scalability
7 Batch-oriented: no iterative, exploratory analysis
In-Memory Cluster Computing (Spark)
3 E�cient & complex analysis
7 Thrashing when working set does not fit in aggregate memory
6 / 28
Contribution
VASTVisibility Across Space and Time
ArchitectureI
Performance: concurrent & modular design
IScaling: intra-machine & inter-machine
ITyping: strong & rich
ImplementationI
Composition: high-level bitmap indexing framework
IAdaptation: fine-grained component flow-control
IAsynchrony: finite state machines for query execution
7 / 28
Contribution
VASTVisibility Across Space and Time
ArchitectureI
Performance: concurrent & modular design
IScaling: intra-machine & inter-machine
ITyping: strong & rich
ImplementationI
Composition: high-level bitmap indexing framework
IAdaptation: fine-grained component flow-control
IAsynchrony: finite state machines for query execution
7 / 28
Contribution
VASTVisibility Across Space and Time
ArchitectureI
Performance: concurrent & modular design
IScaling: intra-machine & inter-machine
ITyping: strong & rich
ImplementationI
Composition: high-level bitmap indexing framework
IAdaptation: fine-grained component flow-control
IAsynchrony: finite state machines for query execution
7 / 28
Outline
1. Architecture
2. Implementation
3. Evaluation
VAST Architecture — Single Machine
8 / 28
VAST Architecture — Single Machine
importer
archive
index
exporter
node
source sink
10.0.0.1 10.0.0.254 53/udp10.0.0.2 10.0.0.254 80/tcp
8 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
archive
compressbatch
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
archive
compressbatch
index
9 / 28
VAST Architecture — Ingestion
10.0.0.1 53/udp10.0.0.2 80/tcp…
source
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
generateevent batch
importer
assign IDs
archive
compressbatch
index
10.0.0.2 80/tcp
append datato bitmap index
10.0.0.1 53/udp
type
9 / 28
VAST Architecture — Index
partition
index
partition partition
meta index
10 / 28
VAST Architecture — Index
partition
index
partition partition
meta index
conn
10.0.0.2 53/udp 8.8.4.4 53/udp “dns”
indexer
10 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
candidatecheck
decompressbatch
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
candidatecheck
decompressbatch
sink
11 / 28
VAST Architecture — Querying
exporter
X in 10.0.0.0/8||
X == 80/tcp
index
lookup bit vectorsfrom partitions
80/tcp==X
10.0.0.0/8inX
�
archive
locate & shipevent batch for ID
candidatecheck
decompressbatch
sink
10.0.0.1 53/udp10.0.0.2 80/tcp…
type 10.0.0.1 53/udpmetatype 10.0.0.2 80/tcpmeta
render results
11 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
VAST Architecture — Distributed
12 / 28
Outline
1. Architecture
2. Implementation
3. Evaluation
Indexing Basics — Tree Indexes
13 / 28
Indexing Basics — Composition
( )� �
14 / 28
Indexing Basics — Composition
( )� �
14 / 28
Indexing Basics — Inverted Index
10 2 3 4 5 6 7 8 9
3
1
4
8
9
5
0
4
2
5
6
2
A B C D
15 / 28
Indexing Basics — Bitmap Index
0 1 1 0
1
10 2 3 4 5 6 7 8 9
0
0
1
1
0
1
2
3
4
5
0
1
0
0
0
0
0
1
0
1
0
0
1
0
0
A B C D
16 / 28
Indexing Basics — Bitmap Index
10 2 3 4 5 6 7 8 9
012345
A B C D
16 / 28
Indexing Basics — Bitmap Composition
17 / 28
Indexing Basics — Bitmap Composition
X 2 192.168.0.0/24 Y � 60s
17 / 28
Indexing Basics — Bitmap Composition
X 2 192.168.0.0/24 Y � 60s
�
17 / 28
Indexing Challenges
High-cardinality valuesI Represent millions of distinct values compactly
I Provide low-latency lookups
High-level operationsI Support type-specific operations
I Relational operators: {<, , =, 6=, �, >, 2, /2}
18 / 28
Query Language
Boolean ExpressionsI Conjunctions &&
I Disjunctions ||
I Negations !I Predicates
I LHS op RHSI (expr)
ExamplesI A && B || !(C && D)
I orig h in 10.0.0.1 && &time < now - 2h
I &type == "conn" || "foo" in :string
I duration > 60s && service == "tcp"
ExtractorsI &tag
I x.y.z
I :type
Relational OperatorsI <, <=, ==, >=, >
I in, ni, [+, +]
I !in, !ni, [-, -]
I ⇠, !⇠
ValuesI T, F
I +42, 1337, 3.14
I "foo"
I 10.0.0.0/8
I 80/tcp, 53/?
I {1, 2, 3}19 / 28
Data Model
TYPE
record
vector set
table
KEY VALUE
TYPETYPE
field 1
TYPE
field n
TYPE
…
container types
basic types
compound types
recursive types
bool
int
count
real
duration
time
string
pattern
address
subnet
port
none
20 / 28
Data Model
TYPE
record
vector set
table
KEY VALUE
TYPETYPE
field 1
TYPE
field n
TYPE
…
container types
basic types
compound types
recursive types
bool
int
count
real
duration
time
string
pattern
address
subnet
port
none
20 / 28
Bitmap Index for IP Addresses
192.168.0.42
21 / 28
Bitmap Index for IP Addresses
11000000.10101000.00000000.00101010
21 / 28
Bitmap Index for IP Addresses
11000000.10101000.00000000.00101010
21 / 28
Bitmap Index for IP Addresses
11000000.10101000.00000000.00101010
21 / 28
Bitmap Index for IP Addresses
X 2 192.168.0.0/27
21 / 28
Bitmap Index for IP Addresses
X 2 192.168.0.0/27
�
21 / 28
Outline
1. Architecture
2. Implementation
3. Evaluation
Data Set
Single-Machine
Data:
I 10M packets from a 24-hourtrace (5 fields/event)
I 3.4M derived Broconnection logs (20fields/event)
Machine:
I 2 ⇥ 8-core Intel Xeon CPUs
I 128GB RAM
I 4 ⇥ 3TB SAS 7.2K disks
I 64-bit FreeBSD
ClusterData:
I 1.24B Bro connection logs(152GB)
I Split into N slices for Nnodes
I N 2 [1, 24]
Nodes:
I 2 ⇥ 8-core Intel Xeon CPUs
I 12GB of RAM
I 2 ⇥ 500MB SATA disks
I 64-bit FreeBSD
22 / 28
Data Set
Single-Machine
Data:
I 10M packets from a 24-hourtrace (5 fields/event)
I 3.4M derived Broconnection logs (20fields/event)
Machine:
I 2 ⇥ 8-core Intel Xeon CPUs
I 128GB RAM
I 4 ⇥ 3TB SAS 7.2K disks
I 64-bit FreeBSD
ClusterData:
I 1.24B Bro connection logs(152GB)
I Split into N slices for Nnodes
I N 2 [1, 24]
Nodes:
I 2 ⇥ 8-core Intel Xeon CPUs
I 12GB of RAM
I 2 ⇥ 500MB SATA disks
I 64-bit FreeBSD
22 / 28
Queries
23 / 28
Performance – Index Latency
●
● ●
●
●
● ● ● ● ● ● ● ●●
● ●
0
2
4
6
8
10
12
14
16
4 8 12 16Cores
Late
ncy
(sec
onds
) Query● A
BCDEFGHI
24 / 28
Performance — Scaling
Import
●
●
●
●
●
●
●
0.5
1.0
1.5
2.0
5 10 15 20 25Nodes
1 / U
tiliz
atio
n
Export
●
●●
●●●
0.5
1.0
1.5
2.0
2.5
5 10 15 20 25Nodes
Late
ncy
(sec
onds
)
25 / 28
Details in the paper
26 / 28
Conclusion
Network Forensics ChallengesI Explorative high-dimensional search
I Disparate data access
I Massive data volumes
VAST: Visibility Across Space and TimeI Platform for network forensics
I Interactive & iterative search
IInter-machine and intra-machine scaling
I Open-source, permissive license (BSD)
27 / 28