Game-theoretic Methods for Modeling and Defending Against Advanced Persistent Threats in Wireless Systems

Liang Xiao, Xiamen University
International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt)
Shanghai, May 7, 2018



Outline
- Background & motivations
- Game theoretic study on APT defense
  - APT defense game model
  - Prospect theory based study on APT
  - Colonel Blotto game based study on APT
  - Dynamic APT defense game
- Reinforcement learning based APT defense solution
- Conclusion

APTs in Cyber Systems [Threat Landscape Survey'17]
- Slingshot APT used malware to spy on international targets from 2012 until Feb. 2018
- Lazarus APT stole $81 million from the Central Bank of Bangladesh in 2016
- $31 million was stolen from the Russian central bank in 2016
- 500 million Yahoo user accounts were leaked in 2016

APT Attacks
- Target the objective
- Infiltrate malware onto the target system
- Control by a command-and-control server (installing other malware)
- Spread the attack onto several systems
- Exfiltrate target data from the victim's network
- Cover tracks to maintain access to the victim's system over a long time

Another APT Example
- Reconnaissance: Leverage information from a variety of sources to understand the target
- Incursion: Use social engineering to deliver targeted malware
- Discovery: Stay "low and slow", map the defenses from the inside and create a battle plan
- Capture: Gather information over an extended period, install malware to secretly acquire data or disrupt operations
- Exfiltration: Send information to the attacker

Challenges to Detect APTs
- Hackers cover their tracks to access the data for a long time without being detected
- APT attackers aim to steal data rather than break the target network or destroy the data
- APT attackers use multiple sophisticated attack methods
- Defense against APTs: detect APTs as early as possible to minimize the damage, instead of merely keeping attackers out


Game Theoretic Study on APTs
- APT attackers are motivated by their hunger for money and secrets; APTs are stealthy, continuous, sophisticated and long-term
- Game theory can capture the long-term, continuous interaction between the APT attacker and the defender over the system resources
  - Model the interactions between APT attackers and defenders
  - Understand the fundamental tradeoff between the security risk and the APT defense cost
  - Help design APT defense strategies

APT Defense Game
- APT game between an attacker choosing its attack path and a defender selecting its best response according to the attack classification result [Fang'14]
- Stealthy attacks game with an asymmetric feedback model under limited attack or protection resources [Zhang'15]
- Cyber-physical signaling game between a cloud and a mobile device, and a FlipIt game between an APT attacker and a cloud defender [Pawlick'15]
- Three-player Stackelberg game, in which a defender as the leader addresses both APTs and insider attacks [Feng'15]
- Zero-sum matrix game against APT movements without being aware of the adversary model [Rass'17]
- Dynamic APT game between an attacker choosing its attack resources and a defender determining its prevention and recovery strength [Yang'17]

FlipIt Game
- Non-zero-sum game between an APT attacker and a defender competing for a cyber resource [Dijk'13]
  - Attacker: whether to compromise the cyber system
  - Defender: whether to restore the cyber system
  - Goal: maximize its utility, which increases with the resource control time and decreases with the attack/defense cost
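The FlipIt payoff structure above can be sketched numerically. The code below is a toy deterministic simulation (fixed periodic strategies with identical phase, unit control value), not the randomized game analyzed in [Dijk'13]; the cost and interval parameters are illustrative.

```python
def flipit_utilities(attack_interval, defense_interval, attack_cost, defense_cost,
                     horizon=10_000.0):
    """Average utilities of two periodic FlipIt players over a long horizon.

    Each player 'flips' (takes over) the resource at its own fixed interval;
    the resource belongs to whoever flipped most recently.  Utility =
    fraction of time in control minus (move cost) x (move rate)."""
    events = []  # (time, player) flip events; ties sort "A" before "D",
    t = attack_interval           # so the defender wins simultaneous flips
    while t <= horizon:
        events.append((t, "A")); t += attack_interval
    t = defense_interval
    while t <= horizon:
        events.append((t, "D")); t += defense_interval
    events.sort()
    owner, last_t = "D", 0.0      # defender owns the resource at t = 0
    control = {"A": 0.0, "D": 0.0}
    for time, player in events:
        control[owner] += time - last_t
        owner, last_t = player, time
    control[owner] += horizon - last_t
    u_a = control["A"] / horizon - attack_cost / attack_interval
    u_d = control["D"] / horizon - defense_cost / defense_interval
    return u_d, u_a
```

For instance, a defender flipping twice as often as the attacker (with equal phases) keeps the resource the whole time and pays only its own move cost.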

APT Defense Game Model
- APT attacker chooses its time interval y_i to launch APTs on storage device i
- Defender chooses its time interval x_i to scan the devices
- Random duration z_i to complete an APT attack
- Safe-time fraction for the data stored on storage device i:

min( y_i z_i / x_i , 1 )

APT Defense Game with Pure Strategy
- Normalized attack/defense interval over each device
- Random attack duration z_i quantized into L non-zero levels, with level l/L occurring with probability P_{il}, 0 <= l <= L
- Limited overall attack/defense computation resources
- Expected utilities of the defender and the attacker:

U_D^{EUT}(x, y) = Σ_{i=1}^{S} [ Σ_{l=0}^{L} P_{il} min( l y_i / (L x_i), 1 ) + G x_i ]

U_A^{EUT}(x, y) = Σ_{i=1}^{S} [ Σ_{l=0}^{L} P_{il} ( 1 - min( l y_i / (L x_i), 1 ) ) - C ( 1 - y_i ) ]

where G is the gain of a longer scan interval and C is the attack cost (attacking more frequently, i.e., a smaller y_i, costs more). Both players choose their policies to maximize their own expected utilities.
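The defender's expected utility is separable across devices, so a best response can be found by a per-device grid search over the scan interval. The sketch below evaluates the defender's per-device term and searches a grid; the gain G, the grid, and the quantized duration distribution P are illustrative assumptions, not the talk's exact parameters.

```python
import numpy as np

def device_utility_d(x_i, y_i, P_i, G=0.5):
    """Defender's per-device term: expected safe-data fraction
    sum_l P_il * min(l * y_i / (L * x_i), 1) plus the scan-interval gain G * x_i."""
    L = len(P_i) - 1
    l = np.arange(L + 1)
    return (np.asarray(P_i) * np.minimum(l * y_i / (L * x_i), 1.0)).sum() + G * x_i

def defender_best_response(y, P, G=0.5, grid=None):
    """Grid search for the scan interval x_i maximizing the defender's utility;
    the sum over devices is separable, so each device is solved independently."""
    if grid is None:
        grid = np.linspace(0.05, 1.0, 96)
    return np.array([grid[np.argmax([device_utility_d(g, y[i], P[i], G)
                                     for g in grid])]
                     for i in range(len(y))])
```

With a small gain G the defender scans just often enough to catch every attack before it completes; with a large G the longer scan interval dominates.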

Defense Game with Mixed Strategy
- Defender chooses its detection interval, quantized into M levels, with distribution p = [p_m]_{1<=m<=M}
- Attacker quantizes its non-zero attack interval into N levels and chooses the attack distribution q = [q_n]_{0<=n<=N}
- Known attack duration z
- Expected utilities of the defender and the APT attacker:

U_D^{EUT}(p, q) = Σ_{m=1}^{M} Σ_{n=0}^{N} p_m q_n min( (n z / N) / (m / M), 1 ) + G Σ_{m=1}^{M} p_m m / M

U_A^{EUT}(p, q) = Σ_{m=1}^{M} Σ_{n=0}^{N} p_m q_n [ 1 - min( (n z / N) / (m / M), 1 ) ] - C Σ_{n=0}^{N} q_n ( 1 - n / N )
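The mixed-strategy utilities are bilinear in the two distributions, which makes them easy to evaluate numerically. The following sketch follows the reconstruction above (G, C, and z are illustrative defaults, and the level indexing n = 0 denotes a zero attack interval):

```python
import numpy as np

def mixed_utilities(p, q, z, G=0.5, C=0.5):
    """EUT utilities for mixed strategies: p over M detection-interval levels
    (m = 1..M), q over N+1 attack-interval levels (n = 0..N)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    M, N = len(p), len(q) - 1
    m = np.arange(1, M + 1)[:, None]       # detection interval m/M
    n = np.arange(0, N + 1)[None, :]       # attack interval n/N
    safe = np.minimum((n * z / N) / (m / M), 1.0)   # safe-data fraction
    u_d = (p[:, None] * q[None, :] * safe).sum() + G * (p * m.ravel() / M).sum()
    attack_cost = C * (q * (1.0 - n.ravel() / N)).sum()
    u_a = (p[:, None] * q[None, :] * (1.0 - safe)).sum() - attack_cost
    return u_d, u_a
```

Evaluating the pair (U_D, U_A) on candidate (p, q) grids is one simple way to check a claimed equilibrium: at an NE, neither distribution can be perturbed to raise its owner's utility.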

NE of the Game
- Nash equilibrium (NE) of a game: no player can increase its expected utility by unilaterally deviating from its NE strategy
- Based on expected utility theory (EUT)
- NE of the APT defense game with pure strategy:

x* = argmax_{x} U_D^{EUT}(x, y*),  s.t. Σ_{i=1}^{S} x_i <= T_x
y* = argmax_{y} U_A^{EUT}(x*, y),  s.t. Σ_{i=1}^{S} y_i <= T_y
0 <= x_i <= 1,  0 <= y_i <= 1,  1 <= i <= S


PT-based APT Defense Game
- EUT-based study of APT defense deviates from real-life decision making due to the subjectivity of attackers under uncertainty
- Prospect theoretic study of cloud storage defense against subjective APT attackers: model the decision making of a subjective attacker under uncertainty
  - Pure strategy: uncertain time to hack a storage device
  - Mixed strategy: uncertain action of the opponent

Prospect Theory
- Expected utility theory (EUT) cannot explain deviations due to end-user subjectivity
- Prospect theory (PT) [Kahneman, Tversky'79], a Nobel-prize-winning theory, explains these deviations in monetary transactions:
  - People usually overweight low-probability outcomes and underweight high-probability outcomes
  - Losses loom larger than gains
- Prospect theory has recently been applied in many contexts:
  - Social sciences [Gao'10] [Harrison'09] [Tanaka'16]
  - Communication networks [Li'14] [Yu'14] [Yang'14] [Lee'15]
  - Smart energy management [Wang'14] [Xiao'14]

Allais's Paradox

Gains:
- High probability: 95% chance to win $10,000 vs. a sure $9,499. The expected value 0.95 x 10,000 = 9,500 > 9,499, yet most people take the sure $9,499: fear of disappointment, risk averse.
- Low probability: 5% chance to win $10,000 vs. a sure $501. The expected value 0.05 x 10,000 = 500 < 501, yet most people take the gamble: hope of a large gain, risk seeking.

Losses:
- High probability: 95% chance to lose $10,000 vs. a sure loss of $9,499. The expected loss 9,500 exceeds 9,499, yet most people take the gamble: hope to avoid the loss, risk seeking.
- Low probability: 5% chance to lose $10,000 vs. a sure loss of $501. The expected loss 500 is below 501, yet most people take the sure loss: fear of a large loss, risk averse.

Probability Weighting Functions
- Probability weighting function w(p) models the subjectivity of a player: the subjective probability with which a player weighs an outcome of objective probability p
- S-shaped and asymmetrical, ranging in [0, 1]
- The objective weight α decreases with the player's subjective evaluation distortion
- Prelec function [Prelec'98]:

w(p) = exp( -(-ln p)^α ),  0 < α <= 1

[Figure: w(p) vs. p for α = 1 (no distortion) and α = 0.5]
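The Prelec function is a one-liner; this minimal sketch clamps the endpoints so w(0) = 0 and w(1) = 1:

```python
import math

def prelec(p, alpha=0.5):
    """Prelec probability weighting function w(p) = exp(-(-ln p)^alpha).

    alpha = 1 recovers the objective probability; smaller alpha means a
    stronger distortion: small p is overweighted, large p underweighted."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** alpha))
```

Note the fixed point at p = 1/e: for every α, w(1/e) = 1/e, with overweighting below it and underweighting above it.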

Related Work
- PT-based channel access game between two subjective end-users in a wireless network [Li'12]
- Wireless operator invests in spectrum for users under uncertain spectrum supply using PT [Yu'14]
- PT-based random access game between two users choosing their transmission probabilities on a radio channel [Li'14]
- Stackelberg game between the SP offering the bandwidth and subjective end-users choosing services [Yang'14]
- PT-based microgrid energy trading game in smart grids [Xiao'15]

PT-based APT Detection Game
- Model the decision making of a subjective attacker under uncertainties
  - Uncertain attack durations
  - Uncertain defense policies

Subjective APT Game Model
- Subjective storage defense game between the attacker and defender with pure strategies under uncertain attack duration z_i
- The attacker weighs the attack outcome with subjective probabilities w_A(P_{il}), 0 <= l <= L, 1 <= i <= S
- PT-based utilities:

U_D^{PT}(x, y) = Σ_{i=1}^{S} [ Σ_{l=0}^{L} w_D(P_{il}) min( l y_i / (L x_i), 1 ) + G x_i ]

U_A^{PT}(x, y) = Σ_{i=1}^{S} [ Σ_{l=0}^{L} w_A(P_{il}) ( 1 - min( l y_i / (L x_i), 1 ) ) - C ( 1 - y_i ) ]

NE of the PT-based APT Defense Game
- Each player's NE strategy is a best response in terms of its PT-based utility, given that the opponent uses its NE strategy:

x* = argmax_{x} U_D^{PT}(x, y*),  s.t. Σ_{i=1}^{S} x_i <= T_x
y* = argmax_{y} U_A^{PT}(x*, y),  s.t. Σ_{i=1}^{S} y_i <= T_y
0 <= x_i <= 1,  0 <= y_i <= 1,  1 <= i <= S

Example of the NE of the PT-based Game

The NE of the subjective APT game with S = 1 and L = 2 takes one of three forms, (x*, y*) = (0.5, 0), (1, 0), or (1, 1), with the thresholds between the three regimes determined by how the defender's gain G and the attacker's cost C compare against the Prelec-weighted attack-duration probabilities exp(-(-ln P_l)^α).

NE of the PT-based APT Defense Game

PT-based APT Game with Mixed Strategy
- Each player holds a subjective view of the opponent's strategy
- PT-based utilities:

U_D^{PT}(p, q) = Σ_{m=1}^{M} Σ_{n=0}^{N} p_m w_D(q_n) min( (n z / N) / (m / M), 1 ) + G Σ_{m=1}^{M} p_m m / M

U_A^{PT}(p, q) = Σ_{m=1}^{M} Σ_{n=0}^{N} w_A(p_m) q_n [ 1 - min( (n z / N) / (m / M), 1 ) ] - C Σ_{n=0}^{N} q_n ( 1 - n / N )

The NE (p*, q*) of the game with mixed strategies is characterized by the Karush-Kuhn-Tucker conditions of each player's utility maximization, with Lagrange parameters enforcing the simplex constraints Σ_{m=1}^{M} p_m = 1, p_m >= 0 and Σ_{n=0}^{N} q_n = 1, q_n >= 0 (1 denotes an η-dimensional all-ones column vector).

Example of the NE of the PT-based Game with Mixed Strategy

With two interval levels, the NE of the subjective mixed-strategy game is obtained in closed form: the equilibrium p* and q* balance the subjectively weighted marginal utilities u_D(·,·) and u_A(·,·) of the two interval levels, through the Prelec weighting of the opponent's mixed strategy.

Proof: By the Karush-Kuhn-Tucker (KKT) optimality conditions for the defender's problem max_p U_D^{PT}(p, q*) subject to Σ_{m=1}^{M} p_m = 1 and p_m >= 0, the Lagrangian is

L_D(p, λ, μ) = U_D^{PT}(p, q*) + λ ( 1 - Σ_{m=1}^{M} p_m ) + Σ_{m=1}^{M} μ_m p_m,  μ_m >= 0,

with stationarity ∂L_D/∂p_m = 0 for 1 <= m <= M. Applying complementary slackness μ_m p_m = 0 yields a common subjectively weighted marginal utility λ across every level m with p_m* > 0; the attacker's problem is handled in the same way.

NE of the PT-based APT Defense Game with Mixed Strategy

D. Xu, et al., "Prospect Theoretic Study of Cloud Storage Defense Against Advanced Persistent Threats," IEEE Global Commun. Conf. (GLOBECOM), 2016.

Value Distortion Functions
- Value distortion function models the framing effect of subjective decisions
- Alternatives are evaluated as gains and losses with respect to a frame of reference U_0
- Steeper for losses than for gains if the loss aversion coefficient λ > 1
- The risk aversion/seeking coefficient decreases with the player's subjective value evaluation distortion
- Value distortion function:

v(u) = (u - U_0)^α,        if u >= U_0
v(u) = -λ (U_0 - u)^β,     otherwise

[Figure: value distortion function, subjective vs. objective value, for (α, β) = (0.6, 0.6) and (1, 1)]
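The piecewise value function above translates directly into code. The defaults below are the classic Tversky-Kahneman estimates (α = β = 0.88, λ = 2.25), used here only as illustrative parameters:

```python
def value_distortion(u, ref=0.0, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value function relative to a reference point `ref`:
    concave (power alpha) for gains, convex and steeper (power beta,
    scaled by the loss aversion coefficient lam > 1) for losses."""
    if u >= ref:
        return (u - ref) ** alpha
    return -lam * (ref - u) ** beta
```

Setting α = β = λ = 1 recovers the objective (EUT) valuation, matching the straight-line curve in the figure.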

Utility in the CPT-based Game
- The per-level outcomes u_l = u(x, y, z_l), 0 <= l <= L, are reordered in ascending order, and the attack-duration probabilities are reordered accordingly
- K denotes the first index whose outcome exceeds the reference point, so levels l < K are losses and levels l >= K are gains
- Each player's CPT utility is the sum of a CPT-value of losses and a CPT-value of gains, with decision weights given by differences of weighted cumulative probabilities:

U^{CPT}(x, y) = Σ_{l=0}^{K-1} v(u_l) [ w( Σ_{j=0}^{l} p_j ) - w( Σ_{j=0}^{l-1} p_j ) ]
             + Σ_{l=K}^{L} v(u_l) [ w( Σ_{j=l}^{L} p_j ) - w( Σ_{j=l+1}^{L} p_j ) ]

applied with (v_D, w_D) and the defender's outcomes for U_D^{CPT}, and with (v_A, w_A) and the attacker's outcomes (probabilities q_n) for U_A^{CPT}.
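The rank-dependent weighting above is the standard cumulative-prospect-theory construction; a generic sketch (the value and weighting functions are passed in as callables, so any v and w from the earlier slides can be plugged in):

```python
import numpy as np

def cpt_value(outcomes, probs, v, w):
    """Cumulative-prospect-theory value of a discrete lottery.

    Losses (v(outcome) < 0) take weights built from the cumulative
    distribution from below; gains take weights from the decumulative
    distribution from above (Tversky-Kahneman rank-dependent weighting)."""
    order = np.argsort(outcomes)                 # ascending outcomes
    o = np.asarray(outcomes, float)[order]
    p = np.asarray(probs, float)[order]
    vals = np.array([v(x) for x in o])
    cum_lo = np.concatenate(([0.0], np.cumsum(p)))              # P(X <= o_l)
    cum_hi = np.concatenate((np.cumsum(p[::-1])[::-1], [0.0]))  # P(X >= o_l)
    total = 0.0
    for l in range(len(o)):
        if vals[l] < 0:   # loss: weight from below
            weight = w(cum_lo[l + 1]) - w(cum_lo[l])
        else:             # gain: weight from above
            weight = w(cum_hi[l]) - w(cum_hi[l + 1])
        total += vals[l] * weight
    return total
```

With identity v and w this collapses to the ordinary expected value; with a distorting w, a rare large gain is overweighted, exactly the attacker subjectivity the CPT-based game models.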

NE of the CPT-based Game

D. Xu, et al., "Cumulative Prospect Theoretic Study of a Cloud Storage Defense Game Against Advanced Persistent Threats," IEEE INFOCOM - BigSecurity, 2017.


Colonel Blotto Game
- Colonel Blotto game (CBG): each colonel has limited resources
- A powerful tool to study strategic resource allocation in a competitive environment

Related Work
- CBG-based phishing game in terms of the detect-and-takedown defense against phishing attacks [Chia'11]
- CBG-based anti-jamming communication game in terms of the power allocation over multiple channels in cognitive radio networks [Wu'12]
- CBG-based anti-jamming communication game in the heterogeneous Internet of Things [Labib'15]
- CBG-based spectrum allocation game among multiple network service providers [Hajimirsadeghi'16]

CBG-based CPU Allocation Game
- A pure-strategy NE does not always exist in the CBG
- Data stored in storage device i at time k is B_i^k
- Data protection level: normalized size of the "safe" data protected by the defender:

R^k = ( Σ_{i=1}^{D} B_i^k sgn( M_i^k - N_i^k ) ) / ( Σ_{i=1}^{D} B_i^k )

where M_i^k and N_i^k are the numbers of CPUs that the defender and the attacker allocate to device i.

APT Defense Game with Mixed Strategy
- Each player chooses its CPU allocation with randomness to fool the opponent
- Mixed strategies: x_{i,j}^k = Pr( M_i^k = j ) and y_{i,j}^k = Pr( N_i^k = j ), with Σ_{i=1}^{D} M_i^k = S_M and Σ_{i=1}^{D} N_i^k = S_N
- Expected utility of the defender/attacker at time k:

U_D^k = -U_A^k = E[ Σ_{i=1}^{D} B_i^k sgn( M_i^k - N_i^k ) ]
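For a single realization of the two CPU allocations, the data protection level reduces to a sign-weighted sum; a minimal sketch (array names are illustrative, and the sgn convention follows the reconstruction above, so ties count as zero):

```python
import numpy as np

def protection_level(B, M, N):
    """Normalized size of the data the defender wins: device i counts as
    protected (+1) when the defender assigns more CPUs than the attacker,
    lost (-1) when fewer, contested (0) on a tie, weighted by data size B_i."""
    B, M, N = (np.asarray(a, float) for a in (B, M, N))
    return float((B * np.sign(M - N)).sum() / B.sum())
```

Averaging this quantity over allocations drawn from the mixed strategies x and y gives the expected utility U_D^k of the Blotto game.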

NE of the CBG Based APT Game

[Figures: data protection level R and utility of the defender vs. the total number of defense CPUs S_M (600-1200), for D = 20, 40, and 80 storage devices]

D storage devices and S_M defense CPUs against an APT attacker with 150 CPUs

M. Min, et al., "Defense Against Advanced Persistent Threats: A Colonel Blotto Game Approach," IEEE International Conference on Communications (ICC), 2017.


Dynamic APT Defense Game
- Repeated interactions between the defender and the APT attacker, each choosing its attack/scan interval or number of CPUs
- The defender usually does not know the attack model
- The dynamic APT defense game can be viewed as a Markov decision process (MDP)

[Diagram: the APT defender interacts with an MDP, observing the state, choosing a defense strategy, and receiving a utility and the new state]


Reinforcement Learning
- An agent uses reinforcement learning, such as Q-learning, to choose its policy without knowing the system model in a dynamic APT defense game
- Achieves the optimal defense policy via trial and error after a long enough time in a finite-state MDP

[Diagram: the learner observes the current state of the MDP environment, takes an action, and receives a reward and the next state]

Reinforcement Learning Algorithms
- Defender applies Q-learning to choose the defense interval without knowing the attack and network models in the dynamic game
- Q-learning: a model-free reinforcement learning method for an agent to derive its optimal strategy via trials
  - State: previous attack duration
  - Q-function: expected long-term discounted utility of taking a given action in a given state
  - Uses epsilon-greedy exploration to trade off exploration against exploitation during learning

Q-learning Based APT Detection

[Figures: attack rate and utility of the defender under Q-learning based detection]

Number of non-zero attack interval levels is 5, attack cost is 0.4, defense gain is 0.6, and the objective weight of the attacker/defender is 0.8/1

L. Xiao, et al., "Cloud Storage Defense Against Advanced Persistent Threats: A Prospect Theoretic Study," IEEE Journal on Selected Areas in Communications, 2017.

Hotbooting PHC based APT Detection
- Hotbooting: exploit experiences from similar scenarios to initialize the learning parameters and avoid useless exploration at the start
- PHC: an extension of Q-learning for the mixed-strategy game that uses randomness to fool the APT attacker
- Choose the detection interval according to a mixed-strategy table updated with:

π(s, x) ← π(s, x) + δ,              if x = argmax_{x'} Q(s, x')
π(s, x) ← π(s, x) - δ / (|X| - 1),  otherwise

where δ is the learning step and |X| is the number of detection interval levels.
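The PHC update above shifts probability mass toward the currently greedy action; a minimal sketch with clipping and renormalization to keep π(s, ·) a valid distribution (the table layout is an illustrative choice):

```python
def phc_update(pi, Q, s, delta=0.05):
    """Policy hill-climbing step: move the mixed strategy pi[s] toward the
    greedy action of Q(s, .) by delta, draining the other actions evenly,
    then clip to [0, 1] and renormalize onto the simplex."""
    n = len(pi[s])
    greedy = max(range(n), key=lambda a: Q[s][a])
    for a in range(n):
        pi[s][a] += delta if a == greedy else -delta / (n - 1)
        pi[s][a] = min(1.0, max(0.0, pi[s][a]))
    total = sum(pi[s])
    for a in range(n):                 # renormalize after clipping
        pi[s][a] /= total
    return pi
```

Because the policy stays stochastic during learning, the defender's detection intervals remain hard for the APT attacker to predict, which is exactly why PHC rather than plain greedy Q-learning is used in the mixed-strategy game.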

H-PHC Based APT Detection Interval

PHC Based CPU Allocation Against APT

APT Detection Performance

Attack cost is 0.51, APT detection gain is 0.4, the objective weight and risk aversion coefficient of the security agent are 1, the objective weight of the attacker is 0.3, the risk seeking coefficient of the attacker is 0.6, and the reference utility of the attacker/security agent is 0

Data Protection Level

L. Xiao, et al., "Attacker-Centric View of a Detection Game Against Advanced Persistent Threats," IEEE Trans. Mobile Computing, 2018.

DQN-based CPU Allocation Against APT
- The deep Q-network (DQN)-based CPU allocation scheme uses deep learning to compress the state space observed by the defender and further accelerate learning
- Uses a CNN to estimate the long-term expected reward of each CPU allocation policy for a given state
- The CNN parameters are updated via minibatches using stochastic gradient descent
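The scheme above uses a CNN; as a dependency-free illustration of the same minibatch SGD step on a temporal-difference target, here is a linear Q-function stand-in (all shapes, names, and hyperparameters are illustrative assumptions, and unlike a full DQN there is no target network or replay buffer):

```python
import numpy as np

def dqn_style_update(W, batch, gamma=0.9, lr=0.01):
    """One minibatch SGD step on a linear Q-approximation Q(s)[a] = (W @ s)[a].

    batch: list of (state, action, reward, next_state) with states given as
    feature vectors; r + gamma * max_a' Q(s', a') plays the role of the
    DQN training target, and the loss is the squared TD error."""
    grad = np.zeros_like(W)
    for s, a, r, s2 in batch:
        target = r + gamma * (W @ s2).max()
        td_err = (W @ s)[a] - target
        grad[a] += td_err * s            # d(squared TD error)/dW for row a
    W -= lr * grad / len(batch)
    return W
```

Replacing the matrix W with a CNN (and adding experience replay plus a slowly updated target copy) gives the actual DQN training loop the slide describes.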

Data Protection Level

3 storage devices and 16 defense CPUs against an APT attacker with 4 attack CPUs; the data size of each device changes every 1000 time slots

Utility of the Defender

Conclusion
- Game theoretic study of APT defense provides insights for the design of secure wireless networks
  - The PT-based APT defense game shows how the subjectivity of an APT attacker under uncertainty impacts network security
  - The CBG-based APT defense game investigates how to efficiently allocate CPUs to scan the storage devices to detect APTs
- Reinforcement learning based APT defense strategies achieve the optimal detection performance in the dynamic APT defense games
  - PHC-based defense uses random action selection to fool the attackers
  - DQN-based defense uses a CNN to compress the state space and thus accelerate the learning
- Future work
  - Improve the game model by incorporating more APT details
  - Accelerate the learning speed of the RL-based APT defense strategies


Questions?

[email protected]

http://lxiao.xmu.edu.cn