Security Data Mining Using Social Network Analysis




1

Social Network Analysis (SNA) for Intelligence and Security Informatics (ISI)

DG.O 2006, May 23, 2006

Hsinchun Chen, Ph.D.
McClelland Professor of MIS
Director, Artificial Intelligence Lab
NSF COPLINK Center
University of Arizona

2

The Networked World After 9/11, 2001

• WTC, Pentagon attacks
• Afghanistan, Iraq wars
• Bali, Madrid, London bombings
• Sunni, Shia, sectarian wars
• Jihad, E-Jihad
• Infectious diseases, bioagents, WMDs
• International, regional, cultural, religious conflicts, …
• Traditional crimes, cyber crimes, narcotics, gangs (MS 13), smuggling, domestic extremists (Oklahoma bombing), cyber security, …


3

ISI: Overview

Intelligence and Security Informatics (ISI)

• “development of advanced information technologies, systems, algorithms, and databases for international, national and homeland security related applications, through an integrated technological, organizational, and policy-based approach” (Chen et al., 2003a)

4

A knowledge discovery research framework for ISI


5

Anacapa Chart (1st generation)

Association Matrix

Link chart

6

Analyst’s Notebook, Netmap, Watson (2nd generation)

Analyst’s Notebook. Network nodes are automatically arranged for easy interpretation. Source: i2, Inc.

Netmap. Different colors are used to represent different entity types. Source: Netmap Analytics, LLC.

Watson. Relations among a group of people (the central sphere) based on telephone records. Source: Xanalysis, Ltd.


7

A 9/11 Terrorist Network

8

The proposed framework


9

Experiment

• Data Sets
  – TPD incident summaries
    • Time period: Narcotics: 2000-present; Gangs: 1995-present
    • Size:

      Network     Total # individuals   # sub-networks   Size of sub-networks (size: count)
      Narcotics   12,842                2,628            1-10: 2,587; 11-20: 31; 21-100: 9; 502: 1
      Gangs       4,376                 289              1-10: 264; 11-20: 20; 21-100: 4; 2,595: 1

  – Two testing networks
    • Narcotics (60 individuals)
    • Gang (24 individuals)

10

The COPLINK SNA Project (The narcotic network example)

Switch between narcotic network and gang network

Show network and reset network

Adjust level of details

A point represents an individual labeled by his name

A line represents a link between two persons

A bubble represents a subgroup labeled by its leader’s name

A line implies that some individuals in one group interact with some individuals in the other group. The thicker the link, the more individual interactions between the two groups

The size of a bubble is proportional to the number of individuals in the group

The rankings of the members of a selected group (green).
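The bubble view described above can be sketched as a simple aggregation. This is a minimal illustration, not the COPLINK implementation: it assumes each person has already been assigned to a subgroup, and the names, groups, and the function `reduce_network` are hypothetical. Bubble size is the member count of a group; link thickness is the number of person-level links between two groups.

```python
from collections import Counter

def reduce_network(links, group_of):
    """Collapse person-level links into group-level links.

    links: iterable of (person_a, person_b) pairs.
    group_of: dict mapping each person to a group label.
    Returns (group_sizes, group_links); group_links counts the
    person-level links that run between each pair of distinct groups.
    """
    group_sizes = Counter(group_of.values())
    group_links = Counter()
    for a, b in links:
        ga, gb = group_of[a], group_of[b]
        if ga != gb:
            # Sort so (g1, g2) and (g2, g1) count as the same link.
            group_links[tuple(sorted((ga, gb)))] += 1
    return group_sizes, group_links

# Hypothetical toy data, invented for illustration only.
group_of = {"ann": "G1", "bob": "G1", "cat": "G1", "dan": "G2", "eve": "G2"}
links = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
         ("ann", "eve"), ("dan", "eve")]

sizes, between = reduce_network(links, group_of)
print(sizes)    # bubble sizes per group
print(between)  # link thickness per pair of groups
```

Links inside a group are dropped on purpose, since the reduced view only shows interactions between groups.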


11

The COPLINK SNA Project (The gang network example)

The leader

A clique

A gatekeeper

The reduced network structure
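The roles labeled on this slide can be approximated with standard centrality measures. The slides do not say which measures are used, so the sketch below is an assumption: it treats the highest-degree member as the likely leader and the member with the highest betweenness (computed with Brandes' algorithm) as the likely gatekeeper. The toy network and all names are hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical gang: a clique around "L", two peripheral members,
# and "g" as the only tie to a second cluster (x, y, z).
edges = [("L", "a"), ("L", "b"), ("L", "c"), ("a", "b"), ("a", "c"),
         ("b", "c"), ("L", "e"), ("L", "f"), ("a", "g"), ("b", "g"),
         ("g", "x"), ("x", "y"), ("x", "z"), ("y", "z")]
adj = build_adjacency(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
btw = betweenness(adj)

leader = max(degree, key=degree.get)
gatekeeper = max(btw, key=btw.get)
print(leader, gatekeeper)
```

In this toy network the member with the most direct ties is not the one whose removal would cut the two clusters apart, which is why the two roles are measured separately.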

12

Patterns Found

• The chain structure of the narcotic network

• Implications: disrupt the network by breaking the chain

• The star structure of the gang network

• Implications: disrupt the network by removing the leader
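The two implications above can be checked on toy graphs. The following is a minimal sketch with hypothetical five-node networks, not the project's data: it removes one node and counts the connected pieces that remain.

```python
def components(adj, removed=frozenset()):
    """Return connected components (as sets), ignoring removed nodes."""
    seen, parts = set(removed), []
    for start in adj:
        if start in seen:
            continue
        part, frontier = set(), [start]
        seen.add(start)
        while frontier:
            v = frontier.pop()
            part.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
        parts.append(part)
    return parts

# Chain: 0 - 1 - 2 - 3 - 4
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
# Star: node 0 is the leader, nodes 1..4 are followers.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

print(len(components(chain, {2})))  # chain broken in the middle
print(len(components(star, {0})))   # star with the leader removed
print(len(components(star, {1})))   # star with one follower removed
```

Breaking the chain at any interior node splits it in two, while the star only falls apart when the central node is removed; removing a follower leaves it connected.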


13

White gangs who were involved in murders and shootings

White gangs who sold crack cocaine

A group of black gangs

Expert Validation

14

Adding Other Border Agencies


15

Integration and Visualization Framework used in BorderSafe

[Diagram labels]

Jurisdiction: Pima County, Tucson Police (TPD/PCSD)

RMS data in varying structures

Transactional RMS data is transformed into COPLINK’s schema

Transformed data

Person records are combined when they have the same first name, last name, and date of birth

Combined Criminal Activity Network

Primary Information

Additional Data Sources: CBP Crossings

Secondary Information

Networks of criminal activity are extracted and augmented with border crossing information on associated vehicles

Network Visualization

• TPD and PCSD data is transformed and integrated.

• CBP data is used to identify border crossing vehicles.

• Criminal Activity Networks (CANs) are extracted and visualized to depict relationships from both datasets
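The person-matching rule stated in the diagram (same first name, last name, and date of birth) can be sketched as a grouping step. This is an illustrative assumption, not COPLINK code: the record fields, the sample records, and the trimming and lower-casing of names are all hypothetical choices made for the example.

```python
from collections import defaultdict

def merge_person_records(records):
    """Group records that share first name, last name, and date of birth."""
    merged = defaultdict(list)
    for rec in records:
        # Assumed normalization: trim whitespace and ignore case.
        key = (rec["first"].strip().lower(),
               rec["last"].strip().lower(),
               rec["dob"])
        merged[key].append(rec)
    return merged

# Hypothetical records from the two agencies.
records = [
    {"source": "TPD",  "first": "Jane",  "last": "Doe", "dob": "1980-01-02"},
    {"source": "PCSD", "first": "jane ", "last": "DOE", "dob": "1980-01-02"},
    {"source": "TPD",  "first": "John",  "last": "Doe", "dob": "1975-05-06"},
]

persons = merge_person_records(records)
for key, recs in persons.items():
    print(key, [r["source"] for r in recs])
```

An exact-match key like this one is simple and conservative, but it will not link records with misspelled names or mistyped birth dates.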

16

Key Statistics of TPD and PCSD Data

                      Tucson Police Department   Pima County Sheriff’s Department
Recorded Incidents    2.84 million               2.18 million
Individuals           1.35 million               1.31 million
Vehicles              623,656                    520,539


17

Customs and Border Protection (CBP) Border Crossing Information

Records (plate, state, date, time)             1,125,155
Distinct vehicles                              226,207
Days of information over an 18-month period    209
Plates issued in AZ                            130,195
Plates issued in Mexico                        90,466
Plates issued in CA                            5,546

18

A Vehicle to Watch?

Legend:

• Shape indicates object type: circles are people; rectangles are vehicles.
• Color denotes activity history: violent crimes; narcotics crimes; violent & narcotics; gang related.
• Larger size indicates higher levels of activity.
• Border-crossing plates are outlined in red.


19

Interesting Vehicles: Douglas

• The vehicle has 100+ police contacts and has crossed twice in 2005/10.

• All individuals and vehicles in the network have criminal roles.

• The majority of crossings are at L261.

AZ-698L

Narcotic Network

TPD/Pima Police Data

20

Interesting Vehicles: Douglas

• Many people associated with the vehicle are involved in drug offenses.

• The vehicle crossed 4 times in August 2005, all at L261.

AZ-042RPD

Narcotic Network

TPD/Pima Police Data


21

Interesting Examples using TMI: Frequent Crosser with Narcotics Connections

• Vehicle A and Vehicle B have a high TMI score.
• Vehicle A has crossed 51 times in a 7-month period, out of which it crossed 22 times with Vehicle B.
• The figure shows their pattern of crossing.

[Figure: time of day (y-axis, 0 to 2000) of each crossing by Vehicle A and Vehicle B, by date (x-axis, Nov 11, 2004 to June 17, 2005). Annotation: After dark / No fixed schedule.]

Experimental Results
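The slides do not define how the TMI score is computed. As a rough illustration only, the sketch below assumes a pointwise-mutual-information style score over time windows: it counts how often two vehicles cross within a chosen window of each other and compares that with what independent crossing would predict. The function names, the one-hour window, and the sample timestamps are all hypothetical.

```python
import math
from datetime import datetime, timedelta

def co_crossings(times_a, times_b, window=timedelta(hours=1)):
    """Count crossings of A that have a crossing of B within `window`."""
    return sum(
        any(abs(ta - tb) <= window for tb in times_b)
        for ta in times_a
    )

def pmi_score(n_a, n_b, n_ab, n_total):
    """Pointwise mutual information: log2(P(A,B) / (P(A) * P(B))).

    n_a, n_b: windows in which A (or B) crossed.
    n_ab: windows in which both crossed.
    n_total: total number of observed windows.
    """
    if n_ab == 0:
        return float("-inf")
    p_ab = n_ab / n_total
    return math.log2(p_ab / ((n_a / n_total) * (n_b / n_total)))

# Hypothetical crossing times for two vehicles.
times_a = [datetime(2005, 1, 6, 19, 0), datetime(2005, 1, 15, 20, 30),
           datetime(2005, 3, 5, 18, 45)]
times_b = [datetime(2005, 1, 6, 19, 20), datetime(2005, 3, 5, 21, 0)]

together = co_crossings(times_a, times_b)
print(together)
print(pmi_score(n_a=4, n_b=4, n_ab=4, n_total=16))
```

A score above zero means the pair crosses together more often than independent schedules would produce, which is the kind of signal the example with Vehicle A and Vehicle B relies on.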

22

A Deeper Look into Vehicles A and B

• Vehicle A is found to be a member of a narcotics network with many other vehicles. It has been arrested with its occupants for drug sales in Tucson.

• The advantage of triangulating information about the vehicle is clearly visible.

[Figure: two linked panels. TPD: Narcotics Network containing Vehicle A and Vehicle B. Customs and Border Protection: Frequent Crossers at Night, showing the time-of-day crossing chart for Vehicle A and Vehicle B, labeled TMI.]


23

Dark Web Project Overview

24

Results - Hyperlink Diagram of U.S. Domestic Groups’ Websites to Identify Cybercommunities



25

Results - Validate the Categorization of the U.S. Domestic Extremist Groups’ Websites

• The local density of each subgroup (cluster) is calculated and compared to the overall density of the network.

• The local densities (0.16 to 0.75) are all higher than the overall density of the network (0.08), which indicates that our initial categorization/clustering is valid.

Network Component                      Density
Christian Identity cluster             0.16
Militia cluster                        0.75
White-Supremacy / Neo-Nazi cluster     0.42
Neo-Confederate cluster                0.41
Overall Network Density                0.08
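The density comparison can be sketched with the usual definition of density: the number of links present divided by the number of links possible. Whether the study treated hyperlinks as directed is not stated in the slides, so the `directed` flag below is an assumption, and the site names and links are hypothetical.

```python
def density(nodes, links, directed=True):
    """Links present among `nodes` divided by links possible."""
    node_set = set(nodes)
    n = len(node_set)
    if n < 2:
        return 0.0
    # Keep only links whose endpoints are both in the node set.
    inside = {(a, b) for a, b in links
              if a in node_set and b in node_set and a != b}
    if not directed:
        inside = {tuple(sorted(pair)) for pair in inside}
    possible = n * (n - 1) if directed else n * (n - 1) // 2
    return len(inside) / possible

# Hypothetical hyperlinks among six sites in two clusters.
links = [("s1", "s2"), ("s2", "s1"), ("s2", "s3"), ("s3", "s1"),
         ("s4", "s5"), ("s5", "s6"), ("s3", "s4")]
cluster_a = ["s1", "s2", "s3"]
cluster_b = ["s4", "s5", "s6"]

overall = density(cluster_a + cluster_b, links)
local_a = density(cluster_a, links)
local_b = density(cluster_b, links)
print(round(overall, 3), round(local_a, 3), round(local_b, 3))
```

When each local density is higher than the overall density, as in this toy example, sites link more within their cluster than across clusters, which is the argument the slide makes for the categorization.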

26

Jihad Sympathizers

Palestinian extremist groups

Al-Qaeda linked Websites

Hizbollah

Tanzeem-e-Islami

Hizb-ut-Tahrir

Results - Hyperlink Diagram of Middle Eastern Extremist Groups’ Websites to Identify Cybercommunities



27

Results - Validate the Categorization of the Middle Eastern Extremists’ Websites

• The local density of each subgroup (cluster) is calculated and compared to the overall density of the network.

• The local densities (0.27 to 0.70) are again higher than the overall density of the network (0.099), which indicates that our initial categorization/clustering is valid.

Network Component                   Density
Jihad Sympathizers                  0.27
Al-Qaeda linked Websites cluster    0.53
Palestinian cluster                 0.30
Hizbollah cluster                   0.70
Hizb-ut-Tahrir cluster              0.70
Overall Network Density             0.099

28

Approach - Content Analysis Coding Scheme

High Level Attribute: Low Level Attributes

Communications: Email; Telephone; Multimedia; Online Feedback Form

Fundraising: External Aid Mentioned; Fund Transfer; Donation; Charity; Support Groups

Sharing Ideology: Mission; Doctrine; Justification of the Use of Violence; Pin-pointing Enemies

Propaganda (insiders): Slogans; Dates; Martyrs; Leaders; Banners and Seals; Narratives of Operations and Events

Propaganda (outsiders): References to Western Media Coverage; News Reporting

Virtual Community: Listserv; Text Chat Room; Message Board; E-conferencing; Webring

Command and Control: Tactics; Organization Structure; Recording or Videos from Senior Members of the Group; Documentation of Previous Operations

Recruitment and Training: Operations’ Geographical Area; Explicit Invitation to Join
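A coding scheme like this one can be turned into a score per high-level attribute. The slides do not describe the scoring, so the sketch below is an assumption: each low-level attribute is coded as present or absent on a site, and the level of a high-level attribute is the share of its low-level attributes found, averaged over a group's sites. Only two of the eight attributes are included, and the coded sites are hypothetical.

```python
# Subset of the coding scheme, taken from the slide.
CODING_SCHEME = {
    "Communications": ["Email", "Telephone", "Multimedia",
                       "Online Feedback Form"],
    "Fundraising": ["External Aid Mentioned", "Fund Transfer", "Donation",
                    "Charity", "Support Groups"],
}

def normalized_levels(coded_sites, scheme):
    """Return a 0-to-1 level per high-level attribute.

    coded_sites: list of sets, each holding the low-level attributes
    a coder found on one website.
    """
    levels = {}
    for high, lows in scheme.items():
        if not coded_sites:
            levels[high] = 0.0
            continue
        hits = sum(len(site & set(lows)) for site in coded_sites)
        levels[high] = hits / (len(coded_sites) * len(lows))
    return levels

# Hypothetical coding results for two sites of one group.
sites = [{"Email", "Telephone"}, {"Email", "Donation", "Charity"}]
levels = normalized_levels(sites, CODING_SCHEME)
print(levels)
```

Dividing by the number of low-level attributes keeps categories with long attribute lists from dominating those with short ones.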


29

Results - Website Patterns for U.S. Domestic Groups

[Figure: bar chart of normalized content levels (y-axis, 0 to 0.9) by group: Black Separatists, Christian Identity, Militia, Neo-confederates, Neo-Nazis/White Supremacists, Eco-Terrorism. Series: Communications, Fundraising, Ideology, Propaganda (insiders), Propaganda (outsiders), Virtual Community, Command and Control, Recruitment and Training.]

30

Results - Summary of Website Patterns for U.S. Domestic Groups

• Eco-extremist and animal rights groups allocate more Website resources to “Communications” and “Command and Control”.

• Website content is associated with “Propaganda towards outsiders” and “Virtual Community”.

Animal Liberation Front Forum (http://www.webgroups.us/animalliberationfront/)


31

Results - Website Patterns for Middle Eastern Extremist Groups

[Figure: bar chart of normalized content levels (y-axis, 0 to 1) by group: Hizb-ut-Tahrir, Hizbollah, Al-Qaeda Linked Websites, Jihad Sympathizers, Palestinian terrorist groups. Series: Communications, Fundraising, Sharing Ideology, Propaganda (insiders), Propaganda (outsiders), Virtual Community, Command and Control, Recruitment and Training.]

32

Results - Summary of Website Patterns for Middle Eastern Extremist Groups

Clandestine groups (e.g., Al-Qaeda) tend to emphasize “Propaganda towards outsiders” while the more established groups (e.g., Hizbollah, Hamas) direct their propaganda towards insiders.

English site of “Supporters of Shareeah”, a London-based Salafi group. Many “Al-Qaeda” linked Websites have English mirrors critiquing the West.


33

Results - Comparison of Resource Allocation in Middle Eastern and U.S. Domestic Extremist Groups

[Figure: bar chart of normalized content levels (y-axis, 0 to 0.7) comparing Middle-Eastern Groups and US Domestic Groups across Communications, Fundraising, Ideology, Propaganda (insiders), Propaganda (outsiders), Virtual Community, Command and Control, Recruitment and Training.]

34

For more information:

Hsinchun Chen

[email protected]

http://ai.arizona.edu