

Air Force Institute of Technology

Automatic Generation of Social Network Data from Electronic-Mail Communications

Jason W.S. Yee, Robert F. Mills, Gilbert L. Peterson, Summer Bartczak

Air Force Institute of Technology, Wright-Patterson AFB OH

[email protected]


Integrity - Service - Excellence

Overview

Background
System
Experiment
Conclusions
Future Research


Social Network Analysis (SNA)

Blend of Psychology and Sociology
Who You Know vs. What You Know
Human Behavior Patterns
Collaboration
Personnel Security


Social Networking Relationships

Social Network Perspective
Importance of with whom one has relations
Graph Theory


SNA Capabilities

Social Capital
Spread of Epidemics
Organizational Network Analysis
Covert Terrorist Networks
Communications Networks


Lack of Social Network Data

Relatively New Field
Available Data is Limited, Old
Methods
  Surveys, Observation, Archived Records
  Costly, Time-Consuming

Objective:
  Gather data in an automated fashion
  Desire for longitudinal study
    Changes in social network structures


Research Goals

Create a system that uses automated methods of generating useful social network data
  Automated:
    E-mail, instant messaging, web browsing, web logs, online forums… anything that has logging capability
    Our focus: e-mail
Evaluate the execution timeliness of the system and the usability of the generated social network data
  Collect more data, at reduced cost, in shorter time

[Figure: Raw Data → Social Network Data Generator → Social Network Data]


Solution Limitations and Scope

E-mail Log Data Used
  National Center for Supercomputing Applications (NCSA) formatted e-mail logs
  Simple Mail Transfer Protocol (SMTP)
  Collected by the organization’s e-mail servers
  Raw data: timestamp, sender, recipient
  Derived data: status of users (internal/external), number of recipients (one-to-many, one-to-one, etc.)

Out of Research Scope
  Content and Subject of E-mail
  Validity of Social Network Data
  Method and Meaning of Social Network Analysis


System Components

Assign UIDs to Users – Attribution
Sanitize Logs – Privacy
Process Logs for Database – Mine for Data
Database Functions – Extract and Format

[Figure: E-mail Data from Microsoft Exchange → Assign UIDs To Users → Sanitize Logs → Process Logs for Database → Database Functions → Social Network Data]


Implementation Overview

[Figure: implementation pipeline. Components: ProxyListToUID, SMTPLogSanitizer, SMTPLogParser, Database Import/Query. Data: proxy.csv, uidlist.csv, uidlist.csv (updated), SMTP Logs, Sanitized SMTP Logs, Processed SMTP Logs, Social Network Data.]

ProxyListToUID: Assign UIDs to Users
SMTPLogSanitizer: Sanitize Logs
SMTPLogParser: Process Logs
Database Functions: Create Social Network Data


Privacy Protection

Sanitization may be performed
  Application dependent

Sensitive Files
  Initial Proxy List, Raw SMTP Logs, UID List

Sanitized data can be de-sanitized with the UID List
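The UID list maps each UID back to a user's aliases, so anyone holding it can reverse the sanitization. Below is a minimal Python sketch of that reversal. It assumes the sanitized form `<UID@domain>` described on the SMTPLogSanitizer slide. It also assumes an in-memory `uid_to_aliases` dictionary, a hypothetical stand-in for uidlist.csv. `desanitize_line` is an illustrative name and is not part of the authors' tool.

```python
import re

# Sanitized addresses are assumed to look like <UID@domain>.
SANITIZED = re.compile(r"<(\d+)@([^<>\s]+)>")


def desanitize_line(line, uid_to_aliases):
    """Replace each <UID@domain> with the first alias of that UID in the same domain."""
    def restore(match):
        uid, domain = int(match.group(1)), match.group(2).lower()
        for alias in uid_to_aliases.get(uid, []):
            if alias.lower().endswith("@" + domain):
                return f"<{alias}>"
        # Unknown UID, or no alias in that domain: leave the token unchanged.
        return match.group(0)
    return SANITIZED.sub(restore, line)
```

One UID can stand for several aliases. The sketch therefore recovers the user, but not necessarily the exact alias that appeared in the raw log; it simply picks the first alias in the matching domain.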


ProxyListToUID

0,[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]

Yee Jason 2dLt AFIT/ENG,[email protected],"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Yee;g=Jason;smtp:[email protected]:[email protected]"
"Smith John A Civ AFIT/SC",[email protected],"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Smith;g=John;smtp:[email protected]:[email protected]:[email protected]:[email protected]"

Resolves Aliases to Unique Identification Numbers (UIDs)
Action Attribution
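A minimal Python sketch of alias resolution follows. It assumes a simplified proxy.csv layout: display name, primary address, and a `;`-separated proxy-address field in which SMTP aliases start with `smtp:` or `SMTP:`, as in the Active Directory export entries above. The function names, sequential numbering from 0, and lower-casing of addresses are all assumptions for illustration; the original system was written in Java.

```python
import csv
import io


def build_uid_list(proxy_csv_text):
    """Give each proxy.csv row one UID shared by all of that user's SMTP aliases.

    Returns (rows, alias_to_uid): rows mirror uidlist.csv lines (UID, aliases...).
    """
    rows = []
    alias_to_uid = {}
    for record in csv.reader(io.StringIO(proxy_csv_text)):
        if len(record) < 3:
            continue  # skip blank or malformed rows
        aliases = [record[1].strip().lower()]
        # X.400 parts also contain ';', so keep only entries tagged smtp:/SMTP:.
        for entry in record[2].split(";"):
            entry = entry.strip()
            if entry.lower().startswith("smtp:"):
                address = entry[len("smtp:"):].lower()
                if address not in aliases:
                    aliases.append(address)
        uid = len(rows)  # sequential UIDs starting at 0
        rows.append((uid, aliases))
        for address in aliases:
            alias_to_uid[address] = uid
    return rows, alias_to_uid


def format_uid_list(rows):
    """Render rows in the 'UID,alias,alias,...' shape used by uidlist.csv."""
    return "\n".join(",".join([str(uid)] + aliases) for uid, aliases in rows)
```

Because every alias points to the same UID, a message sent from any of a user's addresses is attributed to one actor.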


SMTPLogSanitizer

Before (SMTP Log, UIDList):
...-?TO:<[email protected]>... ......10 FROM:<[email protected]>... 15,[email protected]?TO:<[email protected]>... 15,[email protected]

After (Sanitized SMTP Log, UIDList):
...-?TO:<[email protected]>... ......10 FROM:<[email protected]>... 15,[email protected]?TO:<[email protected]>... 15,[email protected]

216,[email protected]

Replaces the username portion of e-mail addresses with UIDs
Domains are kept, so external parties can still determine insider/outsider status
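The substitution can be sketched in Python as follows. The sketch assumes addresses appear in angle brackets as `<user@domain>`. It also assumes that an address missing from the UID list receives the next unused UID. That matches the sanitizer producing an updated uidlist.csv, although the exact numbering rule is an assumption. `UidList` and `sanitize_line` are hypothetical names.

```python
import re

# Matches <username@domain>; the username may not contain '@', '<', '>' or whitespace.
ADDRESS = re.compile(r"<([^<>@\s]+)@([^<>\s]+)>")


class UidList:
    """In-memory stand-in for uidlist.csv: lower-cased address -> UID (hypothetical)."""

    def __init__(self, alias_to_uid):
        self.alias_to_uid = dict(alias_to_uid)
        self.next_uid = max(self.alias_to_uid.values(), default=-1) + 1

    def uid_for(self, address):
        """Return the UID for an address, assigning the next free UID if it is new."""
        address = address.lower()
        if address not in self.alias_to_uid:
            self.alias_to_uid[address] = self.next_uid
            self.next_uid += 1
        return self.alias_to_uid[address]


def sanitize_line(line, uids):
    """Replace the username part of every <user@domain> with its UID, keeping the domain."""
    def replace(match):
        user, domain = match.group(1), match.group(2)
        return f"<{uids.uid_for(user + '@' + domain)}@{domain}>"
    return ADDRESS.sub(replace, line)
```

The domain is deliberately left intact. This is what still lets a reader of the sanitized log tell insiders from outsiders.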


SMTPLogParser

129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "MAIL-?FROM:<[email protected]> SMTP" 0 4
129.92.1.65 - OutboundConnectionResponse [09/Dec/2004:08:01:17 -0500] "--?250 2.1.0 [email protected] OK SMTP" 0 43
129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "RCPT-?TO:<[email protected]> SMTP" 0 4
129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "RCPT-?TO:<[email protected]> SMTP" 0 4

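Date        Time      SUID  RUID  SI  RI  NR
2004/12/09  08:01:17  2718  1828  1   1   2
2004/12/09  08:01:17  2718  4590  1   0   2

SUID/RUID: sender/recipient UID; SI/RI: sender/recipient internal (1) or external (0); NR: number of recipients

Process Sanitized Data
Derive Internal Status and Number of Recipients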


Two connections made
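Below is a Python sketch of the parsing step. It rests on several assumptions:
- log lines are already sanitized;
- a message starts at each `MAIL ... FROM:<...>` command and collects the `RCPT ... TO:<...>` commands that follow it;
- a party is internal when its domain equals a single `internal_domain` parameter.

The `internal_domain` parameter is hypothetical, and the real system may classify users differently. The sketch emits one row per recipient, following the Date/Time/SUID/RUID/SI/RI/NR columns above.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for sanitized NCSA-style SMTP log lines.
TIMESTAMP = re.compile(r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")
MAIL_FROM = re.compile(r"FROM:<(\d+)@([^<>\s]+)>")
RCPT_TO = re.compile(r"TO:<(\d+)@([^<>\s]+)>")
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


@dataclass
class Connection:
    date: str  # YYYY/MM/DD
    time: str  # HH:MM:SS, taken from the MAIL FROM line
    suid: int  # sender UID
    ruid: int  # recipient UID
    si: int    # 1 if the sender is internal, else 0
    ri: int    # 1 if the recipient is internal, else 0
    nr: int    # number of recipients of the message


def parse_connections(lines, internal_domain):
    """Turn sanitized log lines into one Connection per (message, recipient) pair."""
    internal_domain = internal_domain.lower()
    rows = []
    current = None  # (date, time, suid, si, [(ruid, ri), ...]) for the open message

    def flush():
        if current is not None:
            date, clock, suid, si, recipients = current
            for ruid, ri in recipients:
                rows.append(Connection(date, clock, suid, ruid, si, ri, len(recipients)))

    for line in lines:
        stamp = TIMESTAMP.search(line)
        if stamp is None:
            continue
        day, month, year, clock = stamp.groups()
        date = f"{year}/{MONTHS[month]:02d}/{day}"
        sender = MAIL_FROM.search(line)
        if sender:
            flush()  # a new MAIL FROM closes the previous message
            is_internal = int(sender.group(2).lower() == internal_domain)
            current = (date, clock, int(sender.group(1)), is_internal, [])
            continue
        recipient = RCPT_TO.search(line)
        if recipient and current is not None:
            current[4].append((int(recipient.group(1)),
                               int(recipient.group(2).lower() == internal_domain)))
    flush()
    return rows
```

Response lines, such as the `250 2.1.0` acknowledgement, match neither pattern, so they are skipped.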


Database Functions

Import processed logs
Generate social network data from logs

Processed logs:
Date        Time      SUID  RUID  SI  RI  NR
2004/12/09  07:52:31  162   272   0   1   1
2004/12/09  08:01:17  272   162   1   1   2
2004/12/09  08:01:17  272   314   1   0   2
2004/12/09  08:01:18  314   162   1   1   3
2004/12/09  08:01:18  314   426   1   0   3
2004/12/09  08:01:18  314   272   1   0   3
2004/12/09  08:02:57  162   272   0   1   1
2004/12/09  08:11:33  314   162   1   1   1

Social network data:
Source  Dest  TieStrength
162     272   2
272     162   1
272     314   1
314     162   2
314     272   1
314     426   1
...
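The two tables are consistent with tie strength being the number of connections from Source to Dest. For example, 162 sent to 272 at 07:52:31 and again at 08:02:57, giving a tie strength of 2. Below is a sketch of that aggregation as a SQL `GROUP BY`. It uses Python's built-in `sqlite3` as a stand-in for the MySQL 4.1 database used in the study; the table and column names are hypothetical.

```python
import sqlite3


def tie_strengths(processed_rows):
    """Aggregate processed log rows into (source, dest, tie_strength) triples.

    processed_rows: (date, time, suid, ruid, si, ri, nr) tuples.
    Tie strength is taken to be the number of connections from source to dest.
    """
    db = sqlite3.connect(":memory:")  # stand-in for the MySQL database (assumption)
    try:
        db.execute("CREATE TABLE connections (date TEXT, time TEXT, suid INTEGER, "
                   "ruid INTEGER, si INTEGER, ri INTEGER, nr INTEGER)")
        db.executemany("INSERT INTO connections VALUES (?, ?, ?, ?, ?, ?, ?)", processed_rows)
        return db.execute(
            "SELECT suid, ruid, COUNT(*) FROM connections "
            "GROUP BY suid, ruid ORDER BY suid, ruid").fetchall()
    finally:
        db.close()
```

Fed the eight processed rows above, this query reproduces the Source/Dest/TieStrength table row for row.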


Experimental Methodology

Operational Testing
  Add Months Sequentially

Use data gathered from AFIT, Oct-Dec 2004

Parameters
  Windows XP Professional SP2
  Pentium 4, 3.2 GHz HT, 2 GB RAM
  Java 1.5
  MySQL 4.1

Direct measurement


Workload

Proxy List from Active Directory
SMTP logs from AFIT servers

Medium-sized organization (over 1500 users)
86 days, over 3 GB of data
1.8 million e-mail messages, 3 million connections
1550 internal actors
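For a rough sense of scale, treat each connection as one sender-recipient pair, as on the SMTPLogParser slide. The workload figures then imply:

```latex
\frac{3 \times 10^{6}\ \text{connections}}{1.8 \times 10^{6}\ \text{messages}} \approx 1.67\ \text{connections per message},
\qquad
\frac{3\ \text{GB}}{86\ \text{days}} \approx 35\ \text{MB per day}
```

Both values are approximate, since the slide reports rounded and lower-bound figures ("over 3 GB").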


Timeliness Results

Fast
Bottlenecks
  Sanitization
  Parsing (parallelizable)


Usability Results

UCINET-Readable Output
Centrality, Power, and Group Statistics Computed
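In the study, UCINET computed the centrality statistics. Purely to illustrate the simplest such measure, the Python sketch below counts unweighted in-degree and out-degree from (Source, Dest, TieStrength) triples like those on the Database Functions slide. `degree_centrality` is a hypothetical helper, not UCINET's implementation. It ignores tie strength beyond treating any positive value as a tie.

```python
from collections import defaultdict


def degree_centrality(ties):
    """Return {actor: (out_degree, in_degree)} for directed (source, dest, strength) ties.

    Unweighted: any tie with strength > 0 counts once, whatever its strength.
    """
    out_degree = defaultdict(int)
    in_degree = defaultdict(int)
    for source, dest, strength in ties:
        if strength > 0:
            out_degree[source] += 1
            in_degree[dest] += 1
    actors = sorted(set(out_degree) | set(in_degree))
    return {actor: (out_degree[actor], in_degree[actor]) for actor in actors}
```

On the sample ties, actor 314 has the highest out-degree (3), while 162 and 272 each receive ties from two actors.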


Usability Results

NetDraw Visualization Tool


Impact

Immediate
  Time and Cost Reduction
  More Data for Social Network Analysis
  Sanitization
  More Information for Managers

Long Term
  Long-Term Studies
  Understanding Employees
  Potential Insider Threat Characterization
  Possible Insider Threat Mitigation


Limitations

Presumes that e-mail behavior adequately captures social interactions

User “consent” to monitoring
  Most organizations have an acceptable use policy
  Organizations may routinely monitor employee communications
  However, use of the e-mail system is somewhat voluntary


Future Research

Data Collection
  Extend to Other Computer-Mediated Communication (CMC) Records – IM, Blogs, Web Browsing
  Content of Messages (subject, nature, contents, “reply-to”)
Analyze Social Network Data
  Validation – does e-mail actually capture what we are hoping?
  Analysis of collected data – longitudinal studies
  Insider threat research, staff collaboration
Tools
  Parallelization


Backup Slides


Insider Threat Mitigation

Intrusion Detection Systems
  Misuse Detection
Honeypots
  Detect Attacks on False Data
  Studying Attackers
Prevention
  Leadership/Management
  Policies
  Deterrence
Combination of the Above
  Defense in Depth


Insider Threat Profile

System Knowledge, Privileges
No attribute-based profile
  Gender, age, background, marital status, position, income
No technical expertise needed

Behavior-based Profiling
  Trends
  Planning
  Preventable
  Behavior Change


Potential Insider Threat Mitigation

Social Network Data Collection
Social Network Analysis of Real Data
Understanding Employee Behavior
Information Tool for Managers
Insider Threat Mitigation