Post on 20-Feb-2020


Air Force Institute of Technology

Automatic Generation of Social Network Data from Electronic-Mail Communications

Jason W.S. Yee, Robert F. Mills, Gilbert L. Peterson, Summer Bartczak

Air Force Institute of Technology, Wright-Patterson AFB OH

robert.mills@afit.edu

I n t e g r i t y - S e r v i c e - E x c e l l e n c e

Overview

- Background
- System
- Experiment
- Conclusions
- Future Research

Social Network Analysis (SNA)

- Blend of Psychology and Sociology
- Who You Know vs What You Know
- Human Behavior Patterns
- Collaboration
- Personnel Security

Social Networking Relationships

- Social Network Perspective
- Importance of "with whom" relations
- Graph Theory

SNA Capabilities

- Social Capital
- Spread of Epidemics
- Organizational Network Analysis
- Covert Terrorist Networks
- Communications Networks

Lack of Social Network Data

- Relatively New Field
- Available Data is Limited, Old
- Methods
  - Surveys, Observation, Archived Records
  - Costly, Time consuming
- Objective:
  - Gather data in automated fashion
  - Desire for longitudinal study
  - Changes in social network structures

Research Goals

- Create a system that uses automated methods of generating useful social network data
  - Automated:
    - E-mail, instant messaging, web browsing, web logs, online forums… anything that has logging capability
    - Our focus: e-mail
- Evaluate the execution timeliness of the system and usability of generated social network data
  - Collect more data, at reduced cost, in shorter time

[Diagram: Raw Data -> Social Network Data Generator -> Social Network Data]

Solution Limitations and Scope

- E-mail Log Data used
  - National Center for Supercomputing Applications (NCSA) formatted e-mail logs
  - Simple mail transfer protocol (SMTP)
  - Collected by organization’s e-mail servers
  - Raw data: timestamp, sender, recipient
  - Derived data: status of users (internal/external), number of recipients (one-to-many, one-to-one, etc.)
- Out of Research Scope
  - Content and Subject of Email
  - Validity of Social Network Data
  - Method and Meaning of Social Network Analysis

System Components

- Assign UIDs to Users – Attribution
- Sanitize Logs – Privacy
- Process Logs for Database – Mine for Data
- Database Functions – Extract and Format

[Diagram: E-mail Data from Microsoft Exchange -> Assign UIDs To Users -> Sanitize Logs -> Process Logs for Database -> Database Functions -> Social Network Data]

Implementation Overview

[Diagram: data flow]
proxy.csv -> ProxyListToUID -> uidlist.csv
SMTP Logs + uidlist.csv -> SMTP Log Sanitizer -> Sanitized SMTP Logs + uidlist.csv (updated)
Sanitized SMTP Logs -> SMTP Log Parser -> Processed SMTP Logs
Processed SMTP Logs -> Database Import/Query -> Social Network Data

- ProxyListToUID: Assign UIDs to Users
- SMTPLogSanitizer: Sanitize Logs
- SMTPLogParser: Process Logs
- Database Functions: Create Social Network Data

Privacy Protection

- Sanitization may be performed
  - Application dependent
- Sensitive Files
  - Initial Proxy List, Raw SMTP Logs, UID List
- Sanitized Data can be de-sanitized

ProxyListToUID

- Resolves Aliases to Unique Identification Numbers (UIDs)
- Action Attribution

Proxy list (input):

Yee Jason 2dLt AFIT/ENG,Jason.Yee@afit.edu,"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Yee;g=Jason;smtp:jyee@afit.edu SMTP:Jason.Yee@afit.edu"
"Smith John A Civ AFIT/SC",John.Smith@afit.edu,"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Smith;g=John;smtp:jasmith@afit.edu smtp:jsmith1@afit.edu SMTP:John.Smith@afit.edu SMTP:John.Smith.1@afit.edu"

UID list (output):

0,Jason.Yee@afit.edu
0,jyee@afit.edu
1,John.Smith@afit.edu
1,jasmith@afit.edu
1,jsmith1@afit.edu
1,John.Smith.1@afit.edu
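The alias resolution on this slide can be sketched roughly as follows. This is a minimal illustration, not the original tool: it assumes each proxy-list row has three CSV fields (display name, primary address, proxy-address string), that every alias is introduced by an `smtp:` or `SMTP:` marker, and that addresses are compared case-insensitively. The function name `proxy_list_to_uid` and the regular expression are hypothetical.

```python
import csv
import io
import re

# Assumption: each alias follows an smtp:/SMTP: marker and ends at the next
# marker, at a separator character, or at the end of the field.
ALIAS_RE = re.compile(r"smtp:(.+?)(?=smtp:|[;%\s]|$)", re.IGNORECASE)


def proxy_list_to_uid(proxy_csv_text):
    """Return (uid, address) pairs, giving every row of the proxy list one UID."""
    pairs = []
    uid = 0
    for row in csv.reader(io.StringIO(proxy_csv_text)):
        if len(row) < 3:
            continue  # skip rows that do not have all three fields
        primary, proxies = row[1].strip(), row[2]
        seen = set()
        # The primary address comes first, then each alias in listed order.
        for address in [primary] + ALIAS_RE.findall(proxies):
            key = address.lower()
            if key not in seen:  # drop aliases that repeat an earlier address
                seen.add(key)
                pairs.append((uid, address))
        uid += 1
    return pairs


# The two rows from the slide, with the aliases fused as in the extracted text.
SAMPLE = (
    'Yee Jason 2dLt AFIT/ENG,Jason.Yee@afit.edu,'
    '"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Yee;g=Jason;'
    'smtp:jyee@afit.eduSMTP:Jason.Yee@afit.edu"\n'
    '"Smith John A Civ AFIT/SC",John.Smith@afit.edu,'
    '"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Smith;g=John;'
    'smtp:jasmith@afit.edusmtp:jsmith1@afit.edu'
    'SMTP:John.Smith@afit.eduSMTP:John.Smith.1@afit.edu"\n'
)

for uid, address in proxy_list_to_uid(SAMPLE):
    print("%d,%s" % (uid, address))
```

Because every alias of one person shares a UID, later stages can attribute a message to the same actor regardless of which address appears in the log.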

SMTPLogSanitizer

- Replaces username portion of email addresses with UIDs
- External parties can determine insider/outsider status

SMTP Log (before):

...-?TO:<John.Smith@afit.edu>...
...10 FROM:<jsmith@afit.edu>...
...-?TO:<Jane.User@domain.org>...

UIDList (before):

...
15,John.Smith@afit.edu
15,jsmith@afit.edu

Sanitized SMTP Log (after):

...-?TO:<15@afit.edu>...
...10 FROM:<15@afit.edu>...
...-?TO:<216@domain.org>...

UIDList (after):

...
15,John.Smith@afit.edu
15,jsmith@afit.edu
216,Jane.User@domain.org
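A minimal sketch of the sanitization step, under stated assumptions: addresses appear in the log as `<user@domain>`, lookups are case-insensitive, and an address missing from the UID list receives the next unused UID. The names `sanitize` and `uid_by_address` are hypothetical and do not come from the original tool.

```python
import re

# Assumption: every address in the log is wrapped in angle brackets.
ADDRESS_RE = re.compile(r"<([^<>@\s]+)@([^<>\s]+)>")


def sanitize(lines, uid_by_address):
    """Replace user names with UIDs; uid_by_address is extended in place."""
    next_uid = max(uid_by_address.values(), default=-1) + 1

    def replace(match):
        nonlocal next_uid
        user, domain = match.group(1), match.group(2)
        key = (user + "@" + domain).lower()
        if key not in uid_by_address:
            # Unknown (typically external) address: assign a fresh UID.
            uid_by_address[key] = next_uid
            next_uid += 1
        # The domain is kept, so internal/external status stays visible.
        return "<%d@%s>" % (uid_by_address[key], domain)

    return [ADDRESS_RE.sub(replace, line) for line in lines]
```

Keeping the domain is what lets a recipient of the sanitized logs tell insiders from outsiders without seeing names. The updated mapping is also why the UID list must be treated as a sensitive file: it is enough to reverse the sanitization.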

SMTPLogParser

- Process Sanitized Data
- Derive Internal Status and Number of Recipients

Sanitized SMTP log (input):

129.92.1.65 - OutboundConnectionCommand[09/Dec/2004:08:01:17 -0500] "MAIL-?FROM:<2718@afit.edu> SMTP" 0 4
129.92.1.65 - OutboundConnectionResponse[09/Dec/2004:08:01:17 -0500] "--?250 2.1.0 2718@afit.edu....Sender OK SMTP" 0 43
129.92.1.65 - OutboundConnectionCommand[09/Dec/2004:08:01:17 -0500] "RCPT-?TO:<1828@afit.edu> SMTP" 0 4
129.92.1.65 - OutboundConnectionCommand[09/Dec/2004:08:01:17 -0500] "RCPT-?TO:<4590@ieee.org> SMTP" 0 4

Processed SMTP log (output):

Date       Time     SUID RUID SI RI NR
2004/12/09 08:01:17 2718 1828 1  1  2
2004/12/09 08:01:17 2718 4590 1  0  2

SUID/RUID: sender/recipient UID. SI/RI: sender/recipient internal status (1 = internal, 0 = external). NR: number of recipients.

Two connections made (one per recipient)
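The parsing step on this slide might look like the sketch below. It assumes that the log lines of one message are contiguous (a `MAIL ... FROM` command followed by its `RCPT ... TO` commands), that user names have already been replaced by numeric UIDs, and that the internal domain is `afit.edu`. The function `parse_log` and its parameter names are hypothetical.

```python
import re
from datetime import datetime

STAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")
# \S*? tolerates whatever separator the log puts between the two keywords.
FROM_RE = re.compile(r"MAIL\S*?FROM:<(\d+)@([^>\s]+)>")
TO_RE = re.compile(r"RCPT\S*?TO:<(\d+)@([^>\s]+)>")


def parse_log(lines, internal_domain="afit.edu"):
    """Return one (date, time, suid, ruid, si, ri, nr) record per recipient."""
    records = []
    sender = None     # (uid, domain) of the message being read
    recipients = []   # (timestamp, uid, domain) for that message

    def flush():
        # Emit the finished message: one connection per recipient.
        for stamp, ruid, rdomain in recipients:
            records.append((
                stamp.strftime("%Y/%m/%d"), stamp.strftime("%H:%M:%S"),
                sender[0], ruid,
                int(sender[1] == internal_domain),
                int(rdomain == internal_domain),
                len(recipients),
            ))

    for line in lines:
        stamp_match = STAMP_RE.search(line)
        if not stamp_match:
            continue
        stamp = datetime.strptime(stamp_match.group(1), "%d/%b/%Y:%H:%M:%S")
        match = FROM_RE.search(line)
        if match:
            flush()  # a new sender closes the previous message
            sender = (int(match.group(1)), match.group(2).lower())
            recipients = []
            continue
        match = TO_RE.search(line)
        if match and sender is not None:
            recipients.append((stamp, int(match.group(1)), match.group(2).lower()))
    flush()
    return records
```

Response lines such as `Sender OK` match neither pattern and are ignored. The number of recipients is only known once the whole message has been read, which is why records are emitted when the next sender (or the end of the log) is reached.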

Database Functions

- Import processed logs
- Generate social network data from logs

Processed logs (input):

Date       Time     SUID RUID SI RI NR
2004/12/09 07:52:31 162  272  0  1  1
2004/12/09 08:01:17 272  162  1  1  2
2004/12/09 08:01:17 272  314  1  0  2
2004/12/09 08:01:18 314  162  1  1  3
2004/12/09 08:01:18 314  426  1  0  3
2004/12/09 08:01:18 314  272  1  0  3
2004/12/09 08:02:57 162  272  0  1  1
2004/12/09 08:11:33 314  162  1  1  1

Social network data (output):

Source Dest TieStrength
162    272  2
272    162  1
272    314  1
314    162  2
314    272  1
314    426  1
...
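One way to reproduce the aggregation in the example is a grouped count per sender/recipient pair. The sketch below uses Python's built-in SQLite in place of the MySQL database used in the experiment, and assumes that tie strength is simply the number of log rows for each directed pair, which matches the sample rows. The table name `log` and the function `tie_strengths` are hypothetical.

```python
import sqlite3


def tie_strengths(records):
    """Count connections per directed (sender, recipient) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (date TEXT, time TEXT, suid INTEGER, ruid INTEGER,"
        " si INTEGER, ri INTEGER, nr INTEGER)"
    )
    conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?, ?, ?, ?)", records)
    rows = conn.execute(
        "SELECT suid, ruid, COUNT(*) FROM log"
        " GROUP BY suid, ruid ORDER BY suid, ruid"
    ).fetchall()
    conn.close()
    return rows


# The eight processed-log rows from the slide.
PROCESSED = [
    ("2004/12/09", "07:52:31", 162, 272, 0, 1, 1),
    ("2004/12/09", "08:01:17", 272, 162, 1, 1, 2),
    ("2004/12/09", "08:01:17", 272, 314, 1, 0, 2),
    ("2004/12/09", "08:01:18", 314, 162, 1, 1, 3),
    ("2004/12/09", "08:01:18", 314, 426, 1, 0, 3),
    ("2004/12/09", "08:01:18", 314, 272, 1, 0, 3),
    ("2004/12/09", "08:02:57", 162, 272, 0, 1, 1),
    ("2004/12/09", "08:11:33", 314, 162, 1, 1, 1),
]

for source, dest, strength in tie_strengths(PROCESSED):
    print(source, dest, strength)
```

Keeping the raw rows in a table rather than only the counts leaves room for other queries, for example restricting the date range for a longitudinal comparison or filtering on the SI, RI, and NR columns.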

Experimental Methodology

- Operational Testing
  - Add Months Sequentially
- Use data gathered from AFIT
  - Oct-Dec 2004
- Parameters
  - XP Professional SP2
  - Pentium 4, 3.2 GHz HT, 2 GB RAM
  - Java 1.5
  - MySQL 4.1
- Direct measurement

Workload

- Proxy List from Active Directory
- SMTP logs from AFIT servers
  - Medium-sized organization (over 1500 users)
  - 86 days, over 3 GB of data
  - 1.8 million email messages, 3 million connections
  - 1550 internal actors

Timeliness Results

- Fast
- Bottlenecks
  - Sanitization
  - Parsing (parallelizable)

Usability Results

- UCINet Readable
- Centrality, Power, and Group Statistics Taken

Usability Results

NetDraw Visualization Tool

Impact

- Immediate
  - Time and Cost Reduction
  - More Data for Social Network Analysis
    - Sanitization
  - More Information for Managers
- Long Term
  - Long term studies
  - Understanding Employees
  - Potential Insider Threat Characterization
  - Possible Insider Threat Mitigation

Limitations

- Presumes that e-mail behavior adequately captures social interactions
- User “consent” to monitoring
  - Most organizations have an acceptable use policy
  - May routinely monitor employee communications
  - However, use of the system is somewhat voluntary

Future Research

- Data Collection
  - Extend to Other CMC Records – IM, Blogs, web browsing
  - Content of messages (subject, nature, contents, “reply-to”)
- Analyze Social Network Data
  - Validation – does e-mail actually capture what we are hoping?
  - Analysis of collected data – longitudinal studies
  - Insider threat research, staff collaboration,
- Tools
  - Parallelization

Backup Slides

Insider Threat Mitigation

- Intrusion Detection Systems
  - Misuse Detection
- Honeypots
  - Detect Attacks on False Data
  - Studying Attackers
- Prevention
  - Leadership/Management
  - Policies
  - Deterrence
- Combination of the above
  - Defense in depth

Insider Threat Profile

- System Knowledge, Privileges
- No attribute-based profile
  - Gender, age, background, marital status, position, income
  - No technical expertise needed
- Trends
  - Planning
  - Preventable
  - Behavior Change
- Behavior-based Profiling

Potential Insider Threat Mitigation

- Social Network Data Collection
- Social Network Analysis Of Real Data
- Understanding Employee Behavior
- Information Tool for Managers
- Insider Threat Mitigation
