Air Force Institute of Technology
Automatic Generation of Social Network Data from Electronic-Mail Communications
Jason W.S. Yee, Robert F. Mills, Gilbert L. Peterson, Summer Bartczak
Air Force Institute of Technology, Wright-Patterson AFB, OH
Integrity - Service - Excellence
Overview
- Background
- System
- Experiment
- Conclusions
- Future Research
Social Network Analysis (SNA)
- Blend of Psychology and Sociology
- Who You Know vs. What You Know
- Human Behavior Patterns
- Collaboration
- Personnel Security
Social Networking Relationships
- Social Network Perspective
- Importance of "with whom" relations
- Graph Theory
SNA Capabilities
- Social Capital
- Spread of Epidemics
- Organizational Network Analysis
- Covert Terrorist Networks
- Communications Networks
Lack of Social Network Data
- Relatively New Field
- Available Data Is Limited, Old
- Methods
  - Surveys, Observation, Archived Records
  - Costly, Time Consuming
- Objective:
  - Gather data in an automated fashion
  - Desire for longitudinal study
    - Changes in social network structures
Research Goals
- Create a system that uses automated methods of generating useful social network data
  - Automated: e-mail, instant messaging, web browsing, web logs, online forums… anything that has logging capability
  - Our focus: e-mail
- Evaluate the execution timeliness of the system and the usability of the generated social network data
  - Collect more data, at reduced cost, in shorter time
[Diagram: Raw Data -> Social Network Data Generator -> Social Network Data]
Solution Limitations and Scope
- E-mail Log Data Used
  - National Center for Supercomputing Applications (NCSA) formatted e-mail logs
  - Simple Mail Transfer Protocol (SMTP)
  - Collected by the organization's e-mail servers
  - Raw data: timestamp, sender, recipient
  - Derived data: status of users (internal/external), number of recipients (one-to-many, one-to-one, etc.)
- Out of Research Scope
  - Content and subject of e-mail
  - Validity of social network data
  - Method and meaning of social network analysis
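The raw and derived fields listed above can be sketched as a small record type. This is a minimal Python illustration, not the authors' Java implementation. It assumes a user is internal exactly when the address's domain equals one organizational domain (the hypothetical `INTERNAL_DOMAIN`), and that a message with one recipient is one-to-one while anything larger is one-to-many.

```python
from dataclasses import dataclass

# Hypothetical organizational domain; the slides do not state the real one.
INTERNAL_DOMAIN = "example.mil"


@dataclass(frozen=True)
class Connection:
    """One sender-recipient pair taken from an SMTP log (raw fields only)."""
    timestamp: str
    sender: str
    recipient: str


def is_internal(address: str, internal_domain: str = INTERNAL_DOMAIN) -> bool:
    """Derived status: internal if the domain part matches the organization's domain."""
    return address.rsplit("@", 1)[-1].lower() == internal_domain.lower()


def fan_out(num_recipients: int) -> str:
    """Derived message shape from the number of recipients of one message."""
    if num_recipients < 1:
        raise ValueError("a message needs at least one recipient")
    return "one-to-one" if num_recipients == 1 else "one-to-many"


def derive(conn: Connection, num_recipients: int) -> dict:
    """Combine the raw fields with the derived fields (internal status, recipient count)."""
    return {
        "timestamp": conn.timestamp,
        "sender": conn.sender,
        "recipient": conn.recipient,
        "sender_internal": is_internal(conn.sender),
        "recipient_internal": is_internal(conn.recipient),
        "num_recipients": num_recipients,
        "shape": fan_out(num_recipients),
    }


record = derive(Connection("2004/12/09 08:01:17", "jdoe@example.mil", "bob@partner.com"), 2)
```

The recipient count is passed in separately because it is a property of the whole message, not of any single sender-recipient line.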
System Components
- Assign UIDs to Users (attribution)
- Sanitize Logs (privacy)
- Process Logs for Database (mine for data)
- Database Functions (extract and format)
[Diagram: E-mail data from Microsoft Exchange -> Assign UIDs to Users -> Sanitize Logs -> Process Logs for Database -> Database Functions -> Social Network Data]
Implementation Overview
[Diagram: proxy.csv -> ProxyListToUID -> uidlist.csv; SMTP Logs + uidlist.csv -> SMTPLogSanitizer -> Sanitized SMTP Logs + uidlist.csv (updated); Sanitized SMTP Logs -> SMTPLogParser -> Processed SMTP Logs -> Database Import/Query -> Social Network Data]
- ProxyListToUID: Assign UIDs to Users
- SMTPLogSanitizer: Sanitize Logs
- SMTPLogParser: Process Logs
- Database Functions: Create Social Network Data
Privacy Protection
- Sanitization May Be Performed
  - Application dependent
- Sensitive Files: Initial Proxy List, Raw SMTP Logs, UID List
- Sanitized Data Can Be De-sanitized
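Because sanitization only substitutes UIDs for usernames, anyone holding the UID list can presumably reverse it, which fits the UID list being named as a sensitive file. A minimal sketch of de-sanitization, assuming sanitized addresses look like `<UID@domain>` and that the first address on each `uidlist.csv` row (`uid,address,alias,...`) is the one to restore; both layouts are assumptions for illustration.

```python
import csv
import io
import re

# Assumed sanitized form: "<UID@domain>", e.g. "<15@example.mil>".
SANITIZED_ADDRESS = re.compile(r"<(\d+)@([^<>\s]+)>")


def load_uid_list(text: str) -> dict:
    """Map UID -> primary address from rows shaped like 'uid,address,alias,...' (assumed layout)."""
    uid_to_address = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2 and row[0].strip().isdigit():
            uid_to_address[int(row[0])] = row[1].strip()
    return uid_to_address


def desanitize(line: str, uid_to_address: dict) -> str:
    """Replace each <UID@domain> with the original address; unknown UIDs are left as-is."""
    def restore(match):
        address = uid_to_address.get(int(match.group(1)))
        return f"<{address}>" if address else match.group(0)

    return SANITIZED_ADDRESS.sub(restore, line)


uids = load_uid_list("15,jane.doe@example.mil,jdoe@example.mil\n")
restored = desanitize('"RCPT -?TO:<15@example.mil> SMTP"', uids)
```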
ProxyListToUID
proxy.csv (input, from Active Directory):
Yee Jason 2dLt AFIT/ENG,[email protected],"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Yee;g=Jason;smtp:[email protected]:[email protected]"
"Smith John A Civ AFIT/SC",[email protected],"X400:c=US;a= ;p=AFIT;o=HANGAR;s=Smith;g=John;smtp:[email protected]:[email protected]:[email protected]:[email protected]"
uidlist.csv (output):
0,[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
- Resolves Aliases to Unique Identification Numbers (UIDs)
- Action Attribution: every alias of a user maps to the same UID
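The alias-to-UID step can be sketched in Python. This assumes each `proxy.csv` row holds a display name, a primary address, and a proxy-address string, as the example rows suggest, and it relies on the Exchange convention that `SMTP:`/`smtp:` prefixes mark e-mail aliases. The function names are hypothetical; this is not the authors' ProxyListToUID code.

```python
import csv
import io
import re

# Proxy addresses appear as "SMTP:addr" (primary) or "smtp:addr" (secondary). The X400
# entry also uses ';' internally, so smtp: entries are pulled out by pattern rather
# than by splitting the whole field on ';'.
SMTP_PROXY = re.compile(r"smtp:([^;\s\"]+@[^;\s\"]+)", re.IGNORECASE)


def proxy_list_to_uids(proxy_csv: str, first_uid: int = 0) -> list:
    """Give each proxy.csv row (name, primary address, proxy addresses) one UID.

    Returns uidlist rows: [uid, primary, alias1, alias2, ...] with duplicates removed.
    """
    rows = []
    uid = first_uid
    for record in csv.reader(io.StringIO(proxy_csv)):
        if len(record) < 2:
            continue  # skip blank or malformed lines
        aliases = [record[1].strip().lower()]
        proxies = record[2] if len(record) > 2 else ""
        for address in SMTP_PROXY.findall(proxies):
            address = address.lower()
            if address not in aliases:
                aliases.append(address)
        rows.append([uid] + aliases)
        uid += 1
    return rows


def alias_index(uid_rows: list) -> dict:
    """Invert uidlist rows so any alias resolves to its owner's UID (action attribution)."""
    return {alias: row[0] for row in uid_rows for alias in row[1:]}


SAMPLE_PROXY_CSV = (
    'Doe Jane,jane.doe@example.mil,'
    '"X400:c=US;a= ;p=ORG;o=HQ;s=Doe;g=Jane;SMTP:jane.doe@example.mil;smtp:jdoe@example.mil"\n'
    '"Roe Rick",rick.roe@example.mil,"smtp:rroe@example.mil"\n'
)
uid_rows = proxy_list_to_uids(SAMPLE_PROXY_CSV)
```

The sample names and addresses are made up. Once any alias is resolved through `alias_index`, mail sent from either of Jane Doe's addresses is attributed to UID 0.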
SMTPLogSanitizer
Before: SMTP log | UID list
...-?TO:<[email protected]>... ......10 FROM:<[email protected]>... 15,[email protected]?TO:<[email protected]>... 15,[email protected]
After: sanitized SMTP log | UID list
...-?TO:<[email protected]>... ......10 FROM:<[email protected]>... 15,[email protected]?TO:<[email protected]>... 15,[email protected]
- Replaces the username portion of each e-mail address with its UID
- The domain is left intact, so external parties can still determine insider/outsider status
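A minimal Python sketch of the sanitizing substitution. It assumes addresses appear in angle brackets, as in `MAIL FROM:<...>` and `RCPT TO:<...>`. It also assumes that addresses missing from the UID list, such as outside correspondents, receive the next free UID and are appended to the list, which appears to be what "uidlist.csv (updated)" refers to. The function and variable names are hypothetical.

```python
import re

# Angle-bracketed addresses as in "MAIL -?FROM:<user@domain>" and "RCPT -?TO:<user@domain>".
ADDRESS = re.compile(r"<([^<>@\s]+)@([^<>\s]+)>")


def sanitize_line(line: str, alias_to_uid: dict, uid_rows: list) -> str:
    """Swap the username of every <user@domain> for its UID, keeping the domain.

    Unknown addresses get the next free UID and are appended to uid_rows (assumed
    behavior). Only angle-bracketed addresses are handled; other fields, such as
    addresses echoed in server responses, would need their own rules.
    """
    def replace(match):
        user, domain = match.group(1), match.group(2)
        address = f"{user}@{domain}".lower()
        uid = alias_to_uid.get(address)
        if uid is None:
            uid = max(alias_to_uid.values(), default=-1) + 1
            alias_to_uid[address] = uid
            uid_rows.append([uid, address])
        return f"<{uid}@{domain}>"

    return ADDRESS.sub(replace, line)


alias_to_uid = {"jane.doe@example.mil": 0}
uid_rows = [[0, "jane.doe@example.mil"]]
first = sanitize_line('"MAIL -?FROM:<Jane.Doe@example.mil> SMTP"', alias_to_uid, uid_rows)
second = sanitize_line('"RCPT -?TO:<bob@partner.com> SMTP"', alias_to_uid, uid_rows)
```

Because the shared UID list is updated as new addresses appear, sanitization carries state from one line to the next. That state matters when the step is split across workers.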
SMTPLogParser
129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "MAIL-?FROM:<[email protected]> SMTP" 0 4
129.92.1.65 - OutboundConnectionResponse [09/Dec/2004:08:01:17 -0500] "--?250 2.1.0 [email protected] OK SMTP" 0 43
129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "RCPT-?TO:<[email protected]> SMTP" 0 4
129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "RCPT-?TO:<[email protected]> SMTP" 0 4

Date       Time     SUID RUID SI RI NR
2004/12/09 08:01:17 2718 1828 1  1  2
2004/12/09 08:01:17 2718 4590 1  0  2
(SUID/RUID: sender/recipient UID; SI/RI: sender/recipient internal status, 1 = internal; NR: number of recipients)

- Process Sanitized Data
- Derive Internal Status and Number of Recipients
- Two connections made: one message with two recipients yields one row per recipient
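The parsing step can be sketched as a small state machine. A `MAIL FROM` command opens a message, and each `RCPT TO` adds a recipient. The message is emitted as one row per recipient when the next `MAIL FROM` (or the end of input) arrives, which is how one message with two recipients becomes two connections. This sketch makes several assumptions. Sanitized addresses are taken to look like `<UID@domain>`. Commands from the same client host are treated as belonging to one message at a time, although real logs from concurrent sessions may interleave and would need a better session key. "Internal" means the domain equals a hypothetical `INTERNAL_DOMAIN`. The field layout is read off the excerpt above.

```python
import re
from datetime import datetime

# Hypothetical organizational domain used to derive SI/RI.
INTERNAL_DOMAIN = "example.mil"

# host ident kind [timestamp] "request" status bytes  (NCSA-style layout)
LINE = re.compile(r'^(?P<host>\S+) \S+ (?P<kind>\S+) \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)"')
MAIL_FROM = re.compile(r"MAIL\b.*FROM:<(\d+)@([^<>\s]+)>", re.IGNORECASE)
RCPT_TO = re.compile(r"RCPT\b.*TO:<(\d+)@([^<>\s]+)>", re.IGNORECASE)


def parse_log(lines, internal_domain=INTERNAL_DOMAIN):
    """Turn sanitized SMTP log lines into (date, time, suid, ruid, si, ri, nr) rows."""
    rows = []
    pending = {}  # host -> [date, time, suid, si, [(ruid, ri), ...]]

    def flush(host):
        message = pending.pop(host, None)
        if message and message[4]:
            date, time, suid, si, recipients = message
            for ruid, ri in recipients:
                rows.append((date, time, suid, ruid, si, ri, len(recipients)))

    for line in lines:
        m = LINE.match(line)
        if not m or not m.group("kind").endswith("Command"):
            continue  # responses and unparseable lines carry no new sender/recipient
        host, request = m.group("host"), m.group("req")
        # Keep the logged local time; the UTC offset after the space is ignored.
        stamp = datetime.strptime(m.group("ts").split()[0], "%d/%b/%Y:%H:%M:%S")
        sender = MAIL_FROM.match(request)
        recipient = RCPT_TO.match(request)
        if sender:
            flush(host)  # a new MAIL FROM closes the previous message from this host
            si = int(sender.group(2).lower() == internal_domain)
            pending[host] = [stamp.strftime("%Y/%m/%d"), stamp.strftime("%H:%M:%S"),
                             int(sender.group(1)), si, []]
        elif recipient and host in pending:
            ri = int(recipient.group(2).lower() == internal_domain)
            pending[host][4].append((int(recipient.group(1)), ri))
    for host in list(pending):
        flush(host)
    return rows


SAMPLE_LOG = [
    '129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "MAIL -?FROM:<2718@example.mil> SMTP" 0 4',
    '129.92.1.65 - OutboundConnectionResponse [09/Dec/2004:08:01:17 -0500] "- -?250 2.1.0 2718@example.mil OK SMTP" 0 43',
    '129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "RCPT -?TO:<1828@example.mil> SMTP" 0 4',
    '129.92.1.65 - OutboundConnectionCommand [09/Dec/2004:08:01:17 -0500] "RCPT -?TO:<4590@partner.com> SMTP" 0 4',
]
rows = parse_log(SAMPLE_LOG)
```

On this made-up sanitized sample, the parser reproduces the two rows of the table above (UIDs 2718 to 1828 and 2718 to 4590, NR = 2).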
Database Functions
Processed logs (imported):
Date       Time     SUID RUID SI RI NR
2004/12/09 07:52:31 162  272  0  1  1
2004/12/09 08:01:17 272  162  1  1  2
2004/12/09 08:01:17 272  314  1  0  2
2004/12/09 08:01:18 314  162  1  1  3
2004/12/09 08:01:18 314  426  1  0  3
2004/12/09 08:01:18 314  272  1  0  3
2004/12/09 08:02:57 162  272  0  1  1
2004/12/09 08:11:33 314  162  1  1  1

Social network data (generated); TieStrength is the number of logged messages from Source to Dest:
Source Dest TieStrength
162    272  2
272    162  1
272    314  1
314    162  2
314    272  1
314    426  1
...

- Import Processed Logs
- Generate Social Network Data from Logs
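The tie strengths in the example match a simple count of processed-log rows per (Source, Dest) pair. A minimal sketch of that aggregation uses Python's built-in `sqlite3` in place of the MySQL 4.1 database used in the experiments. The table and column names are hypothetical, and other tie definitions (for example, weighting by NR) would be equally possible.

```python
import sqlite3

# Processed-log rows from the example, in the slide's column order:
# Date, Time, SUID, RUID, SI, RI, NR.
PROCESSED = [
    ("2004/12/09", "07:52:31", 162, 272, 0, 1, 1),
    ("2004/12/09", "08:01:17", 272, 162, 1, 1, 2),
    ("2004/12/09", "08:01:17", 272, 314, 1, 0, 2),
    ("2004/12/09", "08:01:18", 314, 162, 1, 1, 3),
    ("2004/12/09", "08:01:18", 314, 426, 1, 0, 3),
    ("2004/12/09", "08:01:18", 314, 272, 1, 0, 3),
    ("2004/12/09", "08:02:57", 162, 272, 0, 1, 1),
    ("2004/12/09", "08:11:33", 314, 162, 1, 1, 1),
]


def tie_strengths(rows):
    """Load processed rows into a throwaway database and count messages per (sender, recipient)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE connections (date TEXT, time TEXT, suid INTEGER, ruid INTEGER,"
            " si INTEGER, ri INTEGER, nr INTEGER)"
        )
        con.executemany("INSERT INTO connections VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return con.execute(
            "SELECT suid, ruid, COUNT(*) AS tie_strength FROM connections"
            " GROUP BY suid, ruid ORDER BY suid, ruid"
        ).fetchall()
    finally:
        con.close()


ties = tie_strengths(PROCESSED)
```

Ties are directed, so 162 to 272 (two messages) and 272 to 162 (one message) remain separate rows, as in the table.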
Experimental Methodology
- Operational Testing
  - Add Months Sequentially
- Use Data Gathered from AFIT, Oct-Dec 2004
- Parameters
  - Windows XP Professional SP2
  - Pentium 4, 3.2 GHz, Hyper-Threading, 2 GB RAM
  - Java 1.5
  - MySQL 4.1
- Direct Measurement
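One way to read "add months sequentially" with direct measurement is to run the pipeline on October alone, then on October-November, then on all three months, timing each run with a wall-clock timer. Below is a minimal Python sketch of such a harness. The function, the per-month batching, and the stand-in `process` callable are hypothetical; the original measurements were taken on the Java implementation.

```python
import time


def time_cumulative(months, process):
    """Time `process` on the first 1, 2, ..., n months of log lines.

    months: list of per-month line lists, oldest first (hypothetical layout).
    Returns (months used, lines processed, elapsed seconds) for each run.
    """
    results = []
    for k in range(1, len(months) + 1):
        batch = [line for month in months[:k] for line in month]
        start = time.perf_counter()
        process(batch)
        results.append((k, len(batch), time.perf_counter() - start))
    return results


# Usage with a trivial stand-in stage and made-up data.
timings = time_cumulative([["oct"] * 3, ["nov"] * 2, ["dec"] * 1], process=sorted)
```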
Workload
- Proxy List from Active Directory
- SMTP Logs from AFIT Servers
- Medium-sized Organization (over 1,500 users)
  - 86 days, over 3 GB of data
  - 1.8 million e-mail messages, 3 million connections
  - 1,550 internal actors
Timeliness Results
- Fast
- Bottlenecks
  - Sanitization
  - Parsing (parallelizable)
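If each log file (for example, one per day) can be parsed on its own, the parsing work splits naturally across worker processes. This assumes no message spans two files. Sanitization would split the same way only if UID assignment for newly seen addresses were coordinated between workers. Below is a sketch with Python's `concurrent.futures`, where `command_lines` is a hypothetical placeholder for the real per-file parser.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def command_lines(lines):
    """Hypothetical stand-in for per-file parsing: keep only SMTP command lines."""
    return [line for line in lines if "ConnectionCommand" in line]


def parse_in_parallel(log_files, parse_file=command_lines, max_workers=4,
                      executor_cls=ProcessPoolExecutor):
    """Parse independent log files concurrently and concatenate results in input order.

    With ProcessPoolExecutor, parse_file must be a picklable top-level function, and on
    platforms that spawn workers (Windows, macOS) the call belongs under
    `if __name__ == "__main__":`.
    """
    with executor_cls(max_workers=max_workers) as pool:
        per_file = pool.map(parse_file, log_files)  # map() yields results in input order
        return [row for rows in per_file for row in rows]


daily_logs = [
    ["a OutboundConnectionCommand x", "a OutboundConnectionResponse y"],
    ["b OutboundConnectionCommand z"],
]
# Threads keep this usage example portable; processes suit CPU-bound parsing.
parsed = parse_in_parallel(daily_logs, executor_cls=ThreadPoolExecutor)
```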
Usability Results
- UCINet Readable
- Centrality, Power, and Group Statistics Computed
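The slides do not say which UCINet input format the system wrote. One plausible choice is UCINET's DL edge-list format. The sketch below writes tie-strength rows that way, following our understanding of the `dl n=... format=edgelist1` header with embedded labels; check that header against UCINET's own documentation before use. `to_dl_edgelist` is a hypothetical helper.

```python
def to_dl_edgelist(ties):
    """Render (source_uid, dest_uid, tie_strength) rows as a UCINET DL edge list.

    The header follows our understanding of DL 'edgelist1' with embedded labels.
    UIDs serve as node labels, so the export stays sanitized.
    """
    nodes = {uid for source, dest, _ in ties for uid in (source, dest)}
    lines = [f"dl n={len(nodes)} format=edgelist1", "labels embedded:", "data:"]
    lines += [f"{source} {dest} {strength}" for source, dest, strength in ties]
    return "\n".join(lines) + "\n"


dl_text = to_dl_edgelist([(162, 272, 2), (272, 162, 1), (314, 162, 2)])
```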
Usability Results
NetDraw Visualization Tool
Impact
- Immediate
  - Time and Cost Reduction
  - More Data for Social Network Analysis
  - Sanitization
  - More Information for Managers
- Long Term
  - Long-term Studies
  - Understanding Employees
  - Potential Insider Threat Characterization
  - Possible Insider Threat Mitigation
Limitations
- Presumes that e-mail behavior adequately captures social interactions
- User “Consent” to Monitoring
  - Most organizations have an acceptable use policy
  - May routinely monitor employee communications
  - However, use of the system is somewhat voluntary
Future Research
- Data Collection
  - Extend to other computer-mediated communication (CMC) records: IM, blogs, web browsing
  - Content of messages (subject, nature, contents, “reply-to”)
- Analyze Social Network Data
  - Validation: does e-mail actually capture what we are hoping?
  - Analysis of collected data: longitudinal studies
  - Insider threat research, staff collaboration
- Tools
  - Parallelization
Backup Slides
Insider Threat Mitigation
- Intrusion Detection Systems
  - Misuse Detection
- Honeypots
  - Detect Attacks on False Data
  - Studying Attackers
- Prevention
  - Leadership/Management
  - Policies
  - Deterrence
- Combination of the Above
  - Defense in Depth
Insider Threat Profile
Behavior-based Profiling
- System Knowledge, Privileges
- No Attribute-based Profile
  - Gender, age, background, marital status, position, income
- No Technical Expertise Needed
- Trends
  - Planning
  - Preventable
  - Behavior Change
Potential Insider Threat Mitigation
[Diagram: Social Network Data Collection -> Social Network Analysis of Real Data -> Understanding Employee Behavior -> Information Tool for Managers -> Insider Threat Mitigation]