© 2010 IBM Corporation
Mass Declassification
What If?
Jeff Jonas, IBM Distinguished Engineer
Chief Scientist, IBM Entity Analytics
JeffJonas@us.ibm.com
September 23, 2010
The Ask
What emerging technology or innovative approaches come to mind … which may have applicability to this task?
Use your imagination. What if?
Not talking about any specific products
Not focusing on the widely available COTS/GOTS technologies (OCR, document management, case management, workflow, etc.)
The Problem at Hand
Volumes may be beyond human, brute-force review (at 5 minutes per document: 18,382 FTEs)
Necessitates some form of machine triage:
– Red: A disclosure risk
– Yellow: A possible disclosure risk
– Green: No disclosure risk
Reliable machine triage requires substantially better prediction systems
Even then, humans still need advanced means to work through the remaining large volume of “possibles”
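The 18,382 FTE figure can be reproduced with simple arithmetic, but only under two assumptions that this slide does not state: the 450M target documents listed later in the deck, and a 2,040-hour work year per FTE with all review finished within one year. The sketch below is a back-of-the-envelope check under those assumptions, not the author's calculation.

```python
def review_ftes(documents: int, minutes_per_doc: float,
                hours_per_fte_year: float = 2040) -> float:
    """Full-time equivalents needed to review every document by hand in one year.

    hours_per_fte_year=2040 is an assumed convention, not a figure from the slides.
    """
    total_hours = documents * minutes_per_doc / 60
    return total_hours / hours_per_fte_year


# Assumed inputs: 450M documents at 5 minutes each.
print(int(review_ftes(450_000_000, 5)))  # 18382
```

Halving the review time only halves the headcount, which is why the slide turns to machine triage rather than faster humans.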
Background
Early 80’s: Founded Systems Research & Development (SRD), a custom software consultancy
1989 – 2003: Built numerous systems for Las Vegas casinos including a technology known as Non-Obvious Relationship Awareness (NORA)
2001/2003: Funded by In-Q-Tel
2005: IBM acquires SRD
Cumulatively: I have had a hand in a number of systems with multiple billions of rows describing hundreds of millions of entities
Affiliations:
– Member, Markle Foundation Task Force on National Security in the Information Age
– Senior Associate, Center for Strategic and International Studies (CSIS)
– Distinguished Research Faculty (adjunct), Singapore Management University, School of Information Systems
– Member, EPIC advisory board
– Board Member, US Geospatial Intelligence Foundation (USGIF), the GEOINT organizing body
In Today’s Session
Intro to context accumulating systems
Predictions and data points needed for mass declassification
Strawman architecture
Challenges
Q&A
From Pixels to Pictures to Insight
[Diagram: Observations, Contextualization, Context, Relevance, Consumer (an analyst, a system, the sensor itself, etc.)]
Context, definition of:
Better understanding something by taking into account the things around it.
Consequences
Algorithms flat-lining (e.g., alert queues)
Enterprise amnesia on the rise
Overwhelmed by false positives and false negatives? You have seen nothing yet
Not enough humans to fix this with brute force
Risk assessment becomes the risk
Context Accumulation
[Diagram: Trusted Supplier, Job Applicant, Stolen Identity, Known Terrorist, scrila34@msn.com]
Puzzle Metaphor Primer
Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
What it represents is unknown – there is no picture on hand
Is it one puzzle, 15 puzzles, or 1,500 puzzles?
Some pieces are duplicates and some are missing
Some pieces are incomplete, low quality, or have been misinterpreted
Some pieces may even be professionally fabricated lies
Until you take the pieces to the table, you don’t know what you are dealing with
How Context Accumulates
With each new observation, one of three assertions is made: 1) un-associated; 2) near-like neighbors; or 3) connections
Asserted connections must favor the false negative
New observations sometimes reverse earlier assertions
Some observations produce novel discovery
As the working space expands, computational effort increases
The emerging picture helps focus collection interests
Given sufficient observations, there can come a tipping point
Thereafter, confidence improves while computational effort decreases!
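The three-way assertion above can be sketched as a tiny incremental matcher. Everything here is a hypothetical illustration, not IBM's method: observations are flat feature dictionaries, "evidence" is simply the count of shared feature values, and the thresholds are invented. Requiring two shared values before asserting a connection is one way to favor the false negative. The sketch does not implement the reversal of earlier assertions.

```python
from enum import Enum


class Assertion(Enum):
    UNASSOCIATED = "un-associated"
    NEAR_NEIGHBOR = "near-like neighbor"
    CONNECTION = "connection"


# Hypothetical thresholds: two shared values are needed for a connection,
# so weak evidence yields a near-like neighbor instead (favoring false negatives).
CONNECT_THRESHOLD = 2
NEIGHBOR_THRESHOLD = 1


def shared_features(obs: dict[str, str], entity: dict[str, set[str]]) -> int:
    """Number of the observation's feature values the entity has already seen."""
    return sum(1 for key, value in obs.items() if value in entity.get(key, set()))


def assert_observation(obs: dict[str, str], entities: list[dict[str, set[str]]]):
    """Classify a new observation against accumulated entities, updating them in place."""
    best_index, best_score = None, 0
    for i, entity in enumerate(entities):
        score = shared_features(obs, entity)
        if score > best_score:
            best_index, best_score = i, score
    if best_score >= CONNECT_THRESHOLD:
        # Strong evidence: fold the observation into the existing entity.
        for key, value in obs.items():
            entities[best_index].setdefault(key, set()).add(value)
        return Assertion.CONNECTION, best_index
    # Otherwise the observation stands as a new entity of its own.
    entities.append({key: {value} for key, value in obs.items()})
    if best_score >= NEIGHBOR_THRESHOLD:
        return Assertion.NEAR_NEIGHBOR, best_index
    return Assertion.UNASSOCIATED, None


entities = []
print(assert_observation({"name": "Mark Smith", "id": "443-43-0000"}, entities))
print(assert_observation({"name": "Mark Smith", "phone": "(707) 433-0000"}, entities))
print(assert_observation({"name": "Mark Smith", "id": "443-43-0000", "dl": "00001234"}, entities))
```

The second observation shares only a name, so it is kept separate as a near-like neighbor; the third shares a name and an identifier and is asserted as a connection.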
False Negatives Overstate The Universe
[Chart: Unique Identities vs. Observations, relative to the True Population]
Counting Is Difficult
File 1: Mark Smith, 6/12/1978, 443-43-0000
File 2: Mark R Smith, (707) 433-0000, DL: 00001234
The Rise and Fall of a Population
[Chart: Unique Identities vs. Observations, relative to the True Population]
Data Triangulation
New Record: Mark Randy Smith, 443-43-0000, DL: 00001234
File 1: Mark Smith, 6/12/1978, 443-43-0000
File 2: Mark R Smith, (707) 433-0000, DL: 00001234
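The triangulation can be sketched as a union-find (disjoint-set) over records, where any two records sharing an identifier value are merged. This is a hypothetical, deliberately conservative rule (exact shared values only, names never link on their own), not the product's matching logic; the field labels "date", "id", "phone" and "dl" are assumptions. It shows why the entity count drops from two to one only when the new record arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    source: str
    name: str
    identifiers: dict[str, str] = field(default_factory=dict)  # hypothetical labels


class EntityResolver:
    """Union-find over records; records sharing any identifier value are merged."""

    def __init__(self):
        self.records: list[Record] = []
        self.parent: list[int] = []
        self.first_seen: dict[tuple[str, str], int] = {}  # (label, value) -> record index

    def _find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def _union(self, a: int, b: int) -> None:
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add(self, record: Record) -> None:
        i = len(self.records)
        self.records.append(record)
        self.parent.append(i)
        for key in record.identifiers.items():
            if key in self.first_seen:
                self._union(self.first_seen[key], i)
            else:
                self.first_seen[key] = i

    def count_entities(self) -> int:
        return len({self._find(i) for i in range(len(self.records))})


resolver = EntityResolver()
resolver.add(Record("File 1", "Mark Smith", {"date": "6/12/1978", "id": "443-43-0000"}))
resolver.add(Record("File 2", "Mark R Smith", {"phone": "(707) 433-0000", "dl": "00001234"}))
print(resolver.count_entities())  # 2
resolver.add(Record("New Record", "Mark Randy Smith", {"id": "443-43-0000", "dl": "00001234"}))
print(resolver.count_entities())  # 1
```

Before the new record, File 1 and File 2 share nothing, so a naive count reports two people; the new record bridges them through one shared value on each side.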
Increasing Accuracy and Performance
[Chart: Unique Identities vs. Observations, relative to the True Population]
“Expert Counting” is Fundamental to Prediction
Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
If one cannot count … one cannot estimate vector or velocity (direction and speed).
Without vector and velocity … prediction is nearly impossible.
Therefore, if you can’t count, you can’t predict.
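A small hypothetical illustration of why counting drives velocity: the same ten account events read very differently depending on whether five accounts belong to five people or to one. The account names, event counts and two-day window are invented for the example.

```python
from collections import Counter


def max_velocity(events: list[str], entity_of: dict[str, str], days: float) -> float:
    """Highest events-per-day rate across entities.

    events: one account id per observed event; entity_of maps account -> entity.
    """
    per_entity = Counter(entity_of[account] for account in events)
    return max(per_entity.values()) / days


accounts = ["acct1", "acct2", "acct3", "acct4", "acct5"]
events = accounts * 2                            # each account active twice (hypothetical)
five_people = {a: a for a in accounts}           # 5 people, 1 account each
one_person = {a: "person-1" for a in accounts}   # 1 person, 5 accounts

print(max_velocity(events, five_people, days=2))  # 1.0
print(max_velocity(events, one_person, days=2))   # 5.0
```

The raw events are identical; only the count of entities changes, and with it the apparent speed of activity.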
Mass Declassification Predictions
Whose equity is it?
Machine triage – disposition
Queue prioritization
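Queue prioritization can be sketched with a binary heap: each document carries its predicted equity, disposition and a priority score, and reviewers pull the highest-priority item next. The scoring, equity names and document ids are hypothetical; this shows only the queueing mechanics.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueueItem:
    neg_priority: float                      # heapq is a min-heap, so store -priority
    doc_id: str = field(compare=False)
    equity: str = field(compare=False)       # predicted equity holder
    disposition: str = field(compare=False)  # predicted Red / Yellow / Green


def review_order(predictions):
    """Yield document ids, highest priority first.

    predictions: iterable of (doc_id, equity, disposition, priority) tuples.
    """
    heap = [QueueItem(-priority, doc_id, equity, disposition)
            for doc_id, equity, disposition, priority in predictions]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap).doc_id


queue = [("doc-1", "Agency A", "Yellow", 0.4),
         ("doc-2", "Agency B", "Red", 0.9),
         ("doc-3", "Agency A", "Green", 0.1)]
print(list(review_order(queue)))  # ['doc-2', 'doc-1', 'doc-3']
```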
Using What Data Points?
For example:
– 450M target documents
– Dirty words
– Previous declassifications
– Previous declassification denials
– FOIAs
– Intellipedia
– Wikipedia
– WikiLeaks
– Deceased persons
– Publicly available accounts/facts
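The dirty-word data point can be sketched as a whole-word, case-insensitive scan. The word list here is hypothetical (two terms borrowed from the deck's own examples), and a real system would need far more than literal matching.

```python
import re


def build_dirty_word_pattern(words: list[str]) -> re.Pattern:
    """Compile one case-insensitive, whole-word alternation over the dirty words."""
    # Longest terms first so a multi-word term wins over any shorter prefix.
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_dirty_words(text: str, pattern: re.Pattern) -> list[str]:
    """Distinct dirty words found in the text, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


pattern = build_dirty_word_pattern(["Tomatoe", "Mufasa 7"])  # hypothetical list
print(find_dirty_words("Status of MUFASA 7 under codeword Tomatoe.", pattern))
# ['mufasa 7', 'tomatoe']
```

The word boundaries keep "Tomatoes" from matching "Tomatoe", which trades some recall for fewer false hits.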
Open Source Discovery/Scoring
“Height of Pakistan’s Mufasa missile.”
– What is 15.5 meters?
– New York Times, Sept 21, 2010, C3: “Pakistan unveils Mufasa 7 Warhead”
– Wikipedia: Mufasa_7_Warhead
Context Accumulation
[Diagram: Mufasa 7 Warhead, FOIA (March 2010), Open Source Reference, Dirty Word, Classified – Asserted]
Context Accumulation + Statistics
Document Element | Total | Declass | Class-Default | Class-Asserted
Author: “Billy K” | 4503 | 1600 | 403 | 0
Codeword: “Tomatoe” | 4818 | 4600 | 218 | 0
Classification: “SI/TK/001” | 23 | 22 | 1 | 0
Actors: “Salam Ahmed” | 782 | 700 | 82 | 0
Declassification dispositions become a force multiplier.
The more human dispositions, the more automated dispositions.
Humans | Auto Triage
5,000 | 20
10,000 | 4,000
100,000 | 65,000
1,000,000 | 17,000,000
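One way to turn the per-element statistics into a triage color is sketched below. The counts are copied from the table above; the rule itself is a hypothetical assumption, not the author's: Red if any element was ever asserted classified, Green if every element has history and a declass rate of at least 0.9, otherwise Yellow for a human.

```python
# (total, declass, class_default, class_asserted) per document element, from the table above.
ELEMENT_STATS = {
    "Author: Billy K": (4503, 1600, 403, 0),
    "Codeword: Tomatoe": (4818, 4600, 218, 0),
    "Classification: SI/TK/001": (23, 22, 1, 0),
    "Actors: Salam Ahmed": (782, 700, 82, 0),
}


def declass_rate(stats: tuple[int, int, int, int]) -> float:
    total, declass, _default, _asserted = stats
    return declass / total if total else 0.0


def triage(elements: list[str], stats=ELEMENT_STATS, green_threshold: float = 0.9) -> str:
    """Hypothetical Red/Yellow/Green rule over historical element dispositions."""
    if not elements or any(e not in stats for e in elements):
        return "Yellow"  # no history for some element: send to a human
    history = [stats[e] for e in elements]
    if any(s[3] > 0 for s in history):
        return "Red"     # an element was previously asserted classified
    if min(declass_rate(s) for s in history) >= green_threshold:
        return "Green"
    return "Yellow"


print(triage(["Codeword: Tomatoe", "Classification: SI/TK/001"]))  # Green
print(triage(["Codeword: Tomatoe", "Actors: Salam Ahmed"]))        # Yellow
```

Under this rule, every new human disposition updates the counts, which is one reading of "the more human dispositions, the more automated dispositions."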
Policy Questions
What related information is already available in the public domain?
– Evidence: Exists in open source
What damage might conceivably result from disclosure, and what benefits might ensue?
– Evidence: Same text already released (by same equity holder)
Strawman Architecture
[Diagram: inputs (450M Docs, Historical Dispositions, Dirty Words, Etc.), Feature Extraction & Classification, Context Accumulation, Predictions(*), Workflow System, Dispositions]
(*) Recommendations: Equity of, Disposition, Priority
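A minimal skeleton of the strawman loop, with every component a hypothetical stand-in: whitespace tokens play the role of feature extraction, per-feature counters play the role of context accumulation, and human dispositions from the workflow system are fed back so later predictions improve. The dirty word, equity name and thresholds are invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    doc_id: str
    equity: Optional[str]  # predicted equity holder, None when there is no history
    disposition: str       # "Red", "Yellow" or "Green"
    priority: float        # higher means review sooner


class StrawmanPipeline:
    """Hypothetical skeleton: features -> context -> predictions, with feedback."""

    def __init__(self, dirty_words, min_history=3):
        self.dirty_words = {w.lower() for w in dirty_words}  # single-token words only
        self.min_history = min_history
        self.outcomes = defaultdict(Counter)  # feature -> past human dispositions
        self.equities = defaultdict(Counter)  # feature -> equity holders seen with it

    def extract_features(self, text):
        # Placeholder for real feature extraction & classification.
        return set(text.lower().split())

    def predict(self, doc_id, text):
        features = self.extract_features(text)
        equity = self._likely_equity(features)
        if features & self.dirty_words:
            return Prediction(doc_id, equity, "Red", 1.0)
        histories = [self.outcomes.get(f, Counter()) for f in features]
        enough = all(sum(h.values()) >= self.min_history for h in histories)
        if features and enough and all(set(h) == {"declass"} for h in histories):
            return Prediction(doc_id, equity, "Green", 0.1)
        return Prediction(doc_id, equity, "Yellow", 0.5)

    def record_disposition(self, text, disposition, equity):
        # Human dispositions flow back into context, enabling more auto-triage.
        for f in self.extract_features(text):
            self.outcomes[f][disposition] += 1
            self.equities[f][equity] += 1

    def _likely_equity(self, features):
        votes = Counter()
        for f in features:
            votes.update(self.equities.get(f, Counter()))
        return votes.most_common(1)[0][0] if votes else None


pipeline = StrawmanPipeline(dirty_words=["Mufasa"])
print(pipeline.predict("d1", "routine budget memo").disposition)  # Yellow
for _ in range(3):
    pipeline.record_disposition("routine budget memo", "declass", "Agency A")
print(pipeline.predict("d2", "routine budget memo"))  # Green, equity "Agency A"
```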
Another Idea: Crowd Sourcing
Can you predict specific people with the privileges and knowledge to whom selected documents can be routed for evaluation?
Can you publish machine-triage recommendations to a wiki or other form of internal broadcast for community crowd sourcing?
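Routing a document to people with the right privileges and knowledge could start as simple set matching: filter reviewers whose privileges cover the document's caveats, then rank by topic overlap. The reviewer names, caveat labels and topics are hypothetical; this is a sketch of the idea, not a proposed access-control design.

```python
from dataclasses import dataclass


@dataclass
class Reviewer:
    name: str
    privileges: set[str]  # caveats the reviewer may read (hypothetical labels)
    knowledge: set[str]   # topics the reviewer knows


def route(doc_caveats: set[str], doc_topics: set[str],
          reviewers: list[Reviewer], k: int = 2) -> list[str]:
    """Up to k eligible reviewers with topic overlap, best overlap first."""
    eligible = [r for r in reviewers if doc_caveats <= r.privileges]
    ranked = sorted(eligible, key=lambda r: (-len(doc_topics & r.knowledge), r.name))
    return [r.name for r in ranked[:k] if doc_topics & r.knowledge]


staff = [Reviewer("A", {"SI", "TK"}, {"missiles", "pakistan"}),
         Reviewer("B", {"SI"}, {"missiles"}),
         Reviewer("C", {"SI", "TK"}, {"budget"})]
print(route({"SI", "TK"}, {"missiles", "pakistan"}, staff))  # ['A']
```

Reviewer B knows the topic but lacks the TK privilege, so the privilege check runs before any relevance ranking.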
Another Idea: Better Classification
Using the overall declassification platform to assist in proper classification (real-time)
And, better pre-tagging to assist in future auto-declassification
Challenges
Entity extraction is imperfect
Predictions may still not be good enough, often enough
Documents not in English
The user work surface and its distribution
Consequences of an inappropriate release
With super access and super tools, this may call for stronger audit and insider-threat protections
Your contracting cycle and the creation of the system might take until mid-2011 or 2012 or 2013
Closing Thoughts
Contextualization is essential to better prediction
There are not enough humans to ask every question every day
“Human attention directing” systems are critical to the mission
The data must find the data, the relevance must find the user
Worst Case Scenario
Rich context enables better hints for users, resulting in faster dispositions
Rich context enables improved sequencing of the work
Related Blog Posts
Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems
Data Finds Data
Puzzling: How Observations Are Accumulated Into Context
The Fast Last Puzzle Piece
Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
How to Use a Glue Gun to Catch a Liar
It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You
Smart Systems Flip-Flop
Blogging At:
www.JeffJonas.TypePad.com
Information Management
Privacy
National Security
and Triathlons
Questions?