UIR Alert Agent: An alert system for identifying suspicious web-site browsing leading to unintended information revelation (UIR)

Rohini K. Srihari, State University of New York at Buffalo

May 6, 2003




FAA Workshop May 2003 2

Tracking suspicious web browsing

Should we let him see it? Should we monitor his next moves?

What information has the user obtained so far? What was inferred from the visited pages? What additional information can they infer from this new web page? Did we intend to reveal this information? Should we be alerted if the revelation is unintended?

User has visited these pages:
- http://www.faa.gov/apa/safer_skies/fsstats.htm
- http://www.faa.gov/certification/aircraft/sfar88/01hstry2.pps

User is requesting http://www.awp.faa.gov/fsdo/docs/spm_info/what/fy2000/sdplan00.doc

Measuring unintended information revelation (UIR) across the visited and requested pages answers these questions.


Outline

- Unintended Information Revelation: Problem Definition
- Solutions with Existing Technology
- Proposed Solution
  - UIR System Architecture
  - Extracting Concepts and Associations
  - Creating Concept Chain Graphs (CCG)
  - Mining and Visualization of CCGs
- Evaluation Methodology
- Preliminary Results
- Summary


User’s previous request

Important concepts: safer skies, fatal accidents, runway incursions, hijack, etc.

Interesting information: number and percentage of fatal accidents in 1996 (runway incursions, ice/snow, in-flight fire)

Fact Sheet: Aviation Accident Statistics http://www.faa.gov/apa/safer_skies/fsstats.htm


User’s current request

Fuel tank ignition events: http://www.faa.gov/certification/aircraft/sfar88/01hstry2.pps

Important concepts: fatalities, fuel tank ignition, hull loss, electrostatics, etc.

Interesting information: identifies causes of fuel tank ignition accidents (small bomb, faulty wiring, pump faults)


Synthesized Information

In-flight fire can cause accidents.

Fuel tank ignitions are caused by small bombs, faulty pumps, faulty wiring, etc.

Domain knowledge: in-flight fires and fuel tank ignitions are aviation hazards.

Inference: faulty wiring can cause in-flight fires.


UIR Alert Agent

UIR is a phenomenon in which the information synthesized from multiple documents exceeds the sum of the information provided by the individual documents. The agent generates alerts for unintended information revelation based on the user's browsing history and requested pages.

[Figure: the UIR Alert Agent monitors the browsing histories of users A, B and C and records alerts in an alerts log; here an alert is generated on user B.]
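The Python sketch below makes this definition concrete. It treats the information a set of pages reveals as the concept pairs joined by some chain of associations. This measure, the helper `connected_pairs`, and every page and domain association are hypothetical illustrations, not the system's actual scoring. They show only how the combined pages can link pairs that neither page links on its own.

```python
from itertools import combinations


def connected_pairs(edges):
    """Return every unordered concept pair joined by some path in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    pairs, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component that contains `start` (iterative DFS).
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        pairs |= {frozenset(p) for p in combinations(sorted(component), 2)}
    return pairs


# Hypothetical associations; not taken from the actual pages.
domain = [("in-flight fire", "hazard"), ("fuel tank ignition", "hazard")]
page_a = [("accident", "in-flight fire")]
page_b = [("fuel tank ignition", "wiring"), ("fuel tank ignition", "small bomb")]

separately = connected_pairs(page_a + domain) | connected_pairs(page_b + domain)
together = connected_pairs(page_a + page_b + domain)
revealed = together - separately
print(sorted(tuple(sorted(p)) for p in revealed))
# [('accident', 'small bomb'), ('accident', 'wiring')]
```

Neither page, combined only with domain knowledge, links "wiring" to "accident". Viewing both pages does.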


Architecture of UIR System

Input: user surfing web pages on sites of interest to national security

- Document collection (web pages)
- Information extraction, using a pre-existing domain ontology/lexicon (e.g., an aviation ontology)
- Concept Chain Graphs (CCG)
- Document subset: CCG instantiated for the subset of interest, with chains such as accident-hazard-fuel tank-… and ice/snow-hazard-fatalities-…
- UIR Alert Module, producing user alerts/logs

Output: web pages that reveal too much information; a human monitor can visualize paths in the CCG


Proposed Solution

Step 1: Determine significant concepts and associations in the target domain (offline, semi-automatic)
- use existing ontologies, such as the DAML ontology on aviation
- use information extraction to automatically extract concepts and associations from a representative document collection

Step 2: Create the Concept Chain Graph (CCG)
- consolidates underlying domain knowledge and specific documents
- weights concepts and associations using both domain weights and individual document weights

Step 3: Visualization and text mining operations on the CCG

Step 4: Invoke the UIR Alert Agent
- tracking user surfing patterns
- what-if scenarios


Evaluation Methodology

TREC query: find pages that discuss ways of causing air disasters

TREC narrative: pages that are relevant to causing air disasters will mention aircraft maintenance operations or passenger screening procedures

Typical IR evaluation: an IR system (including query expansion) returns ranked web pages, which are compared with the relevant web pages to evaluate the precision and recall of the IR system.

UIR evaluation: the UIR system, using the CCG, is evaluated on its ability to generate the narrative.
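The typical IR side of this evaluation rests on precision and recall. The following is a minimal sketch that assumes set-based measures over a cut-off of the ranked list. The page identifiers and relevance judgments are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of retrieved pages against relevance judgments."""
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output and relevance judgments for the TREC-style query.
ranked = ["p3", "p7", "p1", "p9"]
relevant = {"p1", "p3", "p5"}
for k in (2, 4):
    p, r = precision_recall(ranked[:k], relevant)
    print(f"P@{k} = {p:.2f}, R@{k} = {r:.2f}")
# P@2 = 0.50, R@2 = 0.33
# P@4 = 0.50, R@4 = 0.67
```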


Step 1: Extracting Concepts and Associations

Extracting concepts: use the InfoXtract engine from Cymfony
- The Named Entity (NE) tagger identifies common entities such as Date, Time, Location, State, Country, Organization, and Person.
- InfoXtract also identifies significant noun groups and verb groups, e.g., fuel tanker, runway de-icing.

Extracting associations:
- concept co-occurrence in documents
- concept proximity in sentences/paragraphs
- advanced techniques using machine learning

… The designation for one end of the runway should be used on the sign only when the taxiway intersects the beginning of that runway. Taxiways that intersect the runway at intermediate points must have the designations for both runway ends. ...

Association Learning

(runway, taxiway): 0.85

Interpretation: the system has 85% confidence that runway and taxiway are associated by some relation.
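The 0.85 above comes from a learned model whose details are not given. The sketch below is a much simpler stand-in for the co-occurrence and proximity signals listed earlier. It scores an association by the Dice coefficient of the sets of sentences that mention each concept. The function names and the third sentence of the excerpt are hypothetical, and the score is not meant to reproduce 0.85.

```python
import re


def sentence_hits(text, term):
    """Indices of sentences that mention `term` (case-insensitive substring match)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {i for i, s in enumerate(sentences) if term in s.lower()}


def dice_association(text, a, b):
    """Dice coefficient of sentence-level co-occurrence: 0 = never together, 1 = always."""
    sa, sb = sentence_hits(text, a), sentence_hits(text, b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


excerpt = (
    "The designation for one end of the runway should be used on the sign only when "
    "the taxiway intersects the beginning of that runway. Taxiways that intersect the "
    "runway at intermediate points must have the designations for both runway ends. "
    "Runway lights are inspected daily."  # hypothetical third sentence
)
print(dice_association(excerpt, "runway", "taxiway"))  # 0.8
```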


Sample Information Extraction output

DATE: October 23, 1992 NO. 92-03

TO: AIRPORT CERTIFICATION PROGRAM INSPECTORS

TOPIC: Effects Of Type II Deicing Fluid On Runway Friction

The FAA's Technical Center in conjunction with the Port Authority of New York and New Jersey conducted tests to determine the effects of Type II aircraft deicing fluids on runway friction. The tests were conducted this past July and August at La Guardia and John F. Kennedy International Airports on grooved asphaltic pavement. Since the tests were conducted in the summer no attempt was made to simulate ice or snow on the pavement surface. (See future test programs.) Two specially instrumented B-727's and two Saab friction devices were used to measure the runway friction.

The purpose of this effort was to test the premise that Type II deicing fluid deposited on a runway poses a hazard to aircraft landing on the runway. At the present time it is unknown to what extent Type II actually falls off a departing aircraft and what portion of it is deposited on the runway. (See future test programs.)

Concepts and Named Entities are marked up during information extraction
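InfoXtract's interface is not described here, so the sketch below substitutes the simplest markup technique: case-insensitive dictionary (gazetteer) lookup against a small hypothetical lexicon. It shows only what marked-up output can look like, not how InfoXtract tags named entities or noun groups.

```python
import re

# Hypothetical lexicon (surface form -> label); a stand-in for a real domain ontology.
LEXICON = {
    "type ii deicing fluid": "CONCEPT",
    "runway friction": "CONCEPT",
    "port authority of new york and new jersey": "ORGANIZATION",
    "la guardia": "LOCATION",
}


def mark_up(text, lexicon):
    """Wrap each lexicon match in <LABEL>...</LABEL>, trying longer entries first."""
    entries = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, entries)) + r")\b", re.IGNORECASE)

    def tag(match):
        label = lexicon[match.group(0).lower()]
        return f"<{label}>{match.group(0)}</{label}>"

    return pattern.sub(tag, text)


print(mark_up("TOPIC: Effects Of Type II Deicing Fluid On Runway Friction", LEXICON))
# TOPIC: Effects Of <CONCEPT>Type II Deicing Fluid</CONCEPT> On <CONCEPT>Runway Friction</CONCEPT>
```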


Step 2: Create Concept Chain Graph

- Create the concept chain graph from the underlying domain knowledge (concepts, associations).
- Weight concept nodes based on frequency, type, and user-defined importance.
- Weight associations based on proximity, the importance of the concepts they link, and uniqueness.
- Project/map the documents viewed by the user onto the CCG.

A document is represented as a probabilistic sub-graph in the CCG. Proximity and other metrics are used to assign weights to the concepts (nodes) and associations (edges) discovered in a document.

[Figure: Aviation Ontology, with document-specific concepts and associations and their weights]
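The weighting criteria above can be sketched as follows, assuming concepts have already been extracted for each sentence. The specific formulas and the 0.5 discount for associations already in the ontology are hypothetical choices, not the weights the system uses. Nodes get normalized frequency times user importance. Edges get the share of sentences where both concepts co-occur, times the mean endpoint weight, times a uniqueness factor.

```python
from collections import Counter
from itertools import combinations


def project_document(sentences, domain_edges, importance=None):
    """Project one document onto the CCG as a weighted sub-graph.

    sentences:    list of sets, the concepts found in each sentence
    domain_edges: set of frozenset pairs already present in the domain ontology
    importance:   optional user-defined importance per concept (default 1.0)
    """
    importance = importance or {}
    freq = Counter(c for s in sentences for c in s)
    top = max(freq.values())
    # Node weight: normalised frequency scaled by user-defined importance.
    nodes = {c: n / top * importance.get(c, 1.0) for c, n in freq.items()}
    co = Counter(frozenset(p) for s in sentences for p in combinations(sorted(s), 2))
    edges = {}
    for pair, n in co.items():
        u, v = sorted(pair)
        proximity = n / len(sentences)                      # share of sentences linking u and v
        linked_importance = (nodes[u] + nodes[v]) / 2
        uniqueness = 0.5 if pair in domain_edges else 1.0   # known links count less
        edges[(u, v)] = proximity * linked_importance * uniqueness
    return nodes, edges


doc = [{"fuel tank", "wiring"}, {"fuel tank", "small bomb"}, {"fuel tank"}]
nodes, edges = project_document(doc, {frozenset({"fuel tank", "wiring"})})
print({k: round(w, 3) for k, w in edges.items()})
# {('fuel tank', 'wiring'): 0.111, ('fuel tank', 'small bomb'): 0.222}
```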


Step 2: Instantiated Concept Chain Graph

[Figure: CCG instantiated for the documents "Accident Statistics" and "Fuel tank ignition events"; the legend distinguishes associations found in a document from associations drawn from domain knowledge. Concept nodes: ACCIDENT, HAZARD, AVIATION, AIRPLANE, Statistics, Ice/snow, Windshear, In-flight fire, Air_traffic__control_tower, Runway Incursions, Fuel Tank, Fuel Tank Ignition events, hull losses, Fatalities, Lightning, Wiring, Pumps, Small Bomb]


Step 3: Mining the CCG

Goals:
- detect information-rich concept chains, e.g., air disaster - onboard explosion - fuel tanker
- quantify the information revealed
- issue alerts when too much information is revealed
- run “what-if” scenarios to enable the dissemination of benign information

Graph traversal:
- generate a CCG representing the documents viewed by the user
- start with explicit query/search terms as seed concepts (possibly multiple terms)
- strategies: find the best paths/chains that connect the “seed” concepts (possibly generating multiple chains), or find the best subgraph
- various graph traversal algorithms are suitable
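Finding the best chain between two seed concepts can be framed as a shortest-path problem. Assume edge weights lie in (0, 1] and a chain's strength is the product of its edge weights. Then Dijkstra's algorithm on the costs -log(w) returns the strongest chain. This framing and the weights below are assumptions for illustration. The concepts are those of the example chain above.

```python
import heapq
import math


def best_chain(edges, source, target):
    """Highest-product path between two seed concepts (Dijkstra on -log weight)."""
    adj = {}
    for (u, v), w in edges.items():
        cost = -math.log(w)                  # w in (0, 1], so cost >= 0
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist[node]:                   # stale heap entry
            continue
        for nxt, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[target])


# Hypothetical association weights.
edges = {
    ("air disaster", "onboard explosion"): 0.8,
    ("onboard explosion", "fuel tanker"): 0.5,
    ("air disaster", "fuel tanker"): 0.3,
}
path, score = best_chain(edges, "air disaster", "fuel tanker")
print(path, round(score, 2))  # ['air disaster', 'onboard explosion', 'fuel tanker'] 0.4
```

The two-hop chain (0.8 × 0.5 = 0.4) beats the direct association (0.3).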


Graph Traversal Techniques

Minimum cover techniques (minimum vertex cover):

INSTANCE: a graph $G = (V, E)$.

SOLUTION: a vertex cover for $G$, i.e., a subset $V' \subseteq V$ such that, for each edge $(u, v) \in E$, at least one of $u$ and $v$ belongs to $V'$.

MEASURE: the cardinality of the vertex cover, $|V'|$, to be minimized.
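Minimum vertex cover is NP-hard, so an approximation is the usual practical choice. Below is a sketch of the classic 2-approximation: take both endpoints of every edge in a greedily built maximal matching. It runs on a hypothetical star-shaped fragment of concept associations.

```python
def vertex_cover_2approx(edges):
    """Greedy maximal matching: both endpoints of each matched edge enter the cover.

    The result is a valid vertex cover at most twice the minimum size.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge still uncovered: add it to the matching
            cover.update((u, v))
    return cover


# Hypothetical associations around a hub concept.
edges = [("HAZARD", "In-flight fire"), ("HAZARD", "Ice/snow"),
         ("HAZARD", "Windshear"), ("ACCIDENT", "HAZARD")]
cover = vertex_cover_2approx(edges)
assert all(u in cover or v in cover for u, v in edges)
print(sorted(cover))  # ['HAZARD', 'In-flight fire']
```

Here the optimum is the single vertex HAZARD, so the greedy result shows the factor-of-two bound.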

Flow networks: given a network $(G, s, t, c)$, where $G = (V, E)$ is a directed graph with $n$ vertices and $m$ edges, $s$ and $t$ are two vertices (source and sink), and $c: E \to \mathbb{R}^+$ is a function that defines the capacities of the edges, find the maximum flow from $s$ to $t$ that satisfies the capacity constraints.
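Below is a sketch of the Edmonds-Karp maximum-flow algorithm. Reading association weights as capacities, so that the maximum flow between two concepts measures how much connecting capacity they share, is an assumption for illustration. The slide does not say how flows would be defined on the CCG, and the capacities below are invented.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    if s == t:
        raise ValueError("source and sink must differ")
    residual = {s: {}, t: {}}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)           # reverse edge for cancelling flow
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Bottleneck capacity along the augmenting path s -> ... -> t.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical capacities between CCG concepts.
capacity = {
    ("Wiring", "Fuel Tank"): 3, ("Wiring", "In-flight fire"): 2,
    ("Fuel Tank", "In-flight fire"): 1, ("Fuel Tank", "ACCIDENT"): 2,
    ("In-flight fire", "ACCIDENT"): 3,
}
print(max_flow(capacity, "Wiring", "ACCIDENT"))  # 5
```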

Energy minimization (used in image processing): active contours (e.g., snakes) are used for tracking various shapes, including road detection; dynamic programming solutions are available.
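The sketch below illustrates the dynamic-programming idea behind discrete energy minimization on a chain, the formulation used by DP-based active contours. It picks one candidate per stage to minimize the sum of unary energies plus a pairwise smoothness energy between neighbors. The candidate sets, energies and function name are hypothetical. The slides do not specify how this would map onto the CCG.

```python
def min_energy_chain(unary, pairwise):
    """Minimise sum(unary[i][x_i]) + sum(pairwise(x_i, x_{i+1})) over a chain by DP.

    unary:    list of dicts, one per stage, mapping candidate -> energy
    pairwise: function giving the energy of two consecutive choices
    """
    best = dict(unary[0])        # best[x] = lowest energy of a partial chain ending in x
    back = []                    # back-pointers, one dict per later stage
    for stage in unary[1:]:
        new_best, pointers = {}, {}
        for x, e in stage.items():
            prev = min(best, key=lambda p: best[p] + pairwise(p, x))
            new_best[x] = best[prev] + pairwise(prev, x) + e
            pointers[x] = prev
        best = new_best
        back.append(pointers)
    last = min(best, key=best.get)
    chain = [last]
    for pointers in reversed(back):
        chain.append(pointers[chain[-1]])
    return chain[::-1], best[last]


# Hypothetical: three stages, two candidate positions each, smoothness = |a - b|.
unary = [{0: 0.0, 1: 3.0}, {0: 4.0, 1: 0.0}, {0: 0.0, 1: 5.0}]


def smoothness(a, b):
    return abs(a - b)


chain, energy = min_energy_chain(unary, smoothness)
print(chain, energy)  # [0, 1, 0] 2.0
```

The cost is linear in the number of stages and quadratic in the number of candidates per stage, instead of exponential for brute force.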


Step 4: Track user surfing with UIR module

[Figure: the instantiated CCG from Step 2, with the previously viewed page(s) and the requested page highlighted]

The UIR module determines that these two documents together reveal a new association between wiring and accidents.
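The tracking step can be sketched incrementally with a union-find structure over concepts. Each viewed page merges the concepts its associations connect, and the tracker reports watched concept pairs that become connected for the first time. The class, the watched pair and the associations are hypothetical. Plain connectivity is also a much cruder signal than weighted concept chains.

```python
class UIRTracker:
    """Hypothetical per-user tracker: reports watched concept pairs that a newly
    viewed page links, via some chain of associations, for the first time."""

    def __init__(self, watched_pairs):
        self.parent = {}                     # union-find forest over concepts
        self.watched = list(watched_pairs)

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _linked(self, a, b):
        return self._find(a) == self._find(b)

    def view(self, associations):
        """Add one page's associations; return the watched pairs it newly links."""
        before = {p for p in self.watched if self._linked(*p)}
        for u, v in associations:
            self.parent[self._find(u)] = self._find(v)
        return [p for p in self.watched if p not in before and self._linked(*p)]


tracker = UIRTracker([("Wiring", "ACCIDENT")])
print(tracker.view([("ACCIDENT", "In-flight fire"), ("In-flight fire", "HAZARD")]))  # []
print(tracker.view([("Wiring", "Fuel Tank Ignition events"),
                    ("Fuel Tank Ignition events", "HAZARD")]))  # [('Wiring', 'ACCIDENT')]
```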


Preliminary Experiments


Summary

Benefits to FAA:
- automated monitoring of the information acquired by users of the FAA website, with an alert mechanism for unintentionally revealed information
- shortlists and identifies the documents and concepts seen by the user that reveal unintended information
- a domain map visualization tool facilitates concept- and association-based queries

Claims:
- a new, richer representation for information retrieval that combines keyword statistics (bag-of-words model) with NLP-based information extraction
- the solution generalizes to any domain; only the domain map needs to be customized/retrained
- experts can intervene and guide the process if desired; tools are provided