Semantic Analysis in IA

Semantic Analysis in IA. Matthew Hodgson, ACT regional-lead, Web and Information Management, SMS Management & Technology. 23 Sept 2007

Upload: matthew-hodgson

Post on 29-Oct-2014


DESCRIPTION

English is a messy and chaotic language, with exceptions to rules, different styles of writing, and a multitude of different ways to write about the same thing. This chaos means that analysis, categorisation and building a corporate taxonomy are very time-consuming tasks, even if they are just for the navigation of the local intranet or internet website. This is my presentation at Oz-IA about my recent experience in turning ‘scary-bad’ medical restrictions text into something machine-usable. It introduces the concept of Semantic Analysis, the methodology I used to investigate the linguistic patterns in the text, and how this facilitated information classification and codification of content.

TRANSCRIPT

Page 1: Semantic Analysis in IA

SMS Management & Technology

Semantic Analysis in IA

Matthew Hodgson
ACT regional-lead, Web and Information Management

23 Sept 2007

Page 2: Semantic Analysis in IA


Page 3: Semantic Analysis in IA


Page 4: Semantic Analysis in IA


Jeffrey Veen on analysing content

“a mind-numbingly detailed odyssey through your web site...

…this process…is a relatively straightforward process of clicking through your web site and recording what you find.”

Source: http://www.adaptivepath.com/ideas/essays/archives/000040.php

Page 5: Semantic Analysis in IA


Page 6: Semantic Analysis in IA


Content overview – first take

Medical restrictions text
  Free-text built in Word and hand-crafted (*grrr*)
  Unclassified
  Varied consistency within and between texts
  Highly complex sentence structures in pseudo-legalese
  Style reflects the author rather than the meaning in the communication

Content needed for re-use
  Content output was needed for reuse by others
  Multiple audiences
  Multiple purposes for re-use

Codification
  Codification (after authoring) takes too long
  Need to reduce timeframes!

Page 7: Semantic Analysis in IA


The task... analyse and codify

[Diagram: content broken into labelled concepts]
  Concept 1
  Concept 2
  Concept 3
  Concept 4
  Concept 5
  Concept 5

Page 8: Semantic Analysis in IA

Linguistics… a whole discipline devoted to the study of language

Page 9: Semantic Analysis in IA


“You’re joking!?”

All language has structure – even someone’s pseudo-legal English

Analysing language is actually easier than you might think

Page 10: Semantic Analysis in IA


The approach

Analyse semantics of content
  There is a predictable structure
  It’s all just Lego™ building blocks (nouns, verbs, adjectives, etc.)
  Implied meaning can be made overt

New tools for IAs to play with!
  Understand semantics, the structure of sentences, and you can analyse, categorise and codify English!

Page 11: Semantic Analysis in IA


Language as Lego™

Building blocks
  Subject (S)
  Verb (V)
  Object (O)

Order of blocks
  Differs depending on the language

Page 12: Semantic Analysis in IA


Order from chaos

SVO languages: English, French, Chinese, Bulgarian, Swahili
SOV: Japanese, Turkish, Korean
VSO: Classical Arabic, Celtic and Hawaiian
VOS: Fijian, Yoda’s amusing phrases
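The idea that the same three blocks are simply arranged differently per language can be sketched in a few lines of Python. This is an illustration only: the `arrange` helper and the sample blocks are hypothetical and not part of the talk, and the language lists are copied from the slide.

```python
# Word orders and example languages as listed on the slide.
WORD_ORDERS = {
    "SVO": ["English", "French", "Chinese", "Bulgarian", "Swahili"],
    "SOV": ["Japanese", "Turkish", "Korean"],
    "VSO": ["Classical Arabic", "Celtic", "Hawaiian"],
    "VOS": ["Fijian"],
}


def arrange(subject, verb, obj, order):
    """Return the three blocks in the sequence named by `order`, e.g. "SOV"."""
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError(f"not a valid word order: {order!r}")
    blocks = {"S": subject, "V": verb, "O": obj}
    return [blocks[letter] for letter in order]


# The same blocks, laid out for each word order on the slide.
for order in WORD_ORDERS:
    print(order, arrange("the doctor", "prescribes", "medicine", order))
```

Real translation is of course far more than reordering; the sketch only shows that subject, verb and object can be treated as separate, movable units.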

Page 13: Semantic Analysis in IA


Subjects, verbs and objects

Subject      Verb    Object
The apple    is      a red apple

Sometimes, though, the SVO structure is hidden: "The apple is red" or "The apple is a red apple"?

Uncovering the hidden structure helps to differentiate between the subject and the object and identify the who and what

Page 14: Semantic Analysis in IA


Sentences as (apple) trees

Root
  NounPhrase (SUBJECT): The [Det] apple [Noun]
  VerbPhrase (VERB): is [Verb (be)]
  NounPhrase (OBJECT): a [Det] red [Adj] apple [Noun]
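A sentence tree like this one maps naturally onto a small recursive data structure. The sketch below is a hypothetical representation, not the tooling used in the talk; the `Node` class and `roles` helper are invented for illustration, and the tree is kept flat under the root (subject, verb and object as siblings), which is an assumption about the slide's layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                   # e.g. "NounPhrase", "Det", "Noun"
    word: Optional[str] = None   # set on leaf nodes only
    role: Optional[str] = None   # "SUBJECT", "VERB" or "OBJECT"
    children: List["Node"] = field(default_factory=list)

    def text(self) -> str:
        """Join the words under this node, left to right."""
        if self.word is not None:
            return self.word
        return " ".join(child.text() for child in self.children)


tree = Node("Root", children=[
    Node("NounPhrase", role="SUBJECT",
         children=[Node("Det", "The"), Node("Noun", "apple")]),
    Node("VerbPhrase", role="VERB",
         children=[Node("Verb(be)", "is")]),
    Node("NounPhrase", role="OBJECT",
         children=[Node("Det", "a"), Node("Adj", "red"), Node("Noun", "apple")]),
])


def roles(root: Node) -> dict:
    """Map each labelled role to the words it covers."""
    return {child.role: child.text() for child in root.children if child.role}


print(roles(tree))
```

Once the sentence is held this way, "who" and "what" become simple lookups on the subject and object branches.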

Page 15: Semantic Analysis in IA


Semantic analysis

Medical restrictions wording:

Restricted benefit
  Gastro-oesophageal reflux disease; Scleroderma oesophagus;

Authority required
  Peptic ulcer

Page 16: Semantic Analysis in IA


Semantic analysis (cont.)

Actual sentence
  Peptic ulcer

Implied sentence
  The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer
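Making the implied meaning overt amounts to filling a sentence template from the terse entry. The following is a minimal sketch under stated assumptions: the `implied_sentence` function is hypothetical, and the default stage of "initial" is taken from this single slide example rather than from any general rule in the source text.

```python
def implied_sentence(condition: str, stage: str = "initial") -> str:
    """Expand a terse restriction entry into its implied full sentence."""
    # Lower-case only the first letter so the entry reads naturally mid-sentence.
    condition = condition[:1].lower() + condition[1:]
    return (
        "The prescription of medicine is restricted to the "
        f"{stage} treatment of patients with {condition}"
    )


print(implied_sentence("Peptic ulcer"))
```

In practice the wording of the template would need to be agreed with the content owners; the point is only that the hidden subject and verb can be supplied mechanically once they are known.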

Page 17: Semantic Analysis in IA


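the prescription of medicine is restricted to the initial treatment of patients with peptic ulcer

[Diagram: parse tree of the implied sentence. Node labels: Root, VerbPhrase, NounPhrase (x4), PreposPhrase (x2), with the label (SUBJECT). Word tags: DET, N, P, ADJ, V, AUX.]

Tags for the second half of the sentence: initial [ADJ] treatment [N] of [P] patients [N] with [P] peptic [ADJ] ulcer [N]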

Page 18: Semantic Analysis in IA

[Diagram: the implied sentence ("the prescription of medicine is restricted to the initial treatment of patients ...") split into Subject, Verb and Object, with the Object expanded as WHO TREATED? + WHAT CONDITION? = ACTION REQUIRED. Key: ADJ, NOUN, PREP, VERB.]

Row labels on the diagram:
  Initial or continuing
  Patient descriptors (population/group)
  PBS subsidised
  Existing treatment descriptors of population
  Limitation of prescribing to a specific specialist group
  Condition being treated
  Measures/descriptors of condition severity (ADJ)
  Practical aspects

WHO TREATED? (example wording)
  initial, continuing
  70 year old, mother, pregnant, female, male, other ADJ
  previously PBS-subsidised; not previously
  receiving dBMARD treatment; receiving PBS-subsidised dBMARD treatment
  cytotoxic chemotherapy; receiving a 5HT3 antagonist; receiving radiotherapy
  not responding to analgesics; not responding to other anti-epileptic drugs
  receiving treatment 2 years; having surgery; unable to take solid form of topiramate
  treated by clinical immunologist; treated by dermatologist

WHAT CONDITION? (example wording)
  with nausea and vomiting; advanced psoriasis; with peptic ulcer; with malignant tumor
  with scleroderma oesophagus; with chronic pain; with seizures
  with hormone dependent metastatic breast cancer
  incomplete resolution; partial; no indication of (ADJ/PP)

ACTION REQUIRED (example wording)
  complete Authority action sheet
  include whole body area diagrams
  record details of doctor; record date; sign
  treat for period of time
  provide previous history
  prescribe number of repeats
  contact Medicare; obtain Authority number

Page 19: Semantic Analysis in IA


“Who Treated” semantic model

[Diagram: controlled vocabulary for the WHO TREATED part of a restriction. Terms in square brackets are slots filled from a list; "?" marks items left open on the slide. Several branches are annotated "Starts new prepositional-phrase in the same text-block". Terms on the diagram, grouped by theme:]

Patient Group
  Age; Sex; Vocation; Ethnicity [ETHNICITY]; Entitlement [?]
  Male; Female; All
  Pregnant; Breastfeeding; Veteran; [ADJECTIVE]; [LIST]

Numbers and time
  Over; Under; Exactly; Between; At least; = [NUMBER]
  Hours; Days; Weeks; Months; Years
  Hourly; Daily; Weekly; Fortnightly; Monthly; Yearly
  At a dose of [mg ...etc]
  Within timeframe of; Over a period of

Documented history
  PBS subsidised; PBS non-subsidised
  Treatments; Trials; Treatment with; Treatment of; Treatment for
  Initial; Continuing; Maintenance; Initiation; Stabilisation
  Effective; Ineffective; Inappropriate
  In conjunction with; Not in conjunction with; Co-administered with; Following; Preceding
  Received; Has not received; Responding; Not responding
  Qualified for; Failed to qualify for; Indicated; Not indicated
  Has had; Has not had; Can have; Can not have; Can not receive
  [DRUG]; [TREATMENT]; [THERAPY]; Diet; Exercise; Surgery [TYPE]

Condition
  [SEVERITY] [CONDITION]
  Disease progression; Disease regression
  Evidence of [PROCEDURE] in [DISORDER]; Symptoms?; Clinical findings
  [MEASURED AS]?; As measured by?; As evidenced by
  That meet a specific definition/criteria as set out in [LIST of references]
  General schedule of Lipid-lowering Drugs and [DEFINED BY]

Clinician
  [CLINICIAN]; Requiring special expertise in [EXPERTISE]; Requiring no special expertise
  Treated by; Diagnosis confirmed by
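A semantic model like this works as a controlled vocabulary: each slot accepts only terms from its own list. The sketch below shows one way such a check could look. It is hypothetical throughout: the slot names (`stage`, `comparison`, `unit`, `sex`) and the `check` function are invented for illustration, and grouping these particular terms into these slots is an assumption based on how they sit together on the slide.

```python
# A few term lists copied from the slide; the slot names are hypothetical.
VOCAB = {
    "stage": {"Initial", "Continuing", "Maintenance"},
    "comparison": {"Over", "Under", "Exactly", "Between", "At least"},
    "unit": {"Hours", "Days", "Weeks", "Months", "Years"},
    "sex": {"Male", "Female", "All"},
}


def check(record: dict) -> list:
    """Return a list of problems; an empty list means every term is allowed."""
    problems = []
    for slot, value in record.items():
        allowed = VOCAB.get(slot)
        if allowed is None:
            problems.append(f"unknown slot: {slot}")
        elif value not in allowed:
            problems.append(f"{slot}: {value!r} is not an allowed term")
    return problems


print(check({"stage": "Initial", "sex": "Female"}))
print(check({"stage": "Early"}))
```

Checking authored text against lists like these is one way to spot where content quality is poor before it is codified.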

Page 20: Semantic Analysis in IA


“Authority Action” semantic model

[Diagram: controlled vocabulary for the Authority Action part of a restriction. Terms in square brackets are slots. The same branches repeat for each action, and several are annotated "Starts new prepositional-phrase in the same text-block". Terms on the diagram:]

Authority Action
  (allow) Maximum; (allow) Minimum
  Therapy; Treatment; Supply [AMOUNT]; Repeats [AMOUNT]
  Initial; Subsequent; Ongoing; Remaining
  In writing; By telephone; Electronically
  To [AUTHORITY]: Medicare, ...etc...
  Within timeframe of [TIME]: days; weeks; months
  To complete; Followed by
  Where approval [TIMEFRAME]

Page 21: Semantic Analysis in IA


High-level semantic overview

[Diagram: a restriction text shown as components joined by "+" and "=". Components: Foreword; Definitions; Patient Group (WHO TREATED); Condition (WHAT CONDITION); Authority Action (HOW AUTHORISED); Notes and Cautions.]

Items listed under the components:
  Definitions
  Age limitations
  Clinical initiation or continuation criteria
  Prescribing clinicians
  Prescribing advice
  Condition
  Contact information
  Grandfathering clauses
  Patient groups
  Prior treatments
  Severity
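Splitting a restriction into who, what and how is what makes it machine-usable: each part becomes a named field rather than a phrase buried in free text. The sketch below is hypothetical; the `Restriction` class and its field names are invented from the three questions on the slide, and the field values are free text taken from the peptic ulcer example earlier in the deck.

```python
from dataclasses import asdict, dataclass


@dataclass
class Restriction:
    # Field names follow the three questions on the slide (an assumption,
    # not a schema from the original project).
    who_treated: str
    what_condition: str
    how_authorised: str


peptic_ulcer = Restriction(
    who_treated="patients receiving initial treatment",
    what_condition="peptic ulcer",
    how_authorised="Authority required",
)

# A plain dictionary is easy to hand to other systems and audiences.
record = asdict(peptic_ulcer)
print(record)
```

The same record could then be reused for different audiences and purposes without re-reading the original wording.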

Page 22: Semantic Analysis in IA


How did the ‘trees’ help?

Inferred
  How people think about and structure content

Described
  Business processes that produce content

Identified
  Where content quality is poor so it can be improved
  Critical components of the sentence for codification

Designed
  Taxonomies and describe folk taxonomies

Built
  Systems to help bring some structure to content authoring

Page 23: Semantic Analysis in IA


How can I do this stuff too?! (a side-step)

Theory is important
  An understanding of semantics: sentence trees and grammar
  Text books by authors like Fromkin and Rodman can help through the tricky bits

Need good tools
  Connexor: www.conexor.fi/demo/syntax
  Big sheets of paper (and an electronic whiteboard)
  Visio (not PowerPoint!)

Page 24: Semantic Analysis in IA


Demo

Connexor www.conexor.fi/demo/syntax

Page 25: Semantic Analysis in IA


Introducing ways to codify restrictions

How are we actually going to codify the stuff?!
  Give people Lego™ or ‘fridge-magnets’ to build sentences
  Build a prototype to explore and demonstrate conceptual design

Communicate
  Talk about ideas with business owners
  Explore possibilities with end-users
  Build ‘no surprises’ into change management

Iterate
  Iterate and refine concepts and design before anything is built

Inform
  Developers of intent and requirements
  The building of a ‘tool’ for codifying content (hooray for Axure!)

Page 26: Semantic Analysis in IA


Demo

Prototyping with Axure

Page 27: Semantic Analysis in IA


Page 28: Semantic Analysis in IA


Page 29: Semantic Analysis in IA


Page 30: Semantic Analysis in IA


Page 31: Semantic Analysis in IA


Page 32: Semantic Analysis in IA


Page 33: Semantic Analysis in IA


Page 34: Semantic Analysis in IA


Page 35: Semantic Analysis in IA


Why should I care about this?

Google uses semantic analysis to index content

Translation software uses semantic analysis to identify ‘components’ for translation

Good sentence structure equals:
  Accurate indexing
  Higher rank relevance of content
  Happy people (they find what they’re looking for)

Page 36: Semantic Analysis in IA


Summing up

Content is still king, but:
  Is its quality any good?
  Does it match your website’s categories?
  Is your metadata ok?
  Can people find the content they need?
  Do you need to understand your content better?

Semantic analysis can:
  Make your content audits more objective
  Inform processes to improve the quality of the content
  Inform processes to improve search engine indexing
  Inform metadata creation
  Improve website navigation design

Page 37: Semantic Analysis in IA


email: [email protected]
www.smsmt.com

blog: magia3e.wordpress.com
twitter: magia3e

community: iacanberra.org

cartoons: © Gary Larson

Please Sir, can I have some more…?

Page 38: Semantic Analysis in IA


Fin