Semantic Analysis in IA

Post on 29-Oct-2014

DESCRIPTION

English is a messy and chaotic language, with exceptions to rules, different styles of writing, and a multitude of ways to write about the same thing. This chaos makes analysis, categorisation and building a corporate taxonomy very time-consuming, even if it’s just for the navigation of a local intranet or internet website. This is my presentation at Oz-IA about my recent experience in turning ‘scary-bad’ medical restrictions text into something machine-usable. It introduces the concept of semantic analysis, the methodology I used to investigate the linguistic patterns in the text, and how this facilitated the classification and codification of content.

TRANSCRIPT

1

SMS Management & Technology

Semantic Analysis in IA

Matthew Hodgson
ACT regional-lead, Web and Information Management

23 Sept 2007

2

3

4

Jeffrey Veen on analysing content

“a mind-numbingly detailed odyssey through your web site...

…this process…is a relatively straightforward process of clicking through your web site and recording what you find.”

Source: http://www.adaptivepath.com/ideas/essays/archives/000040.php

5

6

Content overview – first take

Medical restrictions text
- Free text built in Word and hand-crafted (*grrr*)
- Unclassified
- Varied consistency within and between texts
- Highly complex sentence structures in pseudo-legalese
- Style reflects the author rather than the meaning in the communication

Content needed for re-use
- Content output was needed for reuse by others
- Multiple audiences
- Multiple purposes for re-use

Codification
- Codification (after authoring) takes too long
- Need to reduce timeframes!

7

The task... analyse and codify

[Diagram: boxes labelled Concept 1 to Concept 5]

8

Linguistics… a whole discipline devoted to the study of language

9

“You’re joking!?”

- All language has structure – even someone’s pseudo-legal English
- Analysing language is actually easier than you might think

10

The approach

Analyse semantics of content
- There is a predictable structure
- It’s all just Lego™ building blocks (nouns, verbs, adjectives, etc)
- Implied meaning can be made overt

New tools for IAs to play with!
- Understand semantics, the structure of sentences, and you can analyse, categorise and codify English!

11

Language as Lego™

Building blocks
- Subject (S)
- Verb (V)
- Object (O)

Order of blocks
- Differs depending on the language

12

Order from chaos

SVO languages
- English, French, Chinese, Bulgarian, Swahili
SOV
- Japanese, Turkish, Korean
VSO
- Classical Arabic, Celtic and Hawaiian
VOS
- Fijian, Yoda’s amusing phrases
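As a rough illustration of the point about block order, the Python sketch below lays the same three blocks out in each of the four orders listed on this slide. The `ORDERS` table and the `arrange` helper are hypothetical names made up for this example, and real languages do far more than shuffle three blocks.

```python
# Hypothetical illustration: the same three blocks, arranged per word-order type.
ORDERS = {
    "SVO": ("subject", "verb", "object"),
    "SOV": ("subject", "object", "verb"),
    "VSO": ("verb", "subject", "object"),
    "VOS": ("verb", "object", "subject"),
}


def arrange(blocks: dict, order: str) -> str:
    """Join the subject, verb and object blocks in the requested order."""
    return " ".join(blocks[role] for role in ORDERS[order])


blocks = {"subject": "the apple", "verb": "is", "object": "a red apple"}
for name in ORDERS:
    print(name, "->", arrange(blocks, name))
```

The blocks stay the same in every case; only their sequence changes, which is the sense in which the order "differs depending on the language".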

13

Subjects, verbs and objects

Subject: The apple
Verb: is
Object: a red apple

Sometimes, though, the SVO structure is hidden: “The apple is red”, or “The apple is a red apple”?

Uncovering the hidden structure helps to differentiate between the subject and the object and identify the who and what

14

Sentences as (apple) trees

The apple is a red apple

Root
  Noun Phrase (SUBJECT)
    Det: The
    Noun: apple
  Verb Phrase
    Verb (be) (VERB): is
    Noun Phrase (OBJECT)
      Det: a
      Adj: red
      Noun: apple
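The sentence tree on this slide can be written down as a small data structure. The sketch below is one possible way to do that in Python; the `Node` class and the `np` and `svo` helpers are hypothetical names for illustration, and the tree is typed in by hand rather than produced by a parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                     # e.g. "NounPhrase", "Det", "Noun"
    word: Optional[str] = None     # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)

    def text(self) -> str:
        """Return the words under this node, left to right."""
        if self.word is not None:
            return self.word
        return " ".join(child.text() for child in self.children)


def np(*leaves):
    """Build a noun phrase from (tag, word) pairs."""
    return Node("NounPhrase", children=[Node(tag, word) for tag, word in leaves])


subject = np(("Det", "The"), ("Noun", "apple"))
obj = np(("Det", "a"), ("Adj", "red"), ("Noun", "apple"))
verb = Node("Verb(be)", "is")
root = Node("Root", children=[subject, Node("VerbPhrase", children=[verb, obj])])


def svo(tree: Node):
    """Read subject, verb and object off a Root -> NP, VP(V, NP) tree."""
    subj, verb_phrase = tree.children
    v, o = verb_phrase.children
    return subj.text(), v.text(), o.text()


print(svo(root))
```

Once the tree exists, the subject and object are just branches to read off, which is what makes the "who" and the "what" easy to separate.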

15

Semantic analysis

Medical restrictions wording:

Restricted benefit
  Gastro-oesophageal reflux disease; Scleroderma oesophagus;

Authority required
  Peptic ulcer

16

Semantic analysis (cont.)

Actual sentence
- Peptic ulcer

Implied sentence
- The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer
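To make the step from actual to implied sentence concrete, here is a minimal Python sketch that expands a terse restriction into the implied sentence shown on this slide. The `TEMPLATE` constant and the `implied_sentence` function are hypothetical, and the default of "initial" for the treatment stage is an assumption made only so that the slide's example can be reproduced.

```python
# Hypothetical template, copied from the implied sentence on the slide.
TEMPLATE = (
    "The prescription of medicine is restricted to the "
    "{stage} treatment of patients with {condition}"
)


def implied_sentence(actual: str, stage: str = "initial") -> str:
    """Expand a terse restriction such as 'Peptic ulcer' into a full sentence.

    The stage is an assumed parameter: the terse wording does not state it.
    """
    condition = actual.strip().rstrip(";").lower()
    return TEMPLATE.format(stage=stage, condition=condition)


print(implied_sentence("Peptic ulcer"))
```

The terse wording supplies only the condition; everything else in the sentence is implied, which is why it has to be made overt before the text can be codified.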

17

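[Parse tree diagram of the implied sentence: “the prescription of medicine is restricted to the initial treatment of patients with peptic ulcer”. The subject is “the prescription of medicine”. The tree breaks the sentence into a Root, a Verb Phrase, Noun Phrases and Prepositional Phrases, with each word tagged as DET, N, P, ADJ, AUX or V.]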

18

[Diagram: the implied sentence “the prescription of medicine is restricted to the initial treatment of patients…” split into Subject, Verb and Object, with the object expanded as WHO TREATED? + WHAT CONDITION? = ACTION REQUIRED.]

Key: ADJ, NOUN, PREP, VERB

Groups of phrases shown in the diagram:
- Initial or continuing: initial, continuing
- Patient descriptors (population/group): 70 year old, mother, pregnant, female, male, other ADJ
- PBS subsidised: previously receiving, or not previously receiving, PBS-subsidised treatment
- Existing treatment descriptors of population: receiving cytotoxic chemotherapy, receiving radiotherapy, receiving a 5HT3 antagonist, not responding to analgesics, not responding to other anti-epileptic drugs
- Limitation of prescribing to a specific specialist group: treated by dermatologist, treated by clinical immunologist
- Condition being treated: with peptic ulcer, with malignant tumor, with scleroderma oesophagus, with chronic pain, with nausea and vomiting, with seizures, with hormone dependent metastatic breast cancer
- Measures/descriptors of condition severity (ADJ): advanced (psoriasis), partial, incomplete resolution
- Practical aspects (action required): complete Authority action sheet, include whole body area diagrams, provide previous history, prescribe number of repeats, treat for period of time, record details of doctor, record date, sign, contact Medicare, obtain Authority number

19

“Who Treated” semantic model

[Diagram: the building blocks available for the “who treated” part of a restriction. Terms in square brackets are slots filled from a list.]

- Patient Group: Age, Sex (Male, Female, All), Vocation (Veteran), Ethnicity [ETHNICITY], Entitlement [?], Pregnant, Breastfeeding, [ADJECTIVE]
- Numbers and units: Over, Under, Exactly, Between, At least [NUMBER]; Hours, Days, Weeks, Months, Years
- Frequency: Hourly, Daily, Weekly, Fortnightly, Monthly, Yearly
- Subsidy: PBS subsidised, PBS non-subsidised
- Clinician: [CLINICIAN] Requiring special expertise in [EXPERTISE]; Requiring no special expertise; Treated by; Diagnosis confirmed by
- Condition: [SEVERITY] [CONDITION]; [DISORDER]; Symptoms?; Clinical findings; Disease progression; Disease regression; Evidence of; As evidenced by; As measured by? [MEASURED AS]?
- Documented history: Treatments; Trials; Treatment with, Treatment of, Treatment for; Initial, Continuing, Maintenance, Initiation, Stabilisation; Effective, Ineffective, Inappropriate
- Treatment status: Received, Has not received; Responding, Not responding; Qualified for, Failed to qualify for; Indicated, Not indicated; Has had, Has not had; Can have, Can not have, Can not receive
- Sequence and timing: In conjunction with, Not in conjunction with, Following, Preceding, Co-administered with, Within timeframe of, Over a period of
- Treatment types: [DRUG], [TREATMENT], [THERAPY], [PROCEDURE], Diet, Exercise, Surgery [TYPE]
- Dose: At a dose of [mg ...etc]
- Definitions: That meet a specific definition/criteria as set out in [LIST of references] (General schedule of Lipid-lowering Drugs); [DEFINED BY]

Several branches carry the note: Starts new prepositional-phrase in the same text-block
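A model like this maps naturally onto a typed record with a controlled vocabulary for each slot. The sketch below is only one possible reading of a small part of the diagram: the class names, field names and the rule for turning a record back into a phrase are all assumptions, and the example values (such as the age of 18) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Controlled vocabularies taken from terms on the slide.
STAGES = {"Initial", "Continuing", "Maintenance"}
COMPARATORS = {"Over", "Under", "Exactly", "At least"}  # "Between" needs two numbers; omitted here
UNITS = {"Hours", "Days", "Weeks", "Months", "Years"}


@dataclass
class AgeLimit:
    comparator: str
    number: int
    unit: str

    def __post_init__(self):
        if self.comparator not in COMPARATORS or self.unit not in UNITS:
            raise ValueError("age limit uses a term outside the vocabulary")

    def phrase(self) -> str:
        return f"aged {self.comparator.lower()} {self.number} {self.unit.lower()}"


@dataclass
class WhoTreated:
    stage: str
    condition: str
    severity: Optional[str] = None
    sex: str = "All"
    age: Optional[AgeLimit] = None
    clinician: Optional[str] = None

    def __post_init__(self):
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage}")

    def phrase(self) -> str:
        """Render the record as a phrase (hypothetical wording rule)."""
        who = "patients" if self.sex == "All" else f"{self.sex.lower()} patients"
        parts = [f"{self.stage.lower()} treatment of {who}"]
        if self.age:
            parts.append(self.age.phrase())
        condition = f"{self.severity} {self.condition}" if self.severity else self.condition
        parts.append(f"with {condition}")
        if self.clinician:
            parts.append(f"treated by a {self.clinician}")
        return " ".join(parts)


# Invented example values, for illustration only.
example = WhoTreated("Initial", "psoriasis", severity="advanced",
                     age=AgeLimit("Over", 18, "Years"), clinician="dermatologist")
print(example.phrase())
```

Because each slot only accepts terms from its list, a record built this way is codified at the moment it is authored rather than afterwards.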

20

“Authority Action” semantic model

[Diagram: the building blocks available for the “authority action” part of a restriction. Terms in square brackets are slots filled from a list.]

- Authority Action: (allow) Maximum, (allow) Minimum
- What is authorised: Therapy, Supply [AMOUNT], Repeats [AMOUNT], Treatment
- Stage: Initial, Subsequent, Ongoing, Remaining
- How the request is made: In writing, By telephone, Electronically
- Who it is made to: To [AUTHORITY] (Medicare, ...etc...)
- Timing: [TIME] days, weeks, months; Within timeframe of [TIME] days, weeks, months; Where approval [TIMEFRAME]
- Connectors: To complete, Followed by

Several branches carry the note: Starts new prepositional-phrase in the same text-block

21

High-level semantic overview

[Diagram: a restriction shown as five components joined by “+”: Foreword, Definitions, Patient Group, Condition, Authority Action.]

Headings in the diagram: Notes and Cautions, WHO TREATED, WHAT CONDITION, HOW AUTHORISED

Details listed under the components: Definitions; Age limitations; Patient groups; Prior treatments; Prescribing clinicians; Condition; Severity; Clinical initiation or continuation criteria; Prescribing advice; Contact information; Grandfathering clauses

22

How did the ‘trees’ help?

Inferred
- How people think about and structure content

Described
- Business processes that produce content

Identified
- Where content quality is poor so it can be improved
- Critical components of the sentence for codification

Designed
- Taxonomies, and described folk taxonomies

Built
- Systems to help bring some structure to content authoring

23

How can I do this stuff too?! (a side-step)

Theory is important
- An understanding of semantics: sentence trees and grammar
- Textbooks by authors like Fromkin and Rodman can help through the tricky bits

Need good tools
- Connexor: www.conexor.fi/demo/syntax
- Big sheets of paper (and an electronic whiteboard)
- Visio (not PowerPoint!)

24

Demo

Connexor www.conexor.fi/demo/syntax

25

Introducing ways to codify restrictions

How are we actually going to codify the stuff?!
- Give people Lego™ or ‘fridge-magnets’ to build sentences
- Build a prototype to explore and demonstrate conceptual design

Communicate
- Talk about ideas with business owners
- Explore possibilities with end-users
- Build ‘no surprises’ into change management

Iterate
- Iterate and refine concepts and design before anything is built

Inform
- Developers of intent and requirements
- The building of a ‘tool’ for codifying content (hooray for Axure!)
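The ‘fridge-magnet’ idea can be sketched in a few lines of Python: each slot in the sentence offers a fixed set of magnets, and a sentence is only accepted if every slot holds one of them. The slot names, the magnet lists and the `build_restriction` function below are hypothetical and are not taken from the prototype mentioned on this slide.

```python
# Hypothetical 'fridge-magnet' slots: each slot offers a fixed set of magnets.
SLOTS = {
    "stage": ["initial", "continuing"],
    "population": ["patients", "female patients", "male patients"],
    "condition": ["peptic ulcer", "chronic pain", "seizures"],
}

SENTENCE = (
    "The prescription of medicine is restricted to the "
    "{stage} treatment of {population} with {condition}"
)


def build_restriction(choices: dict) -> dict:
    """Check one magnet per slot, then return the codes and the sentence."""
    for slot, magnets in SLOTS.items():
        if choices.get(slot) not in magnets:
            raise ValueError(f"{slot!r} must be one of {magnets}")
    return {"codes": dict(choices), "text": SENTENCE.format(**choices)}


result = build_restriction(
    {"stage": "initial", "population": "patients", "condition": "peptic ulcer"}
)
print(result["text"])
```

The author picks magnets instead of typing free text, so the readable sentence and its codified form come out of the same step.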

26

Demo

Prototyping with Axure

27

28

29

30

31

32

33

34

35

Why should I care about this?

- Google uses semantic analysis to index content
- Translation software uses semantic analysis to identify ‘components’ for translation
- Good sentence structure equals:
  - Accurate indexing
  - Higher rank relevance of content
  - Happy people (they find what they’re looking for)

36

Summing up

Content is still king, but:
- Is its quality any good?
- Does it match your website’s categories?
- Is your metadata ok?
- Can people find the content they need?
- Do you need to understand your content better?

Semantic analysis can:
- Make your content audits more objective
- Inform processes to improve the quality of the content
- Inform processes to improve search engine indexing
- Inform metadata creation
- Improve website navigation design

37

email: mhodgson@smsmt.com
web: www.smsmt.com
blog: magia3e.wordpress.com
twitter: magia3e
community: iacanberra.org
cartoons: © Gary Larson

Please Sir, can I have some more…?

38

Fin
