
Upload: godfrey-hancock

Post on 23-Dec-2015


Location, Location, Location

Why am I here?

Details, Details, Details

Everybody has to be somewhere! (Eccles: Goon Show circa 1950)

Computing Research Laboratory

Language Engineering at CRL

• Information retrieval
• Language learning and language teaching
• Automatic translation
• Summarization
• Question answering
• Dictionary development
• Knowledge discovery

Field Guide Locations

Habitat Mainly deciduous forests and woodlands; often seen over adjacent farmlands.

Nesting 2 whitish eggs, heavily marked with dark brown, placed without nest or lining in a crevice in rocks, in a hollow tree, or in a fallen hollow log.

Range Breeds from southern British Columbia, central Saskatchewan, Great Lakes, and New Hampshire southward. Winters in Southwest, and in East northward to southern New England.

Tipster/MUC Named Entity Task

Seven Page Task Definition

Multi-name expressions containing conjoined modifiers (with elision of the head of one conjunct) should be marked up as separate expressions.

"North and South America" <ENAMEX TYPE="LOCATION">North</ENAMEX> and <ENAMEX TYPE="LOCATION">South America</ENAMEX>

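The markup rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MUC annotation tooling: the two-entry `LOCATIONS` set and the function names are hypothetical, and the pattern only handles the simple "Modifier and Modifier Head" shape.

```python
import re

# Hypothetical mini-gazetteer; a real system would load a full place-name list.
LOCATIONS = {"North America", "South America"}

# Matches "<Mod1> and <Mod2> <Head>", e.g. "North and South America".
CONJOINED = re.compile(r"\b([A-Z]\w+) and ([A-Z]\w+) ([A-Z]\w+)\b")


def tag(text):
    """Wrap a string in a LOCATION ENAMEX element."""
    return f'<ENAMEX TYPE="LOCATION">{text}</ENAMEX>'


def mark_conjoined_locations(text):
    """Mark up conjoined modifiers with an elided head as two expressions."""
    def repl(match):
        mod1, mod2, head = match.groups()
        # Only tag when both expanded names are known locations.
        if f"{mod1} {head}" in LOCATIONS and f"{mod2} {head}" in LOCATIONS:
            return f"{tag(mod1)} and {tag(mod2 + ' ' + head)}"
        return match.group(0)
    return CONJOINED.sub(repl, text)


print(mark_conjoined_locations("North and South America"))
```

Phrases whose expanded forms are not in the gazetteer are returned unchanged, so the sketch errs on the side of not tagging.
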
+ Gazetteer – from USGS and National Geographic

Scotland (CITY) Alabama (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Arkansas (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) California (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Connecticut (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Florida (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Georgia (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Indiana (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Maine (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Maryland (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Massachusetts (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Mississippi (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Missouri (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) New Hampshire (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Ohio (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Pennsylvania (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) South Dakota (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Texas (PROVINCE 1) United States (COUNTRY)
Scotland (CITY) Virginia (PROVINCE 1) United States (COUNTRY)
Scotland (PROVINCE 1) United Kingdom (COUNTRY)
Scotland (PROVINCE 2) Missouri (PROVINCE 1) United States (COUNTRY)
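To make the ambiguity concrete, here is a minimal Python sketch that parses entries in the "Name (TYPE)" format shown above and indexes them by name. The parsing pattern, the function names, and the four-entry sample are assumptions made for illustration; the actual gazetteer format and loader are not described in the slides.

```python
import re
from collections import defaultdict

# A few entries copied from the listing above.
ENTRIES = [
    "Scotland (CITY) Alabama (PROVINCE 1) United States (COUNTRY)",
    "Scotland (CITY) Texas (PROVINCE 1) United States (COUNTRY)",
    "Scotland (PROVINCE 1) United Kingdom (COUNTRY)",
    "Scotland (PROVINCE 2) Missouri (PROVINCE 1) United States (COUNTRY)",
]

# One "Name (TYPE)" unit; TYPE may carry a level digit, e.g. "PROVINCE 1".
PART = re.compile(r"\s*(.+?) \(([A-Z]+(?: \d)?)\)")


def parse_entry(line):
    """Split one entry into (name, type) pairs, most specific first."""
    return PART.findall(line)


def build_index(lines):
    """Map each place name to every candidate reading found in the gazetteer."""
    index = defaultdict(list)
    for line in lines:
        path = parse_entry(line)
        name, place_type = path[0]
        parents = [parent for parent, _ in path[1:]]
        index[name].append({"type": place_type, "parents": parents})
    return index


INDEX = build_index(ENTRIES)
for candidate in INDEX["Scotland"]:
    print(candidate)
```

Even this small sample yields four competing readings for "Scotland", which is why a place name alone is rarely enough to fix a location.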

MINDS

User configurable summarization system based on sentence selection

Summarizes documents in Spanish, Japanese, Russian, Turkish, Korean, and English

Summaries can be biased to favor place names, or other named entities

Basic Summarization Method

• Document structure analysis
• Keyword analysis
• Part of Speech and Proper Name Recognition
• Sentence selection based on weighted scores
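A rough Python sketch of sentence selection by weighted scores follows. It is not the MINDS implementation: document structure analysis is reduced to a sentence split, proper name recognition is replaced by a caller-supplied list of place names, and the stopword list, the `place_weight` parameter, and the scoring formula are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "on"}


def summarize(text, place_names=(), n=1, place_weight=2.0):
    """Pick the n highest-scoring sentences, optionally biased to place names."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Keyword score: sum of document-wide frequencies of content words.
        keyword_score = sum(freq[t] for t in tokens if t not in STOPWORDS)
        # Bias: each mention of a listed place name adds extra weight.
        place_score = sum(sentence.count(p) for p in place_names)
        return keyword_score + place_weight * place_score

    chosen = sorted(sentences, key=score, reverse=True)[:n]
    # Return the selected sentences in document order.
    return [s for s in sentences if s in chosen]


TEXT = "Trade talks opened today. The talks moved to Ankara. Officials were tired."
print(summarize(TEXT))
print(summarize(TEXT, place_names=("Ankara",)))
```

With no bias the first sentence wins on keyword frequency; adding "Ankara" as a place name shifts the selection to the second sentence, which is the kind of configurable bias the slide describes.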

Boas: “A Linguist in the Box”

Boas is a semi-automatic knowledge elicitation system that guides a language speaker through the process of developing the static knowledge sources for a moderate-quality, broad-coverage MT system from any “low-density” language into English in about six months.

One of the tasks is translating a long list of place names from English into the source language.

The ethnologist and linguist Franz Boas was the founder of the American school of descriptive linguistics.

In this photo (circa 1900?), he is shown posing for a model that was being made of a Kwakiutl Winter Ceremonial dancer, in which the dancer emerges from a circular hole cut in the dancing screen.

Onomastics

The study of proper names

Keizai - Human Assisted Query Translation for Cross-Language Retrieval

Document Filtering – Using Names, Locations, Keywords

Automatic Document Translation – Spanish, Arabic, Farsi…

ATS – Differences in Arabic

الاختلافات اللغوية في الأقطار العربية

Document Collection

Morphological Analysis

Information Retrieval

Sub-corpus Analysis

Explore differences in lexical usage due to –

• Transliteration

• Cultural background (west – French, east – English)

• Spelling differences

For example –

AIDS الايدز, SIDA السيدا

OPEC الأوبيك, OPEP الأوبيب

Teacher (Algeria) الأستاذ

Teacher (Oman) الاستاذ
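The "teacher" pair differs only in whether the alef carries a hamza, so a simple character normalization is enough to group the two spellings. The Python sketch below shows that one step; the mapping and function name are assumptions for illustration, and it does nothing for transliteration differences such as AIDS versus SIDA.

```python
# Map alef variants (hamza above, hamza below, madda) to bare alef.
ALEF_VARIANTS = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
    "\u0622": "\u0627",  # alef with madda
})

# The two spellings of "teacher" from the examples above.
TEACHER_ALGERIA = "\u0627\u0644\u0623\u0633\u062a\u0627\u0630"  # with hamza
TEACHER_OMAN = "\u0627\u0644\u0627\u0633\u062a\u0627\u0630"     # bare alef


def normalize(word):
    """Collapse alef variants so spelling variants compare equal."""
    return word.translate(ALEF_VARIANTS)


print(normalize(TEACHER_ALGERIA) == normalize(TEACHER_OMAN))
```

Normalizing before indexing trades away the regional signal, so a study of lexical differences would keep the original form alongside the normalized key.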

Arabic Speaking Countries

Different Spelling Forms from AFP Arabic Newswire

Word AFP English Occurrences
لوس انجليس Los Angeles 21
لوس انجلوس Los Angeles 23
لوس انجيلس Los Angeles 2
لوس انجيليس Los Angeles 34
انجلترا England 2
انكلتر England 1
انكلترا England 1
كارولاينا Carolina 26
كارولينا Carolina 14
ويسكونسين Wisconsin 8
ويسكنسن Wisconsin 2
ويسكونسن Wisconsin 16
نيوهامبشير New Hampshire 15
نيوهامبشر New Hampshire 9
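One way to use counts like these is to total the occurrences per English name and find the dominant spelling. The Python sketch below does that for a subset of the rows; the function name and report layout are hypothetical, and only the Wisconsin and New Hampshire rows are included to keep it short.

```python
from collections import defaultdict

# (Arabic spelling, English name, occurrences) rows taken from the table.
ROWS = [
    ("ويسكونسين", "Wisconsin", 8),
    ("ويسكنسن", "Wisconsin", 2),
    ("ويسكونسن", "Wisconsin", 16),
    ("نيوهامبشير", "New Hampshire", 15),
    ("نيوهامبشر", "New Hampshire", 9),
]


def summarize_variants(rows):
    """Total the counts per English name and report the dominant spelling."""
    grouped = defaultdict(list)
    for spelling, english, count in rows:
        grouped[english].append((spelling, count))
    report = {}
    for english, variants in grouped.items():
        total = sum(count for _, count in variants)
        dominant, top = max(variants, key=lambda v: v[1])
        report[english] = {
            "total": total,
            "dominant": dominant,
            "share": top / total,
        }
    return report


REPORT = summarize_variants(ROWS)
for name, info in REPORT.items():
    print(name, info["total"], info["dominant"], round(info["share"], 2))
```

A retrieval system that searched only for the dominant form would miss a sizeable share of mentions, which is the practical cost of unnormalized spelling variation.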

Place Names Transliterated using National Name

Word Transliteration English Name

شارلوت Sharlote Charlotte

المانيا Alemania Germany

اوروبا Europa Europe

موسكو Moscou Moscow

طرابلس Tarabulus Tripoli

الكويت Al Kuwayt Kuwait

باولوس Paulos Pauls

بروكسل Brussells Brussels

برلين Berlien Berlin

فلسطين Palestina Palestine

بريطانيا Britania Britain

بيروت Bayreuth Beirut

لاغوس Lagus Lagos

Meaning Oriented Question Answering - MOQA

An AQUAINT project by:

Computing Research Laboratory (NMSU)

Institute for Language and Information Technology (UMBC)

CoGenTex, Inc.

This work was supported in full by the Advanced Research and Development Activity (ARDA)’s Advanced Question Answering for Intelligence (AQUAINT) Program under contract number 2002*H167200*000.

Search Form

ILIT

Domain

Travels

Meetings

Languages

English

Persian

Arabic

Method

Fact Repository from Text

Ontology based:

- Text Retrieval

- Text Analysis

- Question Analysis

Results

Triple Inheritance hierarchy for “Nation”

FACT DATABASE: The “Asian-Nation” Instance: “Turkey”

Text Meaning Representation

proposition_1

head %exit_1

• agent human_54 “Mr. Smith”

• source location_23 “London”

• destination location_25 “Ankara”

• means vehicle_65 “Boeing 757”

tmr-time

• time-begin YYYYMMDD “July 2, 2000”

aspect

• iteration single; phase end… “departed”

polarity positive

mood indicative
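As a rough illustration of how such a representation supports question answering, the Python sketch below stores the proposition as a small data structure and looks up a role filler. The class names, field names, and the `answer` helper are hypothetical; only the concept identifiers and strings come from the slide, and the aspect fields are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Filler:
    concept: str  # ontology instance, e.g. "location_25"
    text: str     # surface string from the document


@dataclass
class Proposition:
    head: str
    roles: dict = field(default_factory=dict)
    time_begin: str = ""
    polarity: str = "positive"
    mood: str = "indicative"


departure = Proposition(
    head="%exit_1",
    roles={
        "agent": Filler("human_54", "Mr. Smith"),
        "source": Filler("location_23", "London"),
        "destination": Filler("location_25", "Ankara"),
        "means": Filler("vehicle_65", "Boeing 757"),
    },
    time_begin="July 2, 2000",
)


def answer(prop, role):
    """Return the surface text filling a role, or None if the role is absent."""
    filler = prop.roles.get(role)
    return filler.text if filler else None


# "Where did Mr. Smith go?" maps to the destination role.
print(answer(departure, "destination"))
```

Because locations are stored as role fillers with their own identifiers, "where from" and "where to" questions reduce to looking up different roles on the same event.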

Resources

U.S. GEOLOGICAL SURVEY
FEDERAL GEOGRAPHIC DATA COMMITTEE
NATIONAL IMAGERY AND MAPPING AGENCY
U.S. BUREAU OF LAND MANAGEMENT
U.S. FOREST SERVICE
U.S. CENSUS BUREAU
GETTY THESAURUS OF GEOGRAPHIC NAMES

Resources from (and for) the Humanities are Multilingual!

Computers and the Visual Arts: Editorial

... These developments led a branch of the Comité Internationale Pour l'Histoire de L'Art (CIHA) to conceive Thesaurus Artis Universalis (TAU) which soon proved ...

www.sumscorp.com/sums/articles/dahlberg.html

[PDF] 1 Regard sur l'informatisation des collections de musées d'art ...

... audacieuses qui nous promettaient, par exemple, un renouvellement complet et inéluctable de l'histoire de l'art grâce au Thesaurus Artis universalis ? ...

www.kikirpa.be/www2/Site_irpa/ En/Publi/Doc/PYK/Kairis.pdf

Marco Lattanzi

... stata da tempo recepita dalla comunità internazionale degli studiosi che, nell’ambito del gruppo di lavoro TAU (Thesaurus Artis Universalis), ha costituito ...

www.ibc.regione.emilia-romagna.it/soprintendenza/ arcaut/lattanzi.html

ID: 7013032 Record Type: administrative

Edmonton (inhabited place)

Coordinates:

Lat: 53 34 00 N degrees minutes

Long: 113 25 00 W degrees minutes

Note: Located on N Saskatchewan river; flourished as center for agricultural distribution & processing after arrival of Canadian Pacific Railway 1891; petroleum was discovered nearby at Leduc, Redwater & Pembina mid-20th cen.

Names:

Edmonton (preferred, C,V,N)

Strathcona (H,V,N) ............ formerly located on river's S bank; absorbed into city 1912

Ft. Edmonton (H,V,N) ............ fur-trading post for Hudson's Bay Company constructed 20 miles downstream from current site 1795; abandoned 1810

Getty Record for Edmonton
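The coordinates in the record are given as separate degree and minute fields with a hemisphere letter. A small Python sketch of the usual conversion to signed decimal degrees follows; the function name is hypothetical, and treating the third field ("00") as seconds is an assumption, which does not affect the result here because it is zero.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# Edmonton, from the record above: Lat 53 34 00 N, Long 113 25 00 W.
lat = dms_to_decimal(53, 34, 0, "N")
lon = dms_to_decimal(113, 25, 0, "W")
print(round(lat, 4), round(lon, 4))
```

Signed decimal degrees are easier to compare and index than the record's text form, which matters when matching the same place across several gazetteers.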

Hierarchical Position:

World (facet (hierarchical))

North and Central America (continent)

Canada (nation)

Alberta (province)

Edmonton (inhabited place)

Place Types:

inhabited place (preferred, C)............ expanded in 19th cen.

city (C) ............ incorporated 1904

provincial capital (C) ............ since 1905

industrial center (C)

transportation center (C)

university center (C)

Getty Record for Edmonton (Contd.)

Address Data Content Standard

Public Review Draft

Subcommittee on Cultural and Demographic Data

Federal Geographic Data Committee

April 17, 2003

Version 2

http://www.fgdc.gov/