Page 1: Part 1: overhead presentation

GEMET, the General Environmental Multilingual Thesaurus: Development, User Perspectives and Plans for a Thesaurus System

Part 1: overhead presentation
Bruno Felluga, CNR - Consiglio Nazionale delle Ricerche, Rome, Italy

Part 2: slide show
Stefan Jensen, project leader ETC/CDS, Lower Saxony Ministry of the Environment, Hannover, Germany

Open Forum on Metadata Registries, Santa Fe, NM, January 20, 2000

EUROPEAN TOPIC CENTRE ON CATALOGUE OF DATA SOURCES (ETC/CDS)
EUROPEAN ENVIRONMENT AGENCY

Page 2: Part 1: overhead presentation

Outline GEMET presentation - part 2

“linking terminology and applications”

GEMET activities performed by the ETC/CDS

- co-ordination of the thesaurus development
- GEMET usage for indexing and retrieving environmental metadata
- development of applications around GEMET
- assessing 3rd party user needs to incorporate into future developments

Page 3: Part 1: overhead presentation

Co-ordination of the development

- Encouraging and co-ordinating the translation of the core terminology into 12 languages

- Contracting application development around GEMET

- Implementing shared coding lists (value domains)

- Promoting the use of GEMET through marketing activities

- Distributing GEMET and supplying technical helpdesk

Page 4: Part 1: overhead presentation

GEMET - usage for indexing metainformation

GEMET has been used in the work of EEA to index metadata from the following resources:

- The Directory of Information Resources (DIR)

- The Reporting Obligation Database (ROD)

To do this, two applications were developed:

- MS-Access based tool for metadata registry (WinCDS)

- Web-based JAVA tool for online registration (prototype)
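
The indexing step that both tools support can be pictured as attaching controlled descriptors to a metadata record and rejecting anything outside the vocabulary. The following is a minimal sketch under stated assumptions: the four terms stand in for the full GEMET vocabulary, and `MetadataRecord`, `index_record` and the field names are hypothetical, not the data model of WinCDS or the JAVA prototype.

```python
from dataclasses import dataclass, field

# Hypothetical excerpt of the controlled vocabulary (GEMET has ~5300 terms).
CONTROLLED_TERMS = {"agriculture", "air quality", "biodiversity", "waste management"}


@dataclass
class MetadataRecord:
    """Illustrative metadata record; the field names are assumptions."""
    title: str
    descriptors: list[str] = field(default_factory=list)


def index_record(record: MetadataRecord, terms: list[str]) -> list[str]:
    """Attach controlled terms to a record and return the rejected ones."""
    rejected = []
    for term in terms:
        normalised = term.strip().lower()
        if normalised not in CONTROLLED_TERMS:
            rejected.append(term)            # not in the vocabulary
        elif normalised not in record.descriptors:
            record.descriptors.append(normalised)  # skip duplicates
    return rejected


record = MetadataRecord(title="Example air monitoring dataset")
rejected = index_record(record, ["Air quality", "smog", "biodiversity", "air quality"])
print(record.descriptors)  # ['air quality', 'biodiversity']
print(rejected)            # ['smog']
```

Restricting descriptors to the controlled list is what makes the later term rankings meaningful, since every count refers to the same term regardless of who indexed the record.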

Page 5: Part 1: overhead presentation

Thesaurus part of WinCDS

Page 6: Part 1: overhead presentation

JAVA based online registration - the indexing

Page 7: Part 1: overhead presentation

GEMET - usage for indexing metainformation

Directory of Information Resources

Total dataset sum: 931

Controlled terms in use: 655 of ~5300

Total descriptors sum: 4714 (GEMET terms used for indexing)

Term ranking:

121 of 655 terms have been used more than 10 times
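
The DIR figures above imply a few ratios that are easy to check. The sketch below derives them from the quoted numbers and shows one possible way to produce a term ranking from indexed records. The `rank_terms` helper and the sample records are illustrative assumptions, not the procedure or data used by ETC/CDS.

```python
from collections import Counter

# Figures quoted on the slide.
datasets = 931
terms_in_use = 655
vocabulary_size = 5300        # the slide gives "~5300"
descriptors_assigned = 4714
terms_used_over_10 = 121

avg_descriptors = descriptors_assigned / datasets
vocabulary_coverage = terms_in_use / vocabulary_size
frequent_share = terms_used_over_10 / terms_in_use

print(f"{avg_descriptors:.2f} descriptors per dataset")       # 5.06
print(f"{vocabulary_coverage:.1%} of the vocabulary in use")  # 12.4%
print(f"{frequent_share:.1%} of used terms occur >10 times")  # 18.5%


def rank_terms(records: list[list[str]]) -> list[tuple[str, int]]:
    """Count how often each descriptor occurs across records, most frequent first."""
    counts = Counter(term for descriptors in records for term in descriptors)
    return counts.most_common()


# Made-up records, only to show the shape of a ranking.
sample = [["agriculture", "water resource"], ["agriculture"], ["air quality", "agriculture"]]
print(rank_terms(sample))
```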

Page 8: Part 1: overhead presentation

Terms used more than 60 times

[Chart: descriptor count per term; count axis from 0 to 350.]

Counts, as listed: 327, 199, 97, 84, 83, 78, 76, 70, 69, 62, 61, 61, 60

Descriptors, as listed: agriculture; water resource; air quality; environmental information network; air pollution; biodiversity; climatic change; waste management; environmental policy; acidification; environmental protection; environmental report; legislation

Page 9: Part 1: overhead presentation

DIR: term ranking

Term count between 40 and 55

[Chart: term count (sum) per descriptor; count axis from 0 to 60.]

Counts, as listed: 55, 54, 52, 46, 44, 41, 41, 40

Descriptors, as listed: state of the environment; noise; industry; waste; ozone layer depletion; tropospheric ozone; energy; transportation

Page 10: Part 1: overhead presentation

DIR: term ranking

Term count between 30 and 39

[Chart: term count (sum) per descriptor; count axis from 0 to 45.]

Counts, as listed: 39, 38, 37, 37, 36, 35, 34, 33, 33, 32, 32, 30

Descriptors, as listed: land cover; pollution control; coastal area; report on the state of the environment; coastal environment; international convention; forestry; nature; urban ecology; atmospheric pollution; European Environment Agency; marine environment

Page 11: Part 1: overhead presentation

GEMET - usage for indexing metainformation

Reporting Obligations Database (ROD) prototype

Questions

- Datasets total: 22844
- Controlled terms in use: 323 of 5300 have been used 86628 times
- Top ranking: term ‘atmospheric emissions’ has been used 14443 times

Sources

- Datasets total: 42
- Controlled terms in use: 65 of 5300 have been used 256 times
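
The two ROD collections differ a lot in size, so a per-dataset view makes them easier to compare. The short sketch below only rearranges the figures quoted on this slide; the dictionary layout and variable names are assumptions made for illustration.

```python
# Figures quoted on the slide for the two ROD collections.
rod = {
    "questions": {"datasets": 22844, "terms_in_use": 323, "assignments": 86628},
    "sources": {"datasets": 42, "terms_in_use": 65, "assignments": 256},
}
vocabulary_size = 5300
top_term_uses = 14443  # 'atmospheric emissions' in the questions collection

summary = {}
for name, c in rod.items():
    summary[name] = {
        "per_dataset": c["assignments"] / c["datasets"],   # descriptors per dataset
        "coverage": c["terms_in_use"] / vocabulary_size,   # share of vocabulary used
    }
    print(f"{name}: {summary[name]['per_dataset']:.2f} descriptors per dataset, "
          f"{summary[name]['coverage']:.1%} of the vocabulary in use")

# Share of all descriptor assignments taken by the single most used term.
top_share = top_term_uses / rod["questions"]["assignments"]
print(f"top term share: {top_share:.1%}")
```

On these numbers a single term accounts for roughly one in six descriptor assignments in the questions collection, which suggests that usage is concentrated on a small part of the vocabulary.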

Page 12: Part 1: overhead presentation

ROD questions: term ranking

Term count between 4212 and 1105

[Chart: count (sum) of questions per descriptor; count axis from 0 to 6000.]

Counts, as listed: 4212, 2932, 2701, 2444, 2359, 1836, 1830, 1791, 1719, 1680, 1457, 1420, 1283, 1105

Descriptors, as listed: surface water; manufacturing activity; chemical industry; solvent; forest; valeur limite / limit value; environmentally dangerous substance; environmental quality objective; industry; agriculture and cattle industry; environmental licence; processus de combustion / combustion process; emissions to water; industrial processes

Page 13: Part 1: overhead presentation

ROD questions: term ranking

Term count between 600 and 899

[Chart: count (sum) of datasets per descriptor; count axis from 0 to 1000.]

Counts, as listed: 899, 868, 868, 729, 682, 618, 604, 600

Descriptors, as listed: hexachlorobenzène / hexachlorobenzene (HCB); manure; cadmium; incineration of waste; mercury; industrial production waste; furnace; organic chemistry

Page 14: Part 1: overhead presentation

ROD questions: term ranking

Term count between 585 and 540

[Chart: count (sum) of datasets per descriptor; count axis from 510 to 590.]

Counts, as listed: 585, 568, 566, 564, 560, 560, 552, 550, 540, 540

Descriptors, as listed: trichloroéthylène / trichloroethylene (TRI); trichlorobenzène / trichlorobenzene (TCB); pentachlorophenol; paper industry; environmental data; perchloroethylene; dust; hexachlorohexane (HCH); road transport; extractive industry

Page 15: Part 1: overhead presentation

ROD sources: term ranking

ROD Sources: Term count between 10 and 33

[Chart: count (sum) per descriptor; count axis from 0 to 35.]

Counts, as listed: 33, 24, 21, 17, 15, 13, 12, 10

Descriptors, as listed: decision; air; drivers (DPSIR); EC directive; wastes; état / state (DPSIR); pressions / pressures (DPSIR); réactions politiques / policy responses (DPSIR)

Page 16: Part 1: overhead presentation

ROD sources: term ranking

ROD Sources: Term count between 3 and 8

[Chart: count (sum) of datasets per descriptor; count axis from 0 to 9.]

Counts, as listed: 8, 8, 6, 6, 6, 4, 4, 3, 3, 3, 3, 3, 3

Descriptors, as listed: consumptive water; emissions to water; incineration of waste; municipal waste; waste management; air quality; hazardous waste; environmental quality objective; report on the state of the environment; data on the state of the environment; water quality; atmospheric emissions; freshwater

Page 17: Part 1: overhead presentation

GEMET - usage to browse and retrieve metadata

GEMET is used to browse or retrieve metadata within 5 applications:

• The ThesShow GEMET browser

• The WebCDS accessing the DIR via HTML

• The WebCDS accessing the DIR via JAVA applets

• The multilingual search service (MSS)

• The Reporting Obligation Database (ROD)
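
Browsing and retrieval with a thesaurus usually means following the term hierarchy: a search for a broad term also returns records indexed with its narrower terms. The sketch below illustrates that idea only. The narrower-term relations, the records, and the `expand` and `retrieve` helpers are invented for illustration and are not taken from GEMET or from any of the five applications.

```python
# Hypothetical narrower-term relations; not the actual GEMET hierarchy.
NARROWER = {
    "waste": ["hazardous waste", "municipal waste"],
    "hazardous waste": [],
    "municipal waste": [],
}

# Illustrative records: title -> descriptors assigned during indexing.
RECORDS = {
    "Landfill inventory": ["municipal waste"],
    "Chemical storage survey": ["hazardous waste"],
    "Noise map": ["noise"],
}


def expand(term: str) -> set[str]:
    """Return the term plus all of its narrower terms (assumes no cycles)."""
    found = {term}
    for child in NARROWER.get(term, []):
        found |= expand(child)
    return found


def retrieve(term: str) -> list[str]:
    """Titles of records indexed with the term or any of its narrower terms."""
    wanted = expand(term)
    return sorted(title for title, descs in RECORDS.items() if wanted & set(descs))


print(retrieve("waste"))  # ['Chemical storage survey', 'Landfill inventory']
print(retrieve("noise"))  # ['Noise map']
```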

Page 18: Part 1: overhead presentation

JAVA based thesaurus browser for WebCDS

Page 19: Part 1: overhead presentation

The Multilingual Search Service (MSS)

• Motivation
  – distributed multilingual document collections of European institutions (European Environment Agency, EEA)
  – support query formulation in the user's native language
  – search and retrieve documents in all languages the user understands

• Approach of EEA's Multilingual Search Service (MSS)
  – thesaurus support for query formulation (domain-specific thesaurus required, e.g. GEMET)
  – translation by making use of the multilinguality of the thesaurus (GEMET is available in 12 languages)
  – use translations as input for an off-the-shelf Web search engine (e.g., Netscape Compass Server)
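
The approach can be sketched in a few lines: find the concept behind the user's term, collect its labels in the target languages, and hand the result to a search engine as one query. This is a hedged illustration, not the MSS implementation. The concept table, the concept ids, the function names and the generic `OR` query syntax are all assumptions, and the syntax of the actual search engine may differ.

```python
from typing import Optional

# Hypothetical concept table: concept id -> label per language code.
CONCEPTS = {
    "c1": {"en": "water", "de": "Wasser", "fr": "eau", "it": "acqua"},
    "c2": {"en": "noise", "de": "Lärm", "fr": "bruit", "it": "rumore"},
}


def find_concept(term: str, lang: str) -> Optional[str]:
    """Find the concept whose label in `lang` matches `term` (case-insensitive)."""
    for concept_id, labels in CONCEPTS.items():
        if labels.get(lang, "").lower() == term.lower():
            return concept_id
    return None


def build_query(term: str, user_lang: str, target_langs: list[str]) -> str:
    """Translate a term via the thesaurus and join the labels into an OR query."""
    concept_id = find_concept(term, user_lang)
    if concept_id is None:
        return f'"{term}"'  # unknown term: search for it unchanged
    labels = CONCEPTS[concept_id]
    translated = [labels[lang] for lang in target_langs if lang in labels]
    return " OR ".join(f'"{label}"' for label in translated)


query = build_query("Wasser", "de", ["en", "fr", "de"])
print(query)  # "water" OR "eau" OR "Wasser"
```

Translating through a concept rather than word by word is what lets a domain thesaurus stand in for a general dictionary here: each label is the agreed term for the same concept in that language.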

Page 20: Part 1: overhead presentation

Using a term for searching metadata

Page 21: Part 1: overhead presentation

Search results from websites within the EERC

Page 22: Part 1: overhead presentation

Questionnaire about GEMET usage

Goals:

- learn more about current users
- get guidance from usage for future development

Process:

- the current ~200 GEMET users from all over the world were contacted by e-mail
- the 2-page questionnaire (plus annexes) was made available digitally and as a form on the internet
- the survey was performed in November and December 1999

Page 23: Part 1: overhead presentation

Areas and frequency of GEMET usage

[Chart: Thesaurus usage frequency.]

Shares, as listed: 17%, 41%, 42%, 0%

Categories, as listed: daily; frequently; occasionally; not at all

Usage-area labels, as listed:

- 42% translation, 33% indexing, 22% retrieval
- 56% translation, 22% indexing/retrieval
- 100% translation

Page 24: Part 1: overhead presentation

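Current usage of languages in GEMET

[Chart: percentage (%) per language, two series; axis from 0 to 90.]

Series, as listed: languages needed for indexing; user language

Values, as listed: 0, 14, 29, 0, 0, 14, 7, 0, 7, 0, 0, 14, 0, 14, 7, 86, 7, 57, 36, 7, 7, 21, 7, 21, 36, 7

Languages, as listed: Danish; Dutch; English; Finnish; French; German; Greek; Icelandic; Italian; Norwegian; Portuguese; Spanish; Swedish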

Page 25: Part 1: overhead presentation

Evaluation of the thesaurus content

[Chart: percentage (%) of yes and no answers per question; axis from 0 to 100.]

Values, as listed: 43, 71, 71, 71, 93, 86, 86, 57, 29, 7, 29, 7, 14, 14

Questions, as listed:

- Do you agree with the term count >5000
- in need of more specific terms
- is it too much / overwhelming
- some terms should be deleted
- do you need additional groups*
- do you need additional synonyms
- do you need a polyhierarchical appearance of terms

Series, as listed: yes; no

Page 26: Part 1: overhead presentation

Usage of the GEMET browser ThesShow

[Chart: percentage (%) of no and yes answers per question; axis from 0 to 100.]

Values, as listed: 86, 57, 29, 57, 43, 50, 0, 0, 29, 57, 36, 50, 43, 93

Questions, as listed:

- is the product clear/distinct
- do you need the connection with own applications
- do you want to add own thesaurus
- a table of own terms
- a glossary
- a user interface in own language
- do you miss functionalities

Series, as listed: no; yes

Page 27: Part 1: overhead presentation

GEMET - conclusions from our own indexing experience and the questionnaire

General guidelines:

- The GEMET content should remain stable; minor improvements are justified

- There is a need to add new functionalities to the tools to allow users to customise their own “thesaurus system”
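
One way to read these two guidelines together is a stable core vocabulary with a user-maintained layer on top of it. The sketch below illustrates that reading and nothing more: `CustomThesaurus`, its methods and the example terms are hypothetical and do not describe a planned or existing GEMET tool.

```python
from typing import Optional

# Stand-in for the stable core vocabulary (two example terms, not the full GEMET).
CORE_TERMS = frozenset({"agriculture", "noise"})


class CustomThesaurus:
    """Read-only core vocabulary plus a user-maintained table of own terms."""

    def __init__(self, core_terms: frozenset) -> None:
        self._core = core_terms
        self._own: dict[str, str] = {}  # own term -> glossary note

    def add_own_term(self, term: str, note: str = "") -> None:
        """Add a user term; core terms cannot be redefined."""
        if term in self._core:
            raise ValueError(f"{term!r} already exists in the core vocabulary")
        self._own[term] = note

    def origin(self, term: str) -> Optional[str]:
        """Tell whether a term comes from the core, the user's table, or neither."""
        if term in self._core:
            return "core"
        if term in self._own:
            return "own"
        return None

    def glossary(self) -> dict[str, str]:
        """Return a copy of the user's own terms with their notes."""
        return dict(self._own)


thesaurus = CustomThesaurus(CORE_TERMS)
thesaurus.add_own_term("river restoration", "term used in a local project")
print(thesaurus.origin("noise"))              # core
print(thesaurus.origin("river restoration"))  # own
print(thesaurus.origin("smog"))               # None
```

Keeping own terms in a separate table means the core can be updated or redistributed without touching local additions.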

Page 28: Part 1: overhead presentation

Contact information

URL: http://www.mu.niedersachsen.de or http://etc-cds.eionet.eu.int

eMail: etc/[email protected]