GEMET, the General Environmental Multilingual Thesaurus: Development, User Perspectives and Plans for a Thesaurus System
Part 1: overhead presentation
Bruno Felluga, CNR - Consiglio Nazionale delle Ricerche, Rome, Italy
Part 2: slide show
Stefan Jensen, project leader ETC/CDS, Lower Saxony Ministry of the Environment, Hannover, Germany
Open Forum on Metadata Registries, Santa Fe, NM, January 20, 2000
EUROPEAN TOPIC CENTRE ON CATALOGUE OF DATA SOURCES (ETC/CDS)
EUROPEAN ENVIRONMENT AGENCY
Outline GEMET presentation - part 2
“linking terminology and applications”
GEMET activities performed by the ETC/CDS
- co-ordination of the thesaurus development
- GEMET usage for indexing and retrieving environmental metadata
- development of applications around GEMET
- assessing 3rd party user needs to incorporate into future developments
Co-ordination of the development
- Encouraging and co-ordinating the translation of the core terminology into 12 languages
- Contracting application development around GEMET
- Implementing shared coding lists (value domains)
- Promoting the use of GEMET through marketing activities
- Distributing GEMET and supplying technical helpdesk
GEMET - usage for indexing metainformation
GEMET has been used in the work of EEA to index metadata from the following resources:
- The Directory of Information Resources (DIR)
- The Reporting Obligation Database (ROD)
To do this, two applications were developed:
- MS-Access based tool for metadata registry (WinCDS)
- Web-based JAVA tool for online registration (prototype)
Thesaurus part of WinCDS
JAVA based online registration - the indexing
GEMET - usage for indexing metainformation
Directory of Information Resources
Total dataset sum: 931
Controlled terms in use: 655 of ~5300
Total descriptors sum: 4714 (GEMET terms used for indexing)
Term ranking:
121 of 655 terms have been used more than 10 times
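A term ranking of this kind can be derived from any set of indexed records by counting how often each descriptor is assigned. The Python sketch below is illustrative only: the record identifiers, the sample descriptors and the function name `rank_terms` are assumptions and are not taken from WinCDS or the DIR.

```python
from collections import Counter

# Hypothetical indexed records: record id -> GEMET descriptors assigned to it.
records = {
    "dir-001": ["agriculture", "water resource", "legislation"],
    "dir-002": ["agriculture", "air quality"],
    "dir-003": ["water resource", "agriculture", "biodiversity"],
}


def rank_terms(indexed, min_count=1):
    """Return (term, count) pairs used at least min_count times, most used first."""
    counts = Counter(term for terms in indexed.values() for term in terms)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, n) for term, n in ranked if n >= min_count]


ranking = rank_terms(records, min_count=2)
total_descriptors = sum(len(terms) for terms in records.values())
```

Raising `min_count` gives the kind of cut-off used on the slide ("used more than 10 times"), and `total_descriptors` corresponds to the total descriptor sum.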
Terms used more than 60 times
[Bar chart. Category axis: descriptors; value axis: count, 0 to 350. Counts and terms are listed in the order extracted from the chart.]
Counts: 327, 199, 97, 84, 83, 78, 76, 70, 69, 62, 61, 61, 60
Terms: agriculture; water resource; air quality; environmental information network; air pollution; biodiversity; climatic change; waste management; environmental policy; acidification; environmental protection; environmental report; legislation
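DIR: term ranking
Term count between 40 and 55
[Bar chart. Category axis: descriptors; value axis: term count (sum), 0 to 60. Counts and terms are listed in the order extracted from the chart.]
Counts: 55, 54, 52, 46, 44, 41, 41, 40
Terms: state of the environment; noise; industry; waste; ozone layer depletion; tropospheric ozone; energy; transportation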
DIR: term ranking
Term count between 30 and 39
[Bar chart. Category axis: descriptors; value axis: term count (sum), 0 to 45. Counts and terms are listed in the order extracted from the chart.]
Counts: 39, 38, 37, 37, 36, 35, 34, 33, 33, 32, 32, 30
Terms: land cover; pollution control; coastal area; report on the state of the environment; coastal environment; international convention; forestry; nature; urban ecology; atmospheric pollution; European Environment Agency; marine environment
Reporting Obligations Database (ROD) prototype
Questions
Datasets total: 22844
Controlled terms in use: 323 of 5300 have been used 86628 times
Top ranking: the term ‘atmospheric emissions’ has been used 14443 times
Sources
Datasets total: 42
Controlled terms in use: 65 of 5300 have been used 256 times
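The figures above can be turned into two simple indicators: the share of the thesaurus actually in use, and the average number of term assignments per dataset. The Python sketch below uses only the numbers stated on the slide; it assumes that "used N times" counts individual term assignments, and the names `collections`, `summarise` and `top_term_share_pct` are illustrative.

```python
# Figures as stated on the slide.
THESAURUS_SIZE = 5300

collections = {
    "ROD questions": {"datasets": 22844, "terms_in_use": 323, "assignments": 86628},
    "ROD sources": {"datasets": 42, "terms_in_use": 65, "assignments": 256},
}


def summarise(stats, thesaurus_size=THESAURUS_SIZE):
    """Derive thesaurus coverage and average indexing depth from the raw counts."""
    return {
        "coverage_pct": 100 * stats["terms_in_use"] / thesaurus_size,
        "assignments_per_dataset": stats["assignments"] / stats["datasets"],
    }


summary = {name: summarise(stats) for name, stats in collections.items()}

# Share of all question assignments taken by the top term 'atmospheric emissions'.
top_term_share_pct = 100 * 14443 / 86628

for name, values in summary.items():
    print(f"{name}: {values['coverage_pct']:.1f}% of GEMET terms in use, "
          f"{values['assignments_per_dataset']:.2f} term assignments per dataset")
print(f"'atmospheric emissions': {top_term_share_pct:.1f}% of question assignments")
```

Under that assumption, the questions use roughly 6% of the thesaurus with just under four assignments per dataset, and the single top term accounts for about one sixth of all question assignments.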
GEMET - usage for indexing metainformation
ROD questions: term ranking
Term count between 4212 and 1105
[Bar chart. Category axis: descriptors; value axis: count (sum) questions, 0 to 6000. Counts and terms are listed in the order extracted from the chart.]
Counts: 4212, 2932, 2701, 2444, 2359, 1836, 1830, 1791, 1719, 1680, 1457, 1420, 1283, 1105
Terms: surface water; manufacturing activity; chemical industry; solvent; forest; limit value (valeur limite); environmentally dangerous substance; environmental quality objective; industry; agriculture and cattle industry; environmental licence; combustion process (processus de combustion); emissions to water; industrial processes
ROD questions: term ranking
Term count between 600 and 899
[Bar chart. Category axis: descriptor; value axis: count (sum) datasets, 0 to 1000. Counts and terms are listed in the order extracted from the chart.]
Counts: 899, 868, 868, 729, 682, 618, 604, 600
Terms: hexachlorobenzene (HCB); manure; cadmium; incineration of waste; mercury; industrial production waste; furnace; organic chemistry
ROD questions: term ranking
Term count between 585 and 540
[Bar chart. Category axis: descriptor; value axis: count (sum) datasets, 510 to 590. Counts and terms are listed in the order extracted from the chart.]
Counts: 585, 568, 566, 564, 560, 560, 552, 550, 540, 540
Terms: trichloroethylene (TRI); trichlorobenzene (TCB); pentachlorophenol; paper industry; environmental data; perchloroethylene; dust; hexachlorohexane (HCH); road transport; extractive industry
ROD sources: term ranking
Term count between 10 and 33
[Bar chart. Category axis: descriptors; value axis: count (sum), 0 to 35. Counts and terms are listed in the order extracted from the chart.]
Counts: 33, 24, 21, 17, 15, 13, 12, 10
Terms: decision; air; drivers (DPSIR); EC directive; wastes; state (DPSIR); pressures (DPSIR); policy responses (DPSIR)
ROD sources: term ranking
Term count between 3 and 8
[Bar chart. Category axis: descriptors; value axis: count (sum) datasets, 0 to 9. Counts and terms are listed in the order extracted from the chart.]
Counts: 8, 8, 6, 6, 6, 4, 4, 3, 3, 3, 3, 3, 3
Terms: consumptive water; emissions to water; incineration of waste; municipal waste; waste management; air quality; hazardous waste; environmental quality objective; report on the state of the environment; data on the state of the environment; water quality; atmospheric emissions; freshwater
GEMET - usage to browse and retrieve metadata
GEMET is used to browse or retrieve metadata within 5 applications:
• The ThesShow GEMET browser
• The WebCDS accessing the DIR via HTML
• The WebCDS accessing the DIR via JAVA applets
• The multilingual search service (MSS)
• The Reporting Obligation Database (ROD)
JAVA based thesaurus browser for WebCDS
The Multilingual Search Service (MSS)
• Motivation
– distributed multilingual document collections of European institutions (European Environment Agency, EEA)
– support query formulation in the user's native language
– search and retrieve documents in all languages the user understands
• Approach of EEA's Multilingual Search Service (MSS)
– thesaurus support for query formulation (domain-specific thesaurus required, e.g. GEMET)
– translation by making use of the multilinguality of the thesaurus (GEMET is available in 12 languages)
– use translations as input for an off-the-shelf Web search engine (e.g., Netscape Compass Server)
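The approach above can be sketched in a few lines of Python: look up the user's term in a multilingual thesaurus, collect its translations, and join them into one query string for a search engine. This is a minimal sketch under stated assumptions, not the MSS implementation: the concept ids, the sample translations, the function names and the quoted-phrase `OR` syntax are all hypothetical and are not taken from GEMET data files or from Netscape Compass Server.

```python
# Hypothetical thesaurus excerpt: concept id -> language code -> preferred term.
# Ids and translations are illustrative, not copied from GEMET.
THESAURUS = {
    "c1": {"en": "air pollution", "de": "Luftverschmutzung", "fr": "pollution de l'air"},
    "c2": {"en": "waste management", "de": "Abfallwirtschaft", "fr": "gestion des déchets"},
}


def find_concept(term, language):
    """Return the id of the concept whose term in `language` matches `term`."""
    wanted = term.strip().lower()
    for concept_id, labels in THESAURUS.items():
        if labels.get(language, "").lower() == wanted:
            return concept_id
    return None


def build_query(term, source_language, target_languages):
    """Translate a thesaurus term and join the translations into one OR query."""
    concept_id = find_concept(term, source_language)
    if concept_id is None:
        # Not a controlled term: pass the user's text through unchanged.
        return f'"{term}"'
    labels = THESAURUS[concept_id]
    translations = [labels[lang] for lang in target_languages if lang in labels]
    return " OR ".join(f'"{t}"' for t in translations)


query = build_query("Luftverschmutzung", "de", ["de", "en", "fr"])
print(query)
```

Keying the thesaurus by concept rather than by term is what makes the translation step a simple lookup: every language version of a descriptor shares one id, so the user's language and the document languages can be chosen independently.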
Using a term for searching metadata
Search results from websites within the EERC
Questionnaire about GEMET usage
Goals:
- learn more about current users
- get guidance from usage for future development
Process:
- the current ~200 GEMET users from all over the world were contacted by e-mail
- the 2-page questionnaire (plus annexes) was made available digitally and as a form on the internet
- the survey was performed in November and December 1999
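Areas and frequency of GEMET usage
[Chart: thesaurus usage frequency. Values and labels are listed in the order extracted from the chart.]
Usage frequency: 17%, 41%, 42%, 0%
Categories: daily; frequently; occasionally; not at all
Areas of usage (callouts): 42% translation, 33% indexing, 22% retrieval; 56% translation, 22% indexing/retrieval; 100% translation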
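Current usage of languages in GEMET
[Bar chart. Value axis: %, 0 to 90. Two series, labelled "languages in need for indexing" and "user language". Values and languages are listed in the order extracted from the chart.]
First series of values: 0, 14, 29, 0, 0, 14, 7, 0, 7, 0, 0, 14, 0
Second series of values: 14, 7, 86, 7, 57, 36, 7, 7, 21, 7, 21, 36, 7
Languages: Danish; Dutch; English; Finnish; French; German; Greek; Icelandic; Italian; Norwegian; Portuguese; Spanish; Swedish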
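Evaluation of the thesaurus content
[Bar chart. Value axis: %, 0 to 100. Two series, labelled "yes" and "no". Values and questions are listed in the order extracted from the chart.]
First series of values: 43, 71, 71, 71, 93, 86, 86
Second series of values: 57, 29, 7, 29, 7, 14, 14
Questions:
- Do you agree with the term count >5000
- in need of more specific terms
- is it too much / overwhelming
- some terms should be deleted
- do you need additional groups*
- do you need additional synonyms
- do you need a polyhierarchical appearance of terms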
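Usage of the GEMET browser ThesShow
[Bar chart. Value axis: %, 0 to 100. Two series, labelled "no" and "yes". Values and questions are listed in the order extracted from the chart.]
First series of values: 86, 57, 29, 57, 43, 50, 0
Second series of values: 0, 29, 57, 36, 50, 43, 93
Questions:
- is the product clear/distinct
- do you need the connection with own applications
- do you want to add own thesaurus
- a table of own terms
- a glossary
- a user interface in own language
- do you miss functionalities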
GEMET - conclusions from our own indexing experience and the questionnaire
General guidelines:
- The GEMET content should remain stable; minor improvements are justified
- New functionalities need to be added to the tools so that users can customise their own “thesaurus system”
Contact information
EUROPEAN TOPIC CENTRE ON CATALOGUE OF DATA SOURCES (ETC/CDS)
EUROPEAN ENVIRONMENT AGENCY
URL: http://www.mu.niedersachsen.de or http://etc-cds.eionet.eu.int
eMail: etc/[email protected]