Multi-language CASCOT
Margaret Birch and Ritva Ellison
Institute for Employment Research
Computer Assisted Structured Coding Tool
CASCOT
• Software tool for coding text automatically or manually
• Developed at the Institute for Employment Research, University of Warwick, since 1993
• Used by over 100 organisations in the UK and abroad
IER was contracted under the DASISH project to develop a multilingual version of CASCOT to code job titles to ISCO-08
A large task and limited resources, so this is a pilot project
The 8 selected languages:
- Dutch (Netherlands, Flemish-Belgium)
- English
- Finnish
- French (France, Walloon-Belgium, Switzerland)
- German (Germany, Austria, Switzerland)
- Italian
- Slovak
- Spanish
Key Tasks
• Translating Cascot user interface texts
• Constructing national language versions of the ISCO-08 structure for Cascot
• Indexing job titles in the selected languages to ISCO-08
- Some supplied by NSIs or other partners
- Some found by exploring relevant national websites
• Validating the software using raw data files from the European Social Survey (ESS) Round 6
• Testing Cascot multilingual software
• Developing language-based coding rules
• Using Cascot Performance Tool to fine-tune the software
Coding with Cascot
Enter text (could be from a file)
Cascot recommends a code, but the user can change it
Output can be directed to a file
Selected classification
Multi-language Cascot
• 8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish
Cascot detects the language automatically, but it can be changed from the menu
An ISCO-08 classification exists for each country (some with national codes)
Coding in Dutch
Finnish
French
German*
* The index is © Federal Employment Agency
Italian
Slovak
Spanish
A test of multi-language Cascot
• Comparison of European Social Survey round 6 code and automatic Cascot code
• Data available from DE, ES, GB and NL
ISCO-08
Cascot Performance Tool
• Allows the user to analyse the performance of Cascot by comparing manually coded data with the code produced by Cascot for the same data.
• A delimited results file is needed that contains a reference code, a Cascot code and a Cascot score.
• The Tool shows a Performance Results Display window with a Performance Graph, Summary, Statistics and Key
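As a rough illustration of the kind of comparison described here, the sketch below reads a delimited results file and reports, for several minimum-score thresholds, what share of records would be coded and how often the Cascot code agrees with the reference code. The column order (reference code, Cascot code, Cascot score), the comma delimiter, the sample rows and the function name `performance_by_threshold` are all assumptions made for illustration; this is not how the Performance Tool itself is implemented.

```python
import csv
import io


def performance_by_threshold(rows, thresholds=(0, 40, 60, 80)):
    """For each minimum score, return (share_coded, share_agreeing).

    rows: iterable of (reference_code, cascot_code, cascot_score) tuples.
    share_coded: fraction of all records whose score is >= the threshold.
    share_agreeing: fraction of those records where both codes match.
    """
    records = [(ref.strip(), code.strip(), int(score)) for ref, code, score in rows]
    results = {}
    for t in thresholds:
        kept = [(ref, code) for ref, code, score in records if score >= t]
        agreeing = sum(1 for ref, code in kept if ref == code)
        share_coded = len(kept) / len(records) if records else 0.0
        share_agreeing = agreeing / len(kept) if kept else 0.0
        results[t] = (share_coded, share_agreeing)
    return results


# Hypothetical results file: reference code, Cascot code, Cascot score.
SAMPLE = """2341,2341,85
2330,2341,62
5142,5142,71
9333,4323,27
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
summary = performance_by_threshold(rows)
for threshold, (coded, agree) in summary.items():
    print(f"score >= {threshold}: coded {coded:.0%}, agreement {agree:.0%}")
```

Raising the threshold trades coverage for agreement: fewer records are coded automatically, but those that are coded match the reference more often.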
Opening a results file
Performance Results Display
The longer the green line stays high, the better
The further to the right the purple/blue lines are, the better
Fine-tuning multi-language Cascot
• The versions in different languages could be improved by developing coding rules
• Contribution needed from experts who know the language
• Rules are developed with Cascot Editor
Cascot Editor
• Classification files for Cascot are created and modified with the Editor
• Each classification has Structure, Index, Rules for coding
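To make the three parts concrete, a classification can be pictured as a small data structure. The class and field names below are hypothetical and say nothing about Cascot's actual file format; the entries are taken from the German examples elsewhere in this presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Illustrative container for the three parts of a classification."""
    structure: dict = field(default_factory=dict)  # code -> category label
    index: dict = field(default_factory=dict)      # job title -> code
    rules: dict = field(default_factory=dict)      # rule type -> entries

    def label_for_title(self, title):
        """Look up an exact index entry and return (code, label), or None."""
        code = self.index.get(title)
        if code is None:
            return None
        return code, self.structure.get(code, "")


isco_de = Classification(
    structure={"2330": "Lehrkräfte im Sekundarbereich",
               "2341": "Lehrkräfte im Primarbereich"},
    index={"Klassenlehrer/in": "2330"},
    rules={"downgraded_words": []},
)
print(isco_de.label_for_title("Klassenlehrer/in"))
```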
Cascot Editor Rules
• Downgraded words: words that are considered to be significantly less important than other words, e.g. deputy, junior, person
• Equivalent word ends: wait|er, wait|ress
• Abbreviations: asst → assistant, fe → further education
• Replacement words: taylor → tailor, tesco → supermarket
– Omitting noise words, e.g. replace ‘part-time’ with nothing
• Input modifications: used when the rule absolutely cannot be made elsewhere
• Word alternatives: words and phrases that should also be tried as possible solution candidates
• Conclusions: e.g. ‘retired’ cannot conclude, ‘agent’ is ambiguous (score 39)
• Default coding: a set of words and phrases that should be scored as though they were a different word or phrase
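A minimal sketch of how rules of this kind might be applied to a job title before it is matched against the index. The rule tables reuse the examples from the slide; the function `normalise`, the order in which the rules are applied, and the half weight given to downgraded words are assumptions for illustration, not a description of how Cascot processes rules internally.

```python
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket", "part-time": ""}
DOWNGRADED = {"deputy", "junior", "person"}
EQUIVALENT_ENDS = {"wait": ("er", "ress")}  # wait|er and wait|ress share a stem


def normalise(text):
    """Return a list of (word, weight) pairs after applying the rules."""
    words = []
    for raw in text.lower().split():
        word = ABBREVIATIONS.get(raw, raw)
        word = REPLACEMENTS.get(word, word)
        # An abbreviation may expand to several words; a noise word to none.
        for part in word.split():
            for stem, ends in EQUIVALENT_ENDS.items():
                if any(part == stem + end for end in ends):
                    part = stem + ends[0]  # map all listed endings to the first
            # Assumed weighting: downgraded words count half as much.
            weight = 0.5 if part in DOWNGRADED else 1.0
            words.append((part, weight))
    return words


print(normalise("Junior asst taylor part-time"))
print(normalise("Waitress"))
```

In this sketch "part-time" disappears entirely, "junior" is kept but with reduced weight, and "waitress" is treated the same as "waiter".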
Example of a new rule - English
• The problem:
• Add two new Replacement Words rules:
• The result:
Potential for rules - German
Text to be coded | ISCO-08 (ESCO) | Cascot score | ISCO-08 (Cascot) | Best matching index entry (Cascot)
Klassenlehrer/in (Klasse 1-3) | 2341 Lehrkräfte im Primarbereich | 73 | 2330 Lehrkräfte im Sekundarbereich | Klassenlehrer/in
Diplomingenieur/in (Fahrzeugbau) | 2144 Maschinenbauingenieure | 52 | 7231 Kraftfahrzeugmechaniker und -schlosser | Fahrzeugbauer/in
Mopedbote/-in | 8321 Kraftradfahrer | 34 | 7522 Möbeltischler und verwandte Berufe | Büchsenschäfter/in/in
Rampenpersonal | 9333 Frachtarbeiter und verwandte Berufe | 27 | 4323 Bürokräfte in der Transportwirtschaft und verwandte Berufe | Rampenmanager/in
Maniküre | 5142 Kosmetiker und verwandte Berufe | 0 | ---- No conclusion |
• German occupational titles were coded fully automatically with Cascot and the result was compared with an approved code. The examples above show where rules would improve Cascot coding performance.
• It is helpful to have “gold standard” files with a large number of real-life job titles for which experts have assigned correct codes.
• Cascot coding results can be compared with the “gold standard” to find areas for improvement.
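One simple way to locate such areas is to check agreement at each level of the ISCO-08 hierarchy, from the 1-digit major group down to the 4-digit unit group. The sketch below does this for the five German examples shown earlier, as read from that table; the function name `agreement_by_level` and the use of `None` for "No conclusion" are assumptions for illustration.

```python
def agreement_by_level(pairs):
    """Share of pairs whose codes agree on the first 1, 2, 3 and 4 digits.

    pairs: list of (gold_code, cascot_code); cascot_code is None when
    Cascot reached no conclusion, which counts as a disagreement.
    """
    shares = {}
    for digits in (1, 2, 3, 4):
        hits = sum(
            1
            for gold, cascot in pairs
            if cascot is not None and gold[:digits] == cascot[:digits]
        )
        shares[digits] = hits / len(pairs) if pairs else 0.0
    return shares


# (approved code, Cascot code) for the five German titles in the table.
GERMAN_EXAMPLES = [
    ("2341", "2330"),  # Klassenlehrer/in (Klasse 1-3)
    ("2144", "7231"),  # Diplomingenieur/in (Fahrzeugbau)
    ("8321", "7522"),  # Mopedbote/-in
    ("9333", "4323"),  # Rampenpersonal
    ("5142", None),    # Maniküre: no conclusion
]

shares = agreement_by_level(GERMAN_EXAMPLES)
for digits, share in shares.items():
    print(f"{digits}-digit agreement: {share:.0%}")
```

Only the first example agrees at the 1- and 2-digit levels (both codes fall under teaching professionals), which points to a missing rule for the school-level qualifier rather than a wholly wrong match.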