cwn aat talk

Post on 11-May-2015

1. Sinica BOW. Lab of Ontologies, Language Processing and e-Humanities, NTNU; CWN group, Institute of Linguistics, Academia Sinica. shukai@gmail.com. April 24, 2010
2. Outline: Background; Sinica BOW and Chinese Wordnet (CWN); On-going Efforts and Future Perspectives
3. Outline: Background; Sinica BOW and Chinese Wordnet (CWN); On-going Efforts and Future Perspectives
4. Who Are We?
5. What Have We Been Working On? Language Resources Construction, Evaluation and Knowledge Modelling:
   Corpus (ASBC, LDC-Gigaword, twWaC (balanced, domain and social media))
   Lexicon (Core Vocabulary, Domain lexicon knowledge base)
   Ontology (Sinica BOW (SUMO), KYOTO-DOLCE, Hanzi/radical Ontology, Domain ontologies)
6. Corpus and Query Tools
7. Ontology and Cross-language Validation: SUMO, Chinese example
8. Lexicon:
   Corpus distribution-based approach
   Simulation-based computational approach
   (Psycho-)linguistic approach
9. Latent Semantics in the Mental Lexicon
10. Random Walk in the Mental Lexicon
11. WordNet
12. WordNet Browser (e.g., Dubey)
13. Outline: Background; Sinica BOW and Chinese Wordnet (CWN); On-going Efforts and Future Perspectives
14. Bootstrapping Bilingual Wordnet (I): Sinica BOW
15. Bootstrapping Bilingual Wordnet (II): GoogleCWN
16. Chinese-anchored Bilingual Wordnet from Scratch
17. Methodologies, Issues and Solutions
   1. Word segmentation and selection (frequency and lexical-semantic theory-based)
   2. Word sense distinction: synset, sense, meaning facet
   3. Word sense relations: LSR algebra (transitivity in the network), paronymy, troponymy, morpho-semantic relations, etc.
18. Implementation
   1. From MS Access to MySQL database
   2. Python-NLTK modules for CWN (and other resources)
   3. Convert to LMF-compatible markup
19. Lexicon Standard and Markup Languages:
   LMF (Lexical Markup Framework)
   GLML (Generative Lexicon Markup Language)
   KAF (KYOTO Annotation Format)
20. KAF Example
21. Current Status
22. Toward a Global Wordnet Grid:
   HanziGrid among CJKV (partly done with Chinese Hanzi and Japanese Kanji mapping)
   Chinese-Italian WordNet
   Web Service (RDF/OWL representation as a data model for the Semantic Web)
   Global Wordnets Sense Tagging (Environmental domain for SemEval 2010)
23. Toward a Mashup Approach to a Dynamic LKB: Wordnik. Test online
24. Toward a Better Understanding of Lexical and Social Networks
25. KYOTO-CWN Workshop: around mid-September. Release of tools, resources, technical reports, and browsing system
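
Slide 17 lists "transitivity in the network" among the word sense relation issues. As a rough illustration of what transitivity means for a relation such as hypernymy, the sketch below collects every hypernym reachable from a word by following edges repeatedly. It is only a minimal sketch: the edge list is hypothetical toy data (not taken from CWN), and the names HYPERNYM_EDGES, build_graph and inherited_hypernyms are invented for this example rather than part of the Python-NLTK modules mentioned on slide 18.

```python
from collections import defaultdict

# Hypothetical toy hypernym edges (hyponym -> hypernym); not CWN data.
HYPERNYM_EDGES = [
    ("poodle", "dog"),
    ("dog", "canine"),
    ("canine", "mammal"),
    ("cat", "feline"),
    ("feline", "mammal"),
]


def build_graph(edges):
    """Map each word to the set of its direct hypernyms."""
    graph = defaultdict(set)
    for hyponym, hypernym in edges:
        graph[hyponym].add(hypernym)
    return graph


def inherited_hypernyms(graph, word):
    """Return all hypernyms reachable from `word` by following edges transitively."""
    seen = set()
    stack = list(graph.get(word, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


graph = build_graph(HYPERNYM_EDGES)
print(sorted(inherited_hypernyms(graph, "poodle")))
# → ['canine', 'dog', 'mammal']
```

Only relations that are actually transitive should be closed this way; whether a given lexical-semantic relation qualifies is the kind of question the slide's "LSR algebra" item appears to address.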