


State Key Lab of Intelligent Technology & Systems
Natural Language Processing Group
Tsinghua University
http://nlp.csai.tsinghua.edu.cn

THUNLP Leader: Prof. Maosong Sun

Professor Sun is the Chairman of the Department of Computer Science and Technology, Tsinghua University. His research interests are computational linguistics and statistical and corpus-based natural language processing (NLP), including Chinese language computing (computational morphology, bilingual terminology extraction), information retrieval (Chinese text categorization, graphical-model-based keyword extraction), collective intelligence (tag generation, Web trend analysis), and social computing (query log analysis, community discovery). He has participated as project leader or principal researcher in over 20 projects funded by the National Natural Science Foundation of China, the National Social Science Foundation of China, the 863 National High-Tech R&D Program of China, and the 973 National Basic Research Program of China, as well as in projects funded by a number of international IT companies. Professor Sun has published, together with his students, about 130 papers in academic journals and international conferences in the above fields. These papers have roughly 1,400 citations in total on Google Scholar. He has served as a program committee member for numerous national and international conferences, and many times as conference chair or program committee chair.

About THUNLP

The Natural Language Processing Group at the Department of Computer Science and Technology, Tsinghua University (THUNLP), which is also part of the National Lab for Information Science and Technology and the State Key Lab of Intelligent Technology and Systems, works on methodologies and algorithms for computer processing and understanding of human languages, with an emphasis on Chinese. We focus on basic research in language computation as well as application-oriented NLP technologies. In recent years we have published a number of papers in top conferences and journals in the field, such as ACL, COLING, EMNLP, IJCAI, VLDB, Computational Linguistics, Journal of Quantitative Linguistics, and IEEE Intelligent Systems.

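Research Interests

Our research covers a range of topics in natural language processing, grouped into four areas: NLP based on huge-scale naturally annotated corpora, social tagging and keyword extraction, text classification, and multilingual analysis.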

Recently, Professor Sun proposed a point of view in NLP: NLP based on huge-scale naturally annotated corpora. The basic idea is that, with Web-scale corpora, natural annotation may help machines perform some NLP tasks better. There are two types of natural annotation: explicit, such as punctuation, anchor text, query logs, Wikipedia, and blog tags; and implicit, such as language usage patterns. He further poses a fundamental question: if we could systematically integrate all the information that naturally annotated corpora provide from different perspectives, could machines achieve some degree of deep understanding of language? Preliminary work by him and his student, published in Computational Linguistics in 2009, showed the usefulness of punctuation in Chinese word segmentation, suggesting that this idea deserves further study.
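As a rough illustration of explicit natural annotation, the Python sketch below treats punctuation marks as free word-boundary labels: the character right after a punctuation mark must begin a word, and the character right before one must end a word. It is only a toy sketch under stated assumptions, not the method of the 2009 paper; the punctuation set, the function name `boundary_counts`, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

# Assumed punctuation set, chosen for illustration only.
PUNCTUATION = "，。！？；：、"

# Matches one or more punctuation marks or whitespace characters.
SPLIT_PATTERN = re.compile("[" + re.escape(PUNCTUATION) + r"\s]+")


def boundary_counts(text):
    """Count characters observed at punctuation-delimited boundaries.

    Returns two Counters: characters seen at the start of a segment
    (certain word beginnings) and characters seen at the end of a
    segment (certain word endings). The start and end of the text
    are treated as boundaries as well.
    """
    word_initial = Counter()
    word_final = Counter()
    for segment in SPLIT_PATTERN.split(text):
        if not segment:
            continue  # skip empty pieces, e.g. after trailing punctuation
        word_initial[segment[0]] += 1
        word_final[segment[-1]] += 1
    return word_initial, word_final


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from any corpus.
    sample = "我们研究自然语言处理，我们关注中文。"
    initial, final = boundary_counts(sample)
    print(initial.most_common(3))
    print(final.most_common(3))
```

On a Web-scale corpus, counts like these could serve as weak evidence about which characters tend to begin or end words, without any manual annotation.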

Professor Sun is the Vice President of the Chinese Information Processing Society, a council member of the China Computer Federation, a council member of the Chinese Association for Artificial Intelligence, an officer of ACL SIGHAN, a member-at-large of the ACM China Council, the vice chairman of the Expert Committee of the Language Commission of the Beijing Municipal Government, a member of the Expert Committee of the National Language Resource Surveillance and Research Center, the Editor-in-Chief of the Journal of Chinese Information Processing, and an editorial board member of many journals, including the Communications of CCF, the Journal of Computer Science and Technology, the Journal of Chinese Language and Computing, Applied Linguistics, and Nankai Linguistics.

NLP Based on Huge-scale Naturally Annotated Corpora
  Word segmentation using punctuation in huge-scale Web articles
  New word detection and related word retrieval from user logs of a Chinese input method
  Chinese abbreviation extraction from anchor texts in Web pages
  New word detection from search engine user logs

Social Tagging and Keyword Extraction
  Tag disambiguation
  Tag suggestion using topic models
  Tag suggestion via Latent Reason Identification
  Exploring subsumption relations in social tags
  Keyword extraction by clustering to find exemplar terms
  Keyword extraction via topic decomposition

Text Classification
  Feature selection for Chinese text classification
  Scalable term selection for text classification
  Efficient text classification using term projection
  Transfer learning and self-training for text classification
  Text classification-based image classification

Multilingual Analysis
  Fast and robust sentence alignment algorithm
  Bilingual terminology extraction system
  Statistical method for Uyghur tokenization
  Uyghur morpheme analysis
  "Female Script" pinyin input method