Harvesting crowdsourcing biodiversity data from Facebook groups
0. Crowdsourcing - participants provide unstructured data voluntarily
[Figure labels: Facebook interest groups (Reptile-Road-Mortality, Enjoy-Moths); Main Database]
1. Crawling data from Facebook via its API
[Figure: What a typical discussion thread looks like. A thread contains a post (picture and message) followed by a sequence of comment messages.]
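The thread layout described above (one post with a picture and a message, followed by comment messages) can be held in a small data structure before any text processing. The following is a minimal sketch only: the class names, the `thread_from_payload` helper, and the dictionary keys (`"from"`, `"message"`, `"picture"`, `"comments"`) are assumptions made for illustration, not the crawler's actual schema and not a documented Facebook API response format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    author: str
    message: str


@dataclass
class Thread:
    post_author: str
    post_message: str
    picture_url: Optional[str] = None
    comments: List[Comment] = field(default_factory=list)

    def all_messages(self) -> List[str]:
        """Return the post message first, then comments in their original order."""
        return [self.post_message] + [c.message for c in self.comments]


def thread_from_payload(payload: dict) -> Thread:
    """Build a Thread from one crawled post.

    The key names read here are hypothetical; adapt them to whatever
    the crawler actually stores.
    """
    comments = [
        Comment(author=c.get("from", ""), message=c.get("message", ""))
        for c in payload.get("comments", [])
    ]
    return Thread(
        post_author=payload.get("from", ""),
        post_message=payload.get("message", ""),
        picture_url=payload.get("picture"),
        comments=comments,
    )
```

Keeping the post and its comments together matters for the later steps, because name matching looks at a single message and also at the thread as a whole.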
2. Using natural language processing techniques, with the Taiwan Geographic Name and Taiwan Catalogue of Life (TaiCOL) databases as knowledge bases, to extract species vernacular names and place names from a thread
[Flowchart: for each vernacular name in TaiCOL (example: 細紋南蛇), the algorithm checks whether the name occurs in the message. If it does, it is a full-matched name. If not, its prefixes (Prefix2 細紋, Prefix3 細紋南) and postfixes (Postfix2 南蛇, Postfix1 蛇) are tested for occurrence in the message and in the thread. A hit is a matched abbreviation, for which a confidence score is calculated; if nothing matches, the name does not exist in the message.]
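The abbreviation-matching flowchart for step 2 can be approximated in code. This is a rough sketch under stated assumptions, not the authors' implementation: the flowchart only shows Prefix2, Prefix3, Postfix2 and Postfix1 for the example name, so the general rule for generating candidates, the order in which they are tried, and the confidence formula are all hypothetical. The names `candidate_abbreviations`, `match_name` and `Match` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    name: str        # full vernacular name from the knowledge base
    matched: str     # the text actually found in the message
    kind: str        # "full" or "abbreviation"
    confidence: float


def candidate_abbreviations(name: str) -> List[str]:
    """Proper prefixes (length >= 2) and proper postfixes (length >= 1),
    longest first. The length limits are an assumption based on the
    example in the flowchart."""
    out = []
    for length in range(len(name) - 1, 0, -1):
        if length >= 2:
            out.append(name[:length])
        out.append(name[-length:])
    return list(dict.fromkeys(out))  # drop duplicates, keep order


def match_name(name: str, message: str, thread_messages: List[str]) -> Optional[Match]:
    """Look for a vernacular name, or an abbreviation of it, in one message."""
    if name in message:
        return Match(name, name, "full", 1.0)
    full_in_thread = any(name in m for m in thread_messages)
    for abbr in candidate_abbreviations(name):
        if abbr in message:
            # Hypothetical score: share of the name covered by the abbreviation,
            # raised when the full name appears elsewhere in the thread.
            score = len(abbr) / len(name)
            if full_in_thread:
                score = (1 + score) / 2
            return Match(name, abbr, "abbreviation", score)
    return None  # the name does not exist in the message
```

In this sketch a one-character postfix such as 蛇 still matches, but with a low score, so a downstream step can discard weak matches by thresholding on `confidence`.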
3. Introducing the content management system Drupal for easier data management (including error correction) and display
Algorithms used to recognize abbreviations of vernacular names and place names
The emergence of Web 2.0 enables people to contribute their biodiversity observations on the Web. Such crowdsourced biodiversity data are increasingly valuable for scientific studies because they can cover broader spatial and temporal scales. However, data provided as plain text hinder retrieval and analysis. In this study, we propose a framework that automatically structures loosely formatted text, so that volunteers can keep providing data in the ways familiar to them while interested citizens, biodiversity researchers and managers benefit from the semantically structured information. We take two Facebook biodiversity interest groups, Reptile-Road-Mortality and Enjoy-Moths, as examples.
Harvesting crowdsourcing biodiversity data from Facebook groups
Jason Guan-Shuo Mai (1), Cheng-Hsin Hsu (1), Dong-Po Deng (2), De-En Lin (3), Hsu-Hong Lin (3), Kwang-Tsao Shao (1)
4. Publishing linked open data via D2R server for open access and usage
Our dataset is linked to other datasets in the linked open data cloud, such as DBpedia, GeoNames and LODE (Linked Open Data of Ecology), so it can benefit from the large amount of meta-information they provide.
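To make the idea of linking concrete, the sketch below writes a few RDF statements in N-Triples syntax that connect a local record to external resources. It is illustrative only: the helper names are invented, the choice of `rdfs:label` and `owl:sameAs` as predicates is an assumption (the poster does not say which vocabulary is used), and every `example.org` URI is a placeholder rather than a real identifier in this dataset, DBpedia or GeoNames. It also does not show the D2R mapping itself, only the kind of output a link amounts to.

```python
from typing import List, Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement whose object is a URI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def literal_triple(subject: str, predicate: str, text: str,
                   lang: Optional[str] = None) -> str:
    """One N-Triples statement whose object is a plain or language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    suffix = f"@{lang}" if lang else ""
    return f'<{subject}> <{predicate}> "{escaped}"{suffix} .'


def link_record(local_uri: str, label: str, external_uris: List[str]) -> List[str]:
    """Label a local resource and link it to equivalent external resources.

    Using owl:sameAs here is an assumption; a weaker predicate may be
    more appropriate depending on how exact the correspondence is.
    """
    lines = [literal_triple(local_uri, RDFS_LABEL, label)]
    lines += [uri_triple(local_uri, OWL_SAME_AS, u) for u in external_uris]
    return lines
```

Once such links exist, a client that retrieves the local record can follow them to the external dataset and read the additional information published there.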
5. Developing browser plug-ins to give users digested feedback from the structured data
6. Improving source data quality without changing users’ own familiar ways
Our algorithm picks the most related species name appearing in a thread based on social networking characteristics.
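The poster does not say which social networking characteristics are used, so the following is only a toy sketch of the general idea of choosing one name per thread. It assumes, hypothetically, that each mention of a species name counts as one vote and that likes on the message carrying the mention add weight. The function name `pick_species_name` and the scoring rule are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def pick_species_name(mentions: List[Tuple[str, int]]) -> Optional[str]:
    """Choose one species name for a thread.

    `mentions` holds (species_name, likes_on_that_message) pairs in
    thread order. The weighting (1 per mention plus the like count) is
    an assumption, not the published algorithm.
    """
    scores: Dict[str, float] = defaultdict(float)
    for name, likes in mentions:
        scores[name] += 1 + likes
    if not scores:
        return None
    # max() keeps the first of several equal scores, so ties go to the
    # name mentioned earliest in the thread.
    return max(scores, key=scores.get)
```

Other signals, such as who wrote the message or whether the original poster confirmed an identification, could be added to the score in the same way.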
One click on a message to recognize species vernacular names and related information
Semantic annotation tool disambiguates toponymic homonyms
(1) Taiwan Biodiversity Information Facility (TaiBIF), Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan
(3) Taiwan Endemic Species Research Institute, Council of Agriculture, Nantou, Taiwan