a profile of applied data analysis lab (ada lab)

Post on 12-Jul-2015


Applied Data Analysis Lab – a profile

Dr. Łukasz Bolikowski

ICM, University of Warsaw

December 2014

ADA Lab ⊆ ICM ⊆ UW

University of Warsaw (UW) is one of the top Polish higher education establishments.

Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) is a supercomputing and research data centre within the University of Warsaw.

Applied Data Analysis Lab (ADA Lab) is a research group within the ICM.

ADA Lab’s Scope of Interest

Legal Text Mining

Business Data Mining

Training & Outreach

Scholarly PDF Mining

Map of Science

Persistent IDs

Data Anonymization

Scalable Text and Data Mining

Informatics for Open Science

Legal Text Mining

Building a judgment analysis system for Poland.

Integrating data from common courts, the Supreme Administrative Court, the Supreme Court, and the Constitutional Tribunal.

Planning a larger, European project with similar goals (Horizon 2020; currently building a consortium and defining scope).

Business Data Mining

Leveraging high demand for data science skills.

For-profit projects with business partners.

Usually can’t discuss details due to NDAs.

Our favourite toolset:

R for data understanding and modelling

Apache Spark for analysing larger data sets

D3 for information visualization

CRISP-DM for managing our projects (Cross-Industry Standard Process for Data Mining)

Training and Outreach

“Web-Scale Data Mining and Processing” (Course at Polish Academy of Sciences)

“Introduction to Text Mining” (Course at Warsaw School of Data Analysis organised by ICM)

Internal trainings on Hadoop, Spark

Presentations at Big Data conferences (Target audience: business partners)

Workshops and internships for talented youth (In collaboration with Polish Children’s Fund)

Scholarly PDF Mining

Extracting metadata, bibliographic references, and full text from scholarly PDFs. Research direction: semantic annotation of paragraphs, sentences, phrases.

CERMINE is open-source software (AGPL license), with users worldwide: OpenAIRE.eu, Paperity.org, Public Knowledge Project.

Interfaces for humans and for machines (RESTful API).

Try CERMINE at: http://cermine.ceon.pl/
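A minimal sketch of how a machine client might talk to a PDF-extraction service such as CERMINE over HTTP. The slide only gives the base URL; the endpoint path `/extract.do`, the `application/pdf` content type, and the function names `build_extract_request` and `extract` are assumptions made for illustration and should be checked against the CERMINE documentation before use.

```python
import urllib.request

# ASSUMPTION: the endpoint path is not stated on the slide, which only gives
# the base URL http://cermine.ceon.pl/ . Verify it in the CERMINE docs.
CERMINE_ENDPOINT = "http://cermine.ceon.pl/extract.do"


def build_extract_request(pdf_bytes: bytes, endpoint: str = CERMINE_ENDPOINT):
    """Prepare (but do not send) a POST request carrying a PDF as its body."""
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("input does not look like a PDF file")
    return urllib.request.Request(
        endpoint,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def extract(pdf_bytes: bytes, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body.

    The response format depends on the service and is not assumed here.
    """
    request = build_extract_request(pdf_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the request separately from sending it keeps the sketch easy to inspect offline; a real client would also handle HTTP errors and large files.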

Map of Science

A comprehensive map of academia. Mining available documents and data sets in order to reconstruct the graph of relations between: people, documents, institutions, topics, funding sources.

Final result: a publicly available data set.

Why? Better understanding of science. Cool features in digital libraries and research information systems.

Elements of the map currently developed in OpenAIRE and OCEAN projects.
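The graph of relations described above can be pictured as a small typed graph. The sketch below is only an illustration: the node kinds come from the slide, while the class `ScienceMap`, the relation labels (`authored`, `authored_by`, `funded_by`), and the sample data are hypothetical and say nothing about how the actual data set is modelled.

```python
from collections import defaultdict
from dataclasses import dataclass

# Node kinds named on the slide; everything else here is an assumption.
NODE_KINDS = {"person", "document", "institution", "topic", "funding"}


@dataclass(frozen=True)
class Node:
    kind: str  # one of NODE_KINDS
    key: str   # hypothetical local identifier


class ScienceMap:
    """A tiny typed, directed graph: (source, relation) -> set of targets."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        for node in (source, target):
            if node.kind not in NODE_KINDS:
                raise ValueError(f"unknown node kind: {node.kind}")
        self._edges[(source, relation)].add(target)

    def neighbours(self, source, relation):
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self._edges.get((source, relation), set()))

    def coauthors(self, person):
        """People who share at least one document with `person`."""
        found = set()
        for doc in self.neighbours(person, "authored"):
            found |= self.neighbours(doc, "authored_by")
        found.discard(person)
        return found


def build_example():
    """Build a made-up map: two authors, one paper, one funding source."""
    m = ScienceMap()
    alice, bob = Node("person", "alice"), Node("person", "bob")
    paper = Node("document", "paper-1")
    grant = Node("funding", "grant-x")
    for author in (alice, bob):
        m.add(author, "authored", paper)
        m.add(paper, "authored_by", author)
    m.add(paper, "funded_by", grant)
    return m, alice, bob, paper, grant
```

With this layout, a feature such as "show co-authors" in a digital library reduces to a two-hop walk over the graph, as `coauthors` does.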

Persistent IDs

To achieve long-term preservation of research artifacts, we need an identifier minting and management scheme that can outlive the organization managing the scheme.

We are developing a distributed scheme based on public-key cryptography and P2P networking (a lot in common with Bitcoin).

Data Anonymization

Privacy-preserving research data publication is a cross-cutting issue that applies to various types of data analysed at ICM: legal judgments, medical records, social network activity.

Thank you for your attention. Let’s stay in touch!

adalab.icm.edu.pl/blog

twitter.com/adalab_icm

linkedin.com/in/bolikowski

twitter.com/bolikowski

lukasz.bolikowski@icm.edu.pl

License

© 2014 ICM, University of Warsaw. Some rights reserved. This presentation is available under a CC BY 3.0 license. Materials from the following sources were used:

https://www.flickr.com/photos/86530412@N02/8213432552 (p. 4, CC BY 2.0)

https://www.flickr.com/photos/124247024@N07/13903385550 (p. 5, CC BY-SA 2.0)

https://www.flickr.com/photos/genista/228006200 (p. 6, CC BY-SA 2.0)

https://www.flickr.com/photos/bohman/210977249 (p. 9, CC BY 2.0)

https://www.flickr.com/photos/hyku/368912557 (p. 10, CC BY 2.0)
