Putting Open Data on Open Source Platforms: Development Experience with CKAN, the Open Source Data Portal Platform

Putting Open Data on Open Source Platforms: Development Experience with CKAN, the Open Source Data Portal Platform
@ FOSS and Project Collaboration (Spring 2015)
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Taiwan License.
Presenter: 李承錱 Cheng-Jen Lee (Sol)
Email: cjlee AT iis.sinica.edu.tw

Upload: chengjen-lee

Post on 16-Jul-2015


Page 1: Putting Open Data on Open Source Platforms: Development Experience with CKAN, the Open Source Data Portal Platform


Page 2

About Me

● Sol, @u10313335

● Institute of Information Science, Academia Sinica

● https://about.me/SolLee

● Python / R / Java

● Focused Areas

– CMS
– Data Repository
– Open Data
– *nix System Administration

Page 3

Agenda

● Open Data and Open Data Portals
● About CKAN
● CKAN and 5★ Open Data
● Experiences
● Contribution: What and How?

Page 4

Open Data and Open Data Portals

● Open Data

– The idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control [1].

● Open Data Portals

– Facilitate access to and re-use of public sector information [2].

– “Infrastructure” of open data

1. Wikipedia: Open data https://en.wikipedia.org/wiki/Open_data
2. Open Data Portals - Digital Agenda for Europe http://ec.europa.eu/digital-agenda/en/open-data-portals

Page 5

About CKAN

Page 6

CKAN

● The Comprehensive Knowledge Archive Network

● A powerful data management system
– Publishing
– Sharing
– Finding
– Using Data

Page 7

Screenshot

Page 8

The Most Popular Platform for Open Data

116 instances around the world in March 2015

http://ckan.org/instances

Page 9

The Most Popular Platform for Open Data

● Widely used in government data portals

– In EU member states, 30% of open data portals had adopted CKAN (OpenDataMonitor [1], March 2015)

● Workflow support for publishing data

● Data Visualization

● 100+ Extensions

● Powerful APIs

● Open-sourced (AGPLv3)

1. http://www.opendatamonitor.eu

Page 10

United Kingdom: DATA.GOV.UK

Page 11

United States: DATA.GOV

Page 12

Japan: DATA.GO.JP

Page 13

European Union: PUBLICDATA.EU

Page 14

Tainan City: DATA.TAINAN.GOV.TW

Page 15

Nantou County: DATA.NANTOU.GOV.TW

Page 16

Hsinchu City: OPENDATA.HCCG.GOV.TW

Page 17

Taipei City: DATA.TAIPEI

Page 18

Taijiang Inner Sea Research Datasets (台江內海研究資料集): TAIJIANG.TW

Page 19

Demo Site: demo.ckan.org

Page 20

Publish Datasets

① Add Dataset Information

Page 21

Publish Datasets

② Add Data under the Dataset

Page 22

Find Datasets

By Keyword

Page 23

Find Datasets

By Location

Page 24

Find Datasets

By filters

Page 25

Data Preview and Visualization

recline_view (csv, xls): Grid

Page 26

Data Preview and Visualization

recline_view (csv, xls): Graph

Page 27

Data Preview and Visualization

recline_view (csv, xls): Lat/Long fields

Page 28

Data Preview and Visualization

wms_preview

Page 29

Data Preview and Visualization

geojson_preview

Page 30

Data Preview and Visualization

● Docs: recline_view, text_view, json_view, pdf_view, webpage_view, officedocs_view...

● Pics: image_view

● And more!

Page 31

Authorization

organization

http://opendata.hccg.gov.tw/organization

Page 32

Data Exchange

Harvest and Federation

Page 33

CKAN and 5★ Open Data [1]

1. Tim Berners-Lee, “Linked Data” http://www.w3.org/DesignIssues/LinkedData.html

Page 34

CKAN and 5★ Open Data

● ★ Make your stuff available on the Web (whatever format) under an open license

Customizable licenses

Page 35

CKAN and 5★ Open Data

● ★★ Make it available as structured data (e.g., Excel instead of image scan of a table)

● ★★★ Use non-proprietary formats (e.g., CSV instead of Excel)

– Upload any data format
– Data API
  ● Get records from structured data
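As a rough illustration of "get records from structured data", the sketch below builds a request URL for the `datastore_search` action of CKAN's Action API. It assumes the resource has already been loaded into the DataStore; the site URL and resource id are placeholders, not values from the talk, and `datastore_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def datastore_search_url(site_url, resource_id, limit=5, q=None):
    """Build a URL for CKAN's datastore_search action (Action API v3)."""
    params = {"resource_id": resource_id, "limit": limit}
    if q is not None:
        params["q"] = q  # optional full-text query
    return site_url.rstrip("/") + "/api/3/action/datastore_search?" + urlencode(params)


# Placeholder values for illustration only.
url = datastore_search_url("http://demo.ckan.org", "RESOURCE-ID", limit=3)
print(url)
```

Fetching this URL with any HTTP client should return a JSON document whose `result.records` list holds the matching rows, provided the resource exists on that site.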

Page 36

CKAN and 5★ Open Data

● ★★★★ Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff

● ★★★★★ Link your data to other data to provide context

– Built-in RDF exporting capabilities
– Expose or consume metadata from other catalogs using RDF (DCAT) docs [1]

● ckanext-qa [2]: Check the openness of datasets or resources

1. Supported by the ckanext-dcat extension
2. https://github.com/ckan/ckanext-qa

Page 37

Experiences

Page 38

System Architecture

Page 39

Installation

● Official Documents:
– http://docs.ckan.org/en/latest/

● Installation Notes (In Chinese):
– https://ckan-docs-tw.readthedocs.org/

Page 40

Customizations for Taijiang.tw

● Custom Metadata
● Data Visualization
● Custom Filters
● Harvest
● Localization
● Source Code Released under AGPLv3 (On GitHub: u10313335)
– ckanext-taijiang
– ckanext-spatial
– taijiang-ckan-translations
– taijiang-bulk-uploader

Page 41

Custom Metadata

● Extension ckanext-scheming [1]
– Configure and share CKAN schemas using a JSON schema description.
– Custom template snippets for editing and displaying fields.

Template Name         Function
text.html             a simple text field for free-form text
large_text.html       a larger text field
date.html             a date widget
markdown.html         a markdown field
select.html           a select box
multiple_choice.html  a group of checkboxes
repeating.html        a repeating field

1. https://github.com/open-data/ckanext-scheming, only for CKAN 2.3+

Page 42

Custom Metadata – Example

select:

    {
      "field_name": "data_type",
      "label": {"en": "Data Type", "zh_TW": "資料類型"},
      "preset": "select",
      "form_attrs": {"data-module": "autocomplete"},
      "choices": [{"value": "statistics", "label": "Statistics"}]
    }

repeating_text:

    {
      "field_name": "ref",
      "preset": "repeating_text",
      "label": {"en": "Reference", "zh_TW": "參考來源"},
      "form_blanks": 3
    }
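To show how a per-language `label` in such a field definition might be consumed, here is a small sketch that picks the label for a requested locale with a fallback. `pick_label` is a hypothetical helper written for this illustration; it is not ckanext-scheming's own code.

```python
import json

# One field definition in the style of the schema example.
FIELD = json.loads("""
{
  "field_name": "data_type",
  "label": {"en": "Data Type", "zh_TW": "資料類型"},
  "preset": "select",
  "choices": [{"value": "statistics", "label": "Statistics"}]
}
""")


def pick_label(field, lang, default_lang="en"):
    """Return the label for `lang`, falling back to `default_lang`, then to field_name."""
    label = field.get("label", field["field_name"])
    if isinstance(label, dict):
        return label.get(lang) or label.get(default_lang) or field["field_name"]
    return label


print(pick_label(FIELD, "zh_TW"))
print(pick_label(FIELD, "ja"))  # no Japanese label, falls back to English
```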

Page 43

Validator and Converter

● Ensure data quality

Page 44

Validator and Converter

● Validator
– Validate user inputs
– Ex. json_validator

    import json
    from ckan.plugins.toolkit import Invalid  # import assumed; not shown on the slide

    def json_validator(value, context):
        # Empty input is allowed and passed through unchanged
        if value == '':
            return value
        try:
            json.loads(value)
        except ValueError:
            raise Invalid('Invalid JSON')
        return value
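The validator can be tried outside CKAN with a stand-in exception class. This is only a sketch: the `Invalid` class below is a local placeholder for CKAN's exception, and `check` is a hypothetical helper added for the demonstration.

```python
import json


class Invalid(Exception):
    """Stand-in for CKAN's Invalid exception so the sketch runs without CKAN."""


def json_validator(value, context):
    # Empty input is allowed and passed through unchanged.
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        raise Invalid('Invalid JSON')
    return value


def check(value):
    """Return True if the validator accepts `value`, False if it raises Invalid."""
    try:
        json_validator(value, context={})
        return True
    except Invalid:
        return False


print(check('["a", "b"]'), check(''), check('{not json'))
```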

Page 45

Validator and Converter

● Converter
– Convert data for storage
– Ex. duplicate_validator

    import json

    def duplicate_validator(key, data, errors, context):
        # Skip conversion if an earlier validator already reported an error
        if errors[key]:
            return
        value = json.loads(data[key])
        # Drop duplicate entries (set() does not preserve the original order)
        unduplicated = list(set(value))
        data[key] = json.dumps(unduplicated)
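A minimal sketch of the same converter run on plain dictionaries. It deliberately swaps `list(set(...))` for `dict.fromkeys(...)` so that the first-seen order of entries is kept; that change, the string key, and the sample data are assumptions for illustration (inside CKAN the key is typically a tuple).

```python
import json


def duplicate_validator(key, data, errors, context):
    # Skip conversion if an earlier validator already reported an error.
    if errors[key]:
        return
    value = json.loads(data[key])
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    data[key] = json.dumps(list(dict.fromkeys(value)))


data = {"ref": '["b", "a", "b", "c", "a"]'}
errors = {"ref": []}
duplicate_validator("ref", data, errors, context={})
print(data["ref"])  # ["b", "a", "c"]
```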

Page 46

Data Visualization

● There is no viewer for some GIS formats
– WMTS services
– ESRI Shapefile (*.shp and *.dbf)

● Do It Ourselves!
– wmts_view
– shp_view

Page 47

Write a CKAN Plugin

● PyUtilib Component Architecture (PCA)

● Inherits from
– ckan.plugins.SingletonPlugin

● Implements
– one (or several) ckan.plugins.* interfaces

Page 48

To Build a "viewer"

● We need more…
– View template (Jinja template engine)
– JavaScript module

● Ex. Shapefile preview includes shp2geojson.js [1].

1. http://gipong.github.io/shp2geojson.js/ (Released under MIT license)

Page 49

Example: Plugin for SHP Preview

Python Plugin:

    from ckan import plugins as p

    class SHPView(p.SingletonPlugin):
        p.implements(p.IResourceView, inherit=True)

        def info(self):
            return {'name': 'shp_view',
                    'title': 'shp',
                    'icon': 'map-marker',
                    'iframed': True,
                    'default_title': 'SHP',
                    }

        def can_view(self, data_dict):
            resource = data_dict['resource']
            format_lower = resource['format'].lower()

            # Only offer the view for Shapefile resources that can be fetched
            if format_lower in self.SHP:
                return self.same_domain or self.proxy_is_enabled
            return False

        def view_template(self, context, data_dict):
            return 'dataviewer/shp.html'

View Template (shp.html):

    <div data-module="shppreview" id="data-preview" data-module-map_config="{{ h.dump_json(map_config) }}"></div>

JS Module (shp_view.js):

    // shapefile preview module
    ckan.module('shppreview', function (jQuery, _) {
      return {
        initialize: function () { … },
        showPreview: function (url, data) { … }
      };
    });

Page 50

Result

http://taijiang.tw/dataset/tainangis-wmts

wmts_view

Page 51

Result

shp_view QGIS

http://taijiang.tw/dataset/proj4-29

shp_view

Page 52

Custom Filters

● Find Datasets by
– Time period
– Self-defined categories

● A New Plugin
– For Time Search
  ● Implement IPackageController.before_search
– For Self-defined Categories
  ● Implement IPackageController.before_index and IFacets.dataset_facets
– Both need new definitions in the Solr schema

Page 53

Example: Plugin for Time Search

Python Plugin:

    from ckan import plugins as p

    class TaijiangDatasets(p.SingletonPlugin):
        p.implements(p.IPackageController, inherit=True)
        p.implements(p.IFacets)

        def before_search(self, search_params):
            …
            begin = parse_date(search_params['extras']['ext_begin_date'])
            end = parse_date(search_params['extras']['ext_end_date'])
            ...
            query = ("(start_time: [* TO {0}Z] AND end_time: [{0}Z TO *]) OR "
                     "(start_time: [{0}Z TO {1}Z] AND end_time: [{0}Z TO *])")
            query = query.format(begin.isoformat(), end.isoformat())
            search_params['q'] = query
            return search_params

        def dataset_facets(self, facets_dict, package_type):
            facets_dict['date_facet'] = p.toolkit._('Date of Dataset')
            return facets_dict

Solr Schema:

    <dynamicField name="*_time" type="date" indexed="true" stored="true" multiValued="false"/>
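The query string built in `before_search` can be examined on its own. The sketch below isolates it in a hypothetical `build_time_query` function, assuming naive datetimes that are meant as UTC. The first clause matches datasets that start on or before the begin date and end on or after it; the second matches datasets that start inside the requested period.

```python
from datetime import datetime


def build_time_query(begin, end):
    """Build the Solr range query from the before_search example."""
    query = ("(start_time: [* TO {0}Z] AND end_time: [{0}Z TO *]) OR "
             "(start_time: [{0}Z TO {1}Z] AND end_time: [{0}Z TO *])")
    return query.format(begin.isoformat(), end.isoformat())


# Sample period chosen for illustration only.
q = build_time_query(datetime(2000, 1, 1), datetime(2010, 12, 31))
print(q)
```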

Page 54

Result

Page 55

Harvest

● ckanext-harvest
– Remote harvesting extension
– https://github.com/okfn/ckanext-harvest

● Source Type
– CKAN
– CSW* (Catalog Service for the Web)
– WAF* (Web Accessible Folder)
– Custom (csv/xls/website… etc.)

*Provided by ckanext-spatial

Page 56

Harvest: Job Dashboard

Page 57

Harvest: Background Process

● Manually

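– (pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer -c /etc/ckan/default/production.ini
– (pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer -c /etc/ckan/default/production.ini
– (pyenv) $ paster --plugin=ckanext-harvest harvester run -c /etc/ckan/default/production.ini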

● Automatically

– Supervisor (for gather & fetch consumer)

– Cron (for run)

Page 58

Harvest: The Harvesting Interface

    from base import HarvesterBase

    class SRDAHarvester(HarvesterBase):

        def _set_config(self, config_str):
            ...

        def info(self):
            ...

        def gather_stage(self, harvest_job):
            ...

        def fetch_stage(self, harvest_object):
            ...

        def import_stage(self, harvest_object):
            ...

See http://goo.gl/ZMnND7 for details.
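To make the three stages concrete, here is a plain-Python stand-in that runs without CKAN. It is not the ckanext-harvest API: `ToyHarvester`, its in-memory "remote catalog", and the use of plain ids in place of harvest objects are all assumptions made for illustration.

```python
class ToyHarvester:
    """Plain-Python stand-in that mimics the gather, fetch and import stages."""

    def __init__(self, remote):
        self.remote = remote   # pretend remote catalog: id -> record
        self.fetched = {}      # raw content collected by fetch_stage
        self.imported = {}     # datasets "created" by import_stage

    def gather_stage(self, harvest_job):
        # Decide which remote records need harvesting and return their ids.
        return sorted(self.remote)

    def fetch_stage(self, harvest_object):
        # Retrieve the raw content for one record.
        self.fetched[harvest_object] = self.remote[harvest_object]
        return True

    def import_stage(self, harvest_object):
        # Turn the fetched content into a local dataset.
        raw = self.fetched[harvest_object]
        self.imported[harvest_object] = {"name": harvest_object, "title": raw["title"]}
        return True


harvester = ToyHarvester({"ds-b": {"title": "B"}, "ds-a": {"title": "A"}})
for object_id in harvester.gather_stage(harvest_job=None):
    if harvester.fetch_stage(object_id):
        harvester.import_stage(object_id)
print(harvester.imported)
```

In the real extension the stages are driven by the gather and fetch consumers rather than by a loop like this one.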

Page 59

Localization

● Translation for UI
– Gettext Style i18n
– Babel (*.po & *.mo)
  ● In Python: p.toolkit._('String')
  ● In Jinja Template: {{ _('String') }}
  ● Transifex: Open Knowledge / CKAN
– Jed (For JavaScript Modules)
  ● _('String')_

Page 60

Localization

● Translation for Extensions
– opendatatrentino/ckan-custom-translations (GitHub)

● Translation for Metadata
– Defined in JSON Schema
– "label": {"en": "Data Type", "zh_TW": "資料類型"}

Page 61

Localization

● Chinese Search
– Solr + mmseg4j [1] (A Java Tokenizer)
– Maximum Matching Algorithm [2] (By Dr. Chih-Hao Tsai)
– Copy to Solr folder and modify Solr schema
– Ref: http://is.gd/2Vpzgb

1. https://github.com/chenlb/mmseg4j-solr (Released under Apache 2.0 license)
2. http://technology.chtsai.org/mmseg/
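The idea behind maximum matching can be sketched in a few lines: at each position, take the longest dictionary word that fits. This is a simplified greedy forward variant written for illustration, with a toy dictionary; it is not mmseg4j's implementation, which adds further disambiguation rules.

```python
def max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary for illustration; real tokenizers ship large word lists.
words = {"開放", "資料", "開放資料", "平台"}
print(max_match("開放資料平台", words))
```

Characters that appear in no dictionary word fall through as single-character tokens, which keeps the loop from stalling on unknown text.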

Page 62

Contribution: What and How?

Page 63

What to Contribute?

● CKAN Core Features
– Time and spatial search for private datasets
– Publish datasets as a catalogue service (e.g., CSW)
– Web interface for bulk uploads
– A simplified deployment process
– Issues on GitHub: https://github.com/ckan/ckan/issues
– More ideas: https://github.com/ckan/ideas-and-roadmap

Page 64

What to Contribute?

● i18n
– Non-ASCII filenames
– Translate JS Modules (Ex. Recline.js)
– UI Translation (Transifex)

Page 65

What to Contribute?

● More Functions for Using Data in the Web Browser
– Audio and video playback (e.g., integrating plyr.io)
– Links to third-party services [1], like Shiny [2] (R-based) or IPython Notebook (Python-based)

1. http://www.data.gov/meta/open-apps/
2. https://github.com/ckan/ideas-and-roadmap/issues/35

Page 66

What to Contribute?

● Rebuild data.g0v.tw with CKAN?

● data.g0v.tw (零時資料中心, the g0v data center)
– Built with DKAN (A CKAN clone for Drupal)

● Problems of DKAN
– Development is much slower than CKAN
– Lack of features introduced in later versions of CKAN
  ● Ex. Multiple persistent views of data (In CKAN 2.3)
– Most gov sites in TW use (or will use) CKAN instead of DKAN

Page 67

How to Contribute?

● CKAN Core: ckan/ckan (GitHub)

● Most plugins are also available on GitHub
– http://extensions.ckan.org/

● Development Discussions (Mailing List)
– https://lists.okfn.org/mailman/listinfo/ckan-dev

● Contributing Guide
– http://docs.ckan.org/en/latest/contributing/index.html

Page 68

Thanks for your attention! Any Q?
Email: cjlee AT iis.sinica.edu.tw
Profile: http://about.me/sollee
Google Groups: CKAN Taiwan Interest Group