The Impact of the Data Revolution on Official Statistics: Opportunities, Challenges and Risks

Prof. Rob Kitchin, NIRSA, Maynooth University

Uploaded by robkitchin, 17 July 2015



Background

• All-Island Research Observatory (AIRO; www.airo.ie)

• Dublin Dashboard (www.dublindashboard.ie)

• Digital Repository of Ireland (DRI; www.dri.ie)

• The Programmable City


The Data Revolution book

• A synoptic overview of big data, open data and data infrastructures

• An introduction to thinking conceptually about data, data infrastructures, data analytics and data markets

• A critical discussion of the technical issues and the social, political and ethical consequences of the data revolution

• An analysis of the implications of the data revolution for academic, business and government practices


The data revolution

• Data infrastructures

• Open and linked data

• Big data

• Data analytics

• Data markets

• Conceptualisation of data

• Disruptive innovations that offer opportunities, challenges and risks for government, business and the academy


Data infrastructures

• Actively planned, curated and managed

• Enables storing, scaling, combining, sharing and consuming data across networked archives and repositories

• Produces ‘data amplification’

• NSIs have long operated, loosely, as such (trusted) infrastructures, but are now organising into more coordinated platforms with:

• dedicated and integrated hardware and networked technologies; interoperable software and middleware services and tools; shared standards, protocols and metadata; shared services (relating to data management and processing), analysis tools and policies (concerning access, use, IPR, etc.)

• Such infrastructures are being federated into larger pan-national infrastructures (Eurostat, ESPON, UN, etc.)

• Many other institutions are catching up


Open and linked data

• Opening public sector information (PSI) and other data for re-use, driven by transparency, participation, collaboration and economic arguments

• Linking data and metadata using non-proprietary formats, URIs and RDF so that data can be referenced and conjoined

• NSIs are already very active in this space; other government data providers lag much further behind

• More to be done, especially retrospectively opening and linking historical records; producing APIs; extending the degree of openness (licensing for re-use, reworking, redistribution and reselling); using non-proprietary formats; opening data about the organizations themselves
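The linking idea on this slide (shared URIs plus RDF) can be sketched in a few lines of standard-library Python. This is a minimal, hypothetical illustration, not the practice of any particular NSI: the `example.org` namespaces, the `refArea`, `population` and `busTrips` properties, the observation identifiers and the numeric values are all invented placeholders. Only the RDF `type` URI, the W3C RDF Data Cube (`qb:`) terms and the XML Schema `integer` datatype come from real vocabularies. Each fact is written as one N-Triples line, and because two independent publishers reuse the same area URI, conjoining their data reduces to taking the union of their triples.

```python
from typing import List, NamedTuple, Tuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str
    datatype: str  # URI of an XML Schema datatype


Term = Union[URI, Literal]
Triple = Tuple[URI, URI, Term]

RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
QB = "http://purl.org/linked-data/cube#"        # W3C RDF Data Cube vocabulary
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/stats/"                # hypothetical NSI namespace
EX2 = "http://example.org/transport/"           # hypothetical second publisher
REF_AREA = URI(EX + "refArea")                  # hypothetical shared property


def term_to_nt(term: Term) -> str:
    """Serialise one RDF term in N-Triples syntax."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape backslashes first, then double quotes, inside the literal.
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{term.datatype}>'


def to_ntriples(triples: List[Triple]) -> str:
    """One triple per line: subject predicate object, terminated by ' .'."""
    return "\n".join(" ".join(term_to_nt(t) for t in triple) + " ." for triple in triples)


# A shared identifier for a place: this is what lets datasets be conjoined.
area = URI(EX + "area/dublin")

# Observation published by the (hypothetical) statistics office.
obs = URI(EX + "obs/population-2011-dublin")
stats = [
    (obs, RDF_TYPE, URI(QB + "Observation")),
    (obs, URI(QB + "dataSet"), URI(EX + "dataset/census-population")),
    (obs, REF_AREA, area),
    (obs, URI(EX + "population"), Literal("12345", XSD_INT)),  # placeholder value
]

# Observation published independently by another (hypothetical) agency,
# reusing the same area URI rather than a local code of its own.
trips = URI(EX2 + "obs/bus-trips-2011-dublin")
transport = [
    (trips, RDF_TYPE, URI(QB + "Observation")),
    (trips, REF_AREA, area),
    (trips, URI(EX2 + "busTrips"), Literal("678", XSD_INT)),  # placeholder value
]


def observations_about(triples: List[Triple], place: URI) -> List[URI]:
    """Because both sources share one area URI, linking is a plain filter over the union."""
    return sorted({s for s, p, o in triples if p == REF_AREA and o == place})


merged = stats + transport
print(to_ntriples(stats))
print(observations_about(merged, area))
```

In practice an RDF library and a triple store queried with SPARQL would replace these hand-rolled helpers. The sketch only shows that a common identifier, rather than any clever matching, is what makes the join straightforward.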


Big data

Characteristic                  Small data                          Big data
Volume                          Limited to large                    Very large
Exhaustivity                    Samples                             Entire populations
Resolution and indexicality     Coarse & weak to tight & strong     Tight & strong
Relationality                   Weak to strong                      Strong
Velocity                        Slow, freeze-framed                 Fast
Variety                         Limited to wide                     Wide
Flexible and scalable           Low to middling                     High


[Figure: Big data and official statistics (source: ESSC 2014)]


Data analytics

• The challenge of making sense of big data lies in coping with its abundance and exhaustivity, timeliness and dynamism, messiness and uncertainty, and its semi-structured or unstructured nature

• The solution has been machine learning, made possible by advances in computation and computational techniques

• Four broad classes of analytics:

• data mining and pattern recognition

• statistical analysis

• prediction, simulation, and optimization

• data visualization and visual analytics


Conceptualising data

• Technically and methodologically: data generation, handling, processing, storing, analyzing, sharing, etc.

• Philosophically: ontology, epistemology, ideology

• what can we know about the world, how can we know it, and what should we do with such knowledge?

• Critical data studies

• rather than understanding data as objective, neutral, pre-analytic & commonsensical, data are understood as being framed socially, politically, ethically and philosophically in terms of their form, selection, analysis and deployment

• data do not exist independently of the ideas, instruments, practices, contexts, knowledges and systems used to generate, process and analyze them

• data express a normative notion about what should be measured, for what reasons, and what they should tell us; they have normative effects; they do not simply reflect the world but actively produce it

• data are framed by and situated within data assemblages; NSIs constitute such assemblages


Data assemblage

Attributes                          Elements
Systems of thought                  Modes of thinking, philosophies, theories, models, ideologies, rationalities, etc.
Forms of knowledge                  Research texts, manuals, magazines, websites, experience, word of mouth, chat forums, etc.
Finance                             Business models, investment, venture capital, grants, philanthropy, profit, etc.
Political economy                   Policy, tax regimes, public and political opinion, ethical considerations, etc.
Governmentalities / Legalities      Data standards, file formats, system requirements, protocols, regulations, laws, licensing, intellectual property regimes, etc.
Materialities & infrastructures     Paper/pens, computers, digital devices, sensors, scanners, databases, networks, servers, etc.
Practices                           Techniques, ways of doing, learned behaviours, scientific conventions, etc.
Organisations & institutions        Archives, corporations, consultants, manufacturers, retailers, government agencies, universities, conferences, clubs and societies, committees and boards, communities of practice, etc.
Subjectivities & communities        Of data producers, curators, managers, analysts, scientists, politicians, users, citizens, etc.
Places                              Labs, offices, field sites, data centres, server farms, business parks, etc., and their agglomerations
Marketplace                         For data, its derivatives (e.g., text, tables, graphs, maps), analysts, analytic software, interpretations, etc.


Implications and uses of data

• Scaled, open, linked and big data, together with their associated analytics, produce knowledge that enhances the governing of people, managing organisations, leveraging value and producing capital, creating better places, improving health and well-being, tackling social and ecological issues, fostering civic participation, etc.

• They improve insight and wisdom, productivity, competitiveness, efficiency, effectiveness, utility, sustainability, safety & security, transparency, etc.

• They challenge established epistemologies in the academy

• “Revolutions in science have often been preceded by revolutions in measurement” (Sinan Aral)

• new empiricism, data-driven science, computational social sciences, digital humanities

• transforming how we frame, ask and answer questions


Opportunities for OS/NSIs

• New sources of dynamic and linked data and more timely outputs

• Complement/replace/improve/add to existing data/approaches

• New forms of data analytics can provide greater insights from existing and new datasets

• Optimize working practices, gain efficiencies, redeploy staff

• Stronger links/partnerships with computational social science, data science (esp. viz), and data industries

• Drive creation of data-driven institutions and evidence-informed governance

• Greater visibility and use of products


Challenges for OS/NSIs

• Sourcing data from third parties, and the associated partnering, legal and financial issues, including opening OSs derived from private data

• Experimenting and trialling to determine:

• suitability for official statistics, esp. when data are repurposed, not representatively sampled, flexible (thus potentially altering continuity), and of undefined quality (re. veracity (accuracy, fidelity), uncertainty, error, bias, reliability, calibration)

• technological feasibility re. transferring, storing, cleaning, checking and linking big data

• methodological feasibility re. augmenting/producing OSs


Challenges for OS/NSIs

• Building and maintaining new IT infrastructure; retrospective work on older data (opening, linking); ensuring security/data protection; deploying new data analytics

• Securing additional resources (financial and staffing) for dealing with new data streams and opening/linking data

• Developing new technical and methodological skills and sourcing/retaining trained/skilled staff

• Establishing standards, standardization and interoperability across jurisdictions


Risks for OS/NSIs

• Undermining of reputation and trust

• quantity and utility of data opened (moving beyond low-hanging fruit)

• quality of data (big data often messy & dirty) and losing control of generation/sampling/processing

• established statistical products become undermined or discontinued before alternatives fully established/verified

• partnering with third parties (tarnished by their reputation)

• public perception and resistance to use of big data

• Privacy and security

• Access and continuity (will private sources of data be available over the long term? will flexibility alter or break time series?); resistance from third parties to sharing data (gratis)

• Fragmented landscape across jurisdictions

• Pressure to reduce staff/budget rather than redeploy

• Competition and privatisation (data brokers)


Solutions

• Need:

• conceptual, practical and strategic thought re. the challenges and risks of building data infrastructures, opening data and using big data

• planning of change management from the short to the long term

• a coordinated response re. experimentation, processes, trialling, standards, IPR, legislation, software, building infrastructure, etc. to establish best practice and ensure continuity across jurisdictions

• coordinated political lobbying re. resourcing

• Alliances and information sharing with similar organisations (e.g., RDA, WDS)

• Some of this is already happening, but more is needed in a fast-moving space.


Conclusion

• A data revolution is underway

• a fundamental shift in data openness and sharing

• volume, exhaustiveness, timeliness, granularity, relationality, variety, analytics, technical infrastructures, etc.

• conceptual thought relating to data

• Creating a set of disruptive innovations that are producing opportunities, challenges and risks for NSIs and others

• It is important for NSIs to get ahead of the curve with respect to challenges and risks, becoming proactive rather than reactive and setting the agenda for new innovations

• This requires conceptual, practical and strategic thought and a coordinated approach across institutions


@robkitchin


http://www.nuim.ie/progcity

@progcity