Putting time into the GeoWeb: Data persistence in a web services environment Steve Morris NCSU Libraries July 23, 2008


Page 1:

Putting time into the GeoWeb:

Data persistence in a web services environment

Steve Morris

NCSU Libraries

July 23, 2008

Page 2:

Overview

• Background to the digital preservation problem
• Problems
– Temporal data access issues
– Capturing data state in a services or API context
– Making the business case for older data
• Preservation approaches
• Future directions

Page 3:

Project background: North Carolina Geospatial Data Archiving Project

• Partnership between university library (NCSU) and state agency (NCCGIA)
• Under cooperative agreement with Library of Congress in NDIIPP national preservation program
• Focus on state and local geospatial content in North Carolina (state demonstration)
• Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
• Goal: Engage spatial data infrastructure (SDI) in data preservation and archiving

Demonstration repository as catalyst for an industry conversation

Page 4:

SDI role in data preservation

• Data inventories support content identification
• Metadata standards support discoverability and use
• Content standards support data interoperability over time and help eliminate semantic confusion
• Data exchange networks:
– Minimize need to make contact
– Add technical, administrative, descriptive metadata
– Establish rights and provenance

Page 5:

Project roots: NCSU Libraries data directories

Tracking data, map servers, and web services since 2000

Ranked 3rd in traffic among entry points to entire library website

Persistent identifiers
– usage tracking
– ID links used in other sites

Community help in site maintenance

Page 6:

[Chart: County map and data services in NC, 2000 to 2008. Y-axis: Number of Counties (0 to 100). Series: Map Server, Data Download, WMS.]

100 Counties in North Carolina

Page 7:

Carrboro, NC : Population 17,797 (2005 est.)

24 downloadable GIS data layers

4 WMS data layers

6 web mapping applications

9 downloadable PDF map layers

Page 8:

Downtown Raleigh Near State Capitol

1914 Sanborn Map

Page 9:

Downtown Raleigh Near State Capitol

1993 DOQQ

Page 10:

Downtown Raleigh Near State Capitol

1999 Wake County Ortho

Page 11:

Downtown Raleigh Near State Capitol

2005 Wake County Ortho

Page 12:

Downtown Raleigh Near State Capitol

2005 Wake County Ortho

Imagery = Durable
• Static
• Simple structure
• Mostly open formats

Vector data = Volatile
• Frequent update
• Complex structure
• Mostly proprietary (commercial) formats

Page 13:

Data preservation points of failure

• Data is not saved, or …
• can’t be found, or …
• media is obsolete, or …
• media is corrupt, or …
• format is obsolete, or …
• file is corrupt, or …
• meaning is lost

Solutions:
• Migration
• Emulation
• Encapsulation
• XML
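The "file is corrupt" failure point is commonly caught by recording a checksum when a file enters the archive and recomputing it later. The following is a minimal sketch of that idea, assuming SHA-256 digests and a simple in-memory manifest; the function names are illustrative and are not part of the project workflow described in these slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum at ingest time."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def audit(folder: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare current files to the manifest; report 'missing' or 'corrupt'."""
    problems = {}
    for rel_path, expected in manifest.items():
        target = folder / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != expected:
            problems[rel_path] = "corrupt"
    return problems
```

A check like this only detects loss or corruption; it does not address obsolete formats or lost meaning, which the solutions listed above are aimed at.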

Page 14:

Problem: Data state in a web services or API-driven environment


• How to capture records from decision-making processes?
• How to capture data state as well as service state?

Page 15:

Problem: Temporal data unavailability

• Industry focus on “latest and greatest” data
• “Kill and fill” as a common approach to data management (past versions of vector data lost)

Not just data loss, also: Loss of memory about data
• Of superseded county orthophoto flights in NC, only 22% recorded in the state’s GIS inventory

Some older inventories only available through Internet Archive
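One way to avoid losing past versions under "kill and fill" is to copy the current layer into a dated folder before overwriting it. This is a hedged sketch of that pattern, not a description of any county's actual process; the function name, folder layout, and file names are assumptions made for illustration.

```python
import shutil
from datetime import date
from pathlib import Path


def replace_with_snapshot(current: Path, incoming: Path,
                          archive_root: Path, as_of: date) -> Path:
    """Archive the current layer under a dated folder, then install the update.

    Returns the path of the archived copy.
    """
    snapshot_dir = archive_root / as_of.isoformat()
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    archived = snapshot_dir / current.name
    shutil.copy2(current, archived)   # keep the superseded version
    shutil.copy2(incoming, current)   # then "fill" with the new data
    return archived
```

For example, `replace_with_snapshot(Path("parcels.geojson"), Path("parcels_new.geojson"), Path("archive"), date(2008, 7, 23))` would leave the old file at `archive/2008-07-23/parcels.geojson`. Multi-file formats would need every component file copied, not just one.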

Page 16:

Availability of older orthoimagery on county map servers in NC

[Chart: Superseded County Orthophoto Collections (0 to 14) by Orthophoto Flight Year, 1992 to 2004. Series: Online, Offline.]

Only 30% of superseded digital ortho flights accessible through county map servers

Page 17:

Availability of older orthoimagery on county map servers in NC

[Chart: Superseded County Orthophoto Collections (0 to 14) by Orthophoto Flight Year, 1992 to 2004. Series: Online, Offline.]

23 Counties in NC publish ortho WMS services

0 Counties in NC publish superseded orthos as WMS services

Page 18:

Problem: Making business case for archiving

Use case: Land use and impervious surface change analysis

[Figure panels labeled 1993, 1998, 1999, 2002, and 2005]

Page 19:

Building the preservation business case

• Land use change analysis
• Site location analysis
• Real estate trends analysis
• Disaster response
• Resolution of legal challenges
• Impervious surface change mapping

Page 20:

Planned 2008 NC business case survey

• Case description
• Resources/Scope of effort
• Benefits and results
• Fiscal assessment

Based on previous experience, pending projects, examples of when a project could have been served better if archival data were available

Page 21:

Geospatial data preservation challenges

• Producer focus on current data
• Future support of data formats in question
• Inadequate or nonexistent metadata
• Spatial databases
• Complex data objects (multi-file, multi-format)
• Shift to web services-based access (ephemeral data)
• Difficult to capture data state at point of decision-making

Page 22:

Preservation approaches: Temporal data snapshots

Issue: How frequently should county and municipal vector data layers be captured in archives?

Parcels, centerlines, jurisdictions, zoning, …

Parcel Boundary Changes 2001-2004, North Raleigh, NC
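How often to capture a layer depends in part on how quickly it changes. As a rough illustration, two snapshots can be compared by feature ID to count added, removed, and modified features. The sketch below assumes each snapshot is a mapping from a hypothetical parcel ID to a geometry string (for example WKT); real parcel data would need a more careful geometry comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSummary:
    added: int
    removed: int
    modified: int
    unchanged: int

    @property
    def fraction_changed(self) -> float:
        """Share of features (across both snapshots) that differ."""
        total = self.added + self.removed + self.modified + self.unchanged
        return (total - self.unchanged) / total if total else 0.0


def compare_snapshots(old: dict[str, str], new: dict[str, str]) -> ChangeSummary:
    """Compare two snapshots mapping feature ID -> geometry text."""
    old_ids, new_ids = set(old), set(new)
    shared = old_ids & new_ids
    modified = sum(1 for fid in shared if old[fid] != new[fid])
    return ChangeSummary(
        added=len(new_ids - old_ids),
        removed=len(old_ids - new_ids),
        modified=modified,
        unchanged=len(shared) - modified,
    )
```

A layer whose `fraction_changed` is small between yearly captures may not need more frequent snapshots, while a rapidly changing one may; the threshold is a policy decision, not something this sketch settles.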

Page 23:

NC frequency of data capture surveys

• How often should continually changing vector datasets be captured?
• Tap into data custodian understanding of production patterns and uses
• Tap into local innovation
• Learn about local business drivers for data archiving
– 2006 and 2008 surveys of NC cities and counties
– 2008 survey of archival practice in state agencies in NC
– Planned survey of data users in NC

http://www.nconemap.com/AboutNCOneMap/tabid/289/Default.aspx#preservation

Page 24:

Preservation approaches: Desiccated data

Complex data representations can be made more preservable (and less useful) through simplification

Page 25:

Preservation approaches: Desiccated data

• Complex documents may be very hard to preserve over time
– GIS project files
– Layer definitions
– Web services or API interactions
• Image outputs capture some sense of final product, but lose underlying data intelligence

Page 26:

Cartographic outputs – analogous to the old paper maps

Combined datasets, with data models, classification, symbolization, annotation

More data intelligence than in images

Desiccated data: PDF and GeoPDF

Page 27:

Desiccated data: Geospatial PDF

• Explosion of geospatial PDF content in past few years

• Standards issues
– GeoPDF: proprietary TerraGo technology
– PDF an open ISO standard
– Open PDF variants created through ISO standards process (PDF/E, PDF/X, PDF/A, …)

• PDF content retained in addition to, NOT instead of data

Page 28:

Preservation approaches: Historical WMS tile caches?

No market for archived tiles without standard way to describe tiles and without commonly used tiling schemes

Page 29:

Preservation approaches: Historical WMS tile caches?

• Tile cache systems developed for more responsive WMS or mapping systems
– WMS Tile Caching (WMS-C) incubated by OSGeo
– WMTS (Web Map Tiling) OGC white paper
• No explicit temporal component in WMS-C or WMTS

To what extent do temporal geospatial systems become video-like?
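For contrast with the tiling proposals, the base WMS specification does define an optional TIME request parameter for layers that advertise a time dimension. The sketch below builds a WMS 1.1.1 GetMap URL that asks for a given year; the endpoint, layer name, and bounding box are hypothetical, and a server would have to publish a time-enabled layer for the request to mean anything.

```python
from urllib.parse import urlencode


def getmap_url(endpoint: str, layer: str,
               bbox: tuple[float, float, float, float],
               time: str, width: int = 512, height: int = 512) -> str:
    """Build a WMS 1.1.1 GetMap URL that includes the optional TIME parameter."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                      # empty value requests default styling
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time,                      # e.g. "1999" or an ISO 8601 date
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer; approximate extent near downtown Raleigh.
url_1999 = getmap_url("https://example.org/wms", "county_ortho",
                      (-78.65, 35.77, -78.63, 35.79), "1999")
```

Requesting the same extent with different TIME values is one way a client could step through superseded imagery, which is where the comparison to video comes in.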

Page 30:

• Use Sanborn map slide or replacement

Pronounced local agency interest in archiving, digitizing, and geo-referencing older analog products

Old maps coming into the GeoWeb …

Page 31:

New archiving interest: Location-based content

Present-day value in location-based services and mobile applications

Street Views

Oblique Imagery

3D Images

Page 32:

Future value of non-spatial place-based imagery as cultural heritage resource

More descriptive of place and function than spatial imagery

New archiving interest: Location-based content

Page 33:

Moving forward

• GICC Archival and Long-Term Access Committee

• Geo Multistate Archive and Preservation Partnership (GeoMAPP)

• OGC Data Preservation Working Group

Page 34:

Community response to data archiving challenge

• Nov. 2007: NC Geographic Information Coordinating Council (GICC): Ten Recommendations in Support of Geospatial Data Sharing released
– Recommendation: “Establish archive and long term data access strategies”
– Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”

Page 35:

GICC Archival and Long-Term Access Committee

• Initiated in response to agency requests for guidance on temporal data management

• Federal, state, regional, and local agency representation

• Key focus
– Best practices for data snapshots and retention
– State Archives processes: appraisal, selection, retention schedules, etc.
– Who, What, Why, When, Where, How

Page 36:

Geo Multistate Archive and Preservation Partnership (GeoMAPP)

• Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress

• Partners:
– State geospatial organizations of Kentucky and Utah
– State Archives of Kentucky and Utah
– NCSU Libraries in catalytic/advisory role
• State-to-state and geo-to-Archives collaboration
• 2 year project: Nov. 2007-Dec. 2009
• Archives as part of Spatial Data Infrastructure

Page 37:

OGC Data Preservation Working Group

• Formed Dec. 2006
• Engage archival community
• Find points of intersection with other OGC activities:
– GML for archiving
– Content packaging
– Large scale data transfers
– Time in decision support

Page 38:

The Content Packaging Problem

XML Database Export

TIFF Images
• Pixel Value and Header file
• World file
• Coordinate System file
• Metadata file

Shapefiles
• Geometry file
• Index file
• Attribute file
• Metadata file
• Coordinate System file
• Spatial Index files

Potential Ingest Objects

Files
• Multi-file dataset
• Georeferencing
• Metadata file
• Symbols file
• Additional documentation
• License
• Disclaimer
• More

Metadata
• ISO/FGDC
• Acquisition metadata
• Transfer metadata
• Ingest metadata
• Archive rights
• Archive processes
• Collection metadata
• Series metadata

Metadata Exchange Format (MEF) in GeoNetwork a form of content packaging
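To make the packaging problem concrete, here is a minimal sketch that bundles the component files of one multi-file dataset (for example a shapefile's geometry, index, attribute, projection, and metadata files) into a single zip with a small manifest. The manifest layout and function name are assumptions for illustration only; this is not the MEF format or any OGC packaging specification.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def package_dataset(folder: Path, stem: str, out_zip: Path) -> list[str]:
    """Bundle every file named `<stem>.*` in `folder` into one zip plus a manifest.

    Returns the names of the packaged component files.
    """
    members = sorted(
        (p for p in folder.iterdir()
         if p.is_file() and p.name.split(".", 1)[0] == stem),
        key=lambda p: p.name,
    )
    manifest = {
        "dataset": stem,
        "files": [
            {"name": p.name,
             "sha256": hashlib.sha256(p.read_bytes()).hexdigest()}
            for p in members
        ],
    }
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for p in members:
            bundle.write(p, arcname=p.name)
        # Hypothetical manifest layout: dataset name plus per-file checksums.
        bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
    return [p.name for p in members]
```

Matching on the file stem keeps sidecar files such as `parcels.shp.xml` with the dataset. Licenses, disclaimers, and the archive-side metadata listed above would still need a place in the package, which is the harder part of the problem.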

Page 39:

Questions?

Contact:

Steve Morris
Head, Digital Library Initiatives
NCSU [email protected]

NCGDAP site: http://www.lib.ncsu.edu/ncgdap/
