cessda experts meet (and eat well) cessda council of european social science data archives...
Post on 15-Jan-2016
Cessda Experts meet (and eat well)
CESSDA: Council of European Social Science Data Archives
CESSDA is an informal combination of the West European data archives with the aim of a free and intensive exchange of data. CESSDA was established in 1976 by the then existing European data archives. The President is at present Kevin Schurer from UKDA and the secretariat is held by Hans Jørgen Marker from the Danish Data Archive.
Participants from 14 archives, 36 people
Alfredsson, Iris (SSD) [Iris.Alfredsson@ssd.gu.se]
Alvheim, Atle (NSD) [atle.alvheim@nsd.uib.no]
Aparicio Galparsolo, Carmen (ARCES) [agalparsolo@cis.es]
Bischof, Christian (WISDOM) [bischof@wisdom.at]
Breitenstein-Leuba, Christine (SIDOS) [Christine.Breitenstein@sidos.unine.ch]
Campo Gan, Isabel do (ARCES) [idocampo@cis.es]
Campo Ladero, Mª Jesús (ARCES) [mcampo@cis.es]
Clausen, Nanna Floor (DDA) [nc@dda.dk]
Esclapez Pizana, Ricardo (ARCES) [resclapez@cis.es]
Fredzu, Christina (GSDB) [cfredzu@ekke.gr]
Garrido González, Beatriz (ARCES) [bgarrido@cis.es]
Hadorn, Reto (SIDOS) [reto.hadorn@sidos.unine.ch]
Hidalgo Rodríguez, Miguel Ángel [mhidalgo@cis.es]
Jääskeläinen, Taina (FSD) [Taina.Jaaskelainen@uta.fi]
Kastrun, Tomaz (WISDOM) [tomaz@wisdom.at]
Keckman-Koivuniemi, Hannele (FSD) [hannele.keckman-koivuniemi@uta.fi]
Kleemola, Mari (FSD) [mari.kleemola@uta.fi]
Krone, Michael (DDA) [mk@dda.dk]
Laseca Arellano, Jesús (ARCES) [jlaseca@cis.es]
Linardis, Apostolos (GSDB) [alinardis@ekke.gr]
López Cabanas, Raquel (CIS) [rlopez@cis.es]
Marosi, Veronika (SSD) [Veronika.Marosi@ssd.gu.se]
Mauer, Reiner (ZA) [mauer@za.uni-koeln.de]
Miller, Kenneth P (UKDA) [millk@essex.ac.uk]
Moschner, Meinhard (ZA) [moschner@za.uni-koeln.de]
Mykkeltvedt, Alette (NSD) [Alette.Mykkeltvedt@nsd.uib.no]
Rey del Castillo, Pilar (ARCES) [prey@cis.es]
Richardson, Matthew (ICPSR) [matvey@umich.edu]
Rodríguez Chico, Nieves (ARCES) [nrodriguez@cis.es]
Sáez de Castro, Valentina (ARCES) [bases@cis.es]
Vipavc Brvar, Irena (ADP) [irena.vipavc@fdv.uni-lj.si]
Volchkina, Natalia (SSDA) [natasha@vms.huji.ac.il]
Watteler, Oliver (ZA) [watteler@za.uni-koeln.de]
Weatherall, Jo (UKDA) [weatherallj@essex.ac.uk]
Wittenberg, Marion (DANS) [Marion.Wittenberg@dans.knaw.nl]
Small group vs wider community
“Human endeavor is caught in an eternal tension between the effectiveness of small groups acting independently and the need to mesh with the wider community.
A small group can innovate rapidly and efficiently, but this produces a subculture whose concepts are not understood by others. Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication.
The world works across the spectrum between these extremes, with a tendency to start small—from the personal idea—and move toward a wider understanding over time.”
Tim Berners-Lee, James Hendler and Ora Lassila The Semantic Web,
Scientific American, May 2001
Madiera Cessda
History, as it is known to man ….
[Timeline figure, 1945-2000. Technology milestones: transistors, UNIVAC 1, integrated circuits, Internet, TCP/IP, microprocessors, IBM PC, Apple Macintosh, HTML, WWW, Mosaic. Infrastructure for the social sciences: mainframe, mini, PC, Internet/WWW, Java, XML.]
CESSDA
COUNCIL OF EUROPEAN SOCIAL SCIENCE DATA ARCHIVES
1960 - ZA
1962 - ICPSR
1967 - UKDA
1972 - NSD
Tools for thought
“Much more time went into finding or obtaining information than into digesting it. ”
Dr. J.C.R. Licklider
Maximize: (time spent on digesting and thinking) / (time spent on finding and accessing)
Within the Cessda framework, we have been running a focused joint project for more than 10 years
[Timeline figure, 1994-2009, divided into "History" and "Future". Labels: CESSDA Meeting 1994; CESSDA IDC; NESSTAR project; FASTER; Nesstar ltd; Limber; DDI; Cessda DDI Group; Madiera; IDC2 = cCat; Cessda MMG Group; Metadater; ELSST Thesaurus; eSSENCE; HASSET Thesaurus]
A formulation of the long-term ambition became the “Point of Departure” for “the Social Science Dream Machine”
• all existing empirical data available online
• one integrated resource discovery gateway in order to identify and locate relevant resources
• extensive amounts of metadata available integrated with the data
• data should be available in any format
• ...and in any language
• the ability to carry out simple browsing, visualisation and analysis of the data on-line
• comparable data should easily be identified
• hyperlinks from the metadata to every relevant report or publication
• an efficient feedback system to the body of metadata allowing the user to add to the collective memory of a data source
(Jostein Ryssevik/Simon Musgrave)
From IDC to IDC II
Integrated Data Catalogue
[Diagram. IDC: some metadata. IDC II = cCat: metadata, data, knowledge.]
What is actually data ?
Data is a representation of an object or a process in the “real world”.
“The real World” Representation of “the real world”.
Data Information Knowledge
Metadata
Analysis
What comes out as specified MADIERA Objectives ?
• The development of an integrated and effective distributed social science portal to facilitate access to a range of data archives and disparate resources. WP3
• The development of specific add-ons to existing virtual data library technologies, in particular data location technology WP4
• The employment of a multi-lingual thesaurus to break the language barriers to the discovery of key resources. WP5
• An extensive programme to add content, both at the data/information and knowledge levels. WP6
• Extensive training of data providers and users to encourage the continuous growth of the infrastructure WP7
• The gradual integration of the emerging national infrastructures of the new candidate countries to the EU into the European Research Area
A COMMON GATEWAY
DATA ARCHIVES
DATA USERS
Global Access - Local Support
Madiera portal: A second layer on top of already existing data infrastructures
Data Providers
Publish data
Publish metadata
Add documents
End User
Locate data
Multilingual access
On-line data analysis
Downloading data
Browsing metadata
Browsing documents
Uploading “knowledge”
Adding comments to metadata
Adding links to publications
The Madiera Infrastructure
Data, +
Metadata standards
Multilingual thesaurus
Access control
User added information
Publications
Madiera: Some keywords
• Timely and convenient Technology
– Ease of location
– Ease of access
Standards
Instruments
Searching in text
Drill-down via topical classifications
Geo-referencing, geographical search
Look up comparable data
Online analysis
Downloading possibilities
DDI Implementations
Nesstar Publisher, or XML editors
• Content requires
• “Politics”
Access rules, legal requirements
Documentation of activity
From Madiera to IDC2, inclusion of data
Do we tend to forget or take for granted the most important things ?
• Quality data is the key to quality research
• Standardization is the key to common solutions
• Success is measured through use
Implementations of the DDI on “The European Scene”
An example: A list of mandatory, recommended or optional elements
DDI Codebook Outline - Fields | Comment | Recommendation
2.0 stdyDscr+ | (ATT == access) The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. |
2.1 citation+ | (ATT == MARCURI) Citation for the data collection described by the marked-up documentation. If the data producer is not the documentation producer, then the "source" attribute should be used to distinguish. |
2.1.1 titlStmt | (ATT == ) |
2.1.1.1 titl | (ATT == ) | Mandatory in the original language
2.1.1.2 subTitl* | (ATT == ) |
2.1.1.3 altTitl* | (ATT == ) |
2.1.1.4 parTitl* | (ATT == ) | Mandatory at least in English (UK)
2.1.1.5 IDNo* | (ATT == agency) (M) | Mandatory
2.1.2 rspStmt? | (ATT == ) |
2.1.2.1 AuthEnty* | (ATT == affiliation) (R) The person, corporate body, or agency responsible for the data collection's substantive and intellectual content. Repeat the element for each author, and use the affiliation attribute if available. | Mandatory (recommended in English)
2.1.2.2 othId* | (ATT == type, role, affiliation) |
2.1.3 prodStmt? | (ATT == ) |
2.1.3.1 producer* | (ATT == abbr, affiliation, role) The producer of the data collection is the person or organization with the financial or administrative responsibility for the physical processes whereby the data collection was brought into existence. Use the role attribute to distinguish different stages of involvement in the production process. | Recommended
2.1.3.2 copyright? | (ATT == ) Copyright statement for the data collection. | Recommended
2.1.3.3 prodDate* | (ATT == date) |
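A recommendation list like this can be checked mechanically. The sketch below is only an illustration, not part of any CESSDA or Nesstar tool: it assumes a DDI Codebook file without XML namespaces, derives the element paths from the outline numbering above, and uses a made-up sample record. The function name missing_mandatory and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements marked "Mandatory" in the list above, as paths below <stdyDscr>.
# The nesting follows the outline numbering (2.1 citation, 2.1.1 titlStmt, ...).
MANDATORY_PATHS = {
    "titl": "citation/titlStmt/titl",
    "parTitl": "citation/titlStmt/parTitl",
    "IDNo": "citation/titlStmt/IDNo",
    "AuthEnty": "citation/rspStmt/AuthEnty",
}


def missing_mandatory(ddi_xml):
    """Return the sorted names of mandatory elements that are absent or empty."""
    root = ET.fromstring(ddi_xml)
    study = root if root.tag == "stdyDscr" else root.find(".//stdyDscr")
    if study is None:
        # No study description at all: everything is missing.
        return sorted(MANDATORY_PATHS)
    missing = []
    for name, path in MANDATORY_PATHS.items():
        elements = study.findall(path)
        # An element counts as present only if it carries some text.
        if not any((e.text or "").strip() for e in elements):
            missing.append(name)
    return sorted(missing)


# Hypothetical sample record (assumption: no namespace, invented content).
SAMPLE = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example study title</titl>
        <IDNo agency="EXAMPLE">X0001</IDNo>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>
"""

if __name__ == "__main__":
    print(missing_mandatory(SAMPLE))
```

The check only tests presence; the language requirements ("in the original language", "at least in English") would need additional rules that the list does not specify.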
Another example: A Common Topical Classification of Studies across Archives
ECONOMICS
- Economic Conditions and Indicators
- Consumption/Consumer Behaviour
- Economic Policy (including budget and fiscal policy, public expenditure and public revenue)
- Income, Property and Investment/Saving
- Rural economics
- Economic Systems and Development
LABOUR AND EMPLOYMENT
- employment
- unemployment
- retirement
- (in-job) training
- labour relations/conflict
- working conditions
POLITICS
- Mass political behaviour, attitudes/opinion
- elections
- political ideology
- domestic political issues
- government, political systems and organisations
- international politics and organisations
- conflict, security and peace
Multilingual Thesaurus
• Produced a Thesaurus for Social Science use
• Produced Guidelines for Thesaurus construction
and translation
• Translations to several European languages
• Set up an administrative framework for
additions of concepts and local extensions and
future maintenance of the thesaurus
Content Provision
• Guidelines for content provision
• Training packs and user guides
• Workshops
• A common template for documentation work
Instruments: The NESSTAR Publisher or similar products
Technology: A somewhat expanded NESSTAR
Data location technology expanded, in addition to ordinary searching in free text
• Searching in text in certain specific DDI elements
• Drill down via TopcClas or ELSST
• Geo-referencing, geographic search
• Look up comparable data
The present Madiera Portal
1892 datasets
The Madiera Portal, drill-down
The Madiera Portal, ELSST based
Expanded search across many languages
AGREEMENTS: ABKOMMEN, SOPIMUKSET, AVTALER, AFTALE, ΣΥΜΦΩΝΙΕΣ, ACUERDOS, ACCORDS, AVTAL
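The idea behind the expanded search can be sketched in a few lines. This is a toy illustration, not the ELSST data model or the Madiera implementation: the thesaurus holds only the one concept shown on the slide, the language codes are added here as an assumption, and the study identifiers and keywords are invented.

```python
# Toy thesaurus with the single concept shown above. The language codes are
# an assumption added for readability; a real thesaurus has many concepts.
THESAURUS = {
    "AGREEMENTS": {
        "en": "AGREEMENTS",
        "de": "ABKOMMEN",
        "fi": "SOPIMUKSET",
        "no": "AVTALER",
        "da": "AFTALE",
        "el": "ΣΥΜΦΩΝΙΕΣ",
        "es": "ACUERDOS",
        "fr": "ACCORDS",
        "sv": "AVTAL",
    }
}


def expand_term(term):
    """Return every language label of the concept that `term` belongs to."""
    needle = term.casefold()
    for labels in THESAURUS.values():
        if needle in (label.casefold() for label in labels.values()):
            return set(labels.values())
    # Unknown terms are searched as they are.
    return {term}


def search(term, studies):
    """Return ids of studies whose keywords match the term in any language."""
    wanted = {label.casefold() for label in expand_term(term)}
    return [
        study_id
        for study_id, keywords in studies.items()
        if wanted & {keyword.casefold() for keyword in keywords}
    ]


# Hypothetical studies with keywords in their own language.
STUDIES = {
    "study-fi": ["sopimukset"],
    "study-es": ["Acuerdos", "comercio"],
    "study-other": ["elections"],
}

if __name__ == "__main__":
    # A German search term finds the Finnish and the Spanish study.
    print(search("Abkommen", STUDIES))
```

The match works because every study was indexed with a thesaurus term; free-text keywords outside the thesaurus would not be reached this way.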
Expanding data availability
We might find things we are not able to interpret or understand
With keywords it is easier to interpret or translate
Standardized keywords are better than home-made solutions
Geography
Looking up comparable data
The DDI-related work on how to document comparative data has shown that this is a complicated problem
Some studies are comparative by design, which means that the DDI problem is to describe complex structures
But many data open up possibilities for comparison or analysis across 3 dimensions without being designed for that purpose
Still, it means that it is at the conceptual level we can establish the comparability; plain question wording is not enough
We have to insert concepts at variable level, and we need a sophisticated but standardized vocabulary for doing it: that is – a thesaurus or ontology
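A minimal sketch of what concepts at variable level make possible, under loud assumptions: the studies, variable names, question texts and concept labels below are all invented for illustration, and the labels are not claimed to be actual ELSST terms.

```python
from collections import defaultdict

# Hypothetical variables: (study, variable name, question text, concepts).
# The question wordings differ, but the attached concepts make them findable.
VARIABLES = [
    ("Study A", "v12", "Should immigration be reduced?", {"IMMIGRATION", "ATTITUDES"}),
    ("Study B", "q7", "Sollte die Zuwanderung begrenzt werden?", {"IMMIGRATION"}),
    ("Study B", "q9", "Did you vote in the last election?", {"ELECTIONS"}),
]


def comparable_by_concept(variables):
    """Map each concept to the variables carrying it, keeping only concepts
    that occur in at least two different studies."""
    by_concept = defaultdict(list)
    for study, name, _question, concepts in variables:
        for concept in concepts:
            by_concept[concept].append((study, name))
    return {
        concept: refs
        for concept, refs in by_concept.items()
        if len({study for study, _ in refs}) >= 2
    }


if __name__ == "__main__":
    for concept, refs in comparable_by_concept(VARIABLES).items():
        print(concept, refs)
```

Matching on the question text alone would miss the pair above, since the two questions share no words; the shared concept is what links them.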
How to pick concepts from ELSST and
insert in your data documentation
More concepts – comparable data
A scheme
A Finnish researcher
A Swiss researcher
An Irish researcher
Attitudes towards immigrants (A problem area)
Data on Finland (A geographic area)
Eurobarometer (A data collection)
AIM: Ideal end product
• A registration procedure, register with home archive
• Look up and access data across holdings
A ”Data-archive Political Context” for 20+ national archives
I. There might be money involved. Is the data a free or a commercial good? There are categories of users: what about non-academic use, non-CESSDA use? Who is to fix the prices?
II. Varying access rules. The crossing of national borders: what laws apply? Who sets the rules? Who is responsible? What sanctions are available?
III. There are some “common good” data: Eurobarometers, Value studies, ISSP, ESS, comparative collections. These could best be provided from one single point (?). Charging? Access conditions? Double storage?
IV. It is a good thing to have national archives: they enhance the amount of data available and improve accessibility. They need justification and visibility.
An integrated common data catalogue
• Archives formalise access policy
• Develop a user registration system
• Probably wise that archives try to standardise information collected in the user registration process
• Information on registered users and available data stored in a (local) user database
• An end user agreement with some legal status
• Archives accept authentication through other archives
• Nesstar writes an extended log
• Every archive functions as an access point to the common catalogue, which enhances visibility
• Make it possible to take out accurate statistics on use
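The points above (local user database, authentication accepted through other archives, an extended log, statistics on use) can be sketched as a small model. It is a sketch under stated assumptions only: the Archive class, its fields and the archive, user and dataset names are hypothetical and do not describe how Nesstar or any archive actually implements this.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Archive:
    name: str
    users: set = field(default_factory=set)      # local user database
    datasets: set = field(default_factory=set)   # locally held data
    trusted: dict = field(default_factory=dict)  # archives whose registrations are accepted
    log: list = field(default_factory=list)      # extended access log

    def is_registered(self, user):
        return user in self.users

    def request(self, user, home, dataset):
        """Grant access if the dataset is held here and the user is registered
        with this archive or with a trusted home archive. Log every attempt."""
        home_archive = self if home == self.name else self.trusted.get(home)
        granted = (
            dataset in self.datasets
            and home_archive is not None
            and home_archive.is_registered(user)
        )
        self.log.append((user, home, dataset, granted))
        return granted

    def usage_statistics(self):
        """Count granted accesses per dataset from the log."""
        return Counter(ds for _user, _home, ds, ok in self.log if ok)


# Hypothetical set-up: a user registers once, with the home archive.
archive_a = Archive("Archive A", users={"researcher1"})
archive_b = Archive("Archive B", datasets={"survey-2004"})
archive_b.trusted["Archive A"] = archive_a

if __name__ == "__main__":
    print(archive_b.request("researcher1", "Archive A", "survey-2004"))
    print(archive_b.request("stranger", "Archive A", "survey-2004"))
    print(archive_b.usage_statistics())
```

Logging refused requests as well as granted ones is a design choice here; it keeps the statistics on use separate from the record of who asked for what.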
A Web of Social Science
• Building on a distributed model where data and resources are stored and maintained locally
• For the end user the system will appear as an integrated system
• A virtual data library offering global access to locally supported data holdings
End users
Data providers