Post on 26-Dec-2015

ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS

Suzanne Huffman

Digital Resources Librarian

Simpson Library

DATA AND DIGITAL PROJECTS

Nothing really important is ever headlined “Here is some data. Hope you find something interesting.” Annotation is critical. Editing is critical.

-Amanda Cox, New York Times Graphics Editor

http://www.slideshare.net/openjournalism/amanda-cox-visualizing-data-at-the-new-york-times

The importance of context

http://www.nejm.org/doi/full/10.1056/NEJMp1402114?query=TOC&

Add value to your data

• Regardless of website functionality, annotation and guidance are important

• Look at other digital projects and test out similar sites to see what types of discovery and analysis activities you intuitively want to do with similar types of data

• Follow web design methodology by creating user stories and acting them out with mockups, wireframes, or simple spreadsheets

ORGANIZING AND STRUCTURING DATA

Metadata

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

http://www.niso.org/publications/press/UnderstandingMetadata.pdf

Choosing metadata parameters

• Ask yourself why you are collecting the information you want to collect (perform a needs assessment)

• Focus on the outcomes and analysis tasks you want your site’s users to be able to perform with your data

• Depending on your audience, don’t assume users know what each metadata field or category means

• Choose data fields and parameters and do a few test runs before analyzing the data to determine if you need to add, change, or edit any of your fields

What is good metadata?

According to Understanding Metadata by the National Information Standards Organization, good metadata…

• Should be appropriate to the materials in the collection, users of the collection, and intended, current, and likely use of the digital object

• Supports interoperability

• Uses standard controlled vocabularies to reflect the what, where, when, and who of the content

• Includes a clear statement on the conditions and terms of use for the digital object

• Should have the qualities of archivability, persistence, and unique identification, and should be authoritative and verifiable

• Supports the long-term management of objects in collections

Choose and use standard terminology

• Use a controlled vocabulary that provides preferred keywords and terminology for specific items

• Create a data dictionary and be consistent in applying it

• Example data dictionary entry: Dates are displayed in the yyyy-mm-dd format; for example, March 15, 2015 would appear as 2015-03-15

• A data dictionary helps prevent inconsistencies in data entry and analysis, for example when "T", "temp", and "t" are all used interchangeably within a single dataset to refer to temperature measurements
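The two data dictionary examples above (one date format, one preferred term for temperature) can be sketched in Python. This is only an illustration: the `FIELD_ALIASES` table and the function names are hypothetical, and the input date format is assumed to be "Month day, year" in English.

```python
from datetime import datetime

# Hypothetical data dictionary: preferred term -> variants accepted on input.
FIELD_ALIASES = {
    "temperature": {"t", "temp", "temperature"},
}


def normalize_field(name: str) -> str:
    """Map a field label to the preferred term in the data dictionary."""
    key = name.strip().lower()
    for preferred, variants in FIELD_ALIASES.items():
        if key in variants:
            return preferred
    raise ValueError(f"{name!r} is not defined in the data dictionary")


def normalize_date(text: str, input_format: str = "%B %d, %Y") -> str:
    """Convert a date such as 'March 15, 2015' to the yyyy-mm-dd format."""
    return datetime.strptime(text, input_format).strftime("%Y-%m-%d")
```

Rejecting unknown labels with an error, instead of passing them through, is a design choice that keeps new variants from slipping into the dataset unnoticed.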

Metadata dos

• Select your keywords wisely and think about the many ways someone might search for your data

• Use your data dictionary whenever possible to create keywords to establish a controlled vocabulary

• Use descriptive and clear writing

• Ensure that all data fields are independent and that they could exist on their own

Metadata don’ts

• Do not use jargon; define technical terms and acronyms and put them in your data dictionary

• Remember that a computer will read the information in the metadata record, so do not use tabs, indents, or special characters like ! @ # % { } | / \ < > ~ that may be misunderstood

• Do not copy and paste content from Word documents or other sources into your metadata record (use a text editor as a middle step to prevent unnecessary characters and errors being introduced)

Metadata is structured information

Example Dublin Core record in XML
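The record shown on this slide is not reproduced in the transcript. As a stand-in, here is a minimal sketch of how a simple Dublin Core record might be generated with Python's standard library. The `metadata` wrapper element, the function name, and the `type` and `language` values are assumptions for illustration; only the Dublin Core namespace and element names come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Build a small XML record with one dc: element per field."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Title and creator are taken from this presentation; the other
# values are placeholders.
record = dublin_core_record({
    "title": "Organizing and Structuring Data for Digital Projects",
    "creator": "Huffman, Suzanne",
    "type": "Text",
    "language": "en",
})
print(record)
```

Each field becomes its own named element, which is what makes the record "structured": a program can pull out the creator or title without guessing where it sits in a block of text.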

Data structure

• Structuring your data is important to ensure your site functions well and that the dataset can be used in a variety of ways

• Ways to structure data:

• For Excel or Google spreadsheets, save your data as CSV files in plain text format

• XML documents can be easily created through online data-entry forms and contain your metadata within a structured framework
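For the CSV option, a minimal sketch using Python's built-in `csv` module is shown here. The function names and the column names used in the usage note are hypothetical; UTF-8 is an assumed encoding choice.

```python
import csv


def write_csv(path, fieldnames, rows):
    """Write a list of dictionaries to a plain-text CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    """Read the CSV file back as a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `write_csv("items.csv", ["identifier", "title", "date"], rows)` would produce a file that opens in a spreadsheet program or any text editor. Letting the `csv` module handle quoting means values that contain commas are still read back correctly.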

Data best practices

• Make sure your data is portable

• Save a copy in an additional location outside your site, in a machine-readable, non-proprietary format

• Portable data is flexible, sharable, and can be harvested by a variety of tools for usage in future projects

Quality assurance and control

• Restrict what information can be entered into the dataset

• Limit the use of free text fields for metadata

• Use lookup tables or drop-down menus for data entry

• Use validation tools

• Do manual review

• Clean up and normalize messy data with tools like OpenRefine
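The lookup-table and validation ideas above could look something like the following sketch. The `ALLOWED_TYPES` list and the `type` and `date` field names are hypothetical; a real project would take them from its own data dictionary.

```python
import re
from datetime import datetime

# Hypothetical lookup table: the only values accepted in the "type" field.
ALLOWED_TYPES = {"image", "text", "sound", "dataset"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    if record.get("type") not in ALLOWED_TYPES:
        problems.append(f"type: {record.get('type')!r} is not in the lookup table")

    date = str(record.get("date", ""))
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        problems.append(f"date: {date!r} is not in yyyy-mm-dd format")
    else:
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"date: {date!r} is not a real calendar date")

    return problems
```

Returning a list of problems, instead of stopping at the first one, makes it easier to hand a full report to whoever does the manual review.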

DATA MANAGEMENT

What is data management?

Data management refers to all aspects of creating, housing, delivering, maintaining, archiving, and preserving data. A data management plan accounts for every activity within the data life cycle.

https://www.dataone.org/best-practices

Contents of data management plan (DMP)

• Data Type and Format

• Data Storage

• Data Standards

• Data Security

• Data Sharing

• Long-term Access

Check out VCU Libraries’ Research Data Management Guide at http://guides.library.vcu.edu/c.php?g=47977&p=300081

Data citation and preservation

Citation

Dataset citations should include (at a minimum):

Creator (PublicationYear): Title. Publisher. Identifier.

The Identifier could be a DOI or just the website’s URL
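The citation pattern above is simple enough to generate from a metadata record. The following is a minimal sketch; the function name is hypothetical, and the check for empty fields is an added assumption based on the "at a minimum" wording.

```python
def format_citation(creator, year, title, publisher, identifier):
    """Format a dataset citation as
    'Creator (PublicationYear): Title. Publisher. Identifier.'
    """
    parts = {
        "creator": creator,
        "year": year,
        "title": title,
        "publisher": publisher,
        "identifier": identifier,
    }
    # Every element of the minimum citation must be present.
    missing = [name for name, value in parts.items() if not str(value).strip()]
    if missing:
        raise ValueError(f"missing citation elements: {', '.join(missing)}")
    return f"{creator} ({year}): {title}. {publisher}. {identifier}."
```

With placeholder values, `format_citation("Doe, J.", 2015, "Example Dataset", "Example Repository", "https://example.org/dataset")` returns `Doe, J. (2015): Example Dataset. Example Repository. https://example.org/dataset.`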

Preservation

Good documentation on data provenance (the origin and history of a dataset) is crucial.

If data cannot be recreated or if it is costly to reproduce, it should be saved.

Datasets that have significant long-term value may be contributed to a repository for preservation.

Data repositories

These repositories can be used to find data for reuse or to deposit your research data for preservation and sharing:

Questions? Comments?

Thank you!

shuffman@umw.edu

540-654-1756

Please contact me if you need assistance with managing and organizing data in your research or teaching projects.

References and Resources

• http://www.slideshare.net/openjournalism/amanda-cox-visualizing-data-at-the-new-york-times

• http://www.nejm.org/doi/full/10.1056/NEJMp1402114?query=TOC&

• http://www.niso.org/publications/press/UnderstandingMetadata.pdf

• http://www.dlib.indiana.edu/~jenlrile/metadatamap/seeingstandards.pdf

• https://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf

• http://guides.library.vcu.edu/c.php?g=47977&p=300081
