VERNACULAR CLASSIFICATION: HUMANITIES NETWORKED INFRASTRUCTURE (HUNI)
Department of Digital Humanities Toby Burrows
HuNI (Humanities Networked Infrastructure)
• Aggregates data from 30 different Australian humanities datasets
• Data are defined as entities occurring in the source datasets: 740,000 entities in all
• Harvested records are mapped to one of six basic categories
• No imported relationships between entities
• No de-duplication of entities
Challenges for HuNI
• How to organize and link heterogeneous data for browsing – without entirely pre-determining the structure and relationships
• How to make the aggregated data useful – without imposing too much of a conceptual framework
• How to respect the different disciplinary perspectives reflected in the source datasets
• Researchers need to be able to record and share their views about the data
[Figure: HuNI data sources, grouped by discipline, mapped to HuNI record categories. More icons = more records.]
Data sources: ADB, AFIRC, AMHD, APFA, AUSTLANG, AusStage, AustLit, AWAP, Bonza, CAARP, CircusOz, DAAO, EMEL, EOAS, F&C (x9), GO, LD, MAP, Mura, OA, PARADISEC, SAUL, WALL
Disciplines: Biography, Linguistics, Literature, Media, Performing arts, Social history, Visual arts
HuNI record categories: Concept, Event, Organisation, Person, Place, Work
PERSON A natural person
ORGANISATION A company, club, trust, gallery, political party, etc
WORK A cultural artefact or “man-made” thing created by someone, that has some existence in its own right, either physical or digital
PLACE A real, spatial location
EVENT An activity that occurs in space and time and may involve people, organisations, places, works, etc.
CONCEPT Something whose existence is primarily mental
http://wiki.huni.net.au/display/DS/Data+Model
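The six categories could be sketched in code as follows. This is an illustration only: the names `Category`, `Entity`, and `map_record`, the field names, and the sample records are hypothetical and are not taken from the HuNI data model.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The six basic HuNI record categories named on the slide."""
    PERSON = "Person"
    ORGANISATION = "Organisation"
    WORK = "Work"
    PLACE = "Place"
    EVENT = "Event"
    CONCEPT = "Concept"


@dataclass(frozen=True)
class Entity:
    """One harvested record (hypothetical fields, not HuNI's schema)."""
    source: str       # source dataset, e.g. "AusStage"
    source_id: str    # identifier within that dataset (invented here)
    category: Category
    label: str


def map_record(source: str, source_id: str, category_name: str, label: str) -> Entity:
    """Map a harvested record to exactly one of the six categories.

    Raises ValueError if the category name is not one of the six.
    """
    try:
        category = Category[category_name.upper()]
    except KeyError:
        raise ValueError(f"unknown category: {category_name!r}") from None
    return Entity(source, source_id, category, label)


# No de-duplication: the same person harvested from two datasets
# stays as two distinct entities.
a = map_record("AusStage", "123", "person", "Jane Example")
b = map_record("DAAO", "987", "person", "Jane Example")
```

Keeping the source dataset on each entity is one way to reflect the absence of de-duplication: `a` and `b` share a label but remain separate records.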
HuNI: creating collections
• Users are able to create their own collections of data
• They can create categories and classifications, and assign individual entities to them
• Users can choose whether to make these collections public
• The list of public collections can be seen and browsed
• Individual entities show which public collections they belong to
• The graph for each entity also shows its membership of a public collection
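A minimal sketch of the collection behaviour described above, assuming a simple in-memory structure. `Collection`, `public_collections_for`, and all sample names and identifiers are hypothetical, not HuNI's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A user-defined grouping of entities (hypothetical structure)."""
    name: str
    owner: str
    public: bool = False  # the user chooses whether to publish
    entity_ids: set[str] = field(default_factory=set)


def public_collections_for(entity_id: str, collections: list[Collection]) -> list[str]:
    """Names of the public collections an entity belongs to, sorted."""
    return sorted(
        c.name for c in collections if c.public and entity_id in c.entity_ids
    )


# Invented sample data: two public collections and one private one.
collections = [
    Collection("Perth theatre 1950s", "alice", public=True, entity_ids={"e1", "e2"}),
    Collection("Draft notes", "alice", public=False, entity_ids={"e1"}),
    Collection("Women playwrights", "bob", public=True, entity_ids={"e1"}),
]
```

Here entity `e1` would display its membership of the two public collections, while the private "Draft notes" collection stays hidden.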
HuNI: socially-linked data
• Users are also able to create links between entities
• These links are public, by default
• There are no pre-determined links between entities
• Users can add to each other’s links, including disagreeing with them or contradicting them
• Links can describe any kind of reciprocal relationship
• There is no pre-determined ontology or vocabulary of relationships
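One way to picture socially-linked data is as a list of user-asserted links with free-text labels. The sketch below assumes that "reciprocal" means a link can be read from either end, which is an interpretation; `Link`, `links_for`, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A user-asserted relationship between two entities (hypothetical)."""
    source_id: str
    target_id: str
    label: str          # free text: no controlled vocabulary
    inverse_label: str  # reading of the same link from the other end
    author: str


def links_for(entity_id: str, links: list[Link]) -> list[tuple[str, str, str]]:
    """Return (label, other entity, author) for every link touching an entity.

    Links are kept side by side, so contradictory assertions by
    different users are all returned rather than reconciled.
    """
    result = []
    for link in links:
        if link.source_id == entity_id:
            result.append((link.label, link.target_id, link.author))
        elif link.target_id == entity_id:
            result.append((link.inverse_label, link.source_id, link.author))
    return result


# Invented sample data: two users attribute the same work to different people.
links = [
    Link("p1", "w1", "wrote", "was written by", "alice"),
    Link("p2", "w1", "wrote", "was written by", "bob"),
]
```

Nothing in this structure decides which attribution is correct. Both links remain visible on `w1`, along with who asserted each.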
HuNI: classification and categorization 1
• Specific individual entities and phenomena are the focus of the HuNI data aggregate
• There is as little pre-defined classification and categorization as possible
• HuNI avoids hierarchical ontological structures (= “flat ontologies”?)
• Entities are organized and presented primarily so that researchers can work with them and manipulate them – classifying entities into collections and creating links between individual entities
• HuNI is not organizing and presenting the entities so as to reflect an authoritative classification or organization of knowledge
HuNI: classification and categorization 2
• Not organizing the entities for structured or faceted search and retrieval
– Only indexing them for a basic keyword search
• Not organizing them into browsable semantic hierarchies
– Providing only basic browsing via the six categories (and the list of source datasets)
• HuNI is trying to find a middle ground between:
– The linguistic and conceptual limitations of “search”
– The imposition of a single “normative” ontology or classificatory semantic structure
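The two access paths named above, basic keyword search and browsing by category, can be sketched with a small inverted index. This is an assumption about how such indexing might look, not a description of HuNI's search engine; the entities and function names are invented.

```python
import re
from collections import defaultdict

# Hypothetical entities: (id, category, label)
ENTITIES = [
    ("e1", "Person", "Jane Example"),
    ("e2", "Organisation", "Example Theatre Company"),
    ("e3", "Work", "An Example Play"),
]


def build_index(entities):
    """Inverted index from lower-cased word to the ids of entities using it."""
    index = defaultdict(set)
    for entity_id, _category, label in entities:
        for word in re.findall(r"\w+", label.lower()):
            index[word].add(entity_id)
    return index


def keyword_search(index, query):
    """Ids of entities whose label contains every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


def browse(entities, category):
    """Ids of all entities in one of the six categories."""
    return [eid for eid, cat, _label in entities if cat == category]


index = build_index(ENTITIES)
```

Neither function knows anything about facets or hierarchies: search matches words, and browsing filters on a single flat category.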
HuNI: vernacular classification
• The user-contributed collections and links give meaning to the data
• Multiple interpretations and perceptions of relationships between entities are encouraged – even if these are contradictory
• Users can express the relationships they see in the data – including classifications and categorizations
• HuNI resists a single normative or expert interpretation or classification of the data
• HuNI encourages the sharing of different perspectives by researchers and other users
Dr Toby Burrows
Marie Curie Fellow
Department of Digital Humanities
King’s College London
26-29 Drury Lane
London WC2B 5RL
[email protected]
@tobyburrows
tobyburrows.wordpress.com
Alternative approaches
• Search – use ontologies to classify search results (facets)
• Topic modeling – automatic generation of semantic categories and relations from text-based Natural Language Processing
• Linked Data with light categorization for reasoning
– Vocabularies & thesauri encoded for the Semantic Web (SKOS)
• Social tagging or “folksonomies”
– Tags are applied to entities
– There is no formal classification or categorization of concepts
– There are no relationships between tags (other than being used to tag the same entity)
– Research into deriving ontologies from social tagging
Massive Attack Tags (last.fm)
00s 80s 90s acid jazz alternative alternative dance alternative rock ambient atmospheric beautiful bristol bristol sound british chill chill out chillout dance dark downbeat downtempo dub easy listening electro electronic electronica england english experimental favorite favorites favourite female vocalists hip hop hip-hop house hypnotic idm indie indie rock industrial instrumental jazz lounge male vocalists massive attack mellow pop psychedelic rap relax rock sexy soul soundtrack techno trance trip hop trip-hop triphop uk
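In a folksonomy, the only relationship between tags is that they are applied to the same entity. The sketch below counts that co-occurrence, which is the kind of raw signal that attempts to derive structure from tags could start from. The entities and their tag sets are invented for illustration and are not last.fm data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagging data: entity -> set of free-text tags
TAGGED = {
    "artist_a": {"trip-hop", "electronic", "bristol"},
    "artist_b": {"trip-hop", "electronic", "downtempo"},
    "artist_c": {"electronic", "ambient"},
}


def co_occurrence(tagged):
    """Count how often each pair of tags is applied to the same entity."""
    counts = Counter()
    for tags in tagged.values():
        # Sort so each pair has one canonical (alphabetical) order.
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


counts = co_occurrence(TAGGED)
```

Note that near-duplicate tags such as "trip hop", "trip-hop", and "triphop" would be counted separately here, since no vocabulary reconciles them.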