From keyword searching to discourse mining
Pim Huijnen, Juliette Lonij
Encounters between the Humanities and Computing, Utrecht University, 18 February 2016
Dictionary searching
using extensive and context-specific word lists (‘dictionaries’) to replace the contingency of single keywords
Why dictionary searching?
…to trace discursive shifts, represented by combinations of words instead of individual words
…to trace the persistence of discourses
Eugenics in Dutch newspapers(?)
Query:
maatregel nageslacht eigenschap* aanleg theorie bloed invloed
NOT eugenetica eugenetiek eugeniek eugenese ras*
Efficiency before efficiency
Query: "product* machine* verspilling bedrijf goedkoop kwaliteit", 01-01-1890 through 31-12-1940
(1901)(1906)
Developing a script to extract dictionaries from literature
Experimenting with tools to visualise results of dictionary searching in kranten.delpher.nl
KB researcher-in-residence project
Visualising results of dictionary searches in Delpher
Use OR-query to search Delpher
Visualise results on the basis of Solr's relevance score (minimum number of matching words)
(arbeid* OR bedrij* OR beheer OR controle* OR factor* OR functie* OR kost* OR leiding* OR loon* OR maatregel* OR management OR methode* OR model* OR norm* OR organisatie* OR plannen OR prijs OR productie OR rationeel OR rendement OR reorganisatie OR statistiek OR taylor OR tijd OR werkbesparing OR werkverdeeling)
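As a rough illustration of the two steps above, the sketch below joins a word list into a single OR-query string and then checks, locally, how many distinct dictionary terms occur in a piece of text. It does not contact Delpher and does not reproduce Solr's relevance scoring; the function names, the regular-expression handling of the trailing `*` wildcard, and the `min_terms` threshold are all assumptions made for this example.

```python
import re


def build_or_query(terms):
    """Join dictionary terms into one parenthesised OR-query string."""
    return "(" + " OR ".join(terms) + ")"


def term_to_regex(term):
    """Turn a term with an optional trailing * wildcard into a regex.

    Assumption: * only appears at the end of a term and stands for
    any run of word characters (including none).
    """
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def matched_terms(text, terms):
    """Return the dictionary terms that occur at least once in the text."""
    return [t for t in terms if term_to_regex(t).search(text)]


def passes_threshold(text, terms, min_terms):
    """True if the text contains at least min_terms distinct dictionary terms.

    This is a simple local stand-in for filtering on a relevance score.
    """
    return len(matched_terms(text, terms)) >= min_terms
```

Counting distinct terms rather than total occurrences keeps a single, frequently repeated word from dominating the result, which is closer to the idea of a discourse as a combination of words.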
Challenges
Running an OR-query of 25+ (or, preferably, more) words on a dataset of 100,000,000+ documents
Accounting for particularities of the corpus:
* number of newspaper titles per year
* changes in newspaper titles over the years
* changes in article length over the years
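One common way to account for a corpus that grows or shrinks over time is to report hits relative to the total number of articles in each year, rather than as raw counts. The sketch below shows that idea only; it is not the project's own normalisation, and the function name and the two input dictionaries (hits per year, total articles per year) are hypothetical.

```python
def relative_frequency(hits_per_year, articles_per_year):
    """Divide the number of matching articles by the corpus size per year.

    Years with no known (or zero) corpus size are skipped, because no
    meaningful ratio can be computed for them.
    """
    result = {}
    for year, hits in hits_per_year.items():
        total = articles_per_year.get(year, 0)
        if total > 0:
            result[year] = hits / total
    return result
```

The same division could be applied with total word counts per year as the denominator, which would be one way to correct for changes in article length.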
Getting an idea of the exact combination of words in the visualised results
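To see which combinations of dictionary words actually lie behind a result set, one option is to record, for every retrieved article, the set of terms it contains and then count how often each set occurs. The sketch below assumes the article texts are already available locally as strings; the function names and the wildcard handling are assumptions for illustration, not part of Delpher or Solr.

```python
import re
from collections import Counter


def _pattern(term):
    """Regex for a term; a trailing * is treated as a word-suffix wildcard."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def combination_counts(texts, terms):
    """Count how many texts contain each exact combination of terms.

    Each combination is stored as an alphabetically sorted tuple so that
    the same set of terms always maps to the same key.
    """
    patterns = {t: _pattern(t) for t in terms}
    counts = Counter()
    for text in texts:
        combo = tuple(sorted(t for t, p in patterns.items() if p.search(text)))
        if combo:  # ignore texts that match no dictionary term at all
            counts[combo] += 1
    return counts
```

Calling `combination_counts(texts, terms).most_common(10)` would then list the ten most frequent term combinations, which can be shown alongside the visualisation.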