Posted on 16-Jun-2015


Temporal language models for the disclosure of historical text

Franciska de Jong, Henning Rode, and Djoerd Hiemstra

2005

Outline

• Consider the impetus for this study
• Review the normalized log-likelihood ratio, time partitioning, classification, and confidence equations
• Identify underlying assumptions and potential applications of the study, and discuss related work

In short...

"Users of a search system will typically know one or more contemporary forms associated with the concept they want to search for. They would be helped if the search interface was enhanced with knowledge about diachronically related forms that can be considered synonyms" (161).

"Task definition: given a date tagged reference corpus, consisting of documents from a certain time span, and a document X with unknown date within the same time span, the system should classify X according to time partitions of predefined granularity" (163).
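Read as a classification task, this amounts to scoring the undated document X against every time partition and choosing the best match. The Python sketch below illustrates only that selection step. The names `classify` and `score`, and the toy word-overlap score used in the example, are hypothetical stand-ins (a real system would plug in a language-model score such as NLLR), not the authors' implementation.

```python
from typing import Callable, Sequence, TypeVar

D = TypeVar("D")  # document representation (hypothetical)
P = TypeVar("P")  # time-partition representation (hypothetical)


def classify(doc: D, partitions: Sequence[P], score: Callable[[D, P], float]) -> int:
    """Return the index i of the time partition C_i that scores highest for doc.

    `score` is a placeholder for whatever similarity the system uses;
    higher means a better match. Ties go to the earliest partition.
    """
    if not partitions:
        raise ValueError("at least one time partition is required")
    return max(range(len(partitions)), key=lambda i: score(doc, partitions[i]))


# Toy usage: partitions are vocabularies, and the score is plain word overlap.
partitions = [{"thou", "hast", "doth"}, {"you", "have", "does"}]
doc = ["thou", "doth", "know"]
print(classify(doc, partitions, lambda d, p: len(set(d) & p)))  # prints 0
```

Keeping the scoring function as a parameter separates the "which partition wins" decision from the choice of language model, so the same selection step works for any score.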

Related work

• Statistical language models
• Metadata, time stamps
• Automatic classification of texts: concept hierarchies, synonyms

[Figure: n-gram frequency chart generated by Dave Farrance using Google's n-gram viewer]

Normalized log-likelihood ratio (NLLR)

   


In the accompanying equations, t_i marks a partition point in the corpus period (t_0 indicates the beginning of the corpus period); C_i denotes a partition of corpus documents; D_j denotes any document from the corpus; and τ(D_j) indicates the document's presumed date.
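The NLLR equation itself did not survive in this transcript. As a rough sketch only, the Python below assumes the NLLR form commonly used in language-model retrieval, in which the partition model is smoothed with the whole-corpus model C: NLLR(D, C_i) = Σ_w P(w|D) · log((λ·P(w|C_i) + (1 - λ)·P(w|C)) / P(w|C)). That formula, the function name `nllr`, the token-list inputs, and the smoothing weight `lam` (default 0.9) are all assumptions, not taken from the slides.

```python
import math
from collections import Counter


def nllr(doc, partition, corpus, lam=0.9):
    """Sketch of a normalized log-likelihood ratio score for one time partition.

    Assumed form (the slides' own equation is not preserved):
        NLLR(D, C_i) = sum over w in D of
            P(w|D) * log((lam * P(w|C_i) + (1 - lam) * P(w|C)) / P(w|C))
    doc, partition, corpus: lists of tokens; corpus is the whole reference corpus C.
    lam: hypothetical smoothing weight, expected to satisfy 0 <= lam < 1.
    """
    if not corpus:
        raise ValueError("the reference corpus must not be empty")
    d, p, c = Counter(doc), Counter(partition), Counter(corpus)
    d_len, p_len, c_len = len(doc), len(partition), len(corpus)
    score = 0.0
    for w, tf in d.items():
        p_c = c[w] / c_len                    # P(w|C), background model
        if p_c == 0.0:
            continue                          # word unseen in C: skipped in this sketch
        p_i = p[w] / p_len if p_len else 0.0  # P(w|C_i), maximum-likelihood estimate
        smoothed = lam * p_i + (1 - lam) * p_c
        score += (tf / d_len) * math.log(smoothed / p_c)  # weighted by P(w|D)
    return score


old = "thou hast seen it thou doth know".split()
new = "you have seen it you does know".split()
corpus = old + new
doc = "thou hast it".split()
print(nllr(doc, old, corpus) > nllr(doc, new, corpus))  # prints True
```

Under this assumed form, a positive score means the document's words are more typical of partition C_i than of the corpus as a whole. A word that is equally frequent in C_i and in C (here "it") contributes nothing.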

Time partitioning
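Given the definitions of t_i, C_i, D_j and τ(D_j), one plausible way to build the partitions is to put each document D_j into C_i when t_(i-1) ≤ τ(D_j) < t_i. That half-open interval convention, the function name `partition_corpus`, the (date, text) input format, and the 50-year granularity in the example are assumptions made for this Python sketch.

```python
from bisect import bisect_right


def partition_corpus(docs, boundaries):
    """Group date-tagged documents into time partitions C_1 .. C_n.

    Assumption: partition C_i holds every document D_j with
    t_{i-1} <= tau(D_j) < t_i, where boundaries = [t_0, t_1, ..., t_n]
    and t_0 is the beginning of the corpus period.
    docs: list of (tau, text) pairs, tau being the document's date.
    """
    if len(boundaries) < 2 or any(a >= b for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("need at least two strictly increasing boundaries")
    partitions = [[] for _ in range(len(boundaries) - 1)]
    for tau, text in docs:
        i = bisect_right(boundaries, tau)  # count of boundaries <= tau
        if 1 <= i <= len(partitions):
            partitions[i - 1].append(text)
        # documents dated before t_0 or at/after t_n are dropped in this sketch
    return partitions


# Hypothetical 50-year granularity over 1800-1950.
docs = [(1812, "a"), (1861, "b"), (1899, "c"), (1930, "d")]
print(partition_corpus(docs, [1800, 1850, 1900, 1950]))
# prints [['a'], ['b', 'c'], ['d']]
```

Changing the spacing of the boundaries is how the "predefined granularity" from the task definition would be varied in this sketch.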

[Figure: related article]

Two methods

                          


Questions

• In what circumstances would de Jong et al.'s proposed approach to time-stamping be useful?
• How are the processes of information retrieval and temporal determination similar, and how do they differ?
