LIS618 lecture 1
Thomas Krichel
2003-01-29
economic rationale for traditional model
• In olden days the cost of telecommunication was high.
• database use costs
  – cost of communication
  – cost of access time to the database
• the traditional model puts an upper bound on costs
disintermediation
• with the cost of access time gone, the traditional model is under threat
• there is disintermediation, where the librarian loses her role
• but that may not be good news for information retrieval results
  – user knows subject matter best
  – librarian knows searching best
Web searching
• IR has received a lot of impetus through the web, which poses unprecedented search challenges.
• with more and more data appearing on the web, DS may be a subject in decline
  – it is primarily concerned with non-web databases
  – there are more and more web-based methods of searching
Quote to think about
• “Clearly, intermediated searching has past its prime. No longer does a search require a searcher—at least not a professional one.”
• By Barbara Quint, in “Quint’s Online”, a regular column she contributes to Information Today.
• It appeared in the 2002-12 issue.
Quint’s points
• It was foreseeable in 1992 that the public at large would be able to do online searching.
• At the same time, the need for quality answers has grown.
  – Google ask-a service
• Quality-filtered services will become more important.
Quint’s points
• From the requirement for vetted sources, it does not follow that formal publishing and databases will flourish.
• The current offerings will have to change. In the current databases, there is a lot that would already be available for free, mixed with quality-controlled stuff.
  – Item-based pricing implies the same price for all items
  – Subscription-based pricing still means that the user has to make a quality judgment
Quint’s points
• A good business news service would
  – extract reporting from company sites
  – get external reviews on the reports
  – post statistical counts about the company
  – offer to get users and authors in touch so that authors could do private research for a user
• Publishers have direct offerings and intermediated vending is in decline.
  – Example: CSA pulls out of DIALOG
Main theory part
• Literature: "Modern Information Retrieval" by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
• Don't buy it. It is not a good book.
before the IR process
• provider
  – define data that is available
    • documents that can be used
    • document operations
    • document structure
  – index
• user
  – user need
  – IR system familiarity
the IR process
• query expresses user need in a query language
• processing of query yields retrieved documents
• calculation of relevance ranking
• examination of retrieved documents
• possible relevance cycle
main problem
• user is not an expert at the formulation of a query
• garbage in, garbage out: the retrieval yields poor results
• ways out
  – design a very intuitive interface for the query
  – give expert guidance
taxonomy of classic IR models
• Boolean, or set-theoretic
  – fuzzy set models
  – extended Boolean
• vector, or algebraic
  – generalized vector model
  – latent semantic indexing
  – neural network model
• probabilistic
  – inference network
  – belief network
summary
• There are three basic types of models in classic information retrieval.
• Extensions of these types are a research concern and require good mathematical skills.
• All classic models treat documents as individual pieces.
key aid: index
• an index is a list of terms, each with a list of locations where the term is to be found.
• The way to express locations usually depends on the form that the indexed data takes.
  – for a book, it is usually the page number, e.g. "shmoo 34, 75"
  – for computer files, it is usually the name of the file plus the number of the byte where the indexed term starts, e.g. "krichel index.html 34, cv.html 890 1209"
• there is usually more than one location of the term.
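The file-and-byte-offset form of an index described above can be sketched in Python as follows. This is only a minimal illustration, not the method of any particular IR system: the function name `build_index`, the two sample files, and the choice to treat every run of ASCII letters as a term are assumptions made for the example.

```python
import re
from collections import defaultdict

def build_index(files):
    """Map each term to {file name: [byte offsets where the term starts]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, data in files.items():
        # Assumption: a term is any run of ASCII letters, compared in lower case.
        for match in re.finditer(rb"[A-Za-z]+", data):
            term = match.group().decode("ascii").lower()
            index[term][name].append(match.start())
    return index

# Hypothetical file contents, held in memory as bytes.
files = {
    "index.html": b"Home page of Thomas Krichel",
    "cv.html": b"Krichel, Thomas. CV of Thomas Krichel",
}
index = build_index(files)

# Print one index entry in the spirit of "krichel index.html 34, cv.html 890 1209".
for name, offsets in index["krichel"].items():
    print("krichel", name, *offsets)
```

Storing byte offsets rather than only file names lets a system jump straight to each occurrence, at the cost of a larger index.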
key aid: index terms
• an index term is a part of the document that has a meaning on its own.
• it is usually a noun.
• retrieval based on index terms raises questions
  – semantics in query or document is lost
  – matching is done in the imprecise space of index terms
• predicting relevance is a central problem
• the IR model determines the process of relevance ranking
basic concept: weight of index term
• not all nouns in a text appear to have the same relevance to the text
• sometimes, we can have a simple measure of the importance of a term, example?
• more generally, for each indexing term and each document we can associate a weight with the term and the document.
• usually, if the document does not contain the term, its weight is zero
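One possible simple measure, offered here only as an assumption since the slide leaves the example open, is the raw number of times a term occurs in a document. The sketch below computes such a weight for every pair of index term and document; the function name `term_weights` and the sample documents are made up for illustration.

```python
from collections import Counter

def term_weights(text, index_terms):
    """Weight of each index term in one document: its raw occurrence count."""
    counts = Counter(text.lower().split())
    # A Counter returns 0 for a missing key, so absent terms get weight zero.
    return {term: counts[term] for term in index_terms}

# Hypothetical documents and index terms.
docs = {
    "d1": "library search library catalog",
    "d2": "web search engine",
}
index_terms = ["library", "search", "web"]

weights = {name: term_weights(text, index_terms) for name, text in docs.items()}
for name, w in weights.items():
    print(name, w)
```

Other weighting schemes exist; the count is used here only because it is the easiest one to inspect by hand.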
Boolean model
• in the Boolean model, the weight of an index term for a document is 1 if the term appears in the document. It is 0 otherwise.
• This allows query terms to be combined with the Boolean operators AND, OR, and NOT
• thus powerful queries can be written
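Because the weights are only 0 or 1, each term can be represented by the set of documents that contain it, and the three operators become set operations. The following is a minimal sketch under that assumption; the sample documents and the helper name `postings` are hypothetical.

```python
# Hypothetical documents, each reduced to a string of index terms.
docs = {
    "d1": "library search catalog",
    "d2": "web search engine",
    "d3": "library web portal",
}
all_docs = set(docs)

def postings(term):
    """Documents in which the term has weight 1, i.e. in which it appears."""
    return {name for name, text in docs.items() if term in text.split()}

# library AND search: set intersection
both = postings("library") & postings("search")

# library OR web: set union
either = postings("library") | postings("web")

# search AND NOT web: intersection with the complement of the "web" set
search_not_web = postings("search") & (all_docs - postings("web"))

print(sorted(both), sorted(either), sorted(search_not_web))
```

Note that the result of a Boolean query is an unordered set: a document either matches or it does not, so this sketch produces no relevance ranking.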
http://openlib.org/home/krichel
Thank you for your attention!