Tamas Doszkocs, Ph.D., Computer Scientist
“What has been will be again, what has been done will be done again; there is nothing new under the sun.” (Ecclesiastes 1:9, NIV)
Meta Searching and Clustering
• A Brief History
• Clustering
• Meta-Searching
• Metadata and Semantics
• Clustering Examples
• Meta-Search and Clustering Engines
• A Clustering GYM
• AllPlus
• Web X.Y
• Trends
Related Topics (that we won’t talk about):
Clustering
– “Finding a name for something is a way of conjuring its existence, of making it possible for people to see a pattern where they didn't see anything before.” (Howard Rheingold)
– Purpose: order out of chaos
– Indexes and tables of contents are as old as human records
– Luhn, H. P. (1959). Keyword-in-Context Index for Technical Literature (KWIC Index). Yorktown Heights, N.Y.: IBM.
– Salton, G. (1968). Automatic Information Organization and Retrieval. McGraw-Hill.
– Doszkocs, T. E. (1978). AID, an Associative Interactive Dictionary for Online Searching.
– Dialog RANK command, 1993
– Northern Light clustering, or "embedded folders", 1999
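Luhn's KWIC index is simple enough to sketch. The Python below is a minimal illustration, not Luhn's IBM program: it lists every significant word of each title together with its surrounding words, aligns the entries on the keyword, and sorts them alphabetically by keyword. The stopword list and the 30-character left-context width are assumptions.

```python
import re

# Illustrative exclusion list (assumption); Luhn's programs used their own lists.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


def kwic_index(titles: list[str], width: int = 30) -> list[str]:
    """Build a keyword-in-context index: one line per significant word."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = re.sub(r"\W", "", word).lower()
            if not key or key in STOPWORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            # Right-align the preceding context so every keyword starts in the same column.
            entries.append((key, f"{left[-width:]:>{width}}  {right}"))
    # Sort by keyword (then by line) so entries for the same word sit together.
    entries.sort()
    return [line for _, line in entries]


for line in kwic_index(["Searching MEDLINE in English",
                        "Information Retrieval in Toxicology"]):
    print(line)
```

Each keyword begins in column `width + 2`, which is what lets a reader scan down one column of the printed index.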
Meta-Searching
• Purpose: distributed and enhanced search to find more relevant items
• AID, 1978: MEDLINE, TOXLINE, Hepatitis Databank
– Doszkocs, T. E. (1978, June). “AID, an Associative Interactive Dictionary for Online Searching.” On-Line Review, 2(2), 163–173.
• Chemical Substances Information Network, 1978-198
– Kissman, H. M. (1980, April). Information Retrieval in Toxicology. Annual Review of Pharmacology and Toxicology, 20, 285–305.
• CITE, 1979
– Doszkocs, T. E., & Rapp, B. A. (1979). Searching MEDLINE in English: A prototype user interface with natural language query, ranked output, and relevance feedback. In Proceedings of the American Society for Information Science (pp. 131–139). White Plains, NY: Knowledge Industry Publications.
• Dialog OneSearch, 1987
• Associative Concept Navigation in MEDLINE and Other NLM Databases via a Mosaic - Forms - WWW Interface Combining Natural Language Processing, Expert Systems and (un)Conventional Information Retrieval Techniques. In Second International World Wide Web Conference, Chicago, Illinois, USA, October 1994. http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/doszkocs/doszkocs.html
• The Open Web and the Hidden Web
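One common way for a meta-search engine to merge ranked lists from several sources is reciprocal rank fusion (RRF). RRF is a swapped-in technique here, not one the slides attribute to any of the systems above. The sketch assumes each source returns document identifiers already normalized so that the same item carries the same id everywhere (which is what deduplicates it), and uses k = 60, the constant commonly used with RRF; the source names are hypothetical.

```python
from collections import defaultdict


def rrf_merge(result_lists: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over sources of 1 / (k + rank of d in that source)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in result_lists.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


merged = rrf_merge({
    "source_a": ["pmid:1", "pmid:2", "pmid:3"],  # hypothetical source names and ids
    "source_b": ["pmid:2", "pmid:4"],
})
print([doc_id for doc_id, _ in merged])  # → ['pmid:2', 'pmid:1', 'pmid:4', 'pmid:3']
```

An item found by several sources accumulates score from each, so it rises above items that only one source ranked highly; RRF needs only ranks, not comparable relevance scores across engines.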
Metadata and Semantics
– Wilf Lancaster, Vocabulary Control for Information Retrieval, 1972
– Dublin Core
• http://www.dublincore.org/
– Federated Searching Interface Techniques for Heterogeneous OAI Repositories
• http://jodi.ecs.soton.ac.uk/Articles/v02/i04/Liu/
– eXchangeable Faceted Metadata Language
• http://purl.oclc.org/NET/xfml/core/
– SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments)
• http://simile.mit.edu/
– Folksonomies
• http://flickr.com
– Semantic Web
• http://www.few.vu.nl/~frankh/
• https://scholarsbank.uoregon.edu/dspace/bitstream/1794/3269/1/ccq_sem_web.pdf
– Ontology Lookup Service
• http://www.ebi.ac.uk/ontology-lookup/
– Web Services for Controlled Vocabularies
• http://www.asis.org/Bulletin/Jun-06/vizine-goetz_houghton_childress.html
Examples of Search Result Clustering
• Jerry’s Guide to the Web, 1994
• Jerry Yang and David Filo’s Yahoo!, 1995
– a directory of web sites, organized in a hierarchy of subject descriptors
– Librarians at Yahoo!
• Surfing is to Yahoo! what the Dewey Decimal System is to libraries. In other words, Surfing is the categorization of websites. It also happens to be how Yahoo! began. Today our Surfing team continues its passion for finding, evaluating, and organizing information on the Internet. They have a voracious appetite for learning about new topics. They are curious individuals who are skilled at intuitively and efficiently analyzing and classifying diverse, unstructured pieces of information across the Yahoo! network. Surfers are critical to the relevance and intuitive nature of information presented on Yahoo!
• http://careers.yahoo.com/job_descriptions.html
• Google vs. Yahoo!: automatic vs. controlled indexing
Open Directory Project
PubMed Related Articles
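A related-articles feature can be approximated with Salton-style vector-space similarity: weight each document's terms by TF-IDF and rank the other documents by cosine similarity. This is a generic sketch, not PubMed's actual Related Articles algorithm, which uses its own weighting model; the tokenizer and the sample documents are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Weight each term by tf * log(N / df); terms found in every document get weight 0."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    return {
        doc_id: {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for doc_id, c in counts.items()
    }


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related(doc_id: str, docs: dict[str, str], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank every other document by cosine similarity to doc_id."""
    vectors = tfidf_vectors(docs)
    target = vectors[doc_id]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != doc_id]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


docs = {  # hypothetical sample titles
    "a": "Hepatitis B vaccine trial",
    "b": "Hepatitis B vaccine safety",
    "c": "Toxicology of lead exposure",
}
print(related("a", docs))
```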
Folksonomy and Tagging in Flickr
Query Refinement with Subject Headings
Clustering with Multiple Criteria
Multi-faceted Clustering in an OPAC
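Multi-faceted clustering of the kind shown in an OPAC comes down to counting, for each facet, how many result records carry each value, then filtering the result set when the user picks a value. The record fields (`subject`, `format`) and the sample data below are hypothetical.

```python
from collections import Counter


def _values(record: dict, facet: str) -> list:
    """Return a record's values for a facet as a list (facets may be single- or multi-valued)."""
    value = record.get(facet)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(records: list[dict], facets: list[str]) -> dict[str, list[tuple]]:
    """For each facet, list (value, number of records) pairs, most frequent first."""
    counts = {facet: Counter() for facet in facets}
    for record in records:
        for facet in facets:
            # A set, so a value repeated within one record is counted once for that record.
            counts[facet].update(set(_values(record, facet)))
    return {facet: counter.most_common() for facet, counter in counts.items()}


def refine(records: list[dict], selections: dict[str, str]) -> list[dict]:
    """Keep only records matching every selected facet value."""
    return [
        record for record in records
        if all(value in _values(record, facet) for facet, value in selections.items())
    ]


results = [  # hypothetical catalog records
    {"subject": ["Hepatitis", "Vaccines"], "format": "Book"},
    {"subject": ["Hepatitis"], "format": "Journal"},
    {"subject": ["Toxicology"], "format": "Book"},
]
print(facet_counts(results, ["subject", "format"]))
print(refine(results, {"format": "Book", "subject": "Hepatitis"}))
```

Recomputing `facet_counts` on the refined list gives the updated counts to display after each click.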
Analyzing Search Results
Examples of Meta Search Engines
The NLM ToxSeek System
Clustering of Search Results with Phrases
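Phrase-based clustering groups results that share a phrase and uses that phrase as the cluster label. The sketch below is a simplified stand-in for suffix-tree clustering (Zamir and Etzioni): it indexes word n-grams of up to three words per snippet and keeps phrases shared by at least two results, without merging overlapping clusters as the full method does. The stopword list, thresholds, and sample snippets are assumptions.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


def phrases(text: str, max_len: int = 3) -> set[str]:
    """All word n-grams (n <= max_len) that neither start nor end with a stopword."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            found.add(" ".join(gram))
    return found


def phrase_clusters(snippets: dict[str, str], min_docs: int = 2,
                    max_clusters: int = 5) -> list[tuple[str, list[str]]]:
    """Label clusters with phrases shared by at least min_docs snippets."""
    index = defaultdict(set)
    for doc_id, text in snippets.items():
        for phrase in phrases(text):
            index[phrase].add(doc_id)
    candidates = [(p, ids) for p, ids in index.items() if len(ids) >= min_docs]
    # Prefer phrases shared by more results, then longer (more specific) phrases.
    candidates.sort(key=lambda c: (-len(c[1]), -len(c[0].split()), c[0]))
    return [(p, sorted(ids)) for p, ids in candidates[:max_clusters]]


snippets = {  # hypothetical result snippets
    "d1": "Lead poisoning in children",
    "d2": "Childhood lead poisoning prevention",
    "d3": "Lead exposure at work",
}
for label, members in phrase_clusters(snippets):
    print(label, members)
```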
PolyMeta Clustering
Visualizing Topical Clusters
Multi-faceted Visualization
Clustering in A GYM: Ask, Google, Yahoo, MSN
Yahoo! Health
Google Health Searches
Microsoft Search Result Clustering
Clustering Sophistication, or the Lack of It
AllPlus Clustering: the WHO
Clustering and Search Refinement with Natural Language and Controlled Vocabularies
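Search refinement with a controlled vocabulary typically starts by mapping free-text query words onto preferred headings through entry terms. The sketch below does a greedy longest-match lookup against a tiny hypothetical entry-term table (modeled loosely on MeSH-style entry terms, not actual MeSH data) and returns the matched headings plus the leftover words, which could stay in the query as natural-language terms.

```python
import re

# Hypothetical toy entry-term table (NOT real MeSH data): free-text phrase -> preferred heading.
ENTRY_TERMS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
    "aspirin": "Aspirin",
}


def map_query(query: str, table: dict[str, str] = ENTRY_TERMS,
              max_len: int = 4) -> tuple[list[str], list[str]]:
    """Map query words to preferred headings; return (headings, unmatched words)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    headings, leftovers = [], []
    i = 0
    while i < len(words):
        # Greedy longest match: try the longest phrase starting at word i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                if table[phrase] not in headings:
                    headings.append(table[phrase])
                i += n
                break
        else:
            # No entry term starts here; keep the word as a free-text term.
            leftovers.append(words[i])
            i += 1
    return headings, leftovers


print(map_query("aspirin after a heart attack"))
# → (['Aspirin', 'Myocardial Infarction'], ['after', 'a'])
```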
The NLM AllPlus Search Demo
Web 2.0 Content Mashups in AllPlus
HyperGraph Cluster Visualization in AllPlus
The All in AllPlus
• Discovery
– Meta-Searching
– Clustering
– Meaning
• Morphology
• Syntax
• Semantics
– Metadata
– Thesauri +
– Visualization
– Web X.Y
Trends
– Web x.0
• Content mashups
• Improved UI
• Social Search and Knowledge Organization
• Query Understanding
– Meaning
– User intent
– Multi-faceted clustering
– Multi-dimensional Information Spaces
• Google Searchmash: http://searchmash.com
– Digital Libraries
– Data Mining and Analysis
– Information Visualization
– Semantic Web