Invention Information Retrieval and Visualization
Contents:
1. Introduction
2. Background
3. IR Framework
4. Visualization Framework
5. Conclusion
Honggu Lin (u6135394)
1. Introduction
Web Search
• User: Person
• Query: Keywords, short
• Goal of search: Precision-oriented; a few top relevant documents are sufficient
Prior Art Search
• User: Patent analysts
• Query: Patent document, long
• Goal of search: Recall-oriented; the top 100-200 documents are examined
Figure 1: Comparison between Web Search and Prior Art Search
2. Background
2.1 Structure of Patent
• Title
• Abstract
• Description
• Claims
• International Patent Classification Code (IPCR)
• Citations
Figure 2.1: A sample XML file for a patent document from the EPO [1]
2.2 Elasticsearch
• A search engine based on Lucene.
• Open source.
• Near real-time search.
• HTTP web interface and schema-free JSON documents.
• Elasticsearch is developed alongside a data-collection and log-parsing engine called Logstash,
and an analytics and visualisation platform called Kibana.
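As a minimal sketch of what "schema-free JSON documents" over HTTP looks like in practice, the snippet below builds the newline-delimited body that Elasticsearch's `_bulk` endpoint accepts (sent with content type `application/x-ndjson`). The index name `patents`, the document ids, and the field names are assumptions made for illustration; they do not come from the slides, and no request is actually sent.

```python
import json


def build_bulk_payload(index_name, docs):
    """Serialize (id, source) pairs into the body of a _bulk request."""
    lines = []
    for doc_id, source in docs:
        # Action line: index the document on the next line under this id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the JSON document itself, with no schema declared up front.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents; note that they do not need to share the same fields.
docs = [
    ("EP-0000001", {"title": "Hinge assembly", "abstract": "A hinge for a folding device."}),
    ("EP-0000002", {"title": "Battery pack", "claims": "A battery pack comprising cells."}),
]
payload = build_bulk_payload("patents", docs)
print(payload)
```

The two sample documents carry different fields on purpose: Elasticsearch infers a mapping from the first documents it sees, which is what makes it convenient for patents whose sections are not always present.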
3. IR Framework
3.1 Patent Retrieval Overall Process
Figure 3.1: Illustration of the process in my patent retrieval system (diagram labels: Query Patents, Patent Preprocess, Query (Re)formulation, Query, Patents in Collection, Patent Preprocess, Indexing, Index statistic (TF-IDF), Indexed Documents, Retrieval Model (Elasticsearch), Retrieved Documents, Feedback)
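The retrieval step in this process could be expressed as a request body for Elasticsearch's `_search` endpoint. The sketch below is only one possible shape: the slides do not say which query type is used, so `multi_match` is an assumption, as are the field names. The default `size` of 200 mirrors the "top 100-200 documents are examined" goal of prior art search.

```python
import json


def build_search_body(query_text, fields, size=200):
    """Build a _search request body that matches query_text against several fields."""
    return {
        # Recall-oriented search: ask for a long ranked list, not just the top few hits.
        "size": size,
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": fields,
            }
        },
    }


# Hypothetical reformulated query and hypothetical field names.
body = build_search_body(
    "folding hinge assembly portable device",
    fields=["title", "abstract", "description", "claims"],
)
print(json.dumps(body, indent=2))
```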
3.2 Data Collection
• Cross Language Evaluation Forum for Intellectual Property evaluation track (CLEF-IP).
• CLEF-IP 2010 contains 2.6 million patent documents and 2000 topics.
Figure 3.1: Percentage of English, German, and French patents in CLEF-IP 2010 collection (EN 68%, DE 24%, FR 8%)
• Title: 22%
• Title+Abstract: 10%
• Title+Claims+[Abstract]: 16%
• Title+Description+Claims+[Abstract]: 52%
Figure 3.2: Completeness of the presence of English text in the CLEF-IP 2010 patent collection
3.3 Data Preprocessing
Figure 3.3: Illustration of the data preprocessing process (diagram labels: .XML, .JSON, Format Unify, Section Selection, Language Filter; the resulting .JSON file is indexed in Elasticsearch)
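A minimal sketch of the XML-to-JSON step with a language filter and section selection follows. The element names (`patent-document`, `invention-title`, `abstract`, `description`, `claims`), the `ucid` and `lang` attributes, and the sample document are assumptions for illustration; the actual schema and the order of the steps in the author's pipeline are not given in the slides.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical patent file with text in more than one language.
SAMPLE_XML = """
<patent-document ucid="EP-0000001-A1">
  <invention-title lang="EN">Hinge assembly</invention-title>
  <invention-title lang="DE">Scharnieranordnung</invention-title>
  <abstract lang="EN"><p>A hinge for a folding device.</p></abstract>
  <claims lang="DE"><claim>Scharnier mit einem Gelenk.</claim></claims>
</patent-document>
"""

# Section selection: XML tag -> unified JSON field name (assumed names).
SECTIONS = {
    "invention-title": "title",
    "abstract": "abstract",
    "description": "description",
    "claims": "claims",
}


def patent_xml_to_record(xml_text, lang="EN"):
    """Keep only the selected sections written in `lang` and return a flat dict."""
    root = ET.fromstring(xml_text.strip())
    record = {"id": root.get("ucid")}
    for elem in root.iter():
        field = SECTIONS.get(elem.tag)
        # Skip unselected sections and sections in other languages.
        if field is None or elem.get("lang") != lang:
            continue
        # Flatten nested markup and normalize whitespace.
        record[field] = " ".join("".join(elem.itertext()).split())
    return record


record = patent_xml_to_record(SAMPLE_XML)
print(json.dumps(record, indent=2))
```

In this sample the German title and claims are dropped, so the record contains only the id, the English title, and the English abstract.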
3.4 Query Reduction
• Section selection
• Term extraction (TF-IDF)
• Technical phrase formation
• Metadata usage (IPCR)
• Section combination
Figure 3.4: Process of Query Formation
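The term extraction step can be sketched as follows: score each term of the query patent by TF-IDF against the collection and keep the top k terms as the reduced query. The slides name TF-IDF but not the exact weighting, so the smoothed IDF used here, the simple tokenizer, and the toy texts are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_tfidf_terms(query_text, collection, k=5):
    """Return the k query terms with the highest TF-IDF score."""
    n_docs = len(collection)
    # Document frequency: number of collection documents containing each term.
    doc_freq = Counter()
    for doc in collection:
        doc_freq.update(set(tokenize(doc)))
    # Term frequency inside the query patent itself.
    term_freq = Counter(tokenize(query_text))
    # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
    scores = {
        term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, count in term_freq.items()
    }
    # Sort by score, breaking ties alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy collection and query patent text, invented for illustration.
collection = [
    "a device comprising a housing and a cover on the housing",
    "a method for charging the battery in a device",
    "the hinge connecting a cover to a housing",
]
query = "a hinge assembly wherein the hinge comprises a spring and the spring biases the cover"
terms = top_tfidf_terms(query, collection, k=3)
print(terms)
```

Words that occur in every collection document, such as "a" and "the" here, receive a score of zero and drop out, while terms that are frequent in the query but rare in the collection rise to the top.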
4. Visualization Framework
• The query and the related patents selected from the results are put in MongoDB.
• The Django Web Framework uses the data in MongoDB.
• Effects:
• Highlight Common Area between query and its related patent.
• Common Word Word-Net
Figure 4.1: Illustration of the process in a Query and Related Patent Visualization System
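One simple way to realize the "highlight common area" effect is to find the words shared by the query and a related patent and wrap them in `<mark>` tags before rendering the text in a web page. This is a sketch under assumptions: the slides do not describe the matching method, and the stopword list, function names, and sample texts are hypothetical.

```python
import html
import re

# Small illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "with"}


def content_words(text):
    """Return the set of lowercase alphabetic words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def common_words(query_text, patent_text):
    """Words that occur in both the query and the related patent."""
    return content_words(query_text) & content_words(patent_text)


def highlight(text, shared):
    """Escape text for HTML and wrap every shared word in <mark> tags."""
    parts = re.split(r"([A-Za-z]+)", text)
    out = []
    for part in parts:
        escaped = html.escape(part)
        if part.lower() in shared:
            out.append(f"<mark>{escaped}</mark>")
        else:
            out.append(escaped)
    return "".join(out)


query_text = "A hinge with a spring"
patent_text = "The spring biases the hinge & cover"
shared = common_words(query_text, patent_text)
marked = highlight(patent_text, shared)
print(marked)
```

The returned string can be passed to a template as pre-escaped HTML, since every piece of the original text is escaped before the tags are added.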
5. Conclusion
• Explore how the results differ across query formulation methods and find the optimal one.
• Visualize the retrieval result in a more intuitive way.
Reference:
[1] Walid Magdy. (2012). Toward Higher Effectiveness for Recall-Oriented Information Retrieval: A Patent Retrieval Case Study. Retrieved from http://doras.dcu.ie/16814/1/WalidMagdyThesis.pdf
Q & A