OEIL Refactoring - Preview presentation for ECPRD seminar … · 2011-05-17
ECPRD seminar “Parli@ments on the net IX”, Brussels, 2011
Faceted Search
Some examples of applied faceted search
on websites developed by the EP
Jerry Hilbert
European Parliament
Faceted search - Definition
Faceted search, also called faceted navigation or faceted browsing, is a
technique for accessing a collection of information represented using a faceted
classification, allowing users to explore by filtering available information. A
faceted classification system allows the assignment of multiple
classifications to an object, enabling the classifications to be ordered in
multiple ways, rather than in a single, pre-determined, taxonomic order. Each
facet typically corresponds to the possible values of a
property common to a set of digital objects.
Facets are often derived by analysis of the text of an item using entity extraction
techniques or from pre-existing fields in the database such as author, descriptor,
language, and format. This approach permits existing web pages, product
descriptions or articles to have this extra metadata extracted and presented as a
navigation facet.
Source: Wikipedia
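As an illustration of the definition above, a facet can be thought of as a count of documents per value of one property. The sketch below is a minimal, engine-independent example; the records and field names are invented for illustration and do not come from any EP database.

```python
from collections import Counter

# Hypothetical records; titles, fields and values are illustrative only.
DOCUMENTS = [
    {"title": "Report A", "language": "EN", "format": "PDF", "author": "Smith"},
    {"title": "Report B", "language": "FR", "format": "PDF", "author": "Dupont"},
    {"title": "Note C", "language": "EN", "format": "Word", "author": "Smith"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of one facet field."""
    return Counter(doc[field] for doc in docs if field in doc)


language_facet = facet_counts(DOCUMENTS, "language")
format_facet = facet_counts(DOCUMENTS, "format")
```

Each facet (language, format, author) offers its own independent way of slicing the same collection, which is what distinguishes a faceted classification from a single fixed taxonomy.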
Faceted search - Technology
Several search engines now offer faceted search.
The EP uses Solr, which is based on Lucene.
Solr is an open source enterprise search platform from the Apache Lucene project. Its
major features include powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing
distributed search and index replication, Solr is highly scalable.
Solr is written in Java and runs as a standalone full-text search server within a servlet
container such as Apache Tomcat. Solr uses the Lucene Java search library at its core for
full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it
easy to use from virtually any programming language. Solr's powerful external configuration
allows it to be tailored to almost any type of application without Java coding, and it has an
extensive plugin architecture when more advanced customization is required.
Source: Wikipedia
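Because Solr is queried over HTTP, a faceted search request is essentially a URL with a few extra parameters. The sketch below builds such a URL using Solr's standard `facet`, `facet.field` and `facet.mincount` parameters. The host, path and field names are assumptions made for illustration; the actual EP configuration is not described in these slides.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, rows=10):
    """Return a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", text),              # the free-text query
        ("rows", rows),           # number of matches to return
        ("wt", "json"),           # response writer; "xml" is also available
        ("facet", "true"),        # switch faceting on
        ("facet.mincount", 1),    # hide facet values that match nothing
    ]
    # One facet.field parameter per facet to compute.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names.
url = build_facet_query(
    "http://localhost:8983/solr/select", "budget", ["language", "doc_type"]
)
```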
Faceted search - Technology
How is Lucene/Solr used?
XML IN – XML OUT
XML IN: Data is structured in XML when submitted for indexing
XML OUT: Data is returned as XML (including facet details) as the result of a
search
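The "XML IN – XML OUT" exchange can be sketched as follows: documents are wrapped in Solr's `<add><doc><field name="…">` update format, and facet counts are read back from the `facet_counts` section of the XML response. The field names and the sample response are invented for illustration, and the helper functions are hypothetical, not EP code.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """XML IN: serialise documents into a Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multi-valued field is sent as several <field> elements.
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def parse_facet_fields(response_xml):
    """XML OUT: extract {facet field: {value: count}} from a Solr XML response."""
    root = ET.fromstring(response_xml)
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    return {
        field.get("name"): {
            item.get("name"): int(item.text) for item in field.findall("int")
        }
        for field in root.findall(path)
    }


# Hand-written sample in the shape of a Solr XML response (illustrative data).
SAMPLE_RESPONSE = """
<response>
  <result name="response" numFound="3" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="language">
        <int name="EN">2</int>
        <int name="FR">1</int>
      </lst>
    </lst>
  </lst>
</response>
"""

add_xml = build_add_message([{"id": "1", "language": ["EN", "FR"]}])
facets = parse_facet_fields(SAMPLE_RESPONSE)
```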
Also, configuration of the search engine for free-text search:
- number of terms to match
- relevance of the terms, according to the field they are associated with
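In Solr, both settings can be expressed with the DisMax query parser: `mm` (minimum should match) controls how many terms must match, and `qf` (query fields) assigns a boost per field. Whether the EP sites use DisMax is an assumption; the field names and boost values below are purely illustrative.

```python
from urllib.parse import urlencode


def free_text_params(text, field_boosts, min_should_match):
    """Build request parameters for a DisMax free-text query."""
    # qf lists the searched fields with their boosts, e.g. "title^3 summary^1.5".
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {
        "q": text,
        "defType": "dismax",
        "qf": qf,
        "mm": min_should_match,  # e.g. "2": at least two terms must match
    }


# Hypothetical fields: a match in the title counts more than one in the body.
params = free_text_params(
    "budget discharge 2010", {"title": 3, "summary": 1.5, "text": 1}, "2"
)
query_string = urlencode(params)
```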
Faceted search – EP websites
The following slides show examples of faceted search as applied
on websites developed by the EP:
- The Legislative Observatory of the EP
- Public Register of documents
- IPEX
- ECPRD
Faceted search – EP websites
OEIL
Legislative Observatory of the EP
http://www.europarl.europa.eu/oeil
(old version of the site)
Faceted search – EP websites
OEIL: Legislative Observatory
OEIL contains legislative, budgetary, non-legislative and
internal parliamentary procedures, such as:
• co-decision, consultation and assent procedures
• budgetary and discharge procedures
• own-initiative reports by the European Parliament
• appointments, waivers of immunity and changes to the Rules of Procedure (i.e.
internal EP procedures)
• resolutions and recommendations adopted by the European Parliament
• documents forwarded for information from the Commission (during the last 9
months).
Situation before implementing faceted search
Challenges for applying facets in OEIL:
(1) Sequence of facets
• Parliamentary term, …
(2) Protocol order for returned matches in the facets
• Political groups, Commission DGs, etc.
(3) Facets returning a very large number of values
• Rapporteurs (possibly a few hundred)
(4) Facets for structured keyword lists
• Legal Basis (Treaty to Article)
(5) Length of words
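Two of these challenges lend themselves to a short sketch. Search engines typically return facet values sorted by count or alphabetically, so a protocol order (challenge 2) has to be imposed afterwards; and a structured keyword such as a legal basis (challenge 4) can be indexed as a path, so that both the treaty and the article become facet values. The code below is an assumed approach, not the one used in OEIL, and the group names and the legal basis are placeholders.

```python
# Placeholder protocol order; the real list would come from the institution.
PROTOCOL_ORDER = ["Group A", "Group B", "Group C"]


def sort_by_protocol(facet_counts, protocol_order):
    """Order facet values by a fixed protocol list; unknown values go last."""
    rank = {value: index for index, value in enumerate(protocol_order)}
    return sorted(
        facet_counts.items(),
        key=lambda item: (rank.get(item[0], len(rank)), item[0]),
    )


def path_prefixes(path, separator="/"):
    """Expand 'Treaty/Article' into every level, for indexing as facet values."""
    parts = path.split(separator)
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]


ordered = sort_by_protocol({"Group C": 5, "Group A": 1, "Other": 9}, PROTOCOL_ORDER)
levels = path_prefixes("TFEU/Article 114")
```

Indexing every prefix of the path lets a user first pick the treaty and then narrow to an article, without the interface having to parse the keyword at query time.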
Where facets can help out:
(1) Date range searches
(2) Structured references of procedures or documents
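For date range searches, Solr can compute facet counts per interval through its range faceting parameters (`facet.range`, `facet.range.start`, `facet.range.end`, `facet.range.gap`). The sketch below builds one bucket per year; the field name `doc_date` is hypothetical, and the slides do not say which Solr feature the EP sites rely on.

```python
def date_range_facet_params(field, start_year, end_year):
    """Build Solr parameters for one facet bucket per year, end year included."""
    return {
        "facet": "true",
        "facet.range": field,
        "facet.range.start": f"{start_year}-01-01T00:00:00Z",
        # The end bound is exclusive, so go to 1 January of the following year.
        "facet.range.end": f"{end_year + 1}-01-01T00:00:00Z",
        "facet.range.gap": "+1YEAR",  # Solr date math: one-year buckets
    }


# Hypothetical date field covering 2007 to 2010.
params = date_range_facet_params("doc_date", 2007, 2010)
```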
Faceted search – EP websites
Public Register of documents
http://www.europarl.europa.eu/registre/
Faceted search – EP websites
Public Register
Documents accessible through the Register
• 5 main categories of documents
    • Parliamentary activity
    • EP general information
    • From other institutions and Member States
    • Documents from third parties
    • Budgetary procedure
• 125 types of documents
• 362.217 References
• 2.386.485 Documents (All LV)
• List defined by EP Bureau

                 References    Documents
December 2007    207.069       1.306.059
December 2008    262.000       1.682.774
December 2009    310.760       1.998.330
December 2010    362.217       2.386.485
Public Register / Metadata
• Usually, for each document:
• reference number
• title
• dates
• summary
• authorities
• authors
• relations
Situation before implementing faceted search
Faceted search – EP websites
IPEX
Interparliamentary EU Information Exchange
http://www.ipex.eu
(old version of the site)
The IPEX Database contains a complete catalog of Commission
documents from 2006.
From each Commission document users can click on "Related dossiers"
and from there access national scrutiny pages.
Each national scrutiny page contains documents from the individual
national parliaments relating to the specific Commission document or
legislative procedure.
IPEX also hosts a calendar of interparliamentary cooperation which
contains information concerning all interparliamentary meetings
relating to the European Union.
Challenge:
How to guarantee that the result list presents
the information in its context?
Dossier
Documents
Scrutinies
Private forums
Faceted search – EP websites
ECPRD
European Center for Parliamentary
Research and Documentation
http://www.ecprd.org
(private site)
For the next release, an extension to the current (new) search
implementation is foreseen:
Using the key facet for Thesaurus entries as a privileged
entry point to find relevant objects on the site
(i.e. taking advantage of the XML-structured output of facets to use
it as a way to navigate to the right records)
Faceted search
Conclusions
- One size does not fit all
- Advanced search may be required for pre-selection
- Challenges appear when large result lists are returned
- Site-wide searches need to recall the context of each object
- Analysis starts when indexing, not when producing result lists
- Easily comprehensible and powerful drill-up and drill-down feature
- Flexible to adapt to future queries
- No ‘0 result lists’ when drilling
- Statistics on the number of results to expect
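The last two points follow from how facets are computed: the values offered are counted on the current result set, so any value a user can pick matches at least one record, and its count is the size of the next result list. A minimal sketch of this property, using invented records and hypothetical helper names:

```python
from collections import Counter

# Hypothetical records; fields and values are illustrative only.
RECORDS = [
    {"type": "report", "year": 2009, "language": "EN"},
    {"type": "report", "year": 2010, "language": "FR"},
    {"type": "resolution", "year": 2010, "language": "EN"},
]


def drill_down(records, selections):
    """Keep only the records matching every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in selections.items())
    ]


def available_facets(records, fields):
    """Count facet values on the current result set only."""
    return {
        field: Counter(r[field] for r in records if field in r) for field in fields
    }


# First drill-down step, then the facets that remain on offer.
current = drill_down(RECORDS, {"year": 2010})
offered = available_facets(current, ["type", "language"])

# Every offered value leads to a non-empty list of exactly the announced size.
for field, counts in offered.items():
    for value, count in counts.items():
        assert len(drill_down(current, {field: value})) == count
```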
Faceted search
Thanks for your attention!
Questions?