© 2013 LucidWorks
Edanz Journal Selector Case Study: A Prototype Based on Solr/Nutch/Hadoop
Liang SHEN @shenzhuxi
European Bioinformatics Institute
Edanz Journal Selector: A Prototype Based on Solr/Nutch/Hadoop
English editing for scientists
Help scientists publish papers
Target journal?
Journal Selector
Open Access
PubMed
Journal TOCs
• Created in 2009
• 21,498 journals from 1,677 publishers
• Institute for Computer Based Learning, Heriot-Watt University
Partner
• Springer Metadata API: provides metadata for over 5 million online documents
• Springer Open Access API: provides metadata, full-text content, and images for over 80,000 open access articles
Open Source Stack
• Infrastructure: Amazon Web Services
• Data processing: Hadoop/Hive
• Index: Solr/Lucene
• Web service: Drupal
• Secret Sauce/Custom Works
Infrastructure: Amazon EC2
Data processing
API, Feeds, Web Pages → HDFS → Index
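As a rough illustration of the feed stage of this pipeline, the sketch below turns one journal table-of-contents RSS feed into flat JSON records, one per article, which could be stored in HDFS and later posted to Solr. It assumes RSS 2.0 input; the sample feed, its URLs, and the field names `id`, `journal`, `title` and `abstract` are hypothetical and not taken from the prototype.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample of a journal table-of-contents feed (RSS 2.0).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal of Biology</title>
    <item>
      <title>A study of protein folding</title>
      <link>http://example.org/articles/1</link>
      <description>We examine how proteins fold.</description>
    </item>
    <item>
      <title>Gene expression in yeast</title>
      <link>http://example.org/articles/2</link>
      <description>Expression profiles under stress.</description>
    </item>
  </channel>
</rss>"""


def feed_to_docs(feed_xml):
    """Convert one RSS 2.0 feed into a list of flat dicts ready for indexing.

    The field names (id, journal, title, abstract) are hypothetical,
    not the prototype's actual Solr schema.
    """
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    # The channel title names the journal; every item is one article.
    journal = channel.findtext("title", default="").strip()
    docs = []
    for item in channel.findall("item"):
        docs.append({
            "id": item.findtext("link", default="").strip(),
            "journal": journal,
            "title": item.findtext("title", default="").strip(),
            "abstract": item.findtext("description", default="").strip(),
        })
    return docs


if __name__ == "__main__":
    # One JSON record per line: simple to write to HDFS and batch-load later.
    for doc in feed_to_docs(SAMPLE_FEED):
        print(json.dumps(doc))
```

Writing one JSON record per line keeps each article independent, which suits line-oriented Hadoop processing; the real prototype may well have used a different intermediate format.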
`<script src="http://global.js.widget.eja.hk/ja/edanz_ja/w.js"></script>`
Web service
Embeddable web widget
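Split the index for performance
The index can be divided into sub-indexes without losing ranking, as long as every query always filters on a facet field, so that each query only needs the sub-index for its facet value.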
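A minimal sketch of this idea, assuming the index is split into one Solr core per value of a facet field (here a hypothetical `subject` field) and every query supplies that value, so each request is routed to exactly one core instead of being fanned out to all of them. The core names and the Solr base URL are assumptions; only the `/select` handler and the `q`/`wt` parameters are standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical mapping from facet value to the Solr core holding that slice.
CORES_BY_SUBJECT = {
    "biology": "articles_biology",
    "chemistry": "articles_chemistry",
    "medicine": "articles_medicine",
}

SOLR_BASE = "http://localhost:8983/solr"  # assumed Solr location


def build_select_url(text, subject):
    """Return a select URL that queries only the sub-index for `subject`.

    Because every query carries a value for the facet field, the query
    can be sent to a single core. An unknown subject raises KeyError.
    """
    core = CORES_BY_SUBJECT[subject]
    params = {"q": text, "wt": "json"}
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("protein folding", "biology"))
```

For example, `build_select_url("protein folding", "biology")` returns `http://localhost:8983/solr/articles_biology/select?q=protein+folding&wt=json`.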
@shenzhuxi
Thanks!
Questions?