Building SaaS Solutions for Online Media Using Apache Solr - by Alberto Mijares

Building SaaS solutions with Apache Solr Alberto Mijares, Canoo Engineering AG [email protected], 26/05/2011 Twitter: @lemaiol

Upload: lucenerevolution

Post on 14-Jun-2015

DESCRIPTION

See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

SaaS applications have two advantages: a remote web deployment can be used instantly by potentially any consumer on the internet, and a web-based deployment reduces costs. In this talk the speaker explains the architecture of an innovative SaaS solution built for the Axel Springer media group (Switzerland). The application remotely extracts the content of multiple online newspaper articles, analyzes and classifies them, determines which articles are most similar to a given one, and integrates the result back into the article to provide the user with a “related articles” feature. The core components of the analysis process are language-specific tools (used to filter out superfluous terms) and semantic knowledge bases (such as Wikipedia, used to enrich the indexed information with new context-specific terms or to disambiguate the extracted terms). On a more technical level, the speaker explains the criteria for selecting the emerging enterprise search framework Apache Solr as the platform and how it drastically reduced the development effort required.

TRANSCRIPT

Page 1: Building SaaS Solutions for Online Media Using Apache Solr - By Alberto Mijares

Building SaaS solutions with Apache Solr

Alberto Mijares, Canoo Engineering [email protected], 26/05/2011

Twitter: @lemaiol

Page 2

Bullet point time!

Page 3

What I Will Cover

Practical applications of Apache Solr and Apache Lucene: how to increase the time a user spends on a website and how to do website “cross-selling”.

Use case: how Canoo helped Axel Springer Switzerland increase page impressions, user time on site and traffic for their online financial newspapers.

Key concepts:
• How to achieve this using Lucene & Solr
• How to profit from a SaaS business model

Page 4

Who I am

Alberto Mijares, Canoo Engineering AG

Background in web applications and standards:

• Participated in the W3C Semantic Web interest group (SWEO)

• Led the development of web standards compliance tools (Web Accessibility and Mobile Web)

• Recently led enterprise information retrieval projects

• Currently coaching Google Web Toolkit development projects

Page 5

Who is Canoo

People:
• Dirk Koenig: Groovy founder
• Andres Almiray: Griffon project lead and Java Champion
• Hamlet D’Arcy: Groovy committer and enthusiast
• … almost 40 more top software engineers

Products:
• WebTest: framework for web functional testing
• RIA Suite (aka ULC): Java based RIA framework
• FindIT: information retrieval and search tools
• WMTrans: language analysis tools

Page 6

Canoo FindIT

http://www.canoo.com/videos/FindIT.html

Page 7

Stop “bullet-pointing”!

Page 8

The facts

Axel Springer group is a market leader

Bilanz, Handelszeitung and Stocks

In Switzerland financials are important!

Financial language is German

Online media is the future

Page 9

The gap

Make the online versions more profitable

Make all newspapers “market leaders”

Page 10

The how

Workshop

“Related articles”

“Cross-selling”

Page 11

The analysis

Find a funding model

Use Lucene’s “More like this”

Integrate the suggestions back into the articles

Implement a selection mechanism
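
As an illustration of the “More like this” idea, the sketch below builds a request URL for Solr's MoreLikeThis search component. It is a minimal example under stated assumptions, not the project's actual query: the host, the `id` field, the `title` and `body` field names and the threshold values are all hypothetical, and only the URL is constructed (no request is sent).

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, article_id, fields=("title", "body"), count=5):
    """Build a Solr select URL asking for documents similar to one article.

    Field names and thresholds are illustrative assumptions.
    """
    params = {
        "q": f'id:"{article_id}"',   # the article we want suggestions for
        "mlt": "true",               # switch on the MoreLikeThis component
        "mlt.fl": ",".join(fields),  # fields used to compute similarity
        "mlt.count": count,          # number of related documents to return
        "mlt.mintf": 2,              # ignore terms rarer than this in the source doc
        "mlt.mindf": 2,              # ignore terms found in fewer docs than this
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_mlt_url("http://localhost:8983/solr", "article-42", count=3))
```

Raising `mlt.mintf` and `mlt.mindf` is one way to keep very rare terms from dominating the suggestions; suitable values depend on the corpus.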

Page 12

The issues

“More like this” was “experimental”

Works out-of-the-box only in English

Without “semantics” the results do not always make sense

Indexing full pages produces noise
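
The language issue can be illustrated with a small stop-word filter: frequent German function words carry little meaning and would otherwise distort similarity scores. This is only a sketch of the idea; the word list is a tiny hypothetical sample, and the talk's actual language tools are not shown in the slides.

```python
import re

# Tiny illustrative sample, not a complete German stop-word list.
GERMAN_STOP_WORDS = frozenset({
    "der", "die", "das", "und", "oder", "ein", "eine", "in", "im",
    "mit", "von", "zu", "ist", "den", "dem", "des", "auf",
})


def significant_terms(text, stop_words=GERMAN_STOP_WORDS):
    """Lowercase the text, split it into words and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    print(significant_terms("Die Aktie der Bank ist im Plus"))
```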

Page 13

The key

Page 14

The functional requirements

Discover and index articles

Extract only content

Simple and flexible query service
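
The “extract only content” requirement can be sketched as follows: keep the text inside the article container and ignore navigation, footers and scripts, so that page chrome does not end up in the index. The container class `article-body` is a hypothetical name chosen for the example; a real site would need its own selector, and this is not presented as the project's extraction code.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect text found inside <div class="article-body"> (assumed name)."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self, container_class="article-body"):
        super().__init__()
        self.container_class = container_class
        self.depth = 0        # > 0 while inside the article container
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1          # track nested divs to find the real end
            if tag in self.SKIP_TAGS:
                self.skip_depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.container_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_article_text(html):
    """Return only the article text of a page, joined by single spaces."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Only the returned text would then be sent to the index, which addresses the noise produced by indexing full pages.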

Page 15

The funding model

Page 16

The business model

SaaS

Page 17

The “other” requirements

Lucene-based analysis pipeline

Web oriented platform

Multi-application platform

Reliable, fast and scalable

Plan B?

Page 18

The search

Wraps Lucene in a nice way

It is mature and Open Source

Supports scheduling, REST API, DIH…

Scalability out-of-the-box

Well documented and has professional support

Page 19

The plan

From POC to PROD in “80 days”

Page 20

The results

Google Analytics

Page 21

The conclusions

Page 22

The Q&A

Thanks!

Page 23

Sources

Links:
• http://people.canoo.com/share
• http://www.canoo.com
• http://www.canoo.net
• http://www.leo.org
• http://www.bilanz.ch
• http://www.handelszeitung.ch
• http://www.stocks.ch

Page 24

Contact

Alberto Mijares
• [email protected]
• Twitter: @lemaiol

Page 25

Architecture

Platform: Apache Solr 1.4.1

Architecture:

Solr container       Web container
Springer Solr        Springer WebApp
Customer 2 Solr      Customer 2 WebApp
Customer 3 Solr      Customer 3 WebApp

Diagram labels: External access, Internal access, Requests
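
The one-Solr-instance-per-customer layout can be sketched as a small registry that maps each customer to its own Solr path, so that a web application only ever queries its own index. Everything concrete here is an assumption made for illustration: the internal host name, the paths and the customer keys do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    name: str
    solr_path: str  # hypothetical path of this customer's Solr instance


# Hypothetical registry mirroring the three rows of the diagram.
TENANTS = {
    "springer": Tenant("Springer", "/springer-solr"),
    "customer2": Tenant("Customer 2", "/customer2-solr"),
    "customer3": Tenant("Customer 3", "/customer3-solr"),
}


def solr_select_url(customer_key, internal_host="http://solr.internal:8983"):
    """Return the select URL of the Solr instance owned by one customer."""
    try:
        tenant = TENANTS[customer_key]
    except KeyError:
        raise ValueError(f"unknown customer: {customer_key!r}") from None
    return f"{internal_host}{tenant.solr_path}/select"


if __name__ == "__main__":
    print(solr_select_url("springer"))
```

Keeping the lookup in one place means adding a customer is a registry change rather than a code change, which suits a SaaS setup with several customers on one platform.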