importing wikipedia in plone

25
Importing Wikipedia in Plone Eric BREHAULT – Plone Conference 2013

Upload: makina-corpus

Post on 24-May-2015

166 views

Category:

Technology


1 download

DESCRIPTION

Eric BREHAULT – Plone Conference 2013

TRANSCRIPT

Page 1: Importing Wikipedia in Plone

Importing Wikipedia in Plone

Eric BREHAULT – Plone Conference 2013

Page 2: Importing Wikipedia in Plone

ZODB is good at storing objects

● Plone contents are objects,● we store them in the ZODB,● everything is fine, end of the story.

Page 3: Importing Wikipedia in Plone

But what if ...

... we want to store non-contentish records?

Like polls, statistics, mail-list subscribers, etc.,or any business-specific structured data.

Page 4: Importing Wikipedia in Plone

Store them as contents anyway

That is a powerfull solution.

But there are 2 major problems...

Page 5: Importing Wikipedia in Plone

Problem 1: You need to manage a secondary system

● you need to deploy it,● you need to backup it,● you need to secure it,● etc.

Page 6: Importing Wikipedia in Plone

Problem 2: I hate SQL

No explanation here.

Page 7: Importing Wikipedia in Plone

I think I just cannot digest it...

Page 8: Importing Wikipedia in Plone

How to store many records in the ZODB?

● Is the ZODB strong enough?● Is the ZCatalog strong enough?

Page 9: Importing Wikipedia in Plone

My grandmother often told me

"If you want to become stronger, you have to eat your soup."

Page 10: Importing Wikipedia in Plone

Where do we find a good soup for Plone?

In a super souper!!!

Page 11: Importing Wikipedia in Plone

souper.plone and souper

● It provides both storage and indexing.● Record can store any persistent pickable data.● Created by BlueDynamics.● Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.

Page 12: Importing Wikipedia in Plone

Add a record

>>> soup = get_soup('mysoup', context) >>> record = Record() >>> record.attrs['user'] = 'user1' >>> record.attrs['text'] = u'foo bar baz' >>> record.attrs['keywords'] = [u'1', u'2', u'ü'] >>> record_id = soup.add(record)

Page 13: Importing Wikipedia in Plone

Record in record

>>> record['homeaddress'] = Record() >>> record['homeaddress'].attrs['zip'] = '6020' >>> record['homeaddress'].attrs['town'] = 'Innsbruck' >>> record['homeaddress'].attrs['country'] = 'Austria'

Page 14: Importing Wikipedia in Plone

Access record

>>> from souper.soup import get_soup >>> soup = get_soup('mysoup', context) >>> record = soup.get(record_id)

Page 15: Importing Wikipedia in Plone

Query

>>> from repoze.catalog.query import Eq, Contains >>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))] [<Record object 'None' at ...>]

or using CQE format

>>> [r for r in soup.query("user == 'user1' and 'foo' in text")] [<Record object 'None' at ...>]

Page 16: Importing Wikipedia in Plone

souper

● a Soup-container can be moved to a specific ZODB mount-point,

● it can be shared across multiple independent Plone instances,● souper works on Plone and Pyramid.

Page 17: Importing Wikipedia in Plone

Plomino & souper

● we use Plomino to build non-content oriented apps easily,● we use souper to store huge amount of application data.

Page 18: Importing Wikipedia in Plone

Plomino data storage

Originally, documents (=record) were ATFolder.

Capacity about 30 000.

Page 19: Importing Wikipedia in Plone

Plomino data storage

Since 1.14, documents are pure CMF.

Capacity about 100 000.

Usally the Plomino ZCatalog contains a lot of indexes.

Page 20: Importing Wikipedia in Plone

Plomino & souper

With souper, documents are just soup records.

Capacity: several millions.

Page 21: Importing Wikipedia in Plone

Typical use case

● Store 500 000 addresses,● Be able to query them in full text and display the result on a map.

Demo

Page 22: Importing Wikipedia in Plone

What is the limit?

Can we import Wikipedia in souper?

Demo with 400 000 recordsDemo with 5,5 millions of records

Page 23: Importing Wikipedia in Plone

Conclusion

● Usage performances are good,● Plone performances are not impacted.

Use it!

Page 24: Importing Wikipedia in Plone

Thoughts

● What about a REST API on top of it?● Massive import is long and difficult, could it be improved?

Page 25: Importing Wikipedia in Plone

Makina Corpus

For all questions related to this talk,

please contact Éric Bréhault

[email protected]

Tel : +33 534 566 958

www.makina-corpus.com