Infinispan, Lucene, Hibernate OGM
Padova, InfoCamere, JBoss User Group
12 April 2012
Who am I?
• Hibernate team
– Hibernate Search
– Hibernate OGM
• Infinispan
– Infinispan Core
– Infinispan Query
– JGroups
• Apache Lucene
Sanne Grinovero
Italian, Dutch, Newcastle
Red Hat: JBoss, Engineering
Infinispan
• Distributed cache
• Scalable, transactional data grid: extreme performance and cloud
• NoSQL "database": key-value store
– How do you query a data grid?
SELECT * FROM GRID
Querying a "grid"
Object v = cache.get("c7");
Without the key, you cannot get the value.
Is key-only access practical?
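The limitation can be sketched with a plain java.util.HashMap standing in for the cache. This is an assumption for illustration only: the class name, the helper and the sample values below are hypothetical and not part of Infinispan. Lookup by key is a single call, while finding an entry by its value has to visit every entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class KeyOnlyAccess {
    // Hypothetical helper: returns the key of the first entry whose value equals `wanted`.
    // Without an index, the only option is to visit every entry.
    static Optional<String> findKeyByValue(Map<String, String> cache, String wanted) {
        for (Map.Entry<String, String> entry : cache.entrySet()) {
            if (entry.getValue().equals(wanted)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("c7", "Hibernate Search in Action");
        cache.put("c8", "Some Other Book");
        // Direct lookup: needs the key.
        System.out.println(cache.get("c7"));
        // Lookup by content: a full scan of the map.
        System.out.println(findKeyByValue(cache, "Some Other Book"));
    }
}
```

The scan costs time proportional to the number of entries, which is the motivation for the indexing approach discussed in the rest of the talk.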
A test on my bookshelf
• Where is Hibernate Search in Action?
• Can you pass me ISBN 978-1-933988-17-7?
• Can you fetch the books about Gaudí?
How can these functions be implemented on a key/value store?
• Where is Hibernate Search in Action?
• Can you pass me ISBN 978-1-933988-17-7?
• Can you find the books about Gaudí?
Document-based NoSQL: Map/Reduce
Infinispan is not strictly document based, but it does offer Map/Reduce.
Still, nothing rules out the use of JSON, XML, YAML, or Java:
public class Book implements Serializable {
    final String title;
    final String author;
    final String editor;
    public Book(String title, String author, String editor) {
        this.title = title;
        this.author = author;
        this.editor = editor;
    }
}
Iterate & collect
class TitleBookSearcher implements Mapper<String, Book, String, Book> {
    final String title;
    public TitleBookSearcher(String t) { title = t; }
    // Emit only the entries whose title matches exactly
    public void map(String key, Book value, Collector<String, Book> collector) {
        if ( title.equals( value.title ) ) collector.emit( key, value );
    }
}
class BookReducer implements Reducer<String, Book> {
    // Keep the first value emitted for each key
    public Book reduce(String reducedKey, Iterator<Book> iter) { return iter.next(); }
}
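To show how the two phases fit together, here is a minimal single-JVM sketch in plain Java. It deliberately does not use the Infinispan Map/Reduce API; LocalMapReduce and its two methods are hypothetical names, and book titles are stored as plain strings to keep it short. The map phase visits every entry and emits the matching ones; the reduce phase collapses the values emitted under each key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LocalMapReduce {
    // Map phase: visit every entry and emit (key, title) for the titles that match.
    static Map<String, List<String>> mapPhase(Map<String, String> titlesByKey, String wantedTitle) {
        Map<String, List<String>> emitted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : titlesByKey.entrySet()) {
            if (wantedTitle.equals(entry.getValue())) {
                emitted.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).add(entry.getValue());
            }
        }
        return emitted;
    }

    // Reduce phase: keep the first value emitted under each key.
    static Map<String, String> reducePhase(Map<String, List<String>> emitted) {
        Map<String, String> reduced = new LinkedHashMap<>();
        emitted.forEach((key, values) -> reduced.put(key, values.get(0)));
        return reduced;
    }
}
```

In a data grid the map phase would run on each node against its local entries, so only the matches travel over the network; this sketch runs both phases in one process.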
Implementing these simple functions:
✔ Find "Hibernate Search in Action"?
✔ Find by code "ISBN 978-1-933988-17-7"?
✗ How many books about "Shakespeare"?
• A correct score in full-text searches requires the frequencies of the text fragments relative to the whole corpus.
• Pre-tagging is impractical and limiting
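Why corpus-wide frequencies matter can be illustrated with a textbook TF-IDF style weighting. This is a sketch under stated assumptions: it is not Lucene's actual scoring formula, and the class, the method names and the naive tokenizer are all hypothetical. A term scores higher in a document where it occurs more often, and terms that are rare across the corpus weigh more than common ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TfIdfSketch {
    // Naive tokenizer: lower-case, split on anything that is not an ASCII letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Term frequency: how many times the term occurs in one document.
    static long termFrequency(String term, String document) {
        return tokenize(document).stream().filter(term::equals).count();
    }

    // Inverse document frequency: needs the whole corpus, not just one document.
    static double inverseDocumentFrequency(String term, List<String> corpus) {
        long docsWithTerm = corpus.stream().filter(d -> tokenize(d).contains(term)).count();
        return Math.log((double) corpus.size() / (1 + docsWithTerm)) + 1.0;
    }

    static double score(String term, String document, List<String> corpus) {
        return termFrequency(term, document) * inverseDocumentFrequency(term, corpus);
    }
}
```

The second method is the part a pure key/value lookup cannot provide: it depends on every document in the corpus, which is what a full-text index precomputes.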
Apache Lucene
• Apache™ open source project
• Integrated in countless projects
• ... including Hibernate, via Hibernate Search
• Can be clustered via Infinispan
– Performance
– Real time
– High availability
What does Lucene offer?
• Searches ranked by similarity score
• Text analysis
– Synonyms, stopwords, stemming, ...
• Reusable declarative Filters
• TermVectors
• MoreLikeThis
• Faceted Search
• Fast!
Lucene: Stopwords
a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your
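As an illustration of what stopword removal does during text analysis, here is a minimal sketch in plain Java. It does not use Lucene's analyzers; StopwordFilter, its analyze method and the naive tokenizer are hypothetical, and the set holds only a small subset of the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordFilter {
    // A small subset of the stopword list above, for illustration only.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "about", "and", "in", "of", "on", "the"));

    // Lower-case the text, split it into tokens and drop the stopwords.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

With this sketch, analyzing "Hibernate Search in Action" keeps hibernate, search and action, and drops "in". The same analysis has to be applied at indexing time and at query time for terms to match.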
Filters
Faceted Search
Shall we build a nice search engine that returns its results in alphabetical order?
Who uses Lucene?
Nexus
Where's the catch?
• It needs an index: physical resources and administration.
– in memory
– on filesystem
– in Infinispan
• Essentially immutable segments
– Optimized for data mining / queries, not for updates.
• A world of strings and frequency vectors
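The "strings and frequency vectors" idea can be sketched as a tiny inverted index in plain Java. This is a simplified illustration, not how Lucene lays out its segments; InvertedIndexSketch and its methods are hypothetical names. Each term (a string) maps to the documents that contain it, together with the number of occurrences in each.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Index a document: count each token under its term.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Documents containing the term, with the term frequency in each.
    Map<Integer, Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyMap());
    }
}
```

A query becomes a lookup by term instead of a scan over all documents, at the price of building and maintaining the index.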
Infinispan Query quickstart
• Enable indexing=true in the configuration
• Add the infinispan-query.jar module to the classpath
• Annotate the POJOs stored in the cache to define how they are indexed
<dependency>
    <groupId>org.infinispan</groupId>
    <artifactId>infinispan-query</artifactId>
    <version>5.1.3.FINAL</version>
</dependency>
Configuration via code
Configuration c = new Configuration()
    .fluent()
        .indexing()
        .addProperty("hibernate.search.default.directory_provider", "ram")
    .build();
CacheManager manager = new DefaultCacheManager(c);
Configuration / XML
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
    xmlns="urn:infinispan:config:5.0">
  <default>
    <indexing enabled="true" indexLocalOnly="true">
      <properties>
        <property name="hibernate.search.option1" value="..." />
        <property name="hibernate.search.option2" value="..." />
      </properties>
    </indexing>
  </default>
</infinispan>
Annotations on the model
@ProvidedId @Indexed
public class Book implements Serializable {
    @Field String title;
    @Field String author;
    @Field String editor;
    public Book(String title, String author, String editor) {
        this.title = title;
        this.author = author;
        this.editor = editor;
    }
}
Running queries
SearchManager sm = Search.getSearchManager(cache);
Query query = sm.buildQueryBuilderForClass(Book.class)
    .get()
    .phrase()
        .onField("title")
        .sentence("in action")
    .createQuery();
List<Object> list = sm.getQuery(query).list();
Architecture
• Integrates Hibernate Search (engine)
– Listener for Hibernate events & transactions
• Infinispan events & transactions
– Maps Java types and model graphs to Lucene Documents
– Thin-layer design
Index mapping
Tests for Infinispan Query
https://github.com/infinispan/infinispan
org.apache.lucene.search.Query luceneQuery =
    queryBuilder.phrase()
        .onField( "description" )
        .andField( "title" )
        .sentence( "a book on highly scalable query engines" )
        .createQuery();
CacheQuery cacheQuery =
    searchManager.getQuery( luceneQuery, Book.class );
// Full-text filters are enabled on the query object, not on the query builder
cacheQuery.enableFullTextFilter( "ready-for-shipping" );
List<Book> objectList = cacheQuery.list();
Architecture: Infinispan Query
Scalability problems
• Global writer locks
• Sharing over NFS is very problematic
Queue-based clustering (filesystem)
Index stored in Infinispan
Quickstart Hibernate Search
• Add the hibernate-search dependency:
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>4.1.0.Final</version>
</dependency>
Quickstart Hibernate Search
• Everything else is optional:
– How to manage the indexes
– Extension modules, custom Analyzers
– Performance tuning
– Custom type mapping
– Clustering
• JGroups
• Infinispan
• JMS
Quickstart Hibernate Search
@Entity
public class Essay {
    @Id
    public Long getId() { return id; }
    public String getSummary() { return summary; }
    @Lob
    public String getText() { return text; }
    @ManyToOne
    public Author getAuthor() { return author; }
    ...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }
    public String getSummary() { return summary; }
    @Lob
    public String getText() { return text; }
    @ManyToOne
    public Author getAuthor() { return author; }
    ...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }
    @Field
    public String getSummary() { return summary; }
    @Lob
    public String getText() { return text; }
    @ManyToOne
    public Author getAuthor() { return author; }
    ...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }
    @Field
    public String getSummary() { return summary; }
    @Lob @Field @Boost(0.8)
    public String getText() { return text; }
    @ManyToOne
    public Author getAuthor() { return author; }
    ...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }
    @Field
    public String getSummary() { return summary; }
    @Lob @Field @Boost(0.8)
    public String getText() { return text; }
    @ManyToOne @IndexedEmbedded
    public Author getAuthor() { return author; }
    ...
@Entity
public class Author {
    @Id @GeneratedValue
    private Integer id;
    private String name;
    @OneToMany
    private Set<Book> books;
}
@Entity
public class Book {
    private Integer id;
    private String title;
}
A second example
@Entity @Indexed
public class Author {
    @Id @GeneratedValue
    private Integer id;
    @Field(store=Store.YES)
    private String name;
    @OneToMany @IndexedEmbedded
    private Set<Book> books;
}
@Entity
public class Book {
    private Integer id;
    @Field(store=Store.YES)
    private String title;
}
Index structure
String[] productFields = {"summary", "author.name"};
Query luceneQuery = // query builder or any Lucene Query
FullTextEntityManager ftEm = Search.getFullTextEntityManager(entityManager);
FullTextQuery query = ftEm.createFullTextQuery( luceneQuery, Product.class );
List<Product> items = query.setMaxResults(100).getResultList();
int totalNbrOfResults = query.getResultSize();
Query
totalNbrOfResults = 8,320,000 (0.002 seconds)
Using the DSL
About the results:
• Managed POJOs: changes to the entities are applied to both Lucene and the database
• Familiar (standard) JPA pagination:
– .setMaxResults( 20 ).setFirstResult( 100 );
• Type restrictions, polymorphic full-text queries:
– .createQuery( luceneQuery, A.class, B.class, ..);
• Projection
• Result mapping
Filters
FullTextQuery ftQuery = s // s is a FullTextSession
    .createFullTextQuery( query, Product.class );
// enableFullTextFilter returns the filter, so parameters are set on its result
ftQuery.enableFullTextFilter( "filtroMinori" );
ftQuery.enableFullTextFilter( "offertaDelGiorno" )
    .setParameter( "day", "20120412" );
ftQuery.enableFullTextFilter( "inStockA" )
    .setParameter( "location", "Padova" );
List<Product> results = ftQuery.list();
Using Infinispan to distribute the indexes
Clustering a "direct" use of Lucene
• Using org.apache.lucene
– Traditionally hard to distribute across multiple nodes
– On any cloud
Single node: a rough idea of performance
[Charts: queries/sec (axis 0 to 25000) and write ops/sec (axis 0 to 400), comparing Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory]
Multiple nodes: a rough idea of performance
[Charts: queries/sec (axis 0 to 25000) and write ops/sec (axis 0 to 400) for the same six configurations]
Writes don't scale?
Tips for optimal performance
• Tune chunk_size to the actual usage of your index (avoid read locks by avoiding fragmentation)
• Check the size of the network packets: blob size, JGroups packets, network interface and hardware.
• Choose and configure a suitable CacheLoader
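The effect of chunk_size on fragmentation can be estimated with simple arithmetic. The sketch below assumes that each index file is split into fixed-size chunks of chunk_size bytes; ChunkMath and its methods are hypothetical helpers, not part of the Infinispan API, and the file sizes in the usage note are made up.

```java
class ChunkMath {
    // Number of fixed-size chunks needed to store a file of fileLength bytes.
    static long chunksFor(long fileLength, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Ceiling division without floating point.
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    // A file is fragmented when it does not fit in a single chunk.
    static boolean isFragmented(long fileLength, int chunkSize) {
        return chunksFor(fileLength, chunkSize) > 1;
    }
}
```

For example, with a chunk size of 16384 bytes a 10000-byte file fits in one chunk, while a 40000-byte file is split into three. Measuring the typical file sizes of a real index is the way to pick a value that keeps most files in a single chunk.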
Memory requirements
• RAMDirectory: the whole index (and more) in RAM.
• FSDirectory: a good OS does an excellent job of caching I/O, often better than RAMDirectory.
• Infinispan: configurable, up to memory shared across nodes
– Flexible
– Fast
– Network vs. disk
Modules for scalable cloud deployments
One Infinispan to rule them all
– Store Lucene indexes
– Hibernate second level cache
– Application managed cache
– Datagrid
– EJB, session replication in AS7
– As a JPA “store” via Hibernate OGM
Ingredients for the cloud
• JGroups DISCOVERY protocol
– MPING
– TCPPING
– JDBC_PING
– S3_PING
• Choose a CacheLoader
– Database based, jclouds, Cassandra, ...
Near future
• Simplify write scalability
• Auto-tuning of the clustering parameters: ergonomics!
• Parallel searching: multi-core + multi-node
• A component of
– http://www.cloudtm.eu
JPA for NoSQL
NoSQL: flexibility has a cost
• Programming model
– one per product :-(
• no schema => app driven schema
• query (Map Reduce, specific DSL, ...)
• data structure transpires
• Transaction
• durability / consistency
Example: Infinispan
Distributed Key/Value store
• (or Replicated, local only efficient cache, invalidating cache)
Each node is equal
• Just start more nodes, or kill some
No bottlenecks
• by design
Cloud-network friendly
• JGroups
• And “cloud storage” friendly too!
The ABC of Infinispan
map.put( "user-34", userInstance );
map.get( "user-34" );
map.remove( "user-34" );
It's a ConcurrentMap!
map.put( "user-34", userInstance );
map.get( "user-34" );
map.remove( "user-34" );
map.putIfAbsent( "user-38", another );
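What putIfAbsent adds over a plain put can be sketched with java.util.concurrent.ConcurrentHashMap standing in for an Infinispan cache (an assumption for illustration; ConcurrentMapAbc and registerOnce are hypothetical names). The call stores the value only when the key is still free and returns the previous value otherwise, in one atomic step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentMapAbc {
    // Stores the value only if the key is still free.
    // Returns true when this call stored it, false when an earlier value was kept.
    static <V> boolean registerOnce(ConcurrentMap<String, V> map, String key, V value) {
        return map.putIfAbsent(key, value) == null;
    }

    public static void main(String[] args) {
        // ConcurrentHashMap stands in for an Infinispan cache here.
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        System.out.println(registerOnce(map, "user-38", "another")); // first writer stores its value
        System.out.println(registerOnce(map, "user-38", "late"));    // the existing value is kept
        System.out.println(map.get("user-38"));
    }
}
```

Because the check and the write are one operation, two callers racing on the same key cannot both believe they created the entry.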
A few more details about Infinispan
• Support for Transactions (XA)
• CacheLoaders
– Cassandra, JDBC, Amazon S3 (jclouds), ...
• Tree API for JBossCache compatibility
• Lucene integration
– Two-fold
• Some Hibernate integrations
– Second level cache
– Hibernate Search indexing backend
Goals of Hibernate OGM
• Encourage new data usage patterns
• Familiar environment
• Ease of use
• easy to jump in
• easy to jump out
• Push NoSQL exploration in enterprises
• “PaaS for existing API” initiative
What is it?
• JPA front end to key/value stores
– Object CRUD (incl polymorphism and associations)
– OO queries (JP-QL)
• Reuses
– Hibernate Core
– Hibernate Search (and Lucene)
– Infinispan
• Is not a silver bullet
– not for all NoSQL use cases
Entities as serialized blobs?
• Serialize objects into the (key) value
– store the whole graph?
• maintain consistency with duplicated objects
– guaranteed identity a == b
– concurrency / latency
– structure change and (de)serialization, class definition changes
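The identity problem with serialized blobs can be reproduced with standard Java serialization. This is a minimal sketch: BlobIdentity, its helpers and the nested Author class are hypothetical and unrelated to how any particular store serializes values. Reading the same blob twice yields two distinct objects, so a == b no longer holds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class BlobIdentity {
    // Hypothetical entity, used only for this illustration.
    static class Author implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Author(String name) { this.name = name; }
    }

    // Serialize an entity into the byte[] that would be stored as the value.
    static byte[] toBlob(Serializable entity) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(entity);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Every read of the blob builds a brand new object graph.
    static Object fromBlob(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the same Author were embedded in the blobs of several books, each book would carry its own copy, and an update to one copy would not reach the others.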
OGM’s approach to schema
• Keep what's best from relational model
– as much as possible
– tables / columns / pks
• Decorrelate object structure from data structure
• Data stored as (self-described) tuples
• Core types limited
– portability
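The tuple idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not OGM's actual storage format: TupleStore, the "table#pk" key layout and the column names used in the usage note are all hypothetical. Each entity is stored as a map from column name to a core-typed value, and an association is kept as a foreign key rather than as a nested object.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class TupleStore {
    // The "grid": key -> self-described tuple (column name -> core-typed value).
    private final Map<String, Map<String, Object>> grid = new HashMap<>();

    // Hypothetical key layout: "<table>#<primary key>".
    static String key(String table, Object pk) {
        return table + "#" + pk;
    }

    // Store a defensive copy of the columns under the table/pk key.
    void save(String table, Object pk, Map<String, Object> columns) {
        grid.put(key(table, pk), new LinkedHashMap<>(columns));
    }

    // Read a single column, or null when the row or column does not exist.
    Object read(String table, Object pk, String column) {
        Map<String, Object> tuple = grid.get(key(table, pk));
        return tuple == null ? null : tuple.get(column);
    }
}
```

A Book row would hold an author_id column pointing at an Author row, so the author exists once in the grid and reading it means following the key, much like a join on a primary key.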
Query
• Hibernate Search indexes entities
• Store Lucene indexes in Infinispan
• JP-QL to Lucene query transformation
• Works for simple queries
• Lucene is not a relational SQL engine
What next?
• MongoDB
• EHCache / Terracotta
• Redis
• Voldemort
• Neo4J
• Dynamo
• ... Git? Spreadsheet? ... CapeDwarf?
Q&A
@Infinispan @Hibernate @SanneGrinovero
http://infinispan.org
http://in.relation.to
http://jboss.org