adventures in discovery with cassandra and solr

Post on 08-May-2015

1.598 Views

Category:

Technology

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

For any venture, storing your data is just the first step in making sense of it. How do you make your system discoverable? How do you tune your relevancy to accommodate real-time updates? In this session, we explore pairing Cassandra with Solr using Datastax Enterprise Search, and look at different search paradigms to help your users find patterns in your data.

TRANSCRIPT

#CASSANDRAEU CASSANDRASUMMITEU

Adventures in Discoverability with C* and Solr

Patricia Gorla, Systems Engineer@patriciagorla@o19s

#CASSANDRAEU CASSANDRASUMMITEU

• Solr

• Cassandra

• Information retrieval

About Me

Paul Hostetler - phostetler.com

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple Complex

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple Complex

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

select birthPlacewhere name = “Aristotle”;

select coordwhere name = “Stagira”;

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple Complex

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

select birthPlacewhere name = “Aristotle”;

create index on tag;

select *where tag = “Greek philosophy”;

select coordwhere name = “Stagira”;

???

#CASSANDRAEU CASSANDRASUMMITEU

How Do I Find What I’m Looking For?

Simple

Aristotle’s birthplace? All ancient Greek philosophers?

Coordinates of Stagira? All cities within 100km of Stagira

q=Aristotle&fl=birthPlace q=Greek philosophy

q=Stagira&fl=point q=*:*&fq={!geofilt pt=40.530, 23.752 sfield=point d=100}

#CASSANDRAEU CASSANDRASUMMITEU

• Google Site Search

• MySQL ‘like’ statements

Approaches to Search

Seth Casteel - littlefriendsphoto.com

#CASSANDRAEU CASSANDRASUMMITEU

Approaches to Search

• Full-text search

• Ranking (Scoring)

• Tokenization

• Stemming

• Faceting

#CASSANDRAEU CASSANDRASUMMITEU

Approaches to Search

• Full-text search

• Ranking (Scoring)

• Tokenization

• Stemming

• Facetingand many more!

#CASSANDRAEU CASSANDRASUMMITEU

[1] Pleasure in the job puts perfection in the work.

[2] Education is the best provision for the journey to old age.

[3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting.

[4] It is the mark of an educated mind to be able to entertain a thought without absorbing it.

Inverted Index

Term Freq Documents

education 2 [2] [4]

hunting 3 [3]

perfection 1 [1]

#CASSANDRAEU CASSANDRASUMMITEU

<fieldType name="text_general" class="solr.TextField" > <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPossessiveFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.StopFilterFactory” ignoreCase="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer></fieldType>

Defining Field Types

#CASSANDRAEU CASSANDRASUMMITEU

Index-side Analysis

Hope is a waking dream

Hope is a waking dream.

Hope waking dream

hope wake dream

hope waking dream

Punctuation

Stop Words

Lowercase

Stemming

Hope is a waking dream

Hope is a waking dream.

Hope waking dream

hope wake dream

hope waking dream

Punctuation

Stop Words

Lowercase

Stemming

hope wake dreamdesire awake wish

Synonyms

#CASSANDRAEU CASSANDRASUMMITEU

Query-side Analysis

#CASSANDRAEU CASSANDRASUMMITEU

Facetingfacet_fields: { tags: [hunting: 1, education: 2, work: 2], locations: [Stagira: 5, Chalcis: 3]}

#CASSANDRAEU CASSANDRASUMMITEU

• Pay Google more $$

• MySQL ‘like’ shards

• Master/Slave replication

• SolrCloud

Approaches to Distribution

#CASSANDRAEU CASSANDRASUMMITEU

“Distributed search is hard.”

#CASSANDRAEU CASSANDRASUMMITEU

• Full-text search

• Tokenization

• Stemming

• Date ranges

• Aggregation

• High Availability

• Distributed Nature

Solr + Cassandra: Datastax Enterprise

#CASSANDRAEU CASSANDRASUMMITEU

Person

Examining DBpedia.org Datasets

<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person .<http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en .<http://dbpedia.org/ontology/birthPlace> Stagira<http://dbpedia.org/ontology/deathPlace> Chalcis .

<http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> .<http://dbpedia.org/resource/Stagira#lat> "40.5916667" .<http://dbpedia.org/resource/Stagira#long> "23.7947222" .<http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en .

Place

#CASSANDRAEU CASSANDRASUMMITEU

“Love is a single soul inhabiting two bodies.”

#CASSANDRAEU CASSANDRASUMMITEU

curl http://localhost:8983/solr/solr.person/q=Aristotle

Querying for data

#CASSANDRAEU CASSANDRASUMMITEU

curl http://localhost:8983/solr/solr.location/select?q=*:* &spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point d=100}

Filtering by location

#CASSANDRAEU CASSANDRASUMMITEU

“There is no great genius without a mixture of madness.”

#CASSANDRAEU CASSANDRASUMMITEU

Unified Schema<fields>

<field name="id" type="string" />

<field name="name" type="text" />

<dynamicField name="*Date" type="date" />

<dynamicField name="*Place" type="text" />

<dynamicField name="*Point" type="location" />

<dynamicField name="*_tag" type="text" />

</fields>

Schema.xml

#CASSANDRAEU CASSANDRASUMMITEU

curl http://localhost:8983/solr/resource/solr.location/schema.xml \

--data-binary @solr/location_schema.xml \

-H 'Content-type:text/xml; charset=utf-8 '

curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml \

--data-binary @solr/location_solrconfig.xml \

-H 'Content-type:text/xml; charset=utf-8 '

curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location

Upload to DSE, Create Core

#CASSANDRAEU CASSANDRASUMMITEU

Location Schemacqlsh:solr> DESC COLUMNFAMILY location;

CREATE TABLE location (

id text PRIMARY KEY,

"_docBoost" text,

"_dynFld" text,

location text,

name text,

solr_query text,

tags text

) WITH COMPACT STORAGE AND

bloom_filter_fp_chance=0.010000 AND

caching='KEYS_ONLY' AND

comment='' AND

dclocal_read_repair_chance=0.000000 AND

gc_grace_seconds=864000 AND

read_repair_chance=0.100000 AND

replicate_on_write='true' AND

populate_io_cache_on_flush='false' AND

compaction={'class': 'SizeTieredCompactionStrategy'} AND

compression={'sstable_compression': 'SnappyCompressor'};

CREATE INDEX solr_location__docBoost_index ON location ("_docBoost");

...

CREATE INDEX solr_location_solr_query_index ON location (solr_query);

Cassandra Schema

#CASSANDRAEU CASSANDRASUMMITEU

Solr

• No multiValued fields

• No JOIN*

What Changes

Cassandra

• No composite columns

• No counter columns

#CASSANDRAEU CASSANDRASUMMITEU

• Fault tolerant, available search

Bringing it All Together

#CASSANDRAEU CASSANDRASUMMITEU

“Thank you.”

#CASSANDRAEU CASSANDRASUMMITEU

THANK YOU

pgorla@opensourceconnections.com@patriciagorla@o19sAll information, including slides, are on http://github.com/pgorla/million-books

top related