tl;dr: Solr
Dumbledore: "I use the Pensieve. One simply siphons the excess thoughts from one's mind, pours them into the basin, and examines them at one's leisure. It becomes easier to spot patterns and links, you understand, when they are in this form."
Harry: "You mean... that stuff's your thoughts?"
Dumbledore: "Certainly."
Solr is Lucene-based
Lucene = text search engine library written in Java
All kinds of crazy goodies:
Ranked search
Multiple-index search (with merged results)
Simultaneous read & write
Date-range search
...the list goes on
Platform-independent (thanks, Java!)
Fast & efficient
Index size ~= 20-30% of the size of the indexed data
Very high-throughput indexing (95 GB/hour)
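The two sizing rules of thumb above can be turned into a quick back-of-the-envelope calculator. This is only a sketch. The 20-30% and 95 GB/hour figures are the slide's rough numbers, and real index size and indexing speed depend heavily on schema, analyzers, and hardware. The function name `estimate_index` is hypothetical.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope sizing from the slide's rules of thumb.
# Assumptions: index ~= 20-30% of corpus size; indexing at ~95 GB/hour.
estimate_index() {
  local corpus_gb="$1"   # size of the raw data to be indexed, in GB
  awk -v gb="$corpus_gb" 'BEGIN {
    printf "index: %.1f-%.1f GB, indexing time: %.1f minutes\n",
      gb * 0.20, gb * 0.30, gb / 95 * 60
  }'
}

estimate_index 50   # index: 10.0-15.0 GB, indexing time: 31.6 minutes
```

For 50 GB of source data, this puts the index at roughly 10-15 GB and the indexing time at about half an hour.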
Solr is NoSQL
NoSQL == Non-relational database
RDBMS metaphor:
One database
One table
Denormalized data
Query parameters instead of SQL
“Documents” instead of rows
Bottom line: it's a persistent datastore, and we use it to store data persistently.
Vocabulary: Master, Slave, Replication, Document, API
Master:
There can be only one
Read & write operations
Must be secure
Younger, stronger brother of the production DB
Home base for Solr slaves
Slave:
There are many copies
They have a plan: replication
Read-only
Gets a copy of the index from the Solr master every k minutes
Responds to queries
Replication:
Slaves --HTTP GET--> Master
Replication is differential: only new or changed index files are transferred, not the whole index
Configuration is set in solrconfig.xml: http://tinyurl.com/DESolrRepl
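To make "differential" concrete, here is a toy model in shell. It treats the master and slave indexes as plain local directories and lists the files the slave lacks. Real Solr replication does this over HTTP through its ReplicationHandler, not with a script like this. The function name `files_to_fetch` and the demo file names are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of differential replication (NOT Solr's implementation):
# print the names of files present in the master index directory but
# missing from the slave's, i.e. the only files a slave would need to copy.
files_to_fetch() {
  local master_dir="$1" slave_dir="$2" f
  for f in "$master_dir"/*; do
    [ -e "$f" ] || continue            # empty master dir: glob did not match
    [ -e "$slave_dir/${f##*/}" ] || printf '%s\n' "${f##*/}"
  done
}

# Demo with hypothetical Lucene-style file names.
master=$(mktemp -d); slave=$(mktemp -d)
touch "$master"/_0.cfs "$master"/_1.cfs "$master"/segments_2
touch "$slave"/_0.cfs "$slave"/segments_1
files_to_fetch "$master" "$slave"      # prints _1.cfs and segments_2
rm -rf "$master" "$slave"
```

The unchanged `_0.cfs` is skipped, so a poll that follows a small update has very little to copy.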
Document:
RDBMS = row; Solr = document
Denormalized relational data, my friend:
Flatten a bunch of related RDBMS rows into a single Solr document
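Here is a small sketch of that flattening. It uses awk on two hypothetical tab-separated "tables" (books and their authors) that stand in for RDBMS tables. Each output line plays the role of one Solr document, with the many author rows collapsed into a single multi-valued field. The file layout, the sample data, and the `flatten` function are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Flatten two tab-separated "tables" into one record per book:
#   books file:   id<TAB>title
#   authors file: book_id<TAB>author   (many rows per book)
# Output: id<TAB>title<TAB>author1;author2;...  (one "document" per book)
flatten() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { title[$1] = $2; order[++n] = $1; next }          # books file
    { authors[$1] = (authors[$1] == "") ? $2 : authors[$1] ";" $2 }  # authors
    END { for (i = 1; i <= n; i++) { id = order[i]; print id, title[id], authors[id] } }
  ' "$1" "$2"
}

# Demo with hypothetical data.
books_tsv=$(mktemp); authors_tsv=$(mktemp)
printf '1\tSearch Basics\n2\tShell Tricks\n' > "$books_tsv"
printf '1\tAlice\n2\tCarol\n1\tBob\n' > "$authors_tsv"
flatten "$books_tsv" "$authors_tsv"
rm -f "$books_tsv" "$authors_tsv"
```

Book 1 comes out as a single record with authors `Alice;Bob`. Each record is self-contained, so answering a search never requires a join at query time.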
API:
Application programming interface
The primary means of communicating with Solr is an HTTP API
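Because the API is plain HTTP, a query is just a URL carrying query parameters instead of SQL. The sketch below only builds such a URL. `q`, `rows`, and `wt` are standard Solr query parameters. The base URL `http://localhost:8983/solr` and the bare `/select` path are assumptions, since multi-core setups put a core name before `/select`. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Percent-encode a string for use in a URL query parameter (byte-wise).
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;             # unreserved: keep as-is
      *) out+=$(printf '%%%02X' "'$c") ;;      # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Build a Solr select URL: $1 = base URL, $2 = query, $3 = rows (default 10).
solr_select_url() {
  printf '%s/select?q=%s&rows=%s&wt=json\n' "$1" "$(urlencode "$2")" "${3:-10}"
}

solr_select_url http://localhost:8983/solr 'title:solr AND year:2012' 5
# Against a running Solr, the result could be passed to curl:
#   curl "$(solr_select_url http://localhost:8983/solr 'title:solr')"
```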
The Good Stuff: Unix & Diagnostics
“This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.”
- Doug McIlroy
Examples of things beyond the scope of this talk: cat, awk, grep, sed, cut, wc, sort, tail, head
Great read: http://matt.might.net/articles/sql-in-the-shell/
You cannot effectively troubleshoot without parsing logs.
You cannot effectively parse logs without good text-parsing tools:
cat, awk, grep, sed, cut, wc, sort, tail, head
No *nix OS? PowerShell!
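Here is a small taste of what those tools buy you, in the spirit of "SQL in the shell": a GROUP BY-style count of log lines per level. The sample lines are hypothetical and assume Celery's default `[timestamp: LEVEL/process] message` layout. For other formats, adjust the field number. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Rough shell equivalent of:
#   SELECT level, COUNT(*) FROM log GROUP BY level ORDER BY COUNT(*) DESC;
# Assumes lines like "[2013-03-01 10:00:00,001: INFO/MainProcess] message",
# where the third whitespace-separated field is "LEVEL/process]".
count_by_level() {
  awk '{ split($3, parts, "/"); print parts[1] }' "$1" |  # extract LEVEL
    sort | uniq -c |                                       # GROUP BY + COUNT
    sort -rn |                                             # ORDER BY count DESC
    awk '{ print $2, $1 }'                                 # "LEVEL count"
}

# Demo with a hypothetical log.
log=$(mktemp)
cat > "$log" <<'EOF'
[2013-03-01 10:00:00,001: INFO/MainProcess] task received
[2013-03-01 10:00:01,001: ERROR/MainProcess] task failed
[2013-03-01 10:00:02,001: INFO/MainProcess] task received
[2013-03-01 10:00:03,001: WARNING/MainProcess] slow task
[2013-03-01 10:00:04,001: ERROR/MainProcess] task failed
[2013-03-01 10:00:05,001: INFO/MainProcess] task succeeded
EOF
count_by_level "$log"   # INFO 3, ERROR 2, WARNING 1 (one per line)
rm -f "$log"
```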
Example commands:
tail -f /var/log/celery/project.log
Follow the Celery log, printing new lines to stdout as they are written.
grep -oE 'BUID:[0-9]{1,5}' /ebs2/log/celery/project.log | grep -oE '[0-9]{1,5}' | sort -u
Parse the Celery log, printing a sorted list of unique BUIDs.
grep -B 15 "DocumentInvalid" /ebs2/log/celery/project.log | grep -E 'Download complete for BUID [0-9]{1,5}' | awk '{sub(/\[/, ""); print $1 " " $2 " " $7 ":" $8}'
Parse the Celery log, listing the timestamp and BUID of each feed whose download was followed, within 15 lines, by a DocumentInvalid error.
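To see what the two BUID pipelines above produce, here they are wrapped in shell functions and run against a few made-up log lines. The functions drop the `cat` and use `{1,5}` so that an empty digit match cannot slip through. The function names and sample lines are hypothetical. The sample assumes Celery's default `[timestamp: LEVEL/process] message` layout, which is what makes `$7` and `$8` in the awk step land on `BUID` and the number. A different log format would shift those fields.

```bash
#!/usr/bin/env bash
# The two BUID pipelines, wrapped as functions that take a log path.
unique_buids() {
  grep -oE 'BUID:[0-9]{1,5}' "$1" | grep -oE '[0-9]{1,5}' | sort -u
}

failed_feeds() {
  # Heuristic: any "Download complete" line within the 15 lines before a
  # DocumentInvalid error is reported as a failed feed.
  grep -B 15 'DocumentInvalid' "$1" |
    grep -E 'Download complete for BUID [0-9]{1,5}' |
    awk '{sub(/\[/, ""); print $1 " " $2 " " $7 ":" $8}'
}

# Hypothetical sample log.
log=$(mktemp)
cat > "$log" <<'EOF'
[2013-03-01 10:00:00,001: INFO/MainProcess] Download complete for BUID 101
[2013-03-01 10:00:01,002: INFO/MainProcess] Parsing feed BUID:101
[2013-03-01 10:00:02,003: ERROR/MainProcess] DocumentInvalid: missing required field
[2013-03-01 10:05:00,004: INFO/MainProcess] Download complete for BUID 202
[2013-03-01 10:05:01,005: INFO/MainProcess] Parsing feed BUID:202
[2013-03-01 10:05:02,006: INFO/MainProcess] Indexed feed BUID:202
EOF

unique_buids "$log"   # 101 and 202, one per line
failed_feeds "$log"   # 2013-03-01 10:00:00,001: BUID:101
rm -f "$log"
```

The 15-line window is a heuristic. In a busy log, a successful download that happens to sit just before another feed's error would also be reported.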
Conclusion: RTFreakingM (Read The Freaking Manual)
http://wiki.apache.org/solr/SolrQuerySyntax
http://wiki.apache.org/solr/SolrCaching
http://wiki.apache.org/solr/SchemaXml
http://django-haystack.readthedocs.org/en/latest/
Experiment & tinker & reinvent the wheel
Get comfortable with the command line: you can't effectively administer Solr (or any sufficiently complex system) with only a web GUI
Read the logs
Connect Solr behavior to application operations