ElasticSearch for data mining

ElasticSearch Wm. Barrett Simms [email protected] @wbsimms

Upload: william-simms

Posted on 17-Jul-2015


TRANSCRIPT

Page 1: ElasticSearch for data mining

ElasticSearch

Wm. Barrett Simms

[email protected]

@wbsimms

Page 2

About Me

Software Developer

Agile Team Member

Team Lead

Agile Advocate

SDLC Implementer

Page 3

SDLC

Page 4: ElasticSearch for data mining

Big Data

“Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.”

- Wikipedia

Page 5

The 3 Vs

• Volume: from a few gigabytes to petabytes

• Velocity: data arrives quickly

• Variety: multiple types of data

Page 6

What is ElasticSearch?

• You know, for search…

• Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.

Page 7

Let’s break that down…

• Distributed: runs on multiple servers simultaneously

• Multitenant: the same system serves different groups of data

• REST: web-based programming interface

• NoSQL for storage: documents are stored as JSON

• Open source
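To make "distributed" concrete: Elasticsearch splits each index into primary shards and chooses a document's shard from its routing value (the document ID by default), roughly `shard = hash(routing) % number_of_primary_shards`. The sketch below illustrates that idea only. It uses CRC32 from Python's standard library as a stand-in hash, which is an assumption and not the hash Elasticsearch actually uses; the node names and the round-robin placement are hypothetical simplifications.

```python
import zlib

def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard.

    CRC32 is a stand-in hash for illustration; Elasticsearch uses its own hash.
    """
    h = zlib.crc32(routing.encode("utf-8"))
    return h % number_of_primary_shards

def assign_shards(number_of_primary_shards, nodes):
    """Spread primary shards round-robin over nodes (a simplification of real allocation)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(number_of_primary_shards)}

# Hypothetical three-node cluster holding one index with five primary shards.
nodes = ["node-1", "node-2", "node-3"]
layout = assign_shards(5, nodes)
for doc_id in ["1", "2", "42"]:
    shard = pick_shard(doc_id, 5)
    print(doc_id, "-> shard", shard, "on", layout[shard])
```

Because the shard is computed modulo the number of primary shards, that number is fixed when the index is created; changing it would send existing IDs to different shards.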

Page 8

So what is ElasticSearch?

• It’s a search engine

• Stores data on multiple machines

• Stores multiple types of data

• Stores data in JSON format

• REST interface: there are managed and unmanaged programming interfaces (clients) for:

    • .NET, Java, Node.js, JavaScript, Scala, Clojure

    • PHP, Perl, Python, Ruby, Haskell, Erlang

    • ColdFusion, Smalltalk, OCaml, command line, EventMachine, Go
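Every client in the list above ultimately talks to the same REST interface. The following is a minimal sketch using only Python's standard library. It assumes a node on the default HTTP port 9200 and the `/{index}/{type}/{id}` URL layout that matches the index/type terminology of this talk; the index name `blog`, type `post`, and document fields are hypothetical. The helpers only build requests, so nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed: local node on the default HTTP port

def index_request(index, doc_type, doc_id, document):
    """Build a PUT that stores a JSON document under /{index}/{type}/{id}."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )

def search_request(index, field, text):
    """Build a POST that runs a full-text match query against /{index}/_search."""
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical document: no schema has to be declared before indexing it.
req = index_request("blog", "post", "1",
                    {"title": "ElasticSearch for data mining", "tags": ["search", "json"]})
print(req.get_method(), req.full_url)  # prints: PUT http://localhost:9200/blog/post/1
# To actually send it: urllib.request.urlopen(req)
```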

Page 9

Administration Tools

• cURL: command-line client for the REST interface

• Marvel: monitoring and management plugin
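With cURL, a common first check is `curl http://localhost:9200/_cluster/health?pretty`. The same endpoint can be read from code. This sketch parses its JSON reply, reading the `status` (green, yellow, or red), `cluster_name`, and `number_of_nodes` fields; the sample response is made up for illustration and trimmed to those few fields.

```python
import json

def summarize_health(raw_json: str) -> str:
    """Turn a /_cluster/health response body into a one-line summary."""
    health = json.loads(raw_json)
    status = health["status"]  # "green", "yellow", or "red"
    meaning = {
        "green": "all primary and replica shards are allocated",
        "yellow": "all primaries allocated, some replicas are not",
        "red": "at least one primary shard is unallocated",
    }[status]
    return f"{health['cluster_name']}: {status} ({meaning}), {health['number_of_nodes']} node(s)"

# Illustrative, trimmed response body; a live cluster returns more fields.
sample = '{"cluster_name": "elasticsearch", "status": "yellow", "number_of_nodes": 1}'
print(summarize_health(sample))
```

A single-node cluster usually reports yellow, because replica shards are never placed on the same node as their primary.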

Page 10

Definitions

• Cluster: one or more nodes (running Elasticsearch instances)

• Document: a stored record

• Field: a document has a list of fields, or key-value pairs

• Index: think of this as a database

• Term: an exact value to be matched; “FOO”, “Foo”, and “foo” are not the same term

• Type: similar to a table in a database

• Text: a field value that is analyzed into terms, which are stored in the index
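The difference between text and terms is easiest to see with a toy analyzer. The sketch below lowercases text and splits it on non-alphanumeric characters, which only loosely approximates Elasticsearch's default standard analyzer (an assumption; the real analyzer follows Unicode word-boundary rules), and stores the resulting terms in a small inverted index. An exact term lookup shows why “FOO” and “foo” differ, while analyzing the query text first makes them match.

```python
import re
from collections import defaultdict

def analyze(text):
    """Lowercase and split on non-alphanumeric runs (a rough stand-in for the standard analyzer)."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs

    def add(self, doc_id, text):
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def term_query(self, term):
        """Exact term lookup: the input is NOT analyzed."""
        return sorted(self.postings.get(term, set()))

    def match_query(self, text):
        """Analyze the query text, then return documents containing any resulting term."""
        hits = set()
        for term in analyze(text):
            hits |= self.postings.get(term, set())
        return sorted(hits)

index = InvertedIndex()
index.add(1, "Foo bar")
index.add(2, "FOO-Baz")
print(index.term_query("FOO"))   # [] because every stored term is lowercase
print(index.term_query("foo"))   # [1, 2]
print(index.match_query("FOO"))  # [1, 2]
```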

Page 11

ElasticSearch Resources

• ElasticSearch• elasticsearch.org

• ElasticSearch NEST: .NET client (nest.azurewebsites.net)

Page 12

Installation

• Get the binaries

• Unzip

• Run bin\elasticsearch.bat (Windows) or bin/elasticsearch (Linux, macOS)
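Once the node has started, it answers HTTP on port 9200 by default, and a GET on the root URL returns JSON that includes the version number. Here is a small standard-library check, assuming the default host and port; it returns `None` instead of raising when nothing is listening yet.

```python
import json
import urllib.request

def node_info(base_url="http://localhost:9200", timeout=2.0):
    """Return the parsed root response, or None if the node is not reachable."""
    try:
        with urllib.request.urlopen(base_url + "/", timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError, refused connections, and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None

info = node_info()
if info is None:
    print("Elasticsearch is not reachable yet")
else:
    print("Elasticsearch", info.get("version", {}).get("number", "unknown"), "is running")
```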

Page 13

Contact Me

Barrett Simms

[email protected]

http://wbsimms.com

Twitter: @wbsimms

Phone: 781.405.4686