HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

WWW.NGDATA.COM Making Sense of Data Lily goes shopping – real-time recommendations with HBase HBaseCon, May 2012 Steven Noels – VP Product – @stevenn

Upload: cloudera-inc

Post on 21-Jun-2015


DESCRIPTION

HBase brings interactivity to Hadoop, and allows users to collect, manage and process data in real time. Lily wraps HBase and Solr in a comprehensive Big Data platform, with HBase-native secondary indexing complementing ad-hoc structured search. Through spare write-cycles during read operations, Lily transforms HBase into a scalable data management engine providing interactive analytics, profile harvesting and real-time recommendations. This talk highlights the architecture of Lily and how it completes HBase, and explains some of its implementation use cases.

TRANSCRIPT

Page 1: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

WWW.NGDATA.COM

Making Sense of Data

Lily goes shopping – real-time recommendations with HBase

HBaseCon, May 2012

Steven Noels – VP Product – @stevenn

Page 2: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Lily Core 2’ recap

•  HBase-backed data repository, with batteries included

•  Data model:

   •  high-level data model on top of HBase’s byte[]’s

   •  schema

   •  versioning (schema and data)

   •  links, variants

•  Java & REST APIs

•  Indexing:

   •  through configuration, not implementation

   •  incremental and batch index maintenance

•  RowLog: distributed, durable queue for secondary actions

•  Open Source: www.lilyproject.org (Apache License)

[Diagram labels: client app, Lily, RowLog, HBase, Solr et al.]
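
The RowLog bullet describes a queue of secondary actions that follow each write. A minimal in-memory sketch of that idea is shown below. It is neither distributed nor durable, and `ToyRepository`, `put`, `process_queue` and `index_title` are hypothetical names chosen for illustration, not Lily's API.

```python
from collections import deque


class ToyRepository:
    """In-memory stand-in for a row store plus a queue of pending secondary actions."""

    def __init__(self):
        self.rows = {}         # row key -> dict of fields
        self.queue = deque()   # row keys awaiting secondary actions, oldest first
        self.listeners = []    # callables run for every queued message

    def put(self, key, fields):
        # The primary write and the queue append happen together.
        self.rows[key] = dict(fields)
        self.queue.append(key)

    def process_queue(self):
        # Drain the queue and run every secondary action on each message.
        processed = 0
        while self.queue:
            key = self.queue.popleft()
            for listener in self.listeners:
                listener(key, self.rows[key])
            processed += 1
        return processed


title_index = {}


def index_title(key, fields):
    # Example secondary action: maintain a lookup from lower-cased title to row key.
    title_index[fields["title"].lower()] = key


repo = ToyRepository()
repo.listeners.append(index_title)
repo.put("rec-1", {"title": "Scarface"})
repo.put("rec-2", {"title": "The Notebook"})
handled = repo.process_queue()
```

The point of the design is that index maintenance is decoupled from the write path: the writer only records that work is pending, and a consumer applies it afterwards.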

Page 3: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Why HBase?

•  BigTable model

•  sparseness

•  atomic row updates aka consistency

•  auto-partitioning

•  Apache license

•  A great community led by a Saint :-)

Page 4: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Portfolio Overview

•  Schema and Data Management

•  Total Data Aggregation

•  Real-time Index and Retrieval

•  Security and Enterprise Connectors

•  Profile Development

•  Context and Activity Tracking

•  Social Stream Ingestion

•  Real-time AI Recommendations

•  Industry algorithms and rules

•  Trend Analytics

•  Pattern Detection

[Legend: open source, commercial availability]

Page 5: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Lily (=HBase) In Use

Some of the larger Lily deployments

•  media

   •  aggregation, database publishing and online archives

•  finance

   •  real-time identity fraud detection

•  retail banking

   •  contextualized (time+loc+person) mobile coupons

•  retail

   •  e-commerce platform: product catalog, consumer data store, real-time indexing

Page 6: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Collaborative Filtering?

Recommend items similar to a user’s highly-preferred items

Page 7: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Collaborative Filtering is … Matrixes

Sean likes “Scarface” a lot

Robin likes “Scarface” somewhat

Grant likes “The Notebook” not at all

…

(123,654,5.0)

(789,654,3.0)

(345,876,1.0)

…

(Magic)

(345,654,4.5)

…

Grant may like “Scarface” quite a bit …
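
The "(Magic)" step can be illustrated with a small item-based estimate, reading the slide's triples as (user, item, preference). This is a sketch under assumptions, not the recommender used by Lily: the first three triples come from the slide, the remaining ones are hypothetical filler so that similarities can be computed, and cosine similarity is one choice among many. It is not expected to reproduce the 4.5 shown on the slide.

```python
from math import sqrt

# (user_id, item_id, preference)
prefs = [
    (123, 654, 5.0),  # from the slide
    (789, 654, 3.0),  # from the slide
    (345, 876, 1.0),  # from the slide
    (123, 876, 1.5),  # hypothetical
    (789, 876, 2.0),  # hypothetical
    (345, 111, 4.0),  # hypothetical item 111
    (123, 111, 4.5),  # hypothetical
    (789, 111, 3.5),  # hypothetical
]


def item_vectors(triples):
    """Group preferences per item: item_id -> {user_id: preference}."""
    vectors = {}
    for user, item, pref in triples:
        vectors.setdefault(item, {})[user] = pref
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def estimate(triples, user, item):
    """Similarity-weighted average of the user's known preferences."""
    vectors = item_vectors(triples)
    target = vectors.get(item, {})
    weighted = total = 0.0
    for other, ratings in vectors.items():
        if other == item or user not in ratings:
            continue
        sim = cosine(target, ratings)
        weighted += sim * ratings[user]
        total += abs(sim)
    return weighted / total if total else None


# Estimated preference of user 345 for item 654, which that user has not rated.
grant_scarface = estimate(prefs, 345, 654)
```

Because the estimate is a weighted average of the user's own preferences, it always falls between that user's lowest and highest known preference.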

Page 8: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Personalized offers

Contextualized recommendations

Item, Activity, Profile

credit card statements

shops & merchants, product families, offers/coupons

Page 9: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Fitting Recommendations into the Lily Architecture

[Architecture diagram; recoverable labels:]

•  LILY CRUD API

•  LILY recommender engine

•  read/write demultiplexer

•  data, activity, profile scoring

•  co-occurrence lookup matrix

•  algorithm support: ALS, k-means, propensity, custom ...

•  Lily Core Repository

•  data store

•  activity store

•  profile store

•  indexes

•  rowlog

•  Lily/HBase Secondary Indexes

Contact: Steven [email protected], www.ngdata.com, telephone: +32 9 33 88 220, Gent (Belgium)
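
The diagram names a co-occurrence lookup matrix without showing how it is built. One common construction, given here as an assumption and not as Lily's implementation, counts how often two items appear together in the same user's history and recommends the items that co-occur most with what the user already has. The basket data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(baskets):
    """Count, for every pair of items, how many users have both."""
    matrix = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in combinations(sorted(set(items)), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def recommend(matrix, owned, top_n=3):
    """Rank unseen items by summed co-occurrence with the owned items."""
    scores = defaultdict(int)
    for item in owned:
        for other, count in matrix.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical per-user product families.
baskets = {
    "u1": ["coffee", "books", "travel"],
    "u2": ["coffee", "books"],
    "u3": ["coffee", "travel", "fuel"],
    "u4": ["books", "fuel"],
}
matrix = cooccurrence(baskets)
suggestions = recommend(matrix, {"coffee"})
```

A lookup of this kind is cheap at read time because the counting is done when activity is written, which fits the read/write split shown in the diagram.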

Page 10: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Preferencing aka Feeding the Matrix

•  Transaction-based preferencing

   •  Pluggable preference strategies, using Lily-based data (HBase&Solr) for decision making

   •  e.g. credit card statement = transactions between users and product families

   •  Preference weighting

   •  Ingest: REST API, bulk support

   •  Real-time updating of the recommendation model

•  Profile Store

   •  Profile activities can be preferenced

   •  Support for Profile behavior analysis
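
The credit card example suggests turning raw transactions into (user, product family, preference) triples through a swappable strategy. The sketch below illustrates that shape only: `count_strategy`, `spend_share_strategy`, `preferences` and the sample statement are hypothetical, and the slides do not say which strategies or weights Lily ships with.

```python
from collections import defaultdict


def count_strategy(transactions):
    """Preference = number of purchases in each product family."""
    counts = defaultdict(float)
    for family, _amount in transactions:
        counts[family] += 1.0
    return dict(counts)


def spend_share_strategy(transactions):
    """Preference = share of the user's total spend per product family."""
    totals = defaultdict(float)
    for family, amount in transactions:
        totals[family] += amount
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {family: total / grand_total for family, total in totals.items()}


def preferences(statement, strategy, weight=1.0):
    """Apply a strategy per user and scale the result by a preference weight."""
    by_user = defaultdict(list)
    for user, family, amount in statement:
        by_user[user].append((family, amount))
    triples = []
    for user in sorted(by_user):
        for family, value in sorted(strategy(by_user[user]).items()):
            triples.append((user, family, weight * value))
    return triples


# Hypothetical statement lines: (user, product family, amount).
statement = [
    ("u1", "coffee", 3.0),
    ("u1", "coffee", 4.0),
    ("u1", "books", 21.0),
    ("u2", "fuel", 50.0),
]
by_count = preferences(statement, count_strategy)
by_spend = preferences(statement, spend_share_strategy, weight=5.0)
```

Passing the strategy as a plain function keeps the ingest path unchanged when the way preferences are derived is swapped.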

Page 11: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Making recommendations

•  Recommender

   •  Pluggable recommender strategies, using Lily-based data (HBase&Solr) for decision making

   •  Multi-model support: user-item & item-user recommendations

   •  Estimation of both preferenced and non-preferenced items

   •  Geolocation-based recommendations

   •  Re-scoring

   •  REST API

•  (Planned)

   •  Support for Classifications (scenario: recommend me all (possible) coffee drinkers)

   •  Matrix / recommendation indexing
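
Geolocation-based recommendations and re-scoring can be combined by adjusting each candidate's score according to its distance from the user. The following is a sketch under assumptions: the linear decay, the 5 km radius, the coordinates and the candidate names are all hypothetical, and the slides do not describe how Lily re-scores.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rescore(candidates, user_pos, radius_km=5.0):
    """Drop candidates outside the radius and damp the rest linearly by distance."""
    rescored = []
    for item, score, (lat, lon) in candidates:
        distance = haversine_km(user_pos[0], user_pos[1], lat, lon)
        if distance <= radius_km:
            rescored.append((item, score * (1.0 - distance / radius_km)))
    return sorted(rescored, key=lambda pair: -pair[1])


# Hypothetical candidates: (item, recommender score, (latitude, longitude)).
candidates = [
    ("bookshop", 0.9, (50.85, 4.35)),       # far away, highest raw score
    ("bakery", 0.7, (51.06, 3.73)),
    ("coffee-shop", 0.6, (51.051, 3.721)),  # closest
]
ranked = rescore(candidates, user_pos=(51.05, 3.72))
```

With these numbers the nearby coffee shop overtakes the bakery despite its lower raw score, and the distant bookshop is filtered out entirely.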

Page 12: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Other upcoming Lily Features

•  Secondary indexes (= Lily Core!)

   •  indexes are defined through configuration

   •  single or multi-field indexes

   •  range queries and prefix queries

   •  asc or desc sorted results

   •  can read huge, sorted lists

   •  synchronously updated: index updates are applied by rowlog secondary actions

   •  online building of new indexes (no table locks)

•  MapReduce integration

•  SolrCloud integration

   •  Index shards and configuration managed through ZooKeeper
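
Range queries, prefix queries and sorted results all follow from keeping index entries in sorted order, which is what a sorted key space such as HBase's provides. The in-memory sketch below illustrates the principle with Python's `bisect`. `SortedIndex` and its methods are hypothetical and say nothing about how Lily encodes index rows.

```python
import bisect


class SortedIndex:
    """Multi-field index kept as a sorted list of (field values, row key)."""

    def __init__(self):
        self._entries = []

    def add(self, values, row_key):
        bisect.insort(self._entries, (tuple(values), row_key))

    def range(self, low, high, descending=False):
        """Row keys whose field values satisfy low <= values < high."""
        # A 1-tuple sorts before any (values, row_key) pair with equal values.
        start = bisect.bisect_left(self._entries, (tuple(low),))
        end = bisect.bisect_left(self._entries, (tuple(high),))
        keys = [row for _, row in self._entries[start:end]]
        return keys[::-1] if descending else keys

    def prefix(self, leading):
        """Row keys whose field values start with the given leading fields."""
        leading = tuple(leading)
        start = bisect.bisect_left(self._entries, (leading,))
        keys = []
        for values, row in self._entries[start:]:
            if values[:len(leading)] != leading:
                break  # entries are sorted, so matches are contiguous
            keys.append(row)
        return keys


# Hypothetical two-field index on (sector, year).
index = SortedIndex()
index.add(("finance", 2011), "row-3")
index.add(("media", 2010), "row-1")
index.add(("retail", 2012), "row-2")
index.add(("retail", 2010), "row-4")

retail_rows = index.prefix(("retail",))
before_retail_desc = index.range(("finance",), ("retail",), descending=True)
recent_retail = index.range(("retail", 2011), ("retail", 2013))
```

Both query types reduce to finding a start position and scanning forward, so large sorted result lists can be read without touching unrelated entries.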

Page 13: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Making Sense of Data

Questions? Thank you!