New usage model for real-time analytics by Dr. William L. Bain at Big Data Spain 2014


Upload: big-data-spain

Post on 07-Jul-2015


Category:

Technology



DESCRIPTION

Operational systems manage our finances, shopping, devices and much more. Adding real-time analytics to these systems enables them to instantly respond to changing conditions and provide immediate, targeted feedback. This use of analytics is called “operational intelligence,” and the need for it is widespread.

TRANSCRIPT

Page 1: New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data Spain 2014

NEW USAGE MODEL FOR REAL-TIME ANALYTICS

WILLIAM L. BAIN, CEO AT SCALEOUT SOFTWARE, INC.

Page 2: Using In-Memory Models of Real-World Systems for Operational Intelligence

Bill Bain, CEO ([email protected])

Big Data Hispano
November 17, 2014

Copyright © 2014 by ScaleOut Software, Inc.

Page 3: Agenda

• What Is Operational Intelligence?
• Example: Tracking Cable Viewers
• Implementing OI Using an In-Memory Data Grid:
  • Distributing the Data Across a Cluster
  • Integrating Data-Parallel Analysis
  • Building an In-Memory Model
• More Examples of In-Memory Models
• Comparison to Spark and Storm
• Implementing an Example in Financial Services
• Using In-Memory Hadoop MapReduce for OI

Page 4: About the Speaker

• Dr. William Bain, Founder & CEO
  • Career focused on parallel computing: Bell Labs, Intel, Microsoft
  • 3 prior start-ups; the last was acquired by Microsoft, and its product now ships as Network Load Balancing in Windows Server
• ScaleOut Software develops and markets In-Memory Data Grids, software middleware for:
  • Scaling application performance and
  • Providing operational intelligence using
  • In-memory data storage and computing
• Nine years in the market, 400 customers, 10,000 servers; sample customers: [logos on slide]

Page 5: Online Systems Need Operational Intelligence

Goal: Provide immediate feedback to a system handling live data.

A few examples:
• Ecommerce: for personalized, real-time recommendations
• Equity trading: to minimize risk during a trading day
• Reservations systems: to identify issues, reroute, etc.
• Credit cards & wire transfers: to detect fraud in real time
• Smart grids: to optimize power distribution & detect issues

Page 6: Example: Track Cable TV Viewers

• Goals:
  • Make real-time, personalized upsell offers.
  • Immediately respond to service issues.
  • Track aggregate behavior to identify patterns, e.g.:
    • Total instantaneous incoming event rate
    • Most popular programs and # viewers by zip code
• Requirements:
  • Track events from 10M cable boxes with 25K events/sec (2.2B/day).
  • Correlate, cleanse, and enrich events per rules (e.g., ignore fast channel switches, match channels to programs).
  • Be able to feed enriched events to recommendation engine within 5 sec.
  • Immediately examine any cable box (e.g., box status) & track statistics.

Image credit: ©2011 Tammy Bruce presents LiveWire

Page 7: The Result: An OI Platform

Based on a simulated workload for the San Diego metropolitan area:
• Continuously correlates and enriches telemetry from 10M simulated set-top boxes (from a synthetic load generator).
• Processes more than 30K events/second.
• Enriches events with program information every second.
• Tracks aggregate statistics (e.g., top 10 programs by zip code) every 10 secs.

[Figure: Real-Time Dashboard]

Page 8: Real-Time vs. Batch Analytics

Big Data Analytics splits into real-time and batch:

Real-time (“Operational Intelligence”):
• Live data sets
• Gigabytes to terabytes
• In-memory storage
• Seconds to minutes
• Best uses:
  • Tracking live data
  • Immediately identifying trends and capturing opportunities
  • Providing immediate feedback
• Products shown: Analytics Server, hServer

Batch (“Business Intelligence”):
• Static data sets
• Petabytes
• Disk storage
• Minutes to hours
• Best uses:
  • Analyzing warehoused data
  • Mining for long-term trends
• Products shown: Hadoop, IBM, Teradata, SAS, SAP

Page 9: Integrated View of Analytics

• Operational intelligence can co-exist with business intelligence:
  • Processes streaming data close to its sources.
  • Provides real-time, “tactical” feedback (e.g., recommendations, alerts).
  • Transforms data for storage in the data warehouse (ETL).
  • Data warehouse provides “strategic” guidance.
• Using the same tool set (e.g., Hadoop MapReduce) lowers TCO:
  • Leverages common skill set.
  • Simplifies design (e.g., loading data into HDFS).

Page 10: Challenges for Operational Intelligence

• To keep up with fast-growing “live” workloads & maintain fast response times:
  • Track state of entities within a live system.
  • Reliably process updates to data set in real time.
• To identify and respond to trends in fast-changing data:
  • Enrich & evaluate “live” data set in real time.
  • Respond to identified patterns within seconds.

[Chart: Growth in Web Servers, in millions, 1995 to 2010. Source: Netcraft]
[Chart: Growth in “Big Data”, in exabytes, 2000 to 2013]

“More data has been created in the past three years than in the past 40,000.”

Page 11: In-Memory Architecture for Operational Intelligence

• In-memory data grid (IMDG) holds active entities undergoing state changes in memory.
• Backing store optionally holds large population of entities.
• IMDG processes incoming stream of state changes.
• Analytics engine examines entities in real time and generates alerts within seconds as needed.

Page 12: In-Memory Data Grid

In-Memory Data Grid (IMDG) stores “live” data in a cluster:
• Fits in the business logic layer:
  • Follows object-oriented view of data (vs. relational view).
  • Stores collections of Java/.NET/C++ objects shared by multiple clients.
  • Uses create/read/update/delete and query APIs to access data.
• Implemented across a cluster of servers or VMs:
  • Scales storage and throughput by adding servers.
  • Provides high availability in case a server fails.

Page 13: IMDGs Use Object-Oriented Model

• IMDG’s collections of objects act like process collections:
  • Unstructured, typically instances of a class (stored as serialized blobs)
  • Individually accessible / update-able
• IMDG adds attributes:
  • Accessible by global key
  • Query-able by properties
  • Highly available
  • Optional timeouts
  • Distributed locking
  • Integration with a backing store
  • Optional dependency relationships
  • Asynchronous event handling

Basic “CRUD” APIs:
• Create(key, obj, tout)
• Read(key)
• Update(key, obj)
• Delete(key)
and…
• Lock(key)
• Unlock(key)

[Diagram: an object identified by its key]
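The CRUD and locking calls listed on this slide can be pictured with a single-process stand-in. What follows is a minimal sketch under stated assumptions, not ScaleOut's API: the class name `LocalGrid` is hypothetical, `tout` is assumed to be a timeout in milliseconds (zero or negative meaning "never expire"), and `Lock`/`Unlock` are modeled with one `ReentrantLock` per key. A real IMDG would distribute and replicate these entries across servers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-process stand-in for an IMDG's keyed object store.
final class LocalGrid<V> {
    // Stored object plus its absolute expiry time (Long.MAX_VALUE = never).
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Create(key, obj, tout): fails if the key already holds a live object.
    boolean create(String key, V obj, long timeoutMillis) {
        long now = System.currentTimeMillis();
        long expiry = timeoutMillis <= 0 ? Long.MAX_VALUE : now + timeoutMillis;
        Entry<V> fresh = new Entry<>(obj, expiry);
        Entry<V> stored = entries.compute(key, (k, old) ->
                (old == null || old.expiresAtMillis <= now) ? fresh : old);
        return stored == fresh;
    }

    // Read(key): returns null for a missing or expired object.
    V read(String key) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        if (e.expiresAtMillis <= System.currentTimeMillis()) {
            entries.remove(key, e);
            return null;
        }
        return e.value;
    }

    // Update(key, obj): replaces an existing object, keeping its expiry.
    boolean update(String key, V obj) {
        return entries.computeIfPresent(key,
                (k, old) -> new Entry<>(obj, old.expiresAtMillis)) != null;
    }

    // Delete(key): removes the object if present.
    boolean delete(String key) {
        return entries.remove(key) != null;
    }

    // Lock(key): per-key mutual exclusion for read-modify-write sequences.
    void lock(String key) {
        locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
    }

    // Unlock(key): must follow a lock(key) call made by the same thread.
    void unlock(String key) {
        locks.get(key).unlock();
    }
}
```

A caller would typically bracket a read-modify-write with `lock(key)` and `unlock(key)`, which is the role distributed locking plays when several clients share the same objects.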

Page 14: In-Memory, Data-Parallel Computing

• Integrates with IMDG data storage to minimize data motion.
• Ex.: Parallel Method Invocation (PMI), an object-oriented version of data-parallel computing from the HPC community:
  • Selects objects using a parallel query on data hosted in the IMDG.
  • Runs user-defined methods in parallel across the cluster and merges results.

[Diagram: In-Memory Data Grid runs data-parallel computation: Analyze Data (Eval), then Combine Results (Merge)]
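The query, eval, and merge steps of PMI can be imitated inside one JVM with Java parallel streams. This is only a sketch of the pattern, assuming a hypothetical `ParallelInvoke.invoke` helper; it is not the grid's API, and it runs on local threads rather than across a cluster. The merge function is assumed to be associative, since the stream may combine results in any grouping.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Predicate;

// Hypothetical single-JVM imitation of the query / eval / merge pattern.
final class ParallelInvoke {
    private ParallelInvoke() {}

    static <T, P, R> Optional<R> invoke(List<T> objects,
                                        Predicate<T> query,
                                        P param,
                                        BiFunction<T, P, R> eval,
                                        BinaryOperator<R> merge) {
        return objects.parallelStream()
                .filter(query)                        // select objects (query)
                .map(obj -> eval.apply(obj, param))   // analyze each object (Eval)
                .reduce(merge);                       // combine pairwise (Merge)
    }

    public static void main(String[] args) {
        List<Integer> readings = List.of(3, 8, 12, 5, 20);
        // Sum of the readings above 4, each scaled by the parameter 10.
        Optional<Integer> total = invoke(readings, r -> r > 4, 10,
                (r, scale) -> r * scale, Integer::sum);
        System.out.println(total.orElse(0)); // prints 450
    }
}
```

The result is an `Optional` because the query may select no objects, in which case there is nothing to merge.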

Page 15: Achieving Linear Speedup

Avoid data motion (network or disk I/O), which limits throughput.

Page 16: In-Memory Model of “Live” Entities

Object-oriented model tracks and analyzes real-world entities.

[Diagram labels: In-Memory State in “IMDG”; NoSQL Storage; Real-Time Data-Parallel Analysis]

Page 17: Example: Cable Set-Top Boxes

• Each cable box is represented as an object in the IMDG:
  • Object holds raw & enriched event streams, viewer parameters, and statistics.
  • IMDG captures incoming events by updating objects.
• IMDG uses data-parallel computation to:
  • immediately enrich box objects to generate alerts to the recommendation engine, and
  • continuously collect and report global statistics.
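One way to picture a set-top box object is a small class that absorbs raw channel events, applies a cleansing rule, and contributes to a global statistic through eval and merge steps. The sketch below is illustrative only: the `CableBox` class, its fields, and the 5-second dwell threshold are assumptions made here, not details from the talk.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one set-top box held as an object in the grid.
final class CableBox {
    // Assumed cleansing rule: a tune shorter than this counts as channel surfing.
    static final long MIN_DWELL_MILLIS = 5_000;

    private int pendingChannel = -1;   // most recent tune, not yet confirmed
    private long pendingSince;
    private int watchedChannel = -1;   // last channel that passed the rule

    // Update the box with a raw "tuned to channel" event.
    void onChannelEvent(int channel, long timestampMillis) {
        confirmPending(timestampMillis);
        pendingChannel = channel;
        pendingSince = timestampMillis;
    }

    // Promote the pending tune once it has lasted long enough.
    private void confirmPending(long nowMillis) {
        if (pendingChannel >= 0 && nowMillis - pendingSince >= MIN_DWELL_MILLIS) {
            watchedChannel = pendingChannel;
        }
    }

    // "Eval": report this box's confirmed channel as a one-entry count map.
    Map<Integer, Integer> eval(long nowMillis) {
        confirmPending(nowMillis);
        Map<Integer, Integer> counts = new HashMap<>();
        if (watchedChannel >= 0) counts.put(watchedChannel, 1);
        return counts;
    }

    // "Merge": add two per-channel viewer counts without mutating either input.
    static Map<Integer, Integer> merge(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        Map<Integer, Integer> out = new HashMap<>(a);
        b.forEach((channel, n) -> out.merge(channel, n, Integer::sum));
        return out;
    }

    // Global statistic: viewers per channel across all boxes.
    static Map<Integer, Integer> viewersByChannel(List<CableBox> boxes, long nowMillis) {
        return boxes.parallelStream()
                .map(box -> box.eval(nowMillis))
                .reduce(new HashMap<>(), CableBox::merge);
    }
}
```

Matching channels to programs, the other enrichment named in the requirements, would be a lookup applied to the confirmed channel and is left out to keep the sketch short.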

Page 18: Example in Ecommerce: Inventory Management

Fast map/reduce reconciles inventory and order systems for an online retailer:
• Challenge: Inventory and online order management are handled by different applications.
  • Reconciled once per day.
  • Inaccurate orders reduce margins.
• Solution:
  • Host SKUs in IMDG updated in real time by order & inventory systems.
  • Use MapReduce to reconcile in two minutes.
  • Enables real-time reconciliation to ensure accurate orders.

Page 19: Example: Web Shopping

• IMDG holds customer information for active Web users.
• IMDG saves/retrieves customer information from backing store.
• Web browsers send activity information to analytics engine.
• IMDG updates customer history and preferences.
• Analytics engine identifies browsing and buying patterns.
• Analytics engine makes suggestions in real time and also sends email follow-ups.

Page 20: Example: Retail Shopping

Brick-and-mortar stores use OI to compete with the online experience:
• IMDG tracks opt-in customers to make recommendations.
• RFID tags identify product selection and availability in showroom.
• Analytics engine sends real-time advisories to sales staff via tablet.

Page 21: Comparison: IMDGs to Spark

Focus: accelerating business intelligence using in-memory computing:
• In-memory computing to accelerate and extend Hadoop MapReduce using data-parallel operators in Scala.
• Stores data as “resilient distributed datasets” (RDDs):
  • Distributed across cluster
  • Immutable
  • Hold data from/output to HDFS.
  • Manages data stream as a sequence of RDDs.
• Comparison to IMDG:
  • Not designed for operational systems:
    • Lacks high availability (uses lineage).
  • Intended for data-parallel operations:
    • Lacks CRUD APIs on individual objects.

Page 22: Comparison to Storm

• Focus: continuous processing of input streams
  • Storm implements pipelined execution of tasks by “bolts” on incoming data streams.
  • Streams can be distributed to bolts with configurable mappings.
  • Developer controls the number of tasks per bolt.
• Storm uses a centralized master node and ZooKeeper for fault tolerance.
• Issues:
  • Managing global state
  • Minimizing data motion
  • Complexity / tuning

Page 23: Implementing an Example in FinServ

• Hedge fund tracks a set of hedging strategies:
  • Strategies can cover various market sectors, such as high-tech, automotive, energy, consumer, real estate, etc.
  • Each strategy contains a list of holdings and rules for managing the holdings (such as target allocations).
  • Updates to market data continuously arrive during the trading day.
• The challenge: update and analyze a large population of hedging strategies to immediately alert traders.

Page 24: In-Memory Model

• The IMDG holds hedging strategies as an object-oriented collection.
• Updates to market data are managed as a series of snapshot objects.
• The IMDG performs repeated data-parallel analysis on hedging strategies to generate alerts.
• Merges alerts and feeds to traders in real time.
• IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers.

Page 25: Implementing the Analysis

Step 1: Select all objects using parallel query of strategy objects:
• Query spec matches data’s object-oriented properties.
• Selected objects are fed to the analysis engine on each local server.

Page 26: Java Example: Parallel Query

public class Portfolio {
    private long id;
    private Set<Stock> longPositions;
    private Set<Stock> shortPositions;
    private double totalValue;
    private Region region;
    private boolean alerted; // alert for trading

    @SossIndexAttribute // query-able property
    public double getTotalValue() {…}

    @SossIndexAttribute // query-able property
    public Region getRegion() {…}

    public Set<Long> evalPositions(MarketSnapshot ms) {…}
}

NamedCache pset = CacheFactory.getCache("portfolios");

Set<Portfolio> res = pset.queryObjects(Portfolio.class,
    and(greaterThan("totalValue", 1000000),
        equals("region", Region.US)));

Page 27: Implementing the Analysis

Step 2: Create parallel methods to update and analyze the queried collection of hedging strategies:
• “Eval” method applies market snapshot to an instance of a strategy object:
  • Compare to a MapReduce mapper; adds an input parameter.
  • Updates the strategy object’s positions.
  • Analyzes the positions for a deviation from allowed rules.
  • Optionally generates an alert.
• “Merge” method combines alerts across the collection of strategies:
  • Compare to a MapReduce combiner.
  • Uses binary combining.
  • Is applied globally to the object collection by the IMDG (unlike a MapReduce reducer).
• Note: both methods access hydrated objects, which avoids the need for CRUD access.

Page 28: Java Example: Parallel Method Invocation

• Create a method to analyze a queried portfolio and another method to pair-wise merge the result sets of alerted portfolios:

public class PortfolioAnalysis
        implements Invokable<Portfolio, MarketSnapshot, Set<Long>> {

    public Set<Long> eval(Portfolio p, MarketSnapshot ms)
            throws InvokeException {
        // update portfolio and return id if alerted:
        return p.evalPositions(ms);
    }

    public Set<Long> merge(Set<Long> set1, Set<Long> set2)
            throws InvokeException {
        set1.addAll(set2);
        return set1; // merged set of alerted portfolio ids
    }
}

Page 29: Java Example: Parallel Method Invocation

• Run a parallel method invocation on a queried set of portfolios and return the set of ids for alerted portfolios:

NamedCache pset = CacheFactory.getCache("portfolios");

InvokeResult alertedPortfolios = pset.invoke(
    PortfolioAnalysis.class,
    Portfolio.class,
    and(greaterThan("totalValue", 1000000), // query spec
        equals("region", Region.US)),
    marketSnapshot, // parameters
    ...);

System.out.println("The alerted portfolios are " + alertedPortfolios.getResult());

Page 30: Running the Analysis

• IMDG ships user’s code and libraries to its servers.
• IMDG automatically schedules analysis operations across all grid servers and cores:
  • The analysis runs on all objects selected by the parallel query.
  • Each grid server analyzes its locally stored objects to minimize data motion.
• Parallel execution ensures fast completion time:
  • IMDG automatically distributes workload across servers/cores.
  • Scaling the IMDG automatically handles larger data sets.

Page 31: Merging the Results

• The IMDG automatically merges all analysis results:
  • The IMDG first merges all results within each grid server in parallel.
  • It then merges results across all grid servers to create one combined result.
  • Efficient parallel merge minimizes the delay in combining all results.
• The IMDG delivers the combined result to the invoking application as one object.
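The two merge levels described on this slide (within each server, then across servers) can be sketched as a nested reduction. The code is a local illustration under stated assumptions: `TwoLevelMerge` and its methods are hypothetical names, each inner list stands in for the results held on one grid server, and the merge function must not mutate its arguments because the `empty` value is reused as the identity at both levels.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

// Hypothetical two-level merge: within each server first, then across servers.
final class TwoLevelMerge {
    private TwoLevelMerge() {}

    // resultsByServer.get(i) holds the per-object results produced on server i.
    static <R> R mergeAll(List<List<R>> resultsByServer, R empty, BinaryOperator<R> merge) {
        return resultsByServer.parallelStream()
                // Level 1: each server combines its own results.
                .map(local -> local.stream().reduce(empty, merge))
                // Level 2: combine the per-server results into one object.
                .reduce(empty, merge);
    }

    // Example merge for sets of alerted ids: a union that leaves both inputs untouched.
    static Set<Long> union(Set<Long> a, Set<Long> b) {
        Set<Long> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }
}
```

The non-mutating `union` differs from a merge that adds into its first argument; a merge of that mutating style would need a fresh identity per reduction instead of a shared `empty` value.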

Page 32: Output: Real-Time Alerts

• In-memory analysis delivers a set of alerts to traders every 300 msec.
• Enables the trader to examine strategy details in real time.

Page 33: Sample Performance Results for PMI

• Measured a similar financial services application (back testing stock trading strategies on stock histories)
• Hosted IMDG in Amazon EC2 using 75 servers holding 1 TB of stock history data in memory
• IMDG handled a continuous stream of updates (1.1 GB/s)
• Results: analyzed 1 TB in 4.1 seconds (250 GB/s) with linear scaling
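The quoted 250 GB/s follows from the data size and run time if binary units are assumed (1 TB = 1024 GB). The per-server figure below is derived here from the 75-server count and is not stated on the slide.

```latex
\[
\frac{1024\ \text{GB}}{4.1\ \text{s}} \approx 250\ \text{GB/s},
\qquad
\frac{250\ \text{GB/s}}{75\ \text{servers}} \approx 3.3\ \text{GB/s per server}.
\]
```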

Page 34: In-Memory MapReduce

Benefits:
• Enables use of standard Hadoop MapReduce for operational intelligence.
• Accelerates data access by holding data in memory.
• Analyzes and updates “live” data.
• Reduces overheads of standard Hadoop distributions:
  • Batch scheduling
  • Disk access
  • Data shuffling
  • Mandatory key sorting
• Enables new features, e.g.:
  • Global combining, optional sorting

Page 35: Running MapReduce on an IMDG

• A Hadoop distribution does not have to be installed unless HDFS is used.
• The developer starts MapReduce applications from a remote workstation.
• The IMDG automatically builds a reusable “invocation grid” of JVMs on the grid’s servers for PMI and ships the application’s jars.
• Results are stored in the IMDG or HDFS, or optionally globally merged and returned to the remote workstation.

Page 36: Run In-Memory MR with YARN

• YARN transparently integrates batch and in-memory MapReduce into a single execution framework with shared access to HDFS.
• For example, IMDG can transparently run Apache Hive in-memory.

[Figures: Example of ScaleOut hServer with Hortonworks; Example of Hive Running on IMDG]

Page 37: Implementing MapReduce

Run MapReduce as two PMI phases:
• Data can be input from either the IMDG or an external data source.
• Works with any input/output format compatible with the Apache distribution.
• IMDG uses its data-parallel execution engine (PMI) to invoke the mappers and the reducers.
  • Eliminates batch scheduling overhead.
  • Intermediate results are stored within the IMDG.
  • Minimizes data motion between the mappers and reducers.
  • Allows optional sorting.
• Output of a single reducer/combiner optionally can be globally merged.
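The idea of running MapReduce as two parallel phases with in-memory intermediate results can be illustrated with a local word count. This is a sketch under assumptions: `TwoPhaseWordCount` is a hypothetical class that uses Java parallel streams and a `ConcurrentHashMap` in place of the grid; it does not use Hadoop or ScaleOut APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

// Hypothetical in-memory word count run as two parallel phases.
final class TwoPhaseWordCount {
    private TwoPhaseWordCount() {}

    static Map<String, Integer> run(List<String> lines) {
        // Intermediate results stay in memory, grouped by key (the "shuffle").
        Map<String, ConcurrentLinkedQueue<Integer>> intermediate = new ConcurrentHashMap<>();

        // Phase 1: invoke the mapper on every input record in parallel.
        lines.parallelStream().forEach(line -> {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    intermediate.computeIfAbsent(word, k -> new ConcurrentLinkedQueue<>()).add(1);
                }
            }
        });

        // Phase 2: invoke the reducer on every key in parallel.
        Map<String, Integer> counts = intermediate.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        e -> e.getValue().stream().mapToInt(Integer::intValue).sum()));

        // Sorting is optional; a TreeMap is used here only for stable output.
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        // prints {data=2, events=1, grid=1, live=2}
        System.out.println(run(List.of("live data", "Live events", "data grid")));
    }
}
```

Because both phases run as ordinary method calls over data already in memory, there is no job submission step between them, which is the scheduling overhead the slide says is eliminated.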

Page 38: Accessing IMDG Data for M/R

• IMDG adds grid input format for accessing key/value pairs held in the IMDG.
• MapReduce programs optionally can output results to IMDG with grid output format.
• Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.
• Applications can access and update key/value pairs as operational data during analysis.

Page 39: Optional Caching of HDFS Data

• IMDG adds Dataset Record Reader (wrapper) to cache HDFS data during program execution.
• Hadoop automatically retrieves data from IMDG on subsequent runs.
• Dataset Record Reader stores and retrieves data with minimum network and memory overheads.
• Tests with the Terasort benchmark have demonstrated 11X faster access latency over HDFS without IMDG.

Page 40: In-Memory Storage Models

IMDG needs multiple in-memory storage models:
• Named cache, optimized for rich semantics on large objects:
  • Property-based query
  • Distributed locking
  • Access from remote grids
• Named map, optimized for efficient storage and bulk analysis (e.g., MapReduce):
  • Highly efficient object storage
  • Pipelined, bulk-access mechanisms

Page 41: In-Memory Storage Optimizations

In-Memory Concurrent Map:
• Stores key/value pairs in chunks.
• Allows CRUD operations on key/value pairs.
• Automatically organizes chunks into splits.
• Uses per-split hash table to access keys and manage multi-valued keys.
• Stores shuffled data set between mappers and reducers.
• Pipelines chunks to mappers and from reducers.
• Optionally uses memory-mapped files to reduce access latency.
• Provides support for sorting keys.

Page 42: In-Memory M/R Optimizations

• MapReduce optimizations:
  • Optional sorting
  • Optional multicast of parameters to mappers
  • Optional O(log N) global combining (avoids single, sequential reducer)
  • Optional HDFS caching
  • Optional reuse of JVMs across jobs
• Measured performance:
  • Startup times reduced to a few milliseconds
  • Word count benchmark shows 20X speedup.
  • Real-world example shows >40X speedup.
• Current limitations:
  • No specific security for multi-tenancy
  • Intermediate data must fit in the IMDG
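The O(log N) global combining mentioned above can be pictured as a tree of pairwise merges: each round halves the number of partial results, so N partials need about log2(N) rounds instead of N - 1 sequential steps in a single reducer. The sketch below is an assumption-labeled illustration with hypothetical names (`TreeCombine`, `Outcome`); it runs the merges of a round in a plain loop, whereas a grid could run them concurrently on different servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical tree-shaped combine: partial results are merged in pairs, round by round.
final class TreeCombine {
    private TreeCombine() {}

    // Combined value plus the number of pairwise rounds it took.
    static final class Outcome<R> {
        final R value;
        final int rounds;
        Outcome(R value, int rounds) {
            this.value = value;
            this.rounds = rounds;
        }
    }

    // Combines a non-empty list of partial results; each round halves the list.
    static <R> Outcome<R> combine(List<R> partials, BinaryOperator<R> merge) {
        if (partials.isEmpty()) {
            throw new IllegalArgumentException("no partial results");
        }
        List<R> current = new ArrayList<>(partials);
        int rounds = 0;
        while (current.size() > 1) {
            List<R> next = new ArrayList<>((current.size() + 1) / 2);
            // The merges inside one round are independent of each other.
            for (int i = 0; i + 1 < current.size(); i += 2) {
                next.add(merge.apply(current.get(i), current.get(i + 1)));
            }
            if (current.size() % 2 == 1) {
                next.add(current.get(current.size() - 1)); // odd one carries over
            }
            current = next;
            rounds++;
        }
        return new Outcome<>(current.get(0), rounds);
    }
}
```

For eight partial results the loop finishes in three rounds, compared with seven merges performed one after another by a single sequential reducer.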

Page 43: Accelerating Start-Up Times

• Re-use in-memory context across MapReduce jobs:

public static void main(String argv[]) throws Exception {
    // Configure and load the invocation grid
    InvocationGrid grid = HServerJob.getInvocationGridBuilder("myGrid").
        // Add JAR files as IG dependencies
        addJar("main-job.jar").
        addJar("first-library.jar").
        // Add classes as IG dependencies
        addClass(MyMapper.class).
        addClass(MyReducer.class).
        // Define custom JVM parameters
        setJVMParameters("-Xms512M -Xmx1024M").
        load();

    // Run 10 jobs on the same invocation grid
    for (int i = 0; i < 10; i++) {
        Configuration conf = new Configuration();

        // The preloaded invocation grid is passed as the parameter to the job
        Job job = new HServerJob(conf, "Job number " + i, false, grid);

        //......Configure the job here.........

        // Run the job
        job.waitForCompletion(true);
    }

    // Unload the invocation grid when we are done
    grid.unload();
}

Page 44: Recap

• Online systems need operational intelligence on “live” data for immediate feedback.
• Operational intelligence can be implemented using an IMDG integrated with data-parallel analysis.
• IMDGs track “live” state:
  • Model real-world entities as a highly available object collection.
  • Enable updates to track changes.
  • Use data-parallel computation for immediate feedback with low latency.
• Can run standard MapReduce.

Page 45: Thank you!

Page 46: Parallel Query Example (C#)

• Mark class properties as indexes for query:

class Stock {
    [SossIndex]
    public string Ticker { get; set; }
    public decimal TotalShares { get; set; }
    public decimal Price { get; set; }
}

• Define a query using these properties:

NamedCache cache = CacheFactory.GetCache("Stocks");
var q = from s in cache.QueryObjects<Stock>()
        where s.Ticker == "GOOG" || s.Ticker == "ORCL"
        select s;

Console.WriteLine("{0} Stocks found", q.Count());

Page 47: Example of Analysis Code (C#)

• Create method to analyze each queried stock object:

// The parameter is named calcParams because "params" is a reserved word in C#.
static decimal eval(Stock stock, StockCalcParams calcParams)
{
    return stock.Price * stock.TotalShares;
}

• Create method to pair-wise merge the results:

static decimal merge(decimal r1, decimal r2)
{
    return r1 + r2;
}

Page 48: Invoking the Parallel Analysis (C#)

• Run a parallel method invocation:

NamedCache cache = CacheFactory.GetCache("Stocks");

decimal valueOfSelectedStocks =
    (from s in cache.QueryObjects<Stock>()
     where s.Ticker == "GOOG" || s.Ticker == "ORCL"
     select s)
    .Invoke(new StockCalcParams(…), new Func<Stock, StockCalcParams, decimal>(eval))
    .Merge(new Func<decimal, decimal, decimal>(merge));

Console.WriteLine("The value of selected stocks is {0}", valueOfSelectedStocks);

Page 49: Big Data Spain 2014

17TH ~ 18TH NOV 2014
MADRID (SPAIN)