telco analytics at scale

Telco Analytics @Scale. Harikumar, Director Platform & Architecture, www.subex.com, Nov 2016

Upload: datamantra

Post on 08-Jan-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Telco analytics at scale

www.subex.com

Telco Analytics @Scale

Harikumar, Director Platform & Architecture

Nov 2016

Page 2: Telco analytics at scale

Private & Confidential

Subex Intro

Page 3: Telco analytics at scale


Subex BSS/OSS Portfolio


Page 4: Telco analytics at scale


Data Crunching - Use Cases & Latency

[Chart: Latency plotted against Algorithmic Complexity]

Latency tiers
• Real Time (Milliseconds)
• Near Real Time (Seconds)
• Micro Batch (Minutes)
• Batch (Hours-Days)

Use cases
• Reporting
• Aggregation
• Rule Engine
• Profiling
• Machine Learning
• Audits
• Graph/Network Analysis
• Text Search
• Natural Language Processing

Page 5: Telco analytics at scale


Stream Processing & Complex Event Processing (CEP)

Event Processing in the Eventful World
• (Aggregated) Event Data is combined/correlated with
• Users
• Assets
• Threats
• Vulnerabilities
• Location
• Historical

Techniques
• Rule Engine
• Event filtering
• Event aggregation and transformation
• Operate on stored and streaming data
• SQL like semantics over stream data
• Supervised/Unsupervised machine learning
• Applying known Models
• Event Pattern Detection
• Detecting Event relationships

Areas
• Real time fraud detection
• Real time rating
• Security Information and Event Management
• Sensor Data/IOT
• DPI Data – Metadata, Content, Flow correlation
• M2M Data
• Data Fraud – Malware
• Transaction Risk Scoring
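Event pattern detection, listed under Techniques, can be illustrated with a small sketch. The Python below is not from the deck: the `Event` fields, the event kinds `sim_swap` and `high_value_txn`, and the function name are hypothetical, and the input is assumed to arrive in timestamp order. It flags the case where one kind of event follows another for the same key within a time limit.

```python
from collections import namedtuple

# Hypothetical event shape: timestamp in seconds, correlation key, event kind.
Event = namedtuple("Event", ["ts", "key", "kind"])


def detect_followed_by(events, first_kind, second_kind, within_seconds):
    """Yield (key, first_ts, second_ts) whenever `second_kind` follows
    `first_kind` for the same key within `within_seconds`.

    Assumes `events` is already ordered by timestamp.
    """
    last_first = {}  # key -> timestamp of the most recent `first_kind` event
    for ev in events:
        if ev.kind == first_kind:
            last_first[ev.key] = ev.ts
        elif ev.kind == second_kind and ev.key in last_first:
            first_ts = last_first.pop(ev.key)  # match each first event once
            if ev.ts - first_ts <= within_seconds:
                yield (ev.key, first_ts, ev.ts)


events = [
    Event(0, "s1", "sim_swap"),
    Event(100, "s2", "sim_swap"),
    Event(120, "s1", "high_value_txn"),   # 120 s after the swap: match
    Event(1000, "s2", "high_value_txn"),  # 900 s after the swap: too late
]
matches = list(detect_followed_by(events, "sim_swap", "high_value_txn", 300))
```

A production CEP engine would also expire stale state and cope with unordered input; this sketch only shows the correlation step.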

Page 6: Telco analytics at scale


Stream processing - Key considerations

• Keep the data Moving (Low Latency) – In Memory
• Distributed Message Queues
• Distributed In Memory Caches
• Distributed In Memory Stores
• Scalable, Highly Available Distributed Stream Processing (Partition Data & Scale, Data safety & Highly Available)
• Handle Stream Imperfections (Delayed, Missing, Out-Of-Order Data)
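One common way to handle delayed and out-of-order data is to buffer events and release them in timestamp order once a watermark has passed. The sketch below is an illustration only, not the deck's implementation: the class name `ReorderBuffer` and the `allowed_lateness` parameter are assumptions. Events that arrive behind the watermark are set aside rather than emitted.

```python
import heapq


class ReorderBuffer:
    """Re-order events by timestamp, tolerating a bounded delay."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_ts = float("-inf")  # highest timestamp accepted so far
        self.heap = []               # min-heap of (ts, seq, payload)
        self.late = []               # events that arrived behind the watermark
        self._seq = 0                # tie-breaker so payloads are never compared

    def watermark(self):
        return self.max_ts - self.allowed_lateness

    def push(self, ts, payload):
        """Add one event; return the events now safe to emit, in order."""
        if ts < self.watermark():
            self.late.append((ts, payload))
            return []
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self.heap, (ts, self._seq, payload))
        self._seq += 1
        ready = []
        while self.heap and self.heap[0][0] <= self.watermark():
            t, _, p = heapq.heappop(self.heap)
            ready.append((t, p))
        return ready

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        ready = [(t, p) for t, _, p in sorted(self.heap)]
        self.heap = []
        return ready


buf = ReorderBuffer(allowed_lateness=5)
out = []
for ts, payload in [(1, "a"), (3, "c"), (2, "b"), (10, "d"), (4, "late"), (12, "e")]:
    out += buf.push(ts, payload)
out += buf.flush()
```

The trade-off is latency against completeness: a larger `allowed_lateness` admits more stragglers but holds every event longer before it is emitted.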

Page 7: Telco analytics at scale


ETL @ Scale – In Memory Distributed Cache

Problem Statement(s)
• Scale the ETL enrichment/lookups layer
• High throughput + Streaming Low latency
• Support Multiple Access mechanisms
• GET/PUT
• SQL
• Views

[Diagram labels: JVM ETL (three instances), JVM Cache (three instances), RDBMS; operations Read, Write, Update]
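The enrichment/lookup idea can be sketched with a read-through, write-through cache placed in front of a database. Everything named here is an assumption for illustration: the `subscriber` table, its columns, and the `ReadThroughCache` class are hypothetical, a plain dictionary stands in for a distributed cache, and in-memory SQLite stands in for the RDBMS.

```python
import sqlite3


class ReadThroughCache:
    """In-process stand-in for a distributed cache fronting an RDBMS."""

    def __init__(self, conn):
        self.conn = conn
        self.store = {}    # msisdn -> plan
        self.db_reads = 0  # counts lookups that had to reach the database

    def get(self, msisdn):
        if msisdn not in self.store:
            self.db_reads += 1
            row = self.conn.execute(
                "SELECT plan FROM subscriber WHERE msisdn = ?", (msisdn,)
            ).fetchone()
            self.store[msisdn] = row[0] if row else None
        return self.store[msisdn]

    def put(self, msisdn, plan):
        # Write-through: update the database first, then the cached copy.
        self.conn.execute(
            "INSERT OR REPLACE INTO subscriber (msisdn, plan) VALUES (?, ?)",
            (msisdn, plan),
        )
        self.conn.commit()
        self.store[msisdn] = plan


def enrich(records, cache):
    """Attach the subscriber plan to each record via the cache."""
    return [dict(r, plan=cache.get(r["msisdn"])) for r in records]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriber (msisdn TEXT PRIMARY KEY, plan TEXT)")
conn.execute("INSERT INTO subscriber VALUES ('911', 'prepaid')")
cache = ReadThroughCache(conn)
enriched = enrich(
    [{"msisdn": "911", "secs": 30}, {"msisdn": "911", "secs": 5}], cache
)
```

Both records are enriched with a single database read, which is the effect the slide is after when it asks to scale the lookup layer.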

Page 8: Telco analytics at scale


Rule Engine – In Memory Aggregation

[Diagram labels]
• Event / I/P Data Record
• Rule 1, Rule 2, ... Rule n
• Event Filters
• Filtered Records
• Aggregation Layer
• Condition Evaluation
• Actions
• Shared Memory (shown three times in the diagram): 8K Page Pool, 16K Page Pool, 32K Page Pool, .. 256K Page Pool; Key / Value; Byte Stream; SerDe; IM Log
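The labels on this slide suggest a pipeline of event filters, an aggregation layer, condition evaluation and actions. The sketch below shows one way such a pipeline could look in Python. It is not the product's design: the `Rule` fields, the `RuleEngine` class and the sample rule are all hypothetical, and an ordinary dictionary replaces the shared-memory page pools.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    event_filter: Callable[[dict], bool]  # which records the rule looks at
    group_key: Callable[[dict], str]      # aggregation key
    measure: Callable[[dict], float]      # value summed per key
    condition: Callable[[float], bool]    # evaluated on the running total


class RuleEngine:
    def __init__(self, rules, action):
        self.rules = rules
        self.action = action
        self.totals = defaultdict(float)  # (rule name, group key) -> sum

    def process(self, record):
        for rule in self.rules:
            if not rule.event_filter(record):
                continue  # record filtered out for this rule
            slot = (rule.name, rule.group_key(record))
            self.totals[slot] += rule.measure(record)
            if rule.condition(self.totals[slot]):
                self.action(rule.name, slot[1], self.totals[slot])


alarms = []
engine = RuleEngine(
    rules=[
        Rule(
            name="intl_duration",
            event_filter=lambda r: r["type"] == "international",
            group_key=lambda r: r["caller"],
            measure=lambda r: r["secs"],
            condition=lambda total: total > 100,
        )
    ],
    action=lambda name, key, total: alarms.append((name, key, total)),
)
for rec in [
    {"caller": "A", "type": "international", "secs": 60},
    {"caller": "A", "type": "local", "secs": 500},
    {"caller": "A", "type": "international", "secs": 50},
]:
    engine.process(rec)
```

In this sketch the action fires on every record once the threshold is crossed; a real engine would typically reset or window the aggregate.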

Page 9: Telco analytics at scale


Data placement Strategies

Application Data
• Application configuration data – Rule libraries, DNA Configurations, Configurations – MySQL
• Application generated data – Alarms, Discrepancies – MySQL
• Operations Data (Application generated, Infra Monitoring) – Logs, Audit, Metrics – Solr
• Application Aggregations – Summary/Pre-aggregated data – Hive Tables
• Statistical Profiles, In Memory aggregation files – HDFS

Traditional Telco Data
• Telco Entity Data – With Update Semantics – HBase/MySQL
• Telco Historic Transaction Data – Hive with ORC file format, Partitioned by Date, Stored in HDFS
• Switch Input Raw Files – HDFS

Other Sources
• Social Media
• DPI Flow Data
• Location Data
• IOT Sensor Data
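The placement choices above amount to a routing table from data category to store. A minimal sketch follows; the category keys, the function names and the partition column `dt` are hypothetical, while the store names come from the slide. The path helper uses Hive's `column=value` partition directory layout.

```python
from datetime import date

# Data category -> store, as listed on the slide. Keys are illustrative names.
PLACEMENT = {
    "application_config": "MySQL",
    "application_generated": "MySQL",
    "operations": "Solr",
    "application_aggregations": "Hive",
    "statistical_profiles": "HDFS",
    "telco_entity": "HBase/MySQL",
    "telco_historic_transactions": "Hive (ORC, partitioned by date, on HDFS)",
    "switch_raw_files": "HDFS",
}


def store_for(category):
    """Return the store for a data category, failing loudly if unknown."""
    try:
        return PLACEMENT[category]
    except KeyError:
        raise KeyError(f"no placement defined for category {category!r}")


def partition_path(table, day):
    """Build a date partition directory; the column name `dt` is assumed."""
    return f"{table}/dt={day.isoformat()}"


example_store = store_for("operations")
example_path = partition_path("cdr", date(2016, 11, 1))
```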

Page 10: Telco analytics at scale


Data Flow

[Architecture diagram; component labels]
• Sources: Landing Directory (SAN/HDFS), Message Queue, DB Sources (Sqoop/CDC Tools)
• Apache Flume: Flume – Dir Source, Flume – Kafka Source, Flume – Spark Sink
• ETL Adaptors
• Apache Kafka (Dist Message Queue)
• Apache Spark Streaming: Data Load Stage, Pre-aggregation, Submit Spark Jobs
• In Memory Rule Engine, Analytics Applications …
• Distributed Cache
• Data Lake: HDFS – Raw File Backup, HDFS, Hive Tables, HBase Tables, Solr – Search Indexes
• MySQL – Ref DB, Application Data
• Data Access: Hive/Presto
• Audits, Operational Metrics (OM)
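Spark Streaming processes data in micro-batches, and the diagram labels a pre-aggregation stage. The plain-Python sketch below illustrates only the idea of summing values per key within fixed batch intervals. It does not use the Spark API, and the record shape `(ts, key, value)` and the function name are assumptions.

```python
from collections import defaultdict


def pre_aggregate(records, batch_seconds):
    """Sum values per key within fixed-length batches.

    records: iterable of (ts, key, value) with ts in seconds.
    Returns {batch_start: {key: total}} ordered by batch start.
    """
    batches = defaultdict(lambda: defaultdict(float))
    for ts, key, value in records:
        batch_start = ts - (ts % batch_seconds)  # align to the batch boundary
        batches[batch_start][key] += value
    return {start: dict(per_key) for start, per_key in sorted(batches.items())}


summary = pre_aggregate(
    [(0, "cell1", 10), (30, "cell1", 5), (61, "cell2", 7), (119, "cell1", 1)],
    batch_seconds=60,
)
```

Downstream queries can then read the compact per-batch totals instead of rescanning raw records.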

Page 11: Telco analytics at scale

Data Platform – Business and Domain Packaging

[Block diagram; labels as they appear]

Layers
• Data Acquisition/Ingest
• Data Federation F/W
• Data Processing
• Data Management
• Data Visualization & Analysis
• Operations & Admin
• Infrastructure: On premise OS/Servers/Network/Storage; IaaS (Public/Private cloud)

Components
• Pre-aggregation
• Distributed Stream Processing – Apache Spark
• Mobile F/W, ROC View, ROC Insights, Case Management
• Standard APIs – EAI & WS
• Analytics Engine, Analytic Models
• Reconciliation Engine
• BPM – Workflow Engine
• Flexible ETL
• Rule Processing – In Memory
• Common Data Model
• Distributed Cache
• Distributed Message Queue
• Real time Message based
• Real time Continuous Query – CEP
• Real time Rating
• Hadoop – HDFS, Hive, HBase
• Document Store, Graph Data Store
• Machine Learning, Network Analysis, Profiling, Risk Scoring
• Enterprise Search
• Control Panel, Resource Mgmt, Scheduler
• Data Security, Authorization & Authentication, Audit & Logging
• Multi-tenancy
• Cloud Metering, Cloud connectors
• API Mgmt, ESB

Page 12: Telco analytics at scale

Thank you

Harikumar

[email protected]