Hadoop for carrier

Welcome to Flytxt Leveraging Hadoop Cluster for Carrier grade application Copyright © 2011 Flytxt B.V. All rights reserved. 06/06/2022

Upload: flytxt-bv

Post on 05-Dec-2014


Category:

Technology



DESCRIPTION

Harnessing Hadoop for Big Data, Series II

TRANSCRIPT

Page 1: Hadoop for carrier

04/10/2023

Welcome to Flytxt

Leveraging Hadoop Cluster for Carrier-grade application

Copyright © 2011 Flytxt B.V. All rights reserved.

Page 2: Hadoop for carrier


Service discovery

No Personalization

Page 3: Hadoop for carrier


Mammoth Data

Data Analysis

600-800 GB of CDR per day
◦ GPRS signaling: 50 GB/day
◦ 3G signaling: 300 GB/day
◦ Voice: 100 GB/day
◦ SMS: 200 GB/day

100-200 GB/day of web data
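The itemized figures can be checked against the stated range with a few lines of Python. This is only a sanity check: it assumes the four listed streams make up the whole CDR volume, which the slide does not state explicitly.

```python
# Daily CDR volumes in GB, as listed on the slide.
cdr_gb_per_day = {
    "GPRS signaling": 50,
    "3G signaling": 300,
    "Voice": 100,
    "SMS": 200,
}

# Web data range in GB/day, as listed on the slide.
web_gb_per_day = (100, 200)

# Assumption: the four streams above are the complete CDR breakdown.
cdr_total = sum(cdr_gb_per_day.values())
total_low = cdr_total + web_gb_per_day[0]
total_high = cdr_total + web_gb_per_day[1]

print(f"CDR total: {cdr_total} GB/day")
print(f"CDR + web: {total_low}-{total_high} GB/day")
```

Under that assumption the itemized CDR streams add up to 650 GB/day, inside the quoted 600-800 GB range.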

Page 4: Hadoop for carrier


Page 5: Hadoop for carrier


Page 6: Hadoop for carrier


What is Hadoop

Framework for distributed processing of large data sets across clusters

Consists of
◦ Hadoop Distributed File System, aka HDFS (file system)
◦ Hadoop MapReduce (programming model)

Characteristics
◦ Performance shall scale linearly
◦ Compute should move to data
◦ Simple core, modular and extensible
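The MapReduce programming model can be sketched in plain Python, without Hadoop. The snippet below is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's API; the record layout (`subscriber,record_type`) and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    _subscriber, record_type = record.split(",")
    yield record_type, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, one "subscriber,record_type" string per record.
records = ["alice,sms", "bob,voice", "alice,voice", "carol,sms", "alice,sms"]
counts = run_job(records)
print(counts)
```

In a real cluster the map calls run on the nodes that hold the input blocks, which is what "compute should move to data" refers to; here everything runs in one process for clarity.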

Page 7: Hadoop for carrier


ETL, aka Extract, Transform & Load

Current Bottlenecks

◦ Data resides on multiple nodes, zones, and VM instances, with no elegant, reliable, and efficient way of extracting it

◦ Loading terabytes of data into a database is slow

◦ Parallel computing is not possible in conventional BI ETL

◦ User profile and application data reside in a database that can scale only vertically

Page 8: Hadoop for carrier


ETL the Hadoop Way

Structured Data

sqoop --connect jdbc:mysql://db.example.com/website --table USERS --as-sequencefile

Unstructured Data

Page 9: Hadoop for carrier


Flume

A distributed data collection server
◦ Scalable
◦ Configurable
◦ Extensible
◦ Manageable

Built around the concept of flows
◦ A single flow corresponds to a type of data source
◦ Supports compression, batching & reliability setups per flow

Data comes in through a source
◦ Optionally processed by one or more decorators
◦ And transmitted out via a sink
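The source, decorator, and sink stages of a flow can be pictured as a chain of Python generators. This is a conceptual sketch only and does not use Flume's API or configuration syntax; every name in it (`file_source`, `batch_decorator`, `ListSink`, and so on) is hypothetical.

```python
import gzip


def file_source(lines):
    """Source: yields raw events (here, lines from an in-memory list)."""
    for line in lines:
        yield line


def strip_decorator(events):
    """Decorator: removes surrounding whitespace from each event."""
    for event in events:
        yield event.strip()


def batch_decorator(events, size):
    """Decorator: groups events into lists of at most `size` items."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def gzip_decorator(batches):
    """Decorator: compresses each batch into one gzip payload."""
    for batch in batches:
        yield gzip.compress("\n".join(batch).encode("utf-8"))


class ListSink:
    """Sink: stores payloads in memory (a stand-in for a real destination)."""

    def __init__(self):
        self.received = []

    def write(self, payloads):
        for payload in payloads:
            self.received.append(payload)


def run_flow(lines, sink, batch_size=2):
    """Wire one flow: source -> decorators -> sink."""
    events = strip_decorator(file_source(lines))
    batches = batch_decorator(events, batch_size)
    sink.write(gzip_decorator(batches))


sink = ListSink()
run_flow(["a \n", "b\n", "c\n"], sink)
decoded = [gzip.decompress(p).decode("utf-8").split("\n") for p in sink.received]
print(decoded)
```

Batching and compression are modelled as decorators here because the slide lists them as per-flow settings; how a given Flume release actually exposes them is not covered by this sketch.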

Page 10: Hadoop for carrier


Flume

Page 11: Hadoop for carrier


Hadoop Storage system

Page 12: Hadoop for carrier


Pig

MapReduce is very powerful, but:
◦ It requires a Java programmer
◦ The user has to reinvent common functionality (join, filter, etc.)

Execution engine atop Hadoop

Pig provides a higher-level language, Pig Latin

Opens the system to non-Java programmers

Provides common operations like join, group, filter, sort

Page 13: Hadoop for carrier


Pig usage

◦ Web log processing
◦ Data processing for web search platforms
◦ Ad hoc queries across large data sets
◦ Rapid prototyping of algorithms for processing large data sets

In the default MapReduce mode, the Pig client runs on the local machine and the job is executed in the Hadoop cluster. With -x local, as below, the script runs entirely on the local machine.

$ cd /usr/share/cloudera/pig/
$ bin/pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'output';
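For readers who do not know Pig Latin, the script above groups log rows by user and counts the rows in each group. A rough Python equivalent follows; it assumes tab-separated input (the default delimiter of Pig's loader), and the sample rows are made up for illustration rather than taken from excite-small.log.

```python
from collections import Counter


def count_queries_per_user(lines):
    """Roughly mirrors GROUP log BY user followed by COUNT(log) per group."""
    counts = Counter()
    for line in lines:
        # Each row is: user <TAB> timestamp <TAB> query
        user, _timestamp, _query = line.rstrip("\n").split("\t", 2)
        counts[user] += 1
    return dict(counts)


# Hypothetical rows in the (user, timestamp, query) layout.
sample = [
    "u1\t970916105432\tweather\n",
    "u2\t970916105500\tnews\n",
    "u1\t970916105610\thadoop\n",
]
result = count_queries_per_user(sample)
print(result)
```

The Pig version expresses the same logic in four statements and lets the engine decide how to distribute the grouping across the cluster.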

Page 14: Hadoop for carrier


Hive

System for querying and managing structured data

Built on top of Hadoop

Uses MapReduce for execution

SQL-like syntax; supports
◦ FROM clause subquery
◦ ANSI join (equi-join)
◦ Multi-table insert
◦ Multi group-by
◦ Sampling
◦ Object traversal

Engagement
◦ Summarization
◦ Ad hoc analysis
◦ Spam detection
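Two of the listed constructs, the FROM clause subquery and the equi-join, can be tried out without a cluster. The sketch below runs them in SQLite through Python's standard `sqlite3` module, so it illustrates the query shape only and says nothing about Hive's execution or dialect details; the tables `users` and `sms_log` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, plan TEXT);
    CREATE TABLE sms_log (user_id INTEGER, msg_count INTEGER);
    INSERT INTO users VALUES (1, 'prepaid'), (2, 'postpaid'), (3, 'prepaid');
    INSERT INTO sms_log VALUES (1, 5), (1, 7), (2, 3), (3, 10);
""")

# The inner SELECT is a FROM clause subquery (per-user totals);
# the JOIN condition is a plain equality, i.e. an equi-join.
query = """
    SELECT u.plan, SUM(t.total) AS sms_total
    FROM (
        SELECT user_id, SUM(msg_count) AS total
        FROM sms_log
        GROUP BY user_id
    ) t
    JOIN users u ON (u.user_id = t.user_id)
    GROUP BY u.plan
    ORDER BY u.plan
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

This kind of summarization query is the sort of workload the slide lists under summarization and ad hoc analysis.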

Page 15: Hadoop for carrier


Hive: component

Page 16: Hadoop for carrier


Feature                          Hive                Pig
Language                         SQL-like            Pig Latin
Schemas/Types                    Yes (explicit)      Yes (implicit)
Partitions                       Yes                 No
Server                           Optional (Thrift)   No
User Defined Functions           Yes                 Yes
Custom Serializer/Deserializer   Yes                 Yes
DFS Direct Access                Yes (implicit)      Yes (explicit)
Join/Order/Sort                  Yes                 Yes
Shell                            Yes                 Yes
Streaming                        Yes                 No
Web Interface                    Yes                 No
JDBC/ODBC                        Yes (limited)       No

Page 17: Hadoop for carrier


Page 18: Hadoop for carrier


Thank you