Post on 05-Aug-2015

From Archive to Insight: Debunking Myths of Analytics on Object Stores

Dean Hildebrand, IBM Research

Bill Owen, IBM

Simon Lorenz, IBM

Rui Zhang, IBM Research

Luis Pabon, Red Hat Storage

Myth #1

“Data Must Migrate from Swift to HDFS (and back again)”

Myth #2

“Swift should only be used with in-memory analytics (Spark)”

Myth #3

“Swift cannot efficiently support frameworks such as Hive and HBase that require appending to a file”

Myth #4

“Object Stores are slow for analytics”

Load Imbalance

Unnecessary Data Movement

HTTP vs. RPC

Writing Through Proxy

Authentication

These may be true for Swift...but

Swift-on-File debunks the myths

Demo Now

So What Happened There...

Typical Hadoop+HDFS Setup

[Diagram: All Standard Apache Open-Source Components (MapReduce, Spark, HBase, Zookeeper, Flume, Pig, HCatalog, Sqoop, Solr/Lucene, Hive) on the Hadoop FS API, backed by HDFS on every node]

Create ⟹ Copy In ⟹ Analyze ⟹ Copy Out

4 Steps: Move data twice

Now replace HDFS with Scale-out FS

[Diagram: the same Apache components on the Hadoop FS API, now backed by a Scale-out File System through the IBM Spectrum Scale or GlusterFS Hadoop Connector]

Now add in Swift access

[Diagram: the same analytics stack on the Scale-out File System through the IBM Spectrum Scale or GlusterFS Hadoop Connector. Alongside it, Swift (Proxy, Object with DiskFile) is labeled "Data Ingest and Results Distribution".]

Now configure Swift-on-File

[Diagram: the same analytics stack on the Scale-out File System through the IBM Spectrum Scale or GlusterFS Hadoop Connector. Swift (Proxy, Object with SwiftOnFile DiskFile) is labeled "Data Ingest and Results Distribution" and uses a Swift-on-File Policy.]

Create ⟹ Analyze

2 Steps: Never Move Data

Swift-on-File

● Swift Storage Policy
● Stores objects on any scale-out filesystem
● Allows objects created using the Swift API to be accessed as files
● Maps URL to file path

Swift-on-File Storage Policy

This object: http://swift.example.com/v1/acct/cont/obj

is located here: /mnt/swift/z1device7/objects/63773/ba2/f91d1e7550cd32822a17b00fa86d9ba2/1414045361.93852.data

Now, this object: http://swift.example.com/v1/acct/cont/obj

is located here: /mnt/scaleout_fs/acct/cont/obj
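
The two layouts above can be sketched in a few lines of Python. This is an illustration only, not the Swift or Swift-on-File source: the function names are hypothetical, the mount root is taken from the slide, and the object's name hash is passed in rather than computed because it depends on per-cluster secrets. The partition power of 16 is an assumption that happens to reproduce the partition number shown on the slide.

```python
from urllib.parse import urlparse


def split_swift_url(url):
    """Split a Swift object URL into (account, container, object)."""
    # Path looks like /v1/<account>/<container>/<object>; the object
    # name may itself contain slashes, so split at most three times.
    parts = urlparse(url).path.lstrip("/").split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"not a Swift object URL: {url}")
    _version, account, container, obj = parts
    return account, container, obj


def swift_on_file_path(url, mount_root="/mnt/scaleout_fs"):
    """Swift-on-File layout: the URL path maps directly to a file path."""
    account, container, obj = split_swift_url(url)
    return f"{mount_root}/{account}/{container}/{obj}"


def default_swift_path(name_hash, timestamp, device_root, part_power=16):
    """Hashed layout from the first example on the slide.

    name_hash is the 32-character hex hash of the object name (an input
    here). part_power=16 is an assumed ring setting.
    """
    # Partition: top `part_power` bits of the first 4 bytes of the hash.
    partition = int(name_hash[:8], 16) >> (32 - part_power)
    # Suffix directory: last three hex characters of the hash.
    suffix = name_hash[-3:]
    return (f"{device_root}/objects/{partition}/{suffix}/"
            f"{name_hash}/{timestamp}.data")


url = "http://swift.example.com/v1/acct/cont/obj"
file_path = swift_on_file_path(url)
hashed_path = default_swift_path(
    "f91d1e7550cd32822a17b00fa86d9ba2",
    "1414045361.93852",
    "/mnt/swift/z1device7",
)
```

The hashed path is opaque to anything but Swift, while the Swift-on-File path is one an analytics job or POSIX application can open by name.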

Myth: Data Must Migrate from Object Store to HDFS

BUSTED

Reality: Analyze in Place!

Myth: Object Stores should only be used with in-memory analytics (Spark)

BUSTED

Reality: Support entire Apache analytics ecosystem with high performance

Myth: Object Stores cannot efficiently support frameworks such as Hive and HBase that require appending to a file

BUSTED

Reality: Support all POSIX operations, including append
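
As a minimal sketch of what append means in this setup: with Swift-on-File, the object is an ordinary file on the scale-out file system, so appending is a plain POSIX append. In the example below a temporary directory stands in for the file system mount, and `append_record` is a hypothetical helper, not part of Swift or any Hadoop connector.

```python
import os
import tempfile


def append_record(path, record):
    """Append bytes to a file using ordinary POSIX append semantics."""
    with open(path, "ab") as f:  # "ab" opens with O_APPEND
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # push the write to stable storage


# A temporary directory stands in for the scale-out file system mount.
with tempfile.TemporaryDirectory() as mount_root:
    # Same <account>/<container>/<object> layout as the slide's example.
    path = os.path.join(mount_root, "acct", "cont", "obj")
    os.makedirs(os.path.dirname(path))

    append_record(path, b"row-1\n")
    append_record(path, b"row-2\n")  # extends the file, no rewrite

    with open(path, "rb") as f:
        contents = f.read()
```

Neither call rewrites or copies the existing data, which is the behavior append-heavy frameworks such as Hive and HBase rely on.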

Myth: Object Stores are slow for analytics

BUSTED

Reality: Scale-out File System can match HDFS features and performance

© 2015 IBM Corporation

Swift-on-File Additional Use Cases: Scientific Analysis and Collaboration

[Diagram: Scale-out File System with Swift, NAS, and POSIX access. Labels: "Generate Scientific datasets through FS"; "Share data to Global Scientific Community".]

Swift-on-File Additional Use Cases: Support General File-based Applications

[Diagram: Scale-out File System with Swift, NAS, and POSIX access. Labels: "Global Ingest and Access"; "In-Place Editing".]

Future Plans

- Single Swift Proxy/Object process optimization

- Eliminating data movement in Scale-out File Systems

- Auditor support

- Multi-region support

- Load balancing of auxiliary Swift services across the scale-out file system

Summary

➔ Gain insights faster

➔ Stop copying data

➔ High-performance analysis

➔ Leverage entire Apache analytics ecosystem

Q&A

Credits

Special thanks to all the people who made and released these awesome resources for free:

▷ Presentation template by SlidesCarnival
▷ Photographs by Unsplash

● Load imbalance due to lack of auto-segmentation
● Rename causes data movement in Swift
● Use of HTTP vs RPCs in HDFS
● New objects must write through Proxy servers
● Authentication overhead
