spectrum scale object analytics

Analytics for Object Storage Simplified: Spectrum Scale Object Storage with File Access for Hadoop

Sandeep Patil, STSM, IBM Spectrum Scale
Tomer Perry, Solution Architect, IBM Spectrum Scale
Smita Raut, Object Development, IBM Spectrum Scale

Acknowledgement: Bill Owen, Ashutosh Mate, Shou Feng, John Gu, Yong Zeng, Piyush Chaudhary, Wei Gong

#ibmedge © 2016 IBM Corporation

Upload: smita-raut

Post on 13-Jan-2017



Agenda
- Introduction to Spectrum Scale
- Introduction to Spectrum Scale Analytics
- Introduction to Spectrum Scale Object Store
- Unified File & Object Access (UFO) Feature Details
- Use Cases Enabled By UFO
- Deep Dive of In-Place Analytics Use Case
- Demo
- Q & A


Spectrum Scale Analytics Introduction


GPFS-FPO: Advanced Storage for MapReduce Data

Hadoop HDFS | IBM GPFS advantages
HDFS NameNode is a single point of failure | No single point of failure, distributed metadata
Large block sizes: poor support for small files | Variable block sizes: suited to multiple types of data and data access patterns
Non-POSIX file system: obscure commands | POSIX file system: easy to use and manage
Difficult to ingest data: special tools required | Policy based data ingest
Single-purpose, Hadoop MapReduce only | Versatile, multi-purpose
Not recommended for critical data | Enterprise class: advanced storage features

Use Case: Big Data Analytics

Problem: separate storage systems for ingest/distribution and analysis
- Data movement overhead is a significant part of the time to insight
- Increased cost from data duplication and overhead
- Inconsistent results

Solution: native HDFS support
- Decreased time to results
- Run Map/Reduce directly
- No waiting for data transfer between storage systems
- Immediately share results

Diagram labels: Spectrum Scale; File/Object; File/HDFS; Global Ingest and Distribution; Business Analytics; Custom Applications; Packaged Applications

Spectrum Scale Object Storage Introduction

IBM Spectrum Scale: Data management at scale

OpenStack and Spectrum Scale help clients manage data at scale:
- Avoid vendor lock-in with true Software Defined Storage and open standards
- Seamless performance and capacity scaling
- Automate data management at scale
- Enable global collaboration

Client needs:
- Business: I need virtually unlimited storage.
- Operations: I need a flexible infrastructure that supports both object and file based storage.
- Operations: I need to minimize the time it takes to perform common storage management tasks.
- Collaboration: I need to share data between people, departments and sites with low latency.

What Spectrum Scale offers:
- A single data plane that supports Cinder, Glance, Swift, Manila as well as NFS, et al.
- A fully automated policy based data placement and migration tool
- An open and scalable cloud platform
- Sharing with a variety of WAN caching modes

Results:
- Converge file and object based storage under one roof
- Employ enterprise features to protect data, e.g. snapshots, backup, and disaster recovery
- Support native file, block and object sharing of data

Diagram labels: Spectrum Scale; NFS, SMB, POSIX, Swift, HDFS, Cinder, Glance, Manila; SSD, Fast Disk, Slow Disk, Tape; Cognitive Services

Spectrum Scale Object Storage
- Basic support added in the 4.1.1 release and enhanced in the 4.2 and 4.2.1 releases
- Based on OpenStack Swift (Juno release)
- REST-based data access
  - Growing number of clients due to extremely simple protocol
  - Applications can easily save and access data from anywhere using HTTP
  - Simple set of atomic operations: PUT (upload), POST (update metadata), GET (download), DELETE
- Amazon S3 protocol support
- High availability with CES integration
- Simple and automated installation process
- Integrated authentication (Keystone) support
- Native GPFS command line interface to manage the Object service (mmobj command)
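The four atomic operations listed above map directly onto HTTP methods. The following is a minimal sketch, using only the Python standard library, of how a client might build those requests against a Swift-style endpoint. The endpoint URL, token value, container name, and the helper name object_request are assumptions for illustration; they do not come from the slides, and the requests are only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical endpoint and token; substitute values from your own deployment.
BASE_URL = "https://swift.example.com/v1/acct"
TOKEN = "example-token"


def object_request(method, container, name, data=None, metadata=None):
    """Build (but do not send) a Swift object request."""
    headers = {"X-Auth-Token": TOKEN}
    # Swift carries custom object metadata in X-Object-Meta-* headers.
    for key, value in (metadata or {}).items():
        headers["X-Object-Meta-" + key] = value
    url = f"{BASE_URL}/{container}/{name}"
    return Request(url, data=data, headers=headers, method=method)


upload = object_request("PUT", "cont", "a.jpg", data=b"image bytes")
update = object_request("POST", "cont", "a.jpg", metadata={"Camera": "demo"})
download = object_request("GET", "cont", "a.jpg")
delete = object_request("DELETE", "cont", "a.jpg")

for req in (upload, update, download, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these request objects to urllib.request.urlopen() would send it to the server; that step is left out so the sketch runs without a cluster.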

Spectrum Scale Object Store: Additional Features
- Unified file and object support with Hadoop connectors
- Support for encryption
- Support for compression
- Only object store with tape support for backup
- Object store with integrated transparent cloud tiering support
- Multi-region support
- AD/LDAP support for authentication
- ILM support for Object: movement of objects across storage tiers based on access heat
- Spectrum Scale Object with IBM DeepFlash becomes an object store over an all-flash array for newer, faster workloads
- Spectrum Scale Object with WAN caching support (AFM)

Spectrum Scale Object vs Cleversafe: the right solution for the workload

Spectrum Scale

Ideal workloads:
- Big data analytics
- High performance computing, e.g. engineering applications
- Performance optimized backup and restore
- Multi-site file collaboration
- Multi-tier file sync and share
- Cold data archive with lowest cost data storage tier

Differentiation:
- Designed for high performance
- Unified storage infrastructure: native file, object and Hadoop
- Robust tiering with policy based data placement and data movement
- Multi-site collaboration with advanced routing and caching
- Enterprise features, e.g. encryption, compression, QoS, and disaster recovery

IBM Cloud Object Store (Cleversafe)

Ideal workloads:
- Active archive (warm data, mostly static)
- Cost optimized cloud backup target
- Web app content
- Remote office storage consolidation
- Storage as a service

Differentiation:
- Designed for easy deployment and management at scale
- Always-on architecture
- Geo-dispersed erasure coding for site fault tolerance and DR
- Simple keyless native encryption and multi-tenant security
- Reduced cost and complexity

IBM Spectrum Scale: Unified File and Object Access Feature Overview

Unified File & Object (UFO) Support

Challenge: the world is not converged (file/object/HDFS) today, and never will be completely.

Spectrum Scale: Redefining Unified Storage

Unified scale-out content repository
- File or object in, object or file out
- Integrated big data analytics support
- Native protocol support
- High performance that scales
- Single management plane

Diagram labels: Spectrum Scale; NFS, SMB, POSIX, Swift/S3, HDFS; SSD, Fast Disk, Slow Disk, Tape

Spectrum Scale Unified File & Object
- Access the same content both as a file and as an object, without making a copy or needing file or object gateways
- File-in-object-out and object-in-file-out support
- Support for file access protocols (NFS/SMB/POSIX) and object access protocols (Swift/S3)
- Objects ingested into a designated unified container are available as files, and files ingested into it are available as objects
- Support for file and object ACLs with unified mode ID mapping

Unified File and Object Access: What is it?

What is Unified File and Object Access?
- Accessing objects through file interfaces (SMB/NFS/POSIX) and files through object interfaces (REST) helps legacy applications designed for files to integrate seamlessly into the object world.
- It allows object data to be accessed using applications designed to process files, and it allows file data to be published as objects.
- Multi-protocol access for file and object in the same namespace (with common user ID management capability) allows supporting and hosting data oceans of different types of data with multiple access options.
- It optimizes various use cases and solution architectures, resulting in better efficiency as well as cost savings.

Diagram: Swift (with Swift on File) over a namespace reachable through NFS/SMB/POSIX and Object (HTTP).
1. Data ingested as objects
2. Objects accessed as files (file exports created at container level, or POSIX access from container level)
3. Data ingested as files
4. Files accessed as objects

Flexible Identity Management Modes

Two identity management modes are supported. Administrators can choose between them, based on their needs and use case, using the CLI (the value is either unified_mode or local_mode):

    # mmobj config change --ccrfile object-server-sof.conf --section DEFAULT --property id_mgmt --value unified_mode | local_mode

local_mode
- An object created through the object interface is owned by the internal swift user.
- An application processing the object data from the file interface needs the required file ACL to access the data.
- Object authentication setup is independent of file authentication setup.
- Suitable when auth schemes for file and object are different and unified access is for applications.

unified_mode
- An object created through the object interface should be owned by the user doing the object PUT (i.e. the file is owned by the UID/GID of that user).
- Object and file users are expected to share common authentication and come from the same directory service (only AD+RFC 2307 or LDAP).
- The owner of the object owns the data and has access to it from the file interface.
- Suitable for unified file and object access for end users. Leverage common ILM policies for file and object data based on data ownership.
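The ownership rule that separates the two modes can be written down as a small model. This is only an illustrative sketch of the behavior described above, not Spectrum Scale code: the function name file_owner_after_put and the numeric IDs chosen for the internal swift user are hypothetical, while the UID/GID for the example user are the ones shown for Riya in use case 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    uid: int
    gid: int


# Hypothetical IDs standing in for the internal swift user (illustration only).
SWIFT_SERVICE_USER = Owner(uid=160, gid=160)


def file_owner_after_put(id_mgmt, put_user):
    """Return who owns the backing file after an object PUT, per id_mgmt mode."""
    if id_mgmt == "local_mode":
        # Object-created data belongs to the internal swift user; file
        # applications need an explicit file ACL to reach it.
        return SWIFT_SERVICE_USER
    if id_mgmt == "unified_mode":
        # The file is owned by the UID/GID of the user who did the PUT.
        return put_user
    raise ValueError(f"unknown id_mgmt value: {id_mgmt!r}")


riya = Owner(uid=1001, gid=2000)
print(file_owner_after_put("local_mode", riya))
print(file_owner_after_put("unified_mode", riya))
```

In unified_mode the same identity follows the data across protocols, which is what allows ILM policies to be keyed on data ownership.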

Use Cases Enabled by Unified File and Object Access

Use case 1: Enabling in-place analytics for an object data repository, with analytic results available as objects

Analytics on a traditional object store (source: https://aws.amazon.com/elasticmapreduce/)
- Data has to be copied from the object store to a dedicated cluster, analyzed there, and the result copied back to the object store for publishing.
- Explicit data movement.

Analytics with Unified File and Object Access
- Object data is available as files on the same fileset.
- Analytics systems like Hadoop MapReduce or Spark can use the data directly for analytics.
- No data movement, i.e. in-place, immediate data analytics.
- Results are returned in place and published as objects with no data movement.

Diagram labels: data ingested as objects over Object (HTTP) into the clustered file system; Spark or Hadoop MapReduce; In-Place Analytics
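Because objects in a unified-access container appear as ordinary files, any file-based job can read them where they sit. The sketch below illustrates that idea with a plain Python word count rather than Spark or Hadoop MapReduce, so that it runs without a cluster. The temporary directory stands in for a container path, and the file names and contents are made up for the example.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_words(container_dir):
    """Count words across every file under a container directory, in place."""
    counts = Counter()
    for path in sorted(Path(container_dir).rglob("*")):
        if path.is_file():
            counts.update(path.read_text(encoding="utf-8").split())
    return counts


# Stand-in for a unified-access container path; on a real cluster this would
# be the container directory inside the object fileset.
with tempfile.TemporaryDirectory() as container_dir:
    Path(container_dir, "log1.txt").write_text("error ok ok", encoding="utf-8")
    Path(container_dir, "log2.txt").write_text("ok error warn", encoding="utf-8")
    result = count_words(container_dir)

print(result.most_common(1))
```

No copy to a separate analytics cluster is involved: the job reads the same files that back the objects. Writing a results file into the container directory would, after objectization, make the result available through the object interface as well.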

Use case 2: Process object data with file-oriented applications and publish outcomes as objects

Scenario: a media house OpenStack cloud platform (tenants = media house subsidiaries), using Swift on File.
- Media objects are ingested into Container 1 (Subsidiary 1) and Container 2 (Subsidiary 2).
- Each container has an NFS export: Manila shares (NFS) on Container 1 are exported only for Subsidiary 1, and those on Container 2 only for Subsidiary 2.
- Raw media content is sent for media processing, which happens over files (object to file access), on a VM farm of virtual machine instances per subsidiary for video processing.
- Files are converted into objects for publishing (file to object access).
- Final processed videos are available as objects in the container used for external publishing, so publishing channels can stream the final video as objects.

Use case 3: Users read/write data via file and object with common user authentication and identity

- Users (e.g. John, Riya) access common data in the clustered file system using the same user credentials across all protocols (NFS, SMB, Object).
- Users are defined in the corporate user directory (Active Directory/LDAP).
- Riya's data read or written from Object should be owned by Riya when accessed from file (SMB/NFS/POSIX).

Example identity: User: Riya, UID: 1001, GID: 2000, Domain: XYZ

Deep Dive on In-Place Analytics Use Case

Analytics use case

What is In-place Analytics?

Setup Details: IBM BigInsights with Spectrum Scale demo setup

Spectrum Scale cluster
- viknode1, roles: admin, quorum, NSD
- viknode2, roles: quorum, NSD, CES node
- viknode3, roles: quorum, CES node
- Disks: /dev/dm-2, /dev/dm-3

IBM BigInsights
- Ambari Server
- Yarn, Spark, Hive, Oozie, Slider, Knox

Prerequisites For Demo

Demo Content

Demo

Spectrum Scale User Group

The Spectrum Scale User Group is free to join and open to all using, interested in using or integrating Spectrum Scale. Join the User Group activities to meet your peers and get access to experts from partners and IBM.

Next meetings:
- APAC: October 14, Melbourne
- Global at SC16: November 13, 1pm to 5pm, Salt Lake City

Web page: http://www.spectrumscale.org/
Presentations: http://www.spectrumscale.org/presentations/
Mailing list: http://www.spectrumscale.org/join/
Contact: http://www.spectrumscale.org/committee/

Meet Bob Oesterlin (US Co-Principal) at Edge2016: [email protected]

Session: Futures of IBM Spectrum Scale (NDA & customers only)
Who: IBM Spectrum Scale Offering Management: Carl Zetie, Ron Riffe
When: Tuesday, September 20, 2016, 1pm to 2pm
Where: MGM Grand, Signature Tower 3, Meeting Room D
Contact (if any questions): [email protected], [email protected]

Session: How to apply Flash benefits to big data analytics and unstructured data (NDA & customers only)
Who: IBM Elastic Storage Server Offering Management: Alex Chen
When: Thursday, September 22, 2016, 1:15pm to 2:15pm
Where: Grand Garden Arena, Lower Level, MGM, Studio 10
Contact (if any questions): [email protected], [email protected]

Trial VM

Download the IBM Spectrum Scale Trial VM from: http://www-03.ibm.com/systems/storage/spectrum/scale/trial.html


OpenStack Summit 2016: "IBM Spectrum Scale in an OpenStack Environment" Redpaper published: http://www.redbooks.ibm.com/abstracts/redp5331.html

Thank You

IBM Spectrum Scale: Unified File and Object Access Feature Overview

- Multi-protocol access for file and object in the same namespace
  - Access object as file from POSIX, NFS and SMB
  - Access file as object
    - Files are converted to objects automatically by a background service called objectizer
    - Files can be converted to objects explicitly and immediately using the CLI
- The feature is made available as an object storage policy
  - Allows it to coexist with traditional object and other policies
  - Multiple unified file and object access policies can be created
  - Since policies apply per container, end users have the flexibility to create some containers with a Unified File and Object Access policy and others without it
- Flexible identity management mode support
  - Local mode: suitable when auth schemes for file and object are different and unified access is for applications. An object created through the object interface is owned by the internal swift user.
  - Unified mode: suitable for unified file and object access by end users. Leverage common ILM policies for file and object data based on data ownership. An object created through the object interface should be owned by the user doing the object PUT (i.e. the file is owned by the UID/GID of that user).
- Ability to run in-place analytics of object data using Spectrum Scale Hadoop connectors via the POSIX interface

Filesystem Layout (Traditional vs Unified File and Object Access)

One of the key advantages of unified file and object access is the placement and naming of objects when stored on the file system. Unified file and object access stores objects following the same path hierarchy as the object's URL. In contrast, the default object implementation stores the object following the mapping given by the ring, and its final file path cannot easily be determined by the user.

Example: object a.jpg ingested with URL https://swift.example.com/v1/acct/cont/a.jpg
- Traditional Swift: stored under ibm/gpfs0/object_fileset/o/z1device108/objects/7551/125/75fc66179f12dc513580a239e92c3125
- Unified File and Object Access: stored as ibm/gpfs0///AUTH_acctID/cont/a.jpg
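The path rule for unified file and object access (the file path mirrors the object URL) can be sketched as a small mapping function. This is a minimal illustration, not the product's implementation: the function name unified_path is hypothetical, the fileset location passed in below is a made-up placeholder because the slide elides that part of the path, and the account directory name is supplied by the caller rather than derived.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def unified_path(object_url, fileset_root, account_dir):
    """Map an object URL to a file path that mirrors the URL hierarchy."""
    parts = urlparse(object_url).path.strip("/").split("/", 3)
    if len(parts) != 4:
        raise ValueError("expected /v1/<account>/<container>/<object>")
    _version, _account, container, obj = parts
    return PurePosixPath(fileset_root) / account_dir / container / obj


path = unified_path(
    "https://swift.example.com/v1/acct/cont/a.jpg",
    fileset_root="/ibm/gpfs0/example_fileset",  # hypothetical fileset location
    account_dir="AUTH_acctID",
)
print(path)
```

No comparable function can be written for the traditional layout without consulting the ring, which is the point the slide makes: there the on-disk location depends on a hash and partition, not on the names the user chose.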

Easy Access of Objects as Files via Supported File Interfaces (NFS/SMB/POSIX)
- Ingested objects are available immediately for file access via the three supported file protocols.
- ID management modes give flexibility in assigning or retaining owners, which file protocols generally require.
- Object authorization semantics are used during object access, and file authorization semantics are used during file access of the same data, thus ensuring compatibility of object and file applications.

Diagram: (1) data ingested as objects over Object (HTTP); (2) objects accessed as files over NFS/SMB/POSIX, through file exports created at container level or POSIX access from container level.

Objectization: Making Files Available as Objects (Accessing Files via the Object Interface)
- Spectrum Scale 4.2 features a system service called ibmobjectizer that is responsible for objectization.
- Objectization is the process that makes files ingested from the file interface, on a container path enabled for unified file and object access, available from the object interface.
- When new files are added from the file interface, they need to be made visible to the Swift database so that container listings and container or account statistics are correct.

Diagram: (1) data ingested as files over NFS/SMB/POSIX into the unified file and object fileset on the Spectrum Scale file system; (2) ibmobjectizer performs objectization; (3) files accessed as objects over Object (HTTP).
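Conceptually, objectization reconciles what is on disk with what the container listing knows about. The following is a toy model of that idea only; it is not how ibmobjectizer is implemented. The function name objectize is hypothetical, a dictionary stands in for the Swift container database, and a temporary directory stands in for a unified-access container path.

```python
import tempfile
from pathlib import Path


def objectize(container_dir, listing):
    """Add files that the container listing does not know about yet."""
    newly_added = []
    for path in sorted(Path(container_dir).rglob("*")):
        if not path.is_file():
            continue
        name = path.relative_to(container_dir).as_posix()
        if name not in listing:
            listing[name] = path.stat().st_size  # record object name and size
            newly_added.append(name)
    return newly_added


with tempfile.TemporaryDirectory() as container_dir:
    listing = {}  # stand-in for the Swift container database
    Path(container_dir, "a.jpg").write_bytes(b"12345")
    first = objectize(container_dir, listing)
    Path(container_dir, "b.txt").write_bytes(b"hi")
    second = objectize(container_dir, listing)

print(first, second, listing)
```

Each pass picks up only the files added since the previous pass, which mirrors why a file written over NFS/SMB/POSIX shows up in container listings and statistics only after objectization has run.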

Unified File and Object Access: Policy Integration for Flexibility

This feature is made available as an object storage policy, which gives the following advantages:
- Flexibility for the administrator to manage unified file and object access separately
- Allows it to coexist with traditional object and other policies
- Multiple unified file and object access policies can be created, which can vary based on the underlying storage
- Since policies apply per container, end users have the flexibility to create some containers with a Unified File and Object Access policy and others without it

Example: mmobj policy create SwiftOnFileFS --enable-file-access
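Since storage policies apply per container, a client opts a container into unified access when it creates the container. In OpenStack Swift this is done with the X-Storage-Policy header on the container PUT. The sketch below builds such a request with the Python standard library; the endpoint URL, token, and container name are assumptions for illustration, the policy name is the one from the example above, and the request is constructed but not sent.

```python
from urllib.request import Request


def create_container_request(base_url, token, container, policy):
    """Build (but do not send) a container PUT that selects a storage policy."""
    headers = {
        "X-Auth-Token": token,
        # Swift assigns the storage policy at container creation time.
        "X-Storage-Policy": policy,
    }
    return Request(f"{base_url}/{container}", headers=headers, method="PUT")


req = create_container_request(
    "https://swift.example.com/v1/acct",  # hypothetical endpoint
    "example-token",                      # hypothetical token
    "cont",
    "SwiftOnFileFS",
)
print(req.get_method(), req.full_url)
```

Containers created without the header fall under the default policy, which is how unified-access and traditional containers can coexist in one account.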

