
Copyright © 2004, SAS Institute Inc. All rights reserved.

Wayne Embry

Technical Account Manager

March 17, 2005

Delivering Enterprise Value with SAS® 9 Architecture:

GRID COMPUTING and SAS

Agenda

Defining Grid

Why is Grid Computing Important?

Who’s Interested in Grid and Why?

SAS Technology Behind Grid

Packaging

Architecture

Supported Platforms

Summary


Defining Grid in the IT World…

According to Gartner, "a grid is a collection of resources owned by multiple organizations that is coordinated to allow them to solve a common problem."

Gartner (and Wayne) further define three commonly recognized forms of grid:

*Computing Grid - multiple computers to solve one application problem

Data Grid - multiple storage systems to host one very large data set

Collaboration Grid - multiple collaboration systems for collaborating on a common issue

Other:

Utility Grid – Resources are chosen for you; ASPs


Why is Grid Computing Important to SAS?

SAS believes that 2005 will be the year customers begin to view grid computing as a practical solution to their business problems, so the timing is right for it to be an important focus.

Our ability to speak to our Grid capabilities will further position our solutions and toolsets as enterprise class and substantially differentiate our offerings from those of competitors. We will also be able to build additional enterprise credibility.

Proof:

A recent IDC report projected that the grid computing market may exceed $12 billion by 2007.

Gartner reported that 56% of large IT customers had not been contacted by a single vendor regarding Grid.


Oracle


Why is Grid Computing Important?

Grid computing leverages under-utilized and untapped computing resources to drastically reduce processing times, which in turn saves money.

Grid computing allows organizations to further leverage their current IT investment by harnessing the collective processing power of existing computers to more rapidly solve complex problems and to run increasingly data-intensive applications.

IT spending continues to be substantially restricted while demands on the IT department continue to increase. Grid computing is a strategic alternative to resolve this dilemma, providing one of the biggest “bangs for the buck” in IT.


Reality Check: Who’s Interested and Why?…

Frugal Phyllis

Title: CIO of a business unit of a large corporation

Reports to: CEO of the business unit

Computing Skills: Advanced

Top ETL-related issues:

1. Faced with processing ever-increasing volumes of data

2. Challenged to provide useable results in ever-shorter time-frames

3. Short on funds, especially for additional hardware


Reality Check: Who’s Interested and Why?…

Al the Architect

Title: Head Information Architect and “right-hand” to CIO

Reports to: CIO of a business unit of a large corporation

Computing Skills: Expert

Top ETL-related issues:

1. Charged with building fast and flexible architectures without spending much money

2. Needs to find ways to cope with more jobs, and larger jobs, all being squeezed into the same batch window

3. Would be nice if his solutions to the above could inspire the Enterprise as a whole, or at least integrate with their existing tools


Reality Check: Who’s Interested and Why?…

Silo Sandy (somewhat similar to Frugal Phyllis)

Title: CEO (or Director) of a business unit of a large corporation

Reports to: CEO of the Enterprise

Computing Skills: Average

Top ETL-related issues:

1. Trying to build own information organization because she is not satisfied with corporate IT

2. Needs to do so using only existing hardware resources

3. Needs solutions running quickly and with reliability and maintainability


Reality Check: Who’s Interested and Why?…

And a user persona who influences the above buyers:

Forever Fred

Title: Business Analyst (a.k.a. Power User)

Reports to: Director or Sr. Manager of a business unit of a large corporation

Computing Skills: Power User

Top ETL-related issues:

1. Takes too long to load data for his job, so he misses batch windows

2. Constantly being admonished for monopolizing system resources

3. “Beaten up” for not delivering reports fast enough


Types of Applications Suitable for Grid

Long running jobs (batch window)

Many repetitive iterations of a fundamental task

– Simulation
– BY GROUP processing

Parallelism

– Independent tasks against large data sources (Scoring, Risk analysis)
– Pipeline parallelism (Piping)
– Both
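To make the "repetitive iterations" and BY GROUP cases concrete, here is a minimal Python sketch of independent parallelism. It is not SAS code and not the deck's method: the thread pool merely stands in for grid nodes, and every name in it (`split_by_group`, `summarize`, `run_by_group`, the `region` and `amount` fields) is a hypothetical chosen for illustration. The point is only that each group can be processed without reference to any other group, so the groups can be handed to different workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_group(rows, key):
    """Partition rows into independent groups, one per distinct key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def summarize(group_rows):
    """The 'fundamental task' repeated per group: total the amount field."""
    return sum(r["amount"] for r in group_rows)


def run_by_group(rows, key, workers=3):
    """Submit one task per group to the pool and collect results by key.

    The workers are threads here; on a grid they would be separate machines.
    """
    groups = split_by_group(rows, key)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {k: pool.submit(summarize, g) for k, g in groups.items()}
        return {k: f.result() for k, f in futures.items()}


# Hypothetical sample data, not taken from the presentation.
orders = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "north", "amount": 3},
]
totals = run_by_group(orders, "region")
print(totals)
```

Because no group depends on another, adding workers shortens elapsed time only until the largest single group becomes the bottleneck.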


[Diagram: RFID COMPLEXITY. Labels: four RFID Data Collectors, four REALTIME labels, SAP/R3, DB/2, ORACLE, SYBASE]


SAS Technology Behind Grid – Today… Analytics Scenario

[Diagram labels: SAS, Connect Client, %Distribute; nodes 1 through n, each labeled Base, Connect, …]


SAS Technology Behind Grid – Today… Data Integration Scenario

[Diagram labels: ETL Studio, SAS MC, Schedule Manager; SAS Servers (Metadata Server, Workspace Server, Connect Client); LSF Job Scheduler; nodes 1 through n, each labeled Base, Connect, …]


SAS Technology Behind Grid – 2005… Improving our Capabilities

[Diagram labels: SAS Server with Connect Client and LSF; nodes 1 through n, each labeled Base, Connect, … and LSF]


SAS Grid – 2005…

[Diagram labels: ETL Studio, SAS MC, Schedule Manager, Grid Manager (new), Enterprise Miner; SAS Servers (Metadata Server, Workspace Server, Connect Client, LSF Job Scheduler); nodes 1 through n, each labeled Base, Connect, … and LSF]


SAS 9 Packaging…

Head Start – SAS/Connect is already included in ETL Server and EETL Server

Any solution including ETL Server


Supported Platforms…

Good News – Any platform that supports Base and Connect

Heterogeneous architecture


Architecture Guidelines

There are guidelines to keep in mind when architecting SAS Grid environments:

Permanent data

SASWORK

Data Accessibility - Where the data is and how each machine on the grid is attached to it (NFS, SAN) greatly affects performance.

For help architecting SAS Grids, please call your SAS Account Representative.


Example Grid Job 1

Diagram labels:

L8364 - 1 CPU (1.6 GHz; 2 GB RAM): ETL Studio, SAS Server, Workspace Server, Base, Connect

Demo0505 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality

Demo0507 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality

Tables: Customer, Orders_grid, Order_item_grid


Example Grid Job 2

Diagram labels:

L8364 - 1 CPU (1.6 GHz; 2 GB RAM): ETL Studio, SAS Server, Workspace Server, Base, Connect

Demo0505 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality

Demo0507 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality

Other labels: Orders_grid, Order_item_grid, LXYZ, SASWORK, Customer


An Example - The Scenario…

Single Platform Job - Local_Complicated

Run locally on my laptop in sequential order

Source Data – 3 local SAS tables:

– Customer: 16 MB; 89,954 rows; 12 columns
– Orders_grid: 214 MB; 5,710,014 rows; 8 columns
– Order_item_grid: 315 MB; 4,487,718 rows; 7 columns

Target – 1 local SAS table with 15 columns


Local_Complicated Job

Diagram labels:

L8364 - 1 CPU (1.6 GHz; 2 GB RAM): ETL Studio, SAS Server, Workspace Server, Data Quality, Base

Tables: Order_item_grid, Orders_grid, Customer

Elapsed Wall Clock Time: 4 minutes


Leveraging the Grid - The Scenario…

Enable Job to Run on a SAS Grid - Remote_Complicated

Grid Strategies:

– Independent parallelism – Independent data and processes
– Pipeline parallelism

Source Data:

2 remote SAS tables:

– Orders_grid: 214 MB; 5,710,014 rows; 8 columns
– Order_item_grid: 315 MB; 4,487,718 rows; 7 columns

1 local SAS table:

– Customer: 16 MB; 89,954 rows; 12 columns

Target – 1 local SAS table with 15 columns
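The second strategy named above, pipeline parallelism, can be illustrated with a small Python sketch. This is an assumption-laden stand-in, not the SAS piping mechanism itself: two threads and a bounded queue play the role of two job steps connected by a pipe, and the names `extract`, `transform`, `run_pipeline`, and the `order_id`, `qty`, and `price` fields are hypothetical. The idea it shows is that the downstream step starts consuming rows while the upstream step is still producing them, so no full intermediate table is written first.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


def extract(rows, out_q):
    """Stage 1: emit rows one at a time into the pipe."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)


def transform(in_q, results):
    """Stage 2: consume rows as they arrive and compute a line total."""
    while True:
        row = in_q.get()
        if row is _DONE:
            break
        results.append(
            {"order_id": row["order_id"], "total": row["qty"] * row["price"]}
        )


def run_pipeline(rows, buffer_size=2):
    """Run both stages concurrently, connected by a small bounded buffer."""
    pipe = queue.Queue(maxsize=buffer_size)
    results = []
    producer = threading.Thread(target=extract, args=(rows, pipe))
    consumer = threading.Thread(target=transform, args=(pipe, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


# Hypothetical sample rows, not the Order_item_grid table from the deck.
items = [
    {"order_id": 1, "qty": 2, "price": 5.0},
    {"order_id": 2, "qty": 1, "price": 3.5},
    {"order_id": 3, "qty": 4, "price": 1.25},
]
line_totals = run_pipeline(items)
print(line_totals)
```

The bounded buffer means the producer pauses when the consumer falls behind, which keeps memory use flat regardless of table size.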


Remote_Complicated Job

Diagram labels:

L7875 - 1 CPU (1.6 GHz; 1 GB RAM): ETL Studio, SAS Server, Workspace Server, Base, Connect

Demo0505 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality

Demo0507 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality

Tables: Customer, Orders_grid, Order_item_grid

Elapsed Wall Clock Time: 30 seconds

Nearly 90% improvement!
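For readers who want to check the improvement figure, the arithmetic can be worked through directly. The sketch below assumes the two reported wall clock times are exact (4 minutes for Local_Complicated, 30 seconds for Remote_Complicated); the function name `improvement` is hypothetical.

```python
def improvement(before_seconds, after_seconds):
    """Return (percent reduction in elapsed time, speedup factor)."""
    reduction_pct = 100.0 * (before_seconds - after_seconds) / before_seconds
    speedup = before_seconds / after_seconds
    return reduction_pct, speedup


# 4 minutes locally versus 30 seconds on the grid, as reported on the slides.
reduction_pct, speedup = improvement(4 * 60, 30)
print(f"{reduction_pct:.1f}% less elapsed time, {speedup:.0f}x faster")
```

Under that assumption the reduction is 87.5%, an eightfold speedup, which the slide rounds to 90%.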


Performance Issues

Competition's answer to performance issues

– Buy a bigger server (e.g., 32-way to a 64-way)
– Increase the number of RDBMS instances (e.g., Oracle)
– More $$$$

SAS’ answer

– Grid computing leverages under-utilized and untapped heterogeneous computing resources to drastically reduce processing times
– Grid computing allows organizations to further leverage their current IT investment by harnessing the collective processing power of existing computers
– Save $$$$


Architecture Guidelines

There are guidelines to keep in mind when architecting SAS Grid environments:

Permanent data

SASWORK

Data Accessibility - Where the data is and how each machine on the grid is attached to it (NFS, SAN) greatly affects performance.


How is it Set Up? The SAS Technology Behind the Scenario…

Components and Considerations:

Base, SAS/Connect

ETL Studio

Metadata Server

Data Quality


GRID ETL JOB


GRID STATS


Connect Servers and Spawners


Connect Servers and Spawners


Libraries


Logins


Closing Thoughts…

Mileage may vary

Next step in evolving the SAS9 Platform

Enterprise credibility

Competition

– Buy more servers and license more DBMS instances
– These 50 jobs will use this server, these 30 jobs run on this server….
– Manageability

BI – Stored processes

EMiner and LSF Integration

ITMS – ITRM will have a generic collector to collect LSF performance data


Collateral…

White Papers

SUGI29 - http://support.sas.com/rnd/scalability/papers/sugi29_grid.pdf

Connect Syntax - http://support.sas.com/rnd/scalability/papers/mpconnect0401.pdf

%DISTRIBUTE - http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf

Web Site - http://support.sas.com/rnd/scalability/grid/index.html

Customer Reference Stories - http://support.sas.com/rnd/scalability/grid/gridcust.html


Questions?