Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


Page 1: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing

Private & Confidential

Dr. Rao Mikkilineni, Kawa Objects

Ian Seyler, Return Infinity

May 16, 2011

Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing

DIME Network Architecture (DNA) for a New Generation of Many-core Computing

SMTPS11


Page 2: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


Agenda

• The hardware upheaval and the von Neumann Bottleneck

• A possible solution using a parallel DIME™ network computing model with telecom-grade trust

• Parallax – A potentially new Operating System (OS)

• Proof of concept demo

The history of the evolution of current OSs is filled with lessons on wasted billions (does anyone remember Multics or OS/2?), unmet expectations (who would have thought UNIX, the original System V, would vanish?), surprise winners (Windows and Linux), and stealthy survivors (Mach in a Mac).

Page 3: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


Many-core Servers

• SeaMicro – Custom Servers – 512 1.66 GHz 64-bit x86 Intel Atom cores in 10 RU; 2,048 CPUs/rack

• Calxeda – highly integrated Server-on-Chip built around a new generation ARM processor – 480 cores

• Silicon Graphics – Altix UV – 2,048 cores, 16 TB memory per Single System Image; scales to 32,768 processor sockets providing up to 262,144 Intel Xeon cores (8 cores per socket)
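The core counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the reading that "CPUs/rack" counts the same Atom cores as the per-system figure is an assumption.

```python
# Figures quoted on the slide.
SEAMICRO_CORES_PER_SYSTEM = 512   # Atom cores in one 10 RU system
SEAMICRO_RU_PER_SYSTEM = 10
SEAMICRO_CPUS_PER_RACK = 2_048

ALTIX_MAX_SOCKETS = 32_768
XEON_CORES_PER_SOCKET = 8

# Assumption: "CPUs/rack" counts Atom cores, so a rack holds a whole
# number of 512-core systems.
systems_per_rack = SEAMICRO_CPUS_PER_RACK // SEAMICRO_CORES_PER_SYSTEM
rack_units_used = systems_per_rack * SEAMICRO_RU_PER_SYSTEM

# Maximum Altix UV core count implied by sockets x cores per socket.
altix_max_cores = ALTIX_MAX_SOCKETS * XEON_CORES_PER_SOCKET

print(systems_per_rack, rack_units_used, altix_max_cores)
```

Under that assumption, the rack figure corresponds to four 10 RU systems, and the socket count multiplied by eight cores reproduces the 262,144-core figure.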

Page 4: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


Hardware Upheaval and von Neumann Bottleneck

Network Infrastructure With Complex Management Systems

Layers of Management Infrastructure

Up to 46,080 processing cores or 29.8 petabytes of storage per container

Running an OS that cannot see beyond tens of cores

No Operating System that provides Application-centric Resource Management in real-time

Operating System Gap

512 Cores

480 Cores

Page 5: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


Current Economics of IT

[Bar chart: % of TCO over Five Years, axis from 0.00 to 40.00. Legend: Physical, Virtual. Categories: Infrastructure Administration; Business Down Time; Power & Cooling; Unplanned Down Time; Server HW; Client HW + MS VECD; Rack Space & Office Space; Planned Down Time; Storage HW; Networking & Security HW; Other + Tax; VMW SW + SnS; Services and Training. Totals shown: $61.2M, $31.6M.]

Hardware Upheaval is not Matched by Software Innovation!!

Page 6: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


SPC Element Network & von Neumann Bottleneck

[Diagram labels]

• Serial Processing (Distributed Application): Service Package Executable Instructions (code) and Service Regulation Executable Instructions (mngt code)

• Network, Storage, Virtualization, application, etc. Management: Service Regulation Executable Instructions (mngt code)

• Distributed Intelligent Managed Element Network: Parallel FCAPS* Management of Stored Program Computing Element using Signaling Channel

• Real-time Application Management (Provisioning, Monitoring & Control): mngt code

• Application (Service Component in a Distributed Workflow): code

• Managed Intelligent Computing Element: 1. Signaling & Self-Management of Node; 2. Workflow with DIME Network Management

• Hello World, with Start and Stop controls

* Fault, Configuration, Accounting, Performance and Security (Node & Network)

End-to-end distributed transaction response is no longer controlled by the individual node OS in a shared resource environment.
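The separation this slide draws between the computing workload and its management over a signaling channel can be illustrated with a toy model. This is a minimal sketch, not the Parallax implementation: the `Dime` class, its `signal` and `step` methods, and the command names are all hypothetical, chosen only to mirror the Start/Stop control over "Hello World" shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Dime:
    """Toy managed element: a service plus a separate management path."""
    name: str
    service: Callable[[], str]            # stands in for the service package
    running: bool = False
    output: List[str] = field(default_factory=list)

    def signal(self, command: str) -> str:
        """Hypothetical signaling channel: management commands only."""
        if command == "start":
            self.running = True
        elif command == "stop":
            self.running = False
        elif command != "status":
            raise ValueError(f"unknown signaling command: {command}")
        return "running" if self.running else "stopped"

    def step(self) -> None:
        """Run one unit of the service, but only while management allows it."""
        if self.running:
            self.output.append(self.service())


dime = Dime("node1-worker1", lambda: "Hello World")
dime.step()                # no effect: the element has not been started
dime.signal("start")
dime.step()
dime.step()
dime.signal("stop")
dime.step()                # no effect: the element has been stopped
print(dime.output)
```

The point of the sketch is that the service function never inspects or changes `running`; only the management path does, which is the division of labor the slide attributes to the signaling channel.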

Page 7: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


DIMEs In A Multi-Core Server

[Diagram labels]

• Physical Server: Parallax OS (P), Apps A and B, Shared Memory (S), Free Memory (F), MICE, Signaling, I/O

• Network

• DIME Sub-network Managers: F, C, A, P, S

• Run-time Orchestrator; Linux

• Service (Service Regulator and Service Package)

• Server 1, Server 2, Server 3, each running A and B

Proof of Concept Features

• DIME Instantiation

• Discovery

• Workflow Orchestration

• Scaling

• Dynamic Reconfiguration

• Fault Management

Page 8: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


DNA In A Multi-core Server

The proof of concept and the secret sauce: http://youtu.be/IMXxmRSVGoI

von Neumann, J., “The General and Logical Theory of Automata”, in A. H. Taub (Ed.), John von Neumann Collected Works, Vol. 5, p. 259. Chicago: University of Illinois Press (1951).

George B. Dyson, “Darwin among the Machines: The Evolution of Global Intelligence”, Helix Books, Addison-Wesley Publishing Company, Inc., Reading, MA, 1997, p. 123.

Page 9: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


Service Deployment

[Diagram labels]

• Service Component Developer (Service Creation)

• Service Workflow Creator (Service Delivery)

• Service Control Manager (Service Assurance)

• DIME Sub-network Managers: F, C, A, P, S

• Run-time Orchestrator; Linux

• Network

• Node 1: Worker 1, Worker 2

• Node 2: Worker 1, Worker 2

• Node 3: Worker 1, Worker 2

• Hello World (shown three times)
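The Node and Worker labels on this slide suggest a placement of one service across three nodes with two workers each. The sketch below models that placement as a plain dictionary and shows which slots survive the loss of a node. The `deploy` and `surviving_slots` helpers are hypothetical and say nothing about how the Run-time Orchestrator actually places or recovers services.

```python
from itertools import product
from typing import Dict, List, Tuple

Slot = Tuple[str, str]  # (node, worker)

NODES: List[str] = ["Node 1", "Node 2", "Node 3"]
WORKERS: List[str] = ["Worker 1", "Worker 2"]


def deploy(service: str, nodes: List[str], workers: List[str]) -> Dict[Slot, str]:
    """Assign the same service to every (node, worker) slot."""
    return {(node, worker): service for node, worker in product(nodes, workers)}


def surviving_slots(placement: Dict[Slot, str], failed_node: str) -> Dict[Slot, str]:
    """Return the placement with every slot on the failed node removed."""
    return {slot: svc for slot, svc in placement.items() if slot[0] != failed_node}


placement = deploy("Hello World", NODES, WORKERS)
after_fault = surviving_slots(placement, "Node 2")
print(len(placement), len(after_fault))
```

A real fault-management step would presumably also re-home the lost workers; that logic is left out here because the slides do not describe it.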

Page 10: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing


Lessons From Biology

"The basic principle of dealing with malfunctions in nature is to make their effect as unimportant as possible and to apply correctives, if they are necessary at all, at leisure. In our dealings with artificial automata, on the other hand, we require an immediate diagnosis. Therefore, we are trying to arrange the automata in such a manner that errors will become as conspicuous as possible, and intervention and correction follow immediately." --- John von Neumann, "The General and Logical Theory of Automata", John von Neumann Collected Works, Edited by A. H. Taub, Volume 5, p 289 (Hixon Symposium 1948)

"It's very likely that on the basis of the philosophy that every error has to be caught, explained, and corrected, a system of the complexity of the living organism would not run for a millisecond." --- von Neumann, Theory of Self-Reproducing Automata (1948) at the Hixon Symposium, Pasadena, California

Page 11: Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing

Questions?

Replication, Repair, Recombination, Reconfiguration