
Page 1:

Improving Systems Management Policies Using Hybrid Reinforcement Learning

Gerry Tesauro <[email protected]>
IBM TJ Watson Research Center

Joint work with Rajarshi Das (IBM), Nick Jong (U. Texas), and Mohamed Bennani (George Mason Univ.)

Page 2:

Outline: Main points of the talk

Introduction: Brief Overview of “Autonomic Computing”

Grandiose Motivation: Combining Machine Learning with domain knowledge in Autonomic Computing

Problem Description

Scenario: Online server allocation in Internet Data Center

Data Center Prototype Implementation

Reinforcement Learning Approach

Quick RL Overview

Prior Online RL Approach

New Hybrid RL Approach

Results/Insights into Hybrid RL outperformance

Fresh results on new application: Power Management

Page 3:

Challenges in Systems Management

[Figure: a real IBM Global Services network diagram ("SDC North Physical/Logical WAN Connectivity", IBM's Global IP Network / AT&T, drawn by Gregg Machovec, 9/13/2001): dozens of backbone routers and Token-Ring, FDDI, ATM, and sync links with OSPF areas and costs spanning Poughkeepsie, Endicott, Rochester, Burlington, Fishkill, Southbury, Palisades, Sterling Forest, Armonk, Hawthorne, Yorktown, and Somers sites; alongside event-monitoring charts showing morning and weekly rebooting patterns, outages, event bursts, and excessive DM events across hosts]

Large-scale, heterogeneous distributed systems with highly dynamic, complex multi-component interactions

Large volumes of real-time high-dimensional data, but also lots of missing information and uncertainty

Too much complexity, too few (skilled) administrators

Need for "self-managing" systems → autonomic computing

Page 4:

What is Autonomic Computing?

"Computing systems that manage themselves in accordance with high-level objectives from humans" (Kephart and Chess, "A Vision of Autonomic Computing," IEEE Computer, 2003)

“Self-management” capabilities include

Self-Configuration: Automated configuration of components, systems according to high-level policies; rest of system adjusts seamlessly.

Self-Healing: Automated detection, diagnosis, and repair of localized software/hardware problems.

Self-Optimization: Automatic and continual adaptive tuning of hundreds of parameters (database params, server params,…) affecting performance & efficiency

Self-Protection: Automated defense against malicious attacks or cascading failures; use early warning to anticipate and prevent system-wide failures.

Good application domain for ML: rich opportunities, little previously done

Page 5:

A “Knowledge Bottleneck” in Autonomic Computing

[Figure: the autonomic manager architecture: Monitor, Analyze, Plan, and Execute components arranged around a shared Knowledge source inside an Autonomic Manager, which sits above a Managed Element; the Knowledge component is the bottleneck highlighted here]

Page 6:

Machine Learning to the Rescue

Can avoid knowledge bottleneck: automatically extract knowledge from observations of data

Examples:
- Supervised Learning: Input → Predicted Output (classification, regression)
- Unsupervised Learning: Input → Structure among input variables (clustering, data mining)
- Reinforcement Learning: learns behavioral policies: State → Action

Page 7:

Will ML Without Built-In Knowledge Work?

[Figure: the same autonomic manager loop (Monitor, Analyze, Plan, Execute over a Managed Element), but with the Knowledge component replaced by "Tabula Rasa ML"]

Tabula Rasa = “blank slate” (Latin)

Page 8:

A Hybrid Approach Combining Knowledge + ML

Initial Knowledge → Behavioral Data → ML → Improved Knowledge

Several advantages:
- No direct interface between ML and the Initial Knowledge; we don't engineer knowledge into the ML
- Initial knowledge can be virtually anything: very simple (e.g. a crude heuristic), highly sophisticated (a multi-tier closed queuing network), or even human behavior
- Can do multiple iterations to keep improving

Page 9:

Outline: Main points of the talk

Introduction:

Problem Description

Scenario: Online server allocation in Internet Data Center

Data Center Prototype Implementation

Reinforcement Learning Approach

Results

Insights into Hybrid RL outperformance

Wrapup

Page 10:

Application: Allocating Server Resources in a Data Center

Scenario: Data center serving multiple customers, each running high-volume web apps with independent time-varying workloads

[Figure: three customer application environments (Macy's online shopping, E-Trade online trading, Citibank online banking), each with its own Application Manager, Router, pool of servers, and DB2 back end, and each generating SLA revenue ($$); a Resource Arbiter allocates the data center's servers across them to maximize business value across all customers]

Page 11:

Problem Description

Scenario: Online server allocation in an Internet Data Center

Data Center Prototype Implementation:
- Real servers: Linux cluster (xSeries machines)
- Realistic Web-based workload: Trade3 (online trading emulation), running on top of WebSphere and DB2
- Realistic time-varying demand generation:
  - Open-loop scenario: Poisson HTTP requests; vary the mean arrival rate λ
  - Closed-loop scenario: finite number of customers M with a fixed think-time distribution; M varies with time
  - Use the Squillante-Yao-Zhang time-series model to vary M or λ above
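A hedged sketch of the two demand-generation regimes described above, in Python. The parameter values and the bounded random walk standing in for the Squillante-Yao-Zhang time-series model are illustrative assumptions, not the prototype's actual generator.

```python
import numpy as np

rng = np.random.default_rng(42)

def open_loop_arrivals(mean_rates, interval=1.0):
    """Open loop: Poisson HTTP request counts per interval, with a time-varying mean rate."""
    return rng.poisson(np.asarray(mean_rates) * interval)

def closed_loop_clients(m0=20, steps=100, lo=5, hi=50):
    """Closed loop: the number of active customers M follows a bounded random walk
    (a simple stand-in for the time-series model that modulates M in the prototype)."""
    m = [m0]
    for _ in range(steps - 1):
        m.append(int(np.clip(m[-1] + rng.integers(-3, 4), lo, hi)))
    return m

# Example: a slowly oscillating mean arrival rate for the open-loop scenario.
rates = 50 + 30 * np.sin(np.linspace(0, 4 * np.pi, 100))
print(open_loop_arrivals(rates)[:10])
print(closed_loop_clients()[:10])
```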

Page 12:

Data Center Prototype: Experimental setup

[Figure: experimental setup: 8 xSeries servers shared between two Trade3 application environments (each running on WebSphere 5.1 and DB2, driven by HTTP demand and paid according to an SLA on response time) and a Batch environment (SLA on number of servers); every 5 sec each Application Manager reports Value(#srvrs) to the Resource Arbiter, which reallocates servers to maximize total SLA revenue]

Page 13:

Standard Approach: Queuing Models
- Design an appropriate model of flows and queues (arrival process, routing discipline, service process, etc.) in the system
- Estimate model parameters offline or online
- The model estimates Value(numServers) by estimating (asymptotic) performance changes due to changes in numServers
- Has worked well in many deployed systems

Two main limitations:
- Model design is difficult and knowledge-intensive
- Model assumptions don't exactly match the real system: real systems have complex dynamics, while standard models assume steady-state behavior

Two prospective benefits of a machine learning approach:
- Avoid the knowledge bottleneck
- Decisions can reflect dynamic consequences of actions, e.g. properly handle transients and switching delays
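To make the standard approach concrete, here is a minimal sketch under strong simplifying assumptions: the application's load is split evenly over n servers each treated as an M/M/1 queue, the resulting mean response time is mapped through an illustrative SLA, and that yields the Value(numServers) curve a model-based arbiter would use. The closed form, service rate, and SLA shape are assumptions for illustration, not the models used in the prototype.

```python
def mean_response_time(arrival_rate, num_servers, service_rate=10.0):
    """Crude estimate: split load evenly over servers and treat each as M/M/1."""
    per_server = arrival_rate / num_servers
    if per_server >= service_rate:
        return float("inf")                      # saturated: unbounded response time
    return 1.0 / (service_rate - per_server)

def sla_payment(rt, target=0.2, pay=100.0, penalty_slope=500.0):
    # Illustrative SLA: flat payment below the target response time, linear penalty above.
    return pay if rt <= target else pay - penalty_slope * (rt - target)

def value_of_servers(arrival_rate, max_servers=8):
    """Model-based Value(numServers), as a queuing-model policy would report it."""
    return {n: sla_payment(mean_response_time(arrival_rate, n))
            for n in range(1, max_servers + 1)}

print(value_of_servers(arrival_rate=35.0))
```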

Page 14:

Outline: Main points of the talk

Introduction

Problem Description

Reinforcement Learning Approach Quick RL Overview

Results

Insights into Hybrid RL outperformance

Wrapup

Page 15:

Reinforcement Learning (RL) approach

[Figure: an RL module (algorithm to be chosen, "Alg?") placed inside App 1's manager, connected to the System via State (monitored data streams), Reward (Value(RT)), and Action (# servers)]

Page 16:

Reinforcement Learning: 1-slide Tutorial

A learning agent interacts with the environment:
- Observes the current state s of the environment
- Takes an action a
- Receives an (immediate) scalar reward r

[Figure: agent-environment loop: the Agent observes State, sends an Action to the System, and receives a Reward]

The agent learns a long-range value function V(s,a) estimating cumulative future reward:

  R_t = Σ_{k=0..∞} γ^k r_{t+k},  with discount factor 0 ≤ γ < 1

We use a standard RL algorithm, Sarsa, which learns the state-action value function via

  V(s,a) ← V(s,a) + α [ r + γ V(s',a') - V(s,a) ]

- By design, RL does "trial-and-error" learning without a model of the environment
- Naturally handles long-range dynamic consequences of actions (e.g., transients, switching delays)
- Solid theoretical grounding for MDPs; recent practical success stories
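As a concrete illustration of the Sarsa update above, here is a minimal tabular sketch in Python. The epsilon-greedy exploration, learning rate, and discount factor are illustrative assumptions; this is not the prototype's code.

```python
import random
from collections import defaultdict

class TabularSarsa:
    """Minimal Sarsa(0): V(s,a) <- V(s,a) + alpha*[r + gamma*V(s',a') - V(s,a)]."""

    def __init__(self, actions, alpha=0.2, gamma=0.5, epsilon=0.1):
        self.V = defaultdict(float)      # lookup table indexed by (state, action)
        self.actions = list(actions)     # discrete action set
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # epsilon-greedy exploration over the discrete action set
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.V[(state, a)])

    def update(self, s, a, r, s_next, a_next):
        # one Sarsa(0) backup using the actually chosen next action a'
        target = r + self.gamma * self.V[(s_next, a_next)]
        self.V[(s, a)] += self.alpha * (target - self.V[(s, a)])
```

In the data-center setting of this talk, `state` would be a discretized demand level, `actions` the feasible server counts, and the reward the SLA payment received over the interval.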

Page 17:

Outline: Main points of the talk

Introduction

Problem Description

Reinforcement Learning Approach Quick RL Overview

Online RL Approach

Results

Insights into Hybrid RL outperformance

Wrapup

Page 18:

Will ML Without Built-In Knowledge Work?

[Figure: repeat of the slide 7 diagram: the autonomic manager loop with "Tabula Rasa ML" in place of the Knowledge component]

Tabula Rasa = “blank slate” (Latin)

Page 19:

Application: Allocating Server Resources in a Data Center

Scenario: Data center serving multiple customers, each running high-volume web apps with independent time-varying workloads

[Figure: same data-center diagram as slide 10]

Page 20:

Assumptions Behind RL Formulation

[Figure: same data-center diagram as slide 10]

- Each application has local state, unaffected by other apps
- Each application has local state transitions and local rewards, depending only on its local state and local resource
- Together this is a collection of separate local MDPs, but the global decision maker wants to maximize the sum of local rewards

Page 21:

Global RL versus Local RL

One approach: make the Resource Arbiter a global Q-learner
- Advantages: the arbiter's problem is a true MDP, and we can rely on a convergence guarantee
- Main disadvantage: the arbiter's state space is huge (the cross product of all local state spaces), a serious curse of dimensionality with many applications

Alternative approach: Local RL
- Each application does local Sarsa(0) based on local state, local provisioning, and local reward, and learns a local value function
- Each application conveys its current V(resource) estimates to the arbiter
- The arbiter then acts to maximize the sum of the current value functions (sketched in code below)
- Local learning should be much easier than global learning, but we no longer have a convergence guarantee
- Related work: Russell & Zimdars, ICML-03 (local rewards only)
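A minimal sketch of the Local RL decomposition just described: each application reports its current value estimate for every candidate server count, and the arbiter searches feasible allocations to maximize the sum. The brute-force enumeration and the toy value functions are illustrative assumptions (a real arbiter could use a smarter search).

```python
from itertools import product

def best_allocation(value_fns, total_servers):
    """value_fns: list of dicts mapping n_servers -> estimated long-range value.
    Returns the allocation (n_1, ..., n_k) with sum(n_i) <= total_servers
    that maximizes the sum of the per-application value estimates."""
    k = len(value_fns)
    best, best_val = None, float("-inf")
    for alloc in product(range(total_servers + 1), repeat=k):
        if sum(alloc) > total_servers:
            continue
        val = sum(vf[n] for vf, n in zip(value_fns, alloc))
        if val > best_val:
            best, best_val = alloc, val
    return best, best_val

# Example: three applications, 8 servers (as in the prototype), toy value estimates.
v1 = {n: 10 * n - 0.5 * n * n for n in range(9)}
v2 = {n: 6 * n for n in range(9)}
v3 = {n: 4 * (n > 0) + n for n in range(9)}
print(best_allocation([v1, v2, v3], total_servers=8))
```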

Page 22:

Online RL in Trade3 Application Manager (AAAI 2005)

[Figure: the Trade3 application environment: demand arrives at the TRADE3 App Mgr, which observes response time against the SLA(RT), computes utility U, runs RL to learn V(λ, n), and reports V(n) to the Resource Arbiter, which assigns servers]

Observed state = current demand only

Arbiter action = # servers provided (n)

Instantaneous reward U = SLA payment

Learns long-range expected value function V(state, action) = V(λ, n) (a two-dimensional lookup table)

Data Center results:

good asymptotic performance, but

poor performance during long training period

method scales poorly with state space size


Page 23:

Amazingly Enough, RL Works! :-)
Results of overnight training (~25k RL updates = 16 hours real time) with a random initial condition

Page 24:

Comparison of Performance: 2 Application Environments

Page 25:

3 Application Environments: Performance

Page 26:

Outline: Main points of the talk

Introduction

Problem Description

Reinforcement Learning Approach

Quick RL Overview

Online RL Approach

Hybrid RL Approach (Tesauro et al., ICAC 2006)

Results

Insights into Hybrid RL outperformance

Wrapup

Page 27:

Hybrid Reinforcement Learning Illustrated

[Figure: the system interacts with an existing model-based policy (MBP) via state, action, and reward; logged traces of that interaction are used to train RL value functions offline]

- Run RL offline on data from the initial policy
- Bellman Policy Improvement Theorem (1957): V(state, action) defines a new policy guaranteed better than the original policy
- Combines the best aspects of both RL and model-based (e.g. queuing) methods
- A very general method that automatically improves any existing systems management policy

In the Data Center prototype:
- Implement the best queuing models within each Trade3 manager
- Log system data in an overnight run (~12-20 hrs)
- Train RL on the log data (~2 CPU hrs) → new value functions
- Replace the queuing models by the RL value functions and rerun the experiment

Page 28:

Two key ingredients of the Trade3 implementation

1. "Delay-Aware" State Representation:
- Include the previous allocation decision as part of the current state: V = V(λt, nt-1, nt)
- Can learn to properly evaluate switching delay (provided that the delay < allocation interval), e.g. it can distinguish V(λ, 2, 3) from V(λ, 3, 3)
- The delay need not be directly observable: RL only observes the delayed reward
- Also handles transient suboptimal performance

2. Nonlinear Function Approximation (Neural Nets):
- Generalizes across states and actions; obviates visiting every state in the space
- Greatly reduces the need for "exploratory" actions
- Much better scaling to larger state spaces: from 2-3 state variables to 20-30, potentially
- But: lose guaranteed optimality
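To make the two ingredients above concrete, here is a hedged sketch of offline, batch Sarsa-style training of a small neural-network value function V(λt, nt-1, nt) on logged data from an initial policy. The network size, learning rate, discount factor, data format, and the fabricated toy log are all illustrative assumptions; the actual ICAC 2006 implementation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Logged transitions from the initial (queuing-model) policy. Each transition is
# (state, reward, next_state), where state = (lambda_t, n_prev, n_t) is the
# delay-aware state/action pair described above. The log below is a fabricated
# toy trace purely to keep the sketch self-contained.
def toy_log(T=2000, max_servers=5):
    lam = np.clip(50 + np.cumsum(rng.normal(0, 2, T)), 10, 100)   # drifting demand
    n = rng.integers(1, max_servers + 1, size=T + 1)              # logged allocations
    rt = lam / (5.0 * n[:T])                                      # crude response-time proxy
    reward = 1.0 - np.minimum(rt / 20.0, 2.0)                     # crude SLA-payment proxy
    rows = []
    for t in range(1, T - 1):
        s = (lam[t] / 100.0, n[t - 1], n[t])          # scaled demand, previous and current allocation
        s_next = (lam[t + 1] / 100.0, n[t], n[t + 1])
        rows.append((s, reward[t], s_next))
    return rows

# Tiny one-hidden-layer value network V(lambda, n_prev, n), trained with plain SGD.
H = 16
W1 = rng.normal(0, 0.3, (3, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.3, (H, 1)); b2 = np.zeros(1)

def V(x):
    h = np.tanh(x @ W1 + b1)
    return (h @ W2 + b2).ravel(), h

def sgd_step(x, target, lr=1e-2):
    global W1, b1, W2, b2
    v, h = V(x)
    err = (v - target)[:, None]                      # gradient of 0.5*(v - target)^2 w.r.t. v
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1.0 - h ** 2)
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

log = toy_log()
X  = np.array([s  for s, _, _ in log], dtype=float)
R  = np.array([r  for _, r, _ in log], dtype=float)
Xn = np.array([sn for _, _, sn in log], dtype=float)
gamma = 0.5
for sweep in range(200):
    # Batch Sarsa(0)-style targets: bootstrap from the *logged* next state/action,
    # so no interaction with the live system is needed during training.
    targets = R + gamma * V(Xn)[0]
    sgd_step(X, targets)
```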

Page 29:

Outline: Main points of the talk

Introduction

Problem Description

Reinforcement Learning Approach

Results

Insights into Hybrid RL outperformance

Wrapup

Page 30:

Results: Open Loop, No Switching Delay

[Plot annotations: +2.6% Trade3 RT, +12.7% Batch throughput; -0.4% Trade3 RT, +38.9% Batch throughput; +73% Trade3 RT, +221% Batch throughput]

Page 31:

Results: Closed Loop, No Switching Delay

Page 32:

Results: Effects of Switching Delay

Page 33:

Outline: Main points of the talk

Introduction

Problem Description

Reinforcement Learning Approach

Results

Insights into Hybrid RL outperformance

Wrapup

Page 34:

Insights into Hybrid RL outperformance

1. Less biased estimation errors
- The queuing model predicts indirectly: RT → SLA(RT) → V; the nonlinear SLA induces an overprovisioning bias
- RL estimates utility directly → a less biased estimate of V

2. RL handles transients and switching delays; steady-state queuing models cannot

3. RL learns to avoid thrashing
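A small numeric illustration of point 1, using assumed numbers: when response time fluctuates, applying a nonlinear SLA to the model's mean response-time estimate gives a different (biased) answer than averaging the SLA payments actually observed, which is what RL effectively does.

```python
import numpy as np

rng = np.random.default_rng(1)

def sla_payment(rt, target=1.0):
    # Illustrative nonlinear SLA: full payment below the response-time target,
    # steep linear penalty above it.
    return np.where(rt <= target, 100.0, 100.0 - 400.0 * (rt - target))

# Response times that fluctuate around a mean of 1.0 (assumed distribution).
rt_samples = rng.gamma(shape=4.0, scale=0.25, size=100_000)

model_estimate = float(sla_payment(rt_samples.mean()))   # SLA applied to the mean RT
rl_estimate = float(sla_payment(rt_samples).mean())      # mean of SLA over the actual RTs

print(f"SLA(E[RT]) = {model_estimate:.1f}   E[SLA(RT)] = {rl_estimate:.1f}")
# The two numbers differ substantially; this kind of gap is the estimation bias that
# arises when V is predicted indirectly through a nonlinear SLA, and it is what RL
# avoids by estimating utility (the observed payments) directly.
```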

Page 35:

Policy Hysteresis in the Learned Value Function: stable joint allocations (T1, T2, Batch) at fixed λ2

Page 36:

Hybrid RL learns not to thrash

[Figure: closed-loop demand (number of customers in T1 and T2; allocation delay 4.5 s) over time, together with the server allocations to T1 and T2 chosen by the queuing-model policy and by the Hybrid RL policy]

Page 37:

Hybrid RL does less swapping than QM

Average number of servers reallocated per decision, <n>, by experiment:

  Open loop,   delay = 0 s:    QM 0.578    Hybrid RL 0.464
  Open loop,   delay = 4.5 s:  QM 0.581    Hybrid RL 0.269
  Closed loop, delay = 0 s:    QM 0.654    Hybrid RL 0.486
  Closed loop, delay = 4.5 s:  QM 0.736    Hybrid RL 0.331

Page 38:

Outline: Main points of the talk

Introduction

Problem Description

Reinforcement Learning Approach

Results

Insights into Hybrid RL outperformance

Power Management (Kephart et al., ICAC 2007)

Page 39:

Power and Performance Management

Joint objective: {U(RT) - C(Pwr)}

[Figure: WebSphere XD testbed: Stock Trading, Account Management, and Financial Advice workloads of high, medium, and low importance are classified, prioritized, and routed by a WebSphere On Demand Router across computing nodes; WebSphere XD controllers and the performance manager set placement and load-balancing parameters driven by U(RT), while IBM Director / Power Executive manipulates power controls (CPU speeds) dynamically]

Page 40:

Architecture Overview (ICAC 2007, to appear)

[Figure: architecture diagram; power controls are exercised via IBM Director]

Page 41:

Experiment with hand-tuned policy

[Figure: two panels showing workload intensity, CPU, power, and response time over time: one with no power management (avg power = 107.9 watts) and one with power management using the hand-tuned policy (avg power = 96.6 watts; savings: 11.3 watts = 10.5%)]

Page 42:

Hybrid RL Results

- Learn V = V(s,a); state s uses a single input variable (numClients)
- Both response-time performance and power consumption are comparable to the hand-crafted policy
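Under assumed names, here is a minimal sketch of how a learned two-dimensional value function V(numClients, powerCap) like the one above could be used as the controller: each interval, pick the power cap whose value estimate is highest for the observed load. The toy value function stands in for the learned one; this is an illustration consistent with the slides, not the ICAC 2007 implementation.

```python
from typing import Callable, Sequence

def choose_power_cap(num_clients: int,
                     caps: Sequence[int],
                     value_fn: Callable[[int, int], float]) -> int:
    """Greedy policy over a learned V(load, cap): pick the cap with the highest
    estimated long-range value of U(RT) - C(Pwr) for the observed load."""
    return max(caps, key=lambda cap: value_fn(num_clients, cap))

# Toy stand-in for a learned value function (assumption: higher caps help more
# under heavier load, but always cost some power).
def toy_value(num_clients: int, cap: int) -> float:
    perf_utility = min(num_clients, cap * 10)       # crude U(RT) proxy
    power_cost = 0.3 * cap                          # crude C(Pwr) proxy
    return perf_utility - power_cost

print(choose_power_cap(num_clients=45, caps=[1, 2, 3, 4, 5], value_fn=toy_value))
```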

Page 43:

Hybrid RL results (15 input variables)

Avg power = 98.3 watts (savings = 8.9%); SLA violations = 1.5% vs. 21%

Page 44:

Conclusions

Hybrid RL works quite well for server allocation:
- combines the disparate strengths of RL and queuing models
- exploits the domain knowledge built into the queuing model, but doesn't need access to that knowledge: it only uses the externally observable behavior of the queuing-model policy

Initial promising results in power management:
- suggests that a basic 2-d value function V(load_intensity, resource_knob) may be generally useful and easy to learn

Potential for wide usage of Hybrid RL in systems management:
- managing other resource types: memory, storage, VMs, etc.
- managing control params: OS/DB params, etc.
- simultaneous management of multiple criteria: performance/utilization, performance/availability, etc.

Page 45:

For further info/reading material

Papers:
- "Online Resource Allocation Using Decompositional Reinforcement Learning," G. Tesauro, Proc. of AAAI-05.
- "A Hybrid Reinforcement Learning Approach to Autonomic Computing," G. Tesauro et al., Proc. of ICAC-06.
- "Coordinating Multiple Autonomic Managers to Achieve Specified Power-Performance Tradeoffs," J. Kephart et al., Proc. of ICAC-07.

More info about R&D in Autonomic Computing:
- Our work: www.research.ibm.com/nedar
- AC toolkit (Autonomic Manager ToolSet): AMTS v1.0 available as part of the Emerging Technologies Toolkit v1.1 on IBM alphaWorks: www.alphaworks.com
- IBM: www.research.ibm.com/autonomic
- Intl. Conf. on Autonomic Computing (ICAC-07): www.autonomic-conference.org

Summer internships: email me: [email protected]

Thanks! Any questions?

Page 46:

The End

Page 47:

[Backup slide: repeat of the SDC North network connectivity diagram from slide 3]

Page 48:

Evolution of Computing