Post on 12-Jan-2016

Java on z/OS: A fresh look

Scott Chapman, American Electric Power

Important notes

• I don’t really like Java as a language
• I’m not a Java expert
• Results presented herein may be installation-dependent
  • There are a lot of moving parts here
• I understand there’s zAAP on zIIP
  • “zAAP” is used generically here
• All trademarks of IBM, Oracle, and everybody else are hereby recognized

Why Java on z/OS?

Because programmers want to use it

http://xkcd.com/801/

Why Java on z/OS

Because it enables open source projects that are cool/useful/interesting

Key trick: run the JVM in ASCII
  -Dfile.encoding=ISO8859-1

Many things will just run with that run-time option!
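Why the encoding switch matters can be sketched outside the JVM. The following is a minimal Python illustration, not anything from the slides: it uses `cp037` as a stand-in EBCDIC code page (an assumption; the default code page on a given z/OS system may differ) and shows that the same text has different byte values in EBCDIC and ISO8859-1, so code that expects one and gets the other reads garbage.

```python
# Illustration only: cp037 is a stand-in EBCDIC code page (assumption);
# the actual default on a particular z/OS system may be different.
text = "Hello"

ebcdic_bytes = text.encode("cp037")      # bytes an EBCDIC-default runtime would produce
latin1_bytes = text.encode("iso8859-1")  # bytes ASCII-oriented code and files expect


def read_with_wrong_encoding(data: bytes) -> str:
    """Decode ISO8859-1 bytes as if they were EBCDIC (the mismatch case)."""
    return data.decode("cp037")


print(ebcdic_bytes.hex())   # c885939396
print(latin1_bytes.hex())   # 48656c6c6f
print(read_with_wrong_encoding(latin1_bytes) == text)  # False
```

Open source Java code usually assumes ASCII-compatible files and literals, which is presumably why forcing the JVM's file encoding lets so much of it run unchanged.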

What about a GUI?

Turns out that just works too!
• Start the Xming X server on your PC
  • Check the “No Access Control” option
• Set the DISPLAY environment variable
• Run the code

S147774:/u/s147774: >export DISPLAY=10.97.131.15:0
S147774:/u/s147774: >java -Xmx320m -jar ga33.jar

Debugging JavaScript code running in Helma on the mainframe with the GUI connected to Xming on my laptop

Works better than I expected

Why Java on z/OS

Because it enables more programming language choices
• JavaScript built in to Java 6
  • Rhino interpreter from Mozilla
• In theory, should be able to run any JVM-based language (I haven’t tested these)
  • Jython
  • Groovy
  • Clojure
  • Scala
  • Ruby (via JRuby)

Why Java on z/OS

• It may perform better
  • If you are on a sub-capacity machine
• It may save you money
  • Pretty unlikely
  • Only if you can take some work away from your peaks

Which job is better?

How cheap are zAAP/zIIPs?

• $100K/SE (z196, zEC12)
• How much is $100K?
• Consider adding 1 engine to z196-710:
  a) 710 = 10,250 MIPS, 1191 MSUs
  b) 711 = 11,073 MIPS, 1286 MSUs
  c) 710+1 zIIP = 10,302+1,000 MIPS
• z/OS (base) at this level costs $62/MSU
• Scenario B, z/OS base goes up almost $6K/month
• zIIP costs < 17 months of z/OS Base
• Not to mention features, DB2, CICS, etc.
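The arithmetic behind the "almost $6K/month" and "< 17 months" figures can be reproduced with the numbers on the slide. This is a rough sketch that assumes a flat $62 per incremental MSU per month; real software pricing has tiers and other products on top, so treat it as an illustration of the slide's reasoning only.

```python
# Figures taken from the slide; the flat per-MSU rate is a simplifying assumption.
ZOS_BASE_DOLLARS_PER_MSU = 62      # monthly z/OS base cost per MSU at this level
MSU_710 = 1191                     # z196-710
MSU_711 = 1286                     # z196-711 (scenario B: add a general purpose engine)
SPECIALTY_ENGINE_PRICE = 100_000   # one-time price of a zAAP/zIIP


def monthly_increase(msu_before: int, msu_after: int, dollars_per_msu: int) -> int:
    """Extra monthly software cost caused by the MSU increase."""
    return (msu_after - msu_before) * dollars_per_msu


increase = monthly_increase(MSU_710, MSU_711, ZOS_BASE_DOLLARS_PER_MSU)
breakeven_months = SPECIALTY_ENGINE_PRICE / increase

print(increase)                    # 5890
print(round(breakeven_months, 1))  # 17.0 (just under 17 months)
```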

What about accessing z/OS services?

• JZOS
  • Classes to easily access z/OS-specific constructs
    • z/OS datasets
    • RACF
    • Respond to operator commands
    • Access JES spool

Ways to Run Java on z/OS

• WebSphere
• CICS
• DB2 Stored Procedures
• Batch
• Started Tasks
• Unix shell

Batch / Started Task options

• BPXBATC
  • BPXBATCH (traditional alias)
  • BPXBATSL (local spawn alias)
  • Traditional approach
  • Difficulty with 100-byte JCL PARM
• JZOS
  • Ships with z/OS
  • Avoids 100-byte PARM limit
  • Adds a lot of flexibility

Measuring Java

zAAP vs. GCP time

• Watch the normalization factor!
  • Most SMF values are not normalized
  • Tools/reports may normalize for you
• Consider IFAHONORPRIORITY=NO
  • Avoids using GCPs to help zAAPs
  • Can result in >99% of Java CPU time executed on zAAP

SDSF zAAP vs. GCP columns

JOBNAME   CPU-Time  GCP-Time  zAAP-Time  zACP-Time  zAAP-NTime
P3SR01BS   1514.11      9.53     772.02       2.26     1501.82
P3SR01AS   1706.50     12.82     868.75       1.95     1690.00
P3SR01B     788.55    197.66     281.64       1.53      547.87
P3SR01A     763.01    192.47     272.33       1.10      529.77
P3SR02A    2953.37    422.62    1188.79       5.39     2312.56
P3SR02B    3051.88    437.74    1226.02       6.55     2385.00
P3SR01AS   7281.39     62.56    3698.72      11.47     7195.17
P3SR02BS   2805.58    123.85    1316.22      22.15     2560.45
P3SR01BS   7783.21     63.38    3955.54      14.38     7694.77
P3SR02AS   2591.27    118.60    1216.36      10.74     2366.21
RTMSERVE   2661.39      3.85    1363.45       1.03     2652.34

Column callouts on the slide: zAAP on GCP; normalized; real; TCB + SRB

This data comes from RMF
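The effect of the normalization factor is visible in the SDSF columns above. As a small sketch (the row selection is arbitrary, and reading zAAP-Time as "real" and zAAP-NTime as "normalized" is my interpretation of the column names and callouts), dividing the normalized column by the real one gives the same ratio for every job:

```python
# (jobname, zAAP-Time, zAAP-NTime) pairs copied from the SDSF display above.
rows = [
    ("P3SR01BS", 772.02, 1501.82),
    ("P3SR01AS", 868.75, 1690.00),
    ("P3SR01B", 281.64, 547.87),
    ("RTMSERVE", 1363.45, 2652.34),
]


def normalization_ratio(real_seconds: float, normalized_seconds: float) -> float:
    """Ratio of normalized zAAP time to real zAAP time."""
    return normalized_seconds / real_seconds


ratios = [normalization_ratio(real, norm) for _, real, norm in rows]
for (job, _, _), ratio in zip(rows, ratios):
    print(f"{job:<9} {ratio:.3f}")   # each line shows a ratio of about 1.945
```

A ratio of nearly 2 means a report that mixes normalized and unnormalized zAAP seconds can be off by almost a factor of two on this machine.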

SMF 30 Accounting

• BPXBATCH vs. BPXBATSL vs. JZOS
  • Important due to spawned OMVS tasks
• Single step job results:
  • BPXBATSL: 1 step, 1 job record
  • BPXBATCH: 6 step, 4 job records
    • CPU time collected on type OMVS records
  • JZOS: 2 step, 2 job records
    • CPU time almost completely on JOB types

Some interesting calculations

zAAPn = SMF30_TIME_ON_IFA * SMF30ZNF / 256

percent work done on zAAP = zAAPn / (zAAPn + SMF30CPT + SMF30CPU)
  (“Generosity” or “offload” factor)

percent zAAP sent to GCP = SMF30_TIME_IFA_ON_CP / (SMF30_TIME_ON_IFA + SMF30_TIME_IFA_ON_CP)
  (“Fallback” percentage: can be <1%, although some fallback is normal and expected)
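The three calculations can be sketched as plain functions. The inputs below are illustrative: they reuse one row of the SDSF display (P3SR01B) as if its columns were the SMF 30 fields, which is an assumption, and the normalization factor of 498 is also an assumption, picked only because 498/256 is about 1.945, the ratio between the normalized and real zAAP columns in that display.

```python
def zaap_normalized(time_on_ifa: float, znf: int) -> float:
    """zAAPn = SMF30_TIME_ON_IFA * SMF30ZNF / 256."""
    return time_on_ifa * znf / 256


def offload_fraction(zaapn: float, gcp_time: float) -> float:
    """Share of total work done on the zAAP; gcp_time is the sum of the GCP terms."""
    return zaapn / (zaapn + gcp_time)


def fallback_fraction(time_on_ifa: float, time_ifa_on_cp: float) -> float:
    """Share of zAAP-eligible time that actually ran on a GCP."""
    return time_ifa_on_cp / (time_on_ifa + time_ifa_on_cp)


# Assumed inputs: P3SR01B row (zAAP-Time, zACP-Time, GCP-Time) and ZNF = 498.
zaapn = zaap_normalized(281.64, 498)
offload = offload_fraction(zaapn, 197.66)
fallback = fallback_fraction(281.64, 1.53)

print(f"zAAPn    = {zaapn:.2f} s")
print(f"offload  = {offload:.1%}")
print(f"fallback = {fallback:.2%}")
```

With these inputs the offload comes out near 73% and the fallback near 0.5%, which fits the slide's note that fallback can be under 1%.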

Other SMF records

• RMF records
  • Look for breakdown of processor types for both hardware and report / service classes
• WAS 120 records
  • New subtype 9s for WAS 7+ are much better!
• HIS type 113 records
  • GCP vs. zAAP vs. zIIP

Java Performance

What about performance?

• Java on the mainframe has a history of performance problems
  • Java is inherently “heavy” due to the JVM
  • Scott’s Law: “The easier you make it on the programmer, the harder it is on the system”
• Today’s z hardware and software are up to the task!
  • (But you probably want zAAPs!)

Heard at WAS Week 200x…

“Our goal is to get JVM startup time down to about 1 second.”

• Seemed like a stretch at the time!
  • WAS startup took several minutes

Today: WAS Servant Startup <1 min

15.49.15 STC14327 ---- MONDAY, 18 APR 2011 ----
15.49.15 STC14327 $HASP373 P3SR02AS STARTED
15.49.15 STC14327 IEFUSI BPXBATSL-P3ASRU ABOVE REGION SET TO 1536MB
15.49.15 STC14327 IEF403I P3SR02AS - STARTED - TIME=15.49.15
15.49.16 STC14327 +BBOO0004I WEBSPHERE FOR Z/OS SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A IS STARTING.
15.49.16 STC14327 +BBOO0239I WEBSPHERE FOR Z/OS SERVANT PROCESS p3cell/p3nodea/p3sr02a IS STARTING.
15.49.16 STC14327 +BBOO0308I SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A IS EXECUTING IN 64-BIT ADDRESSING MODE.
15.49.16 STC14327 +BBOM0007I CURRENT CB SERVICE LEVEL IS build level 7.0.0.12 (cf121027.08) release WAS70.ZNATV date 07/09/10 11:02:02.
...
15.49.56 STC14327 +BBOO0222I: WSVR0001I: Server SERVANT PROCESS p3sr02a open for e-business
15.49.57 STC14327 +BBOO0020I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT PROCESS P3SR02A.
15.49.57 STC14327 +BBOO0248I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A.

Not much in that particular servant
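The "<1 min" claim can be checked directly from the job log timestamps. A minimal sketch, assuming both timestamps fall on the same day (it does not handle a log that crosses midnight):

```python
from datetime import datetime

JOBLOG_TIME_FORMAT = "%H.%M.%S"   # timestamps in the job log look like 15.49.15


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day job log timestamps."""
    started = datetime.strptime(start, JOBLOG_TIME_FORMAT)
    ended = datetime.strptime(end, JOBLOG_TIME_FORMAT)
    return int((ended - started).total_seconds())


# IEF403I STARTED at 15.49.15; BBOO0248I INITIALIZATION COMPLETE at 15.49.57.
print(elapsed_seconds("15.49.15", "15.49.57"))   # 42
```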

Today: HelloWorld in <2 seconds

10.08.55 JOB47259 IEF403I S147774B - STARTED - TIME=10.08.55
10.08.57 JOB47259 - --TIMINGS (MINS.)-- ----PAGING COUNTS---
10.08.57 JOB47259 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO
10.08.57 JOB47259 -S147774B RUNOMVS 00 59 .00 .00 .02 2524 0 0 0 0
10.08.57 JOB47259 IEF404I S147774B - ENDED - TIME=10.08.57
10.08.57 JOB47259 -S147774B ENDED. NAME-BPXBATCH TEST TOTAL CPU TIME= .00 TOTAL ELAPSED TIME= .02
10.08.57 JOB47259 $HASP395 S147774B ENDED

Output

Hello Scott
Java runtime: IBM Corporation 1.6.0, vm version 2.4
Running on: s390 z/OS 01.10.00
Running for: S147774
Classpath: /usr/lpp/java/J6.0/lib:/usr/lpp/java/IBM/J1.3/l

JCL

//RUNOMVS EXEC PGM=BPXBATCH,
// PARM='SH java -Xms32M -Xmx32M HelloWorldApp Scott'
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//STDENV DD *
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*

z10 EC 504 with zAAP

Small machine

10.51.53 JOB10901 IEF403I S147774B - STARTED - TIME=10.51.53 10.52.04 JOB10901 - --TIMINGS (MINS.)-- ----PAGING COUNTS---

10.52.04 JOB10901 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO

10.52.04 JOB10901 -S147774B RUNOMVS 00 86 .00 .00 .18 2252 0 0 0 0

10.52.04 JOB10901 IEF404I S147774B - ENDED - TIME=10.52.04

10.52.04 JOB10901 -S147774B ENDED. NAME-BPXBATCH TEST TOTAL CPU TIME= .00 TOTAL ELAPSED TIME= .18

10.52.04 JOB10901 $HASP395 S147774B ENDED

z10 BC E02 without zAAPs

Not surprising that ~50 MIPS engines can’t keep up with 450 / 900 MIPS engines

What about doing real work?

• Days of assuming it will run faster on your PC are over
  • Have seen H2 perform better on z/OS
• Still, it is Java, it’s not CPU-free
• Performance may depend on:
  • zAAP and GCP capacity
  • System settings (USS, zFS, WLM)
  • Application code
  • Java settings (heap size, GC policy)
  • Random luck

Application code

• Application code is always important
  • Regardless of the language!
• BufferedReader or ZFile?
  • Classic “it depends”
  • BufferedReader seems like it should be faster
  • But they provide different results: byte array vs. string
  • What you want to do with the result may impact which is best for any given situation
• Java has lots of similar but slightly different ways of doing things

Heap settings

• Heap settings are always seen as an issue
• Size is the usual suggestion
  • Is bigger always better?
  • Does anybody know how much heap they really need? (no)
• Min / max sizes same or different?
• Garbage collection policy options

Memory is an issue

• Java’s memory usage can be an issue
• “Requirements” for 100s of MBs are not unusual
• Often “requirements” seem to be a SWAG
• Java heap size can’t be reliably predicted from the code & expected volumetrics
• Test with reasonable numbers before assuming the requirements are real
  • Be sure to get all processing scenarios!

Garbage Collection Options (IBM Java 6)

• optthruput: default
  • Probably best for batch
• gencon: generational / concurrent
  • Maybe good for large heap, transactional workloads (WAS)
• optavgpause: reduces long pauses
• subpool: “improved” object allocation
• For important workloads, may want to test all of them at various sizes
• Lots of other heap/GC options too
  • See IBM JDK Diagnostics Guide!
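Testing every policy at several heap sizes is easy to script. The sketch below only builds the command lines; it assumes the IBM Java 6 `-Xgcpolicy:` option with the four policy names listed above, uses a fixed heap (`-Xms` equal to `-Xmx`), and borrows `HelloWorldApp` from the JCL example as a placeholder for whatever workload you actually want to measure.

```python
from itertools import product

GC_POLICIES = ["optthruput", "gencon", "optavgpause", "subpool"]
HEAP_SIZES_MB = [32, 64, 128, 256, 512]


def build_test_commands(main_class: str = "HelloWorldApp") -> list[str]:
    """One java command line per (GC policy, fixed heap size) combination."""
    return [
        f"java -Xgcpolicy:{policy} -Xms{heap}M -Xmx{heap}M {main_class}"
        for policy, heap in product(GC_POLICIES, HEAP_SIZES_MB)
    ]


commands = build_test_commands()
print(len(commands))   # 20
print(commands[0])     # java -Xgcpolicy:optthruput -Xms32M -Xmx32M HelloWorldApp
```

Each command would be run several times, since single measurements of Java CPU time can vary noticeably from run to run.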

Heap size impact - Workload 1

[Chart: zAAPn seconds (y-axis, 0 to 45) for Run 1 through Run 5; series: 32MB, 64MB, 128MB, 256MB, 512MB heap]

For some workloads, heap size may not matter

Heap size impact - Workload 2

[Chart: zAAPn seconds (y-axis, 0 to 350) for Run 1 through Run 5; series: 32MB, 64MB, 128MB, 256MB, 512MB heap]

Too small of a heap can cause CPU increase

Variable vs. Fixed Heap size

[Chart: zAAPn seconds (y-axis, 0 to 350) for WL1 32MB, WL1 32-128MB, WL1 128MB, WL2 32MB, WL2 32-128MB, WL2 128MB; series: Run 1 through Run 5]

There might be a slight benefit to a fixed heap size

Heap size most important, but GC policy also can be significant

GC Policy Comparison, Workload 2

[Chart: zAAPn seconds (y-axis, 0 to 800) for Run 1 through Run 5; series: optthruput 128MB, optavgpause 128MB, subpool 128MB, gencon 128MB, optthruput 32MB, optavgpause 32MB, subpool 32MB]

Runtime options

[Chart: zAAPn seconds (y-axis, 0 to 140) for Run 1 through Run 5; series: Baseline, jit:count=0, quickstart]

Don’t mess with the JIT!

Quickstart with trivial workload

[Chart: zAAPn seconds (y-axis, 0 to 0.9) for Run 1 through Run 5; series: baseline, quickstart]

Could be good for certain workloads

So what’s the random thing?

• Much more variation in CPU time measurements with today’s CPUs
  • Superscalar pipeline and cache issues
• Seems to impact my Java work more than I expected
  • Consistently ran same workload
  • Extremely lightly utilized LPAR
  • Lightly utilized zAAPs
  • Same variability over time
• So I tried some more tests…

Java Workload Variability

[Chart: x-axis is test time from 17MAY11:07:45 through 20MAY11:04:45; left y-axis is CPU seconds (zAAPn + GCP), 0 to 200; right y-axis is CPU seconds for the trivial workload, 0 to 2; series: Workload1 32MB, Workload1 512MB, Workload1 REXX, Workload2 128MB, Workload2 512MB, Trivial 32MB; test periods marked: One zAAP, Zero zAAPs, Two zAAPs]

Why is this?

I don’t know, but best guess is CPU cache and memory access effects

But I thought I’d look at the 113 records to see if I could find anything interesting….

Processor Speed

[Chart: processor speed (y-axis, 0 to 5000) for processors 0 and 2; Proc 0 = GCP, Proc 2 = zAAP; data from test period 1 (One zAAP)]

Executed Instruction Rate

[Chart: executed instruction rate (y-axis, 0 to 400) for processors 0 and 2; Proc 0 = GCP, Proc 2 = zAAP]

Seems to confirm our SMF30 data

Level 1 Miss Percentage

[Chart: Level 1 miss percentage (y-axis, 0 to 5) for processors 0 and 2; Proc 0 = GCP, Proc 2 = zAAP]

Percent sourced from L1.5 Cache

[Chart: percent sourced from L1.5 cache (y-axis, 0 to 100) for processors 0 and 2; Proc 0 = GCP, Proc 2 = zAAP]

L1.5 improvement corresponds to dip in machine usage

Percent TLB Miss of Total CPU

[Chart: percent TLB miss of total CPU (y-axis, 0 to 50) for processors 0 and 2; Proc 0 = GCP, Proc 2 = zAAP]

Dip in GCP TLB miss overhead due to machine less busy

Estimated Cycles Per Instruction

[Chart: estimated cycles per instruction (y-axis, 0 to 10) for processors 0 and 2; series per processor: Estimated Instruction Complexity CPI (ESTICCPI), Estimated CPI from Finite Cache/Mem (ESTFINCP); Proc 0 = GCP, Proc 2 = zAAP]

My Guesses…

• My test Java workloads were too cache and superscalar friendly
  • Perhaps makes it more susceptible to pipeline hazards
  • But:
    • Wouldn’t the REXX workload be even more superscalar and cache friendly?
    • Why were the 113 measurements so consistent?
• Or Java is really doing variable amounts of work?
• Or… something isn’t right someplace?
• Take away: Java CPU measurements might be more variable than you expect

Most recent testing

• Repeated testing later in the year
  • z/OS 1.12 vs. 1.10
  • 1 year more recent Java 6 (Fall 2010 vs. Fall 2009)
• Still saw variability, but the worst of it was closer to 25-30% instead of upwards of 75%
• Saw similar variability when testing on a z9 with zAAPs
• Saw at least one instance in a production LPAR with similar variability (in 3 executions of the same job, the 1st consumed just over half as much CPU as the later runs)
• Could not readily replicate on a WSC system running under z/VM
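The slides do not say how the variability percentages were calculated. One possible measure, shown here purely as a sketch, is the spread between the cheapest and most expensive run relative to the most expensive one. The CPU values are hypothetical, shaped like the production example above (first run just over half of the later runs); they are not measurements from the presentation.

```python
def spread_percent(cpu_seconds: list[float]) -> float:
    """Spread between best and worst run, as a percentage of the worst run."""
    worst = max(cpu_seconds)
    best = min(cpu_seconds)
    return 100.0 * (worst - best) / worst


# Hypothetical CPU seconds for three executions of the same job.
runs = [52.0, 100.0, 100.0]
print(spread_percent(runs))   # 48.0
```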

Summary

• Java enables all sorts of cool things you might not have thought could run on the mainframe
• Mainframe’s Java performance is not significantly worse than any other platform (assuming adequate zAAP capacity)
• Lots of tuning knobs for Java
• Java CPU time measurements might be more variable
