© Hitachi Solutions, Ltd. 2013. All rights reserved.
Tetsuya Uemura
March 19, 2013
High-speed Financial XML Message Processing System Accelerated by Massively Parallel Technologies
Hitachi Solutions, Ltd.
S3125
About us
Company Name: Hitachi Solutions, Ltd.
Founded: September 21, 1970
Number of employees: 15,724
Net Sales: $3.0 billion
Our Solutions and Business areas
Consulting, System Development, Systems Operation and Maintenance, Provision of Products and Services
Task-specific Solutions: Financial Affairs / Accounting ... ERP・CRM Workflow ...
Industry-specific Solutions: Banking Services ... Public Services Government Services ...
Other Solutions
Today’s Topic
1. Why should we apply GPGPU to data processing?
2. GPGPU framework for business applications.
3. GPGPU can process financial XML messages more than 100 times faster than CPU.
1.1 Why should we apply GPGPU to data processing?
Eliminating hot spots is very important in business applications such as data and transaction processing.
⇒ Server clustering is the traditional method, but it is not cost-effective.
⇒ Offloading heavy processing (the hot spot) to a GPU improves performance and is cost-effective.
[Figure: Server vs. GPU Server. Both run AP1, AP2 and AP3 (software) on an OS (Windows/Linux) over CPU hardware. On the GPU Server, the hot spot is offloaded through the GPGPU FW and the Massively Parallel Library.]
1.2 Challenges for applying GPGPU to Data Processing
To apply GPGPU to data processing, we have to optimize not only the processing inside the GPU but also the whole system, including I/O, to improve performance.
[Figure: a Client PC exchanges data with the Server over TCP/IP; inside the Server, data moves between the CPU and the GPU over PCIe; the GPU cores perform the parallel processing.]
1.3 Performance Limitations
Bandwidth limitations
Data Path      Bandwidth
Network        1 Gbps (GbE)
PCI Express    16 GB/s (Gen2 x16)
Main Memory    25 GB/s (DDR3 SDRAM)
VRAM           200 GB/s (inside GPU)
Other limitations
System initialization cost: GPU
Transaction cost: Network, PCIe, GPU
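To make the gap between these data paths concrete, the following is a minimal sketch that computes the ideal time to move one payload across each path, using the bandwidth figures listed above. The 64 MB payload size and all names in the code are assumptions chosen for illustration, and latency and protocol overhead are ignored.

```python
# Bandwidths from the list above, converted to bytes per second.
BANDWIDTH_BYTES_PER_S = {
    "Network (GbE)": 1e9 / 8,            # 1 Gbps is 125 MB/s
    "PCI Express (Gen2 x16)": 16e9,
    "Main Memory (DDR3 SDRAM)": 25e9,
    "VRAM (inside GPU)": 200e9,
}


def transfer_seconds(num_bytes, bytes_per_s):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return num_bytes / bytes_per_s


payload = 64 * 1024 * 1024  # hypothetical 64 MB payload (an assumption)
times = {
    path: transfer_seconds(payload, bw)
    for path, bw in BANDWIDTH_BYTES_PER_S.items()
}
for path, t in times.items():
    print(f"{path:26s} {t * 1000:10.3f} ms")

# The slowest path bounds the end-to-end throughput of the whole system.
slowest = max(times, key=times.get)
print("Slowest path:", slowest)
```

Under these figures the network is slower than the other paths by roughly two orders of magnitude or more, which is why the whole system, not only the GPU kernel, has to be optimized.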
2.1 Why did we develop the GPGPU Framework?
① TCP/IP Bottleneck: Invoke GPU from legacy or VM-based applications such as COBOL and Java.
Solution:
• TCP/IP Interface between Applications and GPU Server
• Flow Control Mechanism for TCP/IP
② PCIe Bottleneck: Data Transfer between CPU and GPU
Solution:
• Flow Control Mechanism for PCIe
GPGPU Framework provides these I/F and flow control mechanisms.
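The slides do not describe how the flow control mechanisms are implemented. As a generic illustration only, the sketch below shows one common form of flow control: a bounded buffer between a receiving stage and a consuming stage, so that the receiver blocks instead of running ahead. The function names, the buffer size and the placeholder "GPU work" are all assumptions, not the framework's actual design.

```python
import queue
import threading


def run_pipeline(messages, max_in_flight=4):
    """Pass messages through a bounded buffer; put() blocks when it is full."""
    buffer = queue.Queue(maxsize=max_in_flight)  # the bound is the flow control
    processed = []

    def receiver():
        # Stands in for the TCP/IP side: it cannot run ahead of the consumer
        # by more than max_in_flight messages.
        for msg in messages:
            buffer.put(msg)
        buffer.put(None)  # sentinel: no more messages

    def gpu_feeder():
        # Stands in for the PCIe side: drains the buffer one message at a time.
        while True:
            msg = buffer.get()
            if msg is None:
                break
            processed.append(msg.upper())  # placeholder for the real GPU work

    threads = [
        threading.Thread(target=receiver),
        threading.Thread(target=gpu_feeder),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


result = run_pipeline([f"msg{i}" for i in range(10)])
print(result)
```

The same pattern can be applied at each level (network to host, host to GPU), with a separate bound per level.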
2.2 GPGPU Framework
[Figure: GPGPU Framework. Components: Client PC, Data Base, and Server (CPU, GPU cores). The framework applies Flow Control at two points and optimizes the data size.]
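The figure labels data size optimization without giving details. One plausible reading, shown here purely as an assumption, is that many small messages are grouped into larger batches before each transfer so that per-transfer overhead is paid less often. The function name and the target size are hypothetical, and a real implementation would also need to keep message boundaries, which this sketch omits.

```python
def batch_messages(messages, target_bytes):
    """Group byte strings into batches of at most target_bytes each.

    A single message larger than target_bytes becomes its own batch.
    """
    batches, current, current_size = [], [], 0
    for msg in messages:
        # Flush the current batch if adding this message would exceed the target.
        if current and current_size + len(msg) > target_bytes:
            batches.append(b"".join(current))
            current, current_size = [], 0
        current.append(msg)
        current_size += len(msg)
    if current:
        batches.append(b"".join(current))
    return batches


# Ten hypothetical 8-byte messages grouped into batches of at most 32 bytes.
small = [b"<m>%d</m>" % i for i in range(10)]
batches = batch_messages(small, target_bytes=32)
print([len(b) for b in batches])
```

The best target size depends on the hardware, so it would have to be measured rather than fixed in advance.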
3.1 Message Standards in the Financial Services Industry
ISO 20022, an XML-based financial services message standard, is seeing growing adoption in the financial industry.
Fixed-length messages, however, are still widely used in core banking systems, so conversion from XML to fixed-length messages is unavoidable.
XML messages are large, and a CPU takes a long time to process them. We accelerate the conversion with the massively parallel processing power of GPGPU.
XML message:
<?xml version="1.0" encoding="UTF-8" ?>
<Document xmlns="urn:iso:std:iso:20022:...">
  <CstmrCdtTrfInitn>
    <GrpHdr>
      <MsgId>ABC/1234</MsgId>
      <CreDtTm>2012-09-28</CreDtTm>
      :
Fixed-length message:
ABC/1234 2012-09-28
The data volume grows more than 10 times because of tags and indentation.
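The conversion itself can be sketched on the CPU as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree`, not the GPU implementation from the talk. The field widths in `LAYOUT`, the helper names, and the placeholder namespace in the sample (the real ISO 20022 namespace is elided in the slide) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: (tag name, field width in characters).
LAYOUT = [("MsgId", 16), ("CreDtTm", 10)]


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def xml_to_fixed_length(xml_text, layout=LAYOUT):
    """Extract the fields named in layout and pack them into one record."""
    root = ET.fromstring(xml_text)
    # If a tag occurs more than once, the last occurrence wins in this sketch.
    values = {local_name(el.tag): (el.text or "").strip() for el in root.iter()}
    # Left-justify, pad with spaces, and truncate to the field width.
    return "".join(
        values.get(tag, "").ljust(width)[:width] for tag, width in layout
    )


# Placeholder namespace: an assumption standing in for the elided one.
sample = (
    '<Document xmlns="urn:example:placeholder">'
    "<CstmrCdtTrfInitn><GrpHdr>"
    "<MsgId>ABC/1234</MsgId><CreDtTm>2012-09-28</CreDtTm>"
    "</GrpHdr></CstmrCdtTrfInitn></Document>"
)
record = xml_to_fixed_length(sample)
print(repr(record))
```

A sequential parser like this walks the whole document once per message, which is the cost the GPU approach aims to reduce.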
The same XML message is processed differently on the CPU and on the GPU:
<?xml version="1.0"?><Document>
<Test1>0001</Test1><Test2>0002</Test2><Test3>0003</Test3>
</Document>
CPU: processes the XML sequentially, so it takes a long time.
GPU: many GPU cores process the XML in parallel, which accelerates the processing.
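The slides do not say how the XML is divided among GPU cores. The sketch below only illustrates the general idea of data-parallel decomposition: the text is split into independent ranges, each worker finds the tag start offsets in its own range, and the partial results are concatenated. The function names and worker count are assumptions, and Python threads are used only to show the decomposition; they do not provide GPU-style speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def find_tag_starts(data, start, end):
    """Return offsets of '<' within data[start:end]; ranges are independent."""
    return [i for i in range(start, end) if data[i] == "<"]


def parallel_tag_starts(data, workers=4):
    """Split data into up to `workers` ranges and scan them concurrently."""
    chunk = max(1, -(-len(data) // workers))  # ceiling division
    ranges = [(s, min(s + chunk, len(data))) for s in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the ranges, so offsets stay sorted.
        parts = list(pool.map(lambda r: find_tag_starts(data, *r), ranges))
    return [offset for part in parts for offset in part]


xml_text = (
    '<?xml version="1.0"?><Document>'
    "<Test1>0001</Test1><Test2>0002</Test2><Test3>0003</Test3>"
    "</Document>"
)
starts = parallel_tag_starts(xml_text)
print(starts)
```

Because no range depends on another, the same scan maps naturally onto many cores, with each core handling a much smaller range.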
3.2 Experimental environment
CPU: AMD Phenom(tm) II X6 1090T Processor
  Cores: 6
  Basic Clocks (GHz): 3.2
  Memory (GB): 8
GPU: GeForce GTX 580
  Cores: 512
  Basic Clocks (MHz): 772
  Memory (GB): 1.5
  B/W (GB/sec): 192.4
Interconnect: PCIe Gen2
3.3 Processing flow of GPGPU XML Processing
Multi-level flow control, at the TCP/IP and PCIe levels, improves GPGPU XML processing performance.
Legend (Step1 to Step7): Transfer XML, Transfer CSV, Initialize GPU, Transfer to GPU, Process XML, Transfer to CPU, Create CSV
[Figure: Steps 1 to 7 placed on a diagram of the Client PC and the Server (CPU and GPU cores), with Flow Control at two points and data size optimization.]
3.4 Processing Acceleration Ratio
[Chart: Processing Acceleration Ratio (data size, acceleration ratio); vertical axis from 0 to 140.]
The larger the data size, the better the acceleration ratio. GPGPU XML processing outperforms the CPU at every data size.
Small data sizes: the ratio is not so bad; the GPU is still faster than the CPU.
Large data sizes: the ratio saturates, because processing time is proportional to the size of the data.
The CPU cases process XML with the Xerces2 Java Parser.
Acceleration ratio = (CPU case time / GPU case time)
3.5 Throughput
[Chart: Throughput (data size, MB/s); vertical axis from 0 to 180; reference lines: GbE bandwidth, TCP/IP bandwidth limit (loopback I/F).]
The larger the data size, the better the throughput, until it saturates at the network bandwidth.
The GeForce GTX 580 has enough power to exhaust the bandwidth of GbE.
3.6 Processing Time Breakdown
[Chart: Processing Time Breakdown (normalized time (s/MB), data size); horizontal axis from 0 to 0.35 s/MB; data sizes 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB, 64MB; legend (Step1 to Step7): Transfer XML, Transfer CSV, Initialize GPU, Transfer to GPU, Process XML, Transfer to CPU, Create CSV; callouts: GPU Initialization, GPU XML Processing.]
GPU initialization and GPGPU XML processing account for such a large share of the time that the data size must be large enough to dilute them.
At large data sizes, the costs of GPU initialization and processing are diluted.
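The dilution effect can be illustrated with a simple cost model: if a request pays a fixed cost once plus a cost proportional to its size, the normalized time (s/MB) falls toward the per-MB cost as the data grows. The sketch below uses made-up cost values chosen only to show the shape of the curve; they are assumptions, not measurements from the talk.

```python
def normalized_time(size_mb, fixed_cost_s, per_mb_cost_s):
    """Seconds per MB when a fixed cost is spread over size_mb of data."""
    return (fixed_cost_s + per_mb_cost_s * size_mb) / size_mb


# Hypothetical costs (assumptions, not measured values).
FIXED_COST_S = 0.001   # one-off cost, e.g. initialization
PER_MB_COST_S = 0.01   # cost that scales with the data size

# Data sizes from the chart, 4KB to 64MB, expressed in MB.
sizes_mb = [4 / 1024, 16 / 1024, 64 / 1024, 256 / 1024, 1, 4, 16, 64]
curve = [normalized_time(s, FIXED_COST_S, PER_MB_COST_S) for s in sizes_mb]
for s, t in zip(sizes_mb, curve):
    print(f"{s:10.4f} MB  {t:.4f} s/MB")
```

In this model the fixed cost dominates at 4KB and is nearly invisible at 64MB, which matches the qualitative behaviour described above.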
Summary
GPGPU is a good solution for data processing.
Data size optimization and flow control are the keys to better performance in GPGPU data processing.
Optimization of the whole system is necessary to accelerate business applications using GPGPU.
Future Work
Continuous evaluation and optimization are needed because the most efficient data size will vary as hardware evolves.
PCIe bandwidth: Gen2 ⇒ Gen3
Network bandwidth: GbE ⇒ 10GbE or InfiniBand
Number of GPU cores: 500 ⇒ 2500
Thanks
Contact
4-12-7 Higashishinagawa, Shinagawa-ku, Tokyo, Japan
http://www.hitachi-solutions.com/
Tetsuya Uemura
My colleagues are waiting for you at the poster session: P0233: High-speed Financial XML Message Processing System Accelerated by Massively Parallel Technologies