IA6.5 Performance Tuning
DESCRIPTION
Captiva EMC performance tuning
TRANSCRIPT
![Page 1: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/1.jpg)
TAKING ADVANTAGE OF THE EMC CAPTIVA ARCHITECTURE
1© Copyright 2011 EMC Corporation. All rights reserved.
Applying Best Practices to Optimize Performance
Christopher Lund
EMC
![Page 2: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/2.jpg)
Agenda
• How InputAccel works
• InputAccel 6.5 Benchmark Results
• Tuning
– InputAccel Server
– Batches
– InputAccel Database
– Client Modules
– Capture Workflow
• Diagnosing Performance Issues
The EMC® Captiva® InputAccel® and Dispatcher™ Version 6.5 Performance Sizing and Tuning Guide – which is available on PowerLink – provided much of the data for this presentation.
![Page 3: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/3.jpg)
How InputAccel Works
A multi-machine capture application server
• Server is the data tier (memory mapped)
• Server manages task queues
• Server is multi-threaded…
– VBA execution is single-threaded
– DB writes are queued, but single-threaded
• Server uses asynchronous I/O
• Most work done from a thread pool
• Clients are the executing tier (where scaling comes from)
[Diagram: InputAccel System. Capture Modules, Processing Modules, and Export Modules connect to the InputAccel Servers, which use the WIP & Reports tables.]
![Page 4: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/4.jpg)
How InputAccel Works
An execution pipeline
• Task queues are not FIFO
• Tasks are scheduled based on Priority, then creation date
• Recovery is through reprocessing
• Not a repository
• Assume short duration tasks
• Work is pushed, no polling
• Tasks may be prefetched
[Diagram: Execution Pipeline. Work items A1–A4 and B1–B4 advance through modules m1–m4 over time steps t1–t9.]
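The scheduling rule on this slide (priority first, then creation date, rather than arrival order) can be sketched with a heap. This is a minimal illustration, not InputAccel's implementation: the `Task` fields, the `TaskQueue` class, and the convention that a lower number means a more urgent priority are all assumptions.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class Task:
    # Ordering key: priority first, then creation time.
    # Assumption: a lower priority number is served first.
    priority: int
    created: datetime
    name: str = field(compare=False)


class TaskQueue:
    """Hypothetical queue that pops by (priority, created), not arrival order."""

    def __init__(self):
        self._heap = []

    def push(self, task):
        heapq.heappush(self._heap, task)

    def pop(self):
        return heapq.heappop(self._heap)


queue = TaskQueue()
queue.push(Task(5, datetime(2011, 5, 1, 9, 0), "B1"))  # arrives first, low priority
queue.push(Task(1, datetime(2011, 5, 1, 9, 5), "A2"))
queue.push(Task(1, datetime(2011, 5, 1, 9, 1), "A1"))

order = [queue.pop().name for _ in range(3)]
print(order)  # ['A1', 'A2', 'B1']
```

B1 arrived first but is served last, which is the sense in which the queue is not FIFO.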
![Page 5: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/5.jpg)
How InputAccel Works
Like a Petri net
• A ProcessFlow defines the steps and trigger levels.
• Implicit fire when data is available
• There is no predefined execution order
• There is no end state – IADone triggered implies completed
[Diagram: Petri Net with nodes A, B, C, and D]
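The Petri-net analogy can be made concrete with a toy model in which a step fires as soon as all of its inputs hold data, with no fixed order and no explicit end state. The step names A to D mirror the diagram labels; the wiring (A feeds B and C, which both feed D), the `scan` input, and the function name are assumptions for illustration only.

```python
def run_net(steps, available):
    """Fire any step whose inputs are all available until nothing more can fire.

    steps: dict mapping step name -> set of input names it waits for.
    available: set of names whose data is already present.
    Returns the list of steps in the order they fired.
    """
    fired = []
    available = set(available)
    progress = True
    while progress:
        progress = False
        for name in sorted(steps):
            if name not in fired and steps[name] <= available:
                fired.append(name)
                available.add(name)  # the step's output becomes data for later steps
                progress = True
    return fired


# Assumed wiring: A needs scanned input; B and C need A; D needs both B and C.
steps = {"A": {"scan"}, "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(run_net(steps, {"scan"}))  # ['A', 'B', 'C', 'D']
print(run_net(steps, set()))     # []
```

Nothing in the model declares the run finished; completion is inferred from the last step having fired, which parallels the point about IADone.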
![Page 6: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/6.jpg)
InputAccel 6.5 Benchmark Results
• One 8-core IA Server, over 300 Client Modules
• In-house, ideal conditions, your mileage will vary
• Performance similar to IA 6.0 SP1
| Performance Level | Overall Task Processing Rate (tasks/hour) | Processing Rate/CPU Core (tasks/hour) | Avg. CPU Utilization/CPU Core |
| --- | --- | --- | --- |
| 50 active batches w/reporting disabled | 2,672,007 | 324,001 | 67% |
| 1000 active batches w/reporting disabled | 1,990,892 | 248,862 | 53% |
| 1000 active batches w/reporting enabled | 1,384,910 | 173,114 | 32% |
VMware ESX 4.x degrades throughput by approx 27%
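As a rough sanity check on these figures, the per-core rate is the overall rate divided by the 8 server cores, and the VMware figure can be applied as a flat 27% reduction. The function names are assumptions, and treating the degradation as a uniform multiplier is a simplification rather than something the benchmark states.

```python
SERVER_CORES = 8            # from the benchmark setup
VMWARE_DEGRADATION = 0.27   # approximate figure quoted for VMware ESX 4.x


def per_core_rate(tasks_per_hour, cores=SERVER_CORES):
    """Overall task rate divided evenly across CPU cores."""
    return tasks_per_hour / cores


def virtualized_rate(tasks_per_hour, degradation=VMWARE_DEGRADATION):
    """Estimated rate after applying a flat virtualization penalty."""
    return tasks_per_hour * (1 - degradation)


print(round(per_core_rate(1_990_892)))   # 248862, as reported for 1000 batches, reporting disabled
print(round(per_core_rate(1_384_910)))   # 173114, as reported for 1000 batches, reporting enabled
print(round(virtualized_rate(1_384_910)))
```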
![Page 7: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/7.jpg)
Tuning – InputAccel Server…
• It is a server application. It will use all available resources
• Recommendation: use Windows 2008 R2 for best performance
• CPU
– ias.exe is multi-threaded
– Recommendation: use at least a 4-core CPU for optimum throughput, 8+ is better
• RAM
– Recommendation: 4-8 GB RAM, no less than 4 GB
– ias.exe is a 32-bit app, so 4 GB address space max
– BatchMaxAddressSpaceK controls how much RAM IAS uses
  • 1.5 GB is the default
  • Set to 2 to 2.5 GB on 32-bit Windows with the /3GB option
  • Set to 3 to 3.5 GB on 64-bit Windows
– Only as many batches as will fit in BatchMaxAddressSpaceK are kept in RAM; when there are more, swapping occurs
  • Have your working set of batches small enough to all fit in RAM
  • Delete batches when you are done with them
![Page 8: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/8.jpg)
…Tuning – InputAccel Server
• Disk
– Used heavily for batches and processes
– RAID 1+0 is best
– RAID 5 usually is not fast enough and is not recommended
  • Recommendation: use a caching controller with Read Ahead and Write Back
– SAN is OK, NAS not recommended
– Turn off anti-virus scanning on IAS folder
– IAS folder should be on a dedicated disk drive or array so it is not shared by other programs or the Windows swap file
– #2 cause of slow performance: a slow hard drive
• Network
– InputAccel has a “chatty” protocol between client modules and the InputAccel Server
– For best performance, client modules, the InputAccel Server, Administration Console, and DB should be on the same sub-net
– WANs usually have low bandwidth with high latency, which may make them unsuitable
  • Connecting client modules to InputAccel Server via a WAN is doable with adequate performance – depends on the client modules, # batches on IA server, and IPP
• InputAccel Server and IADB should not be connected by a WAN
![Page 9: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/9.jpg)
Tuning – Batches…
• Batch size (IAB file)
– 5 – 20 MB is ideal
– < 100 MB is OK
– > 100 MB not recommended, but is allowed
– 10 – 100 pages per batch recommended
– 1000 pages per batch degrades throughput by 10% or more
– < 10 pages per batch leads to too many small batches and too much batch swapping
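The batch-size guidance above can be turned into a simple check. This is only a sketch: the function name and the wording of the notes are assumptions, and a batch of exactly 100 MB is treated as OK because the slide does not cover that boundary.

```python
def review_batch(size_mb, pages):
    """Return guideline notes for a batch, using the ranges quoted on the slide."""
    notes = []
    if size_mb > 100:
        notes.append("size over 100 MB: allowed but not recommended")
    elif size_mb > 20:
        notes.append("size is OK, though 5-20 MB is ideal")
    if pages < 10:
        notes.append("fewer than 10 pages: risks many small batches and swapping")
    elif pages > 100:
        notes.append("over 100 pages: above the recommended 10-100 range")
    return notes


print(review_batch(12, 50))      # []
print(review_batch(150, 1000))   # two notes: size and page count
```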
![Page 10: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/10.jpg)
…Tuning – Batches
• Number of Batches
– Throughput is best when the working set of batches fits in RAM
  • Typically 50 – 500 fit depending on batch size and BatchMaxAddressSpaceK
– When there are more batches than fit in RAM, IA Server swaps batches in/out as needed
  • Swapping decreases IA Server throughput
– Up to 9,000 idle batches is possible with adequate performance
  • Idle means that all tasks within the batch have been processed and finished
– IA Server startup is slower with 1000’s of batches because IAS must load all batches into RAM to extract data
  • After startup they are swapped to disk and do not consume too many resources
– #1 cause of slow performance: too many active batches causing excessive swapping
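A back-of-the-envelope estimate shows why the working set matters. The sketch below assumes each batch occupies roughly its IAB file size in memory, which is a simplification the source does not confirm; the function names are hypothetical.

```python
def batches_that_fit(address_space_gb, avg_batch_mb):
    """Rough count of batches that fit in the configured batch address space.

    Assumption: in-memory footprint is about the IAB file size.
    """
    return int(address_space_gb * 1024 // avg_batch_mb)


def will_swap(active_batches, address_space_gb, avg_batch_mb):
    """True when the active working set exceeds what fits in RAM."""
    return active_batches > batches_that_fit(address_space_gb, avg_batch_mb)


print(batches_that_fit(1.5, 20))   # 76
print(batches_that_fit(1.5, 5))    # 307
print(will_swap(1000, 3.0, 10))    # True
```

With the 1.5 GB default and batches in the ideal 5 to 20 MB range, the estimate lands inside the "typically 50 – 500" range quoted above.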
![Page 11: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/11.jpg)
Tuning – InputAccel Database
System Requirements
• InputAccel Server requires a database to run
• Only MSSQL is supported, versions 2005 and 2008
– Express supported only for low volume and no IA Reporting
– Standard or Enterprise recommended for medium to high volume
• IADB stores:
– IA configuration data
– Reporting data on completed batches/tasks
– Work In Progress (WIP) status
– Web services data
![Page 12: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/12.jpg)
Tuning – InputAccel Database
System Requirements
• Recommendation: 64-bit MSSQL server for best performance
– 2-4 CPU cores
– 4-8 GB RAM
– RAID 1+0 with read/write caching controller w/ fast disks (15k RPM)
• InputAccel Server requires fast, uninterrupted access to the DB
– If the DB goes offline, IA Server pauses
• Recommendation: put the DB and InputAccel Server on the same low-latency, high-bandwidth subnet
• Reporting decreases the InputAccel Server throughput by about 10-30%, although it can be more
![Page 13: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/13.jpg)
Tuning – InputAccel Database
Data Volume and Rates
• IADB File Size and Growth Rates
– Configuration data – typically < 100 MB
– WIP data – 1 MB × # batches
  • WIP is transient and grows and shrinks as needed
– Error/Warning Log data – typically negligible and can be purged as needed
– Reporting data
  • All reporting log rules off – 0 MB
  • Some or all reporting log rules are on
    – Typically data grows 100 MB – 3 GB per hour
    – Shrinks only when purged
    – But…
      » Growth rate depends on page volume and which log rules are on
      » Overall size depends on # days of data retained
– Audit Log data
  • All audit log rules off – 0 MB
  • All audit log rules on – 2× the growth rate of Reporting data
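These rules of thumb combine into a rough size estimate. The sketch below is an assumption-laden illustration: the function name and parameters are hypothetical, the configuration figure uses the 100 MB upper bound, and audit data is modeled as twice the reporting volume over the same retention period.

```python
def estimate_iadb_size_mb(batches, reporting_mb_per_hour, hours_retained,
                          audit_all_on=False, config_mb=100):
    """Rough IADB size in MB from the slide's rules of thumb.

    - configuration data: typically under 100 MB (upper bound used here)
    - WIP data: about 1 MB per batch
    - reporting data: grows at reporting_mb_per_hour until purged
    - audit data: about 2x the reporting growth when all audit rules are on
    """
    wip_mb = 1 * batches
    reporting_mb = reporting_mb_per_hour * hours_retained
    audit_mb = 2 * reporting_mb if audit_all_on else 0
    return config_mb + wip_mb + reporting_mb + audit_mb


# 500 batches, 100 MB/hour of reporting data, 30 days of 8 production hours retained.
print(estimate_iadb_size_mb(500, 100, 30 * 8))                     # 24600
print(estimate_iadb_size_mb(500, 100, 30 * 8, audit_all_on=True))  # 72600
```

Even at the low end of the quoted growth range, retained reporting data dominates the total, which is why scheduled purges matter.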
![Page 14: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/14.jpg)
Tuning – InputAccel Database
Data Volume and Rates
• InputAccel Database transaction rates
| Log Rules Enabled | Estimated Transaction Rate |
| --- | --- |
| None | IAS Tasks / Hour × 0.075 |
| Reports only (no Audit) | IAS Tasks / Hour × 2.5 |
| Audit only (no Reports) | IAS Tasks / Hour × 5.0 |
| Reports + Audit | IAS Tasks / Hour × 7.5 |
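The multipliers translate directly into a lookup. The dictionary keys and function name below are assumptions chosen for illustration; the factors are the ones listed above.

```python
# Multipliers from the slide: estimated DB transactions per IAS task.
TRANSACTION_FACTORS = {
    "none": 0.075,
    "reports": 2.5,
    "audit": 5.0,
    "reports+audit": 7.5,
}


def estimated_transactions_per_hour(tasks_per_hour, log_rules="none"):
    """Estimated IADB transactions per hour for a given task rate and log-rule setting."""
    return tasks_per_hour * TRANSACTION_FACTORS[log_rules]


print(estimated_transactions_per_hour(1_000_000, "reports"))        # 2500000.0
print(estimated_transactions_per_hour(1_000_000, "reports+audit"))  # 7500000.0
```

Turning on both reporting and audit rules raises the estimated database load by a factor of 100 compared with no log rules (7.5 versus 0.075).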
![Page 15: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/15.jpg)
Tuning – InputAccel Database
Improving Performance
• Defragment and rebuild indexes
– up_ReorganizeIndex – defragments all indexes
– up_RebuildIndex – rebuilds all indexes
• Purge reporting and auditing data
– Reporting and Auditing tables grow continuously
– You must schedule purges via the Admin Console
• Recommendation: generate reports during non-peak hours
– Generating Reports runs complex queries that place a heavy load on IADB and MSSQL
• Store MSSQL transaction logs and data files on separate hard drives/controllers
![Page 16: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/16.jpg)
Tuning – Client Modules
• Each client module has its own unique tuning characteristics
– Example: val2xml slows as more IA Values are exported and is slower when triggered at level 1 than level 7
– See “EMC® Captiva® InputAccel® and Dispatcher™ Version 6.5 Performance Sizing and Tuning Guide” for details
• Parameters on the client machines can be modified to optimize performance
– Stored in settings.ini located in %ALLUSERSPROFILE%\EMC\InputAccel
– PrefetchDefault (default = 2) – the number of additional tasks the InputAccel Server sends to each client module
– FileCacheSize (default = 8)
– CacheSize (default = 1,048,576)
– CacheCount (default = 200,000), previously 20,000 – the number of files and IA Values the client module caches
– IAClientDebug (default = 0) – set to 1 to capture the debug log iaclient.log in %ALLUSERSPROFILE%\EMC\InputAccel (previously was created in C:\)
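To see which of these parameters a client machine overrides, a short script can overlay the file's values on the listed defaults. This is a sketch under stated assumptions: the slide does not describe the layout of settings.ini, so the code assumes standard INI sections and searches every section for the known keys, and the `[Hypothetical]` section name in the sample is invented for the example.

```python
import configparser

# Defaults as listed on the slide.
SLIDE_DEFAULTS = {
    "PrefetchDefault": 2,
    "FileCacheSize": 8,
    "CacheSize": 1_048_576,
    "CacheCount": 200_000,
    "IAClientDebug": 0,
}


def effective_settings(ini_text):
    """Overlay values found in any section of the INI text on the defaults."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case instead of lower-casing
    parser.read_string(ini_text)
    result = dict(SLIDE_DEFAULTS)
    for section in parser.sections():
        for key, value in parser.items(section):
            if key in result:
                result[key] = int(value)
    return result


# Section name is hypothetical; the real file layout may differ.
sample = "[Hypothetical]\nPrefetchDefault=4\nIAClientDebug=1\n"
settings = effective_settings(sample)
print(settings["PrefetchDefault"], settings["IAClientDebug"], settings["CacheCount"])  # 4 1 200000
```

On a real client machine the text would be read from settings.ini in the folder named above before being passed to `effective_settings`.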
![Page 17: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/17.jpg)
Tuning – Client Modules
• CPU intensive client modules like NuanceOCR & ImageEnhancement
– Require fast CPUs
– Run one instance for each CPU core.
  • e.g. 4 CPU cores, run 4 instances of NuanceOCR
• Non-CPU intensive client modules like val2xml & Documentum Export
– Performance is limited by other resources (disk, network)
– Run at least one instance for each CPU core and possibly more
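The instance-count guidance can be expressed as a small helper. The function name and the `oversubscribe` factor for non-CPU-intensive modules are illustrative assumptions; the slide only says "possibly more", so the right factor would come from benchmarking.

```python
import os


def suggested_instances(cpu_cores, cpu_intensive, oversubscribe=1.5):
    """Suggest how many module instances to run on one machine.

    CPU-intensive modules: one instance per core.
    Other modules: at least one per core; the oversubscribe factor is an
    assumed starting point, not a documented value.
    """
    if cpu_intensive:
        return cpu_cores
    return max(cpu_cores, int(cpu_cores * oversubscribe))


print(suggested_instances(4, cpu_intensive=True))    # 4
print(suggested_instances(4, cpu_intensive=False))   # 6
print(suggested_instances(os.cpu_count() or 1, cpu_intensive=True))
```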
![Page 18: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/18.jpg)
Tuning – Client Modules
Using InputAccel over a WAN…
• InputAccel Server, Database and Administration Console
– Require high-speed, low-latency connections
– Must be on the same LAN as each other
• Unattended client modules
– Often require high-speed, low-latency connections for best throughput
– Generally do not need to be remote from the InputAccel Server
– Should be on the same LAN as the InputAccel Server
• Attended client modules
– Performance varies by environment and module
– Detailed guidance is in the EMC® Captiva® InputAccel® and Dispatcher™ Version 6.5 Performance Sizing and Tuning Guide
– Recommendation: do benchmark testing to ensure adequate performance
![Page 19: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/19.jpg)
Tuning – Client Modules
…Using InputAccel over a WAN
• Attended client module details
– ScanPlus
  • Not recommended, but may perform acceptably with scanner hardware compression or small scanned images
  • Works best when bandwidth is ≥50 Mbps and round-trip latency ≤25 ms
– IndexPlus
  • Generally performs well over a WAN (except for thumbnail display)
  • Displaying the batch list takes about 25% longer
  • Works best when bandwidth is ≥1.5 Mbps and round-trip latency ≤50 ms
– Dispatcher Classification Edit and Dispatcher Validation
  • Should not be used over a WAN
• Recommendation: consider using VMware View or Citrix for remote operators
– Module executes on the LAN, screen display is over the WAN
– Supports remote scanning
  • ScanPlus is on the LAN, the scanner is on the remote machine
  • Use scanner hardware compression for best results
– Maximizes InputAccel Server-to-client module throughput
– EMC OnDemand uses VMware View
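The bandwidth and latency figures for ScanPlus and IndexPlus can serve as a first-pass link check before benchmarking. The table and function names below are assumptions; the thresholds are the "works best" values quoted on this slide, not hard limits.

```python
# "Works best" thresholds from the slide:
# (minimum bandwidth in Mbps, maximum round-trip latency in ms).
WAN_GUIDELINES = {
    "ScanPlus": (50.0, 25.0),
    "IndexPlus": (1.5, 50.0),
}


def link_meets_guideline(module, bandwidth_mbps, latency_ms):
    """True when the link satisfies both the bandwidth and latency guideline."""
    min_bandwidth, max_latency = WAN_GUIDELINES[module]
    return bandwidth_mbps >= min_bandwidth and latency_ms <= max_latency


print(link_meets_guideline("IndexPlus", 10, 40))   # True
print(link_meets_guideline("ScanPlus", 10, 40))    # False
```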
![Page 20: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/20.jpg)
Tuning – Capture Workflow
Trigger Levels
• Triggering at level 0 or 1
– Usually gives better throughput than level 7
– The tasks within a batch can be distributed among many client modules
– Which results in faster end-to-end processing of any single batch
• Triggering at level 7
– Is less work for InputAccel Server as it has fewer tasks to manage
– Under some circumstances may provide better overall throughput at the expense that any single batch may take more time to process
• Unsupported: accessing external resources in IPP VBA code
– VBA execution with InputAccel Server is single-threaded
– The external resource may be slow or not present
– If InputAccel Server needs to wait for the resource, all other tasks block
– Put more complicated custom code logic on the client through the .NET Code module or client scripting
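The cost of a blocking call on a single-threaded executor is easy to see with arithmetic alone. The sketch below models script durations in milliseconds; the numbers and the function name are illustrative assumptions, not measurements of InputAccel.

```python
from itertools import accumulate


def completion_times_ms(durations_ms):
    """Finish time of each script when one thread runs them strictly in order."""
    return list(accumulate(durations_ms))


fast_scripts = [10] * 5  # five scripts of 10 ms each
print(completion_times_ms(fast_scripts))            # [10, 20, 30, 40, 50]

# Same queue, but the first script waits 5 seconds on an external resource.
print(completion_times_ms([5000] + fast_scripts))   # [5000, 5010, 5020, 5030, 5040, 5050]
```

Every script queued behind the slow one finishes about five seconds late, which is why such logic belongs on the client tier where instances run in parallel.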
![Page 21: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/21.jpg)
Diagnosing Performance Issues
• IA Server Performance Counters
– Batches loaded and loads/second
– Connections
– Disk bytes written & read/second
– VBA calls/second & queue length
– Processing Message Count
– Network bytes written & read/second
– Packets sent & received/second
– Pending I/O (~ # of asynchronous sends in progress)
– Event (db) queue length
• Data Access Layer Performance Counters
– Data Requests/second
– % Load Factor
– Avg. Execution Time Millisec
– Current connection count
![Page 22: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/22.jpg)
EMC OnDemand
Captiva Instant Cloud Implementation
![Page 23: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/23.jpg)
IIG Applications on EMC OnDemand
[Diagram: application areas ENTERPRISE CAPTURE, CONTENT MANAGEMENT, CUSTOMER COMMUNICATIONS, CASE MANAGEMENT, and INFORMATION GOVERNANCE; products Captiva, Documentum ECM, Document Sciences, Documentum xCP, and SourceOne; platform layers Networks, Storage, Virtualization and Security, and Cloud Management]
![Page 24: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/24.jpg)
Summary/Key Takeaways
• Focus on system throughput, not per-task time
• Use high-speed hardware – disk drives and networks
• Minimize disk I/O where possible
– Keep active batches in memory
– Avoid excessive reporting
• Parallelize – multi-core CPUs and task granularity
• Use performance counters to find bottlenecks
![Page 25: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/25.jpg)
Q&A
Chris Lund
(858) 320-1215
![Page 26: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/26.jpg)
Learn More About EMC Captiva
Go to: www.EMC.com/Captiva
![Page 27: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/27.jpg)
Captiva @ Momentum 2011
Thursday
10:00 AM AP Automation: Best Practices for Capturing and Integrating Paper Invoices into your Accounts Payable Processes
Galileo 705
![Page 28: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/28.jpg)
GET SOCIAL
Come to the Momentum Lounge today to play, win, learn and more.
DOWNLOAD
Apple: http://bit.ly/MMTM11
Other: http://bit.ly/MMTMemc
![Page 29: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/29.jpg)
![Page 30: IA6.5 Performance Tuning](https://reader034.vdocuments.site/reader034/viewer/2022050819/55cf92d0550346f57b99c3f3/html5/thumbnails/30.jpg)
THANK YOU