Digital Nightmares - The Biggest Performance Killers in Your Environment

Upload: wes-morgan

Post on 16-Jul-2015


ID109 Digital Nightmares: The Biggest Performance Killers in Your Environment

Rob Gearhart, IBM
Wes Morgan, IBM (@wesmorgan1)

Why Are We Here?

§ Two of IBM's senior troubleshooters
§ Combined 35+ years of experience with varied customer environments
§ 60% of reported problems with collaborative applications are environmental
  – NOT addressed by fixes/patches
  – NOT addressed by IBM software configurations
§ High availability apps (Domino, Traveler, Sametime) are often the first to manifest lower-level problems
  – Sound familiar? “Our other apps don't have any problems”
§ Most of these performance killers are avoidable
§ Performance tuning & monitoring your entire environment is key!

Disk I/O – Local Drives

§ Slow/congested disks affect EVERYTHING
  – Application performance
  – Operating system performance (e.g. paging file)
§ Optimal read/write performance: less than 15 ms per transaction
§ Hallmark symptom: disk queue lengths
  – These queue lengths indicate the number of transactions awaiting disk service, not their size
  – Every disk manufacturer agrees: queue lengths > 2.0 indicate poor performance
§ Monitor: Platform statistics (Domino), Perfmon (Windows), iostat (Linux)
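The two rules of thumb on this slide (under 15 ms per transaction, queue length no higher than 2.0) can be turned into a simple check. This is a minimal sketch, not a monitoring tool: the `DiskSample` structure, the disk names and the sample values are all hypothetical, and the numbers would come from whichever tool you already use (Domino platform statistics, Perfmon, iostat).

```python
from dataclasses import dataclass

# Thresholds quoted on the slide; tune them for your own environment.
MAX_LATENCY_MS = 15.0
MAX_QUEUE_LENGTH = 2.0


@dataclass
class DiskSample:
    """One monitoring sample for a single disk (hypothetical structure)."""
    name: str
    latency_ms: float    # average time per read/write transaction
    queue_length: float  # average number of transactions awaiting service


def assess_disk(sample: DiskSample) -> list[str]:
    """Return warnings for one sample; an empty list means it looks healthy."""
    warnings = []
    if sample.latency_ms >= MAX_LATENCY_MS:
        warnings.append(
            f"{sample.name}: latency {sample.latency_ms:.1f} ms "
            f"is not below {MAX_LATENCY_MS:.0f} ms"
        )
    if sample.queue_length > MAX_QUEUE_LENGTH:
        warnings.append(
            f"{sample.name}: queue length {sample.queue_length:.1f} "
            f"exceeds {MAX_QUEUE_LENGTH:.1f}"
        )
    return warnings


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    for s in (DiskSample("data01", 8.2, 0.7), DiskSample("translog", 31.5, 4.3)):
        for line in assess_disk(s) or [f"{s.name}: OK"]:
            print(line)
```

A single bad sample is rarely meaningful; in practice you would feed this a series of samples and look for sustained violations.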

Disk Shares (Client-side)

§ Commonly used in virtual desktop environments
  – Citrix, VMware VDI
§ Large user data (e.g. Domino databases) stored on remote disk farms
§ May substantially degrade file operations (especially upload/download)
§ Clients may have applications installed on remote disks
§ Smart idea to have multiple file servers for critical data
§ Monitor: All usual disk I/O metrics apply

Disk I/O – SAN and NAS

§ Same basic standards: 15 ms per transaction, disk queues <= 2.0
§ Complicated by multiple applications sharing same disk chassis
  – Other applications “hammering” SAN can create problems for you
§ SAN: Check HBA configuration on servers and cache configuration on SAN device(s)
  – Don't forget max queue depth!
§ NAS: Network latency/congestion most common performance factor
§ Monitor: Platform statistics, Perfmon, iostat, network analysis
§ Best way to ensure performance: examine average read latency and average write latency separately (each < 15-20 ms); don't pay attention to total I/O operations
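To illustrate why read and write latency should be examined separately, here is a small sketch that derives both averages from two snapshots of cumulative I/O counters. The `IoCounters` structure and the numbers are hypothetical; Linux exposes comparable cumulative counters per device in /proc/diskstats, and iostat reports the resulting per-operation averages for you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IoCounters:
    """Cumulative per-device counters (hypothetical snapshot structure)."""
    reads: int     # read operations completed
    read_ms: int   # total milliseconds spent on reads
    writes: int    # write operations completed
    write_ms: int  # total milliseconds spent on writes


def avg_latency_ms(ops_delta: int, ms_delta: int) -> Optional[float]:
    """Average milliseconds per operation over the interval, or None if idle."""
    if ops_delta <= 0:
        return None
    return ms_delta / ops_delta


def read_write_latency(before: IoCounters, after: IoCounters):
    """Return (avg read latency, avg write latency) between two snapshots."""
    read = avg_latency_ms(after.reads - before.reads,
                          after.read_ms - before.read_ms)
    write = avg_latency_ms(after.writes - before.writes,
                           after.write_ms - before.write_ms)
    return read, write


if __name__ == "__main__":
    # Invented counter values, for illustration only.
    t0 = IoCounters(reads=1_000, read_ms=6_000, writes=500, write_ms=4_000)
    t1 = IoCounters(reads=1_400, read_ms=8_400, writes=600, write_ms=7_500)
    r, w = read_write_latency(t0, t1)
    print(f"read: {r:.1f} ms/op, write: {w:.1f} ms/op")
```

With these invented numbers, reads average 6 ms and writes average 35 ms. Blended over all 500 operations the figure is 11.8 ms, which would pass the 15 ms rule and hide the slow writes.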

Memory Constraints

§ Memory consumption can vary widely with load
  – Be aware of growth in userbase, or added mobile devices
§ Pay particular attention to JVMs
  – Know their memory configuration
  – Consult tuning documentation
  – Can be particular concern in WebSphere environments
§ Can be exacerbated by high paging rates in the OS, or strapped kernel caches
§ Monitor: Perfmon (Committed Bytes, % Committed Bytes In Use), vmstat
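The committed-bytes percentage is simply committed memory divided by the commit limit. As a rough sketch of the same calculation on Linux, assuming that the `Committed_AS` and `CommitLimit` fields of /proc/meminfo are an acceptable analog of the Perfmon counters, the following parses meminfo-style text. The sample values are invented.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse '/proc/meminfo'-style lines ('Name:   12345 kB') into {name: kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def percent_committed(meminfo: dict[str, int]) -> float:
    """Committed memory as a percentage of the commit limit."""
    return 100.0 * meminfo["Committed_AS"] / meminfo["CommitLimit"]


# Invented sample; on a real Linux host, read the text from /proc/meminfo.
SAMPLE = """\
MemTotal:       16384000 kB
CommitLimit:    12000000 kB
Committed_AS:    9600000 kB
"""

if __name__ == "__main__":
    print(f"{percent_committed(parse_meminfo(SAMPLE)):.1f}% committed")
```

Tracking this figure over time is more useful than any single reading, since the slide's concern is growth in load.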

Overcommitted Virtual Hosts

§ Growing problem with rise of virtual environments
§ Performance problems can be triggered by demands of OTHER VMs
§ May show up as memory constraints, CPU contention, network latency
§ May be paired with disk I/O constraints
§ Pay attention to WebSphere logs for CPU contention
§ Dynamic load re-distribution (moving VMs to new host) can cause problems for HA or near-real-time apps (and lessen your HA to boot!)
§ Monitor: Check %CPUReady in VMware statistics (> 5% = contention)
§ Can be addressed with resource pooling/prioritization
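Some VMware performance charts report CPU ready time as a summation in milliseconds per sampling interval rather than as a percentage. The sketch below shows the commonly used conversion, so the value can be compared with the 5% threshold. It assumes that the summation covers one sampling interval (for example 20 seconds on real-time charts) and that dividing by the vCPU count is wanted; verify both against your own tooling. The function names are hypothetical.

```python
CONTENTION_THRESHOLD = 5.0  # percent, from the slide


def cpu_ready_percent(ready_ms: float, interval_s: float, vcpus: int = 1) -> float:
    """Convert a CPU-ready summation (ms) over one interval to a percentage.

    Dividing by the vCPU count gives an average per-vCPU figure.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0) / vcpus * 100.0


def is_contended(ready_ms: float, interval_s: float, vcpus: int = 1) -> bool:
    """True when the converted value exceeds the 5% rule of thumb."""
    return cpu_ready_percent(ready_ms, interval_s, vcpus) > CONTENTION_THRESHOLD


if __name__ == "__main__":
    # Invented readings: 2000 ms ready in a 20 s interval on a 1-vCPU VM,
    # and 1600 ms ready in a 20 s interval on a 4-vCPU VM.
    print(is_contended(2000, 20))           # about 10% ready
    print(is_contended(1600, 20, vcpus=4))  # about 2% ready per vCPU
```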

Congested Proxy Servers

§ Major contributor to poor end-user performance
§ Frequently seen in new cloud deployments (i.e. large added baseline load)
§ Tends to appear during “peak times”
§ Tends to affect multiple applications (and general Internet traffic)
§ Can be exacerbated with increased file upload/download traffic (e.g. Connections Files)
§ Can affect extranet users (if reverse proxy servers in use)
§ Can be confirmed with HTTP analysis
§ Test/monitor with: HttpWatch, Firebug, Rational Performance Tester

Proxy Server Example

§ ACME - enterprise with 50,000 users
§ All Internet web traffic required to go through farm of 3 proxy servers
§ ACME migrated messaging to the cloud
§ Massive surge in HTTP/HTTPS traffic swamped proxy servers
§ User reports focused on applications, and which applications were most commonly used?
  – No one thinks twice if a random website is slow...
§ Resolved by expanding proxy capacity (adding proxy servers)

Firewall & Load Balancer Timeouts

§ Often conflict with application-layer timeout settings
§ Load balancer timeouts can result in arbitrarily high (re)connection rates
  – Check session affinity/“stickiness” timeouts
§ Create situations where neither endpoint has clear picture of connectivity
§ Often indicated by “connection reset” or “connection timed out” log errors
§ Monitor/confirm via network analysis
  – Red flag: TCP retransmissions & RST in existing connection
  – Red flag: TCP RSTs appearing “out of nowhere”
§ Mitigate by ensuring that application-layer timeout is the shortest
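The mitigation rule (the application-layer timeout should be the shortest in the path) is easy to check once the timeouts are written down in one place. A minimal sketch, with hypothetical device names and values in minutes:

```python
def find_timeout_conflicts(app_timeout_min: float,
                           intermediaries: dict[str, float]) -> list[str]:
    """Return the intermediate devices whose idle timeout is not longer than
    the application-layer timeout, i.e. devices that could drop the
    connection state before the application closes the session itself."""
    return sorted(
        name for name, timeout in intermediaries.items()
        if timeout <= app_timeout_min
    )


if __name__ == "__main__":
    # Hypothetical inventory of idle timeouts along one network path.
    path = {"firewall": 30, "load_balancer": 120}
    print(find_timeout_conflicts(60, path))  # application timeout of 60 minutes
    print(find_timeout_conflicts(25, path))  # application timeout lowered to 25
```

With a 60-minute application timeout, the firewall is flagged; lowering the application timeout to 25 minutes clears the list. Collecting the real values usually means asking the network and security teams, which is part of the point.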

Firewall Timeout Example

§ ACME has a significant number of extranet users
§ Users complain that if Notes client had been idle for more than 30 minutes, client “freezes” for 20-30 seconds when they resume activity (check mail, send a draft email, refresh a view, etc.)
§ Initial review of logs showed “server not responding” in client logs, and “connection reset” or “connection broken” indicators in server logs
§ Network traffic analysis showed connections established normally, but eventually going through the retransmission/timeout cycle (resulting in 20+ second delay)
§ When both endpoints show symptoms of retransmissions and timeouts, we suspect that an intermediate device is interfering
§ Resolved by lowering Domino Server Session timeout below Firewall's 30 minutes (allows for orderly closure of idle sessions)

Firewall Timeout

Diagram (three stages): CLIENT and SERVER each have an idle timeout of 60m; the FIREWALL between them has an idle timeout of 30m.

1. Start
2. After 30 minutes idle, firewall SILENTLY “drops state”
3. The next time either side tries to use the connection: 3-5 retransmissions, then give up with TCP RST

Software Firewalls (Client & Server)

§ Often installed by default
  – Do you know if your standard image includes one?
§ Often affect even localhost connections
§ Do not usually include timeout capability
§ Must be configured for specific apps/ports/port ranges
§ May exhibit symptoms of “some things work, others don't”
§ Usually a problem on client side
§ Recommend: disabling software firewalls on servers

Network Appliances

§ Often used to improve WAN performance (e.g. Riverbed, Blue Coat)
  – Includes content caching, bandwidth throttling, packet shaping
  – Packet shaping and bandwidth throttling may also be introduced by routers and switches due to Quality of Service (QoS) policies
§ When they malfunction or are misconfigured, can cause various performance problems:
  – Email attachment problems (packet shaping)
  – Higher WAN loads
  – SYN-ACK problems, causing general problems on target App server
§ Red flags: WAN behavior different from LAN behavior, network/application diagnostics point to a network issue
§ Monitor: Network bandwidth usage, dropped packets, connection resets

Network Accelerator/Packet Shaping Example

§ Admins upgrade Domino server's OS from AIX 6 to AIX 7
§ Suddenly, users could not access large attachments over the WAN
  – Notes Client experiences small delay, then produces error “Remote System no longer responding”
§ Network capture client side shows that server acknowledges full attachment size
  – After downloading first part of attachment, window size is reset
  – Server stops sending additional attachment data, client produces error
  – WAN connections routed through Blue Coat device that performed packet shaping
  – Turned off packet shaping, problem went away
  – Packet Shaping was deliberately truncating download
  – Network team did not see any “errors”, since no packets were dropped
  – From AIX 6 to AIX 7, TCP window behavior changed slightly; Blue Coat device needed to account for the change in its application-specific policy settings

LDAP Performance/Misconfiguration/Search Filters

§ Usually shows up as authentication delays
§ May also cause “slow lookups”
§ LDAP server performance may be affected by other applications
§ Overly complex search filters can degrade LDAP performance
  – How many different ways do users need to authenticate?
§ Use of large (or nested) groups also affects performance
§ Active Directory: Consult Global Catalog Server, NOT domain controllers
§ Mitigate with standard (and simple) search filters
§ Monitor: LDAP and host statistics
§ Note: HA apps depend upon HA LDAP!

LDAP Search Filter Example

§ ACME's authentication filter:
  (&(|(objectclass=person)(objectclass=EuroPerson))(|(uid=%s)(cn=%s)(mail=%s)(employeeid=%s)))
§ This filter demands 6 LDAP comparisons per record queried
§ First deployment of a worldwide application brought the LDAP infrastructure to its knees
§ Resolved by simplifying LDAP schema (removing distinction between person and EuroPerson) and setting a standard “how you will authenticate” policy (using only email address or employee ID)
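The “6 comparisons” figure can be reproduced by counting the leaf assertions in the filter string. The sketch below does that with a regular expression and compares ACME's filter with one possible simplified form. The simplified filter is an assumption built from the resolution described above (one object class, email address or employee ID only); the slides do not show the filter ACME actually adopted.

```python
import re

# Matches a simple leaf assertion such as (uid=%s) or (objectclass=person).
LEAF = re.compile(r"\(([A-Za-z][\w.;-]*)=([^()]*)\)")

# Filter quoted on the slide.
ACME_FILTER = (
    "(&(|(objectclass=person)(objectclass=EuroPerson))"
    "(|(uid=%s)(cn=%s)(mail=%s)(employeeid=%s)))"
)

# Hypothetical simplified filter, assumed from the described resolution.
SIMPLIFIED_FILTER = "(&(objectclass=person)(|(mail=%s)(employeeid=%s)))"


def count_comparisons(ldap_filter: str) -> int:
    """Count attribute=value assertions, the worst-case comparisons per entry."""
    return len(LEAF.findall(ldap_filter))


if __name__ == "__main__":
    print(count_comparisons(ACME_FILTER))        # 6
    print(count_comparisons(SIMPLIFIED_FILTER))  # 3
```

The count is only a rough proxy for cost. Whether each attribute is indexed on the directory server matters at least as much as the number of assertions.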

SQL Servers

§ An important component for Traveler
§ Mail file and connection metadata is contained on SQL server
§ Traveler must consult SQL before it knows if there are relevant changes in the mailfile to push to the user
§ Any SQL performance hiccups can dramatically affect Traveler performance
§ Mobile adoption may drive you to the breaking point
§ Monitor: SQL server and host statistics, including disk latency

Third-Party Plugins/Addins/Extensions

§ Can affect both servers and clients
§ Notes plugins can add menu options, browser plugins can modify JavaScript/CSS
§ Server-side extensions can introduce external dependencies
  – Example: archival plugin may use SAN/NAS/UNC drives
§ Even if the primary external app is disabled, extension DLLs will still load & execute
  – e.g. Anti-Virus, Mail Signature plugins
§ Don't forget authentication addins
  – May contribute to overall LDAP load
§ Monitor: May be specific to plugin/extension, application stats will apply

Third Party Addin Example

§ ACME users intermittently report a 2-5 second delay when Notes Client sends email
  – No apparent network delays
  – Diagnostics show creation of Note in mail.box requires 3-5 seconds
  – Additional debug shows that Ext Mgr Plugin is invoked, takes 95% of the time
§ ACME uses a third party mail signature app that adds a pre-existing signature
  – Even though they disable the third party task, the delay still occurs
  – Domino still loads any referenced DLLs; depending on how they are written, they may introduce delays (cannot assume that these DLLs are 'lite')
§ Resolved by removing the extmgr DLL from notes.ini (Third Party Vendor required to investigate)
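Before removing anything, it helps to see which extension manager add-ins a server is configured to load. The sketch below assumes the add-ins are listed on a comma-separated `EXTMGR_ADDINS=` line in notes.ini, and works on the file's text rather than the file itself. The add-in names `sigplugin` and `avscan` and the other sample lines are hypothetical.

```python
def extmgr_addins(notes_ini_text: str) -> list[str]:
    """Return the extension manager add-ins named in notes.ini-style text."""
    addins = []
    for line in notes_ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().upper() == "EXTMGR_ADDINS":
            addins.extend(v.strip() for v in value.split(",") if v.strip())
    return addins


def remove_addin(notes_ini_text: str, addin: str) -> str:
    """Return the text with one add-in removed from the EXTMGR_ADDINS line."""
    out = []
    for line in notes_ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().upper() == "EXTMGR_ADDINS":
            kept = [v.strip() for v in value.split(",")
                    if v.strip() and v.strip().lower() != addin.lower()]
            if kept:
                out.append(f"{key}={','.join(kept)}")
            # The line is dropped entirely when no add-ins remain.
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical notes.ini fragment, for illustration only.
SAMPLE = (
    "Directory=/local/notesdata\n"
    "EXTMGR_ADDINS=sigplugin,avscan\n"
    "ServerTasks=Router,HTTP\n"
)

if __name__ == "__main__":
    print(extmgr_addins(SAMPLE))
    print(remove_addin(SAMPLE, "sigplugin"))
```

Comparing the add-in list against what you expect to be installed is a quick way to spot a leftover DLL from a product that was supposedly disabled.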

In Summary

§ Very few of these problems come “out of nowhere”
§ They're often the consequence of growth
§ Get an idea NOW of what “normal”/“good” looks like!
§ You may be suffering from one (or more) of these problems right now
§ Engage your server, network and/or security teams NOW to avoid problems with new or expanded deployments

Questions?

§ Where's YOUR pain point?

THANKS FOR BEING HERE!

@wesmorgan1

Engage Online

§ SocialBiz User Group: socialbizug.org
  – Join the epicenter of Notes and Collaboration user groups
§ Social Business Insights blog: ibm.com/blogs/socialbusiness
  – Read and engage with our bloggers
§ Follow us on Twitter
  – @IBMConnect and @IBMSocialBiz
§ LinkedIn: http://bit.ly/SBComm
  – Participate in the IBM Social Business group on LinkedIn
§ Facebook: https://www.facebook.com/IBMConnected
  – Like IBM Social Business on Facebook

Notices and Disclaimers

Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

IBM, the IBM logo, ibm.com, BrassRing®, Connections™, Domino®, Global Business Services®, Global Technology Services®, SmartCloud®, Social Business®, Kenexa®, Notes®, PartnerWorld®, Prove It!®, PureSystems®, Sametime®, Verse™, Watson™, WebSphere®, Worklight®, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.