baking-in transparency
DESCRIPTION
This is from an invited talk I gave at the Pittsburgh Perl Workshop a few years back. It's not often that I get a chance to talk to developers, so I thought I'd take advantage of it and yell at them a bit ;-)
TRANSCRIPT
Baking-In Transparency
Saturday, October 8, 11
About Me
• Matt Simmons
• 11+ year System Administrator
• http://www.standalone-sysadmin.com
• @standaloneSA
The Situation
Devs make things
• Small discrete programs
• Large complex programs
• Immense interconnected software suites
Ops makes things go
• Script using small discrete programs
• Administer large complex programs
• Cluster immense interconnected software suites
There is a direct relationship between the software that developers write and the software that gets implemented by operations.
The Problems
Software needs to be monitored
"When performance is measured, performance improves. When performance is measured and reported back, the rate of improvement accelerates."
--Pearson’s Law
“You can’t manage what you can’t measure”
--Robert Kaplan
Why?
Software needs to be managed
“Management by objective works - if you know the objective. 90% of the time, you don’t.”
--Peter Drucker
Clearly we need to measure...
But what do we measure?
And what metrics do we use?
How do we obtain the measurements?
What do we measure?
Software Engineers measure...
• Programmer Productivity
  • code size/efficiency
• Defect Density
  • Bugs / module size
• Requirement Stability
  • “feature creep”
What do we measure?
Operations measures...
• Resource Utilization
  • Diskspace, Bandwidth, etc
• Infrastructure Stability
  • Service Uptime, MTBF, etc
• Performance
  • CPU / Memory efficiency, etc
What metrics do we use?
It depends.
Duh.
The metrics that Ops needs to monitor are not always easy to obtain...
...even though they’re really important
• Reliability
• Repeatability
• Root Cause Identification
...so not only is monitoring important...
Monitoring is hard.
Monitoring correctly is hard.
Why is monitoring hard?
• Monitoring Software Suites are complex
• Infrastructures are complex
• Processes and applications are opaque to our futile requests to determine and track internal state
Processes and applications are opaque to our futile requests to determine and track internal state
The Solution(s)
Dev/Ops working together gives
• Team Interrelationships
• Knowledge Sharing
• Cross Training
• Tool Sharing
But more specifically...
Methods of monitoring software can be
BUILT INTO THE SOFTWARE
How things are designed now
Question: A well-designed program encounters an error. What happens?
Answer: It handles the error and continues processing requests.
How things are designed now
Question: A poorly-designed program encounters an error. What happens?
Answer: It crashes and burns.
Question: Which of those is easier to monitor?
Obviously, dying to alert the monitoring system is overkill.
(pun firmly intended)
How do we make our statuses available to the monitoring system, then?
It depends on the kind of software
Remember these?
• Small discrete programs
• Large complex programs
• Immense interconnected software suites
Small Discrete Programs
• Possibly a utility
• Usually scripted or run manually
• Typically short-term run time
Small Discrete Programs: Monitoring
• Screen output
• Return codes
• Catch signals
  • Great example: ping & SIGQUIT
  • SIGUSR1 & SIGUSR2
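The "return codes" bullet can be sketched as below. This is only an illustration: the 0/1/2 values follow the widely used monitoring-plugin convention (OK, WARNING, CRITICAL), and `process_items` and `summarize` are hypothetical names, not something from the talk.

```perl
use strict;
use warnings;

# Exit codes following the common monitoring-plugin convention.
use constant { OK => 0, WARNING => 1, CRITICAL => 2 };

# Hypothetical work routine: count the items that "fail".
# Here an item fails if it is undefined or an empty string.
sub process_items {
    my @items = @_;
    my $failed = grep { !defined($_) || $_ eq '' } @items;
    return $failed;
}

# Turn a failure count into one line of screen output plus an exit code.
sub summarize {
    my ($failed, $total) = @_;
    my $code = $failed == 0     ? OK
             : $failed < $total ? WARNING
             :                    CRITICAL;
    my $label = (qw(OK WARNING CRITICAL))[$code];
    return ("$label: $failed of $total items failed", $code);
}

# Typical use at the end of a script:
#   my ($line, $code) = summarize(process_items(@ARGV), scalar @ARGV);
#   print "$line\n";
#   exit $code;
```

A wrapper script or monitoring check can then branch on `$?` instead of parsing the screen output.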
Signal Handling in Perl
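# Dump the program's current state to a file when it receives SIGUSR1.
sub USR1_handler {
    drop_state_file();    # defined elsewhere in the program
}
$SIG{USR1} = \&USR1_handler;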
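For a fuller picture, here is one way `drop_state_file` might look. The `%state` hash, its keys, and the state-file path are all assumptions made up for this sketch; a real program would dump whatever internal state Ops actually cares about.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical internal state the program wants to expose.
our %state = (requests_handled => 0, last_error => 'none');

# Hypothetical location for the state file (per-process, in the temp dir).
our $state_file = File::Spec->catfile(File::Spec->tmpdir, "myprog.$$.state");

# Write each key=value pair of %state to the state file, one per line.
sub drop_state_file {
    open(my $fh, '>', $state_file) or do {
        warn "cannot write $state_file: $!";
        return;
    };
    print {$fh} "$_=$state{$_}\n" for sort keys %state;
    close($fh);
    return;
}

# Keep the handler tiny: dump state, then go back to normal work.
$SIG{USR1} = sub { drop_state_file() };
```

With this in place, `kill -USR1 <pid>` from a shell asks the running program to write its state, and the monitoring system only has to read a file.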
Large Complex Programs
• Probably a daemon or interactive program
• Long running, needs to be stable
• Subject to resource change over time
• May need to retain state across restarts
• May have a web component
Large Complex Programs: Reporting
• No screen output (except debugging)
• Logging
• SNMP Agent/Traps
• (seriously, read ‘man snmpd.conf’)
• Named Pipes (FIFO)
• State Output to DB (if appropriate)
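As a rough sketch of the named-pipe option in the list above: a daemon can create a FIFO and write one status line each time something opens it for reading. The function names and the status keys are hypothetical, and opening a FIFO for writing blocks until a reader shows up, so a real daemon would probably do this from a dedicated child process rather than its main loop.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);

# Create the FIFO if it does not already exist.
sub ensure_fifo {
    my ($path) = @_;
    return 1 if -p $path;
    mkfifo($path, 0644) or die "mkfifo $path failed: $!";
    return 1;
}

# Block until a reader opens the FIFO, then write one status line to it.
sub serve_status_once {
    my ($path, $status) = @_;
    open(my $fh, '>', $path) or die "open $path failed: $!";
    print {$fh} join(' ', map { "$_=$status->{$_}" } sort keys %$status), "\n";
    close($fh);
    return;
}
```

On the Ops side, reading the status is then as simple as running `cat` on the FIFO path.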
Net-SNMP Embedded Perl
# snmpd.conf: each "perl" directive is a single line
perl use Data::Dumper;
perl sub myroutine { print "got called: ", Dumper(@_), "\n"; }
perl $agent->register('mylink', '.1.3.6.1.8765', \&myroutine);
Immense Interconnected Software Suites
(or Large Suites)
Large Suites
• Definitely retain state across restarts
• Probably requires centralized controller
• May use sockets to communicate
• Probably has a web component
Large Suites: Reporting
Everything under “Large Programs”, plus...
• Monitoring coordinated by the “central” node or program
• Aggregation of state
• Provide layer of abstraction from any in-suite monitoring or reporting
• Provide XML/CSV in addition to human-parsable HTML pages
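To make "aggregation of state" and machine-readable output a little more concrete, here is a minimal sketch of what a central node might do. The state names (`ok`, `degraded`, `down`), the node names, and the `SUITE` summary row are all invented for illustration, and node names are assumed not to contain commas.

```perl
use strict;
use warnings;

# Severity ranking used to pick the "worst" state; the names are illustrative.
my %rank = (ok => 0, degraded => 1, down => 2);

# Given { node_name => state }, return the worst state across all nodes.
sub aggregate_state {
    my ($nodes) = @_;
    my $worst = 'ok';
    for my $state (values %$nodes) {
        die "unknown state '$state'" unless exists $rank{$state};
        $worst = $state if $rank{$state} > $rank{$worst};
    }
    return $worst;
}

# Render per-node state plus the aggregate as simple CSV.
sub state_as_csv {
    my ($nodes) = @_;
    my @lines = ("node,state");
    push @lines, "$_,$nodes->{$_}" for sort keys %$nodes;
    push @lines, "SUITE," . aggregate_state($nodes);
    return join("\n", @lines) . "\n";
}
```

The monitoring system then polls one CSV document from the central node instead of learning how every component inside the suite reports its health.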
What we’re really doing is IPC
So what other methods exist? Lots.
Unix IPC
• Sockets
• RPC
• Message Queues
• FIFO
• Shared Memory
• And Many More...
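Taking the first item on that list as an example, a status endpoint on a UNIX-domain socket could look roughly like this. It uses the core `IO::Socket::UNIX` module; the function names and the status-line format are assumptions for the sketch.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;

# Create a listening UNIX-domain socket at $path.
sub open_status_socket {
    my ($path) = @_;
    unlink $path if -S $path;    # clear a stale socket left by a previous run
    my $server = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM(),
        Local  => $path,
        Listen => 5,
    ) or die "cannot listen on $path: $!";
    return $server;
}

# Accept one client, send it the current status line, and hang up.
sub answer_one_client {
    my ($server, $status_line) = @_;
    my $client = $server->accept or return;
    print {$client} "$status_line\n";
    close($client);
    return 1;
}
```

A monitoring check only needs to connect, read one line, and disconnect; the program decides what goes into that line.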
They shouldn’t all be used...
What is important is that you use SOMETHING
What is best?
To crush your enemies, see them driven before you, and to hear the lamentation of their women?
What is best?
• An application that is easily and openly monitored
• A developer who considers monitoring in all phases of design and development
• A developer who writes their own monitoring checks
Do us all a favor...
When you develop software, be it scripts, utilities, programs, or suites, please please please...
Consider how we Ops folks will manage and monitor it.
Thank you for your time.
Matt Simmons
@standaloneSA on Twitter
[email protected]
http://www.standalone-sysadmin.com