Just another bughunt
DESCRIPTION
It's not the bugs you know that kill a website. It's the ones you can't see, lurking just out of sight, that get you. Learn how Lafayette College identified the Lovecraftian code horrors hiding beneath its feet with tools like Splunk (server log analysis), OSSEC (server-side intrusion detection) and Siteimprove (web page auditing), then surgically eliminated the problems. Examples include PHP scripts spewing error notices into logs, undiscovered CAS authentication failures, and thumbnail generation scripts that choke on large files.
TRANSCRIPT
Just another bughunt?
Tools to improve your site without nuking it from orbit
#DPA11
Ken Newquist (@knewquist) | Charles Fulton (@mackensen)
Who we are
Ken Newquist, Director, Web Applications Development, Lafayette College
Charles Fulton, Senior Web Applications Developer, Lafayette College
Rebuild or Fix?
● Your website’s problems may seem intractable
● The temptation to nuke the bugs and start fresh is strong
● We’ve found tools that identify the problems so we can surgically eliminate them
  ○ (and find a few issues we didn’t know about in the process)
Tools
Siteimprove
● Crawls web presence
● Reports broken links and common misspellings
● Shows changes over time
● Pretty graphs!
Pretty graph!
Splunk
● Log aggregation
● Real-time monitoring
● Rich analysis
● More pretty graphs!
Another pretty graph!
Nagios
● Real-time monitoring
● Defines a baseline of system performance
● Does not detect presence of dinosaurs
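For readers who haven't written one: a Nagios check is just a program that prints a one-line status and exits 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal Python sketch of a load-average check under that convention; the warn/crit thresholds are hypothetical and should come from the baseline Nagios helps you establish.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Compare a reading against warning/critical thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(warn=4.0, crit=8.0):
    """Check the 1-minute load average; the default thresholds are hypothetical."""
    try:
        load1, _, _ = os.getloadavg()  # Unix-only; missing on Windows
    except (OSError, AttributeError):
        print("LOAD UNKNOWN - load average unavailable")
        return UNKNOWN
    status = classify(load1, warn, crit)
    # One status line for Nagios, with performance data after the pipe.
    print(f"LOAD {LABELS[status]} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}")
    return status
```

In a real plugin, end the script with `sys.exit(main())` so Nagios sees the exit code.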
Dinosaurs!
OSSEC
● Log-based intrusion detection system
● Define states of acceptable behavior
● No pretty graphs
Not a pretty graph :/
Discovering your web presence
● Define expected behavior with OSSEC & Nagios
● Test expectations with Siteimprove & Splunk
● Here be monsters
Investigations
The Lost Thumbnails
● Site: Moodle
● Tools: Splunk, OSSEC
● Outcome: Improved Apache configuration
Sky falling!
● Splunk reported ~400 HTTP 500 (Internal Server Error) responses within a few minutes
● Also showed concentrated bursts of 404 errors when viewing resources
● Concern within the department that the sky was falling
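Splunk answers "how many 500s and 404s per minute?" with a search. A rough stdlib equivalent for a single Apache log, assuming the common/combined log format, looks like this; the function name and the (minute, status) bucket shape are illustrative choices.

```python
import re
from collections import Counter

# Captures the timestamp (truncated to the minute) and the status code of an
# Apache common/combined log line, e.g.
# 1.2.3.4 - - [10/Mar/2015:13:55:36 -0400] "GET /x.jpg HTTP/1.1" 404 209 ...
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]+\] "[^"]*" (\d{3}) '
)


def errors_per_minute(lines, statuses=("404", "500")):
    """Count the selected status codes per (minute, status) bucket."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(2) in statuses:
            counts[(m.group(1), m.group(2))] += 1
    return counts
```

Feeding it an open log file (for example `/var/log/httpd/access_log`, a path that is an assumption about your layout) and calling `.most_common(10)` on the result surfaces the bursty minutes.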
Sky not falling!
● System ran out of memory generating thumbnails from massive images; threw 500s
● Preview of missing images generated the 404s
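Why a modest-looking upload can exhaust memory: image libraries typically decode the whole picture into an uncompressed bitmap before resizing, so memory scales with pixel dimensions, not file size. A back-of-the-envelope guard, assuming 4 bytes per pixel and a hypothetical 2x overhead factor (this is not Moodle's actual logic):

```python
# Rough rule of thumb: a decoded RGBA bitmap needs width * height * 4 bytes,
# however small the compressed JPEG/PNG file is on disk.
BYTES_PER_PIXEL = 4


def decoded_size_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Estimate memory for the fully decoded bitmap."""
    return width * height * bytes_per_pixel


def safe_to_thumbnail(width, height, memory_limit_bytes, overhead=2.0):
    """Return True if decoding plus a working copy should fit in memory.

    `overhead` is a hypothetical fudge factor for the resized copy and
    library bookkeeping; tune it against your own failures.
    """
    return decoded_size_bytes(width, height) * overhead <= memory_limit_bytes
```

A 6000x4000 photo needs 96,000,000 bytes just to decode, so with a 128 MB PHP `memory_limit` and this overhead factor it fails the check, while a 1024x768 image passes easily.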
Outcomes
● Memory limits were not reasonable
● Users do not report catastrophic errors
Comments
● Site: WordPress
● Tools: Splunk, OSSEC
● Outcome: WordPress core fixes
What Lies Beneath
● 500 errors are reserved for server issues
● WordPress has notions of its own
  ○ Double-submitted comment? 500 error
  ○ Missing a required field? 500 error
  ○ Blank comment? 500 error
● OSSEC would ban all of these for bad behavior
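The distinction OSSEC cares about is 4xx (the client did something wrong) versus 5xx (the server failed). Below is a hypothetical mapping for comment-validation failures that illustrates the idea. It is neither the patch applied here nor what WordPress core will adopt, and the error keys are made up for illustration.

```python
# Hypothetical error keys and codes; WordPress's own identifiers differ.
COMMENT_ERROR_STATUS = {
    "duplicate_comment": 409,       # Conflict: same comment already submitted
    "missing_required_field": 400,  # Bad Request: e.g. name/email required
    "empty_comment": 400,           # Bad Request: nothing to post
}


def status_for(error_key):
    """Map a comment-validation failure to an HTTP status code.

    Client mistakes get 4xx; anything unrecognized falls through to 500,
    which a log watcher such as OSSEC can then treat as a real server problem.
    """
    return COMMENT_ERROR_STATUS.get(error_key, 500)


def is_server_error(status):
    return 500 <= status <= 599
```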
https://github.com/bigcompany/know-your-http
Outcomes
● Learned reasonable mistakes can yield unreasonable error codes
● Hacked core to return 200s and 400s instead
● Core is discussing what to do
  ○ https://core.trac.wordpress.org/ticket/11286
Revenge of the Base Theme
● Site: WordPress
● Tools: Siteimprove
● Outcome: WordPress theme fix; Apache configuration change
March 10: the day the links broke
Nothing to see here … oh wait--
● Developer dismissed initial reports of login issues as user error
● Then Siteimprove said we had 1,800 new broken links
● A two-character change in RHEL defaults for httpd.conf broke WordPress
Lessons
● Small changes have vast consequences
● Documentation is doubleplusgood
The Incredible Shrinking Provost
● Site: Drupal
● Tools: Splunk
● Outcome: Cleaned data in ERP system
Who’s the fairest of them all?
● The directory passes the search query via a GET parameter
● Splunk told us our associate provost, “Jane Doe”, was most-searched by an order of magnitude
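Counting the most-searched names only takes the request line of each access-log entry. A sketch, assuming GET requests to a hypothetical `/directory` path with the search term in a hypothetical `q` parameter:

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# The request line inside an access-log entry, e.g. "GET /directory?q=Jane+Doe HTTP/1.1"
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def top_searches(lines, path="/directory", param="q", n=10):
    """Count directory search terms found in access-log GET requests.

    `path` and `param` are hypothetical; substitute your directory's URL
    and query-string parameter.
    """
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if not m:
            continue
        url = urlsplit(m.group(1))
        if url.path != path:
            continue
        # parse_qs decodes "+" and %-escapes; normalize case and whitespace.
        for term in parse_qs(url.query).get(param, []):
            counts[term.strip().lower()] += 1
    return counts.most_common(n)
```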
...we searched for “Jane Doe”...
...and the search returned...
...NOTHING!
Lessons
● “Jane A. B. Doe !== Jane Doe”
● Data lies
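Exact string comparison is why the search came back empty. The real fix was cleaning the data in the ERP system. As a search-side illustration only, here is a hypothetical normalizer that ignores middle initials so "Jane A. B. Doe" and "Jane Doe" compare equal. Dropping only single-letter middle tokens is an assumption; full middle names are kept.

```python
import re

# A middle token that is a bare initial, with or without a period: "a" or "a."
INITIAL_RE = re.compile(r"^[a-z]\.?$")


def name_key(full_name):
    """Normalize a name for comparison, ignoring middle initials."""
    tokens = full_name.lower().split()
    if len(tokens) <= 2:
        return tuple(tokens)
    first, *middle, last = tokens
    kept = [t for t in middle if not INITIAL_RE.match(t)]
    return (first, *kept, last)


def names_match(query, record):
    return name_key(query) == name_key(record)
```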
Dumpster fire
The Virtual Tour
● Site: Custom app
● Tools: Splunk
● Outcome: Fixed PHP bugs
Pretty graphs!
● 238,908 errors... in three days
● (We didn’t expect that)
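With a quarter-million errors, the useful question is which few lines of code produced most of them. A sketch that groups PHP error-log entries by level, message, file and line, assuming PHP's standard "PHP Notice:  ... in /path on line N" wording (the file paths in any real log will be your own):

```python
import re
from collections import Counter

# Matches the PHP part of a line, whether it came from PHP's own error_log
# or from Apache's error log, e.g.
# ... PHP Notice:  Undefined index: id in /var/www/tour/view.php on line 42
PHP_RE = re.compile(r"PHP ([A-Za-z ]+?):\s+(.*) in (\S+) on line (\d+)")


def top_php_errors(lines, n=10):
    """Return the n most frequent (level, message, file, line) tuples."""
    counts = Counter()
    for line in lines:
        m = PHP_RE.search(line)
        if m:
            level, message, path, lineno = m.groups()
            counts[(level, message, path, int(lineno))] += 1
    return counts.most_common(n)
```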
Fixed it!
Outcomes
● No one cares that we fixed the Virtual Tour
  ○ (we feel better though)
Mr. Foo and Mr. Bar
● Site: WordPress
● Tools: Splunk
● Outcome: Disproved long-standing alleged bug
I swear I wasn’t there!
● Various reports over the years alleging that WordPress improperly reported another user was editing a post
● Much speculation and theorizing in absence of facts
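For context on the alleged bug: WordPress records who is editing a post in the `_edit_lock` post meta as `timestamp:user_id`, and `wp_check_post_lock()` reports another user only if that lock is newer than a time window (150 seconds by default via the `wp_check_post_lock_window` filter in recent versions; verify against yours). The following is a Python model of that check, not WordPress code. If the warning fires, someone else's lock really was recent, which is why request logs can settle the argument.

```python
import time

# Assumed default, mirroring the `wp_check_post_lock_window` filter's default
# in recent WordPress versions; verify against your install.
LOCK_WINDOW_SECONDS = 150


def locking_user(edit_lock, current_user_id, now=None, window=LOCK_WINDOW_SECONDS):
    """Return the ID of another user holding a fresh lock, else None.

    `edit_lock` mimics the `_edit_lock` post meta value: "timestamp:user_id".
    """
    if not edit_lock:
        return None
    parts = edit_lock.split(":")
    try:
        locked_at, user_id = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Missing or malformed; WordPress falls back to `_edit_last` when the
        # user part is absent, which this sketch skips.
        return None
    now = time.time() if now is None else now
    # Locked recently by someone other than the current user?
    if locked_at > now - window and user_id != current_user_id:
        return user_id
    return None
```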
Outcomes
● People are wrong on the Internet
The Cache That Wouldn’t Die
● Site: WordPress
● Tools: Nagios
● Outcome: Database size reduced by two-thirds
Doom at 11…
● Nagios had concerns
● MySQL ran out of disk space
● Size of WordPress DB tripled in two weeks
Pretty terminal dumps?
SELECT option_name FROM wp_190_options WHERE option_name LIKE "displayed_gallery%";
...
| displayed_gallery_rendering_ffffb5e48845fbb7b3347244f8aa06d4 |
| displayed_gallery_rendering_ffffd6d9f2ab40195295c70f775b0ee8 |
| displayed_gallery_rendering_ffffe1416b8d969e25ec7a6094282bbe |
| displayed_gallery_rendering_ffffe8e4a0c399605f434bd51be2d9d7 |
+--------------------------------------------------------------+
722141 rows in set (2.28 sec)
…Salvation at Noon
● The Google Mini found something terrible lurking in club websites
● NextGEN Gallery bug caused a near-endless crawl by the Mini
● Code bug meant the cache never expired
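NextGEN's internals aren't reproduced here; this is a generic sketch of the property its cache lacked. Entries carry an expiry time and `purge()` deletes expired ones, so, provided something calls `purge()` periodically, storage stays bounded by what was written in the last `ttl` seconds even when a crawler keeps generating new, unique keys. The class name and the injectable clock (for testing) are illustrative.

```python
import time


class ExpiringCache:
    """Minimal cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        item = self._items.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at <= self.clock():
            del self._items[key]  # lazily drop a stale entry on read
            return default
        return value

    def purge(self):
        """Delete every expired entry; return how many were removed."""
        now = self.clock()
        expired = [k for k, (_, exp) in self._items.items() if exp <= now]
        for k in expired:
            del self._items[k]
        return len(expired)

    def __len__(self):
        return len(self._items)
```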
Outcomes
● NextGEN Gallery has stability issues
● Listen to Nagios
● It’s turtles all the way down
Attack of the Python Script
● Site: WordPress
● Tools: Nagios, Splunk
● Outcome: Quickly identified source of massive load event
Traffic Jam!
● Load on a server spiked at 800%
● Seemed bad
● Nagios had more concerns
Hello there!
● Splunk real-time monitoring revealed top client IPs
● We’re very popular with a misconfigured IIS server in Oregon and its “Python-urllib/3.4” script
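The same "top talkers" view can be pulled from a raw Apache combined log by counting requests per client IP and user agent. A sketch; note that behind a proxy the first field may be the proxy's address unless the log format records the forwarded client IP.

```python
import re
from collections import Counter

# Client IP is the first field; the user agent is the last quoted field
# in Apache's combined log format.
COMBINED_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def top_clients(lines, n=5):
    """Return the n busiest (client_ip, user_agent) pairs."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts.most_common(n)
```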
Outcomes
● Banned the IP on the proxy
● Began developing rate-limiting rules for OSSEC
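OSSEC expresses rate limits with a rule's `frequency` and `timeframe` attributes. The underlying idea is a sliding window per source address. Below is a Python sketch of that idea (not OSSEC rule syntax), with hypothetical limits and the assumption that timestamps for a given source arrive in order.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flag a source that makes more than `limit` requests within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # source -> recent request timestamps

    def hit(self, source, timestamp):
        """Record one request; return True if the source is over the limit."""
        q = self._hits[source]
        q.append(timestamp)
        # Drop requests that have slid out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) > self.limit
```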
Alternatives
Bughunting on the cheap
W3C Link Checker
● Reports on broken links to a specified depth
● http://validator.w3.org/checklink
Google Webmaster Tools
● Details on broken links and server errors
● https://www.google.com/webmasters/tools/
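If neither service fits, the core of a single-page link check fits in the standard library. The hypothetical sketch below does not reflect how the W3C checker works: it extracts `<a href>` values, resolves them against the page URL, and reports links whose HEAD request fails or returns 400 or higher. It does no crawling, and some servers answer HEAD with 405, so treat results as leads to verify. The `status` parameter exists so the network call can be swapped out.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_status(url, timeout=10):
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except OSError:  # URLError, timeouts, connection resets
        return None


def broken_links(html, base_url, status=http_status):
    """Return (url, status) pairs for links that look broken."""
    parser = LinkExtractor()
    parser.feed(html)
    broken = []
    for href in parser.links:
        url = urljoin(base_url, href)
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, etc.
        code = status(url)
        if code is None or code >= 400:
            broken.append((url, code))
    return broken
```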
More options
● Bureau of Internet Accessibility
  ○ Cheaper than Siteimprove
  ○ Broken link and accessibility reports
  ○ http://www.boia.org
● Google Analytics
  ○ Identify high-traffic broken pages
  ○ http://google.com/analytics
● vim | grep
  ○ Eyeballing your logs can’t hurt
Conclusions
Did we really fix all those errors?
Or is logging broken?
Takeaways
● Data are free
● Bugs are hard to find
● Reports are expensive
● Good reports make finding bugs easy
● You can improve your site without rebuilding it from scratch
● You will find more bugs than you can fix
Anatomy of a Redirect
● Tool: Splunk
● Forthcoming from Lafayette College
● WordPress tries to be helpful!
Join the discussion at https://core.trac.wordpress.org/ticket/16557!
Ken Newquist
● @knewquist
Charles Fulton
● @mackensen
Questions?