HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data


Post on 13-Jul-2015






  • You've got HBase! How AOL Mail handles Big Data (May 22, 2012)

    Presented at HBaseCon 2012

  • The AOL Mail System
      • Over 15 years old
      • Constantly evolving
      • 10,000+ hosts
      • 70+ million mailboxes
      • 50+ billion emails
      • A technology stack that runs the gamut

  • What that means
      • Lots of data
      • Lots of moving parts
      • Tight SLAs
      • Mature system + young software = tough marriage
      • We don't buy commodity hardware
      • Ingrained Dev/QA/Prod product lifecycle
      • Somewhat version-locked to tried-and-true platforms
      • Expect service outages to be quickly mitigated by our NOC without waiting for an on-call

  • So where does HBase fit?
      • It's a component, not the foundation
      • Currently used in two places
      • Being evaluated for more
      • It will remain a tool in our diverse Big Data arsenal

  • An Activity Profiler


  • An Activity Profiler
      • Watches for particular behaviors
      • Designed and built in 6/2010
      • Originally vanilla Hadoop 0.20.2 + HBase 0.90.2
      • Currently CDH3
      • 1.4+ million events/min
      • 60x 24TB (raw) DataNodes w/ local RegionServers
      • 15x application hosts
      • Is an internal-only tool
      • Used by automated anti-abuse systems
      • Leveraged by data analysts for ad hoc queries/MapRed

  • An Activity Profiler (diagram slide)

  • Why the Event Catcher layer?
      • Has to speak the language of our existing systems
      • Easy to plug an HBase translator into existing data feeds
      • Hard to modify the infrastructure to speak HBase
      • Flume was too young at the time

  • Why batch load via MapRed?
      • Real time is not currently a requirement
      • Allows filtering at different points
      • Allows us to trigger events
      • Designed before coprocessors
      • Early data integrity issues necessitated replaying
      • Missing append support early on
      • Holes in the Meta table
      • Long splits and GC pauses caused client timeouts
      • Can sample data into a sandbox for job development
      • Makes Pig, Hive, and other MapRed easy and stable
      • We keep the raw data around as well

  • HBase and MapRed can live in harmony
      • Bigger than average hardware: 36+ GB RAM, 8+ cores
      • Proper system tuning is essential
      • Good information on tuning Hadoop is prolific, but: XFS > EXT, JBOD > RAID
      • As far as HBase is concerned: just go buy Lars' book
      • Careful job development, optimization is key!

  • Contact History API


  • Contact History API
      • Services a member-facing API
      • Designed and built in 10/2010
      • Modeled after the previous application
      • Built by a different Engineering team
      • Used to solve a very different problem
      • 250K+ inserts/min
      • 3+ million inserts/min during MapRed
      • 20x 24TB (raw) DataNodes w/ local RegionServers
      • 14x application hosts
      • Leverages Memcached to reduce query load on HBase

  • Contact History API (diagram slide)

  • Where we go from here


  • Amusing mistakes to learn from
      • Exploding regions
          • Batch inserts via MapRed result in fast, symmetrical key space growth
          • Attempting to split every region at the same time is a bad idea
          • Turning off region splitting and using a custom rolling region splitter is a good idea
          • Take time and load into consideration when selecting regions to split
      • Backups, backups, backups!
          • You can never have too many
      • Large, non-splittable regions tell you things
          • Our key space maps to accounts
          • Excessively large keys equal excessively active accounts

  • Next-generation model (diagram slide)

  • Thanks!

    Introduce myself: I am Chris Niemira, a Systems Administrator with AOL.

    I run a number of Hadoop and HBase clusters, along with numerous other components of the AOL Mail system. I spend my days doing work that ranges from system patches, code installs, and troubleshooting to capacity planning, performance and bottleneck analysis, and kernel tuning. I do a little engineering, a little design work, an on-call rotation, and every once in a while I get to play with Hadoop/HBase.

    The AOL Mail System has been around for a long time, and went through a major re-architecture between 2010 and 2011. It's not a 15-year-old code base, and we evolve it constantly.

    We service over 70 million mailboxes in the AOL Mail environment today. That includes supporting our paying members, in addition to free accounts. Of course, member experience is our #1 priority.

    We have all kinds of tools in our proverbial utility belt, as we believe in using the right tool for the right job.

    It means we're reasonably large. But we've also been operating at scale for a long time. While we have been doing Big Data for many years, we got to our current size by operating a certain way: rigid quality and change controls, lots of documentation, and an emphasis on uptime.

    As we have shifted toward being more agile, we have had to be careful with unproven technologies. HBase, for all the buzz, is still pretty young and error-prone.

    Some of the realities for dealing with a production Hadoop/HBase system would seemingly require a departure from our traditional mentality. Like everyone, we require stability and robustness of our production applications, but our way of getting there has had to change.

    Above all, however, we must still take care of our customers, so it's a balancing act for us.

    So HBase is one of the tools we've added to our kit in the last few years that's still proving itself. We've got two applications running, and we've identified a few other places where it's a good candidate.

    This isn't to say that we are not using it for important things, but it's not at the core of our system.

    We've managed to build a relatively stable platform over time. There's a lot of scripted recovery and a lot of proactive monitoring in our environment, and for the most part, when there are problems, they are mitigated or resolved without even the involvement of an admin.

    AOL Mail first started looking into Hadoop and HBase back in mid-2010. Other business units in our company had been working with Hadoop for a while before then, and a little intra-company consulting convinced us to give HBase a try.

    This system is one component of our anti-abuse strategy. I can't reveal exactly what it does, but I can tell you a bit about how the HBase stuff happens.

    In addition to the 60-node cluster and the application servers, there's the ancillary junk, which includes NameNodes (2x), HMasters (2x), and ZooKeepers (3x).

    The app hosts and ZooKeepers, which are currently physicals, are being switched to virtual devices in our internal cloud.

    This is what the application looks like.

    The Service Layer comprises various components within the AOL Mail system. They speak their own protocols and send messages to an Event Catcher which decodes the stream, and writes a log to local disk.

    That log is imported into Hadoop (and can optionally be sampled to a development sandbox at the same time) and then further cooked via MapRed, which ultimately outputs rows into HBase and can send triggers to external applications. One thing we can do at this point (not illustrated) is populate a memcache, which may be used by client apps to reduce load on some HBase queries.

    The real answer is that when we first started, we couldn't make streaming a million and a half rows a minute work with the HBase we had two years ago. At the time, it was easier for us to build the batch loader, which has proven to have a few interesting advantages.
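The batch-load flow described above (import the catcher log, optionally sample it into a sandbox, cook it with a map and a reduce step, emit rows and triggers) can be sketched in a few lines. This is a toy, in-memory illustration and not AOL's code: the `account,event` line format, `SAMPLE_RATE`, the trigger threshold, and the plain lists and dicts standing in for HDFS, the sandbox, and the HBase table are all assumptions made for the example.

```python
import random
from collections import defaultdict

SAMPLE_RATE = 0.01  # assumed fraction of raw log lines copied to the dev sandbox


def import_log(lines, rng, sample_rate=SAMPLE_RATE):
    """Import a catcher log: keep every raw line, sample some into a sandbox."""
    raw, sandbox = [], []
    for line in lines:
        raw.append(line)              # the raw data is always kept
        if rng.random() < sample_rate:
            sandbox.append(line)      # optional sample for job development
    return raw, sandbox


def map_phase(raw):
    """Parse hypothetical 'account,event' lines and drop malformed ones."""
    for line in raw:
        parts = line.split(",")
        if len(parts) == 2:
            yield parts[0], parts[1]


def reduce_phase(pairs, trigger_threshold):
    """Count events per account; return cooked rows and triggered accounts."""
    counts = defaultdict(lambda: defaultdict(int))
    for account, event in pairs:
        counts[account][event] += 1
    rows = {acct: dict(events) for acct, events in counts.items()}
    triggers = sorted(acct for acct, events in rows.items()
                      if sum(events.values()) >= trigger_threshold)
    return rows, triggers


if __name__ == "__main__":
    log = ["alice,login", "alice,send", "bob,login", "garbage", "alice,send"]
    raw, sandbox = import_log(log, random.Random(0))
    rows, triggers = reduce_phase(map_phase(raw), trigger_threshold=3)
    print(rows)
    print(triggers)
```

Because the raw log is kept untouched, a bad load can in principle be replayed by re-running the map and reduce steps over the same input, which is one of the advantages the batch-loading slide lists.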

    Our next-generation model will rely on HBase itself being more stable, and will heavily leverage coprocessors to do a lot of what we're doing now with MapReduce.

    A big obstacle for us is getting MapReduce and HBase to play nicely together.

    From what I've seen, bigger hardware is starting to become more popular for running HBase, and we believe it's essential. We've floated between an 8 and 16 GB heap for the RegionServer. For this application, I believe we're currently using 16 GB.

    Getting GC tuning and the IPC timeouts in HBase/ZooKeeper correct is critically important.

    System tuning is also very important. Depending on which flavor of Linux you're running, the stock configuration may be completely inappropriate for the needs of an HBase/Hadoop complex. In particular, look at the kernel's IO scheduler and VM settings.

    This application was built a short while after we started our trial-by-fire with HBase on the previous application. It was a different development team with input from the engineers working on the previously discussed application.

    This application has the same event catcher layer for the same reasons, but it has always written directly to HBase. We import data into a raw table and then process that table with MapReduce, writing the output into a cooked table. There's a much lower number of events here, but it spikes up significantly during the MapReduce phase.

    It's exactly the same class of hardware with the same ancillary junk as the previous app.
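The Contact History slide notes that Memcached sits in front of HBase to reduce query load. One common way to arrange that is a cache-aside read path; the source does not say which pattern is actually used, so the sketch below is only an illustration of the idea. The `CachedReader` class is hypothetical, and plain dicts stand in for both Memcached and the HBase table (no expiry or eviction is modeled).

```python
class CachedReader:
    """Cache-aside reads: try the cache first, fall back to the backing store."""

    def __init__(self, store, cache=None):
        self.store = store                            # stand-in for an HBase table
        self.cache = {} if cache is None else cache   # stand-in for Memcached
        self.store_reads = 0                          # how often the store was hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # hit: the store is never touched
        self.store_reads += 1
        value = self.store.get(key)       # miss: one read against the store
        if value is not None:
            self.cache[key] = value       # populate for later readers
        return value

    def put(self, key, value):
        self.store[key] = value
        self.cache.pop(key, None)         # invalidate so readers do not see stale data


if __name__ == "__main__":
    reader = CachedReader({"u1": ["a@example.com"]})
    reader.get("u1")
    reader.get("u1")
    print(reader.store_reads)  # repeated reads of one key reach the store once
```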

    Most of the query load is actually farmed out to memcache.

    Yes, this is a relatively straightforward design.

    Exploding tables might be a better name for this, since it's an across-the-board sort of thing.
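The slide's advice is to turn off automatic splitting and run a custom rolling splitter that takes time and load into account. The selection step of such a splitter might look like the sketch below. Everything here is an assumption for illustration: the `Region` record, the size and load thresholds, the quiet-hours window, and the batch size are hypothetical, and the code only picks candidates. It does not talk to HBase.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    size_gb: float
    requests_per_sec: float


def pick_regions_to_split(regions, hour, max_size_gb=10.0, max_load=500.0,
                          batch=2, quiet_hours=range(1, 5)):
    """Return the names of at most `batch` regions to split in this pass.

    Only oversized regions qualify, busy regions are skipped, and nothing is
    split outside the quiet window. The largest regions go first.
    """
    if hour not in quiet_hours:
        return []
    candidates = [r for r in regions
                  if r.size_gb > max_size_gb and r.requests_per_sec <= max_load]
    candidates.sort(key=lambda r: r.size_gb, reverse=True)
    return [r.name for r in candidates[:batch]]


if __name__ == "__main__":
    regions = [
        Region("r1", 12.0, 100.0),
        Region("r2", 30.0, 50.0),
        Region("r3", 25.0, 900.0),   # oversized but too busy right now
        Region("r4", 4.0, 10.0),     # small enough, leave alone
        Region("r5", 11.0, 20.0),
    ]
    print(pick_regions_to_split(regions, hour=3))   # ['r2', 'r1']
    print(pick_regions_to_split(regions, hour=14))  # []
```

Running the selection repeatedly, a couple of regions per pass, spreads the split work out instead of letting every region split at once after a large batch insert.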

    Backups, of course, are obvious. We've run into three catastrophic data loss events, once each on three different clusters. The first was during a burn-in phase for the Contact History application I described earlier. At that time, the data it had accumulated over the week or so that it had been running wasn't considered essential, so we were able to truncate and move along.

    Another time, on a separate plain Hadoop cluster, an unintentionally malicious user actually managed to delete my backups and corrupt the NameNode's event log. Luckily, that data was restorable from another source.

    The last time was with the Activity Profiler application. Basically, having data backups saved the day.

    This is our working model for a next-generation HBase system.

    It is currently being prototyped with the cooperation of our Engineering and Operations teams.

    The key design concept is to allow for a great deal of flexibility and re-use, and it centers around the idea of installing a fairly dynamic rules engine at both the event collection and event storage layers.

    Hopefully we will be presenting it soon.