HBase @ Meetup

Upload: oleksiy-kovyrin

Post on 30-May-2018


  • 8/14/2019 HBase @ Meetup

    1/21

    HBase @ Meetup
    Gary Helmling, Lead SW Engineer


    The Solution

    Show activity from all your groups in one place
      real-time updates
      better discovery of what's going on
      find new ways to participate and get to know your groups


    Challenges

    Normalized schema
      Each type of activity requires querying a separate table
      already wasn't scaling at the group level

    Query efficiency
      Activity occurs at group level
      Members can be in hundreds of groups
      For member home page we need activity from all groups, ordered by most recent
      N subqueries by group ID, merged back by descending timestamp
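The fan-out described above (one subquery per group, results merged back newest-first) can be sketched as a k-way merge. This is only an illustration of why the approach gets expensive as group counts grow: `FeedMerge`, `Activity`, and `mergeNewestFirst` are hypothetical names, and each per-group list is assumed to arrive already sorted by descending timestamp.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merge per-group result lists (each already sorted
// newest-first) into one feed ordered by descending timestamp.
class FeedMerge {

    // Minimal stand-in for one activity row; not the real schema.
    static final class Activity {
        final int groupId;
        final long timestamp;

        Activity(int groupId, long timestamp) {
            this.groupId = groupId;
            this.timestamp = timestamp;
        }
    }

    // Pairs the current head of one group's results with the remainder.
    private static final class Cursor {
        Activity head;
        final Iterator<Activity> rest;

        Cursor(Iterator<Activity> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    static List<Activity> mergeNewestFirst(List<List<Activity>> perGroup, int limit) {
        // Max-heap ordered by the head timestamp of each group's list.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            (a, b) -> Long.compare(b.head.timestamp, a.head.timestamp));
        for (List<Activity> group : perGroup) {
            if (!group.isEmpty()) {
                heap.add(new Cursor(group.iterator()));
            }
        }
        List<Activity> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < limit) {
            Cursor c = heap.poll();
            merged.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c); // re-insert with its new head
            }
        }
        return merged;
    }
}
```

With N groups, every page load pays for N subqueries before the merge even starts, which is the cost the rest of the deck sets out to avoid.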


    Why HBase?

    We own infrastructure, no usage limits
    Data model
      Semi-structured data in HBase (easily handles multiple types in same table)
      Time-series ordered
    Scaling is built in (just add more servers)
      But extra indexing is DIY
    Very active developer community
    Established, mature project (in relative terms!)
    Matches our own toolset (Java/Linux based)


    What is HBase? Data Storage

    Table
      Regions, defined by row [start key, end key)
        Store, 1 per family
          1+ Store Files (HFile format on HDFS)

    (table, rowkey, family, column, timestamp) = value
    Everything is byte[]
    Rows are ordered sequentially by key
    Special tables: -ROOT-, .META.
      Tell clients where to find user data
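Because rows are sorted and each region covers a half-open range [start key, end key), locating the region for a row key amounts to finding the greatest region start key that is less than or equal to it. The sketch below models that lookup with a `TreeMap`. It is a simplification, not HBase client code: `RegionLookup` and the server names are hypothetical, and keys are `String` rather than `byte[]` (for ASCII keys the ordering is the same).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a region directory: region start key -> server.
class RegionLookup {

    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    // The first region of a table starts at the empty key "".
    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // A region holds [start key, end key), so the owner of rowKey is the
    // region with the greatest start key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = serverByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers key: " + rowKey);
        }
        return entry.getValue();
    }
}
```

The start key is inclusive and the end key exclusive, so a key equal to a region boundary belongs to the region that starts there.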


    HBase Architecture

    Courtesy of Lars George, from http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html


    What is HBase? Data Access

    Random access (Gets)
      by row key only
    Sequential reads (Scans)
      starting row key
      where you stop is as important as where you start
      ending row key (optional)
      server-side filter (optional)
    Writes (Puts)
      No insert vs. update distinction
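The three access patterns above can be modelled on a sorted map to make their semantics concrete. This is a toy model under stated assumptions, not the HBase client API: `SortedTableModel` is a hypothetical name, a row is reduced to a single string value, and the scan takes an explicit stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a table as a map sorted by row key.
class SortedTableModel {

    private final TreeMap<String, String> rows = new TreeMap<>();

    // Put: one operation for both insert and update.
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Get: random access by exact row key only.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan: sequential read over [startRow, stopRow). Without a tight stop
    // row, the scan keeps reading into rows the caller never wanted.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).values());
    }
}
```

There is no lookup by value anywhere in this model, which is why finding rows by column value needs the separate index tables described later.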


    How It Works: Storing activity data in HBase

    FeedItem: stores activity data for all types
      keyed by group and descending timestamp
        ch<group ID>-ts<descending timestamp>-<item type>-<item ID>
      each row only contains data for that type

      Row Key               info:                            content:
      ch1261585-ts9223...   item_type = chapter_greeting     greeting = Hi, Gary
                            target_greeting = 8104438
      ch1261585-ts9223...   item_type = new_discussion       title = Improvements
                            target_forum = 847743            body = When a discussion is created...
                            target_thread = 7369603

    MemberFeedIndex: index of FeedItem rows from all of a member's groups
      one row per member (keyed by member ID)
      columns store refs to FeedItem row keys for that member's groups
      TTL of 2 months expires old index values

      Row Key    item:
      4679998    ch176399-ts9223370788400750807-mem-10044424 = new_member
                 ch1261585-ts9223370787431124807-ptag-8525047 = photo_tag
                 ...
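The example keys above are consistent with a descending timestamp computed as `Long.MAX_VALUE` minus the epoch time in milliseconds, so that newer items sort first within a group. The sketch below builds keys on that assumption. `FeedItemKeys` and its method names are hypothetical, and the segment layout is inferred from the example keys rather than taken from Meetup's code.

```java
// Hypothetical key builder; layout inferred from the example row keys.
class FeedItemKeys {

    // Larger (newer) timestamps map to smaller values, so they sort first.
    static long invert(long epochMillis) {
        return Long.MAX_VALUE - epochMillis;
    }

    static String rowKey(long groupId, long epochMillis, String itemType, long itemId) {
        return "ch" + groupId + "-ts" + invert(epochMillis) + "-" + itemType + "-" + itemId;
    }
}
```

For present-day epoch values the inverted timestamp always has 19 digits, so comparing keys as strings or bytes agrees with comparing them numerically. For example, `FeedItemKeys.rowKey(1261585, 1249423651000L, "ptag", 8525047)` yields `ch1261585-ts9223370787431124807-ptag-8525047`, matching one of the keys shown above.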


    How It Works: Secondary index tables

    Still need to find rows by column values
      tried tableindexed contrib (0.19 release): high CPU usage & contention on scans
      decided to update to 0.20 release for other performance improvements
      built secondary indexing into app layer

    Separate table per indexed column
      FeedItem info:actor_member indexed by FeedItem-by_actor_member
      Index table rows keyed by column value and descending timestamp
        <column value>-<Long.MAX_VALUE - timestamp>-<FeedItem row key>
      Zero pad numeric values (or big-endian representation) for correct byte ordering
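Zero padding matters because keys are compared byte by byte: unpadded, "10" sorts before "9". A minimal sketch of an index key builder follows, assuming a 10-digit pad width (which matches the example index keys in this deck) and the `Long.MAX_VALUE - timestamp` inversion. `IndexKeys` and its methods are hypothetical names.

```java
// Hypothetical index key builder: <padded value>-<inverted ts>-<row key>.
class IndexKeys {

    // Fixed-width decimal so that byte order equals numeric order
    // for non-negative values up to 10 digits.
    static String pad(long value) {
        return String.format("%010d", value);
    }

    static String indexKey(long columnValue, long epochMillis, String feedItemRowKey) {
        return pad(columnValue) + "-" + (Long.MAX_VALUE - epochMillis) + "-" + feedItemRowKey;
    }
}
```

For example, `IndexKeys.pad(2851766)` gives `0002851766`. All index rows for one member then sit next to each other, newest first, and can be read with a single bounded scan.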


    How It Works: Secondary index tables

    ex. FeedItem-by_actor_member (indexes FeedItem)

      Row Key                                  info:                           __idx__:
      0002851766-9223370783553935005-rowkey    actor_member = 2851766          row = ch1143475-ts9223370783553935005-rsvp-54704795
                                               item_type = new_rsvp
                                               pub_date =
      0004679998-9223370783650851832-rowkey    actor_member = 4679998          row = ch1261585-ts9223370783650851832-disc-7369603
                                               item_type = new_discussion
                                               pub_date =

    FeedItem

      Row Key                                          info:                           content:
      ch1143475-ts9223370783553935005-rsvp-54704795    actor_member = 2851766          comment = See you there
                                                       item_type = new_rsvp
                                                       pub_date =
      ch1261585-ts9223370783650851832-disc-7369603     actor_member = 4679998          title = Next month
                                                       item_type = new_discussion      body = ...
                                                       pub_date =
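The relationship between the two tables can be sketched with two sorted maps: every save writes the item row and a matching index row, and a lookup by member is a prefix scan over the index. This models the idea only. `AppLevelIndex` is a hypothetical name, rows are reduced to a single string, and none of this is the Meetup.Beeno implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of application-level secondary indexing.
class AppLevelIndex {

    // Main table: FeedItem row key -> row content (simplified to one string).
    final TreeMap<String, String> feedItem = new TreeMap<>();
    // Index table: index key -> FeedItem row key.
    final TreeMap<String, String> byActorMember = new TreeMap<>();

    void save(String rowKey, long actorMember, long pubDateMillis, String content) {
        feedItem.put(rowKey, content);
        String indexKey = String.format("%010d", actorMember)
            + "-" + (Long.MAX_VALUE - pubDateMillis)
            + "-" + rowKey;
        byActorMember.put(indexKey, rowKey);
    }

    // Prefix scan: all index rows for one member, newest first.
    List<String> findRowKeysByActor(long actorMember) {
        String padded = String.format("%010d", actorMember);
        String start = padded + "-";
        // '.' is the next character after '-', so this bounds the prefix.
        String stop = padded + ".";
        return new ArrayList<>(byActorMember.subMap(start, true, stop, false).values());
    }
}
```

The application owns both writes, so keeping the index consistent with the main table is its responsibility; that is the "DIY" cost of this design.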


    Interacting with HBase: Meetup.Beeno

    Java Beans mapped to HBase tables

    package com.meetup.feeds.db;

    ...

    @HEntity(name="FeedItem")
    public class FeedItem implements Externalizable {

        ...

        @HRowKey
        public String getId() { return this.id; }
        public void setId(String id) { this.id = id; }

        @HProperty(family="info", name="actor_member",
                   indexes = { @HIndex(date_col="info:pub_date", date_invert=true,
                                       extra_cols={"info:item_type"}) } )
        public Integer getMemberId() { return this.memberId; }
        public void setMemberId(Integer id) { this.memberId = id; }
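To illustrate how annotation-driven mapping of this kind can work in general, the sketch below reads a runtime annotation from getter methods via reflection and produces "family:column" pairs. It does not use Beeno's annotations or reproduce its internals: `@Column`, `Item`, and `toColumns` are hypothetical stand-ins invented for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of bean-to-column mapping via reflection.
class AnnotationMappingSketch {

    // Stand-in annotation; retained at runtime so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Column {
        String family();
        String name();
    }

    // Example bean with two mapped getters.
    static class Item {
        private final Integer memberId;
        private final String itemType;

        Item(Integer memberId, String itemType) {
            this.memberId = memberId;
            this.itemType = itemType;
        }

        @Column(family = "info", name = "actor_member")
        public Integer getMemberId() { return memberId; }

        @Column(family = "info", name = "item_type")
        public String getItemType() { return itemType; }
    }

    // Collects "family:name" -> value for every annotated getter.
    static Map<String, String> toColumns(Object bean) {
        Map<String, String> columns = new TreeMap<>();
        try {
            for (Method m : bean.getClass().getDeclaredMethods()) {
                Column c = m.getAnnotation(Column.class);
                if (c != null) {
                    columns.put(c.family() + ":" + c.name(), String.valueOf(m.invoke(bean)));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not read bean", e);
        }
        return columns;
    }
}
```

A real mapper would also handle the reverse direction (columns back to setters) and value serialization to `byte[]`, both omitted here.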


    Interacting with HBase: Services

    Base service class provides round-tripping based on annotations

    public class EntityService<T> {

        public T get( String rowKey ) throws HBaseException {}

        public void save( T entity ) throws HBaseException {}

        public void saveAll( List<T> entities ) throws HBaseException {}

        public void delete( String rowKey ) throws HBaseException {}

        public Query<T> query() throws MappingException {}

    }

    easily extended for specific needs

    Almost all HBase interaction goes through service instances.


    Interacting with HBase: Queries

    Simple Query API uses mappings and secondary index tables

    Find all items related to a discussion

    FeedItemService service = new FeedItemService(DiscussionItem.class);
    Query query =
        service.query()
               .using( Criteria.eq("threadId", threadId) );
    List items = query.execute();

    Find all greetings from a given member

    FeedItemService service = new FeedItemService(GreetingItem.class);
    Query query =
        service.query()
               .using( Criteria.eq("memberId", memberId) )
               .where( Criteria.eq("type", FeedItem.ItemType.CHAPTER_GREETING) );
    List items = query.execute();


    Interacting with HBase: Member Feed Retrieval

    Get latest activity from all of a member's groups using MemberFeedIndex

    // retrieve the member's index record
    HTable mfiTable = HUtil.getTable("MemberFeedIndex");
    Get get = new Get( Bytes.toBytes(String.valueOf(memberId)) );
    get.addFamily( Bytes.toBytes("item") );
    Result r = mfiTable.get(get);

    FeedItemService service = new FeedItemService();
    Set<IndexKey> sortedKeys = sortKeys(r);
    List<FeedItem> items = new ArrayList<FeedItem>();

    // for each index col get the entity record
    for (IndexKey key : sortedKeys) {
        FeedItem item = service.get(key.getKey());
        if (item != null)
            items.add(item);
    }

    // populate member and chapter info
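The slides do not show how `sortKeys` orders the index columns. One plausible approach, given that each column name is a FeedItem row key containing an inverted timestamp, is to sort by that timestamp segment so items from different groups interleave newest-first. The sketch below is a guess at that idea, not the actual helper: `IndexKeySort` and its methods are hypothetical and work on plain strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical ordering of FeedItem row keys across groups.
class IndexKeySort {

    // Extracts the digits between "-ts" and the following '-'.
    static long invertedTimestamp(String feedItemKey) {
        int start = feedItemKey.indexOf("-ts") + 3;
        int end = feedItemKey.indexOf('-', start);
        return Long.parseLong(feedItemKey.substring(start, end));
    }

    // Smaller inverted timestamp means newer item, so ascending order
    // on the inverted value yields newest-first.
    static List<String> newestFirst(List<String> feedItemKeys) {
        List<String> sorted = new ArrayList<>(feedItemKeys);
        sorted.sort(Comparator.comparingLong(IndexKeySort::invertedTimestamp));
        return sorted;
    }
}
```

Sorting on the whole key would group items by chapter first, which is why the timestamp segment has to be pulled out when merging across groups.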


    HBase @ Meetup: Issues along the way

    Performance testing
      Product targeting 3 of our highest traffic pages, simulating load is hard
      Started with load scripts
      Moved to testing with live traffic
        Use AJAX calls to simulate requests
        Selective enable for X% of traffic
      Launched data collection/write traffic first
        Allowed tweaking configuration before impacting user experience
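The slides do not say how the X% of traffic was selected. One common way to do a selective enable is to bucket requests deterministically by member ID, so the same member consistently falls in or out of the test. The sketch below assumes that approach; `TrafficSampler` and `enabledFor` are hypothetical names.

```java
// Hypothetical percentage rollout: bucket members into 100 slots.
class TrafficSampler {

    // Returns true when the member falls in the first `percent` buckets.
    // floorMod keeps the bucket in 0..99 even for negative IDs.
    static boolean enabledFor(long memberId, int percent) {
        return Math.floorMod(memberId, 100L) < percent;
    }
}
```

Raising `percent` only ever adds members to the enabled set, so load on the new system can be ramped up gradually while configuration is tuned.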


    HBase @ Meetup: Issues along the way

    High CPU / Concurrency issues
      Updated to 0.20 release for performance gains across the board
      Replaced tableindexed usage with application level secondary indexing

    Hot regions - profile page hits small table every pageload
      Force split table to distribute across multiple servers
      Newest region still handling high load
        changed index keying to -- for even distribution

    I/O Heavy load / MemberFeedIndex table growing
      Lowered MemberFeedIndex time-to-live to 2 months
      Enabled LZO compression


    HBase @ Meetup: Current Status

    Live traffic growing
      Cluster handling ~2.5k to 3k requests/sec
      50+% still write traffic
      ~17% of page views hit HBase (for reads)
      Expanding to 30% of page views in coming months

    Meetup.Beeno now open-source on GitHub: http://github.com/ghelmling/meetup.beeno

    Next up
      Continue tweaking
      Site analytics