Page 1: Best Practices for Architecting High Volume, High Performance Publishing for Data Intensive Website

Copyright Edmunds Inc. (the “Company”). All rights reserved. Edmunds®, Edmunds.com®, the Edmunds.com car design, Inside Line℠, CarSpace℠ and AutoObserver® are proprietary trademarks of the Company. This document contains proprietary and/or confidential information of the Company. No part of this document or the information it contains may be used, or disclosed to any person or entity, for any purpose other than advancing the best interests of the Company, and any such disclosure requires the express approval of the Company.

Best Practices for Architecting High Volume, High Performance Publishing for Data Intensive Web Site

October 23, 2010

Greg Rokita

Director, Sr. Architect

Edmunds.com

Page 2

Assumptions

o Knowledge of Java

o Basic understanding of Spring

o Basic knowledge of JMS

Page 3

Agenda

o Common Enterprise Problems

o Layered Architecture

o ActiveMQ and Virtual Topics

o Camel

o Thrift & Versioning

o Retry and Throttling Mechanisms

o Monitoring

o Q&A

Page 4

Common Enterprise Problems

o Multiple:

  o Environments (Prod, Test, Dev, etc.)

  o Data Centers (Los Angeles, New York, Amazon EC2, etc.)

  o Sites

  o Applications (Solr, Coherence, etc.)

  o Data Sets (inventory, user data, pricing data, etc.)

o Data Format Changes

o Components Fail

Page 5

What Evolved from the Efforts

o Message:

  o Delivery

  o Routing

  o Persistence

  o Durability

  o Retries

  o Throttling

  o Versioning

  o Monitoring

Technologies: ActiveMQ, Camel, Thrift

Page 6

ActiveMQ Broker

Page 7

Publish/Subscribe

o Producers decoupled from consumers: cool idea

o JMS durable topics suck:

  o a message consumer is created with a JMS client ID and a durable subscriber name

  o only one consumer can be active for a given client ID and subscriber name

  o CAN'T fail over the subscriber if the one process running that one consumer thread dies

  o CAN'T load-balance messages

Page 8

Virtual Topics

Virtual topic: VirtualTopic.Vehicle

Queues:

o Consumer.Queue1.VirtualTopic.Vehicle

o Consumer.Queue2.VirtualTopic.Vehicle
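To make the fan-out concrete, here is a minimal in-memory sketch in plain Java. It does not talk to ActiveMQ; the class `VirtualTopicSketch` and its methods are hypothetical and only model the naming convention above: a message published to `VirtualTopic.Vehicle` is copied into every `Consumer.<name>.VirtualTopic.Vehicle` queue, and consumers sharing one queue compete for its messages, which is what restores load balancing and failover.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** In-memory model of virtual topic fan-out (hypothetical; no broker involved). */
final class VirtualTopicSketch {

    // queue name -> pending messages
    private final Map<String, Queue<String>> queues = new LinkedHashMap<>();

    /** Declares a consumer queue using the Consumer.<name>.<virtualTopic> convention. */
    void declareQueue(String consumerName, String virtualTopic) {
        queues.putIfAbsent("Consumer." + consumerName + "." + virtualTopic, new ArrayDeque<>());
    }

    /** Publishing to the virtual topic copies the message into every matching consumer queue. */
    void publish(String virtualTopic, String message) {
        String suffix = "." + virtualTopic;
        for (Map.Entry<String, Queue<String>> e : queues.entrySet()) {
            if (e.getKey().startsWith("Consumer.") && e.getKey().endsWith(suffix)) {
                e.getValue().add(message);
            }
        }
    }

    /**
     * Drains one queue with several competing consumers (round-robin): each message goes
     * to exactly one of them, so consumers on the same queue share the load.
     */
    List<List<String>> drain(String queueName, int consumerCount) {
        List<List<String>> received = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            received.add(new ArrayList<>());
        }
        Queue<String> queue = queues.get(queueName);
        int next = 0;
        while (queue != null && !queue.isEmpty()) {
            received.get(next).add(queue.poll());
            next = (next + 1) % consumerCount;
        }
        return received;
    }

    public static void main(String[] args) {
        VirtualTopicSketch broker = new VirtualTopicSketch();
        broker.declareQueue("Queue1", "VirtualTopic.Vehicle");
        broker.declareQueue("Queue2", "VirtualTopic.Vehicle");
        broker.publish("VirtualTopic.Vehicle", "vehicle-1");
        broker.publish("VirtualTopic.Vehicle", "vehicle-2");
        // prints [[vehicle-1], [vehicle-2]]: two consumers split Queue1
        System.out.println(broker.drain("Consumer.Queue1.VirtualTopic.Vehicle", 2));
        // prints [[vehicle-1, vehicle-2]]: Queue2 still got its own copy of everything
        System.out.println(broker.drain("Consumer.Queue2.VirtualTopic.Vehicle", 1));
    }
}
```

Each queue gets a full copy (topic semantics), while consumers within one queue compete (queue semantics); that combination is what durable topic subscribers could not offer.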

Page 9

Virtual Topics

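import org.apache.activemq.broker.ProducerBrokerExchange;
import org.apache.activemq.broker.region.Destination;
import org.apache.activemq.broker.region.DestinationFilter;
import org.apache.activemq.broker.region.DestinationInterceptor;
import org.apache.activemq.command.Message;

// Broker plug-in: wraps each destination so a message is forwarded only if it passes a property filter
public class QueueDestinationInterceptor implements DestinationInterceptor {

    public synchronized Destination intercept(final Destination destination) {
        return new DestinationFilter(destination) {
            @Override
            public void send(ProducerBrokerExchange context, Message message) throws Exception {
                // helper not shown on the slide: returns true when this destination should NOT get the message
                if (applyFilterBasedOnMessageProperties(destination, message)) {
                    return;
                }
                destination.send(context, message);
            }
        };
    }

    // any other DestinationInterceptor methods required by your ActiveMQ version are omitted
}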

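o By default, a message sent to a virtual topic is delivered to ALL of its consumer queues

o Solution: an ActiveMQ DestinationInterceptor plug-in that filters each queue's messages based on message properties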
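The interceptor above is a decorator: it wraps the real destination and decides per message whether to pass it on. The following plain-Java sketch models only that decorator idea; `Destination`, `RecordingQueue` and `intercept` are hypothetical stand-ins, not ActiveMQ types, and the filter is assumed to look at a `service` property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Plain-Java model of the destination-wrapping decorator (no ActiveMQ types). */
final class InterceptorSketch {

    /** Hypothetical stand-in for a broker destination. */
    interface Destination {
        void send(Map<String, String> properties, String body);
    }

    /** Records whatever reaches it, like a consumer queue. */
    static final class RecordingQueue implements Destination {
        final List<String> received = new ArrayList<>();

        @Override
        public void send(Map<String, String> properties, String body) {
            received.add(body);
        }
    }

    /** Wraps a destination; messages whose properties fail the filter are silently dropped. */
    static Destination intercept(Destination next, Predicate<Map<String, String>> accepts) {
        return (properties, body) -> {
            if (!accepts.test(properties)) {
                return; // filtered out: the wrapped queue never sees this message
            }
            next.send(properties, body);
        };
    }

    public static void main(String[] args) {
        RecordingQueue dealerQueue = new RecordingQueue();
        Destination filtered = intercept(dealerQueue, p -> "Inventory".equals(p.get("service")));
        filtered.send(Map.of("service", "Inventory"), "inventory update");
        filtered.send(Map.of("service", "Pricing"), "pricing update");
        System.out.println(dealerQueue.received); // prints [inventory update]
    }
}
```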

Page 10

Camel

from(A).filter(header("type").isEqualTo("Widget")).to(B)

Endpoint A -> Filter -> Endpoint B

Page 11

Camel Cont.

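activemq:queue:A -> Filter -> activemq:queue:B

import org.apache.camel.builder.RouteBuilder;

RouteBuilder builder = new RouteBuilder() {

    @Override
    public void configure() {
        // consume from queue A; only messages whose "type" header is "Widget" reach queue B
        from("activemq:queue:A")
            .filter(header("type").isEqualTo("Widget"))
            .to("activemq:queue:B");
    }
};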
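To show what the filter step does to the message stream, here is a toy fluent route in plain Java that mimics the shape of `from(A).filter(...).to(B)`. It is not Camel; `MiniRoute`, `headerEquals` and the in-memory lists standing in for queues A and B are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy route with the same shape as Camel's from(A).filter(...).to(B); not Camel itself. */
final class MiniRoute {

    /** Hypothetical message: headers plus a body. */
    record Message(Map<String, String> headers, String body) { }

    private final List<Message> source;
    private Predicate<Message> condition = m -> true;
    private List<Message> target = new ArrayList<>();

    private MiniRoute(List<Message> source) {
        this.source = source;
    }

    static MiniRoute from(List<Message> source) {
        return new MiniRoute(source);
    }

    /** Similar in spirit to header(name).isEqualTo(value). */
    static Predicate<Message> headerEquals(String name, String value) {
        return m -> value.equals(m.headers().get(name));
    }

    MiniRoute filter(Predicate<Message> predicate) {
        condition = condition.and(predicate);
        return this;
    }

    MiniRoute to(List<Message> target) {
        this.target = target;
        return this;
    }

    /** Consumes every source message; only those passing the filter reach the target. */
    void run() {
        for (Message m : source) {
            if (condition.test(m)) {
                target.add(m);
            }
        }
        source.clear();
    }

    public static void main(String[] args) {
        List<Message> queueA = new ArrayList<>(List.of(
                new Message(Map.of("type", "Widget"), "widget-1"),
                new Message(Map.of("type", "Gadget"), "gadget-1")));
        List<Message> queueB = new ArrayList<>();
        from(queueA).filter(headerEquals("type", "Widget")).to(queueB).run();
        System.out.println(queueB.size()); // prints 1
    }
}
```

Messages that fail the filter are consumed from A but never forwarded, which matches the Message Filter pattern the route expresses.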

Page 12

Camel Cont.

activemq:queue:A -> Filter -> activemq:queue:B

<camelContext errorHandlerRef="errorHandler" xmlns="http://camel.apache.org/schema/spring">
    <route>
        <from uri="activemq:queue:A"/>
        <filter>
            <!-- the foo: prefix must be bound to a namespace (xmlns:foo) that is in scope -->
            <xpath>/foo:person[@name='James']</xpath>
            <to uri="activemq:queue:B"/>
        </filter>
    </route>
</camelContext>

Page 13

Camel Cont.

<route>
    <!-- fire every 1000 ms and call someMethodName() on the bean registered as myBean -->
    <from uri="timer://foo?fixedRate=true&amp;period=1000"/>
    <to uri="bean:myBean?method=someMethodName"/>
</route>


o Example endpoints:

  o Queue

  o Topic

  o Timer

  o Email

  o Log

  o JavaBean

  o FTP

  o HDFS

  o HTTP

  o XSLT
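For intuition about the timer route above, this plain-JDK sketch does roughly what `timer://foo?fixedRate=true&period=1000` feeding `bean:myBean?method=someMethodName` does: call a method on a fixed-rate schedule. `TimerToBeanSketch` and `runTicks` are hypothetical names, and the bean method is just a `Runnable`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Plain-JDK analogue of a timer endpoint driving a bean method (not Camel). */
final class TimerToBeanSketch {

    /**
     * Calls beanMethod at a fixed rate until it has fired `ticks` times (or 5 s pass),
     * then stops the scheduler. Returns true if all ticks fired in time.
     */
    static boolean runTicks(int ticks, long periodMillis, Runnable beanMethod) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch fired = new CountDownLatch(ticks);
        try {
            // fixedRate=true: schedule against the start time rather than the end of the last run
            scheduler.scheduleAtFixedRate(() -> {
                beanMethod.run();
                fired.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            return fired.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        runTicks(3, 1000, () -> System.out.println("someMethodName() called"));
    }
}
```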

Page 14

Source/Target Selectors

Field         Example Values                                 Purpose
Environment   PROD, TEST, DEV                                The staging environment in the promotional cycle
Data Center   LAX, EC2                                       The data center where the environment is located
Site          Edmunds, InsideLine                            Defines the site as a set of services
Application   Digital Asset Manager, Inventory Application   Deployment Unit

Page 15

Topic Selectors

Field     Example Values            Purpose
Type      Publish, Audit, Control   Defines the type of the message
Service   Inventory, Pricing        Type of data being sent

Page 16

Producer / Consumer Matching

Producer
  I am:          Prod / Lax / Edmunds / Inventory
  Send to:       Prod, Test / Lax, EC2 / Edmunds / Dealer

Consumer
  I am:          Test / EC2 / Edmunds / Dealer
  Receive from:  Prod / Lax, EC2 / Edmunds / Inventory

Broker (DestinationInterceptor)
  Virtual topic name:  PublishInventory
  Queue name:          PublishInventory
  Match!
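The matching rule in the example above can be sketched as a two-way check: the topic names must agree, the consumer's identity must be among the producer's send-to targets, and the producer's identity must be among the consumer's receive-from sources. This is a hypothetical model (`SelectorMatchSketch`, `Identity` and `Targets` are illustrative names); the slide shows the broker doing this in a DestinationInterceptor, whose real implementation may differ. The topic name is assumed to be the concatenation of the Type and Service selectors, as in `PublishInventory`.

```java
import java.util.Set;

/** Hypothetical two-way producer/consumer matching on the selector fields. */
final class SelectorMatchSketch {

    /** Where an endpoint lives: environment, data center, site, application. */
    record Identity(String environment, String dataCenter, String site, String application) { }

    /** Which identities an endpoint will talk to; each field is a set of allowed values. */
    record Targets(Set<String> environments, Set<String> dataCenters,
                   Set<String> sites, Set<String> applications) {
        boolean accepts(Identity id) {
            return environments.contains(id.environment())
                    && dataCenters.contains(id.dataCenter())
                    && sites.contains(id.site())
                    && applications.contains(id.application());
        }
    }

    /** Assumed naming rule: type "Publish" + service "Inventory" gives "PublishInventory". */
    static String topicName(String type, String service) {
        return type + service;
    }

    /** Same topic, and each side's identity is allowed by the other side's targets. */
    static boolean matches(String producerTopic, Identity producer, Targets sendTo,
                           String consumerTopic, Identity consumer, Targets receiveFrom) {
        return producerTopic.equals(consumerTopic)
                && sendTo.accepts(consumer)
                && receiveFrom.accepts(producer);
    }

    public static void main(String[] args) {
        // values from the producer/consumer example on this slide
        Identity producer = new Identity("Prod", "Lax", "Edmunds", "Inventory");
        Targets sendTo = new Targets(Set.of("Prod", "Test"), Set.of("Lax", "EC2"),
                Set.of("Edmunds"), Set.of("Dealer"));
        Identity consumer = new Identity("Test", "EC2", "Edmunds", "Dealer");
        Targets receiveFrom = new Targets(Set.of("Prod"), Set.of("Lax", "EC2"),
                Set.of("Edmunds"), Set.of("Inventory"));
        String topic = topicName("Publish", "Inventory");
        System.out.println(matches(topic, producer, sendTo, topic, consumer, receiveFrom)); // prints true
    }
}
```

Because both sides must accept each other, either a producer or a consumer can narrow who it talks to at any selector level.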

Page 17

Thrift: Data + Service + Strong Typing + Versioning

namespace java com.edmunds.inventory.thrift.gen

struct Product {
    1: string productType = "NCI",
    2: map<string, string> vehicleDisplayInfo,
}

struct Inventory {
    1: string id,
    2: string vin,
    3: string franchiseId,
    4: map<string, string> edmundsAttributes,
    5: list<Product> products,
}

service InventoryService {
    oneway void removeInventory(1: Inventory inventory),
    oneway void updateInventory(1: Inventory inventory),
}
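Versioning works because every field carries a numeric id: a reader skips ids it does not know and treats missing ones as unset. The sketch below illustrates only that rule with a made-up text encoding (`id=value` pairs); it is not Thrift's wire format, and `FieldIdVersioningSketch` is a hypothetical name. A real Thrift reader would also fall back to declared defaults such as `productType = "NCI"`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy tagged-field encoding showing why numbered fields allow schema evolution. */
final class FieldIdVersioningSketch {

    /** Encodes fields as "id=value" pairs joined by ';' (values must not contain ';' or '='). */
    static String encode(Map<Integer, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Keeps only field ids this reader knows; unknown ids are skipped, missing ids stay absent. */
    static Map<Integer, String> decode(String wire, Set<Integer> knownIds) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        if (wire.isEmpty()) {
            return fields;
        }
        for (String pair : wire.split(";")) {
            int eq = pair.indexOf('=');
            int id = Integer.parseInt(pair.substring(0, eq));
            if (knownIds.contains(id)) {
                fields.put(id, pair.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // newer writer: sends fields 1, 2 and 3
        Map<Integer, String> newer = new LinkedHashMap<>();
        newer.put(1, "id-42");
        newer.put(2, "VIN123");
        newer.put(3, "franchise-7");
        // older reader that only knows fields 1 and 2 skips field 3 instead of failing
        System.out.println(decode(encode(newer), Set.of(1, 2))); // prints {1=id-42, 2=VIN123}
    }
}
```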

Page 18

Thrift-Camel Integration

Camel

Thrift

Page 19

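Thrift-Camel Integration: Sender

import java.io.ByteArrayOutputStream;
import java.util.Map;
import org.apache.camel.ExchangePattern;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

// Thrift client-side transport: buffers the serialized call and hands it to the sender on flush()
public class ClientTransportImpl extends TTransport {

    private final ByteArrayOutputStream writeBuffer = new ByteArrayOutputStream();
    private SenderInternal senderInternal;

    @Override
    public void write(byte[] buf, int off, int len) {
        writeBuffer.write(buf, off, len);
    }

    @Override
    public void flush() throws TTransportException {
        byte[] buf = writeBuffer.toByteArray();
        writeBuffer.reset();
        try {
            senderInternal.sendThrift(buf);
        } catch (Exception e) {
            throw new TTransportException(e);
        }
    }

    // isOpen(), open(), close() and read(...) omitted
}

public class SenderImpl extends AbstractEndpoint implements Sender, SenderInternal, InitializingBean {

    public void sendThrift(Object object) throws Exception {
        Map<String, Object> headers = initializeMessageHeaders();
        // tells Camel's bean component to invoke executeThrift(...) on the receiving side
        headers.put("CamelBeanMethodName", "executeThrift");
        headers.put("CamelJmsMessageType", getContext().getProtocol().getJmsMessageType());
        doSend(object, headers, ReceiverImpl.getEntryEndpointName(getTopicSelectors().topicName()));
    }

    private void doSend(Object object, Map<String, Object> headers, String entryPointName) {
        // fire-and-forget (InOnly) send to the route's entry endpoint
        producerTemplate.sendBodyAndHeaders(entryPointName, ExchangePattern.InOnly, object, headers);
    }
}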
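The key idea in `ClientTransportImpl` is that the Thrift serializer writes a call in many small pieces, and `flush()` ships the accumulated bytes as one message. Here is a dependency-free sketch of just that buffering behaviour; `BufferingTransportSketch` is hypothetical and a `Consumer<byte[]>` stands in for `senderInternal.sendThrift(...)`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffer writes, ship exactly one message per flush() (hypothetical, no Thrift types). */
final class BufferingTransportSketch {

    private final ByteArrayOutputStream writeBuffer = new ByteArrayOutputStream();
    private final Consumer<byte[]> sender; // stands in for senderInternal.sendThrift(...)

    BufferingTransportSketch(Consumer<byte[]> sender) {
        this.sender = sender;
    }

    /** A serializer may call write() many times for one logical call. */
    void write(byte[] buf, int off, int len) {
        writeBuffer.write(buf, off, len);
    }

    /** Hands the complete payload to the sender as a single message, then resets the buffer. */
    void flush() {
        byte[] payload = writeBuffer.toByteArray();
        writeBuffer.reset();
        sender.accept(payload);
    }

    public static void main(String[] args) {
        List<byte[]> sent = new ArrayList<>();
        BufferingTransportSketch transport = new BufferingTransportSketch(sent::add);
        byte[] a = "update".getBytes(StandardCharsets.UTF_8);
        byte[] b = "Inventory".getBytes(StandardCharsets.UTF_8);
        transport.write(a, 0, a.length);
        transport.write(b, 0, b.length);
        transport.flush();
        // prints 1 message: updateInventory
        System.out.println(sent.size() + " message: " + new String(sent.get(0), StandardCharsets.UTF_8));
    }
}
```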

Page 20

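Thrift-Camel Integration: Receiver

import java.util.Map;
import org.apache.camel.Body;
import org.apache.camel.Headers;
import org.apache.thrift.TProcessor;

public class ReceiverService {

    // invoked by Camel's bean component; enterMessageDeck, callable and findProcessor() are not shown
    public void executeThrift(@Body byte[] byteArray, @Headers Map<String, String> headers) {
        enterMessageDeck.addHeaders(headers);

        ReceiverInternal receiverInternal = (ReceiverInternal) callable.getReceiver();
        receiverInternal.getContext().initialize(headers);

        // feed the serialized bytes to the Thrift processor, which dispatches to the consumer
        ProcessorTransportImpl processorTransport = new ProcessorTransportImpl();
        TProcessor processor = findProcessor();
        processorTransport.messageReceived(byteArray, processor, receiverInternal);
    }
}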

Page 21

Creating a Consumer

import com.edmunds.inventory.thrift.gen.InventoryService;
import org.springframework.stereotype.Component;

@Component("inventoryConsumer")
public class InventoryConsumer extends AbstractDataHandler implements InventoryService.Iface {

    @Override
    public void updateInventory(com.edmunds.inventory.thrift.gen.Inventory inventory) {
        // perform your business logic here
    }

    @Override
    public void removeInventory(com.edmunds.inventory.thrift.gen.Inventory inventory) {
        // Iface requires every service method; handle removals here
    }
}

<bean id="receiver" class="com.edmunds.eps.endpoint.impl.ReceiverImpl">
    <property name="service" value="inventory"/>
    <property name="messageType" value="publish"/>
    <property name="dataCenter" value="lax"/>
    <property name="environment" value="prod"/>
    <property name="site" value="Edmunds"/>
    <property name="application" value="Search"/>
</bean>

Page 22

Creating a Producer

// InventoryService.Client is generated by Thrift from the InventoryService definition
InventoryService.Client inventoryClient = new InventoryService.Client(sender.getProtocol());

inventoryClient.updateInventory( /*inventory object generated by Thrift*/ );

<bean id="sender" class="com.edmunds.eps.endpoint.impl.SenderImpl">
    <property name="service" value="inventory"/>
    <property name="messageType" value="publish"/>
    <property name="dataCenter" value="lax"/>
    <property name="environment" value="prod"/>
    <property name="site" value="Edmunds"/>
    <property name="application" value="Inventory-Source"/>
</bean>

Page 23

Throttling

Dynamically in Java:

getReceiver().getThrottler().setEnabled(true);
getReceiver().getThrottler().setMaximumRequestsPerPeriod(10);
getReceiver().getThrottler().setTimePeriodInMilliseconds(2000);

Statically in a property file, using Spring PropertyOverrideConfigurer:

receiver.throttler.enabled=true
receiver.throttler.maximumRequestsPerPeriod=10
receiver.throttler.timePeriodInMilliseconds=2000

Camel:

from(…).throttle(10).timePeriodMillis(2000).to(…)
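The throttle settings above (10 requests per 2000 ms) can be illustrated with a fixed-window counter. This is a minimal sketch, not Camel's Throttler: Camel delays excess exchanges itself, while this hypothetical `FixedWindowThrottler` only reports how long a caller should wait. Time is passed in explicitly so the behaviour is deterministic.

```java
/** Fixed-window throttle sketch: at most maxRequests per periodMillis (hypothetical). */
final class FixedWindowThrottler {

    private final int maxRequests;
    private final long periodMillis;
    private long windowStart;
    private int used;
    private boolean started;

    FixedWindowThrottler(int maxRequests, long periodMillis) {
        this.maxRequests = maxRequests;
        this.periodMillis = periodMillis;
    }

    /**
     * Returns 0 if a request arriving at nowMillis may proceed (and counts it); otherwise
     * returns the milliseconds until the current window ends. A delayed request is not
     * counted; the caller is expected to wait and ask again.
     */
    synchronized long delayFor(long nowMillis) {
        if (!started || nowMillis - windowStart >= periodMillis) {
            started = true;
            windowStart = nowMillis; // open a new window
            used = 0;
        }
        if (used < maxRequests) {
            used++;
            return 0;
        }
        return windowStart + periodMillis - nowMillis;
    }

    public static void main(String[] args) {
        FixedWindowThrottler throttler = new FixedWindowThrottler(10, 2000);
        for (int i = 1; i <= 11; i++) {
            System.out.println("request " + i + " delay " + throttler.delayFor(0));
        }
    }
}
```

A fixed window is the simplest choice; it allows a burst of up to twice the limit across a window boundary, which a sliding window would avoid.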

Page 24

Retries / Error Handling: a free gift from Camel

getReceiver().getErrorHandler().setUseCollisionAvoidance(true);
getReceiver().getErrorHandler().setUseExponentialBackOff(true);
getReceiver().getErrorHandler().setDelayPattern("5:1000;10:5000;20:20000");
// destination for messages that still fail after all retries
getReceiver().getErrorHandler().setUri("jms:queue:dead");

ExceptionHandler exceptionHandler = new ExceptionHandlerImpl(RegionException.class);
exceptionHandler.setHandled(true);
exceptionHandler.addException(ExceptionA.class);
exceptionHandler.setStop(true);
getReceiver().addExceptionHandler(exceptionHandler);
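Two of the retry settings above can be sketched in plain Java. `delayFor` reads a delay pattern such as `"5:1000;10:5000;20:20000"` the way we understand Camel documents it: attempts 1 to 4 get no pattern delay, 5 to 9 wait 1000 ms, 10 to 19 wait 5000 ms, and 20 onward wait 20000 ms. `backoff` shows exponential back-off with collision avoidance, i.e. a random shift of up to ±`collisionFactor`; that formula and the example factor of 0.15 are assumptions, and `RedeliverySketch` is a hypothetical name.

```java
import java.util.Random;

/** Hypothetical helpers illustrating redelivery delays (not Camel's RedeliveryPolicy). */
final class RedeliverySketch {

    /**
     * Delay for a redelivery attempt under a pattern like "5:1000;10:5000;20:20000",
     * with groups listed in ascending order: the delay of the last group whose starting
     * attempt is <= attempt, or 0 before the first group.
     */
    static long delayFor(String pattern, int attempt) {
        long delay = 0;
        for (String group : pattern.split(";")) {
            String[] parts = group.split(":");
            int fromAttempt = Integer.parseInt(parts[0].trim());
            if (attempt >= fromAttempt) {
                delay = Long.parseLong(parts[1].trim());
            }
        }
        return delay;
    }

    /**
     * Exponential back-off: initialMillis * multiplier^(attempt - 1), then shifted randomly
     * by up to +/- collisionFactor so many clients do not retry in lockstep.
     */
    static long backoff(long initialMillis, double multiplier, int attempt,
                        double collisionFactor, Random random) {
        double base = initialMillis * Math.pow(multiplier, attempt - 1);
        double jitter = (random.nextDouble() * 2 - 1) * collisionFactor; // in [-f, +f)
        return Math.round(base * (1 + jitter));
    }

    public static void main(String[] args) {
        String pattern = "5:1000;10:5000;20:20000";
        for (int attempt : new int[] {1, 5, 10, 20}) {
            System.out.println("attempt " + attempt + " -> " + delayFor(pattern, attempt) + " ms");
        }
        Random random = new Random(42);
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.println("backoff " + attempt + " -> " + backoff(1000, 2.0, attempt, 0.15, random) + " ms");
        }
    }
}
```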

Page 25

Control System

(Diagram: Control Message -> Topic -> Producer A, Producer B, Producer C)

o Control Message:

  o Initiates producer activity

  o Supports bulk, single and multiple loads

  o Indicates target systems for publishing

  o Decouples producer logic from clients of the publishing system

  o Allows all publishing activity to be initiated from a single point

  o Can be sent from JMX or HTTP
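A control message can be modelled as a small value object broadcast on a topic, with each producer acting only on messages for its own service. The sketch below is hypothetical (`ControlSystemSketch`, `LoadType`, `ControlMessage` and its fields are illustrative, not the library's API) and ignores how the message is sent (JMX or HTTP).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a control topic that starts producers. */
final class ControlSystemSketch {

    enum LoadType { BULK, SINGLE, MULTIPLE }

    /** What to publish: service, kind of load, record ids (unused for BULK) and target systems. */
    record ControlMessage(String service, LoadType loadType, List<String> ids, List<String> targets) { }

    /** A producer only reacts to control messages addressed to its own service. */
    static final class Producer {
        final String service;
        final List<String> actions = new ArrayList<>();

        Producer(String service) {
            this.service = service;
        }

        void onControl(ControlMessage msg) {
            if (!service.equals(msg.service())) {
                return; // addressed to another producer
            }
            String scope = msg.loadType() == LoadType.BULK ? "all records" : msg.ids().toString();
            actions.add("publish " + scope + " to " + msg.targets());
        }
    }

    /** Topic semantics: every subscribed producer receives every control message. */
    static void broadcast(List<Producer> producers, ControlMessage msg) {
        for (Producer p : producers) {
            p.onControl(msg);
        }
    }

    public static void main(String[] args) {
        Producer inventory = new Producer("Inventory");
        Producer pricing = new Producer("Pricing");
        broadcast(List.of(inventory, pricing),
                new ControlMessage("Inventory", LoadType.BULK, List.of(), List.of("Search")));
        System.out.println(inventory.actions); // prints [publish all records to [Search]]
        System.out.println(pricing.actions);   // prints []
    }
}
```

Clients only need to know the control message format, not where each producer runs, which is the decoupling the slide describes.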

Page 26

Heartbeat

Page 27

High Level View

Page 28

Summary

o Simple to develop producers and consumers (the library takes care of the plumbing)

o Can deploy producers and consumers “anywhere”

o Can match producers and consumers at any level

o Handles error conditions and throttling

o Type safety

o Versioning

o HA & scalability:

  o Consumers: Virtual Topics, Queues

  o Producers: Control System

  o Broker: Network of Brokers

o Monitoring

Page 29

We are hiring!

http://www.edmunds.com/help/about/jobs

Page 30

Q&A

Greg