Page 1: Building Scalable Solutions for Commerce

www.mindteck.com

Building Scalable Solutions for Commerce

August 2011

http://www.mindteck.com/coe/cloud-computing.html ([email protected])

Version 1.4

Page 2: Building Scalable Solutions for Commerce

Confidential © Mindteck 2011 | 2 | www.mindteck.com

Disclaimer

• Coverage Disclaimer
– I don’t cover every aspect of building large-scale applications, but I’m sincerely working on it!
• Presentation
– I’m only presenting what I’ve understood, learnt and practiced in my experience as an architect
• Objective
– I’m here to share and learn, and I’m sure you have gathered here for the same purpose too!
– Good luck to you all

Page 3: Building Scalable Solutions for Commerce

Just a quote to begin with...

• A scalable system has to account for this:
– The least scalable component of your system becomes the bottleneck for the whole system

Reference http://msdn.microsoft.com/en-us/library/aa291873(v=vs.71).aspx

Page 4: Building Scalable Solutions for Commerce

Tagged.com

• Tagged Architecture – Scaling To 100 Million Users, 1000 Servers, And 5 Billion Page Views
– 100 million registered members
– 25 million unique worldwide monthly visitors
– 6 million unique U.S. monthly visitors
– 7 billion page views per month
• Platform
– PHP Webapp, Java, Memcached

Page 5: Building Scalable Solutions for Commerce

Youtube.com

• Servers
– Supports the delivery of over 100 million videos per day
– Founded 2/2005
– 3/2006: 30 million video views/day
– 7/2006: 100 million video views/day
• Platform
– Apache
– Python
– Linux (SuSE)
– MySQL
– psyco, a dynamic Python->C compiler
– lighttpd for video instead of Apache

Page 6: Building Scalable Solutions for Commerce

Google.com

• World's largest search engine
– ~450,000 low-cost commodity servers in 2006
– Google indexed 8 billion+ web pages in 2005
– Over 200 GFS clusters at Google
– A cluster can have 1000 or even 5000 machines
• Pools of tens of thousands of machines retrieve data from GFS clusters that run as large as 5 petabytes of storage. Aggregate read/write throughput can be as high as 40 gigabytes/second across the cluster
– ~6000 MapReduce applications at Google, with hundreds of new applications being written each month
– BigTable scales to store billions of URLs, hundreds of terabytes of satellite imagery, and preferences for hundreds of millions of users
• Platform
– Linux
– A large diversity of languages: Python, Java, C++

Page 7: Building Scalable Solutions for Commerce

Facebook.com

• Biggest social site
– Serves 570 billion page views per month (according to Google Ad Planner)
– There are more photos on Facebook than all other photo sites combined (including sites like Flickr)
– More than 3 billion photos are uploaded every month
– Facebook’s systems serve 1.2 million photos per second
• This doesn’t include the images served by Facebook’s CDN.
– More than 25 billion pieces of content (status updates, comments, etc.) are shared every month
– Facebook has more than 30,000 servers
• Platform
– PHP, but it has built a compiler for it so it can be turned into native code on its web servers, thus boosting performance
– Linux, but has optimized it for its own purposes
– MySQL, but primarily as a key-value persistent storage, moving joins and logic onto the web servers since optimizations are easier to perform there
– Memcached

Page 8: Building Scalable Solutions for Commerce

Amazon.com

• World’s largest ecommerce site
– More than 55 million active customer accounts
– More than 1 million active retail partners worldwide
– Between 100 and 150 services are accessed to build a page
• Platform
– Linux
– Oracle
– C++
– Perl
– Mason
– Java
– JBoss
– Servlets

Page 9: Building Scalable Solutions for Commerce

Scalability – A close look

• Accommodates
– an increased volume of user requests
– a larger dataset, handled within the agreed SLA
• Maintainable
– service requests are met and the system is managed easily
• Is a property of
– a system, indicating its ability to handle growing load or to be enlarged as demand increases
• Scaling succeeds if
– the system remains available at consistent speeds as the number of users and requests grows to very high numbers

Page 10: Building Scalable Solutions for Commerce

Are there really any challenges in designing scalable solutions? Hmm, you bet!

• Cost
– How much one can afford to spend on hardware as scalability needs grow
• Timeline
– The sooner you do it, the better the business returns
• Maintainability & Manageability
– Staying easy to maintain as the complexity of the components grows
• Tools and Technology Approach
– Choosing tools that directly provide scalability features that are easy to leverage
• (Growing) Data
– Complex business processes for dealing with the data
– Information access becomes more challenging as scalable components add a layer of complexity

Page 11: Building Scalable Solutions for Commerce

Scalability Design Requirements

• Increase in performance (can it really?)
– Caching, replication techniques
• Low Latency
– Network dependency, too many back-end integrations, multiple DB read/write accesses
• High Reliability
– Data integrity and access to the right information
• Dynamic
– Number of users, volume of data
– Peak data load during a spike
• Operational efficiency
– Round-trip presentation, retrieval and performance
• Low cost
– Focus on design rather than investing in new features
• High Availability
– Features are always available
• Manageability
– Easy to manage and administer with limited skill and knowledge

Page 12: Building Scalable Solutions for Commerce

Scalability layers of Architecture

• Client
– Caching
– HTTP Protocol
• Web Server
– Load Balancing
• Application Server
– Distributed Server Caching
– Connection Pooling
– Load Balancer & Clustering (commodity h/w)
– Synch vs. Asynch choice of messaging
• Database
– Data Replication
– Federation
– Database Sharding or Partitioning
• Sharding helps to isolate and constrain storage, CPU, memory, and IO
– Memcached
– Hadoop

Page 13: Building Scalable Solutions for Commerce

Techniques to improve scalability

Page 14: Building Scalable Solutions for Commerce

Scalability and Performance Technique

• Load balancing
– Vertical Scaling and Horizontal Scaling
• Caching/replication
• Partitioning
• Parallelism
• Redundancy
• Request Processing
• Asynchronous Messaging
• Multi-threading
• Resource Pooling
• Session Management

Page 15: Building Scalable Solutions for Commerce

Scalability Approach – Load balancing Hardware

• Load Balancing and Clustering
– Vertical Scaling
– Vertical Partitioning
– Horizontal Scaling
– Horizontal Partitioning

Page 16: Building Scalable Solutions for Commerce

Scaling Explained

Page 17: Building Scalable Solutions for Commerce

How we typically do it – Vertical Scaling

Increasing the hardware resources without changing the number of nodes. Referred to as “scaling up” the server.

[Figure: one server growing from CPU, RAM to CPUn, RAMn]

Page 18: Building Scalable Solutions for Commerce

How we typically do it – Vertical Partitioning

Each service is deployed on its own physical node (hardware)

Page 19: Building Scalable Solutions for Commerce

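How we typically do it – Horizontal Scaling

Each service is deployed on its own physical nodes (hardware), and more nodes are added behind a load balancer as the load grows. Referred to as “scaling out”.

[Figure: load balancer in front of the service nodes]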
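The load balancer on this slide can be sketched in a few lines of Java. This is only an illustration, assuming the simplest policy (round robin) and made-up node names; `RoundRobinBalancer` and `nextNode` are hypothetical names, and a production balancer would also track node health and load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin balancer: hands out back-end nodes in rotation. */
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Returns the node that should serve the next request; safe to call from many threads. */
    String nextNode() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

Scaling out then means adding another entry to the node list; the callers of `nextNode()` do not change.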

Page 20: Building Scalable Solutions for Commerce

Client/Server: Use Efficient HTTP Protocol Design

• Connection Management
• Intermediate network support
• Concurrent Request Processing
• Design Considerations
– Architect your application in a way that encourages HTTP caching
– Identify key settings of an HTTP server that affect scalability and performance
– Understand important efficiency-related parameters of a typical HTTP API, such as that provided by Java
• Scalability Tips
– Use GET and POST judiciously
– Consider HTTP for non-browser clients
– Promote HTTP response caching
– Support persistent connections
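As one possible illustration of "promote HTTP response caching", the Java sketch below decides between a full `200` response and an empty `304 Not Modified`, using the standard `ETag`, `If-None-Match` and `Cache-Control` headers. The class `ConditionalGet`, its `Response` holder and the hash-based ETag are assumptions made for this example; a real server would derive the validator from a content digest or version number.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConditionalGet {
    /** Minimal response holder used only for this illustration. */
    static final class Response {
        final int status;
        final Map<String, String> headers;
        final String body;

        Response(int status, Map<String, String> headers, String body) {
            this.status = status;
            this.headers = headers;
            this.body = body;
        }
    }

    /**
     * Serves content for a GET request. If the client already holds the current
     * version (its If-None-Match value equals our ETag), only headers are sent.
     */
    static Response handle(String content, String ifNoneMatch, int maxAgeSeconds) {
        // Illustrative validator only; String.hashCode is not a robust digest.
        String etag = "\"" + Integer.toHexString(content.hashCode()) + "\"";
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("ETag", etag);
        headers.put("Cache-Control", "public, max-age=" + maxAgeSeconds);
        if (etag.equals(ifNoneMatch)) {
            return new Response(304, headers, ""); // client copy is still valid
        }
        return new Response(200, headers, content);
    }
}
```

Within `max-age` seconds a browser or intermediate proxy can reuse its copy without contacting the server at all; after that, the conditional request costs a round trip but no response body.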

Page 21: Building Scalable Solutions for Commerce

Caching Explained

[Figure: caching diagram; labels: DB, DB]

Page 22: Building Scalable Solutions for Commerce

Caching / Replication

• A cache is a readily accessible data structure that allows thread-safe access to in-memory data
• Clustered Cache
– A cache system in which each cache instance is aware of the other cache instances in the cluster and can synchronize operations with its peers. Cache contents are typically mirrored
• Distributed Cache
– Distributes cached state across a cluster to maximize retrieval efficiency, reduce overall memory use, and guarantee data redundancy (data sets are fragmented over the network)
• Technology Tools
– Squid, Ehcache
– JCache (JSR 107)
– Memcached
– Hadoop / MapReduce

Page 23: Building Scalable Solutions for Commerce

Distributed Caching Techniques

• Replicated Cache
– Suitable where memory isn’t an issue
– Only a few cache nodes
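A replicated cache can be modelled in-process as follows. This is a rough Java sketch, not a real clustered cache: the "nodes" are plain maps inside one JVM, and `ReplicatedCache` is a hypothetical name. It shows why memory matters here: every node stores every entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical replicated cache: every node holds a full copy of the data. */
final class ReplicatedCache<K, V> {
    private final List<Map<K, V>> replicas = new ArrayList<>();

    ReplicatedCache(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        for (int i = 0; i < nodeCount; i++) {
            replicas.add(new HashMap<>());
        }
    }

    /** A write goes to every replica, so memory use and write cost grow with the node count. */
    synchronized void put(K key, V value) {
        for (Map<K, V> replica : replicas) {
            replica.put(key, value);
        }
    }

    /** A read is served by a single node without contacting the others. */
    synchronized V get(int node, K key) {
        return replicas.get(node).get(key);
    }
}
```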

Page 24: Building Scalable Solutions for Commerce

Caching Explained

• Partitioned Cache
– More cache nodes are available
– No single node contains the entire cache data
– The data is partitioned across the cluster, so each node shares the burden
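The partitioning idea can be sketched in Java with simple hash-modulo placement. This is an assumption made for illustration: `PartitionedCache` is a hypothetical class, the nodes are in-process maps, and real products typically use schemes such as consistent hashing so that adding a node moves fewer keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical partitioned cache: each key lives on exactly one node. */
final class PartitionedCache<K, V> {
    private final List<Map<K, V>> partitions = new ArrayList<>();

    PartitionedCache(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        for (int i = 0; i < nodeCount; i++) {
            partitions.add(new ConcurrentHashMap<>());
        }
    }

    /** Picks the owning node from the key's hash; floorMod avoids negative indexes. */
    private Map<K, V> partitionFor(K key) {
        return partitions.get(Math.floorMod(key.hashCode(), partitions.size()));
    }

    void put(K key, V value) {
        partitionFor(key).put(key, value);
    }

    V get(K key) {
        return partitionFor(key).get(key);
    }

    /** Number of entries held by one node, to show how the load is spread. */
    int sizeOfNode(int node) {
        return partitions.get(node).size();
    }
}
```

With evenly distributed keys each node holds roughly 1/N of the data, which is the "shared burden" the slide refers to.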

Page 25: Building Scalable Solutions for Commerce

Asynchronous Messaging

• An asynchronous process does not block further processing, and the caller may optionally be notified when the operation completes
• Use the Java 5.0 concurrency package (java.util.concurrent) for asynchronous behavior
• Ajax (Asynchronous JavaScript and XML) – the XMLHttpRequest object is used to exchange data asynchronously with the web server
• JMS (Java Message Service) – in a distributed environment, the JMS API can be used to create, send, receive and read messages in multiple formats
• Asynchronous Web Services
• Asynchronous communication modes
– Polling type
– Push type
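The polling and push modes can be illustrated with `java.util.concurrent`. The sketch below is a simplified example under stated assumptions: `AsyncLookup`, `slowLookup` and the `Callback` interface are hypothetical names, the "back end" is simulated with a short sleep, and the lambda syntax needs Java 8 or later (the `ExecutorService` and `Future` types themselves date from Java 5).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class AsyncLookup {
    /** Hypothetical callback used for the push style. */
    interface Callback {
        void onResult(int price);
    }

    /** Stand-in for a slow back-end call. */
    static int slowLookup(String sku) throws InterruptedException {
        Thread.sleep(50);
        return sku.length() * 100;
    }

    /** Polling type: submit the work, then check the Future until it is done. */
    static int pollForResult(ExecutorService pool, String sku) {
        Callable<Integer> task = () -> slowLookup(sku);
        Future<Integer> pending = pool.submit(task);
        try {
            while (!pending.isDone()) {
                Thread.sleep(5); // the caller could do other useful work here
            }
            return pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Push type: the worker thread notifies the caller through the callback. */
    static void pushResult(ExecutorService pool, String sku, Callback callback) {
        pool.execute(() -> {
            try {
                callback.onResult(slowLookup(sku));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```

In both styles the calling thread is not blocked while the lookup runs; the difference is only in who finds out that the result is ready.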

Page 26: Building Scalable Solutions for Commerce

Request Processing

• Connection Management

• Data Marshaling

• Request Servicing

• Design Considerations

– Synchronous Communication

• Servlets/JSP

– Asynchronous Communication

• JMS

• Scalability Tips

Page 27: Building Scalable Solutions for Commerce

Parallelism

• Conceptually, doing more than one task at a time. It can be implemented in software or in hardware
• Software
– Threads
• Hardware
– Massively parallel processors (MPP) – nodes that don’t share data but compute by routing data between nodes
– Symmetric multiprocessing (SMP) machines – nodes consist of multiple processors that share the same data
– Clustered computing systems – nodes consist of multiple computers that don’t share the same data but route it between computers over the network
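For the software side (threads), a small Java sketch: the input is split into chunks, each chunk is summed on its own worker thread, and the partial results are combined. `ParallelSum` is a hypothetical name and the chunking scheme is just one simple choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSum {
    /** Sums the array using the given number of worker threads (must be at least 1). */
    static long sum(int[] values, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (values.length + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(values.length, w * chunk);
                final int to = Math.min(values.length, from + chunk);
                Callable<Long> task = () -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += values[i];
                    }
                    return partial;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The workers never write to shared state; each returns its own partial sum, which keeps the example free of locks.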

Page 28: Building Scalable Solutions for Commerce

Redundancy

• Duplication of hardware or software, so that more resources are available for execution
• Redundancy increases the ability of a system to scale and also increases reliability, i.e. the application stays available if one node crashes
• It can also refer to duplication of data across all nodes
• The drawbacks are deployment cost and keeping the copies consistent
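How redundancy translates into availability can be shown with a tiny failover helper in Java. This is only a sketch: `Failover` and `firstAvailable` are hypothetical names, each redundant node is modelled as a `Supplier`, and a crash is modelled as a runtime exception.

```java
import java.util.List;
import java.util.function.Supplier;

final class Failover {
    /** Tries each redundant node in order and returns the first successful answer. */
    static <T> T firstAvailable(List<Supplier<T>> replicas) {
        RuntimeException last = null;
        for (Supplier<T> replica : replicas) {
            try {
                return replica.get();
            } catch (RuntimeException e) {
                last = e; // this node is down; fall through to the next one
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }
}
```

The request only fails when every copy is down, which is the reliability gain; the cost is that each copy has to be deployed and kept consistent.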

Page 29: Building Scalable Solutions for Commerce

Resource Pooling

• A collection of pre-created objects that can be loaned out to save the expense of creating them many times
– Thread Pool
– EJB Pool
– Database Connection Pool
• When an application has to deal with a large number of requests, resource pooling reduces the per-request overhead
• Database connections or thread objects can be pooled
• Based on the application's requirements, a fixed pool of objects can be created from which requests borrow
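A fixed pool that requests borrow from can be sketched in Java with a `BlockingQueue`. `SimplePool` is a hypothetical class written for this illustration; production connection pools add validation, timeouts and leak detection on top of the same idea.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool of pre-created objects. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    /** Creates all pooled objects up front; size must be at least 1. */
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Loans out an object, waiting if all of them are currently in use. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns a borrowed object so that another request can reuse it. */
    void release(T resource) {
        idle.add(resource);
    }

    int available() {
        return idle.size();
    }
}
```

Callers would normally wrap the work in `try { ... } finally { pool.release(x); }` so that an exception cannot leak a pooled object.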

Page 30: Building Scalable Solutions for Commerce

Best Practices

• Caching best practices:
– Choose between lazy and eager loading of objects into memory
– Cache objects that are expensive to compute and frequently used
– Use immutable keys for caching to eliminate the possibility of a map leak
– Allocate enough heap
– Do not cache objects that are written frequently
• Use many cache nodes
• Use Externalizable, which can be faster than default serialization
• Use an LRU cache policy, or another policy that fits the data being cached
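An LRU policy is easy to try out in plain Java, because `LinkedHashMap` supports access ordering and an eviction hook. The class name `LruCache` is an assumption for this sketch, and the map is not thread-safe as written (wrap it with `Collections.synchronizedMap` or guard it with a lock for concurrent use).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently used entry once capacity is exceeded. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insertion; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

For example, with capacity 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, since `a` was used more recently.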

Page 31: Building Scalable Solutions for Commerce

Best Practices

• Cache objects that are as coarse-grained as possible, in read-only mode
• Object caching frameworks:
– Open Source
o Java Caching System (JCS) from Jakarta (part of the Turbine project)
o OSCache
o Commons Collections (another Jakarta project)
o JCache API (SourceForge.net)
– Commercial
o SpiritCache (from SpiritSoft)
o Coherence (Tangosol)
o Javlin (eXcelon)
o Object Caching Service for Java (Oracle)

Page 32: Building Scalable Solutions for Commerce

Best Practices

• Use clustering technologies
• Consider logical versus physical tiers
• Isolate transactional methods
• Eliminate business logic layer state when possible
• Use caching extensively and appropriately
– Avoid hitting the DB or opening transactions and connections unless absolutely required
– Avoid remote communication; make proper use of value objects
• Constrain concurrent access to limited resources
• Make proper use of the java.util.concurrent package
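One way to constrain concurrent access to a limited resource with `java.util.concurrent` is a `Semaphore`. The wrapper below is a minimal sketch; `BoundedResource`, `call` and `tryCall` are hypothetical names, and the limit would be sized to whatever the protected resource can actually sustain.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical guard that lets at most maxConcurrent callers use a resource at once. */
final class BoundedResource {
    private final Semaphore permits;

    BoundedResource(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    /** Waits for a free permit, runs the work, and always gives the permit back. */
    <T> T call(Supplier<T> work) {
        permits.acquireUninterruptibly();
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }

    /** Non-blocking variant: returns false immediately if the resource is saturated. */
    boolean tryCall(Runnable work) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The non-blocking variant is useful for shedding load: a request that cannot get a permit can be rejected or queued instead of piling up on the back end.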

Page 33: Building Scalable Solutions for Commerce

Best Practices

• Understand that scaling takes iterations
• Don't try to over-design upfront
• Choose the right tool for the job after doing enough research and understanding your requirements. Don’t pick a tool just because everyone recommends it
• Be open to thinking differently and moving away from the traditional approach to find your own scalability solutions

Page 34: Building Scalable Solutions for Commerce

Quick Review of Scalability Example

[Figure: Smart Energy Management System architecture. Labels: Server Station, Enterprise Database, Billing, Cloud engine, Gateway, GSM/GPRS, IP Link, ZigBee Wireless, “N” Environments, Smart Energy Meter, Smart Appliance, Electric Gas Meter, In-Premise display]

Challenge: Make the Smart Energy Management System scalable and accessible to a very large client base. Performance should not degrade. The service provided should be secure. Infrastructure cost should be minimal.

Mindteck’s Approach: Deploy the services offered by the Smart Energy Management System onto a cloud platform, coupled with RESTful web services for scalability and load-balanced gateway servers

Benefits: Scalable, enterprise-level test infrastructure that meets the requirements

Page 35: Building Scalable Solutions for Commerce

Application Software, Smart Energy, Banking-Financial-Services-Insurance (BFSI), Business Intelligence, Business Process Outsourcing, Content Analytics, Electronic Design Services, Firmware, Hardware / Device, Infrastructure, Java, Knowledge Management, Wireless, Life Sciences, Maintenance, Mechanical, Microsoft Technologies, Mobile Platforms, MySQL, Open Source, Oracle, Public Sector, Product Development, QA & Testing Services, SAP, Services, Semiconductor, Smart Energy, Storage, Support Services, System Software, SQL Server, Verticals, ZigBee

Thank You