Scalability
Recommended Text:
Database Replication
Bettina Kemme, Ricardo Jiménez-Peris, Marta Patiño-Martínez. Morgan & Claypool Publishers.
Scalability
! Load Balancing
! Provisioning/auto scaling
! State
• Caching
• Commonly used items of the database are kept in memory.
• Replication
• Logical items of the database (e.g., tuples, objects) have multiple physical copies located on different nodes.
• Partitioning
• Logical items of the database (e.g., tuples, objects) are divided among multiple physical nodes.
! Fault Tolerance
Load Balancing
! Determines where requests/transactions will be executed
! Blind techniques
• Require minimal or no state on the load balancer
• Strategies:
• Random
• Round robin
! Load aware
• Consider the load on replicas when routing requests
• Different requests may generate different load
• Replica nodes may have different capacity
• Strategies:
• Shortest queue first
• Simple to implement; just track the number of active requests (a sketch follows below)
• Least loaded
• Replicas need to periodically provide load info to the load balancer
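A minimal sketch of shortest-queue-first routing (illustrative only; the Replica and balancer classes are assumptions, not from the slides):

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    class Replica {
        final String host;
        final AtomicInteger active = new AtomicInteger(0);  // requests in flight
        Replica(String host) { this.host = host; }
    }

    class ShortestQueueBalancer {
        private final List<Replica> replicas;
        ShortestQueueBalancer(List<Replica> replicas) { this.replicas = replicas; }

        // Route to the replica with the fewest active requests.
        Replica pick() {
            Replica best = replicas.get(0);
            for (Replica r : replicas)
                if (r.active.get() < best.active.get()) best = r;
            best.active.incrementAndGet();  // count the request we are about to send
            return best;
        }

        // Call when the replica finishes the request.
        void done(Replica r) { r.active.decrementAndGet(); }
    }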
Load Balancing
! Application aware
• Uses knowledge of the application
• Strategies:
• Shortest execution length
• Profile request types to generate execution time estimates
• Estimates the load at each replica by keeping track of pending requests and their type, i.e., determines an application-specific load metric
• Routes to the least loaded replica
• Locality-aware request distribution (LARD)
• Takes into account the data accessed by requests, and routes similar requests to the same replica to increase cache hit rates
• Conflict-aware distribution
• Executes potentially conflicting requests on the same replica for early conflict detection
Load Balancing Comparison
! Blind
• General
• Easy to implement
• Suboptimal
! Load aware
• Better performance
• Requires more state
! Application aware
• Best performance
• Requires even more state
• Harder to implement
• Brittle – needs to change if the application changes
AWS Elastic Load Balancer
! Balances load between EC2 instances
! Can distribute load across availability zones
! Within an availability zone:
• Round-robin among the least-busy instances
• Tracks active connections per instance
• Does not monitor CPU
! Supports the ability to stick user sessions to specific EC2 instances
AppEngine Load Balancing
! Google is not saying!
Self Provisioning
! Automatically adjust the worker pool or number of replicas to the current workload
• Add workers/replicas when load increases
• Retire workers/replicas when load decreases
! Approaches
• Reactive (a minimal sketch follows below)
• Reconfigure when a load threshold is reached
• Proactive
• Use a prediction mechanism to trigger reconfiguration
• Reconfigure before saturation
• Based on time series or machine learning approaches
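A minimal sketch of the reactive approach (the WorkerPool interface and the thresholds are illustrative assumptions):

    interface WorkerPool {
        int size();
        int minSize();
        int maxSize();
        void addWorker();     // provision a new worker/replica
        void retireWorker();  // drain and release an idle worker/replica
    }

    class ReactiveProvisioner {
        // Reconfigure when a load threshold is crossed.
        void check(double avgUtilization, WorkerPool pool) {
            if (avgUtilization > 0.80 && pool.size() < pool.maxSize()) {
                pool.addWorker();       // scale up as load increases
            } else if (avgUtilization < 0.20 && pool.size() > pool.minSize()) {
                pool.retireWorker();    // scale down when load decreases
            }
        }
    }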
Self Provisioning
! Retiring replicas/workers
• Move the instance's load to the rest of the system
• All new requests go to other nodes
• Wait for outstanding transactions to commit
• When idle, release
! Adding replicas/workers
• Transfer the state/database to the new replica
• Receive all updates that occurred during the transfer
• Add replica to pool
Self Provisioning Considerations
! Load metrics
• Use low-level system metrics to determine node utilization
• CPU, I/O utilization
• Use application-level metrics
• Response time, e.g., transaction completes within X milliseconds
! Cost/latency prediction
• How long and how expensive it is to add a replica/worker
• Reactive: the higher the latency, the lower the reconfiguration threshold
• Proactive: the higher the latency, the farther into the future we need to predict
! Monitoring
• Monitoring isn't free
• The more data we collect, the higher the impact on the system
AWS Auto Scaling
! Scale automatically according to user-defined conditions
! Enabled by CloudWatch
! Create scale up/down policy
! Associate scaling policy with CloudWatch alarm
AS Tutorial
! Step 1: Download the command line tools
• http://aws.amazon.com/developertools/2535
! Step 2: Create a credentials file
• AWSAccessKeyId=<Write your AWS access ID>
• AWSSecretKey=<Write your AWS secret key>
! Step 3: Set environment variables
• AWS_AUTO_SCALING_HOME=<directory where you installed the tools>
• JAVA_HOME=<java installation home directory>
• PATH=$AWS_AUTO_SCALING_HOME/bin:$PATH
• AWS_CREDENTIAL_FILE=<the file created in step 2>
! Step 4: Test
• as-cmd --help prints a list of available commands
AS Tutorial
! Step 5: Create an EC2 security group and a load balancer
! Step 6: Create a launch configuration
• as-create-launch-config ece1779_config --image-id ami-37ec3c5e --instance-type m1.small --group default
! Step 7: Create an auto scaling group
• as-create-auto-scaling-group ece1779_group --availability-zones us-east-1a
  --launch-configuration ece1779_config
  --min-size 1 --max-size 3
  --load-balancers ece1779LoadBalancer
  --health-check-type ELB --grace-period 300
• --health-check-type can be set to EC2 (machine-level health) or ELB (application-level health)
! Step 8: Create a scaling policy
• as-put-scaling-policy ScaleUp --auto-scaling-group ece1779_group --adjustment=1 --type ChangeInCapacity
• Other adjustment types: ExactCapacity and PercentChangeInCapacity
AS Tutorial
! Step 9: Create alarms on CloudWatch and associate with AS policy
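For example, with the CloudWatch command line tools of the same vintage (the alarm name and threshold values are illustrative; the --alarm-actions value is the policy ARN printed by as-put-scaling-policy):

• mon-put-metric-alarm HighCPUAlarm --metric-name CPUUtilization
  --namespace "AWS/EC2" --statistic Average --period 60
  --evaluation-periods 5 --threshold 80
  --comparison-operator GreaterThanThreshold
  --dimensions "AutoScalingGroupName=ece1779_group"
  --alarm-actions <ARN returned by as-put-scaling-policy>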
AppEngine Auto Scaling
! App owner can adjust the frontend instance class, the number of idle instances, and the pending latency
AppEngine Auto Scaling
! Can configure number of backend nodes
! war/WEB-INF/backends.xml
! Administer
• AppEngine management console
• appcfg.sh command line tool (part of the Java SDK)
Caching
! Objective:
• Reduce load on the storage server
• Reduce latency to access data
! Store recent/hot/frequent data in RAM
! Volatile
! Fast read/write access
memcached
! A distributed memory cache implementation
! Provides a key-value store interface:
    put(key, value)
    value = get(key)
! Scalable and consistent temporary storage
• Secondary system that provides fast access to data
• Data stored reliably somewhere else
Google memcache
! Inspired by memcached
! Commonly used to cache datastore entities by their keys
! Key is up to 250 bytes long
• Larger keys are hashed
! Value can be up to 1 megabyte
! Single-value operations are atomic
• No support for multi-item transactions
! Support for atomic numeric value increment/decrement
Usage Model
! Data accesses:
    get value from memcache
    if cache miss:
        fetch data from datastore
        put value in memcache
    operate on value
! Data updates:
• Possible to overwrite the value with new data
• No transactional way to do this
• The update may succeed in the datastore and fail in memcache
• Updates may happen in a different order in the datastore and memcache
• Instead: invalidate the item on update
• Fetch a fresh value on the next access
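A minimal Java sketch of both patterns using the App Engine memcache API (fetchFromDatastore and writeToDatastore are hypothetical helpers, not part of the API):

    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    class CachedStore {
        private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

        // Data access: try the cache first, fall back to the datastore on a miss.
        Object read(String key) {
            Object value = cache.get(key);           // null on a cache miss
            if (value == null) {
                value = fetchFromDatastore(key);     // authoritative copy
                cache.put(key, value);               // populate for later reads
            }
            return value;
        }

        // Data update: write to the reliable store, then invalidate the cached item.
        void update(String key, Object newValue) {
            writeToDatastore(key, newValue);
            cache.delete(key);                       // next read fetches the fresh value
        }

        // Hypothetical datastore helpers (not part of the memcache API).
        private Object fetchFromDatastore(String key) { /* ... */ return null; }
        private void writeToDatastore(String key, Object value) { /* ... */ }
    }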
Memcache Java API
! Key and value can be any Java serializable object
! Obtain memcache pointer
• MemcacheService syncCache = MemcacheServiceFactory.getMemcacheService();
! Put
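The slide's code did not survive the transcript; a minimal sketch using the syncCache handle from above (keys and values are illustrative; Expiration is com.google.appengine.api.memcache.Expiration):

    // Key and value can be any Serializable objects.
    syncCache.put("count:hits", 0L);
    // Optional expiration: evict the entry after one hour.
    syncCache.put("greeting", "hello", Expiration.byDeltaSeconds(3600));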
Memcache Java API
! Get
! Delete
! Contains
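The slide's code is likewise missing; an illustrative sketch:

    Object hits = syncCache.get("count:hits");         // returns null on a miss
    boolean cached = syncCache.contains("count:hits"); // membership test
    boolean removed = syncCache.delete("count:hits");  // true if the key was found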
Memcache Java API
! Increment
! Cache statistics
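Again the slide's code was not captured; an illustrative sketch (Stats is com.google.appengine.api.memcache.Stats):

    // Atomic increment; returns the new value, or null if the key is absent.
    Long n = syncCache.increment("count:hits", 1L);
    // Variant that initializes the counter to 0 if the key is absent.
    syncCache.increment("count:hits", 1L, 0L);

    // Cache-wide statistics.
    Stats stats = syncCache.getStatistics();
    long hitCount = stats.getHitCount();
    long missCount = stats.getMissCount();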
Replication
! Fault tolerance
• High availability despite failures
• If one replica fails, there is another one that can take over
! Throughput
• Support a large user base
• Each replica handles a fraction of the users
• Scalability
! Response time
• Support a geographically distributed user base
• There is a replica close to the user
• Fast local access
Stateless vs. Stateful
! Stateless
• Services that require mainly computing
• State can be regenerated on demand
• Easy to replicate
! Stateful
• Services use business-critical data
• Good performance depends on state availability
• Harder to replicate
Transactions
! User-defined sequence of read and write operations on data items that meets the following properties:
• Atomicity
• Transaction either executes entirely and commits, or it aborts, leaving no changes in the database
• All or nothing
• Consistency
• Integrity constraints on the data are maintained
• Isolation
• Provides the impression that no other transaction is executing in the system
• Durability
• Once the transaction commits, the changes are indeed reflected in the database, even in the event of failures
Replica Control
! Task of keeping data copies consistent as items are updated
! There is a set of database nodes R_A, R_B, …
! The database consists of a set of logical data items x, y, …
! Each logical item x has physical copies x_A, x_B, …
• where x_A resides on R_A
! A transaction is a sequence of read and write operations on logical data items
! The replica control mechanism maps operations on logical data items onto the physical copies
Read-One-Write-All (ROWA)
! Common replica control method
! Reads can be sent to any replica
• Logical read operation r_i(x) of transaction T_i
• Mapped to r_i(x_A) on one particular copy of x
! Updates are performed on all replicas
• Logical write operation w_i(x) of transaction T_i
• Mapped to w_i(x_A), w_i(x_B), … on all copies of x
! ROWA works well because in most applications reads >> writes
1-copy equivalence
! A replicated system provides the same semantics as a non-replicated system
• 1-copy-isolation
• 1-copy-atomicity
• 1-copy-durability
• 1-copy-consistency
! Requires tight coupling of the replica control mechanism with the database mechanisms that achieve atomicity, consistency, isolation, and durability
Serializability
! Most well-known isolation model
! Provides the strongest isolation level
! The concurrent execution of a set of transactions must be equivalent to some possible serial execution of that set
! Conflict: two operations that access the same item, are from different transactions, and at least one is a write
! To be serializable, all conflicts have to execute in the same order
• T_i has to appear to execute before T_j, or T_j before T_i
• Acyclic serialization graph
[Figure: example schedules. S1 and S2 are serial executions, S3 is serializable, S4 is unserializable]
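For example (illustrative schedules, not the ones from the original figure), where r1(x)/w1(x) are T1's read/write of x and c1 its commit: the schedule r1(x) w1(x) r2(x) w2(x) c1 c2 orders every conflict T1 before T2, so it is equivalent to the serial execution T1, T2 and is serializable. In contrast, r1(x) w2(x) w1(x) c1 c2 orders the r1(x)/w2(x) conflict T1 before T2 but the w2(x)/w1(x) conflict T2 before T1, producing a cycle in the serialization graph, so it is unserializable.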
Serializability
! Use concurrency control to enforce transaction isolation
• Acquire a shared lock before a read
• Acquire an exclusive lock before a write
• Release all locks at the end of the transaction (2-phase locking)
1-copy-serializability
! First attempt: just make sure schedules are serializable at each replica
• Does not work
! Need to order all conflicting operations in the same way
! Whenever one local schedule executes one of T_i's operations before a conflicting operation of T_j, then T_i must execute before T_j at all replicas
1-copy-atomicity
! The transaction executes in its entirety and commits, or it aborts and does not leave any effects on the database
! On a replicated database
• The transaction has to reach the same decision, either all (commit) or nothing (abort), at all replicas at which it executes
• ROWA:
• All replicas for update transactions
• 1 replica for read-only transactions
• Requires an agreement protocol to force all replicas to reach the same decision on the outcome of a transaction
• 2-phase commit (sketched below)
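A minimal sketch of 2-phase commit from the coordinator's side (the Participant interface is an illustrative assumption):

    import java.util.List;

    interface Participant {
        boolean prepare();  // persist changes and vote yes/no
        void commit();
        void abort();
    }

    class Coordinator {
        // Phase 1: collect votes; phase 2: commit only if everyone voted yes.
        boolean twoPhaseCommit(List<Participant> replicas) {
            for (Participant p : replicas) {
                if (!p.prepare()) {               // any "no" vote aborts everywhere
                    for (Participant q : replicas) q.abort();
                    return false;
                }
            }
            for (Participant p : replicas) p.commit();
            return true;
        }
    }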
Read-One-Write-All (ROWA)
! Key design alternatives:
• Where are updates executed?
• When are changes propagated?
• How are changes propagated?
Primary Copy vs. Update Anywhere
! Primary copy
• All updates are executed first on a single node, the primary
• Advantages: simpler to implement
• Disadvantages: the primary can become a bottleneck
! Update anywhere/everywhere
• Updates and read-only requests can be sent to any replica
• Advantages: potentially more scalable
• Disadvantages: harder to guarantee consistency
Eager vs. Lazy
! Eager replication
• Confirmation is only returned to the user once all secondaries execute the update
• Advantages: strong consistency
• Disadvantages: replica coordination can be slow
! Lazy/asynchronous replication
• Confirmation is returned to the client after updates are applied to a single replica
• Updates are subsequently propagated to the other replicas
• Advantages: fast
• Disadvantages: weaker consistency, potential for conflicts
Eager Primary Copy
[Figure: eager primary copy execution. Legend: S(y) = acquire shared lock, X(y) = acquire exclusive lock, U(y) = unlock]
Eager Primary Copy Properties
! Strengths
• 1-copy-serializability
• 1-copy-atomicity
! Weaknesses
• Transaction only commits when all replicas have executed its writes
• Execution time determined by the slowest replica
• 2-phase commit is slow and expensive
• Combination of eager write propagation and locking can lead to blocking in the presence of long read transactions
• Primary may become a bottleneck, as it has to execute all reads for update transactions
• Single point of failure
Eager Update Anywhere
Eager Update Anywhere Properties
! Strengths
• 1-copy-serializability
• 1-copy-atomicity
• No single point of failure
! Weaknesses
• Possibility of distributed deadlock
• Adds significant complexity to the protocol
• Transaction only commits when all replicas have executed its writes
• Execution time determined by the slowest replica
• 2-phase commit is slow and expensive
• Combination of eager write propagation and locking can lead to blocking in the presence of long read transactions
Lazy Primary Copy
Lazy Primary Copy Properties
! Strengths
• Fast
• Transactions don't involve remote communication
! Weaknesses
• Weak consistency
• Remote replicas have stale data
• Does not comply with 1-copy-atomicity
• Similar bottleneck issues as eager primary copy
Lazy Update Anywhere
Lazy Update Anywhere Properties
! Strengths
• Fast
• Transactions don't involve remote communication
! Weaknesses
• Weak consistency
• Remote replicas have stale data
• Does not comply with 1-copy-atomicity or 1-copy-isolation
Processing Write Operations
! Writes have to be executed on all replicas
! Write processing is the main overhead of replication
! Symmetric update processing
• SQL statement is sent to all replicas
• All replicas parse the statement, determine the tuples affected, and perform the modification/deletion/insertion
• Pros: reuses existing mechanisms
• Cons: redundant execution; execution has to be deterministic (consider an update that sets a timestamp)
! Asymmetric update processing
• SQL statement is executed locally on a single replica
• Writeset (the extracted changes) is sent to the other replicas
• Identifier and after-image of each updated record (a sketch of a writeset entry follows)
• Pros: efficiency
• Cons: harder to implement
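A minimal sketch of what one asymmetric writeset entry might carry (field names are illustrative assumptions):

    // One writeset entry: the identifier and after-image of an updated record.
    class WritesetEntry {
        final String table;        // which table the record belongs to
        final Object primaryKey;   // identifies the updated record
        final byte[] afterImage;   // serialized new version of the record
        WritesetEntry(String table, Object primaryKey, byte[] afterImage) {
            this.table = table;
            this.primaryKey = primaryKey;
            this.afterImage = afterImage;
        }
    }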
Replication Architecture
! Kernel-based or white-box approach
• Implemented inside the database
• Pros: best integration with concurrency control and other transaction control mechanisms
• Cons: hard to maintain and port to other databases; confined to a single database system
! Middleware-based or black-box approach
• Replication performed by an outside component interposed between the database replicas and the clients
• Pros: does not require changes to the database; portability, modularity; can use different database systems
• Cons: has to re-implement concurrency control; coarse-grain locking → less parallelism
! Grey-box approach
• Database provides a special replication API
• Example: a writeset extraction/application API
• Middleware calls on the API
MySQL Replication
! ROWA
! Primary copy
! Eager and lazy replication
! Symmetric and asymmetric update processing
! Full and partial replication
MySQL Isolation Levels
! SERIALIZABLE
! REPEATABLE READ (default)
• By default, reads do not acquire locks
• A snapshot is created at transaction start
• Multiple reads of the same item will return a consistent value
• Writes by others will not be seen
• Possible to acquire explicit locks
• SELECT * FROM sometable FOR UPDATE
• SELECT * FROM sometable LOCK IN SHARE MODE
! READ COMMITTED
! READ UNCOMMITTED
App Engine Datastore
! High Replication Datastore (HRD)
• Data is replicated across multiple data centers using a consensus algorithm
• Queries are eventually consistent
• Pros: highest level of availability for reads and writes
• Cons: higher latency on writes due to the propagation of data
! Master/Slave Datastore
• One data center holds the master copy of all data
• Data written to the master data center is replicated asynchronously to all other (slave) data centers
• Pros: strong consistency for all reads and queries
• Cons: temporary unavailability during data center downtime
Failures
! Machine crashes
! Network errors
• Partitions → quorum protocols
• Message loss → TCP handles this case
• Message corruption → TCP handles this case
Self Healing
! Normal operation
• Replica control protocols keep replicas consistent
• Provide enough information to recover from failures
! Failover
• Failure detection
• In case of primary failure, choose a new primary
• Voting
• Deterministic assignment scheme, e.g., choose the surviving replica with the smallest ID
• Deal with outstanding transactions
• Preserve atomicity
• All submitted transactions should either commit or abort at all remaining replicas
! Recovery
• Restart the failed replica, or create a new replica
• Integrate it into the replication system
Failure Tolerant Architecture
! Decouple the application to tolerate failures
! Implemented as shared-nothing independent modules
! Use reliable message queues for coordination
! When a task fails, it is just restarted
! In addition, each part of the application can scale independently
! Examples: Amazon SQS, AppEngine Task Queues
AppEngine Task Queues
! Task Queues
• Asynchronous processing
• Image transcoding off the critical path
! Scheduled Tasks
• Periodic processing, cron jobs
• Fetching and caching data from remote services
• Sending daily status emails
• Generating reports
Task Queues
! A facility for enqueueing HTTP requests
! Fault tolerant
• Retry if the task does not succeed (i.e., does not return an HTTP 200 status)
• Retry rate uses an exponential backoff strategy
! Enqueueing a task is fast
• About 3 times faster than writing to the datastore
Adding a Task
! Tasks consist of:
• URL
• If null, the URL is set to /_ah/queue/queue-name
• Default URL for the default queue: /_ah/queue/default
• Name
• If null, AppEngine assigns a unique name
• The name can be used to prevent enqueueing duplicates
• Request parameters
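A minimal enqueueing sketch with the Java Task Queue API (the handler URL and parameter are illustrative; in SDKs of this era the package may still be com.google.appengine.api.labs.taskqueue):

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    Queue queue = QueueFactory.getDefaultQueue();   // URL /_ah/queue/default
    queue.add(TaskOptions.Builder
            .withUrl("/worker/resize")              // illustrative handler URL
            .param("imageKey", "abc123"));          // illustrative request parameter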
Configuring Task Queues
! Task queues are defined in WEB-INF/queue.xml
! An application can have up to 10 queues
! Configure queues to determine how fast tasks execute
• bucket-size → maximum number of tokens for the queue; to execute, a task needs a token
• rate → speed at which tokens are replenished
Securing Tasks
! AppEngine's front end recognizes requests coming from a queue as if they were coming from a developer account
! Can use web.xml to restrict access to task URLs
Sharding
! Challenges:
• Very large working set
• Slows reads
• Facebook/Google user table
• Too many writes
! Solution:
• Partition the data into shards
• Assign shards to different machines
• Denormalize the data
Sharding Strategies
! Range-based partitioning
• E.g., usernames from a to m assigned to shard 1, n to z to shard 2
• Pros: simple
• Cons: hard to get load balancing right
! Hash-based partitioning
• Compute a hash of the key; different shards are responsible for different hash ranges (see the sketch after this list)
• Pros: good load balancing
• Cons: a little more complex
! Directory-based partitioning
• A lookup service keeps track of the partitioning scheme
• Pros: flexibility
• Cons: the lookup service may become a bottleneck
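A minimal sketch of hash-based shard selection (names are illustrative):

    class HashSharding {
        // Map a key to one of numShards shards via its hash.
        static int shardFor(String key, int numShards) {
            // floorMod keeps the result non-negative even for negative hash codes
            return Math.floorMod(key.hashCode(), numShards);
        }
    }

With this simple modulo scheme, changing numShards remaps most keys; consistent hashing is the usual refinement when shards are added or removed.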
Denormalization
! Data that is accessed together is stored together
• E.g., multi-valued properties in AppEngine datastore entities
! Eliminates costly joins
! May require replication
• E.g., a comment may be stored on both the commenter's and the commentee's profiles
Sharding
! Pros
• High availability
• Faster queries
• More write bandwidth
! Cons
• Queries get more complicated
• Need to figure out which shard to target
• May need to join data from multiple shards
• Referential integrity
• Foreign key constraints are hard to enforce across shards
• Not supported in many databases