An Introduction to Data Gravity by John Tkaczewski of FileCatalyst
TRANSCRIPT
Data Gravity
• A term first coined by Dave McCrory circa 2010
• Data is difficult to move around
• Data attracts a greater and greater number of apps, services, and other tools as it grows
Throughput and latency
• As throughput to the data increases and latency to the data decreases, the gravitational pull of the data mass also increases
• Which forces the apps and services to move closer to the data
If the model stopped here… all apps and services would end up in a single giant online BLOB (the cloud) to be closer to the data
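An illustrative way to picture the pull described above (a rough sketch only, not McCrory's published formulation; the symbols are assumptions made for this talk):

    F_{\text{pull}} \;\propto\; \frac{M_{\text{data}} \cdot M_{\text{apps}}}{L^{2}}

where M_data is the size of the data set, M_apps is the combined "mass" of the apps and services using it, and L is the latency between them, standing in for distance. Bigger data and lower latency strengthen the pull; higher throughput acts in the same direction as lower latency.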
Real-Life Scenario: USB Thumb Drive vs. Amazon S3
• Unlimited flexible growing storage
• Easy Sharing with the rest of the world
• Security
• Convenience
• Fast Access to Data
• Practically Free
• Can be physically moved
Data Gravity on the Cloud
• Make inbound data as light as possible (cheap and frictionless to put data in)
• Make outbound data as heavy as possible (costly and slow to get data back out)
• Cost in vs. cost out
• Make the context of the data proprietary (example of a picture on Flickr from http://datagravity.org/)
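To make the "light in, heavy out" point concrete, here is a minimal Python sketch of the cost asymmetry. The per-GB prices are hypothetical placeholders, not any provider's actual rate card.

    # Hypothetical per-GB transfer prices (assumptions, not real pricing)
    INGRESS_USD_PER_GB = 0.00   # uploads are typically free: "light" inbound data
    EGRESS_USD_PER_GB = 0.09    # downloads typically cost money: "heavy" outbound data

    def transfer_cost_usd(gb_in, gb_out):
        """Rough cost of moving data into vs. out of a cloud service."""
        return gb_in * INGRESS_USD_PER_GB + gb_out * EGRESS_USD_PER_GB

    print(transfer_cost_usd(gb_in=10_000, gb_out=0))   # 0.0   (10 TB in is free)
    print(transfer_cost_usd(gb_in=0, gb_out=10_000))   # 900.0 (pulling it back out is not)

The asymmetry itself is a form of gravity: once the data is in, everything that uses it is cheaper to run next to it.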
Data Gravity as a computational theory
• Borrows from gravitational theory
• Similarities with the way nations and cities negotiate trade tariffs and trade agreements (ref)
• Shannon's law: how much information can be squeezed down a wire
• Von Neumann bottleneck: how fast data can move from persistent storage to memory to CPU cache to the CPU
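For reference, the "Shannon's law" bullet is the Shannon-Hartley capacity bound, C = B * log2(1 + S/N). A quick Python sketch, where the bandwidth and SNR figures are made-up examples:

    import math

    def shannon_capacity_bps(bandwidth_hz, snr_linear):
        """Shannon-Hartley limit: C = B * log2(1 + S/N), in bits per second."""
        return bandwidth_hz * math.log2(1 + snr_linear)

    # Example: a 20 MHz channel with a 30 dB signal-to-noise ratio (SNR = 1000)
    print(shannon_capacity_bps(20e6, 1000))   # ~1.99e8, i.e. roughly 199 Mbit/s ceiling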
Traditional File Transfers
FTP, SFTP, HTTP, WebDAV, SMTP, CIFS, etc.
• All use TCP
• Provides reliability, error checking, and ordered delivery of packets in a stream
• Congestion control built in
• The Internet could not survive without it
• Works well for most internet traffic: email, web browsing, small ad-hoc transfers
Problems with TCP
• Flow control limits the transmission window, causing "dead air" on high-latency links
• Very aggressive in response to network congestion, and cannot be tuned in the application layer
• The result is less-than-ideal performance on wireless, satellite, or long-haul links
• Can be tuned, but still not ideal for many-to-one or one-to-many transfers (see the sketch after this list)
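A minimal sketch of the "dead air" problem: with a fixed receive window, a single TCP stream can have at most one window of data in flight per round trip, so throughput is capped at roughly window / RTT regardless of how fat the pipe is. The window size and RTT values below are illustrative assumptions.

    def tcp_throughput_ceiling_mbps(window_bytes, rtt_seconds):
        """Upper bound on one TCP stream: window / RTT (ignores packet loss)."""
        return (window_bytes * 8) / rtt_seconds / 1e6

    # The same 64 KB window over a 5 ms LAN vs. a 200 ms satellite/long-haul link
    print(tcp_throughput_ceiling_mbps(64 * 1024, 0.005))   # ~104.9 Mbit/s
    print(tcp_throughput_ceiling_mbps(64 * 1024, 0.200))   # ~2.6 Mbit/s

This is why a fast long-haul link can sit mostly idle under a single TCP transfer.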
File Transfer Acceleration
• Ideal for bulk file transfer
• Predictable: can send at a precise, steady rate
• Not affected by latency or packet loss
• Congestion Control implemented in application layer
• Tunable congestion control aggression
• Instantly detect link capacity
• Data gravity still exists but is reduced by eliminating the latency component
• The gravity continues to exist towards every storage location
• With faster-moving data, the owner now has more choices about where to store it.
• It's not always possible to make cloud services available near all the users
• File Transfer Acceleration can help reach those faraway users at a lower cost than building a new data center (see the sketch below)
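To illustrate the idea of application-layer rate control (UDP with the send rate owned by the application), here is a minimal Python sketch. It is not FileCatalyst's protocol: a real accelerated-transfer tool adds its own reliability layer (acknowledgements and retransmission) on top, and the address, rate, and chunk size below are placeholders.

    import socket
    import time

    TARGET_MBPS = 100                # assumed target send rate
    CHUNK = 8192                     # payload bytes per datagram (assumption)
    DEST = ("203.0.113.10", 9000)    # hypothetical receiver address

    def send_at_fixed_rate(data: bytes):
        """Pace UDP datagrams so the stream holds a steady target rate."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        interval = (CHUNK * 8) / (TARGET_MBPS * 1_000_000)   # seconds per datagram
        next_send = time.monotonic()
        for offset in range(0, len(data), CHUNK):
            sock.sendto(data[offset:offset + CHUNK], DEST)
            next_send += interval
            delay = next_send - time.monotonic()
            if delay > 0:            # sleeping keeps the rate steady instead of bursting
                time.sleep(delay)

Because the rate is a parameter the application controls, it can be tuned up or down (the "congestion control aggression" bullet above) and is unaffected by round-trip latency.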
Future …
• Cloud services will continue to expand (money maker)
• Local and personal storage will continue to be needed, but merely as a cache for what's in the cloud
• Throughput will continue to increase, but latency will stay the same (speed of light++ anyone??)
• The need for faster file transfers will continue to grow as the cloud, data and links get bigger.