
Work With Hundreds of Hot Terabytes in JVMs

Per Minborg, CTO, Speedment, Inc.

Do Not Cross the Brook for Water

Why would you use a slow remote database when you can have all your data available directly in your JVMs, ready for concurrent low-latency access?

Scenario

[Figure labels: Application, In-JVM-Cache (>10 TB), Source of Truth. Example applications: Fraud Detection, Credit Card Company, Web Shop, Stock Trade, Bank, etc.]

Table of Contents

• Two Important Aspects of Big Data Latency
• Cache synchronization strategies
• How can you have JVMs that are in the TBs?
• Speedment SQL Reflector

Two Important Aspects of Big Data Latency

• No matter how advanced a database you use, it is data locality that really counts

• Be aware of the big change in memory pricing

Compare Latencies Using the Speed of Light

During the time a database takes to answer a 1 s query, how far does light travel? And during each of the following?

• Database
• Disk seek
• Intra-data center TCP
• SSD
• Main Memory
• CPU L3 cache
• CPU L2 cache
• CPU L1 cache
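The comparison can be made concrete with a little arithmetic: distance = speed of light × latency. The sketch below is only an illustration. The 1 s database query comes from the slide; every other latency is an assumed, order-of-magnitude textbook value and not a figure from this talk. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LightDistance {
    // Speed of light in vacuum in metres per second (exact by definition).
    static final double C_METRES_PER_SECOND = 299_792_458.0;

    /** Distance in metres that light travels in vacuum during the given latency. */
    static double metresTravelled(double latencySeconds) {
        return C_METRES_PER_SECOND * latencySeconds;
    }

    public static void main(String[] args) {
        Map<String, Double> latencies = new LinkedHashMap<>();
        latencies.put("Database query (from the slide)", 1.0);
        // ASSUMED order-of-magnitude latencies, not taken from the talk:
        latencies.put("Disk seek", 10e-3);
        latencies.put("Intra-data center TCP round trip", 500e-6);
        latencies.put("SSD random read", 150e-6);
        latencies.put("Main memory reference", 100e-9);
        latencies.put("CPU L3 cache", 20e-9);
        latencies.put("CPU L2 cache", 7e-9);
        latencies.put("CPU L1 cache", 0.5e-9);

        // Print one line per storage level with the distance light covers meanwhile.
        latencies.forEach((name, seconds) ->
                System.out.printf("%-34s %,.2f m%n", name, metresTravelled(seconds)));
    }
}
```

Under these assumptions the gap spans roughly nine orders of magnitude, from hundreds of thousands of kilometres for the database query down to a fraction of a metre for an L1 cache hit.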

Cost of 1 GB RAM: “Back to the Future”

How much does 1 GB cost?

• $ 5
• $ 0.04
• $ 720,000
• $ 67,000,000,000

Source: http://www.jcmit.com/memoryprice.htm

Conclusion

• Keep your data close
• RAM is close enough, cheap and getting even cheaper



Cache Synchronization Strategies

Common ways:

Poll Caching
• Data is evicted, refreshed or marked as old
• Evicted elements are reloaded
• Data changes all the time
• On system restart, either warm up the cache or use a cold cache

Dump and Load Caching
• Dumps are reloaded periodically
• All data elements are reloaded
• Data remains unchanged between reloads
• System restart is just a reload
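The two common strategies can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not code from any caching product: the class names, the `loader` function standing in for a database read, and the injectable `clock` are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Poll caching: entries older than maxAgeMillis are reloaded on access. */
final class PollCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;
        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // hypothetical stand-in for a database read
    private final long maxAgeMillis;
    private final LongSupplier clock;      // injectable clock, e.g. System::currentTimeMillis

    PollCache(Function<K, V> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        // Reload when the entry is missing (cold cache) or too old (evicted).
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old == null || now - old.loadedAtMillis >= maxAgeMillis)
                        ? new Entry<>(loader.apply(k), now)
                        : old);
        return entry.value;
    }
}

/** Dump and load caching: the whole data set is swapped in on each reload. */
final class DumpAndLoadCache<K, V> {
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    /** Replaces all cached data; a system restart is simply another reload. */
    void reload(Map<K, V> dump) {
        snapshot = Collections.unmodifiableMap(new HashMap<>(dump));
    }

    V get(K key) {
        return snapshot.get(key);
    }
}
```

In the poll cache a lookup may pay for a reload, which is where slow lookups come from; in the dump-and-load cache lookups never touch the source, but the data is as old as the last dump.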

Cache Synchronization Strategies

The Speedment way:

Reactive Persistent Caching
• Changed data is captured in the database
• Changed data events are pushed into the cache
• Events are grouped in transactions
• Cache updates are persisted
• Data changes all the time
• On system restart, the missed events are replayed
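The reactive approach can be illustrated with a small sketch in which change events, grouped into transactions, are applied to an in-memory map in order and can be replayed after a restart. This is not Speedment's implementation: the `ReactiveCache`, `Change` and `Transaction` types and the per-transaction sequence number are hypothetical and exist only for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ReactiveCache {
    enum Op { INSERT, UPDATE, DELETE }

    /** One captured row change (hypothetical shape). */
    static final class Change {
        final Op op;
        final String key;
        final String value; // null for DELETE
        Change(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    /** Changes that were committed together, with an ASSUMED increasing sequence number. */
    static final class Transaction {
        final long sequence;
        final List<Change> changes;
        Transaction(long sequence, List<Change> changes) {
            this.sequence = sequence;
            this.changes = changes;
        }
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private long lastApplied = 0;

    /** Applies a transaction once; transactions that were already applied are skipped. */
    synchronized void apply(Transaction tx) {
        if (tx.sequence <= lastApplied) {
            return; // already seen, so replaying the same log twice is harmless
        }
        for (Change change : tx.changes) {
            if (change.op == Op.DELETE) {
                cache.remove(change.key);
            } else {
                cache.put(change.key, change.value);
            }
        }
        lastApplied = tx.sequence;
    }

    /** After a restart, replays a buffered log in order to catch up on missed events. */
    synchronized void replay(List<Transaction> log) {
        for (Transaction tx : log) {
            apply(tx);
        }
    }

    String get(String key) {
        return cache.get(key);
    }

    synchronized long lastApplied() {
        return lastApplied;
    }
}
```

Readers in this sketch may observe a transaction half-applied; a real implementation would need a stronger visibility guarantee, and would persist `lastApplied` together with the cache contents so that a restart knows where to resume.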

Comparison

Aspect                     | Dump and Load Caching | Poll Caching                                | Reactive Persistent Caching
Max Data Age               | Dump period           | Eviction time                               | Replication latency (ms)
Lookup Performance         | Consistently instant  | ~20% slow                                   | Consistently instant
Consistency                | Eventually consistent | Inconsistent (stale data)                   | Eventually consistent
Database Cache Update Load | Total size            | Depends on eviction time and access pattern | Rate of change
Restart                    | Complete reload       | Eviction time                               | Down time update rate -> 10% of down time



Conventional Java Applications

• Java objects are garbage collected periodically
• Garbage collection times increase with Java heap size
• Garbage collection times increase with Java heap mutation rate
• “The app has hit the GC wall”
• Hard to meet reasonable SLAs with JVMs larger than about 16 GB
• 10 TB of data in 10 GB JVMs -> ~1000 JVMs

Hazelcast High-Density Memory Store (HDMS)

• Stores data outside of the Java heap
• The Garbage Collector does not see the HDMS content
• Scales up to terabytes of main memory in a single JVM
• Use any number of nodes
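To show what "outside of the Java heap" means in practice, here is a minimal sketch using a direct `ByteBuffer` from the standard library. It is not how HDMS is implemented; the `OffHeapLongArray` class is hypothetical and only demonstrates that the payload lives in native memory, so the garbage collector tracks just the small buffer object and never scans the data itself.

```java
import java.nio.ByteBuffer;

/** A fixed-size array of longs whose contents live outside the Java heap. */
final class OffHeapLongArray {
    private final ByteBuffer buffer;
    private final int length;

    OffHeapLongArray(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        this.length = length;
        // allocateDirect reserves zero-filled native memory; a single direct
        // buffer is limited to Integer.MAX_VALUE bytes (about 2 GB).
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(length, Long.BYTES));
    }

    void set(int index, long value) {
        buffer.putLong(checkIndex(index) * Long.BYTES, value);
    }

    long get(int index) {
        return buffer.getLong(checkIndex(index) * Long.BYTES);
    }

    int length() {
        return length;
    }

    private int checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return index;
    }
}
```

Because of the per-buffer size limit, a store that scales to terabytes would have to manage many such regions, or use another native allocation mechanism, and handle serialization of real objects; both are out of scope for this sketch.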

Hot Restart Persistence

• Persists the data in the Hazelcast maps to a file
• Operations are appended to the file
• SSD backing device recommended
• 1.3 GB/s reload per node
    • 10 GB in 6 s
    • 100 GB in 1 min
    • 1 TB in 10 min
• 6.5 GB/s reload in a system with 10 nodes (1 active and 1 backup)
    • 10 GB in 1 s
    • 100 GB in 12 s
    • 1 TB in 2 min
• 65 GB/s reload in a system with 100 nodes
    • 1 TB in 12 s
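The aggregate rates above follow from the per-node rate if half of the nodes hold active data and half hold backups. The sketch below encodes that reading as a simple linear model; the model itself, the `ReloadEstimate` class and the treatment of 1 TB as 1000 GB are assumptions made for illustration, and the times it produces only approximate the figures quoted on the slide.

```java
final class ReloadEstimate {
    /**
     * Aggregate reload throughput in GB/s, assuming each data item is stored
     * `copies` times (1 active + backups) and nodes reload in parallel.
     */
    static double aggregateGbPerSecond(int nodes, int copies, double gbPerSecondPerNode) {
        return (nodes / (double) copies) * gbPerSecondPerNode;
    }

    /** Idealized reload time in seconds for the given amount of distinct data. */
    static double reloadSeconds(double dataGb, int nodes, int copies, double gbPerSecondPerNode) {
        return dataGb / aggregateGbPerSecond(nodes, copies, gbPerSecondPerNode);
    }

    public static void main(String[] args) {
        double perNode = 1.3; // GB/s per node, from the slide
        System.out.printf("10 nodes:  %.1f GB/s%n", aggregateGbPerSecond(10, 2, perNode));
        System.out.printf("100 nodes: %.1f GB/s%n", aggregateGbPerSecond(100, 2, perNode));
        System.out.printf("1 TB on 100 nodes: %.1f s%n", reloadSeconds(1000, 100, 2, perNode));
    }
}
```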

Compressed Oops in Java 8 (36 bits)

• Using -XX:+UseCompressedOops (on by default) together with -XX:ObjectAlignmentInBytes=16 (the default alignment is 8 bytes)
• A 64-bit JVM can use “compressed” memory references.
• This allows the heap to be up to 64 GB without the overhead of 64-bit object references.
• As all objects must be 8- or 16-byte aligned, the lower 3 or 4 bits of the address are always zero and don’t need to be stored. This allows a reference to address 4 billion * 16 bytes, or 64 GB.
• Uses 32-bit references.
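The arithmetic behind the 64 GB figure can be checked with a short sketch. It models only the simplest case, in which a compressed reference is the address shifted right by the number of alignment bits; the real JVM may also add a heap base address, which is ignored here. The `CompressedOops` class and its methods are hypothetical helpers, not a JVM API.

```java
final class CompressedOops {
    /** Largest heap, in bytes, that a 32-bit reference can address at the given alignment. */
    static long maxHeapBytes(int objectAlignmentInBytes) {
        if (objectAlignmentInBytes < 8 || Integer.bitCount(objectAlignmentInBytes) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two >= 8");
        }
        // 8-byte alignment frees 3 low bits, 16-byte alignment frees 4.
        int shift = Integer.numberOfTrailingZeros(objectAlignmentInBytes);
        return (1L << 32) << shift;
    }

    /** Drops the always-zero low bits so that the address fits in 32 bits. */
    static int encode(long address, int shift) {
        return (int) (address >>> shift);
    }

    /** Restores the full address from a 32-bit compressed reference. */
    static long decode(int compressed, int shift) {
        return (compressed & 0xFFFF_FFFFL) << shift;
    }

    public static void main(String[] args) {
        System.out.println(maxHeapBytes(8) >> 30);  // heap limit in GiB at 8-byte alignment
        System.out.println(maxHeapBytes(16) >> 30); // heap limit in GiB at 16-byte alignment
    }
}
```

With 4 freed bits, the 32 stored bits behave like a 36-bit address, which is where the "36 bits" in the slide title comes from.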


Scenario

[Figure labels: Application, In-JVM-Cache (>10 TB), Source of Truth. Example applications: Fraud Detection, Credit Card Company, Web Shop, Stock Trade, Bank, Back Testing]

What is Speedment?

• Database SQL Reflector
• Code generation tool -> Automatic domain model extraction from databases
• In-JVM-memory technology
• Pluggable storage engines (ConcurrentHashMap, OffHeapConcurrentHashMap, Hazelcast, etc.)
• Transaction-aware

Speedment SQL Reflector

• Detects changes in a database
• Buffers the changes
• Can replay the changes later on
• Will preserve order
• Will preserve transactions
• Sees data as it was persisted
• Detects changes from any source

[Figure labels: Database; INSERT, UPDATE, DELETE]

Download Trial @ www.speedment.com

1. Connect to Your Existing SQL DB

2. Automatic Schema Analysis

3. Just Push Play…

Super Easy Integration

Config hcConfig = new Config();

// Add optimized serialization Factories for Hazelcast
// that are generated automatically
SpeedmentHazelcastConfig.addTo(hcConfig);

HazelcastInstance hcInstance = Hazelcast.newHazelcastInstance(hcConfig);

// Tell Speedment what Hazelcast instance to use
ProjectManager.getInstance().putProperty(HazelcastInstance.class, hcInstance);

// Automatically build all Database metadata (e.g. Schema, Tables and Columns)
new TreeBuilder().build();

// Load selected data from the database into the Hazelcast maps and start tracking DB
ProjectManager.getInstance().init();

Licenses and Services

• Free trial (60 days)
• Speedment’s and Hazelcast’s Enterprise licenses are aligned
• Start-up package with Speedment experts
• Project consultants

Thank you!

[email protected]

@Speedment

www.speedment.com
