In-memory Caching in HDFS
Lower latency, same great taste

Andrew Wang | [email protected]
Colin McCabe | [email protected]
[Figure: Alice sends a query to the Hadoop cluster and gets back a result set]
[Figure: Alice queries fresh data]
[Figure: everyone queries the same fresh data]
[Figure: a rollup job scans the data while Alice queries it]
Problems

• Data hotspots
  • Everyone wants to query some fresh data
  • Shared disks are unable to handle high load
• Mixed workloads
  • Data analyst making small point queries
  • Rollup job scanning all the data
  • Point query latency suffers because of I/O contention
• Same theme: disk I/O contention!
How do we solve I/O issues?

• Cache important datasets in memory!
  • Much higher throughput than disk
  • Fast random/concurrent access
• Interesting working sets often fit in cluster memory
  • Traces from Facebook's Hive cluster
• Increasingly affordable to buy a lot of memory
  • Moore's law
  • A 1TB server is ~$40k on HP's website
[Figure: Alice's reads are served from the OS page cache]
[Figure: Alice issues a repeated query; is the data still cached?]
[Figure: a rollup job scans the data, churning the page cache]
[Figure: the normal read path makes extra copies of the data]
[Figure: checksum verification and extra copies add overhead to reads]
Design considerations

1. Explicitly pin hot datasets in memory
2. Place tasks for memory locality
3. Zero-overhead reads of cached data
Outline

• Implementation
  • NameNode and DataNode modifications
  • Zero-copy read API
• Evaluation
  • Microbenchmarks
  • MapReduce
  • Impala
• Future work
Outline (recap): next up is Implementation.
Architecture

The NameNode schedules which DataNodes cache each block of a file.

[Diagram: NameNode assigning cache work to three DataNodes]
Architecture

DataNodes periodically send cache reports describing which replicas they have cached.

[Diagram: three DataNodes sending cache reports to the NameNode]
Cache Locations API

• Clients can ask the NameNode where a file is cached via getFileBlockLocations (example below)

[Diagram: DFSClient asking the NameNode about three DataNodes]
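For illustration, here is a minimal sketch of the locations call through the standard FileSystem API. BlockLocation.getCachedHosts(), added alongside this caching work, lists the DataNodes holding a cached replica of each block; the path below is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Ask the NameNode where each block of a file lives, and which of
// those replicas are cached in memory.
public class CacheLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/data/warehouse/sales"); // hypothetical path
    FileStatus stat = fs.getFileStatus(path);
    BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset()
          + " hosts=" + String.join(",", block.getHosts())
          + " cached=" + String.join(",", block.getCachedHosts()));
    }
  }
}
```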
Cache Directives

• A cache directive describes a file or directory that should be cached (example below)
  • Path
  • Cache replication factor
• Stored permanently on the NameNode
• Also have cache pools for access control and quotas, but we won't be covering that here
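As a hedged sketch, a directive can be added programmatically through DistributedFileSystem.addCacheDirective (the CLI equivalent is hdfs cacheadmin -addDirective). The path and pool name here are hypothetical, and the pool must already exist:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

// Pin a hot dataset in cluster memory with a cache directive.
public class AddDirective {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    CacheDirectiveInfo directive = new CacheDirectiveInfo.Builder()
        .setPath(new Path("/data/hot"))   // hypothetical path
        .setPool("analytics")             // hypothetical, pre-created pool
        .setReplication((short) 3)        // cache replication factor
        .build();
    long id = dfs.addCacheDirective(directive);
    System.out.println("Added cache directive " + id);
  }
}
```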
mlock

• The DataNode pins each cached block into the page cache using mlock (sketch below).
• Because we're using the page cache, the blocks don't take up any space on the Java heap.

[Diagram: DFSClient reads a block that the DataNode has mlocked into the page cache]
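Standard Java has no mlock call (the DataNode pins pages through a small JNI shim), but the page-cache half of the idea can be sketched: mapping a block file keeps the data in kernel memory rather than on the JVM heap. The block path below is hypothetical.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: map a block file into the page cache, off the Java heap.
// The real DataNode then calls mlock() on the region via JNI so the
// kernel cannot evict those pages under memory pressure.
public class MapBlock {
  public static void main(String[] args) throws Exception {
    try (RandomAccessFile raf =
             new RandomAccessFile("/dn1/current/blk_1073741825", "r")) {
      MappedByteBuffer buf = raf.getChannel()
          .map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
      buf.load(); // fault every page in; mlock() would then pin them
    }
  }
}
```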
Zero-copy read API

• Clients can use the zero-copy read API to map the cached replica into their own address space (sketch below)
• The zero-copy API avoids the overhead of the read() and pread() system calls
• However, we don't verify checksums when using the zero-copy API
  • The zero-copy API can only be used on cached data, or when the application computes its own checksums
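A minimal sketch of the zero-copy path, assuming Hadoop 2.3+: FSDataInputStream.read(ByteBufferPool, maxLength, options) hands back a ByteBuffer mapped over the cached replica (null at end of file), and the buffer is returned with releaseBuffer. The file path is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ElasticByteBufferPool;

// Read a file through the zero-copy API. For a local cached replica
// this maps the block directly into our address space; otherwise the
// call falls back to an ordinary copying read.
public class ZeroCopyRead {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    ElasticByteBufferPool pool = new ElasticByteBufferPool();
    try (FSDataInputStream in = fs.open(new Path("/data/hot/part-0"))) {
      ByteBuffer buf;
      while ((buf = in.read(pool, 4 * 1024 * 1024,
          EnumSet.of(ReadOption.SKIP_CHECKSUMS))) != null) {
        process(buf);          // consume the mapped region directly
        in.releaseBuffer(buf); // hand the buffer back when done
      }
    }
  }
  static void process(ByteBuffer buf) { /* application logic */ }
}
```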
Skipping Checksums

• We would like to skip checksum verification when reading cached data
  • The DataNode already checksums when caching the block
• Requirements
  • Client needs to know that the replica is cached
  • DataNode needs to notify the client if the replica is uncached
Skipping Checksums

• The DataNode and DFSClient use shared memory segments to communicate which blocks are cached.

[Diagram: DFSClient reading an mlocked block from the page cache, with a shared memory segment between DFSClient and DataNode]
Outline (recap): next up is Evaluation, starting with single-node microbenchmarks.
Test Cluster

• 5 nodes
  • 1 NameNode
  • 4 DataNodes
• 48GB of RAM
  • Configured 38GB of HDFS cache per DN (config sketch below)
• 11x SATA hard disks
• 2x4 core 2.13GHz Westmere Xeon processors
• 10Gbit/s full-bisection bandwidth network
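For reference, the cache size is a DataNode setting: dfs.datanode.max.locked.memory caps how many bytes the DataNode may mlock, and the DataNode user's memlock ulimit must be at least as large. A sketch matching the 38GB figure above:

```xml
<!-- hdfs-site.xml: allow each DataNode to pin up to 38 GiB of blocks.
     Raise the datanode user's "ulimit -l" to match. -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>40802189312</value> <!-- 38 * 1024^3 bytes -->
</property>
```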
Single-Node Microbenchmarks

• How much faster are cached and zero-copy reads?
• Introducing vecsum (vector sum)
  • Computes sums of a file of doubles
  • Highly optimized: uses SSE intrinsics
  • libhdfs program
  • Can toggle between various read methods
Throughput

| Read method  | Throughput (GB/s) |
| ------------ | ----------------- |
| TCP          | 0.8               |
| TCP no csums | 0.9               |
| SCR          | 1.9               |
| SCR no csums | 2.4               |
| ZCR          | 5.9               |
ZCR 1GB vs 20GB

| Working set | Throughput (GB/s) |
| ----------- | ----------------- |
| 1GB         | 5.9               |
| 20GB        | 2.7               |
Throughput

• Skipping checksums matters more when going faster
• ZCR gets close to bus bandwidth
  • ~6GB/s
• Need to reuse client-side mmaps for maximum perf
  • page_fault function is 1.16% of cycles in 1G
  • 17.55% in 20G
Client CPU cycles

| Read method  | CPU cycles (billions) |
| ------------ | --------------------- |
| TCP          | 57.6                  |
| TCP no csums | 51.8                  |
| SCR          | 27.1                  |
| SCR no csums | 23.4                  |
| ZCR          | 12.7                  |
Why is ZCR more CPU-efficient?

[Two diagram slides comparing the read paths]
Remote Cached vs. Local Uncached

• Zero-copy is only possible for local cached data
• Is it better to read from remote cache, or local disk?
Remote Cached vs. Local Uncached

| Method                | Throughput (MB/s) |
| --------------------- | ----------------- |
| TCP (remote cached)   | 841               |
| iperf (network max)   | 1092              |
| SCR (local uncached)  | 125               |
| dd (single-disk max)  | 137               |
Microbenchmark Conclusions

• Short-circuit reads need less CPU than TCP reads
• ZCR is even more efficient, because it avoids a copy
• ZCR goes much faster when re-reading the same data, because it can avoid mmap page faults
• Network and disk may be the bottleneck for remote or uncached reads
Outline (recap): next up is the MapReduce evaluation.
MapReduce

• Started with example MR jobs
  • Wordcount
  • Grep
• Same 4 DN cluster
  • 38GB HDFS cache per DN
  • 11 disks per DN
• 17GB of Wikipedia text
  • Small enough to fit into cache at 3x replication
• Ran each job 10 times, took the average
wordcount and grep

| Job              | Time (s) |
| ---------------- | -------- |
| wordcount        | 280      |
| wordcount cached | 275      |
| grep             | 55       |
| grep cached      | 52       |
Same chart, annotated: almost no speedup!
Same chart, annotated: wordcount runs at ~60MB/s and grep at ~330MB/s; these jobs are not I/O bound.
wordcount and grep

• End-to-end latency barely changes
  • These MR jobs are simply not I/O bound!
• Best map phase throughput was about 330MB/s
  • 44 disks can theoretically do 4400MB/s
• Further reasoning
  • Long JVM startup and initialization time
  • Many copies in TextInputFormat, doesn't use zero-copy
  • Caching input data doesn't help the reduce step
Introducing bytecount

• Trivial version of wordcount
  • Counts # of occurrences of byte values
  • Heavily CPU optimized
• Each mapper processes an entire block via ZCR (see the sketch below)
  • No additional copies
  • No record slop across block boundaries
  • Fast inner loop
• Very unrealistic job, but serves as a best case
• Also tried 2GB block size to amortize startup costs
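A hypothetical sketch of the sort of inner loop bytecount uses: tallying byte values straight out of a ByteBuffer, with no per-record parsing or copying.

```java
import java.nio.ByteBuffer;

// Count occurrences of each byte value in a buffer (e.g., one
// returned by the zero-copy read API).
public class ByteCount {
  static long[] countBytes(ByteBuffer buf) {
    long[] counts = new long[256];
    while (buf.remaining() >= 8) {
      long word = buf.getLong(); // consume 8 bytes per iteration
      for (int i = 0; i < 8; i++) {
        counts[(int) (word & 0xff)]++;
        word >>>= 8;
      }
    }
    while (buf.hasRemaining()) { // tail bytes
      counts[buf.get() & 0xff]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] counts = countBytes(ByteBuffer.wrap("hello hdfs".getBytes()));
    System.out.println("'h' occurs " + counts['h'] + " times");
  }
}
```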
bytecount

| Job                 | Time (s) |
| ------------------- | -------- |
| grep                | 55       |
| grep cached         | 52       |
| bytecount           | 58       |
| bytecount cached    | 45       |
| bytecount-2G        | 39       |
| bytecount-2G cached | 35       |
Same chart, annotated: bytecount is 1.3x faster when cached.
Same chart, annotated: still only ~500MB/s.
MapReduce Conclusions

• Many MR jobs will see marginal improvement
  • Startup costs
  • CPU inefficiencies
  • Shuffle and reduce steps
• Even bytecount sees only modest gains
  • 1.3x faster than disk
  • 500MB/s with caching and ZCR
  • Nowhere close to the GB/s possible with memory
• Needs more work to take full advantage of caching!
Outline (recap): next up is the Impala evaluation.
Impala Benchmarks

• Open-source OLAP database developed by Cloudera
• Tested with Impala 1.3 (CDH 5.0)
• Same 4 DN cluster as MR section
  • 38GB of 48GB per DN configured as HDFS cache
  • 152GB aggregate HDFS cache
  • 11 disks per DN
Impala Benchmarks

• 1TB TPC-DS store_sales table, text format
• count(*) on different numbers of partitions
  • Has to scan all the data, no skipping
• Queries
  • 51GB small query (34% of cache capacity)
  • 148GB big query (98% of cache capacity)
  • Small query with concurrent workload
• Tested "cold" and "hot"
  • echo 3 > /proc/sys/vm/drop_caches
  • Lets us compare HDFS caching against page cache
Small Query

| Configuration | Average response time (s) |
| ------------- | ------------------------- |
| Uncached cold | 19.8                      |
| Cached cold   | 5.8                       |
| Uncached hot  | 4.0                       |
| Cached hot    | 3.0                       |
Same chart, annotated: uncached cold scans at ~2550 MB/s and is I/O bound; cached hot reaches ~17 GB/s.
Same chart, annotated: 3.4x faster, disk vs. memory.
Same chart, annotated: 1.3x after warmup; still wins on CPU efficiency.
Big Query

| Configuration | Average response time (s) |
| ------------- | ------------------------- |
| Uncached cold | 48.2                      |
| Cached cold   | 11.5                      |
| Uncached hot  | 40.9                      |
| Cached hot    | 9.4                       |
Same chart, annotated: 4.2x faster, disk vs. memory.
Same chart, annotated: 4.3x faster hot, because 148GB doesn't fit in the page cache and tasks cannot be scheduled for page cache locality.
Small Query with Concurrent Workload

[Chart: average response time (s) for Uncached, Cached, and Cached (not concurrent)]
Same chart, annotated: 7x faster when the small query working set is cached.
Same chart, annotated: 2x slower than isolated, due to CPU contention.
Impala Conclusions

• HDFS cache is faster than disk or page cache
  • ZCR is more efficient than SCR from page cache
  • Better when the working set is approximately cluster memory
  • Can schedule tasks for cache locality
• Significantly better for concurrent workloads
  • 7x faster when contending with a single background query
• Impala performance will only improve
  • Many CPU improvements on the roadmap
Outline (recap): next up is Future work.
Future Work

• Automatic cache replacement
  • LRU, LFU, ?
• Sub-block caching
  • Potentially important for automatic cache replacement
• Compression, encryption, serialization
  • Lose many benefits of zero-copy API
• Write-side caching
  • Enables Spark-like RDDs for all HDFS applications
Conclusion

• I/O contention is a problem for concurrent workloads
• HDFS can now explicitly pin working sets into RAM
• Applications can place their tasks for cache locality
• Use the zero-copy API to efficiently read cached data
• Substantial performance improvements
  • 6GB/s for single-thread microbenchmark
  • 7x faster for concurrent Impala workload
bytecount (backup slide)

Same chart as the bytecount results above, annotated: less disk parallelism.