![Page 1: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/1.jpg)
VPC3: A Fast and Effective Trace-Compression Algorithm
Martin Burtscher
![Page 2: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/2.jpg)
Introduction
Traces: PC values. Good compression algorithms exist, e.g., Sequitur.
Extended traces: PC values + extended data (ED). ED = {load/store values, load/store addresses…}. ED values exhibit lower repeatability and span a larger range than PC values. No good compression algorithms exist.
Value predictors: compress extended traces, or preprocess extended traces for post-compression.
![Page 3: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/3.jpg)
Compression using value predictors
Assume the following: a set of predictors, each with a 1-byte ID, and 8-byte trace entries.
Compression algorithm: compare the trace entry with the predicted values. If one matches, write the ID of that predictor. If none matches, write a special code followed by the trace entry value. Then update the predictors with the trace entry value.
Decompression is analogous.
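The scheme above can be sketched in a few lines. This is only an illustration under stated assumptions: the two toy predictors (`LastValue`, `Stride`), the miss code `0xFF`, and the little-endian byte order are hypothetical choices, not the predictors or encoding used by vpc3.

```python
import struct

MASK64 = (1 << 64) - 1
MISS = 0xFF  # hypothetical special code meaning "no predictor matched"

class LastValue:
    """Toy predictor: predicts the most recently seen value."""
    def __init__(self):
        self.last = 0
    def predict(self):
        return self.last
    def update(self, value):
        self.last = value

class Stride:
    """Toy predictor: predicts last value plus last stride."""
    def __init__(self):
        self.last = 0
        self.stride = 0
    def predict(self):
        return (self.last + self.stride) & MASK64
    def update(self, value):
        self.stride = (value - self.last) & MASK64
        self.last = value

def make_predictors():
    # The list index doubles as the 1-byte predictor ID.
    return [LastValue(), Stride()]

def compress(values):
    preds = make_predictors()
    out = bytearray()
    for v in values:
        guesses = [p.predict() for p in preds]
        if v in guesses:
            out.append(guesses.index(v))      # 1 byte: ID of a correct predictor
        else:
            out.append(MISS)                  # 1 byte: special code
            out += struct.pack("<Q", v)       # 8 bytes: the raw trace entry
        for p in preds:
            p.update(v)                       # always train on the true value
    return bytes(out)

def decompress(data):
    preds = make_predictors()                 # same initial state as the compressor
    values, i = [], 0
    while i < len(data):
        code = data[i]
        i += 1
        if code == MISS:
            (v,) = struct.unpack_from("<Q", data, i)
            i += 8
        else:
            v = preds[code].predict()         # replay the matching prediction
        for p in preds:
            p.update(v)
        values.append(v)
    return values
```

Decompression works because both sides start from the same predictor state and apply the same updates, so a predictor ID is enough to regenerate the value.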
![Page 4: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/4.jpg)
vpc3 compression
Lossless compression. Single-pass algorithm. Excellent compression rate. Fixed memory requirement. Fast decompression speed. Fast compression speed.
![Page 5: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/5.jpg)
Evaluated compression algorithms
Gzip: 2.3MB memory usage. Fast (de)compression. Poor compression rate.
Bzip2: 10MB memory usage. Slower (de)compression. Better compression rate.
Sequitur (modified): source code optimizations. Split PC and ED into separate streams for compression. Bzip2 post-compression. Impressive compression rate for traces. 951MB memory usage.
![Page 6: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/6.jpg)
Value predictors (1)
Last-n-value predictor (LnV): retains the n most recent values. All n values are provided for prediction, so it can be viewed as n independent predictors. Can predict sequences of repeating or alternating values of length <= n. Typically n <= 4. Used for ED values but not PC values.
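A minimal sketch of a last-n-value predictor, assuming a simple shift register of the n most recent values (the class name `LastNValue` and the zero-initialized state are illustrative choices):

```python
from collections import deque

class LastNValue:
    """Keeps the n most recent values; slot 0 is the most recent."""
    def __init__(self, n=4):
        self.values = deque([0] * n, maxlen=n)

    def predict(self):
        # All n retained values are offered, one per "sub-predictor".
        return list(self.values)

    def update(self, value):
        # Push the new value in front; the oldest value falls off the end.
        self.values.appendleft(value)

# An alternating sequence is caught once both values are in the window.
p = LastNValue(4)
hits = []
for v in [5, 9, 5, 9, 5]:
    hits.append(v in p.predict())
    p.update(v)
```

In this run the first two values miss and the remaining three hit, since an alternating pattern of length 2 fits inside the 4-entry window.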
![Page 7: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/7.jpg)
Value predictors (2)
Finite-context-method predictor (FCM): a hash of the n most recent values is used as the index into a hash table, both for predicting and for inserting new values. FCMs are described by their order n, e.g., FCM3. The most recent (a) and second most recent (b) values are retained and predicted for each hash index (line). FCM3b = third-order FCM predictor that predicts the second most recent value. Can predict long arbitrary sequences of values. Used for both PC and ED values.
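The FCM idea can be sketched as follows. The hash function, the table size, and the rule of shifting a into b only when the new value differs from a are assumptions made for illustration, not details taken from the slides.

```python
class FCM:
    """Order-n finite-context-method predictor with two values per line."""
    def __init__(self, order=3, table_bits=16):
        self.mask = (1 << table_bits) - 1
        self.history = [0] * order                  # oldest first
        self.table = [[0, 0] for _ in range(1 << table_bits)]  # [a, b]

    def _index(self):
        # Hypothetical hash: shift-and-xor over the recent values.
        h = 0
        for v in self.history:
            h = ((h << 5) ^ v) & self.mask
        return h

    def predict(self):
        a, b = self.table[self._index()]
        return a, b                                 # e.g. FCM3a and FCM3b

    def update(self, value):
        line = self.table[self._index()]
        if line[0] != value:                        # assumed: skip duplicates
            line[1] = line[0]
            line[0] = value
        self.history = self.history[1:] + [value]

# A repeating but otherwise arbitrary sequence is learned after one pass.
fcm = FCM(order=3)
a_hits = 0
for v in [1, 2, 3, 4] * 5:
    a, _ = fcm.predict()
    a_hits += (a == v)
    fcm.update(v)
```

After the warm-up period every context has been seen once, so each later value is predicted by the a slot of its line.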
![Page 8: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/8.jpg)
Value predictors (3)
Differential-finite-context-method predictor (DFCM): similar to FCM, but predicts and stores strides instead of absolute values. The final prediction is formed by adding the predicted stride to the most recently seen value. Can predict never-before-seen values. Improved prediction of ED values over FCM. No improvement for PC values over FCM.
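A minimal DFCM sketch, assuming 64-bit wraparound arithmetic, one stride per table line, and a hypothetical shift-and-xor hash over the recent strides:

```python
MASK64 = (1 << 64) - 1

class DFCM:
    """Order-n differential FCM: the table stores strides, not values."""
    def __init__(self, order=1, table_bits=16):
        self.mask = (1 << table_bits) - 1
        self.strides = [0] * order          # recent strides, oldest first
        self.last = 0                       # most recently seen value
        self.table = [0] * (1 << table_bits)

    def _index(self):
        h = 0
        for s in self.strides:
            h = ((h << 5) ^ s ^ (s >> 16)) & self.mask
        return h

    def predict(self):
        # Prediction = last value + stride stored for the current context.
        return (self.last + self.table[self._index()]) & MASK64

    def update(self, value):
        stride = (value - self.last) & MASK64
        self.table[self._index()] = stride
        self.strides = self.strides[1:] + [stride]
        self.last = value

# Every value below is new, yet the constant stride makes it predictable.
dfcm = DFCM(order=1)
hits = 0
for v in [100 + 8 * i for i in range(10)]:
    hits += (dfcm.predict() == v)
    dfcm.update(v)
```

An FCM storing absolute values could not hit here, because no value repeats; the DFCM only needs the stride pattern to repeat.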
![Page 9: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/9.jpg)
Extended trace format
32-bit PC field, 64-bit ED field.
PC0 (32 bits), ED0 (64 bits), PC1 (32 bits), ED1 (64 bits), …
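For illustration, records in this layout can be packed and unpacked with Python's `struct` module. The little-endian byte order and the absence of padding between fields are assumptions; the slides only give the field widths.

```python
import struct

# Assumed on-disk layout: 32-bit PC followed by 64-bit ED, little-endian, unpadded.
RECORD = struct.Struct("<IQ")

def write_trace(entries):
    """Serialize (pc, ed) pairs into a flat byte string."""
    return b"".join(RECORD.pack(pc, ed) for pc, ed in entries)

def read_trace(data):
    """Parse a flat byte string back into (pc, ed) pairs."""
    return [RECORD.unpack_from(data, off)
            for off in range(0, len(data), RECORD.size)]

entries = [(0x12000400, 0x0000000140001000), (0x12000404, 42)]
raw = write_trace(entries)
```

Each record occupies 12 bytes, i.e. the 96 bits per PC/ED pair that the compression-rate limits are measured against.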
![Page 10: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/10.jpg)
vpc0
Compresses only the ED value. 27 predictors. Fixed-width predictor encoding, i.e., 5 bits. 2.6x compression rate limit: a 96-bit PC/ED entry is compressed to a 37-bit PC/ED entry (32-bit uncompressed PC + 5-bit predictor code).
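The 2.6x limit follows from the field widths. The short calculation below reproduces it; reserving one extra code for the "no match" case is an assumption (27 predictors alone would also fit in 5 bits).

```python
PC_BITS, ED_BITS = 32, 64
ED_PREDICTORS = 27

# Codes needed: one per predictor plus an assumed miss code.
codes = ED_PREDICTORS + 1
code_bits = (codes - 1).bit_length()        # fixed-width encoding

best_case_bits = PC_BITS + code_bits        # PC stored as is, ED replaced by a code
limit = (PC_BITS + ED_BITS) / best_case_bits
```

With these numbers `code_bits` is 5 and `best_case_bits` is 37, so the ratio 96/37 rounds to 2.6.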
![Page 11: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/11.jpg)
vpc1
Compresses PC values using 10 predictors. Optimizes the predictor bit encoding using a dynamic Huffman encoder. Among the correct predictors, uses the one with the shortest Huffman code. Compresses unpredictable values: PC values are encoded with log2(Max(PC value)) bits; for ED values, the code of the closest predictor and the difference are stored. Many other optimizations to enhance compression rate. 48x compression rate limit: a 96-bit PC/ED entry is compressed to a 2-bit PC/ED entry.
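To see where a 2-bit best case can come from, consider how Huffman coding assigns code lengths. The sketch below builds a static Huffman code (not the dynamic encoder named on the slide) from hypothetical predictor usage counts; the predictor names are borrowed from the slides, the counts are invented for illustration.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a static Huffman code."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Hypothetical usage counts: one predictor is right most of the time.
usage = {"FCM3a": 900, "FCM1a": 50, "FCM1b": 30, "miss": 20}
lengths = huffman_code_lengths(usage)

# Assumed best case: a 1-bit PC code plus a 1-bit ED code per entry.
limit = (32 + 64) / (1 + 1)
```

The dominant predictor receives a 1-bit code, so an entry whose PC and ED are both predicted by their most frequent predictor costs 2 bits, giving 96/2 = 48.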
![Page 12: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/12.jpg)
vpc2
vpc1-compressed traces are highly compressible. vpc2 = vpc1 + gzip post-compression. Improved compression rate over vpc1. 2x geometric-mean compression rate over Sequitur. 3.5 times slower decompression speed.
![Page 13: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/13.jpg)
vpc3
Exposes and enhances patterns in the trace for bzip2 post-compression. Simpler than vpc2: 14 predictors, down from 37 in vpc2. Fixed byte encoding for predictors. Unpredictable values are not compressed. Eliminated vpc1 optimizations that hurt the post-compression rate.
![Page 14: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/14.jpg)
vpc3
Value predictors tuned using the gcc load-value trace: L4V for ED values. FCM{1a,1b,3a,3b} for PC values. FCM{1a,1b} for ED values. DFCM{1a,1b,3a,3b} for ED values.
Trace converted into 4 streams: PC predictor codes stream, unpredicted PC values stream, ED predictor codes stream, unpredicted ED values stream.
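The four-stream split can be sketched as below. A single toy last-value predictor per field stands in for the real predictor set, and the miss code `0xFF` and little-endian packing are assumptions; only the stream layout and the bzip2 post-compression step come from the slides.

```python
import bz2
import struct

MISS = 0xFF  # hypothetical code for "no predictor matched"

class LastValue:
    """Toy stand-in for the real PC/ED predictors."""
    def __init__(self):
        self.last = 0
    def predict(self):
        return self.last
    def update(self, value):
        self.last = value

def to_streams(trace):
    """Split (pc, ed) pairs into codes and unpredicted-value streams."""
    pc_preds, ed_preds = [LastValue()], [LastValue()]
    pc_codes, pc_raw = bytearray(), bytearray()
    ed_codes, ed_raw = bytearray(), bytearray()
    for pc, ed in trace:
        for value, preds, codes, raw, fmt in (
            (pc, pc_preds, pc_codes, pc_raw, "<I"),
            (ed, ed_preds, ed_codes, ed_raw, "<Q"),
        ):
            guesses = [p.predict() for p in preds]
            if value in guesses:
                codes.append(guesses.index(value))   # fixed 1-byte predictor code
            else:
                codes.append(MISS)
                raw += struct.pack(fmt, value)       # stored uncompressed
            for p in preds:
                p.update(value)
    return [bytes(s) for s in (pc_codes, pc_raw, ed_codes, ed_raw)]

def compress_streams(streams):
    # Each stream is post-compressed independently with bzip2.
    return [bz2.compress(s) for s in streams]

trace = [(0x400, 7)] * 3 + [(0x404, 7)]
streams = to_streams(trace)
packed = compress_streams(streams)
```

Keeping codes and raw values in separate streams groups similar data together, which is the kind of pattern a general-purpose post-compressor can exploit.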
![Page 15: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/15.jpg)
![Page 16: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/16.jpg)
Evaluation
64-bit CS20 system with dual 833MHz 21264B Alpha CPUs. Only one CPU used. Spec2k int and float programs.
Generated traces (<= 12 GB): PC + effective address of store instructions. PC + effective address of misses on simulated L1 cache. PC + load value.
Compressors compiled with the same compiler and compile flags.
![Page 17: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/17.jpg)
Evaluation
vpc3 predictor configuration: lots of state sharing amongst predictors. 5MB for PC predictors. 21MB for ED predictors. 27MB total memory used by vpc3.
![Page 18: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/18.jpg)
Compression Rate
![Page 19: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/19.jpg)
Decompression Speed
![Page 20: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/20.jpg)
Compression Speed
![Page 21: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/21.jpg)
Predictor Usage
![Page 22: VPC3: A Fast and Effective Trace-Compression Algorithm](https://reader036.vdocuments.site/reader036/viewer/2022062500/56815409550346895dc205c3/html5/thumbnails/22.jpg)
Conclusion
Questions Insights Discussions