Learning to Optimize Tensor Programs
Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Huge gap between models/frameworks and hardware backends
Frameworks
Explosion of hardware backends
Accelerator
Existing Approach
High-level data flow graph
Hardware
Primitive Tensor operators such as Conv2D
e.g. cuDNN: offload to a heavily optimized DNN operator library
Frameworks
Limitations of Existing Approach
cuDNN
Frameworks
New operators
Engineering intensive
Machine Learning based Program Optimizer
Learning to Optimize Tensor Programs
High-level data flow graph and optimizations
Learning to generate optimized programs for new operator workloads and hardware
Hardware
Frameworks
Search over Possible Program Transformations
Hardware
Loop Transformations
Thread Bindings
Cache Locality
Thread Cooperation
Tensorization
Latency Hiding
Billions of possible optimization choices
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Compute Description
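To give a feel for how a single knob of this search space behaves, here is a minimal pure-Python sketch (not TVM code). It evaluates the compute description above directly, then applies one loop transformation, tiling the y and x loops, and checks that every tiling gives the same result. The helper names (`matmul_reference`, `matmul_tiled`, `divisors`), the 8x8x8 problem size, and the restriction to tile sizes that evenly divide the loop extent are assumptions made for illustration.

```python
from itertools import product


def matmul_reference(A, B, m, n, l):
    """C[y][x] = sum over k of A[k][y] * B[k][x], as in the compute description."""
    return [[sum(A[k][y] * B[k][x] for k in range(l)) for x in range(n)]
            for y in range(m)]


def matmul_tiled(A, B, m, n, l, ty, tx):
    """Same computation with the y and x loops split into tiles of size ty, tx."""
    assert m % ty == 0 and n % tx == 0, "sketch assumes tile sizes divide the extents"
    C = [[0] * n for _ in range(m)]
    for yo in range(m // ty):             # outer y loop over tiles
        for xo in range(n // tx):         # outer x loop over tiles
            for k in range(l):            # reduction loop
                for yi in range(ty):      # inner y loop within a tile
                    for xi in range(tx):  # inner x loop within a tile
                        y, x = yo * ty + yi, xo * tx + xi
                        C[y][x] += A[k][y] * B[k][x]
    return C


def divisors(n):
    """All tile sizes that evenly divide a loop extent n."""
    return [d for d in range(1, n + 1) if n % d == 0]


m = n = l = 8
A = [[k * m + y for y in range(m)] for k in range(l)]      # A is indexed [k][y]
B = [[k * n + x + 1 for x in range(n)] for k in range(l)]  # B is indexed [k][x]

reference = matmul_reference(A, B, m, n, l)
tilings = list(product(divisors(m), divisors(n)))
all_equal = all(matmul_tiled(A, B, m, n, l, ty, tx) == reference
                for ty, tx in tilings)
print(len(tilings), all_equal)
```

Each `(ty, tx)` pair is one equivalent program. Combining this knob with loop order, thread binding, and the other transformations listed above multiplies the number of candidates, which is why exhaustive search quickly becomes impractical.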
Learning-based Program Optimizer
Unique Problem Characteristics
• Relatively low experiment cost
• Domain-specific problem structure
• Large quantity of similar tasks
[Diagram: Program Optimizer → Program → Code Generator → Training data D → Learning → Statistical Cost Model → Program Optimizer]
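The loop formed by these components can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `measure` is a hypothetical stand-in for compiling and timing a program on hardware (here a synthetic cost), `NearestNeighborCostModel` is a deliberately naive placeholder for the statistical cost model, and the search space is a small grid of two tile sizes.

```python
import random
from itertools import product


def measure(config):
    """Hypothetical stand-in for compiling and timing a program on hardware."""
    ty, tx = config
    return abs(ty - 4) + abs(tx - 8) + 1.0  # synthetic cost, minimal at (4, 8)


class NearestNeighborCostModel:
    """Toy cost model: predicts the cost of the closest measured config."""

    def __init__(self):
        self.data = []  # training data D: list of (config, measured cost)

    def fit(self, data):
        self.data = list(data)

    def predict(self, config):
        if not self.data:
            return 0.0  # no information yet: all candidates look equally good
        nearest = min(self.data,
                      key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], config)))
        return nearest[1]


def optimize(space, rounds=5, batch=4, seed=0):
    rng = random.Random(seed)
    model, D = NearestNeighborCostModel(), []
    for _ in range(rounds):
        measured = {c for c, _ in D}
        candidates = [c for c in space if c not in measured]
        if not candidates:
            break
        rng.shuffle(candidates)             # random order breaks ties
        candidates.sort(key=model.predict)  # rank by predicted cost (stable sort)
        for config in candidates[:batch]:   # measure the most promising ones
            D.append((config, measure(config)))
        model.fit(D)                        # update the cost model on D
    return min(D, key=lambda d: d[1]), D


space = list(product([1, 2, 4, 8, 16], repeat=2))
best, D = optimize(space)
print(best, len(D))
```

The point of the structure is that only `batch` candidates per round are actually measured; the cost model decides which ones, and every measurement is fed back into D so the model improves as the search proceeds.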
Program-aware Cost Modeling
High-Level Configuration

    for y in range(8):
        for x in range(8):
            C[y][x] = 0
            for k in range(8):
                C[y][x] += A[k][y] * B[k][x]

Low-level Abstract Syntax Tree (shared between tasks)
[Figure: TreeGRU over the loop AST. The for-loop nodes yield context vectors for y, x, and k, which are combined by soft scatter and summation (+) into the final embedding.]
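Statistical features

| loop | touched memory: C | touched memory: A | touched memory: B | outer loop length |
|------|-------------------|-------------------|-------------------|-------------------|
| y    | 64                | 64                | 64                | 1                 |
| x    | 8                 | 8                 | 64                | 8                 |
| k    | 1                 | 8                 | 8                 | 64                |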
Boosted Tree Ensembles
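The touched-memory and outer-loop-length values listed above can be reproduced from the loop nest with a short Python sketch. The counting rule used here is a simplifying assumption for illustration (each index variable inside a loop contributes its full extent, each one outside contributes 1); it matches the numbers on the slide for this 8x8x8 example but is not taken from the authors' feature extractor, and the names `touched_memory` and `outer_loop_length` are hypothetical.

```python
from math import prod

# Loop nest from the slide, outermost first: (loop variable, extent).
loops = [("y", 8), ("x", 8), ("k", 8)]
# Index variables used by each buffer access: C[y][x], A[k][y], B[k][x].
accesses = {"C": ("y", "x"), "A": ("k", "y"), "B": ("k", "x")}


def touched_memory(loops, accesses):
    """Elements of each buffer touched by one full execution of each loop."""
    extent = dict(loops)
    order = [var for var, _ in loops]
    table = {}
    for depth, var in enumerate(order):
        inner = set(order[depth:])  # this loop and all loops nested inside it
        table[var] = {
            buf: prod(extent[v] for v in index if v in inner)
            for buf, index in accesses.items()
        }
    return table


def outer_loop_length(loops):
    """Product of the extents of all loops enclosing each loop."""
    lengths, running = {}, 1
    for var, ext in loops:
        lengths[var] = running
        running *= ext
    return lengths


memory = touched_memory(loops, accesses)
outer = outer_loop_length(loops)
for var, _ in loops:
    print(var, memory[var], outer[var])
```

Because the features are computed per loop from the low-level loop structure rather than from a task-specific configuration, the same extraction applies unchanged to other loop nests, which is what makes them usable as inputs to a shared model such as a boosted tree ensemble.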
Transfer Learning Among Different Workloads
Historical Optimization Tasks
Domain-Invariant Program Representations
Transferable Models to speed up new tasks
State-of-the-Art Performance
Poster #104
Nvidia GPU, ARM CPU, ARM GPU
In production use inside several major companies