Automated Feature Engineering · 2018. 5. 29. · Copyright © SAS Institute Inc. All rights...

Copyright © SAS Institute Inc. All rights reserved. Automated Feature Engineering. Xin Hunt, SAS Institute Inc.


Page 1: Automated Feature Engineering · 2018. 5. 29. · Copyright © SAS Institute Inc. All rights reserved. References • Dong, Guozhu, and Huan Liu. "Feature Engineering for Machine Learning

Copyright © SAS Institute Inc. All rights reserved.

Automated Feature Engineering
Xin Hunt, SAS Institute Inc.


Automated Machine Learning Pipeline

[Diagram with boxes labeled: Input Data, Pre-processing, Model Selection, Output, Feature Engineering, Auto-Tune]


Motivation
“Garbage in, garbage out”

•  Data Engineering is the process of cleaning, filtering, and organizing the data for successful mining and modeling, by solving or avoiding problems in the data.
    •  Could take 60-80% of the whole data mining effort.

•  Feature Engineering methods allow us to choose the right representation to train our models.

•  Part of the Automation Initiative at SAS®: Automated Feature Engineering
    •  Envisioned for SAS® Visual Data Mining and Machine Learning
    •  Runs on SAS® Viya®: tested and optimized for Intel® Xeon® for performance and scalability


Traditional Feature Engineering

•  Performed by data scientists
•  Relies heavily on model selected and domain expertise
•  Features are designed through trial and error

[Diagram with boxes labeled: Input Data, Pre-Processing, Model Selection, Output, Hand-Pick Features, Auto-Tune]


Automated Feature Engineering

•  Performed by data scientists
•  Assisted by automated feature generation, selection, and composition methods
•  Reduces manual trial and error time
•  Expands search width and depth for best features
•  Combined with automated model recommendation

[Diagram with boxes labeled: Input Data, Pre-Processing, Automated Model Recommendation, Output, Automated Feature Generation and Selection/Composition, Auto-Tune]


Problem Formulation
Feature Selection and Composition

•  Original dataset $X$
•  Model $m$
•  Set of transformations $T = \{t_i\},\ i = 1, 2, \dots$, where each $t_i(\cdot)$ outputs a set of features

•  Composition and concatenation of transformations:
   $C(T, X) = [\,t_{i_1}(t_{i_2}(\dots(X))),\ t_{j_1}(t_{j_2}(\dots(X))),\ \dots\,]$

•  Objective: find a composition of transformations that maximizes model performance:
   $[C^*, T^*] = \arg\max_{C,\,T} R_m(C(T, X))$

•  In reality, $C$ and $T$ are optimized separately
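The formulation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used at SAS: the transformations `log1p_t` and `square_t`, the helpers `compose` and `C`, and the toy score `r_m` (the best absolute Pearson correlation between a generated feature and the target, standing in for $R_m$) are all hypothetical names introduced here. A real system would score a trained model on held-out data.

```python
import math
from itertools import permutations

# Hypothetical transformations t_i: each maps a list of feature columns
# to a new list of feature columns.
def log1p_t(cols):
    # abs() keeps the toy transformation defined for negative inputs
    return [[math.log1p(abs(v)) for v in col] for col in cols]

def square_t(cols):
    return [[v * v for v in col] for col in cols]

def compose(chain, X):
    """Evaluate t_{i1}(t_{i2}(...(X))): the last element of `chain` is applied first."""
    out = X
    for t in reversed(chain):
        out = t(out)
    return out

def C(chains, X):
    """Concatenate the feature columns produced by every chain of transformations."""
    features = []
    for chain in chains:
        features.extend(compose(chain, X))
    return features

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def r_m(features, y):
    """Toy stand-in for R_m: best absolute correlation between a feature and y."""
    return max(abs(pearson(col, y)) for col in features)

X = [[1.0, 2.0, 3.0, 4.0, 5.0]]      # one original column
y = [1.0, 4.0, 9.0, 16.0, 25.0]      # target is the square of that column

# Exhaustive search over a tiny space of single chains of length 1 or 2.
T = [log1p_t, square_t]
candidates = [(t,) for t in T] + list(permutations(T, 2))
best_chain = max(candidates, key=lambda chain: r_m(C([chain], X), y))
best_names = [t.__name__ for t in best_chain]
```

With only two transformations the search space has four chains. The space grows quickly with the number of transformations and the allowed chain length, which is one reason the composition and the transformation set are optimized separately in practice.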


How To Build Good Features?
The two building blocks

•  Feature generators
    •  Domain specific feature generators
    •  General purpose feature generators

•  Feature selection and composition algorithm
    •  The “best features” are both data and model specific
    •  Need to combine with an efficient model selection and recommendation method


How To Find Good Features?
The two building blocks

•  Feature generators
    •  Domain specific feature generators
    •  General purpose feature generators

•  Feature selection and composition algorithm
    •  The “best features” are both data and model specific
    •  Need to combine with an efficient model selection and recommendation method


Feature Extraction and Generation
Domain Specific Features

•  Text Data
    •  Bag of words, semantic structural representation, latent semantic representations (latent Dirichlet allocation), Word2Vec embeddings

•  Image Data
    •  Color, texture, shape (edges, corners, blobs), wavelet coefficients, scale-invariant features (SIFT), bag-of-features + spatial pyramid, deep learning based features

•  Time series
    •  Spectral features, motifs, shapelets, discords, pattern dictionaries
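As a concrete instance of a domain-specific generator for text, a bag-of-words representation can be sketched as follows. The function name `bag_of_words` and the whitespace tokenization are simplifying assumptions made for this illustration. Production pipelines typically add proper tokenization, stop-word handling, and weighting.

```python
from collections import Counter

def bag_of_words(documents):
    """Return (vocabulary, count vectors) for a list of text documents."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # One count per vocabulary word, in vocabulary order.
        vectors.append([counts[tok] for tok in vocabulary])
    return vocabulary, vectors

vocab, vecs = bag_of_words(["the cat sat", "the cat saw the dog"])
# vocab   -> ['cat', 'dog', 'sat', 'saw', 'the']
# vecs[0] -> [1, 0, 1, 0, 1]
# vecs[1] -> [1, 1, 0, 1, 2]
```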


Feature Extraction and Generation
General Purpose Features

•  Single-variable transformations
    •  log, exponential, frequency count, one-hot coding, normalization

•  Two-variable combinations
    •  sum, difference, division, product

•  Multivariate and model-based methods
    •  Unsupervised feature generation
        -  PCA, random projections, meta data learning, distance/cluster based features, relational feature generation, kernel manifold learning
    •  Supervised feature generation
        -  Linear discriminant analysis (LDA), supervised dictionary learning
    •  Deep learning based methods
        -  auto-encoders, mid layers of trained deep neural networks
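The single-variable and two-variable generators listed above are simple enough to sketch directly. The helper names (`single_variable_features`, `two_variable_features`, `one_hot`) and the column names in `data` are hypothetical, chosen only for this example. The log transformation here assumes strictly positive inputs.

```python
import math
from collections import Counter
from itertools import combinations

def single_variable_features(name, values):
    """Single-variable transformations: log, frequency count, min-max normalization."""
    lo, hi = min(values), max(values)
    freq = Counter(values)
    span = (hi - lo) or 1.0                      # avoid dividing by zero for constant columns
    return {
        f"log({name})": [math.log(v) for v in values],    # assumes v > 0
        f"freq({name})": [freq[v] for v in values],
        f"norm({name})": [(v - lo) / span for v in values],
    }

def two_variable_features(name_a, a, name_b, b):
    """Two-variable combinations: sum, difference, product, division."""
    return {
        f"{name_a}+{name_b}": [x + y for x, y in zip(a, b)],
        f"{name_a}-{name_b}": [x - y for x, y in zip(a, b)],
        f"{name_a}*{name_b}": [x * y for x, y in zip(a, b)],
        f"{name_a}/{name_b}": [x / y if y != 0 else float("nan") for x, y in zip(a, b)],
    }

def one_hot(values):
    """One-hot coding of a categorical column."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}

data = {"height": [1.0, 2.0, 4.0], "width": [2.0, 2.0, 8.0]}
generated = {}
for name, values in data.items():
    generated.update(single_variable_features(name, values))
for (na, a), (nb, b) in combinations(data.items(), 2):
    generated.update(two_variable_features(na, a, nb, b))
colors = one_hot(["red", "blue", "red"])
```

Two input columns already yield ten generated columns here, which illustrates why a selection step is needed after generation.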



Feature Selection and Composition
Pure Selection

•  Examples: DSM [Kanter et al. 2015], OneBM [Lam et al. 2017]

•  Select using statistics
    •  Filter by variance, correlation, mutual information

•  Select by model
    •  Build models that encourage sparsity (e.g., L1 penalization)
    •  Select by filtering out features with low weights

•  Grid Search
    •  Build model with random subsets of features
    •  Compare and choose the subset with best performance
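A statistics-based filter of the kind listed above might look like the following sketch. It is not the procedure used by DSM or OneBM. The function `select_by_statistics`, its thresholds, and the toy feature table are assumptions made for illustration. It drops near-constant columns by variance and then ranks the rest by absolute Pearson correlation with the target.

```python
import math
from statistics import pvariance

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def select_by_statistics(features, y, min_variance=1e-8, top_k=2):
    """Drop near-constant features, then keep the top_k by |correlation| with y."""
    kept = {name: col for name, col in features.items() if pvariance(col) > min_variance}
    ranked = sorted(kept, key=lambda name: abs(pearson(kept[name], y)), reverse=True)
    return ranked[:top_k]

features = {
    "constant": [3.0, 3.0, 3.0, 3.0],   # zero variance, filtered out
    "signal":   [1.0, 2.0, 3.0, 4.0],   # correlation 1.0 with y
    "noisy":    [1.0, 3.0, 2.0, 4.0],   # correlation 0.8 with y
    "noise":    [2.0, 1.0, 2.0, 1.0],   # weakly (negatively) correlated with y
}
y = [2.0, 4.0, 6.0, 8.0]
selected = select_by_statistics(features, y)
# selected -> ['signal', 'noisy']
```

The filter never trains the downstream model, which keeps it cheap but also means the ranking may not match what actually helps that model.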


Feature Selection and Composition
Pure Selection

•  Examples: DSM [Kanter et al. 2015], OneBM [Lam et al. 2017]

•  Limitations:
    •  Does not allow feature composition
    •  Statistics and sparse model weights do not directly translate to performance when used to train the actual model
    •  Grid search is computationally expensive, especially when the number of possible transformations is large
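To give a sense of scale for the last limitation: with n candidate features there are 2^n − 1 non-empty subsets, so exhaustive comparison quickly becomes infeasible. The helper name below is hypothetical, and the arithmetic is only meant as a rough illustration.

```python
def count_feature_subsets(n_features):
    """Number of non-empty subsets of n_features candidate features."""
    return 2 ** n_features - 1

counts = {n: count_feature_subsets(n) for n in (10, 20, 40)}
# counts -> {10: 1023, 20: 1048575, 40: 1099511627775}
```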


Feature Selection and Composition
Iterative Combination

•  Example: ExploreKit [Katz et al., 2016]

•  Greedy search for best feature combination
    •  Initialize with empty feature set: $C_0(T, X) = \emptyset$
    •  At each iteration:
        •  Find candidate feature with the highest performance improvement: $i_n = \arg\max_i R_m([C_{n-1}(T, X),\ t_i(X)])$
        •  Add best candidate to the feature set: $C_n(T, X) = [C_{n-1}(T, X),\ t_{i_n}(X)]$
    •  Repeat until convergence (low improvement) or time budget is reached
sha1_base64="rw5m41ckLn4mJFNYmp1+wZ6ew4E=">AAADUHicdVJNa9tAEF3Z/UjVL6c99rLUGCSjBskU2ksgoEsvgbTYiUGSl/V67WwirYR2VGqEfmIvufV39NJDS7uS3ODY7cCyj3lvZt4uM89iocB1vxmd7r37Dx4ePDIfP3n67Hnv8MW5Souc8QlL4zSfzqnisZB8AgJiPs1yTpN5zC/m137NX3zmuRKpHMM641FCV1IsBaOgU+TQWA5O8TEOy4SIsHKwOPackRPGixSUOYDZsCZpvgoT+oUA/kROLbCmtm0OahgyrdN43LSA/RaO7kHErQ5IKYhXWc09qqxW1PbzrbEztXWj4P8q29G5qw13tcu1ODIHgT8bOng8G0bb7kvfGVf6BYnVjtIzBZHbCtGwgU9K+UbPqEVObX9qRzbWDon712PIkwzWioPpE3lr/E4hbuzLqq42TdLru0duE3gfeBvQR5s4I72bcJGyIuESWEyVCjw3g6ikOQgW88oMC8Uzyq7pigcaSppwFZXNQlR4oDMLvExzfSTgJrtdUdJEqXUy18qEwqXa5erkv7iggOX7qBQyK4BL1g5aFjGGFNfbhRci5wzitQaU5UJ7xeyS5pSB3sH6E7zdJ++D89GR5x55H9/2T9zNdxygV+g1spCH3qET9AGdoQlixlfju/HT+NW56fzo/O4arbSzudFLdCe65h8rlftK</latexit><latexit sha1_base64="rw5m41ckLn4mJFNYmp1+wZ6ew4E=">AAADUHicdVJNa9tAEF3Z/UjVL6c99rLUGCSjBskU2ksgoEsvgbTYiUGSl/V67WwirYR2VGqEfmIvufV39NJDS7uS3ODY7cCyj3lvZt4uM89iocB1vxmd7r37Dx4ePDIfP3n67Hnv8MW5Souc8QlL4zSfzqnisZB8AgJiPs1yTpN5zC/m137NX3zmuRKpHMM641FCV1IsBaOgU+TQWA5O8TEOy4SIsHKwOPackRPGixSUOYDZsCZpvgoT+oUA/kROLbCmtm0OahgyrdN43LSA/RaO7kHErQ5IKYhXWc09qqxW1PbzrbEztXWj4P8q29G5qw13tcu1ODIHgT8bOng8G0bb7kvfGVf6BYnVjtIzBZHbCtGwgU9K+UbPqEVObX9qRzbWDon712PIkwzWioPpE3lr/E4hbuzLqq42TdLru0duE3gfeBvQR5s4I72bcJGyIuESWEyVCjw3g6ikOQgW88oMC8Uzyq7pigcaSppwFZXNQlR4oDMLvExzfSTgJrtdUdJEqXUy18qEwqXa5erkv7iggOX7qBQyK4BL1g5aFjGGFNfbhRci5wzitQaU5UJ7xeyS5pSB3sH6E7zdJ++D89GR5x55H9/2T9zNdxygV+g1spCH3qET9AGdoQlixlfju/HT+NW56fzo/O4arbSzudFLdCe65h8rlftK</latexit>
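The greedy update in the two equations above can be sketched in Python. This is a minimal illustration under stated assumptions, not any particular published system: `r_squared` is a stand-in for the score R_m (here the in-sample R² of a least-squares fit), C_0 is taken to be empty, and the function and transform names are hypothetical.

```python
import numpy as np

def r_squared(features, y):
    """Assumed stand-in for R_m: in-sample R^2 of a least-squares fit."""
    A = np.column_stack([np.ones(len(y))] + features)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def greedy_combination(X, y, transforms, n_steps):
    """Build C_n by adding, at each step, the transform that maximizes the score."""
    candidates = {name: t(X) for name, t in transforms.items()}
    chosen_names, chosen_cols = [], []  # C_0 is empty in this sketch
    for _ in range(n_steps):
        remaining = [n for n in candidates if n not in chosen_names]
        if not remaining:
            break
        # i_n = argmax_i R_m([C_{n-1}(T, X), t_i(X)])
        best = max(remaining,
                   key=lambda n: r_squared(chosen_cols + [candidates[n]], y))
        # C_n(T, X) = [C_{n-1}(T, X), t_{i_n}(X)]
        chosen_names.append(best)
        chosen_cols.append(candidates[best])
    return chosen_names

# Toy data: the target depends on the product of the two raw columns.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = X_demo[:, 0] * X_demo[:, 1] + 0.01 * rng.normal(size=200)
demo_transforms = {
    "x0": lambda X: X[:, 0],
    "x1": lambda X: X[:, 1],
    "x0*x1": lambda X: X[:, 0] * X[:, 1],
    "x0^2": lambda X: X[:, 0] ** 2,
}
selected = greedy_combination(X_demo, y_demo, demo_transforms, n_steps=2)
print(selected)
```

Each step re-scores every remaining candidate against the current set, which is why the procedure is sequential: step n cannot start before step n-1 has fixed C_{n-1}.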


Feature Selection and Composition: Iterative Combination

• Example: ExploreKit [Katz et al., 2016]
• A greedy selection algorithm
• More scalable than grid search
• Limitations:
  • Does not allow feature composition
  • Greedy, which may result in sub-optimal feature selection
  • Time consuming. Iterative algorithm is difficult to parallelize


Feature Selection and Composition: Hierarchical Search

• Example: Cognito [Khurana et al. 2016]
• Use a tree-like structure (transformation graph) to represent possible feature compositions
• Start with one node (original data)
• At each iteration:
  • Evaluate possible child nodes based on criteria like node accuracy and depth
  • Add best child node to current structure
• Repeat until time budget is reached
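The search loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not Cognito's implementation: features are represented only by their names, `toy_evaluate` stands in for a validated node accuracy, and the depth penalty, node cap, and all function names are hypothetical.

```python
import time
from dataclasses import dataclass

@dataclass
class Node:
    features: tuple   # names of the features in this node's dataset
    depth: int        # number of transformations applied since the root
    score: float      # stand-in for the node's validated accuracy

def hierarchical_search(root_features, transforms, evaluate,
                        budget_seconds=5.0, max_nodes=20, depth_penalty=0.01):
    root = Node(tuple(root_features), 0, evaluate(tuple(root_features)))
    graph, seen = [root], {root.features}
    deadline = time.monotonic() + budget_seconds
    # Repeat until the time budget (or the node cap of this sketch) is reached.
    while time.monotonic() < deadline and len(graph) < max_nodes:
        children = []
        for parent in graph:
            for apply_transform in transforms.values():
                feats = apply_transform(parent.features)
                if feats not in seen:
                    children.append(Node(feats, parent.depth + 1, evaluate(feats)))
        if not children:
            break
        # Criterion assumed here: score minus a small penalty on depth.
        best = max(children, key=lambda c: c.score - depth_penalty * c.depth)
        graph.append(best)
        seen.add(best.features)
    return max(graph, key=lambda n: n.score), graph

def add_transform(name):
    """Return a transform that adds name(f) for every existing feature f."""
    return lambda feats: tuple(sorted(set(feats) | {f"{name}({f})" for f in feats}))

USEFUL = {"log(a)": 0.5, "sq(log(a))": 1.0}  # toy scores, purely illustrative

def toy_evaluate(features):
    return sum(USEFUL.get(f, 0.0) for f in features)

best_node, graph = hierarchical_search(
    ["a"], {"log": add_transform("log"), "sq": add_transform("sq")},
    toy_evaluate, max_nodes=3)
print(best_node.features)
```

Because children of every existing node are candidates, a composed feature such as `sq(log(a))` becomes reachable once the `log` node has been added to the graph.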


Feature Selection and Composition: Hierarchical Search

• Example: Cognito [Khurana et al. 2016]
• Allows feature composition
• Can generate different feature combinations by changing criteria
• Limitations:
  • Greedy algorithm may lead to sub-optimal solution
  • Time consuming (iterative training and validation)
  • Criteria setup is not intuitive


Feature Selection and Composition: Hierarchical Search

• [Khurana et al. 2017]
• Extension: reinforcement learning based search
  • State: a transformation graph and remaining budget value
  • Possible actions: Add any feasible child node to current state
  • Objective: learn optimal action policy given state
  • Policy learned on multiple training datasets
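The state and action definitions above can be made concrete with a small sketch. Everything here is an assumption for illustration: the slides do not show how the paper encodes states or learns the policy, so a plain lookup table of action values with epsilon-greedy selection is swapped in, and the names `State`, `feasible_actions`, `step`, and `epsilon_greedy` are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    nodes: frozenset    # labels of the nodes currently in the transformation graph
    budget_left: int    # remaining budget, counted in node additions

def feasible_actions(state, transforms):
    """All (node, transform) pairs that would add a new child node."""
    if state.budget_left <= 0:
        return []
    return [(node, t) for node in sorted(state.nodes) for t in transforms
            if f"{t}({node})" not in state.nodes]

def step(state, action):
    """Apply an action: add the child node and spend one unit of budget."""
    node, t = action
    return State(state.nodes | {f"{t}({node})"}, state.budget_left - 1)

def epsilon_greedy(state, transforms, q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the value table."""
    actions = feasible_actions(state, transforms)
    if not actions:
        return None
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))

rng = random.Random(0)
start = State(frozenset({"x"}), budget_left=2)
q_table = {(start, ("x", "sq")): 1.0}  # assumed, pre-learned action values
chosen = epsilon_greedy(start, ["log", "sq"], q_table, epsilon=0.0, rng=rng)
next_state = step(start, chosen)
print(chosen, sorted(next_state.nodes), next_state.budget_left)
```

With `epsilon=0.0` the choice is purely exploitative; raising it trades some of that for exploration of actions the value table has not yet scored.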


Feature Selection and Composition: Hierarchical Search

• [Khurana et al. 2017]
• Extension: reinforcement learning based search
  • Balances exploitation with exploration
  • More efficient search with well-trained policy
  • Policy training requires extra data, and can take a long time


RULLS: Unsupervised Feature Generation

Namita Lokare Jorge Silva Ilknur Kabul


RULLS: Feature Engineering Method

• Idea: Aggregating features from a random union of subspaces by describing points using globally chosen landmarks. Euclidean distances are encoded as features in the final feature matrix.
• Features generated are:
  • Sparse
  • Non-negative
  • Rotation invariant
• Allow fast training when used in conjunction with simple models
• Can be used for clustering tasks
• Can be used for classification


RULLS: Union of Subspaces

• Assumptions:
  • Globally the data may not be low-dimensional
  • Locally data exhibit low-dimensional structure (subspaces)
• Advantages:
  • Reduces local dimensionality without forcing global dimension reduction
  • Preserves local structure
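A tiny synthetic example can make the two assumptions tangible. The data below is invented for illustration: three one-dimensional subspaces (the coordinate axes) in three-dimensional space, so that each cluster is locally one-dimensional while their union spans the whole space.

```python
import numpy as np

def union_of_lines(points_per_line=20, seed=0):
    """Sample points from three 1-D subspaces (coordinate axes) of R^3."""
    rng = np.random.default_rng(seed)
    directions = np.eye(3)
    # Each cluster is a random multiple of one axis direction, shape (n, 3).
    return [rng.normal(size=(points_per_line, 1)) * directions[i]
            for i in range(3)]

clusters = union_of_lines()
X = np.vstack(clusters)

global_rank = int(np.linalg.matrix_rank(X))
local_ranks = [int(np.linalg.matrix_rank(c)) for c in clusters]
print(global_rank, local_ranks)  # global rank 3, local ranks [1, 1, 1]
```

A global rank-1 or rank-2 projection would have to discard at least one of the lines, whereas describing each cluster in its own subspace loses nothing.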


Workings of RULLS: Pipeline

• Randomly select landmarks
• Construct local subspaces with landmarks' neighbors
• Project onto the subspace of each landmark, measure distances to the landmark
• Use regularized distances as features
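The four pipeline steps can be sketched with NumPy. This is a rough illustration, not the RULLS implementation: the local subspace is taken from an SVD of each landmark's nearest neighbors, and the "regularization" is assumed to be a per-point threshold at the mean distance (which yields sparse, non-negative features). The function name and all parameter names are hypothetical.

```python
import numpy as np

def rulls_like_features(X, n_landmarks=8, n_neighbors=10, subspace_dim=2, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: randomly select landmarks.
    landmark_idx = rng.choice(n, size=n_landmarks, replace=False)
    dists = np.empty((n, n_landmarks))
    for j, idx in enumerate(landmark_idx):
        centered = X - X[idx]
        # Step 2: local subspace from the landmark's nearest neighbors.
        nearest = np.argsort(np.linalg.norm(centered, axis=1))[:n_neighbors]
        _, _, vt = np.linalg.svd(centered[nearest], full_matrices=False)
        basis = vt[:subspace_dim].T              # shape (d, subspace_dim)
        # Step 3: project every point, measure distance to the landmark.
        dists[:, j] = np.linalg.norm(centered @ basis, axis=1)
    # Step 4 (assumed form): keep only closer-than-average landmarks.
    return np.maximum(0.0, dists.mean(axis=1, keepdims=True) - dists)

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 5))
F = rulls_like_features(X_demo)
print(F.shape, float((F == 0).mean()))
```

Only norms of projections onto learned subspaces enter the features, so rotating the input leaves them unchanged up to floating-point error, which matches the rotation-invariance property listed earlier.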


Workings of RULLS: Algorithm


Variants of RULLS

• Variant I
  • Random projections (no subspace learning)
• Variant II
  • Use Euclidean distance (no projection)
• Use Robust PCA in presence of noise and outliers
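The difference between the two variants can be sketched in a few lines. This is an assumption-laden illustration rather than the authors' code: the Variant I style path replaces a learned subspace with a random orthonormal basis, the Variant II style path measures plain Euclidean distance, and `landmark_distances` with its parameters is a hypothetical name. Robust PCA is not shown.

```python
import numpy as np

def landmark_distances(X, landmark_idx, subspace_dim=None, seed=0):
    """Distances from every point to each landmark.

    subspace_dim=None: plain Euclidean distance (Variant II style).
    subspace_dim=p:    distance after a random p-dimensional projection
                       (Variant I style, no subspace learning).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    columns = []
    for idx in landmark_idx:
        centered = X - X[idx]
        if subspace_dim is not None:
            # Random orthonormal basis from the QR factor of a Gaussian matrix.
            basis, _ = np.linalg.qr(rng.normal(size=(d, subspace_dim)))
            centered = centered @ basis
        columns.append(np.linalg.norm(centered, axis=1))
    return np.column_stack(columns)

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 6))
landmarks = [0, 5, 9]
euclid = landmark_distances(X_demo, landmarks)
projected = landmark_distances(X_demo, landmarks, subspace_dim=2)
print(euclid.shape, projected.shape)
```

Projection onto an orthonormal basis can only shrink a vector, so the projected distances never exceed the Euclidean ones.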


Existing Methods

• RandLocal
  • Features are chosen randomly
  • Use only one global neighbor to encode distances
  • Suggested range for T is between 100 and 500

*Suhang Wang, Charu Aggarwal, and Huan Liu. 2017. Randomized Feature Engineering As a Fast and Accurate Alternative to Kernel Methods. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17). ACM, New York, NY, USA, 485–494. https://doi.org/10.1145/3097983.3098001


Advantages of RULLS

• RULLS selects features that are locally relevant, unlike RandLocal, Variant I, and Variant II
• RULLS can achieve better performance than all the other methods with fewer iterations
• Simple machine learning models are fast and efficient to train when used in conjunction with the features generated by RULLS
• RULLS allows for the use of robust PCA in presence of noise and outliers


Datasets Tested


Results: Classification Tasks

Classification accuracy on datasets. Highlighted text shows the method with the best performance.


Figure 3: Average classification accuracy (%) with varying iterations (t = 1, 10, 50, and 100) for raw features, RandLocal, Variant I, Variant II and RULLS (PCA). (a) Japanese Vowel, (b) Fashion MNIST, (c) Breast Cancer Wisconsin, (d) Baseball and (e) Digits dataset. Methods compared here beat the raw features score in just a few iterations. RULLS performs better than other methods on all datasets compared.


Results: Classification Tasks in Presence of Noise

RULLS with ROBPCA on the Breast Cancer dataset in the case of raw features and 10% noise added to columns and rows.

Classification performance in presence of 10% noise added to columns and rows in each dataset. Best performance is highlighted in blue. The numbers in parentheses indicate the difference between the performance with and without noise.


Results: Clustering Tasks

Comparison of RULLS with PCA and ROBPCA on IRIS dataset. We see an improvement in performance with ROBPCA.

Clustering performance on datasets. We report Normalized Mutual Information (NMI). Highlighted text shows the method with the best performance per dataset.
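For readers unfamiliar with the metric, NMI can be computed from label counts alone. The sketch below assumes normalization by the arithmetic mean of the two label entropies, which is one common convention; the slides do not state which normalization was used, and the function name `nmi` is hypothetical.

```python
from collections import Counter
from math import log

def nmi(labels_true, labels_pred):
    """Normalized mutual information, arithmetic-mean normalization (assumed)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information between the two labelings.
    mi = sum((c / n) * log(c * n / (count_true[a] * count_pred[b]))
             for (a, b), c in joint.items())
    # Entropy of each labeling.
    h_true = -sum((c / n) * log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * log(c / n) for c in count_pred.values())
    denom = (h_true + h_pred) / 2
    # Convention assumed here: two trivial (single-cluster) labelings score 1.
    return mi / denom if denom > 0 else 1.0

same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # cluster names swapped
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])      # no shared structure
print(same_partition, independent)
```

The score depends only on how points are grouped, not on the cluster names, which is why it suits clustering evaluation where predicted cluster ids are arbitrary.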


Visual Comparison of Features

Visual interpretation of Japanese Vowel dataset features. (a) RandLocal, (b) Variant II, (c) Variant I, (d) RULLS (PCA), (e) ground truth class labels. The features are generated for $T = 1$, $\ell_p = 122$, $\ell_k = 10$ for RULLS (PCA), Variant I, and Variant II, and $\ell_k = 1$ for RandLocal.


References

• Dong, Guozhu, and Huan Liu. "Feature Engineering for Machine Learning and Data Analytics." (2018).
• Katz, Gilad, Eui Chul Richard Shin, and Dawn Song. "Explorekit: Automatic feature generation and selection." In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pp. 979-984. IEEE, 2016.
• Khurana, Udayan, Horst Samulowitz, and Deepak Turaga. "Feature Engineering for Predictive Modeling using Reinforcement Learning." arXiv preprint arXiv:1709.07150 (2017).
• Xiao, Han, Kashif Rasul, and Roland Vollgraf. "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms." arXiv preprint arXiv:1708.07747 (2017).
• Lu, Yue M., and Minh N. Do. "A theory for sampling signals from a union of subspaces." IEEE Transactions on Signal Processing 56, no. 6 (2008): 2334-2345.
• UCI Machine Learning Repository. 2013. http://archive.ics.uci.edu/ml
• Wang, Suhang, Charu Aggarwal, and Huan Liu. "Randomized Feature Engineering as a Fast and Accurate Alternative to Kernel Methods." In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 485-494. ACM, 2017.
• Lokare, Namita, Jorge Silva, and Ilknur Kaynar Kabul. "RULLS: Randomized Union of Locally Linear Subspaces for Feature Engineering." arXiv preprint arXiv:1804.09770 (2018).
• Khurana, Udayan, Fatemeh Nargesian, Horst Samulowitz, Elias Khalil, and Deepak Turaga. "Automating Feature Engineering." Transformation 10, no. 10 (2016): 10.
• Shah, Hamaad. "Automatic feature engineering using deep learning and Bayesian inference." https://towardsdatascience.com/automatic-feature-engineering-using-deep-learning-and-bayesian-inference-application-to-computer-7b2bb8dc7351


Union of Subspaces Example: Image Analysis



Green leaves

Road

Tree trunks and branches