A study of fast, memory-efficient ways to dump data in libsvm format
2016/11/27, hskksk @ JapanR 2016
• Languages: R, Python, C++
xgboost kaggler
Bosch Production Line Performance 15
xgboost
xgb.DMatrix
library(xgboost)

# load the features
label = readRDS("label.rds")
feature_set_A = readRDS("feature_set_A.rds")
feature_set_B = readRDS("feature_set_B.rds")
# combine the feature sets with cbind
mat = cbind(feature_set_A, feature_set_B)
# build the DMatrix
dmat = xgb.DMatrix(mat, label=label)
cbind
※cbind rm(vars); gc()
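To see why the cbind step is expensive, here is a toy illustration. The matrix sizes are arbitrary and only the ratio matters. cbind allocates a new matrix that copies every input column, so at the moment it returns, the inputs and the combined copy all sit in memory together.

```r
# Toy illustration (arbitrary, hypothetical sizes) of the memory cost of cbind.
n <- 10000
feature_set_A <- matrix(runif(n * 50), nrow = n)
feature_set_B <- matrix(runif(n * 30), nrow = n)

# cbind allocates a brand-new matrix and copies every input column into it.
mat <- cbind(feature_set_A, feature_set_B)

size_inputs <- as.numeric(object.size(feature_set_A) + object.size(feature_set_B))
size_mat    <- as.numeric(object.size(mat))

# The result is about as large as both inputs combined, so right after
# cbind the process holds roughly twice the raw feature data.
cat(sprintf("inputs: %.1f MB, cbind result: %.1f MB\n",
            size_inputs / 1e6, size_mat / 1e6))

# Freeing the inputs afterwards (the rm(vars); gc() idiom) cannot undo the
# peak, which was already reached while cbind was running.
rm(feature_set_A, feature_set_B)
invisible(gc())
```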
xgb.DMatrix
Python
libsvm ※R
1. Dump the feature sets to libsvm format without cbind
2. Load the libsvm file into a DMatrix
cbind libsvm
data.table::fwrite_libsvm(list_of_matrices, file)
Implemented by forking data.table and extending fwrite
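The slides do not show how fwrite_libsvm works internally. As a rough illustration of the idea (writing several feature sets side by side without first cbind-ing them), here is a minimal pure-R sketch. `write_libsvm_sketch` and its `chunk_size` parameter are hypothetical, not part of data.table or the fork. Like fwrite_libsvm in this talk, it assumes the first list element is the label. It omits zeros and NAs (the usual sparse convention; xgboost treats omitted entries as missing) and uses 1-based libsvm feature indices.

```r
# Hypothetical pure-R sketch, NOT the C/OpenMP fwrite_libsvm from the fork.
# Writes a label vector plus several feature matrices (or data.frames) to one
# libsvm file, a chunk of rows at a time, so the cbind-ed matrix is never built.
write_libsvm_sketch <- function(matrices, path, chunk_size = 10000L) {
  label <- matrices[[1]]                      # first element: the label
  features <- lapply(matrices[-1], as.matrix) # remaining elements: feature sets
  n <- length(label)
  stopifnot(n > 0, all(vapply(features, nrow, integer(1)) == n))

  # Column offsets so that feature set k continues the numbering of set k - 1.
  widths  <- vapply(features, ncol, integer(1))
  offsets <- cumsum(c(0L, head(widths, -1)))

  con <- file(path, open = "w")
  on.exit(close(con))

  for (start in seq(1L, n, by = chunk_size)) {
    rows  <- start:min(start + chunk_size - 1L, n)
    lines <- as.character(label[rows])        # each line starts with its label
    for (k in seq_along(features)) {
      block <- features[[k]][rows, , drop = FALSE]
      for (j in seq_len(ncol(block))) {
        v    <- block[, j]
        keep <- !is.na(v) & v != 0            # sparse: omit zeros and NAs
        # append "index:value", with 1-based libsvm indices
        lines[keep] <- paste0(lines[keep], " ", offsets[k] + j, ":", v[keep])
      }
    }
    writeLines(lines, con)
  }
  invisible(path)
}

# Usage on toy data: label first, then two feature sets of 2 and 1 columns.
lab <- c(1, 0, 1)
A   <- matrix(c(0.5, 0, 2,     # column 1
                0,   1, 0),    # column 2
              nrow = 3)
B   <- matrix(c(3, NA, 0), nrow = 3)
out <- tempfile(fileext = ".txt")
write_libsvm_sketch(list(lab, A, B), out, chunk_size = 2L)
readLines(out)
# → "1 1:0.5 3:3" "0 2:1" "1 1:2"
```

Formatting one chunk at a time keeps the extra memory to about `chunk_size` rows of text instead of a full copy of the data. The fork's C code is additionally multithreaded with OpenMP, which this sketch does not attempt.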
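library(data.table)  # the forked build that provides fwrite_libsvm
library(xgboost)

# load the features
label = readRDS("label.rds")
feature_set_A = fread("feature_set_A.csv")
feature_set_B = fread("feature_set_B.csv")
# put the label and the feature sets in a list
# the first element is the label
matrices = list(label, feature_set_A, feature_set_B)
# write in libsvm format
fwrite_libsvm(matrices, "libsvm.txt")
# load into a DMatrix
dmat = xgb.DMatrix("libsvm.txt")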
Parallelized with OpenMP, like fwrite
8.5GB in 120 sec @ Xeon 2.5GHz × 8
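For a sense of scale, the benchmark figure can be turned into a throughput number. This is a back-of-the-envelope calculation that assumes 8.5 GB is the size of the written libsvm file, decimal units (1 GB = 1000 MB), and that "× 8" means eight cores; the slide states none of these explicitly.

```r
# Back-of-the-envelope throughput for the benchmark on this slide.
# Assumptions: 8.5 GB = size of the output file, 1 GB = 1000 MB, 8 cores.
size_mb    <- 8.5 * 1000
seconds    <- 120
cores      <- 8
throughput <- size_mb / seconds    # overall MB written per second
per_core   <- throughput / cores   # MB per second per core
round(c(total = throughput, per_core = per_core), 1)
# total ≈ 70.8, per_core ≈ 8.9
```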
data.table PR
https://github.com/hskksk/data.table
kaggler !!
Enjoy Kaggling with R !!