yiming - img.raqsoft.com.cn
TRANSCRIPT
One click intelligent data modeling
YIMING
Yiming intelligent modeling VS
Manual? Intelligence!
The modeling process is fully automated.
One click modeling, fast and good!
No data scientists needed to model
Artificial intelligence
Manual modeling
Explore data?
Missing value?
Data noise?
High cardinality variable? Standardization?
Time characteristics?
LR, RF, GBDT… which algorithm should be used?
Parameter configuration?
How to evaluate the model effectively?
Long project cycle?
Non-normal distribution?
Many model requirements?
Traditional modeling process
Data input -> Data preprocessing -> Manual modeling -> Model performance -> Model output
Many manual tasks that need to be done by data scientists:
Data input: variable recognition, generation of basic statistics
Data preprocessing: exception handling, missing value handling, high cardinality variable processing, data smoothing, intelligent filtering of numerical variables, adding derived variables
Manual modeling: filter important variables, optimize model parameters, select modeling method
Model performance: AUC, GINI, MSE, LIFT, KS, RECALL RATE...
Intelligent modeling process
Data input -> Automatic data preprocessing -> Intelligent modeling -> Model performance -> Model output
Many manual tasks originally done by data scientists are completed by the intelligent modeling tool in one click, ensuring model quality and stability.
Data input: variable recognition, generation of basic statistics, intelligent export of a data quality report
Automatic data preprocessing: exception handling, missing value handling, high cardinality variable processing, data smoothing, intelligent filtering of numerical variables, adding derived variables
Intelligent modeling: automatic filtering of important variables, automatic and optimal setting of model parameters, automatic and optimal selection of the modeling method
Model performance: AUC, GINI, MSE, LIFT, KS, RECALL RATE...
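The evaluation indexes listed above can be illustrated with a short sketch. It computes AUC, GINI and KS for a binary classifier from labels and scores. The function names are hypothetical and this is not Yiming's implementation; it only shows the standard definitions, with GINI taken as 2 * AUC - 1.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed with the rank-sum formulation."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at index i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(lbl for _, lbl in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def gini(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient derived from AUC."""
    return 2.0 * auc(labels, scores) - 1.0


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between cumulative positive and negative rates."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Consume all items tied at this score before measuring the gap.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores), gini(toy_labels, toy_scores),
      ks_statistic(toy_labels, toy_scores))
```

Reporting several indexes side by side is useful because each one stresses a different aspect: AUC and GINI summarize ranking quality overall, while KS shows the single best separation point.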
Yiming intelligent modeling architecture
Application integration invocation
Modeling tool, Prediction model
Auto-Modeling: Tree Based, Neural network, All Regression, GBDT, ...
Data Preprocessing: missing value, outlier; correction, smoothing; high cardinality processing, derived variable
Data source: RDB, NoSQL, HDFS, LocalFile, HTTP
Data preparation (ETL)
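A minimal sketch of three of the preprocessing steps named in the architecture (missing values, outliers, high cardinality processing) may help make them concrete. The helper names, the median fill, the fixed clipping bounds and the "OTHER" bucket are all assumptions chosen for illustration; the slides do not say which techniques the product actually uses.

```python
from collections import Counter
from statistics import median
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values: List[float], lower: float, upper: float) -> List[float]:
    """Pull values outside [lower, upper] back to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def group_rare_levels(values: List[str], min_count: int,
                      other: str = "OTHER") -> List[str]:
    """Merge levels seen fewer than min_count times into one bucket,
    which reduces the cardinality of a categorical variable."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Toy data, invented for illustration only.
ages = [23.0, None, 31.0, 250.0, 40.0]
cities = ["A", "A", "B", "C", "A", "B"]

filled = impute_median(ages)
cleaned = clip_outliers(filled, lower=0.0, upper=100.0)
grouped = group_rare_levels(cities, min_count=2)
print(cleaned)
print(grouped)
```

An automated tool would presumably choose the fill value, bounds and count threshold from the data rather than take them as arguments; they are explicit here to keep the sketch easy to follow.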
Yiming intelligent modeling process
Automatic identification of variable types
Dataset size statistics
Continuous variable
Categorical variable
Automatic preprocessing + modeling
Multiple model evaluation indexes
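The first step of the process, identifying whether a variable is continuous or categorical, can be sketched with a simple heuristic. The rule below (a column is continuous if every observed value parses as a number and there are more than max_levels distinct values) and the name infer_variable_type are assumptions for illustration, not the rule the product applies.

```python
from typing import List, Optional


def infer_variable_type(values: List[Optional[str]], max_levels: int = 10) -> str:
    """Classify one raw column as 'continuous' or 'categorical'."""
    # Ignore missing entries when deciding the type.
    observed = [v for v in values if v is not None and v != ""]
    if not observed:
        return "categorical"
    try:
        numbers = [float(v) for v in observed]
    except ValueError:
        # At least one value is not numeric, so treat it as a label.
        return "categorical"
    # Numeric codes with few distinct values behave like categories.
    if len(set(numbers)) <= max_levels:
        return "categorical"
    return "continuous"


# Toy columns, invented for illustration only.
print(infer_variable_type(["1.5", "2.7", "3.1", "4.0"], max_levels=3))
print(infer_variable_type(["0", "1", "1", "0"], max_levels=3))
print(infer_variable_type(["red", "blue", None], max_levels=3))
```

The max_levels threshold is the main design choice: set too low, numeric codes such as region IDs are treated as continuous; set too high, short numeric columns are treated as categories.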
Why us?
1. The painstaking work that the statistics expert has pursued all his life.
Decades of practical experience in data mining modeling; participated in and presided over many domestic and foreign data mining projects in the banking and insurance industry, and repeatedly led the team to win awards in the international SAS competition.
2. The exquisite product of the R&D team.
Deep mathematical understanding, superior software implementation, industry-leading high-performance big data technology.
Case: personal credit default forecast
Target
• Establish a credit default model that gives the probability of a user's credit default
• Give users a reasonable credit line
• Let business personnel select data for modeling based on their experience, which helps them accept the model and promotes its adoption
• Improve the capture rate of defaulting customers
Pain points
• Finding reasonable data dimensions
• The influence of high cardinality and nonlinearity on the model
• Selecting a reasonable model or model combination
• Few positive samples; model overfitting must be avoided
Comparison of modeling results
                    Intelligent modeling                        Traditional modeling
Number of people    1                                           1
Modeling time       5 minutes (data preprocessing + modeling)   2 months
Number of models    1                                           1
Data size           100000+ / 28 MB                             100000+ / 28 MB
Model performance   0.9728 (test set 0.965)                     0.957
Model performance (test set)
Case: Marketing recommendation of bank financial products
                     Customer group 1   Customer group 2   Customer group 3   Customer group 4
Number of modelers   1                  1                  1                  1
Number of models     13                 13                 13                 13
Modeling time        1.5 hours/model    1.5 hours/model    1 minute/model     2 minutes/model
Data volume          1340k              1550k              6400               12k
1. Among the top 5% of customers ranked by the model, the purchase rate is 14.4 times the rate without the model. That is, for every 100 selected customers, 24.77 transactions can be completed, far higher than the average of 1.72 transactions per 100 customers.
2. The top 5% of customers ranked by the model contains 72.0% of the target customers, and the top 20% contains 96.0% of them.
            Cumulative improvement   Cumulative capture rate
First 5%    14.4                     72%
First 10%   9.4                      94%
First 15%   6.3                      94.5%
First 20%   4.8                      96%
The current purchase rate of the financial product is 1.72%
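The figures above are linked by simple arithmetic: the cumulative improvement (lift) is the purchase rate in the selected group divided by the overall purchase rate, and the capture rate equals the lift multiplied by the selected fraction. The sketch below checks this against the slide's numbers and shows how both quantities could be computed from scored customers. The function name cumulative_lift and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def cumulative_lift(labels: Sequence[int], scores: Sequence[float],
                    fraction: float) -> Tuple[float, float]:
    """Return (lift, capture rate) for the top `fraction` of customers
    when ranked by descending model score."""
    ranked = [lbl for _, lbl in
              sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    k = max(1, int(len(ranked) * fraction))
    hits = sum(ranked[:k])          # buyers inside the selected group
    total = sum(ranked)             # buyers in the whole population
    capture = hits / total
    lift = (hits / k) / (total / len(ranked))
    return lift, capture


# Check the figures quoted in the slides.
base_rate = 1.72       # purchases per 100 customers overall
top5_rate = 24.77      # purchases per 100 customers in the first 5%
lift_top5 = top5_rate / base_rate
print(round(lift_top5, 1))          # lift for the first 5%
print(round(14.4 * 0.05, 2))        # capture rate implied by that lift

# Toy population of 20 customers with 4 buyers, invented for illustration.
toy_scores = [i / 20 for i in range(20, 0, -1)]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(cumulative_lift(toy_labels, toy_scores, 0.25))
```

The same relation holds for every row of the table, for example 4.8 * 20% = 96% for the first 20%.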
Intelligent modeling vs manual modeling:
                       Number of models                 Time                          Number of project participants
Intelligent modeling   50-60                            2 weeks                       1
Manual modeling        Not suitable for mass modeling   1 week to 2 months per model  Several
Note: manual modeling time depends on the complexity of the model and the skill of the modeler, and cannot be controlled.
Characteristics of intelligent modeling of Yiming
Automatic modeling: efficient
No data scientist needed: low cost
Model perfection: high accuracy
Intelligent modeling changes the application mode: business users lead, and modeling can be done anytime and anywhere in the application process.
Artificial intelligence - Fewer personnel
THANKS
Mining data value
Yiming intelligent modeling