Immoviz.io - real estate search engine
![Page 1: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/1.jpg)
Immoviz - #WeAreAnts
IMMOVIZ BORDEAUX
Emeline Gaulard - Du Phan
![Page 2: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/2.jpg)
WHO ARE WE
EMELINE GAULARD
BACKEND DEVELOPER EPITECH ’18
DU PHAN
DATA SCIENTIST ENSC ’17
![Page 3: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/3.jpg)
WE ARE ANTS
WHERE DO WE WORK
Prototyping, Data Science, Internet of Things, Fun
![Page 4: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/4.jpg)
Immoviz?
![Page 5: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/5.jpg)
![Page 6: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/6.jpg)
TOOLBOX
Search: Elastic Search, Python
Machine Learning: ipython notebook, pandas/numpy, seaborn/folium, scikit-learn, hyperopt
Backend: NodeJs, Express, CasperJS, PostgreSQL, Slack bot (node-slackr)
![Page 7: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/7.jpg)
INFRASTRUCTURE
Before
![Page 8: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/8.jpg)
INFRASTRUCTURE
Now
![Page 9: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/9.jpg)
OUTLINE
1. SCRAPERS
2. ELASTIC SEARCH
3. DUPLICATE AGGREGATION
4. PRICE PREDICTION
![Page 10: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/10.jpg)
Scrapers
Real estate sites (seloger, leboncoin, bienici, sudouest, …)
CasperJS
PostgreSQL
Elastic Search
![Page 11: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/11.jpg)
Scrapers
Simple to use
Lightweight
Debugging is non-trivial
![Page 12: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/12.jpg)
Elastic Search
Indexing
Mapping
Analyzing
Querying
![Page 13: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/13.jpg)
Analyzer example
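The analyzer shown on this slide is an image and cannot be recovered from the text. As a rough illustration only, the sketch below builds index settings for a custom analyzer suited to French listing descriptions. The analyzer name `listing_french` and the filter names `french_elision`, `french_stop` and `french_stemmer` are hypothetical; the filter types (`elision`, `stop`, `stemmer`, `lowercase`, `asciifolding`) are standard Elasticsearch token filters.

```python
import json

# Hypothetical analyzer for French listing text; every name below is illustrative,
# not the configuration used by Immoviz.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                # Strip elided articles such as "l'" or "d'" from the start of tokens.
                "french_elision": {
                    "type": "elision",
                    "articles": ["l", "d", "qu", "j", "n", "s"],
                },
                # Drop common French words using the built-in stop word list.
                "french_stop": {"type": "stop", "stopwords": "_french_"},
                # Reduce words to a stem so "maisons" matches "maison".
                "french_stemmer": {"type": "stemmer", "language": "light_french"},
            },
            "analyzer": {
                "listing_french": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "french_elision",
                        "lowercase",
                        "french_stop",
                        "french_stemmer",
                        "asciifolding",
                    ],
                }
            },
        }
    }
}

# The JSON body that would be sent when creating the index.
print(json.dumps(INDEX_SETTINGS, indent=2))
```

A text field would then opt in to this analyzer through the `analyzer` parameter of its mapping.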
![Page 14: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/14.jpg)
Query example
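The query on this slide is also an image. A minimal sketch of what a listing search could look like in the Elasticsearch query DSL follows, assuming hypothetical fields `description`, `city` and `price`. It combines a scored full-text `match` with non-scoring `term` and `range` filters inside a `bool` query.

```python
def build_listing_query(text, city, max_price, size=20):
    """Build a search body: full-text match on the description, filtered by city and price.

    The field names (description, city, price) are assumptions made for this sketch.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored clause: how well the description matches the user's words.
                "must": [{"match": {"description": text}}],
                # Filter clauses: exclude non-matching ads without affecting the score.
                "filter": [
                    {"term": {"city": city}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        },
    }


body = build_listing_query("appartement lumineux balcon", "bordeaux", 300000)
print(body)
```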
![Page 15: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/15.jpg)
Duplicate aggregation
PostgreSQL: ID comparison, Price comparison
Elastic Search: Text analysis, Price comparison
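The three comparisons named on this slide can be sketched in plain Python as follows. This is an illustration under assumptions, not the Immoviz implementation: the ad fields (`source`, `source_id`, `price`, `text`) and both thresholds are hypothetical, and `difflib.SequenceMatcher` stands in for the text analysis that the slide attributes to Elastic Search.

```python
from difflib import SequenceMatcher


def same_listing(a, b, price_tol=0.02, text_min=0.8):
    """Decide whether two ads describe the same property (thresholds are illustrative)."""
    # ID comparison: the same site publishing the same site-specific ID.
    if a["source"] == b["source"] and a["source_id"] == b["source_id"]:
        return True
    # Price comparison: prices must agree within a relative tolerance.
    if abs(a["price"] - b["price"]) > price_tol * max(a["price"], b["price"]):
        return False
    # Text analysis: descriptions must be similar enough.
    ratio = SequenceMatcher(None, a["text"].lower(), b["text"].lower()).ratio()
    return ratio >= text_min


def group_duplicates(ads):
    """Greedily put each ad into the first group that contains a matching ad."""
    groups = []
    for ad in ads:
        for group in groups:
            if any(same_listing(ad, other) for other in group):
                group.append(ad)
                break
        else:
            groups.append([ad])
    return groups


ads = [
    {"source": "seloger", "source_id": "a1", "price": 250000,
     "text": "T3 lumineux proche tram, 65 m2, Bordeaux Chartrons"},
    {"source": "leboncoin", "source_id": "77", "price": 252000,
     "text": "T3 lumineux proche tram, 65 m2, Bordeaux Chartrons"},
    {"source": "bienici", "source_id": "z9", "price": 250000,
     "text": "Maison 5 pieces avec jardin et garage"},
]
for group in group_duplicates(ads):
    print([ad["source"] for ad in group])
```

The greedy grouping compares every ad with every group, which is fine for a sketch but would need an index or a blocking key (for example the postcode) at the scale of tens of thousands of ads.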
![Page 16: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/16.jpg)
![Page 17: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/17.jpg)
![Page 18: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/18.jpg)
MACHINE LEARNING WORKFLOW
Data Cleaning: Check input format, Split data and hide holdout, Drop/impute null values, Filter outliers, …
Feature Engineering: Extract features, Scale/normalize data, Test contextual data, …
Data Modeling: Cross-validate for model selection, Optimize hyper-parameters, …
Error Analysis: Cross-validate for testing error, Locate sensitive zones, Visualize error, …
![Page 19: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/19.jpg)
Data snooping
If a data set has affected any step in the learning process, its ability to assess the outcome has been compromised.
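One common way to respect this principle is to set the holdout aside first and to learn every preprocessing statistic from the training rows only. The sketch below shows this for a single hypothetical feature (surface in square metres, with missing values as `None`); the helper names and the 20% holdout share are assumptions for illustration.

```python
import random
from statistics import mean, median, pstdev


def split_holdout(rows, holdout_share=0.2, seed=0):
    """Shuffle once and set aside a holdout that is not touched until the final test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_share))
    return rows[:cut], rows[cut:]


def fit_preprocessor(train_values):
    """Learn the imputation value and scaling statistics from the training set only."""
    known = [v for v in train_values if v is not None]
    fill = median(known)
    filled = [fill if v is None else v for v in train_values]
    mu = mean(filled)
    sigma = pstdev(filled) or 1.0  # guard against a constant feature
    return lambda v: ((fill if v is None else v) - mu) / sigma


surfaces = [45.0, 60.0, None, 82.0, 30.0, 120.0, None, 75.0, 55.0, 95.0]
train, holdout = split_holdout(surfaces)
transform = fit_preprocessor(train)  # the holdout never influences fill, mu or sigma
train_scaled = [transform(v) for v in train]
holdout_scaled = [transform(v) for v in holdout]
print(len(train_scaled), len(holdout_scaled))
```

Computing the median or the standard deviation on the full data set before splitting would let the holdout affect a step of the learning process, which is exactly the situation the definition warns about.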
![Page 20: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/20.jpg)
k-fold Cross Validation
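As a minimal sketch of the idea, the code below splits the sample indices into k folds, trains on k-1 of them and scores on the remaining one, so that every sample is used for validation exactly once. The toy model (`fit_price_per_m2`), the data and the error measure (mean absolute percentage error) are assumptions made for illustration.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    for fold in range(k):
        val = list(range(fold, n, k))
        val_set = set(val)
        train = [i for i in range(n) if i not in val_set]
        yield train, val


def cross_validate(fit, xs, ys, k=10):
    """Return the mean absolute percentage error obtained on each of the k folds."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(model(xs[i]) - ys[i]) / ys[i] for i in val]
        scores.append(100 * mean(errors))
    return scores


def fit_price_per_m2(xs, ys):
    """Toy model: price = surface * average training price per square metre."""
    rate = mean(y / x for x, y in zip(xs, ys))
    return lambda x: rate * x


surfaces = [20.0 + 5 * i for i in range(20)]
prices = [3000.0 * s + (15000.0 if i % 3 == 0 else 0.0) for i, s in enumerate(surfaces)]
scores = cross_validate(fit_price_per_m2, surfaces, prices, k=10)
print(f"10-fold CV mean error: {mean(scores):.2f}%")
```

The folds here are interleaved rather than random, so real data should be shuffled first (or ordered by date when time matters).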
![Page 21: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/21.jpg)
MACHINE LEARNING WORKFLOW
Data Cleaning: Check input format, Drop/impute null values, Filter outliers, Split data and hide holdout, …
Feature Engineering: Extract features, Scale/normalize data, Test contextual data, …
Data Modeling: Cross-validate for model selection, Optimize hyper-parameters, …
Error Analysis: Cross-validate for testing error, Locate sensitive zones, Visualize error, …
![Page 22: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/22.jpg)
Source: Professor Yaser Abu-Mostafa, Caltech
![Page 23: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/23.jpg)
Data snooping
If you torture the data long enough, it will confess.
![Page 24: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/24.jpg)
Some key numbers
60 000 adverts, including 20 432 selling ads
12 839 unique selling ads with 61 features
10 883 selling ads remaining with 52 features after filtering
8 months of data
![Page 25: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/25.jpg)
Allocation of time
Feature Engineering: 50%
Data Cleaning & EDA, Data Modeling, Error Analysis: the remaining 50% (shares of 20%, 20% and 10%)
![Page 26: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/26.jpg)
Feature engineering - what works?
Location features
Contextual data (Open Moulinette)
Imputing
Room features
Removing contextual outliers
Improving ES queries

Time series features
NLP on text data
Dimensionality reduction
Numerical values transforming/scaling
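The slide does not say how the location features were built. One simple possibility, shown here purely as an assumption, is the great-circle distance from each ad to a reference point such as the centre of Bordeaux. The ad fields (`lat`, `lon`, `surface`, `rooms`) and the reference coordinates are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate coordinates of central Bordeaux; an illustrative reference point.
BORDEAUX_CENTER = (44.8378, -0.5792)
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def engineer(ad):
    """Turn a raw ad into model features (field names are hypothetical)."""
    return {
        "dist_center_km": haversine_km((ad["lat"], ad["lon"]), BORDEAUX_CENTER),
        "surface": ad["surface"],
        "rooms": ad["rooms"],
    }


ad = {"lat": 44.8386, "lon": -0.6436, "surface": 65.0, "rooms": 3}
print(engineer(ad))
```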
![Page 27: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/27.jpg)
Data Modeling: what algorithms to use?
Linear Model
Tree-based model
Average Ensemble method
Metamodel Ensemble method
![Page 28: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/28.jpg)
“This is how you win ML competitions: you take other peoples’ work and ensemble them together.”
Vitaly Kuznetsov - NIPS 2014
![Page 29: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/29.jpg)
Meta-model ensemble method: explanation
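The explanation on this slide is a diagram that did not survive extraction. A common form of metamodel ensembling is stacking: each base model predicts every training sample out of fold, and a second-level model learns how to combine those predictions. The sketch below contrasts it with a plain average. The two base models (a least-squares line and a nearest-neighbour average) and the linear metamodel are stand-ins chosen to keep the example dependency-free; they are not the models used in the talk.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Least-squares line through the data (stand-in for the linear model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """Average of the k nearest training targets (stand-in for a non-linear model)."""
    pairs = list(zip(xs, ys))
    return lambda x: mean(y for _, y in sorted(pairs, key=lambda p: abs(p[0] - x))[:k])


def out_of_fold(fit, xs, ys, k=5):
    """Predict each sample with a model that never saw it during training."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def fit_meta(p1, p2, ys, ridge=1e-9):
    """Least-squares weights (w1, w2) such that w1*p1 + w2*p2 approximates ys."""
    a = sum(u * u for u in p1) + ridge
    b = sum(u * v for u, v in zip(p1, p2))
    c = sum(v * v for v in p2) + ridge
    d = sum(u * y for u, y in zip(p1, ys))
    e = sum(v * y for v, y in zip(p2, ys))
    det = a * c - b * b
    return (d * c - b * e) / det, (a * e - b * d) / det  # Cramer's rule on the 2x2 system


def build_ensembles(xs, ys):
    """Return (average ensemble, stacked ensemble, metamodel weights)."""
    oof1 = out_of_fold(fit_linear, xs, ys)
    oof2 = out_of_fold(fit_knn, xs, ys)
    w1, w2 = fit_meta(oof1, oof2, ys)
    m1, m2 = fit_linear(xs, ys), fit_knn(xs, ys)  # refit base models on all training data
    average = lambda x: (m1(x) + m2(x)) / 2
    stacked = lambda x: w1 * m1(x) + w2 * m2(x)
    return average, stacked, (w1, w2)


surfaces = [20.0 + 5 * i for i in range(20)]
prices = [3000.0 * s for s in surfaces]
average, stacked, weights = build_ensembles(surfaces, prices)
print(weights, average(50.0), stacked(50.0))
```

Fitting the metamodel on out-of-fold predictions matters: if the base models had seen the samples they predict, the metamodel would learn from over-optimistic inputs, which is another form of data snooping.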
![Page 30: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/30.jpg)
Kaggle Homesite winner
Source: Homesite Quote Conversion, Winners' Write-Up, 1st Place: KazAnova
![Page 31: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/31.jpg)
Error analysis: visualization is key
![Page 32: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/32.jpg)
Error analysis: visualization is key
![Page 33: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/33.jpg)
Result
[Bar chart: 10-fold CV mean error (%) for Linear Regression, Lasso, Random Forest, Gradient Boosting, Average Ensemble Method and Metamodel Ensemble Method]
![Page 34: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/34.jpg)
Result
CV mean error: 12.3%
Holdout mean error: 13.1%
CV median error: 8.8%
Holdout median error: 9.3%
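The slides do not define "error"; the sketch below assumes it is the absolute percentage error between predicted and actual price, and shows why the mean and the median can differ as they do above. The prices are made-up illustration data, not results from the project.

```python
from statistics import mean, median


def percentage_errors(y_true, y_pred):
    """Absolute error of each prediction, as a percentage of the true price."""
    return [100 * abs(p - t) / t for t, p in zip(y_true, y_pred)]


# Made-up prices: four close predictions and one large miss.
true_prices = [200000, 150000, 320000, 95000, 410000]
predicted = [210000, 141000, 300000, 99750, 615000]

errors = percentage_errors(true_prices, predicted)
print(f"mean error: {mean(errors):.2f}%, median error: {median(errors):.2f}%")
```

A single badly predicted ad pulls the mean up while leaving the median almost unchanged, which is one reason to report both.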
![Page 35: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/35.jpg)
Feature importance
![Page 36: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/36.jpg)
![Page 37: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/37.jpg)
How to improve the model
More data
Improve ES queries (sector, type, … )
Leverage time series data
More data
![Page 38: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/38.jpg)
![Page 39: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/39.jpg)
What’s next?
Metrics
Recommendation System
User Experience
Speed
![Page 40: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/40.jpg)
Conclusion
Better data beats cleverer algorithms
System monitoring is vital
There needs to be a coherent data flow between backend and ML engine
![Page 41: Immoviz.io - real estate search engine](https://reader031.vdocuments.site/reader031/viewer/2022022412/58f9a925760da3da068b6b73/html5/thumbnails/41.jpg)
Thank you for your attention.
Any questions?