Inferring Age and Gender of Facebook Users Based on Their Status Updates
TRANSCRIPT
- 1. Inferring Age and Gender of Facebook Users Based on their Status Updates. Angel Oswaldo Vázquez-Patiño, Master of Science in Artificial Intelligence. February 2015, Leuven, Belgium
- 2. Outline: Introduction, Methodology, Results, Conclusions
- 3. Introduction: Social Media Users Kemp, Simon, 2014. Global Social Media Users Pass 2 Billion. We Are Social.
- 4. Introduction: Facebook Penetration Kemp, Simon, 2014. Global Social Media Users Pass 2 Billion. We Are Social.
- 5. Introduction: Computational social science
- 6. Introduction: Importance. Social media marketing. Important attributes: age and gender. Attribute disclosure.
- 7. Goal of the study: age and gender inference models; reduce the feature dimension; Second Order Representation (SOR)
- 8. Literature review: study of Kosinski et al., 2013, relying on Facebook likes; the Open Vocabulary Approach (Schwartz et al., 2013). General approach: extraction of features, user representation, classification model.
- 9. Methodology
- 10. Methodology: pre-processing, 4 folds, vocabulary generation, feature selection, document representation. 31,169
- 11. The Open Vocabulary Approach. Linguistic feature extraction: n-grams of 1 to 3 words; PMI greater than 2*length; terms used by 1% of users. Feature dimension reduction: PCA. Representation: BOT. 31,169
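The PMI filter on this slide can be sketched in a few lines. This is a minimal illustration, not the study's implementation: it assumes PMI is defined as log(p(phrase) / product of p(word)), one common definition for multi-word phrases, and that "length" means the number of words in the phrase. The function names and the toy corpus are hypothetical. The "terms used by 1% of users" filter is not covered here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def pmi(phrase, unigram_counts, phrase_count, total_unigrams, total_phrases):
    """PMI of a phrase: log( p(phrase) / product of p(word) ). Assumed definition."""
    p_phrase = phrase_count / total_phrases
    p_words = 1.0
    for word in phrase:
        p_words *= unigram_counts[word] / total_unigrams
    return math.log(p_phrase / p_words)

def select_phrases(docs, max_n=3, factor=2.0):
    """Keep 2- to max_n-grams whose PMI exceeds factor * (number of words)."""
    unigram_counts = Counter(w for doc in docs for w in doc)
    total_unigrams = sum(unigram_counts.values())
    kept = {}
    for n in range(2, max_n + 1):
        counts = Counter(g for doc in docs for g in ngrams(doc, n))
        total = sum(counts.values())
        for phrase, count in counts.items():
            score = pmi(phrase, unigram_counts, count, total_unigrams, total)
            if score > factor * n:
                kept[phrase] = score
    return kept

# Toy corpus: "new york" always co-occurs, "the" pairs with many different words.
docs = [["new", "york"]] * 3 + [["the", f"w{i}"] for i in range(300)]
kept = select_phrases(docs)
```

On this toy corpus only the collocation ("new", "york") clears the threshold; bigrams that start with the very frequent word "the" score well below 2*length and are dropped.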
- 12. The Second Order Representation 1. Building term vectors 2. Building document vectors
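The two steps named on this slide can be illustrated with a simplified sketch. The exact weighting used in the study is not given in the slides, so the log-scaled term frequency and the sum-to-one normalisation below are assumptions, and all function names are hypothetical. The point it shows is that each document ends up with one component per profile (for example, two for gender) instead of one per vocabulary term.

```python
import math
from collections import Counter, defaultdict

def build_term_vectors(docs, labels, profiles):
    """Step 1: one vector per term, with one component per profile.

    Component i accumulates log2(1 + tf / doc_length) over the documents
    labelled with profile i; each vector is then normalised to sum to 1.
    (This weighting is an assumption of the sketch.)
    """
    index = {p: i for i, p in enumerate(profiles)}
    vectors = defaultdict(lambda: [0.0] * len(profiles))
    for tokens, label in zip(docs, labels):
        for term, tf in Counter(tokens).items():
            vectors[term][index[label]] += math.log2(1 + tf / len(tokens))
    normalised = {}
    for term, vec in vectors.items():
        total = sum(vec)
        normalised[term] = [v / total for v in vec]
    return normalised

def build_document_vector(tokens, term_vectors, n_profiles):
    """Step 2: a document is the tf-weighted sum of its known term vectors."""
    doc_vec = [0.0] * n_profiles
    known = [t for t in tokens if t in term_vectors]
    for term, tf in Counter(known).items():
        weight = tf / len(tokens)
        for i, v in enumerate(term_vectors[term]):
            doc_vec[i] += weight * v
    return doc_vec

# Hypothetical toy data: three users with a gender label each.
docs = [["football", "beer"], ["shopping", "love"], ["football", "love"]]
labels = ["male", "female", "male"]
profiles = ["male", "female"]

term_vectors = build_term_vectors(docs, labels, profiles)
doc_vector = build_document_vector(["football", "love"], term_vectors, len(profiles))
```

Here "football" only appears in documents labelled male, so its term vector is [1.0, 0.0], while "love" is split evenly. The document ["football", "love"] therefore maps to [0.75, 0.25].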
- 13. Methodology. Gender prediction: SVMs with linear and RBF kernels. Age prediction: Ridge regression, Lasso regression.
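To make the model names on this slide concrete, the sketch below writes out the two SVM kernels and the single-feature, no-intercept closed forms of ridge and lasso. It is a didactic illustration only: the study presumably used a library implementation, and the objectives stated in the comments, the `gamma` and `alpha` parameters, and the function names are assumptions of this sketch.

```python
import math

def linear_kernel(x, z):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def soft_threshold(value, alpha):
    """Shrink value towards zero by alpha, clipping at zero."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0

def ridge_weight(x, y, alpha):
    """Minimise sum((y - w*x)^2) + alpha * w^2 for a single feature."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)

def lasso_weight(x, y, alpha):
    """Minimise 0.5 * sum((y - w*x)^2) + alpha * |w| for a single feature."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, alpha) / sxx

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
ridge_w = ridge_weight(x, y, alpha=14.0)
lasso_w = lasso_weight(x, y, alpha=30.0)
```

The contrast to notice is that ridge shrinks a weight smoothly but never sets it exactly to zero, whereas lasso sets it to zero once the penalty is large enough, which is why lasso also acts as a feature selector.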
- 14. Results: 1. OVA-PCA-DR 2. OVA-No-DR 3. OVA-CHI2-DR 4. SOR 5. SOR-CHI2-DR. Classification: accuracy, F1-score. Regression: R, MAE, MSE, EVS.
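The evaluation metrics listed on this slide can be written out as follows. This is a plain-Python sketch using textbook definitions. It assumes that "R" denotes the Pearson correlation between true and predicted ages and that EVS stands for explained variance score; the slides do not spell out either. The sample values are made up for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

def mse(y_true, y_pred):
    """Mean squared error."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def pearson_r(y_true, y_pred):
    """Pearson correlation (assumed meaning of "R" on the slide)."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = sum((t - mt) ** 2 for t in y_true)
    sp = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(st * sp)

def explained_variance(y_true, y_pred):
    """EVS: 1 - Var(y_true - y_pred) / Var(y_true)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - variance(residuals) / variance(y_true)

# Made-up values: gender labels (1/0) and ages in years.
gender_true, gender_pred = [1, 0, 1, 1], [1, 0, 0, 1]
age_true, age_pred = [20, 30, 40], [22, 32, 42]
```

In the age example every prediction is off by exactly two years, so MAE is 2 and MSE is 4, yet both the correlation and the explained variance are 1. Correlation-style scores ignore a constant bias, which is one reason to report error metrics alongside them.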
- 15. Results: OVA-x-DR, Gender. OVA-No-DR 0.905; OVA-χ²-DR 10k 0.908; OVA-χ²-DR 15k 0.908. OVA-No-DR 0.905; OVA-χ²-DR 10k 0.908; OVA-χ²-DR 15k 0.907.
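The chi-squared (CHI2) dimension reduction compared on this slide keeps only the top-scoring terms, for example 10k or 15k of them. A minimal sketch of that idea follows, using the textbook chi-squared statistic on a 2x2 term-presence by class table. Whether the study scored term presence or term frequency is not stated in the slides, so the presence-based table, the function names, and the toy users are all assumptions.

```python
def chi2_score(a, b, c, d):
    """Chi-squared statistic for a 2x2 contingency table.

    a: class-1 users with the term     b: class-0 users with the term
    c: class-1 users without the term  d: class-0 users without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term present for everyone or no one: carries no signal
    return n * (a * d - b * c) ** 2 / denom

def top_k_terms(user_terms, labels, k):
    """Rank terms by chi-squared against a binary label and keep the best k."""
    vocabulary = set().union(*user_terms)
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    scores = {}
    for term in vocabulary:
        a = sum(1 for terms, lab in zip(user_terms, labels) if lab == 1 and term in terms)
        b = sum(1 for terms, lab in zip(user_terms, labels) if lab == 0 and term in terms)
        scores[term] = chi2_score(a, b, n_pos - a, n_neg - b)
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores

# Hypothetical toy data: the set of terms each user wrote, plus a 0/1 label.
user_terms = [{"football", "the"}, {"football", "the"},
              {"shopping", "the"}, {"shopping", "the"}]
labels = [1, 1, 0, 0]
selected, scores = top_k_terms(user_terms, labels, k=2)
```

In the toy data "the" is used by everyone and scores 0, so it is discarded, while "football" and "shopping" separate the two classes perfectly and are kept.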
- 16. Results: OVA-x-DR Age
- 17. Results: SOR, Gender. OVA-PCA-DR 0.886; SOR-No-DR 0.815. OVA-PCA-DR 0.885; SOR-No-DR 0.813.
- 18. Results: SOR Age
- 19. General comparison of models
- 20. Comparison of running time
- 21. Conclusions and future work. Age and gender inference models. Reduce the feature dimension: χ², 15K terms. Second Order Representation (SOR): reduce running time dramatically, age. PAN 2015 workshop and competition: Author Profiling.
- 22. Thank you!