gender and interest targeting for sponsored post advertising at tumblr

Gender and Interest Targeting for Sponsored Post Advertising at Tumblr Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Ananth Nagarajan Yahoo Research Ad Sciences

Uploaded by: mihajlo-grbovic

Posted on 22-Jan-2018



TRANSCRIPT

  1. Gender and Interest Targeting for Sponsored Post Advertising at Tumblr. Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Ananth Nagarajan, Yahoo Research Ad Sciences
  2. Talk Overview: Audience Targeting (intro); Tumblr Basics (overview of the network, advertising on Tumblr); Tumblr Data (data sources, Tumblr user profiles); Tumblr Gender Prediction (approach and results); Tumblr Interest Prediction (approach and results).
  3. Audience targeting: targeting unit = audience = a group of users. 1. Audience expansion: seed modeling (finding users similar to a provided set); click/conversion prediction.
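Seed modeling can be sketched as a nearest-neighbour search. The example below is a minimal illustration under stated assumptions, not the model from the talk: it assumes every user already has a numeric profile vector, averages the seed users into a centroid and ranks all other users by cosine similarity to it. `expand_audience`, `cosine` and the toy profiles are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def expand_audience(seed_ids, profiles, k):
    """Return the k non-seed users most similar to the centroid of the seed users.

    profiles maps user id -> feature vector (list of floats of equal length).
    """
    dim = len(next(iter(profiles.values())))
    centroid = [0.0] * dim
    for uid in seed_ids:
        for i, value in enumerate(profiles[uid]):
            centroid[i] += value / len(seed_ids)
    candidates = [uid for uid in profiles if uid not in seed_ids]
    candidates.sort(key=lambda uid: cosine(profiles[uid], centroid), reverse=True)
    return candidates[:k]
```

With the toy profiles `{"a": [1, 0, 1], "b": [1, 0, 0.8], "c": [0, 1, 0], "d": [0.9, 0.1, 1]}` and seed `{"a"}`, asking for two users returns `["d", "b"]`; user `c`, orthogonal to the seed, ranks last. A production system would replace the linear scan with an approximate nearest-neighbour index, or use a click/conversion prediction model, the other option listed on the slide.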
  4. Audience targeting: targeting unit = audience = a group of users. 2. Off-the-shelf audiences (e.g. sports > basketball). Interest-based: categorizing user actions (search, mail, news, Tumblr, apps); no supervision, pure interest (recency and intensity). Intent-based: categorizing clicks, purchases, etc.; supervised: fit a model that predicts clicks/purchases.
  5. Audience targeting: 3. Retargeting. Search retargeting (finding queries similar to a provided set of queries) [3]; mail retargeting (finding domains similar to a provided domain). Example principal components (PC) over domains [1, 2], top domains per component:
     Retail PC: e.kohls.com, landsend.com, sears.com, gap.com, jcrew.com, o.macys.com
     Sport shoes PC: dickssportinggoods.com, finishline.com, footlocker.com, newbalance.com, zappos.com, 6pm.com
     Vitamins PC: vitacost.com, luckyvitamin.com, wansonvitamins.com, iherb.com, walgreens.com, christianbook.com
     Food PC: papajohns-specials.com, dominos.com, jimmyjohns.com, grubhub.com, chipotle.com, pizzahut.com
     Outdoor PC: rei.com, backcountry.com, campmor.com, orvis.com, usoutdoor.com, kelty.com
     Cosmetics PC: hautelookmail.com, maccosmetics.com, cs.sephora.com, ulta.com, eyeslipsface.com, hautelookmail.com
     [1] Grbovic, M. et al. "Sparse Principal Component Analysis with Constraints." AAAI 2012.
     [2] Grbovic, M. et al. "Generating Ad Targeting rules using Sparse Principal Component Analysis with Constraints." WWW 2014.
     [3] Grbovic, M. et al. "Search retargeting using directed query embeddings." WWW 2015.
  6. Tumblr Basics: Tumblr official statistics: 249 million blogs, 117 billion posts, 90 million daily posts, 13 languages (source: http://www.tumblr.com/about).
  7. Tumblr Basics: one user has one primary blog (user = blog).
  8. Tumblr Basics: blogs have informative descriptions, for example: "Tristan, i'm 15, Canada... Snowboarding Travel - Football FTB"; "Hello, I'm Tess, I post a lot of stuff and Spot Conlon is my bae. Musicals are rad and Shawn Hunter is forever golden"; "Alyssa|18|California. I like bands, books, shows, and random things. And geese. One bit me in the crotch once. Good times, good times"; "I'm Carla // 19yrs old // Texas y'all listen, I just like to blog about anime and cute animals and and video games."; "My name's Kierstin. I love basketball".
  9. Tumblr Basics: as a blog owner you can create posts (your own or reblogs) and follow other blogs. Post types: text 14.13%, photo 78.11%, quote 2.27%, link 0.46%, chat 0.85%, audio 2.01%, video 1.35%.
  10. Tumblr Basics: a regular post has a title, a body and tags, and can be reblogged or liked.
  11. Tumblr Basics: sponsored post.
  12. Advertising on Tumblr
  13. Advertising on Tumblr
  14. Advertising on Tumblr: how to enhance it? Targeting: reach only users who are interested in the product/category. 1. Gender targeting: the most basic form of ad targeting; proven to work better than targeting random users. 2. Interest targeting: more involved; find users with interest in a specific category (e.g. fashion, sports); proven to work better than pure gender targeting.
  15. Data Sources: Firehose (user actions + post details; gnip.com/sources/tumblr). 1. Blog details: title, description. 2. Post details: photo posts (caption, tags); text posts (title, tags); audio posts (artist, tags). 3. User actions: post, reblog, like, unlike.
  16. Data Sources: follower graph. Subset we extracted: 96.9M nodes (users), 5.1B edges (follows), 18.2M blogs follow each other; the average user follows 58.9 blogs.
  17. User Profiles: a user profile (details in the paper) is created from: 1. Declared features: text from the blog title and text from the blog description. 2. Content features: tags from blog posts, text from blog post content, artist names from audio posts. 3. User actions: like, follow, reblog. Each user becomes a feature vector (e.g. 0 1 7 0 3) whose values reflect intensity and recency.
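The slides say only that profile values combine intensity and recency. One simple way to get both, assumed here rather than taken from the paper, is to sum exponentially decayed counts: every occurrence of a feature contributes 1 when it is fresh and half as much after each `half_life_days`. `build_profile` and the 30-day half-life are hypothetical.

```python
import math
from collections import defaultdict

def build_profile(events, now, half_life_days=30.0):
    """Aggregate (feature, timestamp_in_days) events into a sparse profile.

    Each occurrence adds exp(-decay * age), a weight that halves every
    `half_life_days`, so the score grows with intensity (more events)
    and with recency (newer events).
    """
    decay = math.log(2) / half_life_days
    profile = defaultdict(float)
    for feature, t in events:
        age = now - t
        profile[feature] += math.exp(-decay * age)
    return dict(profile)
```

For events `[("tag:soccer", 100), ("tag:soccer", 70), ("desc:photography", 100)]` scored at `now=100`, the soccer tag gets 1 + 0.5 = 1.5 (two uses, one of them 30 days old) and the photography description gets 1.0.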
  18. Gender Prediction: main goal: assign a gender to Tumblr users (for example, user x is most likely female) and serve targeted ads based on the results. Steps: 1. Use a golden set (known gender) plus user profiles to train a predictive model. 2. Score all users for whom we have a profile. 3. Apply a threshold to keep only the most certain predictions.
  19. Gender Prediction: golden sets based on declared user first names. Extract first names from blog descriptions; use US Census data (1880 to 2013) to get the probability of gender given the name. Golden set size: male 395K, female 564K.
     Regex (count): my name is* (783,564), my names* (291,811), me llamo* (47,663), the names* (38,065), mi nombre es* (9,751), mi chiamo* (9,181), mein name ist* (1,025), meu nome e* (512), mon nom est* (215), mio nome e* (185)
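The golden-set step can be sketched as a regular expression over blog descriptions followed by a lookup in a name-to-gender table. Everything concrete here is an assumption: the probabilities in `P_MALE` are made-up placeholders for the US Census-derived values, the pattern covers only the phrases listed on the slide (reading "names*" as "name's"), and the 0.9 certainty cut-off and `golden_label` are hypothetical.

```python
import re

# Hypothetical placeholder values for P(male | first name); the talk derives
# these probabilities from US Census name data (1880 to 2013).
P_MALE = {"tristan": 0.99, "tess": 0.01, "kierstin": 0.0, "alex": 0.6}

# Phrases from the slide; "name'?s" assumes "names*" stands for "name's".
NAME_RE = re.compile(
    r"\b(?:my name is|my name'?s|the name'?s|me llamo|mi nombre es|mi chiamo"
    r"|mein name ist|meu nome [eé]|mon nom est|mio nome [eè])\s+([^\W\d_]+)",
    re.IGNORECASE,
)

def golden_label(description, min_certainty=0.9):
    """Return (label, weight) with label 1 = male and 0 = female, or None."""
    match = NAME_RE.search(description)
    if match is None:
        return None
    p_male = P_MALE.get(match.group(1).lower())
    if p_male is None:
        return None  # name not in the table
    certainty = max(p_male, 1.0 - p_male)
    if certainty < min_certainty:
        return None  # ambiguous name: keep it out of the golden set
    return (1 if p_male >= 0.5 else 0, certainty)
```

`golden_label("My name's Kierstin. I love basketball")` returns `(0, 1.0)`, a female example with full weight, while an ambiguous name such as Alex is left out. The certainty value can double as the example weight in the weighted logistic regression on the next slide.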
  20. Gender Prediction: model training with large-scale weighted logistic regression, which predicts the probability of a user being male from the ground-truth (golden set) labels; the weights are the model parameters, and training is weighted.
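The exact objective is not given in the transcript, so the following is only one common form of weighted logistic regression, not necessarily the authors' objective: minimise Σᵢ wᵢ·logloss(yᵢ, σ(βᵀxᵢ + b)) + (λ/2)‖β‖², where yᵢ = 1 for male and the per-example weight wᵢ could be the name-based gender certainty (an assumption). The toy batch-gradient-descent code also applies step 3 from slide 18 by returning a gender only when its probability reaches a threshold; the function names, learning rate, λ = 0.01 and the 0.8 threshold are hypothetical, and a Tumblr-scale model would need a distributed solver.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_weighted_lr(X, y, sample_w, lr=0.5, l2=0.01, epochs=500):
    """Batch gradient descent on sample-weighted, L2-regularised log loss.

    X: list of feature vectors, y: labels (1 = male, 0 = female),
    sample_w: per-example confidence weights. Returns (coef, bias).
    """
    coef, bias = [0.0] * len(X[0]), 0.0
    total_w = sum(sample_w)
    for _ in range(epochs):
        grad = [l2 * c for c in coef]  # gradient of the L2 penalty
        grad_b = 0.0                   # the bias is not regularised
        for xi, yi, wi in zip(X, y, sample_w):
            err = sigmoid(sum(c * v for c, v in zip(coef, xi)) + bias) - yi
            for j, v in enumerate(xi):
                grad[j] += wi * err * v / total_w
            grad_b += wi * err / total_w
        coef = [c - lr * g for c, g in zip(coef, grad)]
        bias -= lr * grad_b
    return coef, bias

def predict_gender(coef, bias, x, threshold=0.8):
    """Return 'male', 'female', or None when neither probability reaches threshold."""
    p_male = sigmoid(sum(c * v for c, v in zip(coef, x)) + bias)
    if p_male >= threshold:
        return "male"
    if 1.0 - p_male >= threshold:
        return "female"
    return None
```

On a toy set where feature 0 marks male examples and feature 1 female ones, `[1, 0]` is classified male, `[0, 1]` female, and the featureless `[0, 0]` gets no label because its probability stays near 0.5.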
  21. Gender Prediction: results.
     On the hold-out set: female precision 0.806, recall 0.838; male precision 0.794, recall 0.689.
     Editorial evaluation of 1000 random blogs: female 429 correct, 4 wrong, 298 don't know; male 144 correct, 5 wrong, 127 don't know.
     Coverage: the classified users account for >95% of actions (posts, reblogs, likes, etc.).
  22. Interest Targeting: main goal: assign interest categories to Tumblr users (for example, user x is interested in fashion) and serve targeted ads based on the results. Interests are picked from a fixed advertising taxonomy.
  23. Interest Targeting: taxonomy.
     Level 1: Arts and Entertainment, Style and Fashion, Pets, Shopping, Food and Drink, Home and Garden, Health and Fitness, Beauty and Personal Care, Education, Society, Sports, Technology and Computing, Travel, Automotive.
     Level 2 (examples): Arts and Entertainment/Movies, Arts and Entertainment/Television, Style and Fashion/Clothing, Hobbies and Interests/Photography, Food and Drink/Dining Out, Family and Parenting, Food and Drink/Dining Out - Fast Food, Education/K to 12 Education, Beauty and Personal Care/Face and Body Care, Arts and Entertainment/Music, Arts and Entertainment/Books and Literature, Beauty and Personal Care/Hair Care, Style and Fashion/Footwear.
  24. Interest Targeting: intent audiences (drive clicks): collect clicks on categorized ads; train a model where clicks are +1 and no-clicks are -1; score all users to estimate the probability of a click. Interest audiences (drive brand awareness with a relevant audience): infer user interest in a category from their activity; create categorized user profiles.
  25. Interest Targeting: approach (details in the paper): 1. Categorize keywords from post content (post tags, post text) and from blog titles and descriptions. 2. Predict user interest categories from the categorized tags and text a user uses in posts, blog titles and descriptions (intensity + recency). 3. Leverage the follower graph and like actions to categorize users who do not create much content.
  26. Tag Categorization: how to represent tags? 1) Traditional bag of words: a vector over the vocabulary with 1 for the tag's words and 0 everywhere else. Tag 1, "movie releases", sets the entries for movie and releases; Tag 2, "new blockbuster hits", sets the entries for new, blockbuster and hits. Issue: the two vectors share no nonzero entry, so there is no way to find that these two tags are similar.
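The bag-of-words problem is easy to check numerically. With an assumed five-word vocabulary, the two tags from the slide become binary vectors whose dot product, and therefore cosine similarity, is 0.

```python
def bag_of_words(tag, vocab):
    """Binary bag-of-words vector: 1 for each vocabulary word in the tag, else 0."""
    words = set(tag.split())
    return [1 if w in words else 0 for w in vocab]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

vocab = ["blockbuster", "hits", "movie", "new", "releases"]  # toy vocabulary
tag1 = bag_of_words("movie releases", vocab)        # [0, 0, 1, 0, 1]
tag2 = bag_of_words("new blockbuster hits", vocab)  # [1, 1, 0, 1, 0]
print(dot(tag1, tag2))  # 0: no shared words, so the tags look unrelated
```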
  27. Tag Categorization: 2) Improvement: add context. "You shall know a word by the company it keeps" (J. R. Firth).
  28. Tag Categorization: 3) New: move from sparse to dense vectors. Represent tags as numeric vectors learned from training data (user posts), leveraging the context of tags (surrounding tags in the same post). Result: tags with similar contexts will have similar vectors. Example: post 1: trip_ideas cheap_flights holiday_travel_deals; post 2: trip_ideas air_tickets holiday_travel_deals, so cheap_flights and air_tickets share the same context. Tag vector: [0.2 1.1 7.2 0.8 3.1].
  29. Tag Categorization: tag dataset (tagged posts).
     P1: #healthy_breakfast #yogurt #food #smoothie #brakfast_snack #healthy_living
     P2: #nail #nail_polish #nails #nail_art #nail_color #pink_nails #fashion
     P3: #chocolate #cheese_cake #baking #best_desserts #recipe
     P4: #style #street_style #fashion #vintage #best_outfits #creative_fashion
     P5: #hair #hair_style #hair_fashion #hair_cut #best_hairdoo
  30. Tag Categorization: Word2Vec [1], a classification model over pairs of a word w and its context c: surrounding words are treated as positives, and randomly sampled words as negatives. In our case, word w = tag: the projection of the j-th tag t_j is used to predict the tags t_{j-n}, ..., t_{j-1}, t_{j+1}, ..., t_{j+n} within a single post.
     [1] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. "Distributed Representations of Words and Phrases and their Compositionality." In Proceedings of NIPS, 2013.
  31. Tag Categorization: Tag2Vec example for the tag sequence tag8, tag1, tag2, tag6: in the embedding space the current tag (T1) is moved close to its neighborhood (T8, T2, T6) and away from randomly sampled tags (RND).
  32. Tag Categorization: Tag2Vec after training: "movie releases" → [0.2 1.1 7.2 0.8 3.1] and "new blockbuster hits" → [0.21 1.2 6.8 0.74 3.2], similarity = 0.9.
  33. Tag Categorization: how to learn tag classes? Tag features: the tag2vec vector. Tag labels: human input (8,400 tags categorized by editors).
  34. Tag Categorization: how to learn tag classes? 1. Supervised learning using tag vectors as features x and assigned classes as labels y (e.g. #movie_releases: x = [0.2 1.1 7.2 0.8 3.1], y = 18). Fit a model f(x) → y that maps features to category labels by minimizing prediction loss, using one-against-all classifiers (multi-class).
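Option 1 amounts to standard multi-class classification with one binary classifier per category. The sketch below uses plain logistic regression fitted by gradient descent, with 2-dimensional toy vectors standing in for the 300-dimensional tag2vec features; the data, names and hyperparameters are hypothetical. A k-NN classifier, as in the k-NN-SG row of the evaluation on slide 40, would typically assign the majority label of the nearest labelled tags instead.

```python
import math

def sigmoid(z):
    z = max(min(z, 30.0), -30.0)
    return 1.0 / (1.0 + math.exp(-z))

def fit_binary(X, y, lr=0.5, epochs=300):
    """Logistic regression by batch gradient descent; y holds 0/1 labels."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * v for a, v in zip(w, xi)) + b) - yi
            gw = [g + err * v for g, v in zip(gw, xi)]
            gb += err
        w = [a - lr * g / len(X) for a, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def fit_one_vs_all(X, labels):
    """Train one binary classifier per category (that category vs. all others)."""
    return {c: fit_binary(X, [1 if l == c else 0 for l in labels]) for c in set(labels)}

def predict(models, x):
    """Return the category whose classifier gives the highest score."""
    def score(c):
        w, b = models[c]
        return sum(a * v for a, v in zip(w, x)) + b
    return max(models, key=score)
```

With two labelled movie tags and two labelled fashion tags, `predict(models, [0.95, 0.05])` returns `"movies"` and `predict(models, [0.05, 0.95])` returns `"fashion"`.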
  35. Tag Categorization: how to learn tag classes? 2. Semi-supervised learning of category vectors while we are learning tag vectors (predict the closest category vector): the vector of #movie_releases ([0.2 1.1 7.2 0.8 3.1]) is closest to the category arts&ent./movies, with similarity = 0.9.
  36. Tag Categorization: Skip-Gram vs. semi-supervised Skip-Gram. In Skip-Gram, the projection of the j-th tag t_j predicts the tags t_{j-n}, ..., t_{j-1}, t_{j+1}, ..., t_{j+n} within a single post. In semi-supervised Skip-Gram, it additionally predicts the j-th tag's categories c_1, ..., c_k, so category vectors are learned while we are learning tag vectors (predict the closest category vector).
  37. Tag Categorization: tag dataset (tagged posts + categories; editor-assigned categories in brackets).
     P1: #healthy_breakfast #yogurt #food [food&drink] #smoothie #brakfast_snack #healthy_living
     P2: #nail #nail_polish #nails #nail_art #nail_color #pink_nails #fashion
     P3: #chocolate #cheese_cake [food&drink/desserts] #baking #best_desserts #recipe
     P4: #style #street_style #fashion [style&fashion] #vintage #best_outfits #creative_fashion [style&fashion]
     P5: #hair #hair_style #hair_fashion #hair_cut #best_hairdoo
  38. Tag Categorization: Tag2Vec final model. t = tags; c = context (surrounding tags); n = random negatives; class = class tags (the j-th tag's categories c_1, ..., c_k). The projection of the j-th tag predicts both the surrounding tags t_{j-n}, ..., t_{j+n} within a single post and its categories.
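One way to read the final model is that each training post yields three kinds of pairs for the tag being projected: surrounding tags as positives, random tags as negatives, and, if editors labelled the tag, its categories as extra positives, so that category vectors are trained in the same space as tag vectors. The generator below sketches that pairing under this interpretation; the `CAT::` prefix, `training_pairs` and the omission of negatives for category targets are assumptions, and each pair would then feed an ordinary skip-gram negative-sampling update.

```python
import random

def training_pairs(post, tag_labels, vocab, window=5, negatives=5, rng=None):
    """Yield (input_tag, target, label) triples for one post.

    - context tags within `window` positions are positives (label 1);
    - `negatives` random vocabulary tags per positive are negatives (label 0);
    - if an editor labelled the input tag, each of its categories is an extra
      positive target, so category vectors are learned alongside tag vectors.
    """
    rng = rng or random.Random(0)
    for j, tag in enumerate(post):
        for k in range(max(0, j - window), min(len(post), j + window + 1)):
            if k != j:
                yield (tag, post[k], 1)
                for _ in range(negatives):
                    yield (tag, rng.choice(vocab), 0)
        for category in tag_labels.get(tag, []):
            yield (tag, "CAT::" + category, 1)  # hypothetical category token
```

For the first three tags of post P3 with only `#cheese_cake` labelled, the generator produces six context pairs plus one extra pair linking `#cheese_cake` to `CAT::food&drink/desserts`.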
  39. Tag Categorization: Tag2Vec training. Data: ~6.8B posts that contained tags. Parameters: window size = 5, random negatives = 5, most frequent tags downsampled. Output: ~2M tag vectors trained (d = 300). Categorization: the 380K most confident tag predictions kept (>0.5 cosine similarity to the closest category vector).
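The categorization rule on this slide (closest category vector, kept only above 0.5 cosine similarity) can be written directly. The 2-dimensional vectors are toy stand-ins for the 300-dimensional tag and category vectors; the function names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def categorize(tag_vectors, category_vectors, min_sim=0.5):
    """Map each tag to its closest category vector, keeping only confident matches."""
    result = {}
    for tag, v in tag_vectors.items():
        best = max(category_vectors, key=lambda c: cosine(v, category_vectors[c]))
        sim = cosine(v, category_vectors[best])
        if sim > min_sim:
            result[tag] = (best, sim)
    return result
```

With category vectors `{"arts&ent./movies": [1.0, 0.0], "style&fashion": [0.0, 1.0]}`, `#movie_releases` at `[0.9, 0.2]` maps to `arts&ent./movies` (similarity ≈ 0.98) and `#street_style` at `[0.1, 1.0]` to `style&fashion`, while `#random` at `[-1.0, 0.2]`, whose best similarity is below 0.5, is left uncategorized.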
  40. Tag Categorization: Tag2Vec evaluation.
     Supervised LR-SG: precision 0.71, recall 0.65
     Supervised k-NN-SG: precision 0.82, recall 0.62
     Semi-supervised SG: precision 0.85, recall 0.63
  41. Tag Categorization: example categories Food & Drink/Desserts and Health & Fitness/Weight Loss (video: http://youtu.be/ygn5oUBydfM).
  42. Tag Categorization
  43. Interest Prediction: users, their category and their categorized features (source: tag, txt, desc or photo; feature:count).
     User 1, Arts and Entertainment/Movies: tag spoilers:30, tag shrek:18, tag hercules:12, desc dvd:1, tag pokemon:7, tag thor:58, tag cinderella:3, tag hobbit:123, desc comedy:1, txt movies:100, desc movie:1, tag hulk:21, photo aladdin:28, tag disney:500, photo batman:10, txt bambi:12, desc animation:12, tag pixar:87, tag tarzan:8, tag marvel:385, tag wolverine:21, desc oscar:1, tag twilight:2
     User 2, Style and Fashion: txt fashion:108, tag womensfashion:110, tag fashiondiaries:133, tag redhair:2, tag menswear:125, tag springfashion:50, tag style:132, tag streetstyle:132, tag hairstylist:134, tag dapper:3, tag mensfashion:124, tag chanel:4
     Repeat the semi-supervised process for post text and phrases (phrase2vec) to increase reach. Calculate user affinity based on intensity and recency.
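Once tags and text are categorized, a user's affinity for each category can be approximated by summing the (recency-weighted) counts of their categorized features. The sketch assumes this simple sum and a hypothetical minimum share of 0.2 of the user's categorized activity; the paper's exact affinity formula is not reproduced in the slides, and `category_affinity` is a hypothetical name.

```python
from collections import defaultdict

def category_affinity(user_features, feature_category, min_share=0.2):
    """Return the user's categories ordered by affinity.

    user_features: {feature: count}, optionally already recency-weighted.
    feature_category: {feature: category} from tag/text categorization;
    uncategorized features are ignored. A category is kept when it accounts
    for at least `min_share` of the user's categorized activity.
    """
    totals = defaultdict(float)
    for feature, count in user_features.items():
        category = feature_category.get(feature)
        if category is not None:
            totals[category] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    kept = [(c, s / grand_total) for c, s in totals.items() if s / grand_total >= min_share]
    return [c for c, _ in sorted(kept, key=lambda item: item[1], reverse=True)]
```

For a subset of user 1's features (`tag:disney` 500, `tag:marvel` 385, `txt:movies` 100) plus a stray `tag:nail_art` 10 hypothetically mapped to Style and Fashion, only Arts and Entertainment/Movies passes the threshold.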
  44. Interest Prediction: users 1 and 2 as on the previous slide, plus:
     User 3, Food and Drink: follows_user31:1, follows_user43:1, likes_user131:1, follows_user423:1, follows_user331:1
     User 4, Style and Fashion: follows_user556:1, follows_user221:1, likes_user191:1, follows_user13423:1, likes_user335831:1
     Leverage the follower graph and like actions: we identify users with a high value of u_cat=k (influencers), and follows and likes of posts created by influencers in the k-th category serve as additional features. This helps for users who do not create much content.
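The influencer idea can be sketched as a join between follow/like events and the set of high-scoring users for a category. Here "high value" is assumed to mean a content-based score of at least 0.8, and feature names mimic the slide's `follows_user31` style; `influencer_features` and the toy data are hypothetical.

```python
def influencer_features(actions, user_scores, category, min_score=0.8):
    """Turn follows/likes of category influencers into extra features.

    actions: list of (user, action, target_user), action in {"follows", "likes"}.
    user_scores: {user: {category: affinity score}} from content-based inference.
    Influencers are users whose score for `category` is at least `min_score`.
    """
    influencers = {u for u, s in user_scores.items() if s.get(category, 0.0) >= min_score}
    features = {}
    for user, action, target in actions:
        if target in influencers:
            feats = features.setdefault(user, {})
            key = f"{action}_{target}"
            feats[key] = feats.get(key, 0) + 1
    return features
```

If `user31` and `user43` are Food and Drink influencers, a user who follows `user31`, likes posts by `user43` and follows a fashion blogger gets `{"follows_user31": 1, "likes_user43": 1}` as extra Food and Drink features.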
  45. Tumblr Interest Targeting A/B Tests: with 8 advertisers we ran consecutive untargeted (control) and targeted campaigns, with on average a 20% lift in engagement (likes, reblogs, follows). Lift of the targeted campaign over control:
     Home & Garden +9.71%; Style & Fashion +42.53%; Sports/Outdoor +19.86%; Arts & Enter./Television +24.37%; Arts & Enter./Video Games +19.02%; Pets/Dogs +27.21%; Arts & Enter. (1) +9.08%; Arts & Enter. (2) +6.54%
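As a quick consistency check, the unweighted mean of the eight per-campaign lifts is about 19.8%, in line with the reported average of roughly 20%. The slide does not say whether campaigns were weighted equally, so this is only an illustration.

```python
# Lift of each targeted campaign over its untargeted (control) run, in percent,
# copied from the A/B test results above.
lifts = {
    "Home & Garden": 9.71,
    "Style & Fashion": 42.53,
    "Sports/Outdoor": 19.86,
    "Arts & Enter./Television": 24.37,
    "Arts & Enter./Video Games": 19.02,
    "Pets/Dogs": 27.21,
    "Arts & Enter. (1)": 9.08,
    "Arts & Enter. (2)": 6.54,
}
mean_lift = sum(lifts.values()) / len(lifts)  # unweighted mean over campaigns
print(f"mean lift: {mean_lift:.2f}%")  # mean lift: 19.79%
```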
  46. Deployed System: delivers inference for users who account for more than 90% of daily activity on Tumblr. Adoption rate: 60% of all campaigns use our targeting today. Interest and gender models are retrained on a regular basis, and users are scored daily with MapReduce on Hadoop.
  47. Evaluation: accuracy tested on my blog.
     Gender prediction: score 1.330301, 236 features, inferred gender: male.
     Interest prediction (high-support categories; category, number of features, why):
     Sports, 111: I follow a lot of soccer-related blogs.
     Arts and Entert./TV, 107: I follow and reblog Game of Thrones blogs.
     Photography, 95: in my description I say I like photography, and I post about it.
     Science, 29: I reblog Yahoo Labs blogs and mention it in my description.
     Advertising/Marketing, 7: I follow advertising-related blogs.
  48. Thank you! Questions?