cs531presentation

MUSTAFA ILKER SARAC 20801528

UNDERSTANDING AND CLASSIFYING IMAGE TWEETS
ACM-MM 2013

Investigating Images Related to Twitter Trending Topics


CS531 - Mustafa Ilker SARAC 04/12/2023

Content

Introduction
Motivation
Image-Tweets
Image and Text Relation
Visual/Non-Visual Classification
Experiments
Initial Results



Introduction

Image-tweets
  Correlation between a tweet's image and its text
  50% of all posts are image-tweets
  Image-tweets are retweeted more and survive longer



Motivation


Questions to ask
  What types of images do users embed?
  Do the images distinctly differ from images on image/photo-sharing websites like Flickr?
  Do the textual contents of image-tweets differ from posts that are text-only?

Contributions
  Corpus
  Annotated subset
  Built a classifier to distinguish two subclasses of image-tweets: Visual and Non-Visual

Image-Tweets



Corpus
  Text-only and image-tweets from Weibo
  7 months in 2012
  ~57M tweets
  Manually annotated ~5K subset

Image-Tweets



Image Characteristics
  Images are post-processed by Weibo
  45.1% of the corpus are image-tweets
  Images vary in quality and topic
  70% of the annotated corpus are natural photographs

Image-Tweets



Image-tweets vs. Text-only: When? What? Why?
  When? More image-tweets are posted during the daytime
  What? LDA applied to a ~1M subset of the corpus; k=50 latent topics are learned
  Why? Daily chatter or information sharing

Image and Text Relation



99% of image-tweets have text.
  Status (event, time, location)
  Logico-semantic

Image and Text Relation



Visually-relevant image-tweets
  At least one noun or verb corresponds to part of the image

Non-visual image-tweets
  Image and text have no visual correspondence
  Hard to distinguish by just looking at the images
  May exhibit emotional relevance
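The visual/non-visual definition above can be sketched as a simple overlap test. This is only an illustration: the function name and both inputs (nouns/verbs already extracted from the text, and a list of things depicted in the image) are assumptions, not part of the paper's pipeline.

```python
def is_visually_relevant(content_words, image_concepts):
    """Return True if at least one noun/verb from the tweet text
    also names something depicted in the image (hypothetical inputs)."""
    words = {w.lower() for w in content_words}
    concepts = {c.lower() for c in image_concepts}
    return bool(words & concepts)


# Toy example: the text mentions "cat", and a cat is visible in the image.
print(is_visually_relevant(["cat", "sleep"], ["Cat", "sofa"]))
print(is_visually_relevant(["miss", "home"], ["sunset", "beach"]))
```

In practice the hard part is obtaining the image concepts, which is why the slides treat this as a classification problem rather than a lookup.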

Visual/Non-Visual Classification



Dataset Construction
  Crowdsourcing to label a random subset of the image-tweets as Visual or Non-visual
  Each image is annotated by 3 different subjects
  4811 image-tweets annotated
    3206 (2/3) visual
    1605 (1/3) non-visual
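The slides do not say how the three judgments per image were merged into one label. Majority voting is one common choice and is assumed in the minimal sketch below; the function name is hypothetical. The last lines check the class proportions using the counts reported above.

```python
from collections import Counter


def majority_label(votes):
    """Merge annotator votes into one label by majority (assumed scheme).
    With 3 annotators and 2 classes there is always a strict majority."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


print(majority_label(["visual", "non-visual", "visual"]))

# Class proportions from the reported counts.
visual, non_visual = 3206, 1605
total = visual + non_visual
print(total, round(visual / total, 3), round(non_visual / total, 3))
```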

Three major types of features are used: Text, Image, Context




Text Features
  Binary word features
  Previously learned topics from LDA
  Part-of-speech (POS) density features
  Named entities
  Microblog-specific features
    @mentions
    #hashtags
    Geolocation
    URLs
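The microblog-specific features listed above could be extracted with a few regular expressions. The sketch below is a rough approximation: the patterns, the function name, and the `has_geolocation` flag (assumed to come from post metadata rather than from the text) are illustrative choices, not the authors' implementation.

```python
import re

# Simplified patterns (assumptions); real platforms have stricter rules.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def microblog_features(text, has_geolocation=False):
    """Return binary microblog-specific features for one post."""
    return {
        "has_mention": bool(MENTION_RE.search(text)),
        "has_hashtag": bool(HASHTAG_RE.search(text)),
        "has_url": bool(URL_RE.search(text)),
        "has_geolocation": bool(has_geolocation),
    }


print(microblog_features("@alice look at this #sunset http://t.cn/abc"))
```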




Image Features
  Face detection
  SIFT features with bag-of-visual-words representation
    Applied LDA with k=35
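A bag-of-visual-words representation assigns each local descriptor to its nearest codebook entry and counts the assignments. The sketch below uses toy 2-D vectors and a hand-written codebook purely for illustration; real SIFT descriptors are 128-dimensional and the codebook is usually learned by clustering, neither of which is shown here.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_visual_words(descriptors, codebook):
    """Histogram of nearest-codeword assignments, one bin per codeword."""
    counts = Counter(
        min(range(len(codebook)), key=lambda i: squared_distance(d, codebook[i]))
        for d in descriptors
    )
    return [counts.get(i, 0) for i in range(len(codebook))]


# Toy data (assumed): two codewords, three descriptors.
codebook = [[0, 0], [10, 10]]
descriptors = [[1, 1], [9, 9], [0, 2]]
print(bag_of_visual_words(descriptors, codebook))  # [2, 1]
```

The resulting histogram plays the role of a "document" of visual words, which is what makes applying a topic model such as LDA to images possible.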

Context Features
  Retweets
  Comments
  Follower ratio
  Posting time, etc.

Experiment



10-fold cross-validation with Naïve Bayes is performed.
Macro-averaged F1 score is computed.
Baseline uses only words as features
  F1 = 64.8
Each feature is combined with the baseline individually to observe its impact.
When all features with a positive impact are combined
  F1 = 70.5
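Macro-averaged F1 computes F1 per class and then takes the unweighted mean, so the smaller non-visual class counts as much as the visual class. The sketch below shows the metric together with a simple way to form 10 folds; the function names and the toy labels are assumptions, and the classifier itself is not shown.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes in y_true."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved, non-overlapping folds."""
    return [list(range(i, n, k)) for i in range(k)]


# Toy labels (assumed): "v" = visual, "n" = non-visual.
y_true = ["v", "v", "v", "n"]
y_pred = ["v", "v", "n", "n"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

In each round of cross-validation, one fold from `k_fold_indices` would serve as the test set and the remaining nine as training data.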

Experiment



Proposed Work



Re-rank images of image-tweets returned by Twitter search

Select good images in order to represent Trending Topics.

Twitter was scraped and some initial results were obtained using:
  Retweets and Favorites as contextual features
  SIFT as image features to compare images
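One simple way to re-rank with the contextual signals named above is a weighted sum of retweets and favorites. The scoring formula, the equal default weights, and the dictionary field names are assumptions made for illustration; the slides do not specify how the signals are combined, and the SIFT-based image comparison is not shown.

```python
def rerank(images, retweet_weight=1.0, favorite_weight=1.0):
    """Sort image-tweets by a weighted engagement score, highest first.
    The linear score and its weights are illustrative assumptions."""
    def score(img):
        return retweet_weight * img["retweets"] + favorite_weight * img["favorites"]
    return sorted(images, key=score, reverse=True)


# Toy search results (assumed data).
results = [
    {"id": "a", "retweets": 2, "favorites": 1},
    {"id": "b", "retweets": 10, "favorites": 0},
    {"id": "c", "retweets": 0, "favorites": 5},
]
print([img["id"] for img in rerank(results)])  # ['b', 'c', 'a']
```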

Initial Results



QUESTIONS?



Thank You
