buzz word cloud
Word clouds
... Have been around for a while
... Visualise a word frequency table by varying font size
Improvements
Use posts from blogs/fora and treat each post independently
Remove posts which replicate content (similarity > 50%)
Pick up collocations (frequent bigrams)
Remove links
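A minimal sketch of two of the improvements above, link removal and collocation detection, using only the Python standard library. The link pattern, the tokenizer, the `min_count` cut-off and the sample posts are all assumptions made for illustration; the talk does not say how links or bigrams were actually detected.

```python
import re
from collections import Counter

# Assumed pattern: anything starting with http://, https:// or www. up to the next whitespace.
LINK_RE = re.compile(r"(?:https?://|www\.)\S+")

def strip_links(text):
    """Replace link-like substrings in a post with a space."""
    return LINK_RE.sub(" ", text)

def tokenize(text):
    """Drop links, lower-case the text, and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", strip_links(text).lower())

def frequent_bigrams(posts, min_count=2):
    """Count adjacent word pairs within each post; keep pairs seen at least min_count times."""
    counts = Counter()
    for post in posts:
        words = tokenize(post)
        counts.update(zip(words, words[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Made-up sample posts, each treated independently.
posts = [
    "Social media is everywhere http://example.com/a",
    "I read about social media today",
    "Media studies are different",
]
print(frequent_bigrams(posts))
```

Counting pairs per post, rather than over one concatenated text, keeps a bigram from being formed across the boundary between two posts.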
Process
Process in Python
Import
Create many lists of strings, make lower case
From list of strings create list of lists
Remove punctuation and links
For each list create word frequency
Deduplicate similar lists (score is >50%)
Generate one list
Find collocations, assign ~
Remove top 500 most frequent words
Create frequency table
Remove top 500 most frequent words
Output top 150 from frequency table
Create word cloud in Wordle http://www.wordle.net/advanced
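The later steps of the process (joining collocations with ~, removing very frequent words, building the frequency table, outputting the top terms) could look roughly like the sketch below. The function names, the `common_words` set and the sample data are hypothetical, and the `term:count` output lines are only an assumed input format; check what the Wordle page actually expects.

```python
from collections import Counter

def join_collocations(words, collocations):
    """Merge adjacent words that form a known collocation into one token joined by '~'."""
    merged, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in collocations:
            merged.append(words[i] + "~" + words[i + 1])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return merged

def top_terms(posts_words, collocations, common_words, top_n=150):
    """Build one frequency table over all posts, drop common words, return the top_n terms."""
    counts = Counter()
    for words in posts_words:
        counts.update(join_collocations(words, collocations))
    for word in common_words:
        counts.pop(word, None)
    return counts.most_common(top_n)

# Made-up data: already tokenized, lower-cased posts.
posts_words = [
    ["social", "media", "is", "big"],
    ["the", "social", "media", "buzz"],
    ["buzz", "is", "the", "word"],
]
collocations = {("social", "media")}   # e.g. the frequent bigrams found earlier
common_words = {"is", "the"}           # stand-in for the 500 most frequent words

for term, count in top_terms(posts_words, collocations, common_words, top_n=2):
    print(f"{term}:{count}")
```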
Deduplication algorithm
For each post's word frequency table
Compare it to the word frequency tables already processed
Total = 0, Unique = 0
For each word in word frequency 1: Unique + 1
If the word is also in word frequency 2: Total + min(frequencies) / max(frequencies)
For each word in word frequency 2 that is not in word frequency 1: Unique + 1
Score = Total / Unique
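The scoring above can be sketched in Python as follows. Word frequency tables are assumed to be dicts mapping a word to a positive count. The `deduplicate` helper is an assumption about how the score is applied: it compares each post only against the posts kept so far and drops it when the score exceeds 50%.

```python
def similarity(freq1, freq2):
    """Score two word frequency tables: Total / Unique, between 0 and 1."""
    total = 0.0
    unique = 0
    for word, n1 in freq1.items():
        unique += 1
        if word in freq2:
            n2 = freq2[word]
            # Shared words contribute the ratio of the smaller to the larger count.
            total += min(n1, n2) / max(n1, n2)
    for word in freq2:
        if word not in freq1:
            unique += 1
    return total / unique if unique else 0.0

def deduplicate(freq_tables, threshold=0.5):
    """Keep a table only if it scores at or below the threshold against every kept table."""
    kept = []
    for freq in freq_tables:
        if all(similarity(freq, seen) <= threshold for seen in kept):
            kept.append(freq)
    return kept

# Made-up frequency tables for two posts.
a = {"social": 2, "media": 2, "buzz": 1}
b = {"social": 1, "media": 2, "cloud": 1}
print(similarity(a, b))            # Total = 0.5 + 1.0, Unique = 4
print(len(deduplicate([a, a, b])))
```

Identical tables score 1, and tables with no words in common score 0, so the 50% cut-off sits midway between the two extremes.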
Social media example
old
new
• Social media is a collocation
• Needs to be developed further
Thanks
Where I work: Targetbase Claydon Heeley