monitoring regional alcohol consumption through social media

Download Monitoring Regional Alcohol Consumption through Social Media

Post on 07-May-2015

510 views

Category:

Social Media

0 download

Embed Size (px)

TRANSCRIPT

  • 1.Monitoring Regional Alcohol Consumption through Social Media Daniel Kershaw HighWire DTC@danjamker

2. People like to drink 3. Alcohol Consumption form 1950ssource: scotland.gov.uk 4. Varying rates of harmsource: BBC News 5. Collecting Current Data Quantity Frequency Questionnaires (QF)Time Line Method (TL)Both talk a while to performExpensiveData is only a snapshot of the past Miller, E. T., Neal, D. J., Roberts, L. J., Baer, J. S., Cressler, S. O., Metrik, J., & Marlatt, G. A. (2002) Babor, T. F., Stephens, R. S., & Marlatt, G. A. (1987) 6. Data Collection Errors Selective reportingRecall biasAccidental under-estimationStudy level under estimation by up to 40%Boniface, S., & Shelton, N. (2013) Del Boca, F. K., & Darkes, J. (2003) 7. People like to Tweet 8. Growth of Twittersource: forbes.com 9. Twitter users by countysource: indoboom.com 10. Reasons people use Twitter Minimal EffortMobile and pervasivePeople-based RSS feedsBroadcast Nature of TwitterKeeping in touch with friends and family / Raising visibilityGathering information / Seeking help / Releasing emotional stress 11. Twitter as a Spatio-Temporal Sense Network 12. Do people Tweet when they are drinking Alcohol? 13. Do people Tweet that they are drinking Alcohol? 14. Is it possible to track drinking habits on Twitter? 15. Research Question Is it possible to characterise and model UK alcohol consumption patterns of alcohol on social media data such as Twitter, and if so is there a variation across geographical location in drinking patterns and terminology usage? 16. Previous Work Monitoring Flu spreading though twitter - Culotta, A. (2010)Social media to track depression on a global scale - De Choudhury, M., Counts, S., & Horvitz, E. (2013)Stock market prediction through sentiment analysis - Bollen, Mao, Zeng. (2011)Detecting earthquakes through peoples tweets - Sakaki, T., Okazaki, M., & Matsuo, Y. (2010) 17. Finding the Grounded Truth Health and Social Care Information Centre (HSCIC)Statistics on Alcohol ReportLooking for daily granularity 18. Grounded Truth Combined16-2425-4445-6465 or over% of respondent40 30 20 10 0 SundayMondayTuesdayWednesdayThursdayFridaySaturdayHealth and Social Care Information Centre, L. S. (2012, May 31) 19. I need Tweets 20. A lot of Tweets 21. Twitter Streaming API 22. Twitter Streaming API 23rd November 2013 - 5th January 2013Bounding Box applied limiting resultsOnly GeoTagged to the UKJSON Object for each returned 23. The Numbers 31.6 million Tweets over 6 week period700,000 tweets/daily500 tweets/minute8 tweets/second40Gb of Data to process 24. Method Each tweet given an Alcohol ScoreBased on the number of keywords per tweetKey terms chosen to indicate alcohol consumptionClosed language method 25. Key-terms drunk pissed vodkawine wasted hungover hangover 26. The Maths 27. f (w, mj ) =1, if mj = w 0, otherwise 28. Groupings NationalRegonalPost Code RegonLocal Post CodeDailyHourly31.6 million tweets becomes 252.8 million data points 320 Gb to process 29. Geographical Groupings Each Tweet has Longitude and LatitudeShortest distance to the nearest centre of postcodePostcode Districts are from the initial characters of the alphanumeric UK postcode.Regions are groups of postcode regions 30. UK North West LA LA1 31. Processing Data Data to big to process on one machineTask can be spread across a number of machinesCan be implemented in a map reduce style 32. Map Reduce Explained Challenge: how many tweets per user, given tweets table?Input: key=row, value=tweet infoMap: output key=user_id, value=1Shufe: sort by user_idReduce: for each user_id, sumOutput: user_id, tweet countWith 2x machines, runs close to 2x faster. 33. Map Reduce Explained Challenge: how many tweets per user, given tweets table?Input: key=row, value=tweet infoMap: output key=user_id, value=1Shufe: sort by user_idReduce: for each user_id, sumOutput: user_id, tweet countWith 2x machines, runs close to 2x faster. 34. Map Reduce Explained Challenge: how many tweets per user, given tweets table?Input: key=row, value=tweet infoMap: output key=user_id, value=1Shufe: sort by user_idReduce: for each user_id, sumOutput: user_id, tweet countWith 2x machines, runs close to 2x faster. 35. Map Reduce Explained Challenge: how many tweets per user, given tweets table?Input: key=row, value=tweet infoMap: output key=user_id, value=1Shufe: sort by user_idReduce: for each user_id, sumOutput: user_id, tweet countWith 2x machines, runs close to 2x faster. 36. Map Reduce Explained Challenge: how many tweets per user, given tweets table?Input: key=row, value=tweet infoMap: output key=user_id, value=1Shufe: sort by user_idReduce: for each user_id, sumOutput: user_id, tweet countWith 2x machines, runs close to 2x faster. 37. Map Reduce Explained Challenge: how many tweets per user, given tweets table?Input: key=row, value=tweet infoMap: output key=user_id, value=1Shufe: sort by user_idReduce: for each user_id, sumOutput: user_id, tweet countWith 2x machines, runs close to 2x faster. 38. Map Reduce Explained Challenge: how many tweets per user, given tweets table?Input: key=row, value=tweet infoMap: output key=user_id, value=1Shufe: sort by user_idReduce: for each user_id, sumOutput: user_id, tweet countWith 2x machines, runs close to 2x faster. 39. Map Reduce Input: key=null, value=tweet infoMap: output key=location+time, value=alcohol twitter scoreShufe: sort by location+timeReduce: for each location+time, average(alcohol twitter score), collocation for each each setOutput: location+time, average(alcohol twitter score), collocation 40. A day of processing later 41. Lets see the data Ill drink to that 42. National UK North West Yorkshire & Humberside Greater London South West South East Northern Ireland West Midlands Channel Islands Home Counties Scotland (North) East England Scotland (South & Central) Wales (South) Wales (North) East Midlands North East AvrageWeek 1 0.93 *** 0.92 *** 0.93 *** 0.86 ** 0.94 *** 0.91 *** 0.91 *** 0.88 *** 0.00 * 0.91 *** 0.93 *** 0.94 *** 0.89 *** 0.97 *** 0.96 *** 0.90 *** 0.94 *** 0.86Week 2 0.96 *** 0.97 *** 0.96 *** 0.93 *** 0.94 *** 0.96 *** 0.89 *** 0.96 *** 0.00 * 0.95 *** 0.96 *** 0.97 *** 0.97 *** 0.90 *** 0.98 *** 0.90 *** 0.91 *** 0.88Week 3 0.86 ** 0.84 ** 0.79 *** 0.80 ** 0.81 *** 0.87 *** 0.80 ** 0.84 ** -0.30* 0.90 *** 0.88 *** 0.85 *** 0.93 *** 0.89 *** 0.93 *** 0.69 ** 0.79 ** 0.76Week 4 0.74 ** 0.76 ** 0.71 ** 0.67 ** 0.66 * 0.58 ** 0.57 ** 0.59 * -0.35 * 0.78 ** 0.95 *** 0.72 ** 0.88 *** 0.78 ** 0.76 ** 0.69 ** 0.81 ** 0.66Week 5 -0.27 * -0.22 * -0.41 * -0.27 * -0.33 * -0.29 -0.12 * -0.26 * -0.37 * -0.24 * -0.08 * -0.20 * -0.16 * -0.27 * -0.33 * -0.19 * -0.27 * -0.23Week 6 0.05 * 0.11 * 0.00 * 0.06 * 0.05 * 0.06 * 0.17 * 0.09 * -0.22 * 0.06 * -0.06 * 0.052 * -0.08 * -0.04 * 0.19 * 0.09 * 0.04 * 0.03 43. North WestYorkshir Greater e& London Humbers ideSouth WestSouth EastWest Channel Home East Scotland Northern Scotland Midlands Islands Counties England (South & Ireland (North) Central)Wales (South)Wales (North)East MidlandsNorth West1Yorkshire & Humberside0.961Greater London0.920.941South West0.940.960.941South East0.920.950.950.971Northern Ireland0.860.880.830.910.891West Midlands0.960.960.930.960.950.881Channel Islands0.010.020.020000.011Home Counties0.930.960.950.970.970.90.960.011Scotland (North)0.850.890.860.910.920.90.8800.911East England0.930.960.940.970.980.890.96-0.010.970.911Scotland (South & Central)0.860.90.870.930.930.930.890.010.920.950.911Wales (South)0.910.920.880.910.910.90.8900.910.90.90.91Wales (North)0.90.890.870.90.880.830.89-0.030.880.840.880.850.891East Midlands0.940.960.940.970.970.880.97-0.010.960.90.970.90.910.891North East0.930.930.890.930.910.890.9100.910.90.920.90.940.90.92North East1 44. CollocationsThursdayFridayYorkshire and HumberSaturday 45. Thursday, May 23rd 2013, 6:00:00 am 46. Friday, May 24th 2013, 8:00:00 pm 47. Some new tables 48. RegionDrank last weekAvrage SMAINorth East630.000715033North West630.00066108Yorkshire and the Humber670.000679924East Midlands650.000674305West Midlands600.000598496East of England660.000636277London510.000482961South East670.000626092South West660.000688288Wales570.000645785Scotland560.00059233 49. 0.001 R = 0.5936Avrage SMAI0.001000 0122335 % People Who Drank last week475870 50. Limitations Sample not representative of general population 70% of Americans online but only 18% report using twitterUsers are youngerSome ethnic groups are under representedSmall number of key terms measured Brenner, J. (2013, December 31) 51. Further Work Expanded keywords listOpen Langauge methodApply machine learning techniquesMake it real timeExplore using Twitter Store vs Apache Hadoop - http://stormproject.net/ 52. References Culotta, A. (2010). Towards detecting inuenza epidemics by analyzing Twitter messages. Presented at the SOMA '10: Proceedings of the First Workshop on Social Media Analytics, ACM Request Permissions. doi:10.1145/1964858.1964874 !Bollen, Mao, Zeng. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 88. doi:10.1016/ j.jocs.2010.12.007 !Sakaki, T., Okazaki, M., & Matsuo, Y. (2010). Earthquake shakes Twitter users: real-time event detection by social sensors. Presented at the WWW '10: Proceedings of the 19th international conference on World wide web, ACM. doi: 10.1145/1772690.1772777 !Health and Social Care Information Centre, L. S. (2012, May 31). Statistics on Alcohol: England, 2012. Catalogue.Ic.Nhs.Uk. Retrieved May 19, 2013, from https://catalogue.ic.nhs.uk/publications/public-health/alcohol/alco-eng-2012/alco-eng-2012rep.pdf !De Choudhury, M., Counts, S., & Horvitz, E. (2013). Social media as a measurement tool of depression in populations (pp. 47 56). Presented at the WebSci '13: Proceedings of the 5th Annual ACM Web Science Conference, New York, New York, USA: ACM Request Permissions. doi:10.1145/2464464.2464480 !Strunin, L. (2001). Assess

Recommended

View more >