Meetup Analytics with R and Neo4j
![Page 1: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/1.jpg)
Exploring London NoSQL meetups using R
Mark Needham (@markhneedham)
![Page 2: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/2.jpg)
![Page 3: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/3.jpg)
![Page 4: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/4.jpg)
Scraper at the ready...
![Page 5: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/5.jpg)
Not needed :(
![Page 6: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/6.jpg)
Lots of bits of data
● Events
● Members
● Groups
● RSVPs
● Venues
● Topics
![Page 7: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/7.jpg)
The data model
![Page 8: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/8.jpg)
Interesting questions to ask...
![Page 9: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/9.jpg)
Interesting questions to ask...
● What day of the week do people go to meetups?
● Whereabouts in London are NoSQL meetups held?
● Do people sign up for multiple meetups on the same day?
● Are there common members between groups?
● What topics are people most interested in?
● In which order do people join the NoSQL groups?
● Who are the most connected people on the NoSQL scene?
![Page 10: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/10.jpg)
The tool set
● RNeo4j: sends a Cypher query to Neo4j and returns the results as a data frame
● dplyr, ggplot2, igraph, ggmap, cluster and geosphere: analyse and visualise that data frame in R
![Page 11: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/11.jpg)
When do people go to meetups?
![Page 12: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/12.jpg)
When do people go to meetups?
(g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-()
![Page 13: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/13.jpg)
When do people go to meetups?
MATCH (g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-()
WHERE (event.time + event.utc_offset) < timestamp()
RETURN g.name,
event.time + event.utc_offset AS eventTime,
event.announced_at AS announcedAt,
event.name,
COUNT(*) AS rsvps
![Page 14: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/14.jpg)
RNeo4j
install.packages("devtools")
devtools::install_github("nicolewhite/RNeo4j")
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
query = "MATCH … RETURN …"
cypher(graph, query) # returns the results as a data frame
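The month and day summaries that follow assume the events data frame has `month` and `day` columns, which the events query does not return directly. One way to derive them is sketched below under stated assumptions: `add_time_columns` is a hypothetical helper, `eventTime` is taken to be epoch milliseconds with the UTC offset already added, and Monday-first factor levels are chosen so that sorting by day runs Monday to Sunday.

```r
# Hypothetical helper (not from the original slides): add month and day columns.
# Assumes eventTime is epoch milliseconds that already include the event's UTC
# offset, so the timestamp is interpreted as UTC.
add_time_columns <- function(events) {
  ts <- as.POSIXlt(as.POSIXct(events$eventTime / 1000, origin = "1970-01-01", tz = "UTC"),
                   tz = "UTC")
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  # Lookup tables give English names regardless of the current locale.
  events$month <- factor(month.name[ts$mon + 1], levels = month.name)
  # Monday-first levels so that arrange(day) lists Monday through Sunday.
  events$day <- factor(day_names[ts$wday + 1], levels = day_names[c(2:7, 1)])
  events
}

# Usage, assuming query holds the events query:
# events = add_time_columns(cypher(graph, query))
```

Using factors rather than plain strings is what would let `arrange(day)` return weekdays in calendar order instead of alphabetical order.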
![Page 15: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/15.jpg)
Grouping events by month
library(dplyr)
events %>%
group_by(month) %>%
summarise(events = n(),
count = sum(rsvps),
max = max(rsvps)) %>%
mutate(ave = count / events) %>%
arrange(desc(ave))
![Page 16: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/16.jpg)
Grouping events by month
## month events count ave
## 1 November 55 3018 54.87273
## 2 May 52 2676 51.46154
## 3 April 58 2964 51.10345
## 4 June 47 2384 50.72340
## 5 October 71 3566 50.22535
## 6 September 59 2860 48.47458
## 7 February 43 2047 47.60465
## 8 January 34 1592 46.82353
## 9 December 24 1056 44.00000
## 10 March 39 1667 42.74359
## 11 July 48 1866 38.87500
## 12 August 34 1023 30.08824
![Page 17: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/17.jpg)
Grouping events by day
byDay = events %>%
group_by(day) %>%
summarise(events = n(),
count = sum(rsvps),
max = max(rsvps)) %>%
mutate(ave = count / events) %>%
arrange(day)
![Page 18: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/18.jpg)
Grouping events by day
## day events count ave
## 1 Monday 63 4034 64.03175
## 2 Tuesday 151 6696 44.34437
## 3 Wednesday 225 9481 42.13778
## 4 Thursday 104 5394 51.86538
## 5 Friday 11 378 34.36364
## 6 Saturday 10 736 73.60000
![Page 19: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/19.jpg)
Some simple bar charts
library(ggplot2)
library(gridExtra) # provides grid.arrange()
g1 = ggplot(aes(x = day, y = ave), data = byDay) +
geom_bar(stat="identity", fill="dark blue") +
ggtitle("Average attendees by day")
g2 = ggplot(aes(x = day, y = count), data = byDay) +
geom_bar(stat="identity", fill="dark blue") +
ggtitle("Total attendees by day")
grid.arrange(g1,g2, ncol = 1)
![Page 20: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/20.jpg)
London hits the pub
![Page 21: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/21.jpg)
![Page 22: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/22.jpg)
Where do people go to meetups?
(g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-(),
(event)-[:HELD_AT]->(venue)
![Page 23: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/23.jpg)
Where do people go to meetups?
MATCH (g:Group)-[:HOSTED_EVENT]->(event)<-[:TO]-
({response: 'yes'})<-[:RSVPD]-(), (event)-[:HELD_AT]->(venue)
WHERE (event.time + event.utc_offset) < timestamp()
RETURN g.name,
event.time + event.utc_offset AS eventTime,
event.announced_at AS announcedAt,
event.name,
venue.name AS venue,
venue.lat AS lat,
venue.lon AS lon,
COUNT(*) AS rsvps
![Page 24: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/24.jpg)
![Page 25: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/25.jpg)
Where do people go to meetups?
byVenue = events %>%
count(lat, lon, venue) %>%
ungroup() %>%
arrange(desc(n)) %>%
rename(count = n)
![Page 26: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/26.jpg)
Where do people go to meetups?
## lat lon venue count
## 1 51.50256 -0.019379 Skyline Bar at CCT Venues Plus 1
## 2 51.53373 -0.122340 The Guardian 1
## 3 51.51289 -0.067163 Erlang Solutions 3
## 4 51.49146 -0.219424 Novotel - W6 8DR 1
## 5 51.49311 -0.146531 Google HQ 1
## 6 51.52655 -0.084219 Look Mum No Hands! 22
## 7 51.51976 -0.097270 Vibrant Media, 3rd Floor 1
## 8 51.52303 -0.085178 Mind Candy HQ 2
## 9 51.51786 -0.109260 ThoughtWorks UK Office 2
## 10 51.51575 -0.097978 BT Centre 1
![Page 27: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/27.jpg)
Where do people go to meetups?
library(ggmap)
map = get_map(location = 'London', zoom = 12)
ggmap(map) +
geom_point(aes(x = lon, y = lat, size = count),
data = byVenue,
col = "red",
alpha = 0.8)
![Page 28: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/28.jpg)
![Page 29: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/29.jpg)
Spatial clustering
library(geosphere)
library(cluster)
clusteramounts = 40
# pairwise distances in metres between venues (lon/lat order)
distance.matrix = byVenue %>% select(lon, lat) %>% distm
clustersx <- as.hclust(agnes(distance.matrix, diss = TRUE))
byVenue$group <- cutree(clustersx, k = clusteramounts)
byVenueClustered = byVenue %>%
group_by(group) %>%
summarise(meanLat = mean(lat),
meanLon = mean(lon),
total = sum(count),
venues = paste(venue, collapse = ",")) %>%
arrange(desc(total)) # busiest cluster first, so group[1] is the largest
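The clustering step builds a matrix of great-circle distances (geosphere's `distm` uses the haversine formula by default) and cuts an average-linkage hierarchical tree (the default method of `agnes`) into groups. Below is a dependency-free sketch of the same idea that swaps `cluster::agnes` for base R's `hclust(method = "average")`; `haversine` and `cluster_venues` are hypothetical names, and the earth radius is geosphere's default of 6378137 m.

```r
# Great-circle (haversine) distance in metres between lon/lat points (vectorised).
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: full distance matrix, average-linkage hierarchical
# clustering, then cut the tree into k groups. Returns one group id per venue.
cluster_venues <- function(venues, k) {
  idx <- seq_len(nrow(venues))
  d <- outer(idx, idx, function(i, j)
    haversine(venues$lon[i], venues$lat[i], venues$lon[j], venues$lat[j]))
  tree <- hclust(as.dist(d), method = "average")
  cutree(tree, k = k)
}
```

With `byVenue` from earlier, `byVenue$group <- cluster_venues(byVenue, k = 40)` should play the same role as the `cutree` line above, though the group numbering and tie-breaking may not match `agnes` exactly.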
![Page 30: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/30.jpg)
Spatial clustering
## group meanLat meanLon total
## 1 3 51.52349 -0.08506461 123
## 2 1 51.52443 -0.09919280 89
## 3 2 51.50547 -0.10325925 62
## 4 4 51.50794 -0.12714600 55
## 5 8 51.51671 -0.10028908 19
## 6 6 51.53655 -0.13798514 18
## 7 7 51.52159 -0.10934720 18
## 8 5 51.51155 -0.07004417 13
## 9 12 51.51459 -0.12314650 13
## 10 14 51.52129 -0.07588867 10
![Page 31: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/31.jpg)
Spatial clustering
ggmap(map) +
geom_point(aes(x = meanLon, y = meanLat, size = total),
data = byVenueClustered,
col = "red",
alpha = 0.8)
![Page 32: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/32.jpg)
![Page 33: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/33.jpg)
What’s going on in Shoreditch?
byVenue %>%
filter(group == byVenueClustered$group[1])
![Page 34: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/34.jpg)
Meetup Group Member Overlap
● Why would we want to know this?
○ Perhaps for joint meetups
○ Topics for future meetups
![Page 35: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/35.jpg)
Extracting the data
MATCH (group1:Group), (group2:Group)
WHERE group1 <> group2
OPTIONAL MATCH p = (group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
WITH group1, group2, COLLECT(p) AS paths
RETURN group1.name, group2.name,
LENGTH(paths) as commonMembers
ORDER BY group1.name, group2.name
![Page 36: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/36.jpg)
![Page 37: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/37.jpg)
Finding overlap as a percentage
MATCH (group1:Group), (group2:Group)
WHERE group1 <> group2
OPTIONAL MATCH (group1)<-[:MEMBER_OF]-(member)
WITH group1, group2, COLLECT(member) AS group1Members
WITH group1, group2, group1Members, LENGTH(group1Members) AS numberOfGroup1Members
UNWIND group1Members AS member
OPTIONAL MATCH path = (member)-[:MEMBER_OF]->(group2)
WITH group1, group2, COLLECT(path) AS paths, numberOfGroup1Members
WITH group1, group2, LENGTH(paths) as commonMembers, numberOfGroup1Members
RETURN group1.name, group2.name,
toInt(round(100.0 * commonMembers / numberOfGroup1Members)) AS percentage
ORDER BY group1.name, group2.name
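The percentage query divides the number of shared members by the size of the first group, so the overlap is not symmetric: a small group can sit almost entirely inside a large one without the reverse being true. A base-R sketch of the same calculation, assuming a hypothetical `memberships` data frame with one row per `member`/`group` pair (for example, exported from the `MEMBER_OF` relationships):

```r
# Hypothetical helper: for every ordered pair of distinct groups, count shared
# members and express them as a percentage of group1's membership.
group_overlap <- function(memberships) {
  member <- as.character(memberships$member)
  group <- as.character(memberships$group)
  members_of <- lapply(split(member, group), unique)
  groups <- sort(names(members_of))
  pairs <- expand.grid(group1 = groups, group2 = groups, stringsAsFactors = FALSE)
  pairs <- pairs[pairs$group1 != pairs$group2, ]
  pairs$commonMembers <- mapply(
    function(a, b) length(intersect(members_of[[a]], members_of[[b]])),
    pairs$group1, pairs$group2, USE.NAMES = FALSE)
  group1_size <- unname(lengths(members_of)[pairs$group1])
  pairs$percentage <- round(100 * pairs$commonMembers / group1_size)
  pairs <- pairs[order(pairs$group1, pairs$group2), ]
  rownames(pairs) <- NULL
  pairs
}
```

One small difference to be aware of: R's `round()` rounds exact halves to the even digit, while Neo4j's `round()` rounds halves up, so a percentage landing exactly on .5 may differ by one between the two.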
![Page 38: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/38.jpg)
![Page 39: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/39.jpg)
How many groups are people part of?
MATCH (p:MeetupProfile)-[:MEMBER_OF]->()
RETURN ID(p), COUNT(*) AS groups
ORDER BY groups DESC
![Page 40: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/40.jpg)
How many groups are people part of?
ggplot(aes(x = groups, y = n),
data = group_count %>% count(groups)) +
geom_bar(stat="identity", fill="dark blue") +
scale_y_sqrt() +
scale_x_continuous(
breaks = round(seq(min(group_count$groups), max(group_count$groups), by = 1),1)) +
ggtitle("Number of groups people are members of")
![Page 41: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/41.jpg)
![Page 42: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/42.jpg)
Who’s the most connected?
● i.e. the person who had the chance to meet the most people in the community
● Betweenness Centrality
● Page Rank
![Page 43: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/43.jpg)
Who’s the most connected?
![Page 44: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/44.jpg)
Betweenness Centrality
Measures how many of the shortest paths between pairs of other nodes pass through a particular node
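Formally, the standard definition (the one igraph's `betweenness()` computes) sums, over every pair of other nodes s and t, the fraction of their shortest paths that go through v:

```latex
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

Here \(\sigma_{st}\) is the number of shortest paths from s to t and \(\sigma_{st}(v)\) is the number of those that pass through v. A person who sits on many of the shortest routes between other attendees therefore scores highly even without having the most direct connections.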
![Page 45: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/45.jpg)
Betweenness Centrality
library(igraph)
nodes_query = "MATCH (p:MeetupProfile)-[:RSVPD]->({response: 'yes'})-[:TO]->(event)
RETURN DISTINCT ID(p) AS id, p.id AS name, p.name AS fullName"
nodes = cypher(graph, nodes_query)
edges_query = "MATCH (p:MeetupProfile)-[:RSVPD]->({response: 'yes'})-[:TO]->(event),
(event)<-[:TO]-({response:'yes'})<-[:RSVPD]-(other)
RETURN ID(p) AS source, ID(other) AS target, COUNT(*) AS weight"
edges = cypher(graph, edges_query)
g = graph.data.frame(edges, directed = T, nodes)
bwGraph = betweenness(g)
bwDf = data.frame(id = names(bwGraph), score = bwGraph)
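The edges query treats two people as connected when both RSVP'd yes to the same event, and weights each ordered pair by the number of events they shared. A base-R sketch of that same co-attendance computation, assuming a hypothetical `rsvps` data frame with one row per `member`/`event` "yes" RSVP:

```r
# Hypothetical helper: ordered co-attendance pairs with weight = shared events.
co_attendance_edges <- function(rsvps) {
  rsvps <- unique(data.frame(member = as.character(rsvps$member),
                             event = as.character(rsvps$event),
                             stringsAsFactors = FALSE))
  # Self-join on event gives every ordered pair of people at the same event.
  pairs <- merge(rsvps, rsvps, by = "event", suffixes = c(".source", ".target"))
  pairs <- pairs[pairs$member.source != pairs$member.target, ]
  # Count shared events per ordered pair.
  edges <- aggregate(event ~ member.source + member.target, data = pairs, FUN = length)
  names(edges) <- c("source", "target", "weight")
  edges <- edges[order(edges$source, edges$target), ]
  rownames(edges) <- NULL
  edges
}
```

One thing worth keeping in mind when feeding such edges to igraph: `betweenness()` uses a `weight` edge attribute by default and interprets it as a distance, so a higher co-attendance count makes a path longer rather than shorter.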
![Page 46: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/46.jpg)
Betweenness Centrality
bwDf %>% arrange(desc(score)) %>% head(5)
merge(nodes, bwDf, by.x = "name", by.y = "id") %>%
arrange(desc(score)) %>%
head(5)
![Page 47: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/47.jpg)
Page Rank
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
![Page 48: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/48.jpg)
Page Rank
PageRank works by counting the number and quality of links to a person to determine a rough estimate of how important the person is. The underlying assumption is that more important people are likely to receive more links from other people.
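The standard PageRank recurrence with damping factor d (igraph's `page.rank()` defaults to 0.85) is PR(v) = (1 - d) / N + d * sum over edges u -> v of PR(u) * w(u, v) / W(u), where W(u) is the total weight of u's outgoing edges. A minimal power-iteration sketch in base R, assuming a weighted adjacency matrix as input; it illustrates the algorithm and is not igraph's implementation, so scores may differ slightly:

```r
# Minimal PageRank by power iteration (illustrative sketch, not igraph's code).
# adj[i, j] is the weight of the edge i -> j; d is the damping factor.
page_rank_power <- function(adj, d = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(adj)
  out_weight <- rowSums(adj)
  # Row-normalise so each node shares its rank in proportion to edge weight.
  P <- adj / ifelse(out_weight == 0, 1, out_weight)
  # Nodes with no outgoing edges spread their rank evenly over all nodes.
  P[out_weight == 0, ] <- 1 / n
  r <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    r_new <- (1 - d) / n + d * as.vector(r %*% P)
    if (sum(abs(r_new - r)) < tol) return(r_new)
    r <- r_new
  }
  r
}
```

Because every row of `P` sums to one, the ranks stay a probability distribution (they sum to 1) at every iteration, which is why they can be compared across people directly.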
![Page 49: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/49.jpg)
Page Rank
pr = page.rank(g)$vector
prDf = data.frame(name = names(pr), rank = pr)
data.frame(merge(nodes, prDf, by.x = "name", by.y = "name")) %>%
arrange(desc(rank)) %>%
head(10)
![Page 50: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/50.jpg)
Blending back into the graph
query = "MATCH (p:MeetupProfile {id: {id}}) SET p.betweenness = {score}"
tx = newTransaction(graph)
for(i in 1:nrow(bwDf)) {
if(i %% 1000 == 0) {
commit(tx)
print(paste("Batch", i / 1000, "committed."))
tx = newTransaction(graph)
}
id = bwDf[i, "id"]
score = bwDf[i, "score"]
appendCypher(tx, query, id = id, score = as.double(score))
}
commit(tx)
![Page 51: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/51.jpg)
Blending back into the graph
query = "MATCH (p:MeetupProfile {id: {id}}) SET p.pageRank = {score}"
tx = newTransaction(graph)
for(i in 1:nrow(prDf)) {
if(i %% 1000 == 0) {
commit(tx)
print(paste("Batch", i / 1000, "committed."))
tx = newTransaction(graph)
}
name = prDf[i, "name"]
rank = prDf[i, "rank"]
appendCypher(tx, query, id = name, score = as.double(rank))
}
commit(tx)
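The two write-back loops differ only in the query and the columns they read. A sketch of how they could share one batching routine; `batch_indices` and `write_in_batches` are hypothetical helpers, and the transaction functions are passed in so the same code could wrap RNeo4j's `newTransaction()`, `appendCypher()` and `commit()`:

```r
# Split row indices 1..n into consecutive batches of at most batch_size rows.
batch_indices <- function(n, batch_size = 1000) {
  if (n == 0) return(list())
  split(seq_len(n), ceiling(seq_len(n) / batch_size))
}

# Hypothetical helper: append every row of df to a transaction and commit once
# per batch. new_tx, append_row and commit_tx are supplied by the caller.
write_in_batches <- function(df, new_tx, append_row, commit_tx, batch_size = 1000) {
  batches <- batch_indices(nrow(df), batch_size)
  for (rows in batches) {
    tx <- new_tx()
    for (i in rows) append_row(tx, df[i, , drop = FALSE])
    commit_tx(tx)
    message("Committed batch of ", length(rows), " rows")
  }
  invisible(length(batches))
}

# Possible usage with RNeo4j (assumes graph, query and bwDf from above):
# write_in_batches(bwDf,
#   new_tx = function() newTransaction(graph),
#   append_row = function(tx, row)
#     appendCypher(tx, query, id = row$id, score = as.double(row$score)),
#   commit_tx = commit)
```

Committing every full batch of 1000 keeps each transaction bounded, as in the original loops, while the final partial batch is committed by the same code path instead of a separate trailing `commit(tx)`.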
![Page 52: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/52.jpg)
Are they in the Neo4j group?
MATCH (p:MeetupProfile)
WITH p
ORDER BY p.pageRank DESC
LIMIT 20
OPTIONAL MATCH member = (p)-[m:MEMBER_OF]->(g:Group)
WHERE g.name = "Neo4j - London User Group"
RETURN p.name, p.id, p.pageRank, NOT m is null AS isMember
ORDER BY p.pageRank DESC
![Page 53: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/53.jpg)
Are they in the Neo4j group?
blended_data = cypher(graph, query)
![Page 54: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/54.jpg)
Have they been to any events?
![Page 55: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/55.jpg)
Have they been to any events?
MATCH (p:MeetupProfile)
WITH p
ORDER BY p.pageRank DESC
LIMIT 20
OPTIONAL MATCH member = (p)-[m:MEMBER_OF]->(g:Group) WHERE g.name = "Neo4j - London User Group"
WITH p, NOT m is null AS isMember, g
OPTIONAL MATCH event= (p)-[:RSVPD]-({response:'yes'})-[:TO]->()<-[:HOSTED_EVENT]-(g)
WITH p, isMember, COLLECT(event) as events
RETURN p.name, p.id, p.pageRank, isMember, LENGTH(events) AS events
ORDER BY p.pageRank DESC
![Page 56: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/56.jpg)
Have they been to any events?
blended_data = cypher(graph, query)
![Page 57: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/57.jpg)
Take Aways
● ggplot => visualisations with minimal code
● dplyr => easy data manipulation for people from other languages
● igraph => find the influencers in a network
● graphs => flexible way of modelling data that allows querying across multiple dimensions
![Page 58: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/58.jpg)
And one final take away...
![Page 59: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/59.jpg)
![Page 60: Meetup Analytics with R and Neo4j](https://reader033.vdocuments.site/reader033/viewer/2022052509/55a520ab1a28aba8348b466f/html5/thumbnails/60.jpg)
Get the code
http://github.com/mneedham/neo4j-meetup