![Page 1: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/1.jpg)
R Statistics with MongoDB
Dr. Markus Schmidberger
October 14th, 2013, Munich, Germany
Email: Twitter: @cloudHPC
R Statistics with MongoDB
![Page 2: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/2.jpg)
Dr. Markus Schmidberger
![Page 3: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/3.jpg)
Outline
Introduction to Big Data, MongoSoup and R
R statistics with MongoDB and Examples
Summary & Questions
![Page 4: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/4.jpg)
Big Data
Wikipedia: … a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing. …
storing
processing
![Page 5: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/5.jpg)
Storing: NoSQL - MongoDB
databases using looser consistency models to store data
German MongoDB as a Service: MongoSoup
cloudControl Add-On
currently running on AWS EU-Region (Ireland)
all features available: shared / dedicated hosting, replicaset, sharding
24/7 support available
![Page 6: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/6.jpg)
MongoSoup in < 5 min
go to cloudControl: www.cloudcontrol.com
add an account and a billing address
create a new app, e.g. “rmongodb”
install the cloudControl command line tools: cctrlapp
enable your preferred MongoSoup hosting: cctrlapp rmongodb/default addon.add mongosoup.medium
go to the cloudControl web console (Add-ons) and get your credentials: https://www.cloudcontrol.com/console/app/rmongodb
![Page 7: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/7.jpg)
Processing: Analyzing with R and Hadoop
backward-looking analysis is outdated
today: quasi real-time analysis
tomorrow: forward-looking predictive analysis
more complex methods, more data available, more processing time required
Check my Strata London Tutorial “Big Data Analyses with R”
![Page 8: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/8.jpg)
Introduction to R
R is a free software environment for statistical computing and graphics
offers tools to manage and analyze data
standard statistical methods are implemented
compiles and runs under different OS
support via huge community
www.r-project.org
![Page 9: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/9.jpg)
huge online library with > 5000 R packages: http://cran.r-project.org
possibility to write personalized code and to contribute new packages
really famous since January 6, 2009: The New York Times, “Data Analysts Captivated by R's Power”
![Page 10: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/10.jpg)
RStudio IDE
http://www.rstudio.com
![Page 11: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/11.jpg)
R as calculator
(5+5) - 1 * 3
[1] 7
x <- 3
x
[1] 3
x^2 + 4
[1] 13
![Page 12: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/12.jpg)
y <- c(1,2,3)
y
[1] 1 2 3
x <- 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10
x < 5
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
![Page 13: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/13.jpg)
x[3:7]
[1] 3 4 5 6 7
mean(x)
[1] 5.5
help("mean")
?mean
![Page 14: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/14.jpg)
![Page 15: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/15.jpg)
Many Statistical Functions
kmeans(dat, 4)
K-means clustering with 4 clusters of sizes 21, 18, 30, 31
Cluster means:
     [,1]    [,2]
1  0.7755  0.8509
2 -0.1557 -0.2305
3  1.2299  1.1472
4  0.1510  0.1507
Clustering vector:
  [1] 4 2 4 4 2 4 4 4 2 4 4 4 2 2 4 4 1 4 2 2 2 4 4 4 2 4 2 4 4 2 4 2 2 4 4
 [36] 4 4 4 4 4 4 4 4 2 4 2 2 4 2 2 1 1 1 1 3 1 3 3 3 1 1 3 3 3 3 1 3 1 3 3
 [71] 1 3 1 1 3 3 3 3 1 1 3 3 1 1 1 3 3 3 3 1 3 1 3 3 3 3 1 3 3 3
Within cluster sum of squares by cluster:
[1] 3.318 1.166 4.019 3.195
 (between_SS / total_SS = 83.0 %)
Available components:
[1] "cluster"      "centers"      "totss"        "withinss"
[5] "tot.withinss" "betweenss"    "size"
![Page 16: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/16.jpg)
plot(dat, col = cl$cluster, cex=2, pch=16)
points(cl$centers, col = 1:4, pch = 13, cex = 4)
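The plot calls use dat and cl, which the transcript never defines. A minimal sketch of how they could be produced, assuming synthetic two-group data in the spirit of the ?kmeans help example; the seed, the sd value and nstart are choices made here for illustration, not taken from the talk.

```r
# Assumption: synthetic data, two noisy groups around (0, 0) and (1, 1)
set.seed(42)
dat <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2)
)

# Keep the fitted model in cl so the cluster labels and centers can be reused
cl <- kmeans(dat, centers = 4, nstart = 10)

# Points coloured by cluster, cluster centers drawn on top
plot(dat, col = cl$cluster, cex = 2, pch = 16)
points(cl$centers, col = 1:4, pch = 13, cex = 4)
```

Storing the result in cl, rather than only printing it, is what makes cl$cluster and cl$centers available to the plot.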
![Page 17: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/17.jpg)
R Shiny - easy web application
developed by RStudio
turns R analyses into interactive web applications that anyone can use
let your users choose input parameters using friendly controls like sliders, drop-downs, and text fields
easily incorporate any number of outputs like plots, tables, and summaries
no HTML or JavaScript knowledge is necessary, only R
http://www.rstudio.com/shiny/
![Page 18: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/18.jpg)
R and Databases
SQL provides a standard language to filter, aggregate, group, sort data
SQL in new places: Hive, Impala, …
ODBC provides SQL interface to non-database data (Excel, CSV, text files)
R stores relational data in data.frames (extended lists)
![Page 19: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/19.jpg)
data(iris)
head(iris, n=3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
class(iris)
[1] "data.frame"
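The description of data frames as "extended lists" can be checked directly in base R. A short sketch using only the built-in iris data; col_classes is a name chosen here.

```r
data(iris)

# A data.frame is a list whose elements are equally long columns
is.list(iris)
length(iris)                 # number of columns, not rows

# Each column keeps its own type, unlike a matrix
col_classes <- sapply(iris, class)
col_classes

# List-style and table-style access reach the same values
iris[["Species"]][1:3]
iris[1:3, "Species"]
```

This is why a query result with mixed column types (numbers, strings, factors) fits naturally into a data frame.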
![Page 20: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/20.jpg)
R package: sqldf
running SQL statements on R data frames
library(sqldf)
sqldf("select * from iris limit 2")
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
sqldf("select count(*) from iris")
  count(*)
1      150
![Page 21: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/21.jpg)
Other relational R packages
RMySQL package provides an interface to MySQL
RPostgreSQL package provides an interface to PostgreSQL
ROracle package provides an interface to Oracle
RJDBC package provides access to databases through a JDBC interface
RSQLite package provides access to SQLite (SQLite engine is included)
One big problem: all packages read the full result into R memory
![Page 22: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/22.jpg)
R and MongoDB
on CRAN there are two packages to connect R with MongoDB
rmongodb supported by MongoDB, Inc.
powerful for big data
difficult to use due to BSON objects
RMongo
easy to use
limited functionality
reads full results in R memory
does not work on Mac OS X
![Page 23: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/23.jpg)
R package: RMongo
library(RMongo)
mongo <- mongoDbConnect("cc_JwQcDLJSYQJb", "dbs001.mongosoup.de", 27017)
dbAuthenticate(mongo, username="JwQcDLJSYQJb", password="RSXPkUkXXXXX")
dbShowCollections(mongo)
dbGetQuery(mongo, "zips", "{'state':'AL'}")
dbInsertDocument(mongo, "test_data", '{"foo": "bar", "size": 5 }')
dbDisconnect(mongo)
![Page 24: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/24.jpg)
R package: rmongodb
developed on top of the MongoDB supported C driver
library(rmongodb)
mongo <- mongo.create(host="dbs001.mongosoup.de", db="cc_JwQcDLJSYQJb",
                      username="JwQcDLJSYQJb", password="RSXPkUkXXXXX")
mongo
[1] 0
attr(,"mongo")
<pointer: 0x105a1de80>
attr(,"class")
[1] "mongo"
attr(,"host")
[1] "dbs001.mongosoup.de"
attr(,"name")
[1] ""
attr(,"username")
[1] "JwQcDLJSYQJb"
attr(,"password")
[1] "RSXPkUkxRdOX"
attr(,"db")
[1] "cc_JwQcDLJSYQJb"
attr(,"timeout")
[1] 0
![Page 25: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/25.jpg)
mongo.get.database.collections(mongo, "cc_JwQcDLJSYQJb")
[1] "cc_JwQcDLJSYQJb.zips" "cc_JwQcDLJSYQJb.ccp" "cc_JwQcDLJSYQJb.test"
mongo <- mongo.disconnect(mongo)
![Page 26: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/26.jpg)
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "state", "AL")
[1] TRUE
query <- mongo.bson.from.buffer(buf)
query
state : 2 AL
![Page 27: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/27.jpg)
res <- mongo.find.one(mongo, "cc_JwQcDLJSYQJb.zips", query)
res
city : 2 ACMAR
loc : 4
  0 : 1 -86.515570
  1 : 1 33.584132
pop : 16 6055
state : 2 AL
_id : 2 35004
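The number printed between each field name and its value is the BSON type code of that field. A small lookup sketch in base R, assuming the codes from the BSON specification (1 double, 2 string, 3 embedded document, 4 array, 16 32-bit integer); bson_types, codes and decoded are names invented here and are not part of rmongodb.

```r
# Assumption: subset of BSON type codes taken from the BSON specification
bson_types <- c(
  "1"  = "double",
  "2"  = "string",
  "3"  = "embedded document",
  "4"  = "array",
  "16" = "32-bit integer"
)

# Type codes as printed for the zips document
codes <- c(city = 2, loc = 4, pop = 16, state = 2, "_id" = 2)

# Translate each code into a readable type name
decoded <- data.frame(
  field = names(codes),
  code  = unname(codes),
  type  = unname(bson_types[as.character(codes)]),
  stringsAsFactors = FALSE
)
decoded
```

Read this way, loc is an array of two doubles, pop is an integer, and _id is stored as a string in this collection.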
![Page 28: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/28.jpg)
out <- mongo.bson.to.list(res)
out$loc
[1] -86.52 33.58
typeof(out$loc)
[1] "double"
out$pop
[1] 6055
out$state
[1] "AL"
![Page 29: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/29.jpg)
cursor <- mongo.find(mongo, "cc_JwQcDLJSYQJb.zips", query)
res <- NULL
while (mongo.cursor.next(cursor)){
  value <- mongo.cursor.value(cursor)
  Rvalue <- mongo.bson.to.list(value)
  res <- rbind(res, Rvalue)
}
err <- mongo.cursor.destroy(cursor)
head(res, n=4)
       city         loc       pop   state _id
Rvalue "ACMAR"      Numeric,2 6055  "AL"  "35004"
Rvalue "ADAMSVILLE" Numeric,2 10616 "AL"  "35005"
Rvalue "ADGER"      Numeric,2 3205  "AL"  "35006"
Rvalue "KEYSTONE"   Numeric,2 14218 "AL"  "35007"
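Calling rbind inside the loop copies the growing result on every iteration and yields a list-matrix rather than a data frame. One possible alternative, sketched in base R without a database: collect the rows in a list and bind once at the end. The docs list is a hypothetical stand-in for values already converted with mongo.bson.to.list; the array field loc is left out because it would need flattening first.

```r
# Hypothetical stand-ins for converted cursor values (loc omitted on purpose)
docs <- list(
  list(city = "ACMAR",      pop = 6055,  state = "AL"),
  list(city = "ADAMSVILLE", pop = 10616, state = "AL"),
  list(city = "ADGER",      pop = 3205,  state = "AL")
)

rows <- list()
for (doc in docs) {
  # With a real cursor, doc would be mongo.bson.to.list(mongo.cursor.value(cursor))
  rows[[length(rows) + 1]] <- as.data.frame(doc, stringsAsFactors = FALSE)
}

# Bind all single-row data frames in one step
res <- do.call(rbind, rows)
res
```

The result is an ordinary data frame with typed columns, so functions such as mean(res$pop) work without unlist.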
![Page 30: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/30.jpg)
It is all about creating BSON query or field objects
b <- mongo.bson.from.list(list(name="Fred", age=29, city="Boston"))
b
name : 2 Fred
age : 1 29.000000
city : 2 Boston
mongo.bson.to.list(b)
$name
[1] "Fred"
$age
[1] 29
$city
[1] "Boston"
![Page 31: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/31.jpg)
?mongo.bson
?mongo.bson.buffer.append
?mongo.bson.buffer.start.array
?mongo.bson.buffer.start.object
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "aggregate", "zips")
mongo.bson.buffer.start.array(buf, "pipeline")
  mongo.bson.buffer.start.object(buf, "0")  # first pipeline stage
    mongo.bson.buffer.start.object(buf, "$group")
      mongo.bson.buffer.append(buf, "_id", "$state")
      mongo.bson.buffer.start.object(buf, "totalPop")
        mongo.bson.buffer.append(buf, "$sum", "$pop")
      mongo.bson.buffer.finish.object(buf)
    mongo.bson.buffer.finish.object(buf)
  mongo.bson.buffer.finish.object(buf)
  mongo.bson.buffer.start.object(buf, "1")  # second pipeline stage
    mongo.bson.buffer.start.object(buf, "$match")
      mongo.bson.buffer.start.object(buf, "totalPop")
        mongo.bson.buffer.append(buf, "$gte", 10000)  # numeric, not the string "10000"
      mongo.bson.buffer.finish.object(buf)
    mongo.bson.buffer.finish.object(buf)
  mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)  # closes the "pipeline" array
query <- mongo.bson.from.buffer(buf)
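The buffer calls build a nested command document, which is hard to read as a flat sequence. The intended shape, following MongoDB's aggregate command with a $group stage and a $match stage, can be sketched as a plain R list. This is base R only and shows structure; pipeline and cmd are names chosen here, and whether a given rmongodb version converts such a list to BSON correctly is not verified.

```r
# Each pipeline stage is its own document with exactly one operator key
pipeline <- list(
  list("$group" = list(
    "_id"    = "$state",               # group by state
    totalPop = list("$sum" = "$pop")   # sum population per state
  )),
  list("$match" = list(
    totalPop = list("$gte" = 10000)    # numeric threshold
  ))
)

# The command names the collection and carries the stages as an array
cmd <- list(aggregate = "zips", pipeline = pipeline)
str(cmd)
```

Writing the structure out first makes it easier to see how many start and finish calls each nesting level needs.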
![Page 32: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/32.jpg)
CCP Web Analytics Challenge
buf <- mongo.bson.buffer.create()
query <- mongo.bson.from.buffer(buf)
buf <- mongo.bson.buffer.create()
err <- mongo.bson.buffer.append(buf, "user", 1)
err <- mongo.bson.buffer.append(buf, "type", 1)
field <- mongo.bson.from.buffer(buf)
out <- mongo.find(mongo, "cc_JwQcDLJSYQJb.ccp", query, fields=field, limit=1000)
res <- NULL
while (mongo.cursor.next(out)){
  value <- mongo.cursor.value(out)
  Rvalue <- mongo.bson.to.list(value)
  res <- rbind(res, Rvalue)
}
![Page 33: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/33.jpg)
boxplot( as.integer(table(unlist(res[,2])) ), cex=4, horizontal=TRUE, main="Number of actions per user")
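The boxplot counts how often each user appears in the result and plots the distribution of those counts. A self-contained sketch of the same idea on synthetic data, since the CCP collection is not available here; the events data frame, its size and the user labels are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical event log: one row per action, "user" says who acted
events <- data.frame(
  user = sample(paste0("u", 1:20), size = 200, replace = TRUE),
  type = sample(c("view", "click", "buy"), size = 200, replace = TRUE),
  stringsAsFactors = FALSE
)

# table() counts the rows per user, as.integer() drops the names
actions_per_user <- as.integer(table(events$user))

boxplot(actions_per_user, horizontal = TRUE,
        main = "Number of actions per user")
```

With the real query result, events$user corresponds to unlist(res[,2]).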
![Page 34: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/34.jpg)
Shiny Mongo
R based MongoDB User Interface
R packages shiny and rmongodb
less than 200 lines of code
DEMO: http://localhost:8100
https://github.com/comsysto/ShinyMongo
![Page 35: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/35.jpg)
Summary
R is a powerful statistical tool for analysing many different kinds of data
R can access databases
MongoDB and rmongodb are ready for Big Data
start playing around with R, Big Data and MongoDB

http://www.r-project.org
http://www.mongodb.org
http://www.mongosoup.de
![Page 36: R statistics with mongo db](https://reader034.vdocuments.site/reader034/viewer/2022051515/54c63f7f4a7959ba0b8b45b2/html5/thumbnails/36.jpg)
See you soon
thanks a lot for your attention
there are R trainings in December 2013 in Munich
we are hosting many events and meetups
meet you at the MongoSoup booth
Email: Twitter: @cloudHPC
http://comsysto.com/events.html#r