Analysis of Big Data and other sources
circabc.europa.eu
Outline
1. Introduction to big data
2. A survey on tools
3. Data storage in depth
4. Data processing
5. Practice with R:
a. Word count with Spark
b. Graph analysis with Neo4J
Introduction to Big Data
There are different working areas in big data:
● Data storage
● Data processing
● Data mining
● Data visualisation
● Business Intelligence Systems
A Survey on Tools- Data storage
● DOCUMENTS: MongoDB, CouchDB, Riak
● KEY/VALUE: Riak, Voldemort, Redis, Memcached, Membase, DynamoDB
● COLUMNS: Google Bigtable, HBase, Cassandra, Sybase IQ, Hypertable
● GRAPHS: FlockDB, OrientDB, AllegroGraph, Neo4J
A Survey on Tools- Data processing
BATCH
● Acquisition: HDFS commands, Sqoop, Flume
● Storage: HDFS, HBase
● Analysis: MapReduce, Spark, Spark SQL, Hive, Pig, Cascading
STREAMING
● Acquisition: Flume
● Storage: Kafka, Kestrel, RabbitMQ, AWS SQS
● Analysis: Storm, Trident, Spark Streaming, Samza
HYBRID
● Lambda, Kappa, Summingbird, Lambdoop, Apache Flink
A Survey on Tools- Data mining
SPSS, Weka, Rapid Miner, Mahout
Gate, NLTK, KMine, OpenNN
Scikit-learn, Carrot2, R, Torch
RapidMiner, IBM Watson, SAS Enterprise Miner
Statistica Data Miner, Oracle Data Miner, Microsoft Analysis Services
LIONSolver, ClaraBridge
The slide groups these tools under two labels: OPEN and PROPRIETARY.
A Survey on Tools- Data visualisation
Vis.js, D3.js
CartoDB, Plot.ly
Tableau, QlikView
R, HighCharts
A Survey on Tools- Business Intelligence
Pentaho, Actuate
SpagoBI, JasperReports
Tableau, QlikView
Palo, Tactic
IBM Cognos, MicroStrategy
Microsoft PowerBI, Plot.ly
Data Storage in Depth- SQL vs. NoSQL
SQL databases limitations:
● Fixed structure and integrity restrictions
● Inefficiency with large numbers of insertions, modifications and deletions
● High complexity when modelling real-life relationships
NoSQL databases:
● NoSQL = Not only SQL
● Store large volumes of data in short periods of time
Data Storage in Depth- NoSQL types
There are basically four types of NoSQL databases, although some products share characteristics of more than one type:
● Document oriented: The basic unit is the document (e.g. XML, JSON, …)
● Key/value: Any object is identified by a key and described by a set of attributes (values). Also known as hash stores
● Column oriented: Data are stored in tables with predefined families of columns, which favours OLAP operations
● Graph databases: Store not only objects but also the relationships among them, shaping graphs of information
Data Storage in Depth- Document oriented
● The basic unit is the document
● A document can have an arbitrary number of fields
● Each field can be of different type and size
● Each field can store multiple values
● Examples of documents are XML, JSON, or similar
● Document databases do not require a fixed document schema
● Each document can have different fields from the other documents in the database
● Security is assigned at document level
● Full-text search capabilities with high performance
● JSON document example
● Unlike in the key/value model, the id is part of the document
● Full-text search is provided over the whole document
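The points above can be sketched in a few lines of Python. This is only an illustration: the document, its field names and the naive substring search are assumptions made for the example, not the behaviour of any particular document database.

```python
import json

# Hypothetical document: every field name and value here is illustrative only.
raw = '{"_id": "u42", "name": "Alice", "tags": ["graphs", "R"], "address": {"city": "Valencia"}}'
doc = json.loads(raw)


def flatten_text(value):
    """Collect every string found anywhere in a nested document."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [s for v in value.values() for s in flatten_text(v)]
    if isinstance(value, list):
        return [s for v in value for s in flatten_text(v)]
    return []


def matches(document, term):
    """Naive full-text match: case-insensitive substring over all string values."""
    return any(term.lower() in s.lower() for s in flatten_text(document))


print(doc["_id"])                # the id is an ordinary field of the document
print(matches(doc, "valencia"))  # nested fields are searched as well
```

Real document stores use indexes rather than a scan, so the sketch shows the model, not the performance.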
Data Storage in Depth- Key/value stores
● Stores that can hold information of any kind and type
● Objects are identified by a unique key
● Objects are defined by an arbitrary set of attributes
● There is neither a fixed structure nor restrictions
● They are also known as hash stores
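A toy in-memory version of this model, assuming a plain Python dictionary as the underlying hash table. The class name and the keys are hypothetical and do not correspond to the API of any product listed earlier.

```python
class KeyValueStore:
    """Toy in-memory key/value store: opaque values addressed by a unique key."""

    def __init__(self):
        self._data = {}  # a hash table, hence the "hash" in the alternative name

    def put(self, key, value):
        # No schema is checked: any value of any shape is accepted.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1", {"name": "Alice", "age": 30})
store.put("user:2", {"name": "Bob", "likes": ["R", "Spark"]})  # different attributes
store.put("counter", 7)                                        # not even a record
```

Lookups go through the key only; the store knows nothing about what is inside each value.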
Data Storage in Depth- Column oriented
● Unlike SQL databases organised as rows, column-oriented
databases are organised around columns
● Tables are defined as families of columns
● It is easy to implement OLAP operations
○ Drill down, roll up, slice and dice, pivot
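A minimal sketch of why a column layout suits OLAP-style operations, assuming a hypothetical sales table held as one Python list per column. The table, the column names and the helper functions are invented for the example.

```python
from collections import defaultdict

# Hypothetical sales table stored column by column instead of row by row.
columns = {
    "region":  ["north", "north", "south", "south"],
    "product": ["a", "b", "a", "b"],
    "amount":  [10, 20, 30, 40],
}


def roll_up(cols, by, measure):
    """Aggregate one measure column grouped by one dimension column."""
    totals = defaultdict(int)
    for key, value in zip(cols[by], cols[measure]):
        totals[key] += value
    return dict(totals)


def slice_rows(cols, column, wanted):
    """Keep only the positions where one dimension has the wanted value."""
    keep = [i for i, v in enumerate(cols[column]) if v == wanted]
    return {name: [values[i] for i in keep] for name, values in cols.items()}


print(roll_up(columns, "region", "amount"))
print(slice_rows(columns, "product", "a"))
```

The roll-up reads only the two columns it needs, which is the property the column-oriented layout is meant to exploit.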
Data Storage in Depth- Graph databases
Relational databases lack relationships
Figure: Bob’s friends; Alice’s friends-of-friends; what about big data?
NoSQL databases also lack relationships
Relationships can be emulated with aggregated fields, but:
- They must be maintained (updated and deleted) programmatically.
- Aggregated links are not reflexive: there is no pointer backward (e.g. to find out who bought a product).
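Both drawbacks can be seen in a small Python sketch. The customer and product records are hypothetical, and plain dictionaries stand in for an aggregate-oriented store.

```python
# Hypothetical aggregates: each customer embeds the ids of the products bought.
customers = {
    "alice": {"bought": ["p1", "p2"]},
    "bob":   {"bought": ["p2"]},
}
products = {"p1": {"name": "book"}, "p2": {"name": "pen"}}


def products_of(customer_id):
    """Forward direction: a direct lookup inside one aggregate."""
    return customers[customer_id]["bought"]


def buyers_of(product_id):
    """Backward direction: no pointer exists, so every customer is scanned."""
    return sorted(c for c, d in customers.items() if product_id in d["bought"])


def delete_product(product_id):
    """The links must be cleaned up by application code, not by the store."""
    products.pop(product_id, None)
    for data in customers.values():
        data["bought"] = [p for p in data["bought"] if p != product_id]
```

Forgetting the loop in `delete_product` would leave dangling ids in the customer aggregates, which is the maintenance burden mentioned above.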
A graph is a collection of vertices representing entities and
edges representing the relationships among them.
In a property graph both nodes and relationships can have
properties.
A graph data model means that data are modelled as a graph.
A (property) graph database is an online database management
system with Create, Read, Update and Delete methods that
expose a (property) graph data model.
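These definitions can be sketched as a tiny in-memory property graph with Create, Read, Update and Delete methods. The class and method names are assumptions for illustration, not the API of Neo4J or of any other product; the node names are sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def create_node(self, node_id, **props):
        self.nodes[node_id] = Node(node_id, dict(props))
        return self.nodes[node_id]

    def create_relationship(self, start, end, rel_type, **props):
        rel = Relationship(start, end, rel_type, dict(props))
        self.relationships.append(rel)
        return rel

    def read_node(self, node_id):
        return self.nodes.get(node_id)

    def update_node(self, node_id, **props):
        self.nodes[node_id].properties.update(props)

    def delete_node(self, node_id):
        # Relationships touching the node are removed with it.
        self.nodes.pop(node_id, None)
        self.relationships = [
            r for r in self.relationships if node_id not in (r.start, r.end)
        ]


g = PropertyGraph()
g.create_node(1, name="Harry")
g.create_node(2, name="Sally")
g.create_relationship(1, 2, "FOLLOWS", since=2015)
```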
Property graph (figure): a node with a property whose value is “Harry”, and a relationship with a property whose value is “Follows”.
Cypher is an expressive graph database query language.
Cypher is designed to be easily read and understood by
developers, database professionals and business stakeholders.
The key feature of Cypher is that it finds data matching a
specific pattern, following our intuition of describing graphs with
diagrams.
Annotated Cypher pattern (figure): nodes, relationship type and direction, and separation among subgraphs.
The simplest query:
- a START clause followed by a MATCH clause and a RETURN clause
- START: specifies the starting point(s) in the graph (e.g.
nodes or relationships)
- MATCH: specifies the pattern by example, using
characters to represent nodes and relationships, in order to
draw the data we are interested in.
- RETURN: defines the nodes, relationships and/or attributes
that should be returned.
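What the three clauses do can be emulated in plain Python over a small in-memory graph. This is a sketch of the idea, not of how a Cypher engine works. In Cypher of the style described here, the query would have roughly the shape `START a=node:user(name='Harry') MATCH (a)-[:FOLLOWS]->(b)-[:FOLLOWS]->(c) RETURN c`, where the index name `user` and the sample data are assumptions.

```python
# Hypothetical graph stored as (source, relationship type, target) triples.
edges = [
    ("Harry", "FOLLOWS", "Sally"),
    ("Sally", "FOLLOWS", "Tom"),
    ("Harry", "FOLLOWS", "Tom"),
    ("Tom", "BLOCKS", "Harry"),
]


def follows_of_follows(start):
    """Emulate START (pick a node), MATCH (two FOLLOWS hops), RETURN (end nodes)."""
    # START: the starting point is the node named by `start`.
    # MATCH, first hop: (a)-[:FOLLOWS]->(b)
    first_hop = [t for s, r, t in edges if s == start and r == "FOLLOWS"]
    # MATCH, second hop: (b)-[:FOLLOWS]->(c)
    second_hop = {t for s, r, t in edges if s in first_hop and r == "FOLLOWS"}
    # RETURN: only the nodes bound to c, without duplicates.
    return sorted(second_hop)


print(follows_of_follows("Harry"))
```

The relationship type and direction both matter: the BLOCKS edge is ignored, and edges are followed only from source to target.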
OTHER CYPHER CLAUSES
- WHERE: provides criteria for filtering.
- CREATE (UNIQUE): for the creation of nodes and relationships.
- DELETE: removes nodes, relationships and properties.
- SET: sets property values on nodes and relationships.
- FOREACH: performs an updating action for each element of a
list.
- UNION: merges results from different queries.
- WITH: pipes results from one query part to the next.
Data Processing- Types
BATCH: volume. STREAMING: velocity. HYBRID: both.
● Batch processing for large volumes of information (e.g. DNA
sequencing)
● Streaming processing for rapidly generated data (e.g. Twitter)
● Hybrid processing for large volumes of rapidly generated data (e.g. in-depth
analysis of tweets)
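The three processing types can be contrasted with a toy hashtag counter. The messages are invented, and the hybrid step assumes a Lambda-style merge of a precomputed batch view with a real-time view, which is one possible design rather than the only one.

```python
from collections import Counter


def hashtags(message):
    """Extract the hashtags of one message (naive whitespace tokenisation)."""
    return [token for token in message.split() if token.startswith("#")]


def batch_count(all_messages):
    """Batch: the whole stored data set is processed in one pass."""
    return Counter(tag for msg in all_messages for tag in hashtags(msg))


class StreamingCounter:
    """Streaming: the state is updated as each message arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_message(self, message):
        self.counts.update(hashtags(message))
        return self.counts


def hybrid_view(batch_view, realtime_view):
    """Hybrid (assumed Lambda-style): merge the batch and real-time views."""
    return batch_view + realtime_view


history = ["#bigdata rocks", "learning #spark and #bigdata"]  # hypothetical archive
live = ["#neo4j meets #bigdata"]                              # hypothetical live feed

batch_view = batch_count(history)
stream = StreamingCounter()
for msg in live:
    stream.on_message(msg)
merged = hybrid_view(batch_view, stream.counts)
```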
Data Processing- Processing steps
DATA ACQUISITION
DATA STORAGE
DATA ANALYSIS
Data Processing- Types
In-depth analysis of a Twitter stream
Video: https://www.youtube.com/watch?v=YrqMEn-5Pi8
- Retrieve and store
- Evolution
- Words and topics
- Labelling
- Hashtags
- People
- Locations
- Brands
- Polarity, stance
- Users, relationships
- Gender, age
- Author profile
- ...
Time scales: tweets/second, tweets/minute, tweets/hour, tweets/day
Data Processing- Batch processing
Map/Reduce paradigm:
● Map: The Map process divides the data into subsets and sends them to the
processing nodes in key-value format <K, V>
● Reduce: Each node returns its result in key-list-of-values format <K, L(V)>,
and the results are combined to produce the final result
Example of counting words in a text:
● Map: A line of text is sent to each node, where the key K is the line number
and the value V is the line of text <nline, text>. The result of the task is a list
of pairs <word, 1>, one for each word in the text.
● Reduce: It collects all the outputs of the Map processes as pairs <key, value>,
here <word, 1>, and groups them into pairs <word, occurrences>
by adding up the ones for each word
function Map (key, value) {
    // key: line number, value: line of text
    for each word w in value {
        emit (w, 1)
    }
}
function Reduce (word, list_of_values) {
    total = 0
    for each value v in list_of_values {
        total += v
    }
    return (word, total)
}
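The Map and Reduce functions above can be made concrete in a single-process Python sketch. There is no cluster here: the `shuffle` helper stands in for the grouping step a real framework performs between the two phases, and all function names are chosen for the example.

```python
from collections import defaultdict


def map_line(line_number, text):
    """Map: emit a (word, 1) pair for every word of one line."""
    return [(word, 1) for word in text.lower().split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_word(word, counts):
    """Reduce: add up the ones collected for one word."""
    return (word, sum(counts))


def word_count(lines):
    pairs = [p for n, line in enumerate(lines) for p in map_line(n, line)]
    return dict(reduce_word(word, counts) for word, counts in shuffle(pairs).items())


counts = word_count(["big data", "big graphs", "data data"])
print(counts)
```

In a distributed run, each call to `map_line` and each call to `reduce_word` could execute on a different node, since neither depends on shared state.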
Batch pipeline stages: ACQUISITION, STORAGE, PROCESSING
Data Processing- Stream processing
© autoritas Cosmos-intelligence
Stream pipeline stages: ACQUISITION, STORAGE, PROCESSING (tools shown: Kestrel, Trident)
Data Processing- Hybrid processing
SUMMINGBIRD