NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and
Comparison
A B M Moniruzzaman and Syed Akhter Hossain
04/10/23 1CSC 8710
Contents
• NoSQL databases definition
• Why NoSQL databases?
• Characteristics of NoSQL Databases
• Primary Uses of NoSQL Database
• Key-Value databases
• Document databases
• Column-Family databases
• Graph databases
• Adoption of NoSQL Database
• Conclusion
NoSQL Database
• NoSQL, short for “Not Only SQL”, refers to an eclectic and increasingly familiar group of non-relational data management systems.
• These databases are not built primarily on tables and generally do not use SQL for data manipulation.
• NoSQL systems are distributed, non-relational databases designed for large-scale data storage and massively parallel data processing across a large number of commodity servers.
NoSQL Database
• They also use non-SQL languages and mechanisms to interact with data.
• NoSQL database systems arose alongside major Internet companies, such as Google, Amazon, and Facebook, which faced challenges in dealing with huge quantities of data.
• These systems are designed to scale to thousands or millions of users performing updates as well as reads, in contrast to traditional DBMSs and data warehouses.
Why NoSQL?
• Relational DBMSs have been a successful technology for many years, providing persistence, concurrency control and integration mechanisms.
• The need to process large amounts of data has shifted the direction from scaling vertically (a bigger single machine) to scaling horizontally on clusters (more machines).
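One common way to scale horizontally is to partition keys across the servers of a cluster. The following is a minimal sketch, assuming simple hash-modulo placement; the node names and the function node_for_key are hypothetical and not taken from any particular NoSQL system.

```python
import hashlib


def node_for_key(key: str, nodes: list) -> str:
    """Pick the node that owns `key` using hash-modulo partitioning."""
    # A stable hash (not Python's built-in hash()) keeps placement
    # identical across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical cluster of three commodity servers.
nodes = ["node-a", "node-b", "node-c"]

# Each key is assigned to exactly one node; adding servers spreads the load.
placement = {key: node_for_key(key, nodes) for key in ("user:1", "user:2", "cart:42")}
```

Plain modulo placement moves most keys when the number of nodes changes; production systems typically use more elaborate schemes, which this sketch leaves out.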
Why NoSQL?
• NoSQL databases focus on analytical processing of large-scale datasets, offering increased scalability over commodity hardware.
• Organizations that collect large amounts of unstructured data are increasingly turning to non-relational databases (NoSQL databases).
Big Data
Characteristics of NoSQL Databases
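• Strong consistency: all clients see the same version of the data.
• High availability: data is always available; at least one copy of the requested data can be served even if one of the nodes is down.
• Partition tolerance: the system as a whole keeps its characteristics even when deployed across different servers.
• These are the three properties of the CAP theorem; a distributed system can fully guarantee at most two of them at the same time.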
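The availability property can be illustrated with replicated copies of the data. This is a minimal sketch, assuming every write goes to all reachable replicas and a read is served by the first reachable replica; the names Node, write_all and read_any are hypothetical.

```python
class Node:
    """One server holding a full copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


def write_all(nodes, key, value):
    """Store the value on every node that is currently reachable."""
    for node in nodes:
        if node.up:
            node.data[key] = value


def read_any(nodes, key):
    """Return the value from the first reachable node that has the key."""
    for node in nodes:
        if node.up and key in node.data:
            return node.data[key]
    raise LookupError(f"no reachable replica holds {key!r}")


replicas = [Node("a"), Node("b"), Node("c")]
write_all(replicas, "cart:42", ["book"])
replicas[0].up = False  # simulate one node going down
value = read_any(replicas, "cart:42")  # still served by another replica
```

A node that was down during a write misses that update in this sketch, so its copy can be stale when it comes back; that is the tension between availability and strong consistency.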
Primary Uses of NoSQL Database
1. Large-scale data processing
2. Exploratory analytics on semi-structured data (expert level)
3. Large volume data storage.
Classification of NoSQL Databases
• Key-Value databases
• Document databases
• Column-Family databases
• Graph databases
Key-Value Databases
• These data management systems store items under alphanumeric identifiers (keys); each key has associated values.
• The values can be simple text strings or more complex lists and sets.
• Searches are performed only against keys and are limited to exact matches.
• Searches cannot be performed against values.
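The access model described above can be sketched with an in-memory dictionary. This is a minimal illustration, not the API of any real key-value product; the class KeyValueStore and the sample keys are hypothetical.

```python
class KeyValueStore:
    """Toy key-value store: values are opaque, lookup is by exact key only."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # The value may be a string, a list, a set, or any other object.
        self._items[key] = value

    def get(self, key, default=None):
        # Exact-match lookup; there is no prefix or value search.
        return self._items.get(key, default)

    def delete(self, key):
        self._items.pop(key, None)


store = KeyValueStore()
store.put("product:17:name", "Desk lamp")
store.put("cart:1001", ["book", "lamp"])

name = store.get("product:17:name")  # found: exact key
missing = store.get("product:17")    # None: partial keys do not match
```

Finding every cart that contains "lamp" would require scanning all values in application code, because the store offers no query against values.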
Key-Value Characteristics
• The simplicity of key-value stores makes them very quick and lightweight.
• They offer highly scalable retrieval of the values needed for application tasks, such as retrieving product names.
• This is why Amazon uses a key-value system, Dynamo, for its shopping cart. Dynamo is a highly available key-value storage system.
• Examples: Dynamo (Amazon), Voldemort (LinkedIn), Redis, BerkeleyDB, Riak
Pros and Cons
• pros: anything can be stored in an aggregate
• cons: only lookup of the entire aggregate by key is allowed (no queries, and no retrieval of part of an aggregate)
Document Database
• Designed to manage and store documents.
• These documents are encoded in a standard data exchange format such as XML, JSON (JavaScript Object Notation) or BSON (Binary JSON).
Primary Uses
• Document databases are good for storing and managing Big Data-size collections of literal documents, such as text documents and email messages.
Pros and Cons
• pros: allows structured queries and partial aggregate retrieval based on the fields in the aggregate
• cons: imposes limits on the structure of what can be placed in the database
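Querying by field and retrieving only part of a document can be sketched as follows. This is a minimal in-memory illustration that uses JSON-encodable dictionaries as documents; the class DocumentStore, its methods and the sample email documents are hypothetical and do not mirror any specific product's API.

```python
import json


class DocumentStore:
    """Toy document store holding JSON-encodable documents."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # Round-trip through JSON: rejects non-encodable values and
        # stores an independent copy of the document.
        self._docs[doc_id] = json.loads(json.dumps(document))

    def find(self, criteria, fields=None):
        """Return documents whose fields equal `criteria`.

        If `fields` is given, only those fields are returned
        (partial aggregate retrieval).
        """
        results = []
        for doc in self._docs.values():
            if all(doc.get(k) == v for k, v in criteria.items()):
                if fields is None:
                    results.append(dict(doc))
                else:
                    results.append({f: doc[f] for f in fields if f in doc})
        return results


emails = DocumentStore()
emails.insert(1, {"sender": "ann@example.com", "subject": "Hello", "body": "Hi Bob"})
emails.insert(2, {"sender": "bob@example.com", "subject": "Re: Hello", "body": "Hi Ann"})

subjects = emails.find({"sender": "ann@example.com"}, fields=["subject"])
```

Unlike a key-value store, the store looks inside each document, which is what makes field-based queries possible and also why documents must follow a structure the store can interpret.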
Column-Family Databases
• The data consists of key-value pairs in which each value is a set of columns.
• Column-family databases are represented as tables, with each key-value pair being a row.
• All related data can be grouped together as one family.
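The layout described above (row key, then column family, then columns) can be sketched with nested dictionaries. This is a minimal illustration; the class ColumnFamilyStore and the family names "profile" and "orders" are hypothetical.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy column-family store: row key -> family -> column -> value."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        self._rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Return all columns of one family for a row (empty if absent)."""
        row = self._rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))


table = ColumnFamilyStore()
table.put("user:1", "profile", "name", "Ann")
table.put("user:1", "profile", "city", "Oslo")
table.put("user:1", "orders", "2023-01", "book")

profile = table.get_family("user:1", "profile")
```

Rows do not need to share the same columns, and reading one family does not touch the others.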
Primary Uses
1. Large-scale, batch-oriented data processing: sorting, parsing, and conversion (for example, conversions between hexadecimal, binary and decimal code values).
2. Exploratory and predictive analytics performed by expert statisticians and programmers.
Graph Databases
• Graph databases replace relational tables with structured relational graphs of interconnected key-value pairings.
• Graph databases are useful when you are more interested in the relationships between data than in the data itself, which makes them a good fit for social networks.
• They are optimized for traversing relationships, not for querying.
• Examples: Neo4j, InfoGrid, Sones GraphDB, AllegroGraph, InfiniteGraph
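Relationship traversal can be sketched with a breadth-first search over an adjacency list. This is a minimal illustration in plain Python, not the query interface of any of the products listed; the function within_hops and the sample friendship data are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return nodes reachable from `start` in at most `max_hops` edges.

    Breadth-first traversal; `start` itself is not included.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, distance = queue.popleft()
        if distance == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, distance + 1))
    return found


# Hypothetical social network: person -> list of friends.
friends = {
    "ann": ["bob", "cy"],
    "bob": ["ann", "dee"],
    "cy": ["ann"],
    "dee": ["bob", "eve"],
    "eve": ["dee"],
}

# Friends and friends-of-friends of ann.
nearby = within_hops(friends, "ann", 2)  # ["bob", "cy", "dee"]
```

In a relational schema the same question would take one self-join per hop, which is why traversal-oriented storage suits this kind of workload.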
Adoption of NoSQL Database
• Organizations with massive data storage needs are looking seriously at NoSQL.
• NoSQL database experts are in high demand at most development organizations.
• The next graph shows job trends of five NoSQL Databases from Indeed.com
Job Trends of Five NoSQL Databases
Adoption of NoSQL Database
• MongoDB’s growth means that it has cemented its place as the most popular NoSQL database.
• According to LinkedIn profile mentions, NoSQL technologies account for 45% of mentions in LinkedIn profiles.
LinkedIn statistics
Conclusion
• The computational and storage requirements of applications such as Big Data analytics, Business Intelligence and social networking over petabyte-scale datasets have driven the shift from SQL to NoSQL databases.
• This led to the development of horizontally scalable, distributed, non-relational NoSQL databases.
• MongoDB is the most in-demand one.
Resources
• http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf
• http://en.wikipedia.org/wiki/Column_family
• http://en.wikipedia.org/wiki/NoSQL