NoSQL

Not only SQL (NoSQL) databases using different kind of technics for storing and retrieving data compared to classical relational databases like DB2 or Oracle. They emphasis more on scaling and storing different kind of data then on ACID or CAP support.

Type of NoSQL databases

NoSQL databases can be divided into five different types:

Column
- Tuple of three values: Unique name, value and timestamp
Document
- semi structured data in documents
Key Value
- dictionary of key value pairs
Graph
- graph oriented system of nodes with properties and edges
Multi Model
- mixture of data models with one backend

MongoDB belongs to the document based NoSQL databases.

HBase as part of apache Hadoop belongs to the column based NoSQL databases.

MongoDB

MongoDB is a open source document database mainly written in C++ as native application. The MongoDB Inc. is the company behind that product and offers commercial versions, suppport, etc.

License is A GPL 3.0

Features

Document oriented Database
aggregation
Sharding
Replication
Indexing
automatic fail-over

Apache Hadoop

The Apache Hadoop is a umbrella for different software projects to store and process distributed large data sets. It’s open source and maintained by the apache community.

License is Apache 2.0 license

HDFS

The Hadoop distributed file system (HDFS) is a distributed storage system written in java suitable for storing large files in a scalable and fault tolerance cluster.

HBase

HBase is a distributed, non relational database modeled after Google BigTable implementation.

Facebook used it for their messages to store over 100 PB on data.

Features

Linear and modular scalability.
Strictly consistent reads and writes.
Automatic and configurable sharding of tables
Automatic failover support between RegionServers.
Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
Easy to use Java API for client access.
Block cache and Bloom Filters for real-time queries.
Query predicate push down via server side Filters
Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
Extensible jruby-based (JIRB) shell
Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JM

Hive

data warehouse extensions for HBase to use HiveQL as like like query language

Ambari

Web based tool to monitor Hadoop installations

Pig

Ability to create MapReduce programs with the PigLatin language to analyze large datasets

Chukwa

Monitoring solution for distributed systems

Zoopkeper

Distributed configuration system

API

For MongoDB exists

a native Java API
Spring Data Adapter
JavaScript
Node.Js
PHP
…

For Hadoop exists

HBase
- CLI
- Java
- REST
Hive
- CLI
- REST

Integrate Hadoop into mongodb

There exists an adapter to use hadoops map reduce functions for aggregations on data. Hadoop jobs extract the data from mongodb, aggregate them and write back to mongodb.

decision criteria to choose the right tool

MongoDB
- best suited as operation database
- not for data analysis or data processing
Hadoop
- has many data analysis functionalities
- use for large amount of read only data

M	T	W	T	F	S	S
« Mar
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

M	T	W	T	F	S	S
« Mar
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Ralf Schäftlein's Blog

Comparision of NoSQL databases mongoDB and apache hadoop

NoSQL

Type of NoSQL databases

MongoDB

Features

Apache Hadoop

HDFS

HBase

Features

Hive

Ambari

Pig

Chukwa

Zoopkeper

API

Integrate Hadoop into mongodb

decision criteria to choose the right tool

References

tech stuff, development news…