Not only SQL (NoSQL) databases use different techniques for storing and retrieving data than classical relational databases like DB2 or Oracle. They put more emphasis on scaling and on storing different kinds of data than on ACID guarantees or strict consistency.
Types of NoSQL databases
NoSQL databases can be divided into five different types:
- Column: tuple of three values (unique name, value and timestamp)
- Document: semi-structured data in documents
- Key Value: dictionary of key-value pairs
- Graph: graph-oriented system of nodes with properties and edges
- Multi Model: mixture of data models with one backend
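To make the difference between the models more concrete, the following minimal sketch stores the same fact (a person named Ada who lives in London) in plain Python structures. All names (`Cell`, `column_row`, `city_from_graph`, and so on) are hypothetical and only illustrate the shape of the data; real databases add indexing, persistence and distribution on top.

```python
from collections import namedtuple

# Column model: each cell is a tuple of unique name, value and timestamp.
Cell = namedtuple("Cell", ["name", "value", "timestamp"])
column_row = [
    Cell("name", "Ada", 1700000000),
    Cell("city", "London", 1700000000),
]

# Document model: semi-structured, possibly nested data in one document.
document = {"_id": 1, "name": "Ada", "address": {"city": "London"}}

# Key-value model: a dictionary that maps a key to an opaque value.
key_value = {"user:1": '{"name": "Ada", "city": "London"}'}

# Graph model: nodes with properties, connected by labeled edges.
nodes = {"ada": {"type": "person"}, "london": {"type": "city"}}
edges = [("ada", "LIVES_IN", "london")]


def city_from_columns(row):
    """Return the newest value stored under the column name 'city'."""
    cells = [c for c in row if c.name == "city"]
    return max(cells, key=lambda c: c.timestamp).value


def city_from_graph(person):
    """Follow the LIVES_IN edge from a person node to a city node."""
    return next(dst for src, label, dst in edges
                if src == person and label == "LIVES_IN")
```

A multi-model database would expose several of these shapes on top of one storage backend.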
MongoDB belongs to the document-based NoSQL databases.
HBase, as part of Apache Hadoop, belongs to the column-based NoSQL databases.
License is AGPL 3.0
- Document-oriented database
- Automatic failover
Apache Hadoop is an umbrella for different software projects that store and process large, distributed data sets. It is open source and maintained by the Apache community.
License is Apache 2.0
The Hadoop Distributed File System (HDFS) is a distributed storage system written in Java, suitable for storing large files in a scalable and fault-tolerant cluster.
HBase is a distributed, non-relational database modeled after Google's Bigtable.
Facebook used it for its messaging platform to store over 100 PB of data.
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy to use Java API for client access.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
- Extensible JRuby-based (JIRB) shell
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
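The Bloom filters mentioned in the feature list can be illustrated with a generic sketch. The idea is that a small in-memory structure answers "is this row key definitely absent?" so that an expensive file read can be skipped. This is a textbook Bloom filter, not HBase's actual implementation; the class `BloomFilter`, the function `lookup` and the dictionary standing in for a store file are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting a SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def lookup(row_key, store_file, bloom):
    """Read the (hypothetical) store file only if the key may be present."""
    if not bloom.might_contain(row_key):
        return None  # definitely absent: skip the expensive read
    return store_file.get(row_key)


# Usage: a dict stands in for an on-disk store file.
store_file = {"row-1": "hello"}
bloom = BloomFilter()
for key in store_file:
    bloom.add(key)
```

The trade-off is memory versus accuracy: more bits and a suitable number of hash functions lower the false-positive rate, but a positive answer always has to be confirmed by the real read.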
Data warehouse extension that provides HiveQL, an SQL-like query language
Web-based tool to monitor Hadoop installations
Ability to create MapReduce programs with the PigLatin language to analyze large datasets
Monitoring solution for distributed systems
Distributed configuration system
For MongoDB there are
- a native Java API
- a Spring Data adapter
For Hadoop there are
Integrate Hadoop into MongoDB
There is an adapter that makes it possible to use Hadoop's MapReduce functions for aggregations on MongoDB data. Hadoop jobs extract the data from MongoDB, aggregate it and write the results back to MongoDB.
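The extract, aggregate and write-back flow can be sketched in plain Python. The example below does not use the adapter or any Hadoop or MongoDB API; it only imitates the map, shuffle and reduce steps in memory. The `orders` documents, their field names and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: documents as they might be read from a collection.
orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 15},
    {"customer": "alice", "amount": 20},
]


def map_phase(doc):
    """Emit (key, value) pairs for one input document."""
    yield doc["customer"], doc["amount"]


def reduce_phase(key, values):
    """Combine all values emitted for one key into a result document."""
    return {"_id": key, "total": sum(values)}


def run_job(documents):
    # Shuffle step: group the emitted values by key.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    # Reduce step: one output document per key, ready to be written back.
    return [reduce_phase(key, values) for key, values in sorted(grouped.items())]


totals = run_job(orders)
```

In a real cluster the map and reduce functions run in parallel on many nodes, and the result documents would be written to an output collection instead of a Python list.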
Decision criteria to choose the right tool
MongoDB
- best suited as an operational database
- not for data analysis or data processing
Hadoop
- has many data analysis functionalities
- use for large amounts of read-only data