CouchDB vs Redis vs MongoDB vs Riak vs Membase vs Neo4j vs Cassandra vs HBase comparison

 

CouchDB
Written in: Erlang
Main point: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
- Bi-directional (!) replication, continuous or ad-hoc, with conflict detection, thus, master-master replication. (!)
- MVCC – write operations do not block reads
- Previous versions of documents are available
- Crash-only (reliable) design
- Needs compacting from time to time
- Views: embedded map/reduce
- Formatting views: lists & shows
- Server-side document validation possible
- Authentication possible
- Real-time updates via _changes (!)
- Attachment handling, thus, CouchApps (standalone JS apps)
- jQuery library included
- http://couchapp.org/page/index
Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

Redis
Written in: C/C++
Main point: Blazing fast
License: BSD
Protocol: Telnet-like
- Disk-backed in-memory database
- Currently without disk-swap (VM and Diskstore were abandoned)
- Master-slave replication
- Simple values or hash tables by keys, but complex operations like ZREVRANGEBYSCORE
- INCR & co (good for rate limiting or statistics)
- Has sets (also union/diff/inter)
- Has lists (also a queue; blocking pop)
- Has hashes (objects of multiple fields)
- Sorted sets (high score table, good for range queries)
- Redis has transactions (!)
- Values can be set to expire (as in a cache)
- Pub/Sub lets one implement messaging (!)
- http://redis.io/commands
Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
For example: Stock prices. Analytics. Real-time data collection. Real-time communication.

MongoDB
Written in: C++
Main point: Retains some friendly properties of SQL. (Query, index)
License: AGPL (Drivers: Apache)
Protocol: Custom, binary (BSON)
- Master/slave replication (auto failover with replica sets)
- Sharding built-in
- Queries are JavaScript expressions
- Run arbitrary JavaScript functions server-side
- Better update-in-place than CouchDB
- Uses memory mapped files for data storage
- Performance over features
- Journaling (with --journal) is best turned on
- On 32-bit systems, limited to ~2.5 GB
- An empty database takes up 192 MB
- GridFS to store big data + metadata (not actually an FS)
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.

Riak
Written in: Erlang & C, some JavaScript
Main point: Fault tolerance
License: Apache
Protocol: HTTP/REST or custom binary
- Tunable trade-offs for distribution and replication (N, R, W)
- Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
- Map/reduce in JavaScript or Erlang
- Links & link walking: use it as a graph database
- Secondary indices: search in metadata
- Large object support (Luwak)
- Comes in “open source” and “enterprise” editions
- Full-text search, indexing, querying with Riak Search server (beta)
- In the process of migrating the storage backend from “Bitcask” to Google’s “LevelDB”
- Masterless multi-site replication and SNMP monitoring are commercially licensed
Best used: If you want something Cassandra-like (Dynamo-like), but no way you’re gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you’re ready to pay for multi-site replication.
For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.

Membase
Written in: Erlang & C
Main point: Memcache compatible, but with persistence and clustering
License: Apache 2.0
Protocol: memcached plus extensions
- Very fast (200k+/sec) access of data by key
- Persistence to disk
- All nodes are identical (master-master replication)
- Provides memcached-style in-memory caching buckets, too
- Write de-duplication to reduce IO
- Very nice cluster-management web GUI
- Software upgrades without taking the DB offline
- Connection proxy for connection pooling and multiplexing (Moxi)
Best used: Any application where low-latency data access, high concurrency support and high availability are requirements.
For example: Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. Zynga).

Neo4j
Written in: Java
Main point: Graph database – connected data
License: GPL, some features AGPL/commercial
Protocol: HTTP/REST (or embedding in Java)
- Standalone, or embeddable into Java applications
- Full ACID conformity (including durable data)
- Both nodes and relationships can have metadata
- Integrated pattern-matching-based query language (“Cypher”)
- Also the “Gremlin” graph traversal language can be used
- Indexing of nodes and relationships
- Nice self-contained web admin
- Advanced path-finding with multiple algorithms
- Indexing of keys and relationships
- Optimized for reads
- Has transactions (in the Java API)
- Scriptable in Groovy
- Online backup, advanced monitoring and High Availability are AGPL/commercially licensed
Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.
For example: Social relations, public transport links, road maps, network topologies.

Cassandra
Written in: Java
Main point: Best of BigTable and Dynamo
License: Apache
Protocol: Custom, binary (Thrift)
- Tunable trade-offs for distribution and replication (N, R, W)
- Querying by column, range of keys
- BigTable-like features: columns, column families
- Writes are much faster than reads (!)
- Map/reduce possible with Apache Hadoop
- I admit being a bit biased against it, because of the bloat and complexity it has, partly because of Java (configuration, seeing exceptions, etc.)
Best used: When you write more than you read (logging). If every component of the system must be in Java. (“No one gets fired for choosing Apache’s stuff.”)
For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that). Writes are faster than reads, so one natural niche is real time data analysis.

HBase
Written in: Java
Main point: Billions of rows X millions of columns
License: Apache
Protocol: HTTP/REST (also Thrift)
- Modeled after BigTable
- Map/reduce with Hadoop
- Query predicate push down via server side scan and get filters
- Optimizations for real time queries
- A high performance Thrift gateway
- HTTP supports XML, Protobuf, and binary
- Cascading, Hive, and Pig source and sink modules
- JRuby-based (JIRB) shell
- No single point of failure
- Rolling restart for configuration changes and minor upgrades
- Random access performance is like MySQL
Best used: If you’re in love with BigTable. 🙂 And when you need random, realtime read/write access to your Big Data.
For example: Facebook Messaging Database (more general example coming soon)
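
The Riak and Cassandra entries both mention tunable (N, R, W) trade-offs without spelling them out. In Dynamo-style stores, N is the number of replicas kept for each key, W the number of replicas that must acknowledge a write, and R the number consulted on a read; a read is guaranteed to overlap the latest acknowledged write when R + W > N. The snippet below is only a rough sketch of that arithmetic in Python. The helper name `quorum_profile` is made up for illustration, it is not part of either database's API, and it ignores refinements such as hinted handoff.

```python
def quorum_profile(n: int, r: int, w: int) -> dict:
    """Summarize what an (N, R, W) setting implies in a Dynamo-style store.

    Hypothetical helper for illustration; not part of Riak or Cassandra.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return {
        # Any R replicas and any W replicas share at least one node
        # exactly when R + W > N.
        "reads_see_latest_write": r + w > n,
        # Replicas that may be down while reads/writes still succeed.
        "read_failures_tolerated": n - r,
        "write_failures_tolerated": n - w,
    }


if __name__ == "__main__":
    for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
        print((n, r, w), quorum_profile(n, r, w))
```

With N=3, the setting R=2, W=2 keeps reads consistent and survives one failed replica on either path, while R=1, W=1 is faster but may return stale data.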
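
The Redis entry says INCR and expiring values are good for rate limiting. Here is a minimal sketch of the usual fixed-window pattern, written in pure Python so it runs without a server. `FakeCounterStore` is a made-up stand-in that mimics only the INCR and EXPIRE semantics; it is not the Redis client API, and the key format and limits are assumptions.

```python
import time


class FakeCounterStore:
    """Tiny in-memory stand-in for INCR/EXPIRE semantics (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> [count, expires_at or None]

    def incr(self, key):
        entry = self._data.get(key)
        # An expired key behaves as if it never existed.
        if entry is None or (entry[1] is not None and entry[1] <= self._clock()):
            entry = [0, None]
            self._data[key] = entry
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        if key in self._data:
            self._data[key][1] = self._clock() + seconds


def allow_request(store, client_id, limit=3, window=60):
    """Allow at most `limit` requests per `window` seconds per client."""
    key = f"rate:{client_id}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: start the countdown.
        store.expire(key, window)
    return count <= limit


if __name__ == "__main__":
    now = [0.0]  # controllable fake clock for the demo
    store = FakeCounterStore(clock=lambda: now[0])
    print([allow_request(store, "alice") for _ in range(4)])
    now[0] += 61  # the window has passed, so the counter starts over
    print(allow_request(store, "alice"))
```

Against a real server the increment and the expiry would be two separate commands, so they would probably need to be grouped atomically, for instance with the transactions mentioned in the same entry.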
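
Several entries lean on map/reduce, and CouchDB in particular builds its pre-defined queries ("views") on it. The idea can be sketched in a few lines of Python: a map function emits key/value pairs per document, and a reduce function collapses the values for each key. This is only a model of the concept. Real CouchDB views are normally written in JavaScript and maintained incrementally by the server, and the document fields used here (`type`, `customer`, `total`) are invented for the example.

```python
from collections import defaultdict


def map_doc(doc):
    """Emit (key, value) pairs for one document, like a view's map step."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def build_view(docs, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Keep the result ordered by key, as a view index would be.
    return {key: reduce_fn(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    docs = [
        {"type": "order", "customer": "bob", "total": 20},
        {"type": "order", "customer": "alice", "total": 5},
        {"type": "note", "text": "ignored by the map function"},
        {"type": "order", "customer": "bob", "total": 7},
    ]
    print(build_view(docs, map_doc, sum))  # {'alice': 5, 'bob': 27}
```

Because the query is fixed when the view is defined, this style suits the "pre-defined queries" use case better than ad-hoc lookups.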

 

