CouchDB vs Redis vs MongoDB vs Riak vs Membase vs Neo4j vs Cassandra vs HBase comparison

by krishna
January 19, 2012

CouchDB	Redis	MongoDB	Riak	Membase	Neo4j	Cassandra	HBase
Written in: Erlang	Written in: C/C++	Written in: C++	Written in: Erlang & C, some JavaScript	Written in: Erlang & C	Written in: Java	Written in: Java	Written in: Java
Main point: DB consistency, ease of use	Main point: Blazing fast	Main point: Retains some friendly properties of SQL. (Query, index)	Main point: Fault tolerance	Main point: Memcache compatible, but with persistence and clustering	Main point: Graph database – connected data	Main point: Best of BigTable and Dynamo	Main point: Billions of rows X millions of columns
License: Apache	License: BSD	License: AGPL (Drivers: Apache)	License: Apache	License: Apache 2.0	License: GPL, some features AGPL/commercial	License: Apache	License: Apache
Protocol: HTTP/REST	Protocol: Telnet-like	Protocol: Custom, binary (BSON)	Protocol: HTTP/REST or custom binary	Protocol: memcached plus extensions	Protocol: HTTP/REST (or embedding in Java)	Protocol: Custom, binary (Thrift)	Protocol: HTTP/REST (also Thrift)
Bi-directional (!) replication,	Disk-backed in-memory database,	Master/slave replication (auto failover with replica sets)	Tunable trade-offs for distribution and replication (N, R, W)	Very fast (200k+/sec) access of data by key	Standalone, or embeddable into Java applications	Tunable trade-offs for distribution and replication (N, R, W)	Modeled after BigTable
continuous or ad-hoc,	Currently without disk-swap (VM and Diskstore were abandoned)	Sharding built-in	Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.	Persistence to disk	Full ACID conformity (including durable data)	Querying by column, range of keys	Map/reduce with Hadoop
with conflict detection,	Master-slave replication	Queries are JavaScript expressions	Map/reduce in JavaScript or Erlang	All nodes are identical (master-master replication)	Both nodes and relationships can have metadata	BigTable-like features: columns, column families	Query predicate push down via server side scan and get filters
thus, master-master replication. (!)	Simple values or hash tables by keys,	Run arbitrary JavaScript functions server-side	Links & link walking: use it as a graph database	Provides memcached-style in-memory caching buckets, too	Integrated pattern-matching-based query language (“Cypher”)	Writes are much faster than reads (!)	Optimizations for real time queries
MVCC – write operations do not block reads	but complex operations like ZREVRANGEBYSCORE.	Better update-in-place than CouchDB	Secondary indices: search in metadata	Write de-duplication to reduce IO	Also the “Gremlin” graph traversal language can be used	Map/reduce possible with Apache Hadoop	A high performance Thrift gateway
Previous versions of documents are available	INCR & co (good for rate limiting or statistics)	Uses memory mapped files for data storage	Large object support (Luwak)	Very nice cluster-management web GUI	Indexing of nodes and relationships	I admit being a bit biased against it, because of its bloat and complexity, partly due to Java (configuration, seeing exceptions, etc.)	HTTP supports XML, Protobuf, and binary
Crash-only (reliable) design	Has sets (also union/diff/inter)	Performance over features	Comes in “open source” and “enterprise” editions	Software upgrades without taking the DB offline	Nice self-contained web admin		Cascading, Hive, and Pig source and sink modules
Needs compacting from time to time	Has lists (also a queue; blocking pop)	Journaling (with --journal) is best turned on	Full-text search, indexing, querying with Riak Search server (beta)	Connection proxy for connection pooling and multiplexing (Moxi)	Advanced path-finding with multiple algorithms		JRuby-based (JIRB) shell
Views: embedded map/reduce	Has hashes (objects of multiple fields)	On 32bit systems, limited to ~2.5GB	In the process of migrating the storage backend from “Bitcask” to Google’s “LevelDB”		Indexing of keys and relationships		No single point of failure
Formatting views: lists & shows	Sorted sets (high score table, good for range queries)	An empty database takes up 192MB	Masterless multi-site replication and SNMP monitoring are commercially licensed		Optimized for reads		Rolling restart for configuration changes and minor upgrades
Server-side document validation possible	Redis has transactions (!)	GridFS to store big data + metadata (not actually an FS)			Has transactions (in the Java API)		Random access performance is like MySQL
Authentication possible	Values can be set to expire (as in a cache)				Scriptable in Groovy
Real-time updates via _changes (!)	Pub/Sub lets one implement messaging (!)				Online backup, advanced monitoring and High Availability are AGPL/commercially licensed
Attachment handling
thus, CouchApps (standalone js apps)
jQuery library included
http://couchapp.org/page/index	http://redis.io/commands
Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.	Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).	Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.	Best used: If you want something Cassandra-like (Dynamo-like), but no way you’re gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you’re ready to pay for multi-site replication.	Best used: Any application where low-latency data access, high concurrency support and high availability are requirements.	Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.	Best used: When you write more than you read (logging). If every component of the system must be in Java. (“No one gets fired for choosing Apache’s stuff.”)	Best used: If you’re in love with BigTable. 🙂 And when you need random, realtime read/write access to your Big Data.
For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.	For example: Stock prices. Analytics. Real-time data collection. Real-time communication.	For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.	For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.	For example: Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. Zynga).	For example: Social relations, public transport links, road maps, network topologies.	For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis.	For example: Facebook Messaging Database (more general example coming soon)
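
The “tunable trade-offs (N, R, W)” listed for Riak and Cassandra refer to Dynamo-style quorum settings: N replicas are kept per key, R replicas must answer a read, and W replicas must acknowledge a write. As a rough sketch only (the function names below are hypothetical and not part of either database's API), the usual rule of thumb is that a read overlaps the latest acknowledged write when R + W > N:

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum.

    Illustrative helper, not a Riak or Cassandra call.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    # Any R replicas and any W replicas out of N must intersect when R + W > N.
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> int:
    """Replicas that can be down while both reads and writes can still reach quorum."""
    return n - max(r, w)


if __name__ == "__main__":
    # A few example settings: balanced, fast-but-loose, and read-optimized.
    for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
        print(
            f"N={n} R={r} W={w}: overlap={quorums_overlap(n, r, w)}, "
            f"tolerated failures={tolerated_failures(n, r, w)}"
        )
```

Lower R or W buys latency and availability at the cost of possibly stale reads; real deployments add details this sketch ignores, so treat it as a way to reason about the trade-off rather than a description of either system's internals.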
