Most articles on this type of subject have a paragraph explaining what NoSQL is. I will let others cover the basics. NoSQL is fashionable, but in the majority of cases a well-configured RDBMS will perform as well. The articles by A Fowler give a good strategic view. These note that search is currently a weak feature in most NoSQL products (you would need access to BigTable, I guess ~ the Google tool).
The last link is what this resource aims to achieve, in terms of overview rather than “howto”.
- UPDATE: I have some newer aggregate links: Ronald Bourret, and someone has built a better version of this article 1 ~ covering 150 products.
- HBase, download * HBase supports PHP access via Thrift. This feels like the FMP client-server model, but hopefully with better performance than FMP.
- Cassandra, download *
- MongoDB, download *
- Couch, download *
- Raven, download *
- Dynamo 2: as this is a business system for Amazon Inc, I'm not sure that it is a public download.
- Riak, download *
- Azure 3: as this is an MS Windows product, it has limited use on servers, and will only be available as a paid, licensed product.
- Redis, download *
- GT.M 4, download
- Neo4j, download *
- Allegro, download
- Virtuoso 5, download *
- Bigdata 6 7. According to 8 it is not publicly available; it is one of the internal systems for Google Inc.
- Aerospike is a commercial product that has been advertised quite heavily since I wrote this text. Please read 9.
I pulled the list of products from the wiki page (first link). This was to avoid getting too focussed on any particular product, or method of structuring NoSQL. The majority of these are open source. This is a survey of which NoSQL products people prefer (you need a current LinkedIn session to see it). Items marked with a (*) have PHP bindings.
Mongo tends to be at the top of the list 10 11 when developers are polled. It is claimed this is due to the simplicity of using JSON. If you ask a DBA, you will get a different (more RDBMS focussed) perspective. As a comparison between Mongo and Couch, this 12 is more detailed, and is a good reference as it focuses on the newer editions of each. The statistics at the end rate Mongo as better, mostly due to its ease of use. Silicon India puts Mongo in fourth place 13, and Amazon's DynamoDB first. The list is interesting as it includes an offering from Oracle, which most lists ignore. This reference 14 offers a lot of comparison metrics (when one drives the DHTML correctly, it does put Mongo at the top), but has a confusing UI.
MongoDB has stable geo-spatial features (having this as a platform-level feature saves hassle in application development). There are many introductions 15 to setting up Mongo; the sysop side is trivial. The time-consuming part is data extraction and import.
The original reason to use NoSQL was to hold a lot of data, and apply map-reduce operations to it. Being a parallel structure, map-reduce can be set up with less effort than a traditional RDBMS, and by less expensive employees. Map-reduce operations are a distinct API from SQL, but not any more esoteric. If you look at them at architecture granularity, they are little different from messaging middleware (e.g. iceberg): parallel data crunching, but with a concentration on A.C.I.D. rather than cryptography and fault tolerance.
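To make the map-reduce idea concrete, here is a minimal sketch in plain Python. It does not use any NoSQL product's API. The sample records, the field names and the helper names (`map_order`, `reduce_totals`, `map_reduce`) are all invented for illustration. It only shows the shape of the technique: an independent map step per record, a grouping step, then an independent reduce step per key.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample records; the field names are invented for illustration.
ORDERS = [
    {"customer": "alice", "total": 30},
    {"customer": "bob", "total": 15},
    {"customer": "alice", "total": 20},
    {"customer": "carol", "total": 5},
    {"customer": "bob", "total": 10},
]


def map_order(order):
    """Map step: emit one (key, value) pair per record."""
    return (order["customer"], order["total"])


def reduce_totals(key, values):
    """Reduce step: collapse all values that share a key into one result."""
    return sum(values)


def map_reduce(records, mapper, reducer, workers=2):
    # Each record is mapped independently, so the map phase can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = list(pool.map(mapper, records))

    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)

    # Each key is reduced independently of the others.
    return {key: reducer(key, values) for key, values in grouped.items()}


totals = map_reduce(ORDERS, map_order, reduce_totals)
print(totals)
```

Because neither step shares state between records or keys, the same two functions could in principle be spread over many machines, which is the property that makes this style attractive for periodic reporting over large datasets.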
Having stated earlier in this resource that NoSQL is currently poor on search, these are the search features of MongoDB. The most important point is to use the 64-bit build of v2.4 or greater, which is when text search was added. The options are:
- If you are a traditional RDBMS person, aggregation features will make you a lot more confident.
- map-reduce is listed as the big reason to use NoSQL. It is described as the ability to do periodic queries for reporting functions (against large datasets).
- keyword search ~ this is performed inline, and targeted at structured data. It is basic literal matching (i.e. no stemming, no synonyms, indexes computed at the point of execution).
- regex keyword search ~ exactly as previous, but via PCRE.
- One can use multiple keys in a keyword search.
- Text search ~ in RDBMS terms this is a full-text search. This will build a dictionary of every stemmed word in the dataset, and so is very expensive. This manual states that you must enable this for every node separately, and not use it on production systems. It also uses a large number of file descriptors.
- geospatial search ~ if you set up a spatial data index, either as a spherical surface, or a simplistic 2D surface, you can perform spatial searches as a native type. Spatial searching is discussed.
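The difference between literal keyword search, regex search and stemmed text search in the list above can be sketched in plain Python. This is not MongoDB code and does not reflect how MongoDB implements any of these features. The documents, the function names and especially `toy_stem` (a crude suffix stripper, not a real stemming algorithm) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical documents, invented for illustration.
DOCS = {
    1: "Indexing large datasets",
    2: "The index was rebuilt",
    3: "Spatial searches on indexed data",
}


def literal_search(docs, needle):
    """Literal matching: scan every document at query time, case-sensitive."""
    return sorted(i for i, text in docs.items() if needle in text)


def regex_search(docs, pattern):
    """Same scan as above, but matching a regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(i for i, text in docs.items() if rx.search(text))


def toy_stem(word):
    """Toy stemmer: strips a few English suffixes. Not a real algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_text_index(docs):
    """Full-text style index: every stemmed word maps to the documents using it."""
    index = defaultdict(set)
    for i, text in docs.items():
        for word in re.findall(r"[A-Za-z]+", text):
            index[toy_stem(word)].add(i)
    return index


def text_search(index, term):
    """Look up the stemmed term in the prebuilt index."""
    return sorted(index.get(toy_stem(term), set()))


TEXT_INDEX = build_text_index(DOCS)
print(literal_search(DOCS, "index"))
print(regex_search(DOCS, r"index(ing|ed)"))
print(text_search(TEXT_INDEX, "indexes"))
```

The literal and regex searches do all their work at query time, while the text index is built up front from every word in the dataset. That up-front dictionary is what makes full-text search the expensive option.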