These were notes for a meeting, but they may be useful to a wider audience. The objective was to list data storage tools other than RDBMS, and to talk about architecture. If you are looking at NoSQL, you actually want to read my proper articles: structured-storage, why-mongo, mongo-interface, scaling-mongo, and an article on indices, which I have yet to write but which is the most important. These articles are focussed on MongoDB for the document style problems I am solving.

This article is getting into “screen shots of windows installers” territory, but it covers a lot of algorithms quickly. There is no need to store your data in an RDBMS. The following options are available:

  • “Berkeley style Database” ~ e.g. sleepycat 1
    • This is a non-relational binary store, widely used as a cache when the solution has no need for relational features or normal forms. It is open source and stores keys and values as raw bytes. Key lookups through its hash access method are cited as O(1) on average, which is why it is used.
  • “Key/Value” config files ~ e.g. apache/httpd.conf 2
    • These are simple and easy for humans to edit. They do not scale, and parsing time is relatively large, as the software needs to translate everything into a computer friendly format.
  • key/value storage ~ e.g. Aerospike
    • Another method to attach non-structured values to a name or key. This differs from the “config files” above in that it is a separate software service. These features are present in most mature NoSQL or RDBMS offerings. This structure is discussed in 3.
  • “hierarchical database” ~ e.g. IBM DB1 with VSAM 4 5 or the MS Windows registry 6
    • A tree structure is used to store the data, which may be accessed by walking the tree. It is only sensible for large volumes of data. This seems to be the same as a BSP tree 7 dumped to disk, or EXT2 8. It doesn't seem to support the concept of a “table”, so everything would be in a single blob. There is no link to DB1, as it's not sold anymore.
  • specific use binary data hashes ~ e.g. RRDtool 9, which I have previously used.
    • This stores aggregate data cleverly: its storage doesn't grow as extra data arrives, because older samples are consolidated and eventually overwritten. It is a fixed format designed for storing time tracked data. This problem domain could be implemented inside an RDBMS, but the specific format is computationally faster. RRDtool has good integration with libraries for drawing graphs.
  • OODBMS 10 ~ e.g. db4o
    • Record things as objects instead. In some situations, it's faster. These have a heritage as long as RDBMS, although they tend to be less popular (historically there was less use of OO). The query languages attached to OODBs often allow classes to be used straight in the queries.
  • xmldb 11 ~ e.g. Oracle XML DB or XMLDB
    • Store and query XML as a data primitive. Normally data is searched via an XPath library. Oracle supports this (which is where I have used it). This approach is credited with good data connectivity between different XML data sets (not using the term foreign key, although the concept is appropriate).
  • graph DB (implement 12) ~ e.g. neo4j or Velocity
    • This is a different means to represent data, and connections between it. This is very fast for some usages. A comparison of neo4j graphs to RDBMS is published.
  • document storage ~ e.g. MongoDB
    • Notes...
  • semantic web 13 is a loose concept.
    • This is at the end of the list, as there are a lot of products connected to it (this is a long list from a few years ago 14). Semantic information is frequently stored in triples; here are some specialised storage structures 15. These are frequently distributed.
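
To make the RRDtool entry in the list above more concrete, here is a minimal Python sketch of a store whose size stays fixed however much data arrives. It is an illustration of the round-robin idea only, not RRDtool's actual file format or API: the class name `RoundRobinArchive` and its parameters `slots` and `samples_per_slot` are hypothetical, and averaging is assumed as the consolidation step.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store of consolidated samples.

    Hypothetical sketch of the round-robin idea; this is not
    RRDtool's on-disk format or its API.
    """

    def __init__(self, slots, samples_per_slot):
        self.samples_per_slot = samples_per_slot
        # A deque with maxlen silently drops the oldest slot when full,
        # so the archive never grows beyond `slots` entries.
        self.slots = deque(maxlen=slots)
        self._pending = []  # raw samples waiting to be consolidated

    def update(self, value):
        """Add one raw sample; store an average once enough have arrived."""
        self._pending.append(value)
        if len(self._pending) == self.samples_per_slot:
            self.slots.append(sum(self._pending) / len(self._pending))
            self._pending = []

    def fetch(self):
        """Return the consolidated values, oldest first."""
        return list(self.slots)


archive = RoundRobinArchive(slots=3, samples_per_slot=2)
for reading in [10, 20, 30, 50, 5, 15, 100, 200]:
    archive.update(reading)

# Eight readings produce four averages, but only the newest three are kept.
print(archive.fetch())  # [40.0, 10.0, 150.0]
```

The trade-off is that raw history is lost: only the consolidated values survive, which is acceptable for graphing trends but not for auditing individual readings.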

Non SQL based data storage
