High availability for writes

Situation: a compliance table continuously logs data. Your business needs to run at all times, so your audit must run at all times too. What can you do to achieve more than 99 percent write availability? And how do you keep the tables small and snappy?

NB: Most people have a distinct preference for either SQL or NoSQL. I have no axe to grind; I choose the strongest solution for the situation at hand. This article presents both options in a disinterested fashion.

Architectures focused on an RDBMS

  1. A) Shard the data on one of the columns (e.g. monthly table names), but use a view to hide this. Works better with some RDBMSes than others.
  2. B) The sharding above, plus write-multiplexing code in the application. I have implemented this, but it did not fit with an ORM. When the shard is a predictable hash of the data, this is easy.
  3. B2) The same sharding, with the multiplexing code in a stored procedure. I have implemented this when required to use an ORM.
  4. C) A single table, with a background process copying data out into the shard structure. This is a weak solution.
  5. D) Run multiple write servers, and flip the write channel over to another server when needed; an n+1 architecture is useful anyway. This is hard to manage with an RDBMS.
  6. E) Monthly: stop the service, copy all data to a manual shard, truncate the original, restart the service. I dislike this option, and it is slow.
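Options A and B above rest on a predictable shard name derived from the data itself. A minimal sketch, using SQLite and a hypothetical audit_YYYYMM naming convention (the schema and table names are illustrative assumptions, not a prescription):

```python
import sqlite3
from datetime import datetime, timezone

def shard_name(ts: datetime) -> str:
    # Predictable shard name: a pure function of the event timestamp,
    # so every writer routes the same row to the same monthly table.
    return f"audit_{ts:%Y%m}"

def write_event(conn: sqlite3.Connection, ts: datetime, message: str) -> None:
    # Route the row to its monthly shard, creating the shard lazily.
    # A view UNIONing the audit_* tables (option A) would hide the split
    # from readers.
    table = shard_name(ts)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts TEXT, message TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (ts.isoformat(), message))

conn = sqlite3.connect(":memory:")
write_event(conn, datetime(2024, 1, 15, tzinfo=timezone.utc), "login ok")
write_event(conn, datetime(2024, 2, 2, tzinfo=timezone.utc), "login failed")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['audit_202401', 'audit_202402']
```

Because each month's data lands in its own small table, old months can be archived or dropped without touching the live write path.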

Other architectures

This list is shorter, but the solutions are more useful.

  1. F) Write the data into a NoSQL platform that claims infinite scaling. Test with an order of magnitude more data than your best estimate of expected use.
  2. G) Write the data into a buffer first, e.g. RabbitMQ; a consumer can then merge events and write the data into the shards in merged chunks.
  3. H) Redis or Hadoop: massively parallel processing at your convenience.
  4. I) Not for logs, but systems like RRDTool work very well for their intended purpose.
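Option G amounts to trading per-event writes for chunked writes. A minimal in-process sketch of that merging step, standing in for a broker consumer such as one draining RabbitMQ (the shard key and chunk size are illustrative assumptions):

```python
from collections import defaultdict

class ChunkedWriter:
    """Buffer incoming events and flush them to per-shard chunks."""

    def __init__(self, chunk_size: int = 100):
        self.chunk_size = chunk_size
        self.buffers = defaultdict(list)   # shard key -> pending events
        self.flushed = defaultdict(list)   # shard key -> written chunks

    def publish(self, shard: str, event: str) -> None:
        buf = self.buffers[shard]
        buf.append(event)
        if len(buf) >= self.chunk_size:
            self.flush(shard)

    def flush(self, shard: str) -> None:
        # One merged write per chunk instead of one write per event;
        # in a real system this would be a single bulk INSERT.
        if self.buffers[shard]:
            self.flushed[shard].append(list(self.buffers[shard]))
            self.buffers[shard].clear()

w = ChunkedWriter(chunk_size=2)
for i in range(5):
    w.publish("2024-01", f"event-{i}")
w.flush("2024-01")  # drain the remainder on shutdown or a timer
chunks = w.flushed["2024-01"]
print(chunks)  # [['event-0', 'event-1'], ['event-2', 'event-3'], ['event-4']]
```

The buffer also decouples availability: if a shard is briefly unreachable, events accumulate in the queue instead of being lost, which is the main reason option G suits an always-on audit log.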