High availability for writes
Situation: a compliance table continuously logs data. Your business needs to run at all times, so your audit must too. What can you do to achieve more than 99 percent availability? And how do you keep tables small and snappy?
NB: Most people have a strong preference for either SQL or NoSQL. I have no axe to grind; I choose the strongest solution for the situation at hand. This article presents both options in a disinterested fashion.
Architectures focused on RDBMS
- A) Shard data on one of the variables (e.g. monthly table names), but use a view to hide this. Works better with some RDBMSes than others.
- B) The same sharding as above, plus write-multiplexing code. I have implemented this, though I could not do it through an ORM. When your shard key is a predictable hash, this is easy.
- B2) The same sharding, with the multiplexing code as a stored procedure. I have implemented this when required to use an ORM.
- C) A single table, with a background process that copies data out to the shard structure. This is a weak solution.
- D) Run multiple write servers, and flip the write channel over to another server when needed ~ an n+1 architecture is useful anyway. This is hard to manage with an RDBMS.
- E) Every month, stop the service, copy all data to a manual shard, truncate the original, and restart the service. I dislike this option, and it is slow.
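The routing logic behind options A and B can be sketched in a few lines. This is a minimal illustration, not production code; the table base name `audit_log` and the shard count are hypothetical, and the real write would go through your database driver rather than returning a string.

```python
import hashlib
from datetime import datetime

N_SHARDS = 4  # hypothetical shard count


def monthly_table(ts: datetime, base: str = "audit_log") -> str:
    """Option A: derive the physical table name from the event month.
    A view can union these tables so readers never see the split."""
    return f"{base}_{ts:%Y_%m}"


def shard_for(key: str, n_shards: int = N_SHARDS) -> int:
    """Option B: a predictable hash makes write multiplexing trivial.
    md5 is used because it is stable across processes and restarts,
    unlike Python's built-in hash()."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


def target_table(key: str, ts: datetime) -> str:
    """Combine both: month for retention, hash for spreading writes."""
    return f"{monthly_table(ts)}_s{shard_for(key)}"
```

Because the shard is a pure function of the key and timestamp, any writer process can route a record independently, with no coordination required.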
Architectures focused on NoSQL
This list is shorter, but the solutions are more useful.
- F) Write the data into a NoSQL platform that claims infinite scaling. Test with an order of magnitude more data than your best guess at expected use.
- G) Write the data into a buffer first, e.g. RabbitMQ. Merge events in the buffer, then write the data into the shards in merged chunks.
- H) Redis or Hadoop: massively parallel processing at your convenience.
- I) Not for logs, but systems like RRDTool work very well for their intended purpose.
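The buffering idea in option G can be sketched without any particular queue technology. The `sink` callable below is a stand-in for the real shard write (an assumption for illustration); in practice it would be a consumer draining RabbitMQ and issuing one bulk insert per chunk.

```python
class BufferedWriter:
    """Sketch of option G: accumulate records and flush them to the
    shards in merged chunks instead of one write per event."""

    def __init__(self, sink, batch_size=100):
        self.sink = sink            # callable taking a list of records
        self.batch_size = batch_size
        self._buf = []

    def write(self, record):
        """Queue one record; flush automatically when the batch fills."""
        self._buf.append(record)
        if len(self._buf) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write any buffered records as a single merged chunk."""
        if self._buf:
            self.sink(list(self._buf))
            self._buf.clear()
```

Example use, collecting the chunks into a list instead of a database:

```python
batches = []
w = BufferedWriter(batches.append, batch_size=3)
for i in range(7):
    w.write(i)
w.flush()
# batches is now [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off is durability: records sitting in the buffer are lost on a crash, which is exactly why a persistent broker such as RabbitMQ is the better home for the buffer than process memory.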