What are hash tables?

This is an article for a meeting. I am primarily putting this up to be able to display the images. This is necessary, as to draw a tidy diagram with readable handwriting will take longer than I am supposed to spend during the contact time.


Both these images are creative common images hosted in wiki. The hosting URL is embedded.

I sketch this as (in bigO, which was very popular when they designed these things)
First variant: O( X + Y ) degrading to O( X + nY )

Second variant: O( X + Y ) degrading to O( X + 5Y ) ~ this is a guess on the worse bound.

where X is the cost to make the hash, and Y the cost of retrieving the data (may or may not be in current memory).

Discuss (this is verbal, so I have not written):

  • concept.
  • why different to arrays.
  • why different to general AVP structures structures (config files, “moron grade” NoSQL?).
  • low density use
  • high density use
  • buffering (d/b vs o/s)

Historical requirements

When hashes where being designed as an algorithm, computers where much slower. Secondly storage media on large amounts of data was considerably slower than it is currently. Tricks to avoid any type of sequential search where considered valuable. The algorithm was designed 1 in the early 1950s, before computers reliably had harddisks.
The basic idea of “do a extra step of computation” which would be a fixed cost, to void any other lookup overheads was appealing. There are a lot of possible algorithms to make a hash, some are outlined 2.

These days, expensive machines could just use a flat address space of 2^32 (signed) offsets, and an array. Which is faster and simpler to compute.


I pulled up some references as I only have 10m to talk about this. These articles are longer and more detailed.

Some common implementations

  • Python has dictionaries, see src
  • Perl has hashes, see 3 4 couldn't find a readable source inside reasonable time.
  • PHP uses associative arrays , which are close relatives of hashes ~ src
  • C + + hash table libraries 5 6 7