Tom Ritchford
3 min readJan 21, 2020

--

It depends on how you define “efficient”, but this statement is probably not correct.

“Efficiency” can be one of at least four things: memory usage; CPU usage; wall clock latency; and human engineering time.

Hash tables are O(1) in terms of CPU cycles (for both insertion and retrieval) and memory which is a very desirable feature. But you don’t get this for free — there’s significant memory overhead per node to get this effect and a significant amount of CPU just to construct the empty hash table in the first place.

In a table like this one — 26 integers indexed by a character — the memory overhead is certainly greater than the size of the table itself! So if the efficiency you are worrying about is memory, then it is certain that the hashtable is less efficient than a plain old array.

Again, retrieval is O(1) in terms of CPU usage but again, the constant is quite high. For small tables, binary search can be significantly faster.

(In this application, wall clock time will be directly set by the CPU time, so we can ignore it.)

In this case, where there are 26 fixed slots, a fixed size array of 26 elements will be more efficient in every way, because you don’t have to search at all.

A sketch, using Python as pseudocode:

results = [0] * 26
for letter in letters:
letter = letter.lower()
if 'a' <= letter <= 'z':
results[ord(letter) - ord('a')] += 1

The hash table implementation would look like:

results = {}
for letter in letters:
letter = letter.lower()
if 'a' <= letter <= 'z':
results[letter] = results.get(letter, 0) + 1

In the real world, if I were writing in Python and I cared about performance, I’d use array.array or even better np.array (as in np.zeros(26, ‘uint64’)).

If I really cared about performance, I’d use C++ or C or some other language that compiles into machine code. In that case, the small array would likely outperform the small hashtable by half order of magnitude or so.

I’d also add that a hash table might well end up taking more engineering time for this application because you probably want to print out the results in alphabetical order, and there’s no way to do that with a hash table short of “putting it into an array and sorting it”.

As you’ve probably guessed, I’ve done this for a while.

Indeed, I have interviewed hundreds of software engineering and programming candidates, and if they use the word “efficient” without any modification, it’s a tiny red flag that they might be sloppy thinkers. It wouldn’t on its own change how I rated a candidate, but I would ask further questions to explore it.

If someone says this is “faster” or “uses less memory” I always ask, “Are you sure?” — if only to see how they explain their reasoning. Most of them time, people say, “It’s obvious” and then I ask, “What about this edge case?”

Years ago, I worked in a very large software company once where someone did a totally hacky measure of the final size (at destruction) of tables that we used, just for fun.

The average size of tables was quite large. But the median size of tables was less than 10, and the mode, the most common individual size, was zero! That’s because most tables were tiny, short-lived ones.

So remember this anecdote. Often the simplest solution will have the best performance characteristics for your specific application!

--

--

No responses yet