⛏️ sproc: subprocesseses for subhumanses ⛏

Part 2 of the Coroniad

Tom Ritchford
4 min read · Apr 26, 2020

Introduction

This series of articles, aimed at any Python reader past the beginner level, describes tiny libraries that each do just one essential thing.

This library sproc has been part of my toolkit for almost a decade in various forms — mostly wrong forms.

What is sproc?

Running a command in a subprocess is very common — for example, if you want to run a shell script or an external binary from a Python program.

sproc is a tiny, single-file Python 3 library for communicating with subprocesses.

import sys
import sproc

CMD = 'my-unix-command input.txt'

# ok tells stdout lines (True) apart from stderr lines (False)
sub = sproc.Sub(CMD)
for ok, line in sub:
    if ok:
        use_data(line)
    else:
        handle_error(line)
sys.exit(sub.returncode)

Another way to do exactly the same thing:

returncode = sproc.call(CMD, use_data, handle_error)
sys.exit(returncode)

Why would you use it?

Python’s built-in subprocess is useful but tricky

Python has a built-in subprocess library which is very powerful but not entirely easy to use, particularly for processes which run a long time or produce a lot of output and use both stdout and stderr separately.

The most commonly used services like subprocess.run() and Popen.communicate() run the whole subprocess and return a result only after it’s done. But sometimes you want to report the results as you go. Worse, these calls buffer all of the output in memory, which is a problem if the subprocess produces a lot of it.

So much of the time you need to go to subprocess.Popen(). But it’s not trivial to get data from stdout and stderr separately. A reliable way to do it is to start two separate threads, one per stream; if you look at the source code of Popen.communicate(), its Windows implementation does exactly that, while the POSIX implementation multiplexes the two pipes with selectors.
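For a single stream, going to Popen is not much work. Here is a minimal sketch that collects stdout lines as they arrive. It assumes the child is just another Python interpreter; CHILD is a stand-in of my own, not anything from sproc.

```python
import subprocess
import sys

# A stand-in child process; -u keeps its output unbuffered.
CHILD = [sys.executable, '-u', '-c',
         'for i in range(3): print("line", i)']

lines = []
with subprocess.Popen(CHILD, stdout=subprocess.PIPE, encoding='utf8') as proc:
    # Iterating over the pipe yields each line as the child flushes it,
    # instead of waiting for the whole process to finish.
    for line in proc.stdout:
        lines.append(line.rstrip('\n'))

# Leaving the with block waits for the child, so returncode is set here.
returncode = proc.returncode
```

This only handles stdout. Reading stderr the same way at the same time is where it gets harder, because a loop that blocks on one pipe cannot drain the other.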

And there are traps…

Blue Giant Pick Axe Statue

Traps with subprocess.Popen()

These are not abstract traps: these are traps I have personally fallen into.

The command itself, cmd, needs to be a string if the argument shell=True, but a list of strings if shell=False — and you get unhelpful error messages or conceivably wrong behavior if you get this wrong.
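To make the two shapes concrete, here is a small sketch using subprocess.run(), which forwards these arguments to Popen. The commands themselves are placeholders chosen so that they run almost anywhere.

```python
import subprocess
import sys

# With shell=False (the default), the command is a list of strings.
as_list = subprocess.run(
    [sys.executable, '-c', 'print("from a list")'],
    capture_output=True, text=True,
)

# With shell=True, the command is one string that the shell parses.
as_string = subprocess.run(
    'echo from a string',
    shell=True, capture_output=True, text=True,
)

# The trap: on POSIX, a plain string with shell=False is taken to be
# the name of the program, spaces and all, so it is usually not found.
print(as_list.stdout, as_string.stdout)
```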

What’s more, the command is a Unicode string (str), but the data that’s returned is bytes by default. Oops!

Once you remember encodings, there’s an encoding parameter to subprocess.Popen(), except that it doesn’t work in versions of Python before 3.6 (though Python 3.5 is very close to end-of-life).
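The difference is easy to see side by side. This is a minimal sketch, again assuming a Python interpreter as a stand-in child process:

```python
import subprocess
import sys

CHILD = [sys.executable, '-c', 'print("hello")']

# By default the pipe is binary, so what comes back is bytes.
with subprocess.Popen(CHILD, stdout=subprocess.PIPE) as proc:
    raw, _ = proc.communicate()

# With encoding= (Python 3.6+), the pipe is opened in text mode
# and what comes back is str.
with subprocess.Popen(CHILD, stdout=subprocess.PIPE, encoding='utf8') as proc:
    text, _ = proc.communicate()

print(type(raw), type(text))
```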

And it’s really easy to ignore stderr and, like a physical pipe you forgot to cap, have it one day spew unexpected error messages to the console and confuse people, or even worse, have the errors mixed in with the data, resulting in a broken file.
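One way to cap that pipe is to give stderr a pipe of its own, so it can neither leak to the console nor get mixed into the data. A sketch, with a made-up child process that writes one line to each stream:

```python
import subprocess
import sys

CHILD = [sys.executable, '-c',
         'import sys; print("data"); print("warning!", file=sys.stderr)']

# stderr=subprocess.PIPE keeps error output separate from stdout.
# (stderr=subprocess.DEVNULL would discard it instead.)
result = subprocess.run(
    CHILD, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='utf8')

data_lines = result.stdout.splitlines()
error_lines = result.stderr.splitlines()
```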

Enter sproc

sproc gives you four ways to run a subprocess.

  • sproc.Sub(): iterate over lines as they come in from stdout and stderr.
  • sproc.call(): run subprocess, call callback functions, return returncode
  • sproc.run(): run subprocess to end, return stdout, stderr, returncode
  • sproc.log(): run subprocess, print stdout and stderr

What can you learn from this code?

shlex!

shlex is the tool for joining or splitting command lines. It took me many years to distill down my fumbling with command lines to these few instructions.
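For readers who have not met shlex, these are the three standard-library calls that do most of the work. This is a generic illustration, not a copy of the code in sproc, and the command line is made up.

```python
import shlex

# split() breaks a command line apart the way a POSIX shell would,
# respecting quotes.
parts = shlex.split('my-unix-command "My File.txt" input.txt')
# parts == ['my-unix-command', 'My File.txt', 'input.txt']

# quote() escapes one argument so it survives a trip through the shell.
quoted = shlex.quote('My File.txt')
# quoted == "'My File.txt'"

# join() (Python 3.8+) is the inverse of split().
command = shlex.join(parts)
# command == "my-unix-command 'My File.txt' input.txt"
```

So split() gets you from the shell=True form of a command to the shell=False form, and join() gets you back.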

Using a queue.Queue and an end marker to talk between threads

Since Python’s queue.Queue is thread-safe, it’s the logical way to do inter-thread communications, but Queue.get() can block indefinitely — so lots of ways to go wrong, resulting in deadlocks, hangs during shutdown, or missing data (if you forget to call Queue.get() when there is still data in it).

This little thread service routine makes sure that no matter what, the last thing on the queue is the “end marker”, in this case None. Once two Nones have arrived on the other end of the queue, the process is finished.
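The pattern can be sketched in a few dozen lines. This is my own minimal reconstruction of the idea, not the actual sproc source; read_stream, iterate and CHILD are names made up for the illustration.

```python
import queue
import subprocess
import sys
import threading

# A stand-in child process that writes one line to each stream.
CHILD = [sys.executable, '-c',
         'import sys; print("data"); print("oops", file=sys.stderr)']


def read_stream(stream, ok, q):
    """Put (ok, line) pairs on the queue, then always an end marker."""
    try:
        for line in stream:
            q.put((ok, line.rstrip('\n')))
    finally:
        # None is the end marker: it is queued even if reading fails.
        q.put(None)


def iterate(cmd):
    """Yield (ok, line) pairs: ok is True for stdout, False for stderr."""
    q = queue.Queue()
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, encoding='utf8') as proc:
        streams = ((proc.stdout, True), (proc.stderr, False))
        for stream, ok in streams:
            threading.Thread(
                target=read_stream, args=(stream, ok, q), daemon=True
            ).start()

        finished = 0
        while finished < len(streams):
            item = q.get()
            if item is None:
                finished += 1  # one reader thread is done
            else:
                yield item


# The two streams can interleave in any order, so sort for a stable result.
results = sorted(iterate(CHILD))
```

Because each reader thread queues its None in a finally clause, the consumer's blocking q.get() is guaranteed to see two end markers eventually, so it cannot wait forever on a queue that nobody will write to again.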

The thread-safe queue and end marker trick has a long and successful history, so bear it in mind.

Documentation without duplication

This was an experiment, but it came out well.

The functions and methods have a tiny set of arguments, repeated over and over again. Instead of repeating their documentation, I put it together from little pieces here. It saves a ton of duplication and avoids mistakes.
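One way such assembly could work is a decorator that appends shared argument descriptions to each docstring. The sketch below is hypothetical throughout: ARG_DOCS, document, and the two empty example functions are invented for illustration and are not how sproc is actually written.

```python
# Hypothetical argument descriptions, each written exactly once.
ARG_DOCS = {
    'cmd': 'cmd: the command to run, as a string or a list of strings',
    'out': 'out: a function called with each line from stdout',
    'err': 'err: a function called with each line from stderr',
}


def document(*arg_names):
    """Decorator that appends the shared argument docs to a docstring."""
    def decorator(func):
        lines = [func.__doc__.strip(), '', 'Arguments:']
        lines += ['  ' + ARG_DOCS[name] for name in arg_names]
        func.__doc__ = '\n'.join(lines)
        return func
    return decorator


@document('cmd', 'out', 'err')
def call_example(cmd, out, err):
    """Run a command, sending its output lines to callbacks."""


@document('cmd')
def run_example(cmd):
    """Run a command to completion and collect its output."""


print(call_example.__doc__)
```

Each description lives in one place, so fixing the wording of cmd fixes it in every function that takes a cmd.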

I also automatically write the README.rst documentation file from the Python source here. Again, no duplication, no mistakes — it’s always hard keeping the documentation up-to-date with the code otherwise.

Thanks for reading!
