⛏️ sproc: subprocesseses for subhumanses ⛏
Part 2 of the Coroniad
Introduction
This series of articles is aimed at any Python reader past the beginner level. Each article describes a tiny library that does just one essential thing.
This library, sproc, has been part of my toolkit for almost a decade in various forms, mostly wrong forms.
What is sproc?
Running a command in a subprocess is very common — for example, if you want to run a shell script or an external binary from a Python program.
sproc is a tiny, single-file Python 3 library for communicating with subprocesses.
import sys
import sproc

CMD = 'my-unix-command input.txt'

sub = sproc.Sub(CMD)
for ok, line in sub:
    if ok:
        use_data(line)
    else:
        handle_error(line)

sys.exit(sub.returncode)
Another way to do exactly the same thing:
returncode = sproc.call(CMD, use_data, handle_error)
sys.exit(returncode)
Why would you use it?
Python’s built-in subprocess is useful but tricky
Python has a built-in subprocess library which is very powerful but not entirely easy to use, particularly for processes which run a long time or produce a lot of output and use both stdout and stderr separately.
The most commonly used services like subprocess.run() and Popen.communicate() run the whole subprocess and return a result only after it’s done. But sometimes you want to report the results as you go, and worse, these calls buffer all the output in memory, so a subprocess that produces a lot of output can exhaust it.
So much of the time you need to go to subprocess.Popen(). But it’s not trivial to get data from stdout and stderr separately. A reliable way to do it is to start two separate threads, one per stream; if you look at the source code of Popen.communicate(), its Windows implementation does exactly that.
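To make "report the results as you go" concrete, here is a minimal sketch using only the standard library. It sidesteps the two-stream problem by merging stderr into stdout, so it is the easy case and not how sproc works; `stream_lines` and the stand-in child command are names made up for this illustration.

```python
import subprocess
import sys

# A stand-in child process (an assumption for this sketch): prints three lines.
CMD = [sys.executable, '-c', 'for i in range(3): print("line", i)']


def stream_lines(cmd):
    """Yield each line of output as soon as the subprocess produces it."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout: one pipe, no threads
        text=True,                 # decode to str (Python 3.7+)
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip('\n')
    # Leaving the with block closes the pipe and waits for the process.


lines = []
for line in stream_lines(CMD):
    print(line)  # handled immediately, not after the process ends
    lines.append(line)
```

As soon as you need to tell stdout and stderr apart, this single loop is no longer enough, because reading one pipe can block while the other one fills up.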
And there are traps…
Traps with subprocess.Popen()
These are not abstract traps: these are traps I have personally fallen into.
The command itself, cmd, needs to be a string if the argument shell=True, but a list of strings if shell=False, and you get unhelpful error messages or conceivably wrong behavior if you get this wrong.
What’s more, the command is a Unicode string, but the data that’s returned is byte strings by default. Oops!
Once you remember encodings, there’s an encoding parameter to subprocess.Popen(), except that it doesn’t work in versions of Python before 3.6 (though Python 3.5 is very close to end-of-life).
And it’s really easy to ignore stderr and, like a physical pipe you forgot to cap, have it one day spew unexpected error messages to the console and confuse people. Even worse, the errors can get mixed in with the data, resulting in a broken file.
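Here is a small sketch of those traps using plain subprocess, with no sproc involved. It uses subprocess.run(), which passes these arguments straight through to Popen. The child command is a stand-in invented for this example.

```python
import subprocess
import sys

# Stand-in child (an assumption): one line to stdout, one line to stderr.
CHILD = 'import sys; print("data"); print("oops", file=sys.stderr)'

# shell=False (the default): pass a list of strings, one per argument.
proc = subprocess.run(
    [sys.executable, '-c', CHILD],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,  # capped: stderr neither leaks nor mixes with the data
)
raw_out = proc.stdout  # no encoding given, so this is bytes

# With encoding= (Python 3.6+), both streams come back as str.
proc = subprocess.run(
    [sys.executable, '-c', CHILD],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    encoding='utf-8',
)
out, err = proc.stdout, proc.stderr

# shell=True: pass one string and let the shell split it.
shell_proc = subprocess.run(
    'echo hello', shell=True, stdout=subprocess.PIPE, encoding='utf-8'
)
```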
Enter sproc
sproc gives you four ways to run a subprocess.
- sproc.Sub(): iterate over lines as they come in from stdout and stderr
- sproc.call(): run subprocess, call back functions, return returncode
- sproc.run(): run subprocess to end, return stdout, stderr, returncode
- sproc.log(): run subprocess, print stdout and stderr
What can you learn from this code?
shlex!
shlex is the tool for joining or splitting command lines. It took me many years to distill my fumbling with command lines down to these few instructions.
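As a rough illustration of what shlex does with a command line, here is a sketch. It is not the code from sproc itself: `normalize` is a hypothetical helper, and the command with a space in its file name is made up for the example.

```python
import shlex

# Hypothetical command line, with a file name that contains spaces.
command = 'my-unix-command "My Cool File.txt" No-file.txt'

# Split a command line into arguments the way a POSIX shell would.
args = shlex.split(command)

# Join arguments back into one safely quoted string (Python 3.8+).
joined = shlex.join(args)


def normalize(cmd, shell=False):
    """Return cmd in the form Popen expects for this value of shell."""
    if shell:
        # shell=True wants a single string.
        return cmd if isinstance(cmd, str) else shlex.join(cmd)
    # shell=False wants a list of strings.
    return shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
```

With a helper like this, callers can pass either a string or a list and never have to think about which form goes with which value of shell.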
Using a queue.Queue and an end marker to talk between threads
Since Python’s queue.Queue is thread-safe, it’s the logical way to do inter-thread communications, but Queue.get() can block indefinitely, so there are lots of ways to go wrong: deadlocks, hangs during shutdown, or missing data (if you forget to call Queue.get() when there is still data in the queue).
This little thread service routine makes sure that, no matter what, the last thing each thread puts on the queue is the “end marker”, in this case None. Once two Nones have arrived on the other end of the queue, the process is finished.
The thread-safe queue and end marker trick has a long and successful history, so bear it in mind.
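The pattern can be sketched like this. It is a minimal version written for this article, not a copy of sproc’s code: `pump`, `iter_lines` and the stand-in child command are names invented here.

```python
import queue
import subprocess
import sys
import threading

# Stand-in child (an assumption): writes to both stdout and stderr.
CHILD = 'import sys; print("a"); print("b"); print("bad", file=sys.stderr)'


def pump(stream, ok, q):
    """Copy lines from stream onto q, always finishing with the end marker."""
    try:
        for line in stream:
            q.put((ok, line.rstrip('\n')))
    finally:
        q.put(None)  # sent no matter what, so the consumer cannot wait forever


def iter_lines(cmd):
    """Yield (ok, line) pairs: ok is True for stdout, False for stderr."""
    q = queue.Queue()
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    ) as proc:
        for stream, ok in ((proc.stdout, True), (proc.stderr, False)):
            threading.Thread(
                target=pump, args=(stream, ok, q), daemon=True
            ).start()

        finished = 0
        while finished < 2:  # one end marker per reader thread
            item = q.get()
            if item is None:
                finished += 1
            else:
                yield item


results = list(iter_lines([sys.executable, '-c', CHILD]))
```

The order within each stream is preserved, but how stdout and stderr lines interleave depends on timing, so don’t rely on it.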
Documentation without duplication
This was an experiment, but it came out well.
The functions and methods have a tiny set of arguments, repeated over and over again. Instead of repeating their documentation, I put it together from little pieces here. It saves a ton of duplication and avoids mistakes.
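One way to assemble docstrings from shared pieces is a small decorator. This is only a sketch of the general idea, not the mechanism sproc actually uses: the `document` decorator, the `ARGS` table and the wording of each description are all assumptions made for illustration.

```python
# Shared argument descriptions, written exactly once (hypothetical wording).
ARGS = {
    'cmd': 'cmd: the command to run, as a string or a list of strings',
    'out': 'out: a function called with each line from stdout',
    'err': 'err: a function called with each line from stderr',
}


def document(*names):
    """Decorator: append the shared descriptions of names to a docstring."""
    def decorator(func):
        lines = [(func.__doc__ or '').strip(), '', 'Arguments:']
        lines += ['  ' + ARGS[name] for name in names]
        func.__doc__ = '\n'.join(lines)
        return func
    return decorator


@document('cmd', 'out', 'err')
def call(cmd, out=None, err=None):
    """Run a subprocess and call back with each line."""


@document('cmd')
def run(cmd):
    """Run a subprocess to completion."""


print(call.__doc__)
```

Each description lives in one place, so fixing a typo in it fixes every docstring that uses it.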
I also automatically write the README.rst documentation file from the Python source here. Again, no duplication, no mistakes. It’s always hard to keep the documentation up-to-date with the code otherwise.
Thanks for reading!
If you want to read more:
- Click on Watch in the Coroniad repository (which only gets this series)
- Or follow me on Medium (which gets all my posts)