Make Make Make #
An esteemed colleague of mine at a previous job introduced me to Make.
Make is a simple build system that checks for the existence of particular files and, when they’re not found, creates them according to a very explicit set of defined rules.
Or as Mike Bostock puts it, “Makefiles are machine-readable documentation that make your workflow reproducible.”
Why not just use a shell script? Well, in contrast, Make is much smarter. It checks whether a file already exists and is up to date. If it is, Make skips building that file. This might not matter much if the files to be processed are small, but rebuilding everything becomes a real problem when the files are enormous or the computation is complex.
For a data scientist, whose work is oftentimes mostly data plumbing, Make is a super useful tool.
I’ve run processes before where, without the correct checks in place, the whole thing could take days to run. Yikes.
A little crash course #
Make documents workflows, but in reverse. It checks to see if the final desired file is present. If not, it looks at the immediately preceding file(s) and the action that turns them into the final file. If the preceding file(s) are not present either, it goes back another step. And so on until it reaches a point where the file(s) it needs are present, and then it works forward again through the waterfall build process. (In the world of agile, “waterfall” almost feels like a dirty word, but it accurately reflects how Make works.)
A Make script is usually saved as Makefile.
Within that script, there are a bunch of blocks of the form:
targetfile: sourcefile
	command
Where targetfile represents what you want to create, sourcefile represents what the targetfile is derived from (it can be one or more files), and command is what is run to create the targetfile from the sourcefile.
When I use Make, the command is often at most a few command lines for something simple, or an invocation of a bash script for something a bit more sophisticated (e.g., bash some_script.sh).
These blocks are stacked one after another, where the one at the top is representative of the final state of the build process. By default, running make with no arguments builds the first target in the file.
The whole thing might look something like:
file_3: file_2
	<some command that turns file_2 into file_3>
file_2: file_1
	<some command that turns file_1 into file_2>
file_1:
	<some command that creates file_1. This might be a download.>
To run the Makefile, navigate to the directory where the Makefile is present and run make.
Once Make has run and the targets are all present and up to date, running it again does nothing, which is good.
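To make the three-step example concrete, here is a minimal sketch that can be pasted into a terminal. It assumes GNU Make and a POSIX shell; the commands inside the rules (echo, sort, tr) are stand-ins I picked for illustration, not part of any real workflow. The Makefile is written with printf so that each recipe line is guaranteed to start with a real tab.

```shell
set -eu

# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Write the Makefile. printf expands \t into the literal tab
# that must start every recipe line.
{
  printf 'file_3: file_2\n'
  printf '\ttr a-z A-Z < file_2 > file_3\n'
  printf 'file_2: file_1\n'
  printf '\tsort file_1 > file_2\n'
  printf 'file_1:\n'
  printf '\t(echo pear; echo apple) > file_1\n'
} > Makefile

# First run: file_3 is missing, so Make walks back to file_1,
# creates it, then builds file_2 and finally file_3.
make

# file_3 now holds the sorted, upper-cased lines.
cat file_3

# Second run: every target exists and is up to date,
# so no recipe is executed.
make
```

On the second run, GNU Make should simply report that file_3 is up to date instead of repeating the commands.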
For a nice, concise overview of using Make:
Cleanup #
Sometimes, it’s useful to add a cleanup chunk to the Makefile to remove sets of files.
This block (or another variation) could be added to the bottom of the Makefile:
clean:
	rm *.<some file format>
To run the clean portion of the Makefile, run make clean.
For more details, check out the cleanup documentation.
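Here is a hedged sketch of a cleanup rule in action, assuming GNU Make. The data.csv target and the .csv extension are made up for the example. The .PHONY line is my addition: it tells Make that clean is a label rather than a file, so the rule still runs even if a file named clean happens to exist in the directory. I also use rm -f so the rule does not fail when there is nothing to delete.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# A one-step build plus a cleanup rule at the bottom.
{
  printf 'data.csv:\n'
  printf '\techo "a,b" > data.csv\n'
  printf '.PHONY: clean\n'
  printf 'clean:\n'
  printf '\trm -f *.csv\n'
} > Makefile

make          # builds data.csv
touch clean   # simulate a stray file that shares the rule name
make clean    # still runs, because clean is declared phony
ls            # data.csv is gone; Makefile and the stray file remain
```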
Always tabs #
Just a note (and this is non-debatable): in the world of Make, command lines under a target are always indented with a tab rather than spaces. Otherwise, Make stops with an error (GNU Make reports a “missing separator”).
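A quick way to see this for yourself, sketched under the assumption of GNU Make and a POSIX shell (hello.txt is a made-up target): write the same rule twice, once indented with four spaces and once with a tab, and record whether make succeeds each time.

```shell
set -u

workdir=$(mktemp -d)
cd "$workdir"

# Recipe indented with four spaces: Make cannot parse the file.
printf 'hello.txt:\n    echo hi > hello.txt\n' > Makefile
if make > spaces.log 2>&1; then spaces_ok=yes; else spaces_ok=no; fi

# Same rule, recipe indented with a tab: Make runs it.
printf 'hello.txt:\n\techo hi > hello.txt\n' > Makefile
if make > tabs.log 2>&1; then tabs_ok=yes; else tabs_ok=no; fi

echo "spaces: $spaces_ok, tabs: $tabs_ok"

# The error message from the failed attempt is kept in spaces.log.
cat spaces.log
```

The space-indented version should fail before running anything, while the tab-indented one creates hello.txt.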
The magic touch #
Make checks for dependencies based on modification time. If a sourcefile looks newer than a targetfile based on the modification time, Make will execute the command again. This might occur when unzipping or untarring a compressed file, because the extracted file keeps the older modification time recorded in the archive and so always looks older than the archive itself.
To avoid this, add a touch <targetfile> command after the extraction step to update the modification time of the freshly extracted file and avoid unnecessary executions.
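The sketch below reproduces the problem with tar and then shows the touch fix. It assumes GNU Make and a tar that restores modification times on extraction (the default for GNU tar and bsdtar); the file names are invented for the demo. The file being touched is the freshly extracted one, since that is the file whose timestamp looks too old. make -q is Make's question mode: it runs nothing and only reports, through its exit status, whether the target is up to date.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Build an archive whose member carries an old modification time.
echo "some data" > data.txt
touch -t 200001010000 data.txt
tar -cf archive.tar data.txt
rm data.txt

# Rule without touch: the extracted file keeps its year-2000
# timestamp, so it always looks older than archive.tar.
printf 'data.txt: archive.tar\n\ttar -xf archive.tar\n' > Makefile
make
if make -q; then without_touch=up-to-date; else without_touch=stale; fi

# Rule with touch: the extracted file gets a current timestamp.
printf 'data.txt: archive.tar\n\ttar -xf archive.tar\n\ttouch data.txt\n' > Makefile
make
if make -q; then with_touch=up-to-date; else with_touch=stale; fi

echo "without touch: $without_touch"
echo "with touch:    $with_touch"
```

Without the touch, Make would re-extract the archive on every run; with it, the second run has nothing to do.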
What Make doesn’t do #
- Make doesn’t check the contents of the generated files. That’s up to you.
- Make doesn’t check the source data. That’s up to you too.
- Make doesn’t rebuild a file that already exists unless something it depends on is newer. If the source file is data, and the data at the origin has updated, you’ll have to delete the file first, then re-run make. Adding removal of the source data file to the cleanup section of the Makefile might be handy.
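The last point is easy to trip over, so here is a small sketch of it, assuming GNU Make. The local file origin.txt stands in for data at a remote origin, and cp stands in for a download; both are placeholders for illustration. Because origin.txt is not listed as a dependency of raw.txt, Make has no way of noticing that it changed.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# origin.txt plays the role of the remote data source.
echo "version 1" > origin.txt
printf 'raw.txt:\n\tcp origin.txt raw.txt\n' > Makefile

make                            # "downloads" version 1
echo "version 2" > origin.txt   # the origin is updated

make                            # raw.txt exists, so nothing happens
stale=$(cat raw.txt)

rm raw.txt                      # delete the local copy first...
make                            # ...then Make fetches it again
fresh=$(cat raw.txt)

echo "before delete: $stale"
echo "after delete:  $fresh"
```

The local copy stays at version 1 until it is deleted, after which the next make picks up version 2.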
Play on words #
I’ve been throwing around the phrase Make Make make for years. It’s painfully punny, and I’m sure I’m not the only one that’s said it. Let’s break it down.
- The first make – a command
- The second make – the tool
- The third make – the desired outcome