This is a project for advanced beginner programmers.
The goal of this project is to help you make the leap from small practice exercises to complex, thorny programs like what you might encounter in your first future programming job. I’ve found an accessible open source library and have written 7 detailed feature requests for it. Your task is to read the library’s code, understand it, and write these features. You’ll get plenty of guidance along the way, but not too much, and you’ll have to do all the difficult bits yourself. I’ve written more about why and how I think this project will help you go from Advanced Beginner to Beginning Advanced here.
We’re going to be working with
bashplotlib, a Python library that displays ASCII graphs in the terminal. Created by Greg Lamp, it is small, approachable, and sensibly-written, whilst still having plenty of room for improvement in both its code and functionality.
We’re going to make
bashplotlib’s graphs look more attractive; add automated unit tests; and even add an entirely new graph type. We’ll learn how to search through a new codebase; when and when not to worry about code duplication; and how to safely change the way that a program is structured. After each task I’ve written pull requests (PRs) on GitHub to show you what I would do. You can use these PRs to help you out if you get stuck and to compare notes once you’re done.
Before we begin, let’s do some quick setup.
I’ve “forked” (git’s word for “made my own copy of”) the
bashplotlib repo and put it on my GitHub account. I did this so that no one else can sneak in and implement the 7 features in this project before you’ve had a chance to. By the time you read this, the code in the main
bashplotlib repo may have changed significantly, but my fork will stay exactly the same forever.
Here’s how to get
bashplotlib set up on your computer.
(optional but recommended)
I recommend that you fork your own version of my fork. This will allow you to practice using git to push your work back up to GitHub, and will also allow you to show the world what you’ve been working on. If you can’t or don’t want to do this then you can skip to step 3 and clone my version of
bashplotlib directly. Don’t worry about breaking anything; GitHub wouldn’t let you mess up my version even if you wanted to. To fork your own
bashplotlib and copy the files onto your computer:
cdto the directory you want to work in (eg.
bashplotlibin the previous section, clone it by running the command:
git clone https://github.com:$YOUR_GITHUB_NAME/bashplotlib.gitIf you didn't fork your own version, clone my version by running:
git clone https://github.com:robert/bashplotlib.gitIf you can't get cloning to work, despite having wasted your whole goddamn afternoon trying, you can download the code as a standard zip file by clicking on the Clone or download button on https://github.com/robert/bashplotlib.
The rigorous best practice for working with Python libraries is to use a tool called
virtualenv. If you want to try setting it up then you can follow the instructions here, but for now I recommend saving yourself the potential frustration and heartache of installing a new tool and following these simpler and perfectly valid steps instead:
cdinto the new directory you just created by your clone operation (eg.
# scratch.py from bashplotlib.scatterplot import plot_scatter x_coords = [-10,20,30] y_coords = [-10,20,30] width = 10 char = 'x' color = 'default' title = 'My Test Graph' plot_scatter( None, x_coords, y_coords, width, char, color, title)
python ./scratch.py. If the command prints a scatterplot then your setup is complete and you're ready to take on the world. If something goes wrong then congratulations, you get to practice your debugging skills to try and figure out what. Send me a message if you get stuck.
Let’s start small and get warmed up.
bashplotlib displays its graph titles in a moderately nice-looking box:
Let’s make these titles a bit more eye-catching. The way that the box edges smash into each other in the corners is too harsh. Let’s replace the corners with plus-signs so that the lines merge nicely with each other instead.
bashplotlib/scatterplot.py and have a look around. Where is the code that prints the title box?
Once you’ve found the box-printing code, you’ll see that it uses a function that is defined in another file in the
bashplotlib project. Which file? Use your text editor’s “Find in project” (or similar) feature, and locate the function’s code. (Hint - if you want to find where a Python function is defined, search for
"def my_function_name" instead of just
Once you believe you’ve found the box-generating function, make sure that you really have found it by changing some of the function’s code. For example, change the function so that the sides of the box are displayed using the
^ character instead of the
|. Run your scratch file (
cd to the root of the
bashplotlib project and run
python ./scratch.py), and make sure that the output is what you expect to be. If it is, you know you’re in the right place. If it isn’t, work out why.
Now that you’ve found the piece of code that builds the title box, you’re ready to change it so that it puts plus-signs in the box’s corners. Go ahead.
Make sure that you don’t accidentally break anything in the process. For example, make sure that the title box stays the same width as the graph, and that the text inside the box stays centered. If we had unit tests then we wouldn’t have to worry about accidentally breaking anything, because our tests would tell us if we had. We’re going to write some unit tests in section 4 of this project, but unfortunately we don’t have any yet. For now you’ll just have to be extra careful and manually check your work.
Done it? Congratulations! Keep going; add further embellishments as your personal taste dictates. For example:
title_alignparameter to the
…) instead of overflowing.
bashplotlib’s graphs are very clear, but the axes don’t have titles. This makes it hard to know what a graph is trying to tell you. Does that y-axis represent GDP growth or the amount of carbon it the atmosphere? Are the units percent, parts-per-million, or years-until-extinction?
Let’s add the ability to optionally pass in titles for the x- and y-axes of scatterplots, and display these titles like so:
In addition, let’s add
--ytitle options to the command line interface at the bottom of the
scatterplot.py file so that the command line tool can use them too.
Feel free to tidy up the code a little if it helps you understand it. I personally felt like
len(get_scale(xs, False, size) was being written out too many times, so I assigned its result to a variable at the top of the
_plot_scatter function and used that throughout the rest of the function.
Save space and show off your skills by printing the y-axis title vertically down the side of the graph, like so:
The README for
bashplotlib shows an example scatterplot that has its “0-axes” (lines showing where the axes cross 0) displayed in the middle of the graph:
However, it seems that somewhere the code that creates these 0-axes got lost (nb. I did not delete the code for teaching purposes, it really does seem to have got lost).
0-axes can be very useful for interpreting graphs, so let’s add them back in. Here’s how our 0-axes should work:
Owhere they cross
When I was working on this feature myself, I realized that I had to actually work out what the double-nested-for-loop inside the
_plot_scatter function was doing. Up until this point I hadn’t needed to, so I hadn’t bothered. I added a few comments and tweaked the code slightly to make it easier for me to understand. Feel free to do this yourself if it helps you.
Allow users to hide these new axes if they pass an optional
hide_zero_axes argument into the
plot_scatter function. 0-axes should be displayed by default, so the default value for this argument should be
Making changes to any codebase, especially a large, unfamiliar one, can be unnerving. You may have successfully added your new feature, but how do you know that it works correctly with every possible edge-case? And how do you know that you haven’t inadvertently broken an already existing feature in some subtle (or not-so-subtle) but important way?
The slow, unscalable, error-prone way to test your code is to manually run it using lots of different combinations of parameters, and laboriously check that everything looks correct. This is better than not testing at all, but does nothing to prevent someone else from unwittingly breaking your new feature in the future.
The solution is “unit tests”. A unit test is a block of code that calls functions from your main code, and verifies that these functions behave correctly. For example, if you wanted to check the correctness of a new function called
sum_digits, which takes a number and returns the sum of its digits (in base 10), one of your tests might be:
result = sum_digits(543) expected_result = 12 if result == expected_result: print("SUCCESS!") else: print("FAILURE!") print("Expected:") print(expected_result) print("Got:") print(result)
You might also write other tests that make sure that
sum_digits works correctly for other types of number, like negative numbers, decimals, and the number 0.
Unit tests make verifying the correctness of your code much quicker and more reliable - all you have to do is run your test file and make sure none of the tests fail. It also means that future people who work with your code (including yourself) will know how it’s meant to behave, and will be less likely to accidentally break it.
At Stripe, where I work, I would guess that the codebase has well over 10,000 tests, and maybe many more. This means that anyone can add new features and update existing ones, without being unduly concerned that they will unwittingly destroy the company.
bashplotlib does not currently have any tests. This is a perfectly reasonable choice by the author. Writing and maintaining tests takes time, and he probably felt that the library was small and experimental enough that he didn’t want the bother. Nonetheless, adding tests would make the library more reliable and easier to work with. While I was writing this project I noticed and fixed a small bug that would have been caught automatically if the library had had any tests.
However, right now
bashplotlib is not very “testable”. A piece of code’s “testability” is how easy it is to write tests for. There are many reasons why a piece of code might or might not be testable.
bashplotlib’s main problem is that it prints all of its output to the terminal piece-by-piece. Automated tests for the current version of
bashplotlib would have to somehow read in output that had previously been printed to the terminal. There are ways to do this (or something close to it), but they are somewhat complicated and fiddly. There is a better solution.
In this task we’ll first make
bashplotlib testable, and then we’ll write tests for it.
bashplotlib “testable” by changing the
_plot_scatter function so that it no longer prints the graph, but instead builds and returns the graph as a long string. The
plot_scatter function (which calls
_plot_scatter) will then be responsible for printing this long string to the terminal:
def _plot_scatter(...arguments, etc...) graph = "" # ...build a graph... graph += "+---+etc" # ...build the rest of the graph... return graph def plot_scatter(...arguments, etc...) # ...process arguments... graph_text = _plot_scatter(...arguments, etc...) print(graph_text)
This seemingly minor shuffling of responsibilities makes the
_plot_scatter function much easier to test. Now we can write test code that calls
_plot_scatter, stores the result in a variable, and compares this result to a pre-defined, expected value. For example, in a new file called
test.py, we can write:
# test.py from bashplotlib import _plot_scatter # Copy and paste the value for `expected_result` # from terminal output that you know is correct. expected_result = """+--------... | ...etc... | +--------...""" # Call `_plot_scatter` with the arguments that # you know will produce the `expected_result` # above. result = _plot_scatter(...arguments, etc...) if expected_result == result: print("SUCCESS!") else: print("FAILURE!") print("Expected:") print(expected_result) print("Found:") print(result)
Think of as many edge-cases as you can, and write individual tests for them all.
I suggest that you tackle this task using the following steps:
_plot_scatterso that it returns a string representing the graph, and doesn’t print anything
plot_scatterso that it stores and prints the output of
test.pyin the root of the library (alongside
_plot_scatterto make sure that it generates a normal graph correctly. Run it using
Your tests might have trouble interpreting color printing. Feel free to get rid of the color-printing functionality for now, or look at my version for suggestions on how to deal with it.
For another example of unit testing, see section 3 of Programming Projects for Advanced Beginners #2: Game of Life.
PyUnit, a Python testing framework, and rewrite our tests with it. Although our hand-crafted if-statements are perfectly serviceable, most test suites are written using a testing framework (like
PyUnit), which makes writing, running, and debugging tests much easier.
So far all of our work has been on improving scatterplots. But
bashplotlib can plot histograms too. Technically they’re not actually histograms; the y-axis in
bashplotlib’s graphs represents a raw count, whereas in a histogram the y-axis represents “probability density”. “Bar charts” would be a more accurate word for what
bashplotlib produces. But I think everyone still understands what the library is getting at, so let’s not say anything nasty on Twitter. For consistency I’m going to keep referring to them as “histograms”.
bashplotlib’s histograms generate their titles using the same
box_title function that we worked on in step 1, so they automatically benefit from our previous work. However, histograms also display a statistical summary of their data, in another box below the main graph. This box is generated by hand. We’ve already got a good way of printing text inside beautiful ASCII boxes, so let’s use this method here too. This will make our statistics look more elegant and consistent, and reduce code duplication
To work on histograms, you’ll have to call
plot_hist from your
scratch.py file. Try to set this up yourself.
We have a small problem -
box_text currently assumes that you only want to print a single line inside your box, but our statistics are spread out over multiple lines. To solve this snafu, update
box_text so that the argument it accepts is a list of strings, not just a single string. Have the function display each element of the list on a separate line, and surround all the lines with an ASCII box. We’ll then use the function to create our stats box like so:
center = ... # Defined earlier in the file summary_lines = [ "## Summary ##".center(center), ("observations: %d" % n).center(center), # ...etc... ] print(box_text(summary_lines, center))
To maintain compatibility with our existing code, update the places where
box_text is used for displaying titles so that the titles are passed in as a single element list, like this:
Because we’re now using the
box_text function to render the histogram statistics box, we no longer get an emphasizing line in between “Summary” and “observations” (see picture above). I thought that line looked quite fetching, so let’s bring it back. Update
box_text so that it takes an argument of a “list of lists”.
box_text should print out each line in the sub-lists, as normal, but in between each top level list it should print an ASCII line. For example, the code:
box_text([ ["Title"], ["Sub-title1", Sub-title2"], ["Point1", "Point2", "Point3"] ])
should produce the output:
+----------+ | Title | +----------+ |Sub-title1| |Sub-title2| +----------+ | Point1 | | Point2 | | Point3 | +----------+
Make sure all the other usages of
box_text still work.
Do the same thing that you did in task 2 - add optional titles to the x- and y-axes - but this time for histograms. It’s a lot easier the second time round, isn’t it?
You’ll probably find that you repeat code between the
histogram.py files. Repeating code like this should always make you feel a little bit uncomfortable, and should prompt you to consider ways to reduce or eliminate the duplication. That said, sometimes repeating code can be an entirely pragmatic choice.
In our situation, we could either try to wrap up our axis title code in some sort of
print_axis_title function that lives in
utils/helpers.py, or we could copy and paste and duplicate our code in both
histogram.py. For this task, I could personally go either way. On the one hand, it’s always good to reduce repeated code. But on the other hand, it’s not obvious to me that scatterplots and histograms will always and forever want to use the same logic for printing their axis labels. For example, what happens if one of them wants to print some extra characters on the same line as the axis label? We’d probably have to add something convoluted like optional
extra_chars_right parameters. Trying to combine two pieces of code that might not be as combine-able as they first appear can lead to some awkward approaches. Since the code for printing axis labels is relatively simple anyway, on this occasion I lean in favor of copying and pasting.
I’m still glad that I thought about it though.
I reckon you can tackle this one on your own.
I’ve got nothing. Call your parents and see how they’re doing.
This is my final feature request, and it’s a big one.
Let’s allow users to draw horizontal histograms:
This will help users display big, detailed histograms that are too tall for their short and stocky terminals, but that fit nicely when rotated.
This will be very hard. Take inspiration from
histogram.py, but not too much inspiration. Feel free to start from scratch if you prefer. I’d suggest that you:
histogram.pyinto a new
horizontal_histogram.pyand run its code in
scratch.py. Test it by using it to draw some normal, still-vertical histograms
Pull out as much duplicated code as you can from vertical and horizontal histograms, and move it into shared functions. These functions might perform common operations like bucketing data points and calculating scales.
bashplotlib/pie_chart.pyfile to put the logic in, and add a new
You’re done! Keep the momentum going - choose another open source project (use Google and this list for inspiration) and add experimental new features to it. Don’t worry much about actually submitting your work to the main project (unless you want to) - your goal should be to understand and experiment with new projects, not to produce something polished and perfect and ready to be deployed to production servers. Alternatively, try tackling the Programming Projects for Advanced Beginners series, where you build your own programs from scratch using a similar level of detailed (but not too detailed) guidance to that of this project.
I’d love to hear your feedback and find out how you got on. Send me an email or a Tweet (with a link to your GitHub repo if you have one) and I’ll send you back some insightful comments and a ASCII art trophy.