GSoC'21: Pre-Quarter Progress

“Well? Did you get it working?!”

Before I answer that question, if you’re missing the context, check out the last few lines of my previous blog post. I promise it won’t take you more than 30 seconds to get the whole problem!

With this short writeup, I intend to talk about what we did, and why we did what we did. XD

Ostrich Algorithm

Ring any bells? Remember OS (Operating Systems)? It’s one of the core CS subjects which I bunked then and regret now. (╥﹏╥)

The Wikipedia page has a 2-liner explanation if you have no idea what an Ostrich Algorithm is.. but I know most of y’all won’t bother clicking it XD, so here goes:

Ostrich algorithm is a strategy of ignoring potential problems by “sticking one’s head in the sand and pretending there is no problem”

An important thing to note: it is used when it is more cost-effective to allow the problem to occur than to attempt its prevention.

As you might’ve guessed by now, we ultimately ended up with the not-so-clean API (more on this later).

What was the problem?

The highest level overview of the problem was:

❌ fontTools -> buffer -> ttconv_with_buffer
✅ fontTools -> buffer -> tempfile -> ttconv_with_file

The first approach produced corrupted output, whereas the second approach worked fine. A point to note here: the first approach is the cleaner design, since it separates reading the file from parsing the data.

fontTools handles the Type42 subsetting for us, whereas ttconv handles the embedding.
ttconv_with_buffer is a modification of the original ttconv_with_file that lets it take a file buffer as input instead of a file path.
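To make the second (working) route a bit more concrete, here is a minimal sketch of the buffer -> tempfile -> path-based converter idea, using only the standard library. Note that `convert_from_path` and `convert_from_buffer` are made-up stand-ins for illustration (they are not ttconv's or Matplotlib's real functions), and the "font" is just a few fake bytes:

```python
import io
import os
import tempfile


def convert_from_path(path):
    """Hypothetical stand-in for a converter that only accepts a file path."""
    with open(path, "rb") as fh:
        return len(fh.read())  # pretend "conversion": report the byte count


def convert_from_buffer(buffer):
    """Spill an in-memory buffer to a temporary file, then convert by path."""
    # delete=False so the file can be reopened by name on every platform;
    # we remove it ourselves once the converter is done.
    with tempfile.NamedTemporaryFile(suffix=".ttf", delete=False) as tmp:
        tmp.write(buffer.getvalue())
        tmp_path = tmp.name
    try:
        return convert_from_path(tmp_path)
    finally:
        os.remove(tmp_path)


# Pretend these bytes are the subsetted font produced upstream.
subset_font = io.BytesIO(b"\x00\x01\x00\x00" + b"fake font data")
result = convert_from_buffer(subset_font)
```

The extra trip through the disk is the price of leaving the path-based code untouched, which is pretty much the ostrich trade-off described above.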

You might be tempted to say:

“Well, ttconv_with_buffer must be wrongly modified, duh.”

Logically, yes. ttconv was designed to work with a file-path and not a file-object (buffer), and modifying a codebase written in 1998 turned out to be a larger pain than we anticipated.

It came to a point where one of my mentors decided to implement everything in Python!

He actually did it too, but the effort needed to get that into production, or to fix the ttconv embedding, was ⋙ the effort of just going with the second method. That damn ostrich really helped us get out of that debugging hell. 🙃

Font Fallback - initial steps

Finally, we’re onto the second subgoal for the summer: Font Fallback!

To give an idea about how things work right now:

The user asks Matplotlib to use certain font families, specified by:

matplotlib.rcParams["font.family"] = ["list", "of", "font", "families"]

This list is used to search for available fonts on a user’s system.
However, in current (and previous) versions of Matplotlib:

As soon as a font is found by iterating over the font.family list, all text is rendered by that and only that font.

You can immediately see the problem with this approach: a single font can’t render any glyph it doesn’t contain, so every missing character comes out as a blank rectangle called “tofu” (read the first line here).

And that is exactly the first milestone! That is, parsing the entire list of font families to get an intermediate representation of a multi-font interface.
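Here is a toy sketch of the difference between today's first-match behaviour and per-character fallback. Everything in it is an assumption made for illustration: the font names and the `FONT_COVERAGE` table are invented, and none of this is Matplotlib's actual implementation.

```python
TOFU = "\u25a1"  # placeholder box for characters no font can draw

# Hypothetical fonts, each mapped to the set of characters it has glyphs for.
FONT_COVERAGE = {
    "LatinOnly": set("abcdefghijklmnopqrstuvwxyz "),
    "GreekOnly": set("αβγδ"),
}


def render_font_first(text, families):
    """Current behaviour: the first available family draws every character."""
    font = next(f for f in families if f in FONT_COVERAGE)
    return [(ch, font if ch in FONT_COVERAGE[font] else TOFU) for ch in text]


def render_with_fallback(text, families):
    """Fallback: each character uses the first family that covers it."""
    available = [f for f in families if f in FONT_COVERAGE]
    out = []
    for ch in text:
        font = next((f for f in available if ch in FONT_COVERAGE[f]), TOFU)
        out.append((ch, font))
    return out


families = ["NotInstalled", "LatinOnly", "GreekOnly"]
font_first = render_font_first("a β", families)
with_fallback = render_with_fallback("a β", families)
```

With these made-up fonts, `font_first` pairs "β" with tofu because "LatinOnly" was found first and has no Greek glyphs, while `with_fallback` hands "β" over to "GreekOnly".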

Don’t break, a lot at stake!

Imagine if you had the superpower to change the Python standard library’s internal functions, without consulting anybody. Let’s say you wanted to write a solution by hooking in and changing the implementation of, say, str("dumb") so that it returns:

>>> str("dumb")
["d", "u", "m", "b"]

Pretty “dumb”, right? xD

For your use case it might work fine, but it would also mean breaking the entire Python userbase’s workflow, not to mention the 1000000+ libraries that depend on the original functionality.

On a similar note, Matplotlib has a public API known as findfont(prop: str), which when given a string (or FontProperties) finds you the font on your system that best matches the given properties.

It is used throughout the library, as well as at multiple other places, including downstream libraries. Being naive as I was, I changed this function signature and submitted the PR. 🥲

I had an insightful discussion about this with my mentors, and soon enough raised another PR, which didn’t touch the findfont API at all.
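The general idea of "add something new instead of changing what exists" can be sketched like this. To be clear, this is only a toy: `find_font`, `find_all_fonts`, the registry, and the paths are all hypothetical names invented for this example, not what the PR actually contains.

```python
# Toy font registry; the names and paths are made up for illustration.
INSTALLED = {
    "DejaVu Sans": "/fonts/DejaVuSans.ttf",
    "Noto Sans CJK": "/fonts/NotoSansCJK.ttc",
}
DEFAULT = "/fonts/DejaVuSans.ttf"


def find_font(families):
    """Existing public function: still returns a single path, as before."""
    if isinstance(families, str):
        families = [families]
    for family in families:
        if family in INSTALLED:
            return INSTALLED[family]
    return DEFAULT


def find_all_fonts(families):
    """New, separate helper (hypothetical): one path per family found."""
    if isinstance(families, str):
        families = [families]
    paths = [INSTALLED[f] for f in families if f in INSTALLED]
    return paths or [DEFAULT]
```

Old callers of `find_font` keep getting a single path and never notice anything changed, while new multi-font code can opt into `find_all_fonts`.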

One last thing to note: Even if we do complete the first milestone, we wouldn’t be done yet, since this is just parsing the entire list to get multiple fonts..

We still need to migrate the library’s internal implementation from font-first to text-first!

But that’s for later. For now: [Image: Bernie Sanders meme with the text ‘I am once again thanking you for reading.’]