I was selected as an intern to work on the SciPy build system. In this blog post, I describe my journey through this 10-month-long internship at SciPy. I worked on a variety of topics: migrating the SciPy build system to Meson, cleaning up the public API namespaces, and adding Uarray support to SciPy submodules.
Meson Build System
The main reasons for switching to Meson include (in addition to distutils being deprecated):
- Much faster builds
- Support for cross-compiling
- Better build logs
- Easier to debug build issues
For more details on the initial proposal to switch to Meson, see scipy-13615.
I was initially selected to work on migrating the SciPy build system to Meson. I started by adding Meson build support for scipy.misc and scipy.signal. While working on this, we came across many build warnings that we wanted to fix, since they needlessly lengthened the build log and might have pointed to hidden bugs. I fixed these warnings, the majority of which came from deprecated NumPy C API calls.
- I also started benchmarking the Meson build with various optimization levels, during which I ended up finding some failing benchmark tests and tried to fix them.
- I implemented the dev.py interface that works in a similar way to runtests.py, but using Meson for building SciPy.
- I extended my work on the Meson build by writing Python scripts for checking the installation of all test files and .pyi files.
- I documented how to use dev.py, and use parallel builds and optimization levels with Meson.
- I added a Meson option to switch between BLAS/LAPACK libraries.
Meson build support, including all the above work, was merged into SciPy’s main branch around Christmas 2021. Meson will become the default build system in the upcoming 1.9.0 release.
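To give a flavor of the installation check mentioned in the list above, here is a minimal sketch. It is not the actual script that went into SciPy; the function names `expected_files` and `missing_from_install` are made up for illustration, and it assumes that test files live in directories named `tests` and that the install tree mirrors the layout of the source tree.

```python
from pathlib import Path


def expected_files(source_root):
    """Collect relative paths of test files and .pyi stubs under source_root.

    Assumption: anything inside a directory named "tests" counts as a test file.
    """
    root = Path(source_root)
    found = set()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        is_test = "tests" in rel.parts[:-1]  # some parent directory is "tests"
        is_stub = path.suffix == ".pyi"
        if is_test or is_stub:
            found.add(rel)
    return found


def missing_from_install(source_root, install_root):
    """Return sorted POSIX-style paths that exist in the source tree
    but were not copied to the install tree."""
    install = Path(install_root)
    return sorted(
        rel.as_posix()
        for rel in expected_files(source_root)
        if not (install / rel).is_file()
    )
```

A check like this can run right after the install step and fail the build if the returned list is not empty, which catches files that were forgotten in the build definitions.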
Making cleaner public namespaces
What’s the issue?
“A basic API design principle is: a public object should only be available from one namespace. Having any function in two or more places is just extra technical debt, and with things like dispatching on an API or another library implementing a mirror API, the cost goes up.”
    >>> from scipy import ndimage
    >>> ndimage.filters.gaussian_filter is ndimage.gaussian_filter  # :(
    True
The API reference docs of SciPy define the public API. However, SciPy still had some submodules that were accidentally public because their names lacked a leading underscore.
I worked on cleaning the public namespaces for a couple of months by carefully adding underscores to the names of the .py files that were not meant to be public and adding deprecation warnings that are shown if anyone tries to access them through the old names.
    >>> from scipy import ndimage
    >>> ndimage.filters.gaussian_filter is ndimage.gaussian_filter
    <stdin>:1: DeprecationWarning: Please use `gaussian_filter` from the `scipy.ndimage` namespace, the `scipy.ndimage.filters` namespace is deprecated.
    True
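One way to produce a warning like the one above is a module-level `__getattr__` (PEP 562) on a stand-in module that forwards to the real namespace. The sketch below illustrates that technique and is not the exact code that went into SciPy; `make_deprecated_namespace` is a made-up helper, and the standard-library `math` module stands in for the real namespace so the example runs without SciPy.

```python
import math
import types
import warnings


def make_deprecated_namespace(old_name, new_module, public_names):
    """Build a stand-in module that forwards attribute access to
    new_module and emits a DeprecationWarning on every access."""
    shim = types.ModuleType(old_name)

    def __getattr__(name):
        # Only names that used to be public are forwarded.
        if name not in public_names:
            raise AttributeError(
                f"module {old_name!r} has no attribute {name!r}")
        warnings.warn(
            f"Please use `{name}` from the `{new_module.__name__}` namespace, "
            f"the `{old_name}` namespace is deprecated.",
            category=DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return getattr(new_module, name)

    # PEP 562: a function named __getattr__ in the module's namespace is
    # called when normal attribute lookup on the module fails.
    shim.__getattr__ = __getattr__
    return shim


# Hypothetical old namespace "math.legacy" that forwards to `math`.
legacy = make_deprecated_namespace("math.legacy", math, {"sqrt"})
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    same_object = legacy.sqrt is math.sqrt  # forwarded to the same function
```

The old name keeps working for existing users, while the warning tells them where the object now lives, so the duplicate namespace can eventually be removed.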
Adding Uarray support
“SciPy adopted uarray to support a multi-dispatch mechanism with the goal being: allow writing backends for public APIs that execute in parallel, distributed or on GPU.”
For about the last four months, I worked on adding Uarray support to SciPy submodules. I do recommend reading this blog post by Anirudh Dagar covering the motivation and actual usage of
uarray. I picked up the following submodules for adding
At the same time, in order to show a working prototype, I also added
uarray backends in CuPy to the following submodules:
The pull requests contain links to Colab notebooks which show these features in action.
What does usage of such a backend look like?
    import scipy
    import cupy as cp
    import numpy as np
    from scipy.linalg import inv, set_backend
    import cupyx.scipy.linalg as _cupy_backend

    x_cu, x_nu = cp.array([[1., 2.], [3., 4.]]), np.array([[1., 2.], [3., 4.]])
    y_scipy = inv(x_nu)
    with set_backend(_cupy_backend):
        y_cupy = inv(x_cu)
- The “switch to Meson” project is nearing its completion. One of the final issues was to allow building wheels with the
- The PRs opened for adding uarray support are still under heavy discussion, and the main aim will be to get them merged as soon as possible once we have reached a concrete decision.
Things to remember
- Patience: Setting up a new project always takes some time. We might need to update or fix the system libraries and resolve the errors gradually.
- Learning: Learning new things was one of the key parts of the internship. I was completely new to build systems and GPU libraries.
I am very grateful to Ralf Gommers for providing me with this opportunity and believing in me. His guidance, support and patience played a major role throughout the internship. I am also thankful to the whole SciPy community for helping me with the PR reviews and providing essential feedback. Also, huge thanks to Gagandeep Singh for always being a part of this wonderful journey.
In a nutshell, I will remember this experience as: Ralf Gommers has boosted my career by millions!