[!NOTE] This section is based heavily on Ned Batchelder’s excellent article and PyCon 2014 talk Getting Started Testing and the corresponding chapter of Python 102.
How can you write reusable and trustworthy code?
After making changes to a program, how do you ensure that it will still give the same answers as before?
How can you make finding and fixing bugs a simple, fun, and minimally frustrating experience?
These seemingly unrelated questions all have the same answer: automated testing.
For scalar \(t\), consider the parametrized matrix equation
\[A(t)f = x\]
and consider its solution as a function of \(t\), i.e. \(f(t) = A(t)^{-1}x\). If \(A(t)\) is differentiable and invertible for all \(t\), then there's a cool matrix identity which tells us that
\[\frac{d}{dt} A(t)^{-1} = -A(t)^{-1} A'(t) A(t)^{-1}.\]
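This identity follows from differentiating the relation \(A(t)\,A(t)^{-1} = I\) with the product rule,
\[0 = \frac{d}{dt}\bigl(A(t)A(t)^{-1}\bigr) = A'(t)\,A(t)^{-1} + A(t)\,\frac{d}{dt}A(t)^{-1},\]
and solving for \(\frac{d}{dt}A(t)^{-1}\).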
Hence
\[\frac{d}{dt} f(t) = \frac{d}{dt}\!\left(A(t)^{-1} x\right) = -A(t)^{-1} A'(t) A(t)^{-1} x.\]
For a concrete example, let
\[A(t) = \begin{pmatrix} 1+t & 0 \\ 0 & 1-t \end{pmatrix}.\]
Then
\[f(x; t) = \begin{pmatrix} \dfrac{x_1}{1+t} \\ \dfrac{x_2}{1-t} \end{pmatrix}, \qquad \frac{d}{dt} f(x; t) = \begin{pmatrix} -\dfrac{x_1}{(1+t)^2} \\ \dfrac{x_2}{(1-t)^2} \end{pmatrix}.\]
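If you want to sanity-check the hand derivation itself before writing any NumPy code, a small SymPy sketch can do it. (This assumes SymPy is available in your environment; it is not otherwise used in this lesson.)

import sympy as sp

t, x1, x2 = sp.symbols("t x1 x2")
A = sp.Matrix([[1 + t, 0], [0, 1 - t]])
x = sp.Matrix([x1, x2])
f = A.inv() * x  # f(x; t) = A(t)^{-1} x

# Each printed expression should be equivalent to the hand-derived entries:
# x1/(1 + t), x2/(1 - t) for f, and -x1/(1 + t)**2, x2/(1 - t)**2 for df/dt.
print(sp.simplify(f[0]), sp.simplify(f[1]))
print(sp.simplify(f[0].diff(t)), sp.simplify(f[1].diff(t)))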
Here is an initial implementation based on a hand derivation. However, I've made a critical error somewhere in here! There is at least one bug in this code that we need to find and fix. On your own, test the functions for various inputs and compare the results with the expected output.
matrix_derivative_v1.py
import numpy as np
from numpy.typing import NDArray
def A(t: float) -> NDArray[np.float64]:
    """Return the matrix A(t)."""
    return np.array([[1.0 + t, 0.0], [0.0, 1.0 - t]])
def dA_dt(t: float) -> NDArray[np.float64]:
    """Return the derivative of A(t) with respect to t."""
    return np.array([[-1.0, 0.0], [0.0, 1.0]])
def f(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    """Compute f(x; t) = A(t)^{-1} x."""
    return np.linalg.solve(A(t), x)
def df_dt(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    """Compute d/dt f(x; t) using d(A^{-1})/dt = -A^{-1} A' A^{-1}."""
    return -np.linalg.solve(A(t), dA_dt(t) @ f(x, t))

The quickest way to test a mathematical claim is often to try a few simple values in an interactive Python session:
>>> import numpy as np
>>> x = np.array([2.0, 3.0])
>>> f(x, 0.0)
array([2., 3.])
>>> df_dt(x, 0.0)
array([ 2., -3.])
The value of \(f(x; 0)\) is correct, since \(A(0) = I\). But the derivative is already suspicious: from the formula above we expect \((-2, 3)\), not \((2, -3)\).
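Indeed, setting \(t = 0\) in the hand-derived formula gives
\[\left.\frac{d}{dt} f(x; t)\right|_{t=0} = \begin{pmatrix} -x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -2 \\ 3 \end{pmatrix}.\]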
Interactive testing is useful, but it has the same limitations here as everywhere else. You have to remember what you tried, you have to rerun everything manually after each change, and you still have to inspect the results yourself.
A much better approach is to put the checks into a script:
test_matrix_derivative_v1.py
import numpy as np
def A(t: float) -> np.ndarray:
return np.array([[1.0 + t, 0.0], [0.0, 1.0 - t]])
def dA_dt(t: float) -> np.ndarray:
return np.array([[-1.0, 0.0], [0.0, 1.0]])
def f(x: np.ndarray, t: float) -> np.ndarray:
return np.linalg.solve(A(t), x)
def df_dt(x: np.ndarray, t: float) -> np.ndarray:
return -np.linalg.solve(A(t), dA_dt(t) @ f(x, t))
x = np.array([2.0, 3.0])
t = 0.2
print("f(x; t) =", f(x, t))
print("df_dt(x, t) =", df_dt(x, t))Now we can run the same checks whenever we want:
$ python test_matrix_derivative_v1.py
f(x; t) = [1.66666667 3.75 ]
df_dt(x, t) = [ 1.38888889 -4.6875 ]

This is more reproducible, but we still have to inspect the output and decide for ourselves whether the numbers are correct.
For mathematics-heavy code, assertions are especially useful because we usually know what the result of a small reference calculation should be.
The assert statement checks whether a condition is true. If it is false, Python raises an AssertionError:
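For example, in an interactive session a passing assertion is silent, while a failing one stops execution:

>>> assert 2 + 2 == 4          # condition is true: nothing happens
>>> assert 2 + 2 == 5          # condition is false: AssertionError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> assert 2 + 2 == 5, "arithmetic is broken"   # optional message after a comma
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: arithmetic is broken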
We can now replace printed output with a statement of the expected answer:
test_matrix_derivative_v2.py
import numpy as np
def A(t: float) -> np.ndarray:
return np.array([[1.0 + t, 0.0], [0.0, 1.0 - t]])
def dA_dt(t: float) -> np.ndarray:
return np.array([[-1.0, 0.0], [0.0, 1.0]])
def f(x: np.ndarray, t: float) -> np.ndarray:
return np.linalg.solve(A(t), x)
def df_dt(x: np.ndarray, t: float) -> np.ndarray:
return -np.linalg.solve(A(t), dA_dt(t) @ f(x, t))
x = np.array([2.0, 3.0])
t = 0.0
assert np.allclose(df_dt(x, t), np.array([-2.0, 3.0]))

If we run this script,
$ python test_matrix_derivative_v2.py
Traceback (most recent call last):
File "test_matrix_derivative.py", line 8, in <module>
assert np.allclose(df_dt(x, t), np.array([-2.0, 3.0]))
AssertionError

we immediately learn that the claimed derivative is inconsistent with a simple hand calculation at \(t = 0\).
When testing a derivative, it is often useful to have a second way to compute it. One common option is a finite-difference approximation:
\[\frac{d}{dt} f(x; t) \approx \frac{f(x; t+h) - f(x; t-h)}{2h}.\]
This is not an exact formula, but for a small value of \(h\) it provides an independent numerical check: the truncation error of the centered difference scales like \(h^2\), so with \(h = 10^{-6}\) the approximation should agree with the analytic derivative to well within the tolerances used below.
Here is a script that compares the analytic derivative with both a hand-computed formula and a centered finite difference:
test_matrix_derivative_v3.py
import numpy as np
from numpy.typing import NDArray
def A(t: float) -> NDArray[np.float64]:
    return np.array([[1.0 + t, 0.0], [0.0, 1.0 - t]])
def dA_dt(t: float) -> NDArray[np.float64]:
    return np.array([[-1.0, 0.0], [0.0, 1.0]])
def f(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    return np.linalg.solve(A(t), x)
def df_dt(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    return -np.linalg.solve(A(t), dA_dt(t) @ f(x, t))
def finite_difference_df_dt(
    x: NDArray[np.float64], t: float, h: float = 1.0e-6
) -> NDArray[np.float64]:
    return (f(x, t + h) - f(x, t - h)) / (2.0 * h)
x = np.array([2.0, 3.0])
t = 0.2
assert np.allclose(df_dt(x, t), np.array([-2.0 / 1.2**2, 3.0 / 0.8**2]))
assert np.allclose(df_dt(x, t), finite_difference_df_dt(x, t), atol=1.0e-8)

$ python test_matrix_derivative_v3.py
Traceback (most recent call last):
File "test_matrix_derivative.py", line 13, in <module>
assert np.allclose(df_dt(x, t), np.array([-2.0 / 1.2**2, 3.0 / 0.8**2]))
AssertionError

Again, the script stops at the first failure. That is enough to tell us there is a problem, but not enough to summarize all the checks we care about.
A test runner takes a collection of tests, executes them all, and then reports which passed and which failed. A very popular test runner for Python is pytest.
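pytest is a third-party package, so it has to be available in your environment before you can use it. Depending on how your project is set up, that might mean, for example:

$ pip install pytest        # in a plain virtual environment
$ uv add --dev pytest       # in a uv-managed project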
To use pytest, we rewrite each check as a separate test function:
test_matrix_derivative_v4.py
import numpy as np
from numpy.typing import NDArray
def A(t: float) -> NDArray[np.float64]:
    return np.array([[1.0 + t, 0.0], [0.0, 1.0 - t]])
def dA_dt(t: float) -> NDArray[np.float64]:
    return np.array([[-1.0, 0.0], [0.0, 1.0]])
def f(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    return np.linalg.solve(A(t), x)
def df_dt(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    return -np.linalg.solve(A(t), dA_dt(t) @ f(x, t))
def finite_difference_df_dt(
    x: NDArray[np.float64], t: float, h: float = 1.0e-6
) -> NDArray[np.float64]:
    return (f(x, t + h) - f(x, t - h)) / (2.0 * h)
def test_f_at_zero() -> None:
x = np.array([2.0, 3.0])
assert np.allclose(f(x, 0.0), np.array([2.0, 3.0]))
def test_df_dt_matches_formula_at_zero() -> None:
x = np.array([2.0, 3.0])
assert np.allclose(df_dt(x, 0.0), np.array([-2.0, 3.0]))
def test_df_dt_matches_finite_difference() -> None:
x = np.array([2.0, 3.0])
t = 0.2
    assert np.allclose(df_dt(x, t), finite_difference_df_dt(x, t), atol=1.0e-8)

If you are already using SciPy, it also provides tools for derivative checks. In particular, scipy.optimize.check_grad can be useful when your problem can be phrased in terms of gradients of scalar-valued functions. Here we will keep the test self-contained and write the expected formula directly.
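For reference, here is a rough sketch of how check_grad could be wired up for this example by checking one component of \(f\) at a time, reusing the f and df_dt defined above. It assumes SciPy is installed, and the helper names g and g_grad are purely illustrative.

import numpy as np
from scipy.optimize import check_grad

x = np.array([2.0, 3.0])

def g(t_arr: np.ndarray, i: int) -> float:
    # Scalar view: the i-th component of f(x; t), with t packed in a length-1 array.
    return f(x, t_arr[0])[i]

def g_grad(t_arr: np.ndarray, i: int) -> np.ndarray:
    # The corresponding gradient with respect to t_arr (also length 1).
    return np.array([df_dt(x, t_arr[0])[i]])

for i in range(2):
    # check_grad returns the norm of the difference between g_grad and a
    # finite-difference estimate; it should be tiny for a correct derivative.
    print(i, check_grad(g, g_grad, np.array([0.2]), i))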
To run the tests, use the project environment:
$ uv run pytest code/test_matrix_derivative_v4.py
collected 3 items
code/test_matrix_derivative_v4.py .FF [100%]
=================================== FAILURES ===================================
______________________ test_df_dt_matches_formula_at_zero ______________________
def test_df_dt_matches_formula_at_zero():
x = np.array([2.0, 3.0])
> assert np.allclose(df_dt(x, 0.0), np.array([-2.0, 3.0]))
E assert False
E + where False = <function allclose at ...>(array([ 2., -3.]), array([-2., 3.]))
code/test_matrix_derivative_v4.py:17: AssertionError
_____________________ test_df_dt_matches_finite_difference _____________________
def test_df_dt_matches_finite_difference():
x = np.array([2.0, 3.0])
t = 0.2
> assert np.allclose(df_dt(x, t), finite_difference_df_dt(x, t), atol=1.0e-8)
E assert False
E + where False = <function allclose at ...>(array([ 1.38888889, -4.6875 ]), array([-1.38888889, 4.6875 ]), atol=1e-08)
code/test_matrix_derivative_v4.py:23: AssertionError
========================= 2 failed, 1 passed in 0.07s ==========================

The report is already informative. One test passed, so our function evaluation is fine. Two derivative tests failed, and both failures point to the same issue: the expected vector is the negative of what our code returned.
Now that we know how to run tests, which tests are actually useful for a derivative like this?
Arbitrary choices of \(x\) and \(t\) are less helpful than cases that make the structure of the mathematics visible. For this example, useful tests include:

- a value of \(t\) where the answer is obvious, such as \(t = 0\), where \(A(0) = I\) and therefore \(f(x; 0) = x\);
- a point where the derivative is known in closed form, so the code can be compared against an explicit hand-computed vector;
- a generic value of \(t\), where the analytic derivative is compared against an independent finite-difference approximation.

These are the kinds of tests included in the pytest file above.
The tests suggest that the code used
\[+A(t)^{-1} A'(t) A(t)^{-1} x\]
instead of
\[-A(t)^{-1} A'(t) A(t)^{-1} x.\]
In other words, the computed derivative has the wrong overall sign. Looking at the source, the culprit is dA_dt: it returns \(-A'(t)\) instead of \(A'(t)\), which has the same effect as dropping the overall minus sign in the hand derivation.
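At \(t = 0\) this reproduces exactly the output we observed:
\[-A(0)^{-1}\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}A(0)^{-1}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = -\begin{pmatrix} -2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ -3 \end{pmatrix}.\]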
Here is the corrected implementation:
matrix_derivative_v2.py
import numpy as np
from numpy.typing import NDArray
def A(t: float) -> NDArray[np.float64]:
    """Return the matrix A(t)."""
    return np.array([[1.0 + t, 0.0], [0.0, 1.0 - t]])
def dA_dt(t: float) -> NDArray[np.float64]:
    """Return the derivative of A(t) with respect to t."""
    return np.array([[1.0, 0.0], [0.0, -1.0]])
def f(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    """Compute f(x; t) = A(t)^{-1} x."""
    return np.linalg.solve(A(t), x)
def df_dt(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    """Compute the derivative using d(A^{-1})/dt = -A^{-1} A' A^{-1}."""
    return -np.linalg.solve(A(t), dA_dt(t) @ f(x, t))

If we update the tests to use the corrected implementation, we obtain
test_matrix_derivative_v5.py
import numpy as np
from numpy.typing import NDArray
def A(t: float) -> NDArray[np.float64]:
    return np.array([[1.0 + t, 0.0], [0.0, 1.0 - t]])
def dA_dt(t: float) -> NDArray[np.float64]:
    return np.array([[1.0, 0.0], [0.0, -1.0]])
def f(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    return np.linalg.solve(A(t), x)
def df_dt(x: NDArray[np.float64], t: float) -> NDArray[np.float64]:
    return -np.linalg.solve(A(t), dA_dt(t) @ f(x, t))
def finite_difference_df_dt(
    x: NDArray[np.float64], t: float, h: float = 1.0e-6
) -> NDArray[np.float64]:
    return (f(x, t + h) - f(x, t - h)) / (2.0 * h)
def test_f_at_zero() -> None:
x = np.array([2.0, 3.0])
assert np.allclose(f(x, 0.0), np.array([2.0, 3.0]))
def test_df_dt_matches_formula_at_zero() -> None:
x = np.array([2.0, 3.0])
assert np.allclose(df_dt(x, 0.0), np.array([-2.0, 3.0]))
def test_df_dt_matches_finite_difference() -> None:
x = np.array([2.0, 3.0])
t = 0.2
    assert np.allclose(df_dt(x, t), finite_difference_df_dt(x, t), atol=1.0e-8)

and the test runner now reports
$ uv run pytest code/test_matrix_derivative_v5.py
collected 3 items
code/test_matrix_derivative_v5.py ... [100%]
============================== 3 passed in 0.10s ===============================

This is exactly why tests are useful: they give us a way to check that the bug is fixed and that we did not break the rest of the calculation while fixing it.
Software testing is a vast topic, and there are many levels and types of tests.
For scientific and research software, the focus of testing efforts is primarily:
Test-driven development (TDD) is the practice of writing tests for a function or method before actually writing any code for that function or method. The TDD process is to:

1. write a test that describes the behaviour you want and watch it fail,
2. write just enough code to make the test pass,
3. refactor while keeping all tests passing, and
4. repeat for the next piece of behaviour.
Proponents of TDD suggest that this results in better code. Whether or not TDD sounds appealing to you, writing tests should be part of your development process and never an afterthought. In the process of writing tests, you often come up with new corner cases for your code and realize better ways to organize it. The result is usually code that is more modular, more reusable, and of course more testable than if you did not do any testing.
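As a tiny illustration of the first two steps, imagine we want a helper that computes a relative error. (The name relative_error and its behaviour are hypothetical; it is not part of the lesson code.) In TDD we would write the test first, watch it fail because the function does not exist yet, and only then implement it:

import numpy as np

# Step 1: write the test first. Running pytest at this point fails,
# because relative_error has not been written yet.
def test_relative_error() -> None:
    assert np.isclose(relative_error(expected=2.0, actual=2.1), 0.05)

# Step 2: write just enough code to make the test pass.
def relative_error(expected: float, actual: float) -> float:
    return abs(actual - expected) / abs(expected)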
More tests are always better than fewer, and your code should have as many tests as you are willing to write. That being said, some tests are more useful than others. Designing a useful suite of tests is a challenge in itself, and it helps to keep the following in mind when growing tests: