software

This is a brief portfolio of software development projects I’ve been involved in, including both research software and proprietary software for my current role at the Natural History Museum.

R packages

  • predictsr is an R package that loads the PREDICTS database into R data frames and caches them to disk. Published on CRAN.
  • ssmooth is an R package for smoothing spatial data using statistical methods. The core smoothing algorithms are written in C++. Published on CRAN.

Selected proprietary work

  • BII codebase (NHM): R packages to ingest, transform, and model the PREDICTS database, which is used to generate the Biodiversity Intactness Index (BII). Built on the R geospatial and modelling stack: terra, sf, lme4, etc.
  • BII HPC pipeline (NHM): HPC pipelines to generate global biodiversity data for the BII. Built with targets and Docker containers, running on AWS EC2.
  • BII QA pipeline (NHM): runs QA checks on the BII data, producing visualisations and summary statistics. Standard geospatial data processing and plotting in R: terra, sf, ggplot2, etc.

Research software

  • statFEM code to accompany these papers (FEM code is implemented with FEniCS):
    • The 1D case (including the internal wave experimental data) is housed at statkdv-paper. This code computes the posterior (filtering) distribution using extended and ensemble Kalman filters.
    • The 2D case is located here. This also computes the posterior filtering distribution using a low-rank extended Kalman filter algorithm.
  • ula-statfem: code to explore Langevin dynamics samplers with statFEM. The accompanying paper is published in SIAM/ASA JUQ. Includes the sfmcmc package, which implements several MCMC samplers, including the unadjusted Langevin algorithm (ULA; see this classic paper), the Metropolis-adjusted Langevin algorithm (MALA), and preconditioned Crank-Nicolson.
  • statfenics: a set of tools to help when doing statistics with FEniCS. Includes functions to build Gaussian process inference on top of FEniCS, and utilities to aid in interpolating finite element solutions.
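
For readers unfamiliar with ULA, the update it iterates is x_{k+1} = x_k + h ∇log π(x_k) + sqrt(2h) ξ_k, with ξ_k standard normal. Below is a minimal one-dimensional sketch in plain Python. It is illustrative only and is not taken from sfmcmc; the function name ula_sample, its parameters, and the standard normal target are assumptions chosen for the example.

```python
import math
import random


def ula_sample(grad_log_density, x0, step, n_steps, seed=0):
    """Run a single ULA chain (hypothetical helper, not the sfmcmc API).

    Each iteration takes a gradient step on the log-density and adds
    Gaussian noise scaled by sqrt(2 * step). There is no accept/reject
    step, so the chain targets the density only approximately.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


# Assumed target: standard normal, so grad log pi(x) = -x.
samples = ula_sample(lambda x: -x, x0=0.0, step=0.1, n_steps=20000)
```

Because the proposal is never corrected, the chain's stationary variance for this target is 1 / (1 - h/2) rather than 1. That discretisation bias is what MALA removes by adding a Metropolis accept/reject step.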

Personal projects

  • dotfiles for the *nix systems I use day-to-day, managed with yadm across machines (e.g., work laptop, HPC). Includes configurations for zsh, tmux, neovim, and more.
  • uLinalg: a C++ library to handle 2D array operations, including broadcasting, LU decompositions, and Cholesky decompositions. A bit like a mini version of NumPy.
  • pspectral: a C++ library to compute pseudospectral solutions to differential equations, such as the Navier-Stokes equations, using Fourier transforms (FFTW).
  • punter: a C++ implementation of the shunting-yard algorithm, which converts mathematical expressions from infix to postfix notation.
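
As a rough illustration of the infix-to-postfix conversion that punter performs, here is a minimal shunting-yard sketch. It is written in Python for brevity and is not punter's code: the to_postfix name, the operator table, and the assumption that the input is already split into tokens are all choices made for this example.

```python
# Illustrative only: not taken from punter, which is written in C++.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}


def to_postfix(tokens):
    """Convert a list of infix tokens to postfix (reverse Polish) order."""
    output = []
    stack = []  # holds operators and "(" awaiting output
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly as tok
            # (strictly tighter when tok is right-associative).
            while stack and stack[-1] in PRECEDENCE:
                top = stack[-1]
                tighter = PRECEDENCE[top] > PRECEDENCE[tok]
                equal_left = (
                    PRECEDENCE[top] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOCIATIVE
                )
                if tighter or equal_left:
                    output.append(stack.pop())
                else:
                    break
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            # Flush operators back to the matching "(".
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand: goes straight to the output
    while stack:
        top = stack.pop()
        if top == "(":
            raise ValueError("mismatched parentheses")
        output.append(top)
    return output


postfix = to_postfix("3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3".split())
print(" ".join(postfix))  # 3 4 2 * 1 5 - 2 3 ^ ^ / +
```

The operator stack is what lets precedence and associativity be resolved in a single left-to-right pass, without building a parse tree first.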