GSoC 2026 in libsemigroups

Last modified: Feb 12, 2026

Introduction

The libsemigroups library for computational semigroup theory contains efficient implementations of widely used and novel algorithms in computational semigroup theory. This library forms the core of the GAP package Semigroups which is used by researchers in semigroup theory for preliminary computational explorations that motivate mathematical discoveries or for finding counterexamples to conjectures. libsemigroups has also found applications in tooling for cryptography researchers and as a computational oracle for mathematical formalization projects.

The libsemigroups project focuses on efficiency and ease of use. For the second goal, we maintain bindings for the libsemigroups library in Python in the libsemigroups_pybind11 project and the GAP package Semigroups which depends on libsemigroups. There is also an ongoing effort to provide Julia bindings as part of the Semigroups.jl project.

The development of libsemigroups occurs primarily at the University of St Andrews with multiple external collaborators, and we would like to welcome many more as part of the Google summer of code program! We believe it is a great opportunity for participants to contribute to an open-source project in mathematics and gain insights into computational semigroup theory.

Project ideas

Below is a list of ideas for projects within the libsemigroups organization, along with potential mentors and a rough estimate of the length and difficulty of the project.

Contents

1. Julia bindings Semigroups.jl for libsemigroups

Description: The goal of this project is to create functioning bindings of the libsemigroups library into Julia as part of the Semigroups.jl project. The libsemigroups_pybind11 project implements something similar for Python. The aim is for Semigroups.jl to expose the low-level functionality of libsemigroups to be used as a foundation for a separate higher level package. In particular, a longer term aim of this project is to replace the kernel extension of the GAP package Semigroups with Semigroups.jl.

This project may involve:

Mentors: Joe Edwards and James Mitchell

Length: 350h

Difficulty: Easy-Medium

Prerequisites:


2. Formal verification of Knuth-Bendix and Todd-Coxeter

Description: A computer algebra system may produce an answer, but how much confidence can we have that it is correct? We might compute the same solution using two different algorithms, if they match, then we have higher confidence. Another approach is to use a proof assistant such as rocq or lean to produce a formal proof of the result, however doing this by hand is rather cumbersome and does not scale well. But of course, if the underlying algorithm is correct, it should be possible to produce such proof certificates automatically.

The aim of this project is to extend the existing implementation of the Knuth-Bendix and the Todd-Coxeter algorithms in libsemigroups to have them return a trace of the decisions made during their runtime. This trace can then be converted into a formal proof of correctness, as is done for example in the MonoidPresentation project in rocq.

This project may involve:

Mentors: Reinis Cirpons, Joe Edwards and Florent Hivert

Length: 175h or 350h

Difficulty: Medium-Hard

Prerequisites:


3. AI for auto-tuning undecidable problems

Description: The aim of this project is to use techniques from AI and machine learning to guide computations related to undecidable problems. A classical undecidable problem is the semigroup finiteness problem, which asks whether or not a finite presentation defines a finite semigroup. There is no program that can always answer this question in finite time, but there are several semidecision algorithms, which will always give an answer if a semigroup is finite, but may run forever on an infinite input.

Two key semidecision algorithms for the finiteness problem are the Knuth-Bendix and Todd-Coxeter algorithms, both of which are implemented in libsemigroups. These algorithms have different trade-offs and often one yields an answer to a particular question much faster than the other, but due to the undecidable nature of the problem is it very hard to determine which algorithm to apply first, or how long to wait before switching between them. Such situations are very common in computational semigroup theory and we believe there is potential to optimize the usage of these algorithms using a machine learning based approach.

This project may involve:

Mentors: James Mitchell and Finn Smith

Length: 175h

Difficulty: Easy-Medium

Prerequisites:


4. High-performance tropical algebra library

Description: There are a number of high-performance linear algebra packages for linear algebra involving matrix and vector manipulation over fields (usually the real numbers or finite fields). Some well-known examples among many others are: LinBox and Eigen. This project will involve implementing (at least some of) the BLAS specification for matrices and vectors of tropical semirings in modern C++. Tropical semirings, often using min-plus or max-plus algebra, are powerful tools for solving optimization, scheduling, and combinatorial problems by replacing addition/multiplication with minimum/maximum/summation. As such a high-performance library implementing linear algebra over tropical semirings is likely to be widely applicable.

This project may involve:

Mentors: Florent Hivert and Finn Smith

Length: 350h

Difficulty: Medium-Hard

Prerequisites:


5. Computing congruences of finite inverse semigroups

Description: In the paper

the authors present a novel algorithm for computing a congruence on an inverse semigroup from a collection of generating pairs. This algorithm uses a myriad of techniques from the theories of groups, automata, and inverse semigroups. An initial implementation of this algorithm outperforms existing implementations by several orders of magnitude. In this project, you would implement this algorithm either in GAP package Semigroups or within the C++ library libsemigroups.

This project may involve:

Mentors: Reinis Cirpons and James Mitchell.

Length: 90h or 175h

Difficulty: Medium

Prerequisites:


6. Improving errors and their messages in GAP

GAP is a system for computational discrete algebra, with particular emphasis on Computational Group Theory. GAP provides a programming language, a library of thousands of functions implementing algebraic algorithms written in the GAP language as well as large data libraries of algebraic objects.

The aim of this project is to improve errors and their messages in GAP in something of the same spirit as those introduced in Python 3.13.

This project may involve:

Mentors: Joe Edwards and Michael Young

Length: 90h or 175h

Difficulty: Easy

Prerequisites:


7. Nielsen-Schreier free bases

A first course in linear algebra proves that every subspace of a linear space has a basis. There is an analogue of this statement for free groups, known as the Nielsen-Schreier theorem, which states that every subgroup of a free group has a basis. For finitely generated subgroups, there exist rather constructive proofs, which lead to algorithms for computing the basis explicitly, in the same way that Gaussian elimination can be used to find the basis for a vector space. The goal of this project would be to implement an algorithm for finding the free basis of a finitely generated subgroup of a free group.

This project may involve:

Mentor: Reinis Cirpons

Length: 175h

Difficulty: Easy-Medium

Prerequisites:


8. Free bands

In the paper:

the authors present efficient computational solutions to the problems of checking equality, performing multiplication, and computing minimal representatives of elements of free bands. A reference implementation of the algorithms from this paper is given in https://doi.org/10.5281/zenodo.7071676. In this project, the reference implementation would be re-implemented in modern C++ as part of libsemigroups; and integrating subsemigroups of free bands and subsemigroups into the framework established in https://arxiv.org/abs/1510.01868.

This project may involve:

Mentors: Reinis Cirpons and James Mitchell

Length: 350h

Difficulty: Medium-Hard

Prerequisites:


9. Morphocompletion for Knuth-Bendix

Finitely presented semigroups are the basic objects of study in combinatorial semigroup theory. Each finitely presented semigroup is given by specifying an alphabet together with a set of equations that hold for some words over this alphabet, called the relations. The elements of this semigroup are words over the alphabet, which are compared by using the relations. In general equality in such a semigroup is undecidable!

The Knuth-Bendix algorithm attempts to solve this issue by orienting and extending the set of relations such that there is a straightforward method for checking equality. Since the underlying problem is undecidable, the algorithm does not necessarily terminate with respect to a given presentation. To make matters worse, it may sometimes be necessary to increase the alphabet by replacing certain words with new letters to make the Knuth-Bendix procedure terminate when it otherwise wouldn’t. One approach to doing this is the morphocompletion procedure developed in

Roughly speaking, the morphocompletion procedure consists of repeatedly running the Knuth-Bendix algorithm for a certain number of steps, picking some number of subwords to replace by letters according to some heuristic such as the frequency they occur with.

The goal of this project would be to produce an implementation of the morphocompletion procedure in C++ in the libsemigroups library or in Python as part of the libsemigroups_pybind11 library. As the procedure depends on various parameters, a further goal for the project would consist of fine-tuning these parameters to improve performance.

This project may involve:

Mentor: Joe Edwards

Length: 90h

Difficulty: Easy

Prerequisites: