Solving Sudoku Puzzles with Matrices
Posted on
While working on a different project, I fell down the rabbit hole of trying to generate my own sudokus programmatically. Turns out, this rabbit hole had turns, crossings, twists and required a lot of backtracking.
Generating Sudokus
To generate Sudokus, I first needed to define my success criteria: a well-constructed puzzle must have exactly one solution. In order to achieve this, I landed on this algorithm.
Sudoku Generation Algorithm outline
- Start with a complete, valid board (filled with 81 numbers)
- Randomly select a cell that hasn't been tested yet and remove its number
- Use a sudoku solver to find all possible solutions
- If the current board only has one solution, go back to step 2 and repeat
- If the current board has more than one solution, undo the last removal and go back to step 2, continuing with the next untested cell
- Stop when all 81 positions have been tested
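The outline above can be sketched in a few lines of Python. This is just a sketch, not my actual generator: it assumes a hypothetical helper `count_solutions(grid, limit)` that counts a grid's solutions but stops early once `limit` of them are found.

```python
import random

def generate_puzzle(full_grid, count_solutions):
    """Carve a puzzle out of a complete, valid 9x9 grid (list of lists).

    count_solutions(grid, limit) is a hypothetical helper: it returns the
    number of solutions, stopping early once `limit` of them are found.
    """
    grid = [row[:] for row in full_grid]
    cells = [(r, c) for r in range(9) for c in range(9)]
    random.shuffle(cells)                    # test the 81 cells in random order
    for r, c in cells:
        removed = grid[r][c]
        grid[r][c] = 0                       # tentatively remove the number
        if count_solutions(grid, limit=2) != 1:
            grid[r][c] = removed             # uniqueness broken: undo removal
    return grid
```

Stopping the solution count at two is the important trick: we never care whether a board has three or three thousand solutions, only whether it has more than one.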
In order to actually implement the algorithm that generates Sudoku puzzles, I needed another ingredient: an algorithm that solved Sudoku puzzles.
Backtracking
The simplest algorithm you can write to solve a sudoku puzzle is a backtracking algorithm. All it does is attempt to fill the next empty square with the next valid number. If at some point there are no valid placements left, the algorithm must have made a mistake earlier. In that case, it backtracks (returns to the state before the mistake was made) and attempts the next valid number. It's as brute force as it gets.
Simple Python Sudoku Backtracking Solver
def is_valid_placement(sudoku_grid, row, col, num):
    # Validity check (referenced by the solver; num must not already
    # appear in the cell's row, column or 3x3 box)
    if num in sudoku_grid[row, :] or num in sudoku_grid[:, col]:
        return False
    r, c = 3 * (row // 3), 3 * (col // 3)
    return num not in sudoku_grid[r:r+3, c:c+3]

def solve_sudoku_backtracking(sudoku_grid):
    def backtrack(sudoku_grid):
        for row in range(9):
            for col in range(9):
                if sudoku_grid[row, col] == 0:
                    for num in range(1, 10):
                        if is_valid_placement(sudoku_grid, row, col, num):
                            sudoku_grid[row, col] = num
                            if backtrack(sudoku_grid):
                                return True
                            sudoku_grid[row, col] = 0
                    return False
        return True
    backtrack(sudoku_grid)
Turns out, this approach is actually really good at solving Sudoku. It solves most puzzles really fast! However, sometimes, the configuration of certain puzzles forces the algorithm to backtrack extensively, making the algorithm take much longer than we'd like.
Puzzle that is very hard for the Backtracking Solver
This puzzle will force the Backtracking algorithm to make almost two million recursive calls before being solved. Without the visualization, it takes my python code about two minutes to solve. With the visualization, which slows things down immensely, I doubt you can leave it running long enough to see it solved.
This annoying setback made me try to find more sophisticated algorithms that were better prepared to handle all kinds of edge cases. This is how I stumbled upon the Exact Cover Problem.
Exact Cover Problem
I'll give an example to explain the concept of an exact cover. Let's say we have a certain universe set U = {1, 2, 3, 4}. Along with U, we also have some subsets of U:
- S1 = {1, 2}, S2 = {3, 4}, S3 = {2, 3}, S4 = {1, 4}.
An exact cover of U is a set of these subsets such that every element of U appears in exactly one of them: the chosen subsets are pairwise disjoint and their union is U.
For the present example, we can actually find two exact covers: {S1, S2} and {S3, S4}.
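To make the definition concrete, here's a brute-force search for exact covers; the universe and subsets below are a minimal illustration chosen so that exactly two exact covers exist:

```python
from itertools import chain, combinations

universe = {1, 2, 3, 4}
subsets = {"S1": {1, 2}, "S2": {3, 4}, "S3": {2, 3}, "S4": {1, 4}}

def exact_covers(universe, subsets):
    covers = []
    names = list(subsets)
    # Try every combination of subsets; keep the ones that are pairwise
    # disjoint and whose union is the whole universe. Equal sizes plus
    # equal union guarantees no element is covered twice.
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            chosen = [subsets[n] for n in combo]
            sizes_match = sum(len(s) for s in chosen) == len(universe)
            if sizes_match and set(chain.from_iterable(chosen)) == universe:
                covers.append(set(combo))
    return covers
```

Running `exact_covers(universe, subsets)` finds both covers, {S1, S2} and {S3, S4}, and nothing else.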
So, how is finding exact covers in any way related to solving Sudoku puzzles?
In his paper about the Dancing Links algorithm, Donald Knuth provides a particularly instructive example of an exact cover problem. Given this matrix of 0s and 1s,

0 0 1 0 1 1 0
1 0 0 1 0 0 1
0 1 1 0 0 1 0
1 0 0 1 0 0 0
0 1 0 0 0 0 1
0 0 0 1 1 0 1

the goal is to find an exact cover: a set of rows such that they contain exactly one 1 in each column. Note that this is analogous to the example I outlined above. A different way of thinking about it is to find a set of rows such that, when summed, the result is exactly (1, 1, 1, 1, 1, 1, 1). In this present example, the only solution is the set of rows {1, 4, 5}. Note that (0, 0, 1, 0, 1, 1, 0) + (1, 0, 0, 1, 0, 0, 0) + (0, 1, 0, 0, 0, 0, 1) = (1, 1, 1, 1, 1, 1, 1).
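A few lines of NumPy can verify this kind of claim by brute force, checking every subset of rows of Knuth's 6x7 example matrix for one whose column sums are all exactly 1:

```python
import numpy as np
from itertools import combinations

# The 6x7 example matrix from Knuth's Dancing Links paper
A = np.array([
    [0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 1],
])

covers = []
for k in range(1, 7):
    for rows in combinations(range(6), k):
        # An exact cover has exactly one 1 in every column
        if (A[list(rows)].sum(axis=0) == 1).all():
            covers.append(rows)

print(covers)  # → [(0, 3, 4)], i.e. rows 1, 4 and 5 in 1-based terms
```

Of course, trying every subset of rows only works for toy matrices; the whole point of Algorithm X is to avoid this kind of exhaustive search.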
Turns out, any Sudoku puzzle can be described as a matrix of 1s and 0s. And even more interesting is that finding exact covers for that matrix is the same as finding solutions to the puzzle.
Translating Sudoku into an Exact Cover Problem
So how can this matrix of 1s and 0s be constructed?
I'll start by naming each row of the matrix in the following way: RxCy#z, where 1 ≤ x ≤ 9, 1 ≤ y ≤ 9 and 1 ≤ z ≤ 9; R stands for rows, C for columns and # for digits. There's a total of 9 × 9 × 9 = 729 rows. Here's a list of all the rows:
R1C1#1, R1C1#2, ..., R1C1#9
R1C2#1, R1C2#2, ..., R1C2#9
...
R9C9#1, R9C9#2, ..., R9C9#9
Note that each row represents putting a certain digit in a certain position of a Sudoku grid! If I pick 81 rows out of the 729, I might just have a completed sudoku puzzle.
Now the columns: It's easier to think of the columns as the constraints (rules) that exist in a Sudoku puzzle. Consider the same integers as before, x, y and z, each between 1 and 9. There are four different constraints to worry about:
- There needs to be exactly one digit in every cell of the sudoku board. I'll represent each of these constraints like this: RxCy.
- In every row, there can only be one of each digit: Rx#z.
- Analogous to the rows, there can only be one of each digit per column. Therefore, we get these constraints: Cy#z.
- Finally, there can only be one of each digit per box: Bb#z, where b (also between 1 and 9) represents each of the nine boxes.
In total, this adds up to 4 × 81 = 324 columns.
Alright: I've named the rows and the columns. Now where should the 1s in the matrix be? It's pretty intuitive. I'll focus on the following row: R1C1#1 (a 1 placed in the top-left cell). It'll take this form: a 1 in each of the columns R1C1, R1#1, C1#1 and B1#1, and a 0 everywhere else.
For each row, there will be exactly four 1s: one for each of the rules. Every other element of the row will be a 0. Note that this works! Let's compare the rows R1C1#1 and R1C1#2. Both of these rows have 1s in the column R1C1. Therefore, they can't both belong in the same solution. It makes sense: there can't be both a 1 and a 2 in cell R1C1.
Another important thing is that we can very easily transform a row index into an element in the grid and vice versa. For example, RxCy#z will be row 81(x - 1) + 9(y - 1) + (z - 1), counting from 0: R1C1#1 is row 0 and R9C9#9 is row 728. In case you're curious, here's the function I used to generate an empty Sudoku grid in matrix form:
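With the rows ordered cell by cell, and digit by digit within each cell, the conversion in both directions is a couple of divmods. A small sketch (assuming 0-based matrix row indices):

```python
def row_to_cell(r, size=9):
    # 0-based matrix row index -> (row, column, digit), all 1-based
    cell, digit = divmod(r, size)
    x, y = divmod(cell, size)
    return x + 1, y + 1, digit + 1

def cell_to_row(x, y, z, size=9):
    # RxCy#z -> 0-based matrix row index
    return (x - 1) * size**2 + (y - 1) * size + (z - 1)
```

This is what lets the solver report its answer: each row id in the final solution maps straight back to a digit in a cell.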
Generating 729x324 matrix (corresponding to empty Sudoku grid)
In these code snippets, "size" refers to the dimensions of the Sudoku grid. This approach works just as well for standard 9x9 puzzles as for larger grids like 16x16 or 25x25 (any size whose box dimension, the square root of size, is a whole number)!
from math import sqrt

import numpy as np

def _one_constraint(row, size):
    # Cell constraint: exactly one digit per cell
    return row // size

def _row_constraint(row, size):
    # Row constraint: one of each digit per row
    return size**2 + size * (row // (size**2)) + row % size

def _col_constraint(row, size):
    # Column constraint: one of each digit per column
    return 2 * (size**2) + (row % (size**2))

def _box_constraint(row, size):
    # Box constraint: one of each digit per box
    return (int(3 * (size**2)
            + (row // (sqrt(size) * size**2)) * (size * sqrt(size))
            + ((row // (sqrt(size) * size)) % sqrt(size)) * size
            + (row % size)))

def empty_sudoku_exact_cover(size=9):
    constraints, rows = 4 * (size**2), size**3
    matrix = []
    for r in range(rows):
        row = np.zeros(constraints, dtype=int)
        positions = [
            _one_constraint(r, size),
            _row_constraint(r, size),
            _col_constraint(r, size),
            _box_constraint(r, size)
        ]
        row[positions] = 1
        matrix.append(row)
    numbered_column = np.arange(1, rows + 1, dtype=int).reshape(-1, 1)
    return np.hstack((numbered_column, matrix))
But now we have a different problem. How do we find exact covers in this huge matrix?
Algorithm X
Algorithm X is the name of the algorithm presented by Donald Knuth to solve exact cover problems written in this matrix format. In his words:
Algorithm X is simply a statement of the obvious trial-and-error approach.
Here's how it works. Given a matrix A of 0s and 1s:
1. If A is empty, the problem is solved; terminate successfully
2. Otherwise, choose a column c
3. For each row r such that A[r, c] = 1:
    - Include r in the partial solution
    - For each j such that A[r, j] = 1:
        - For each i such that A[i, j] = 1, delete row i from matrix A
        - Delete column j from matrix A
4. Repeat this algorithm recursively on the reduced matrix A
And that's exactly what I implemented. Now, using the matrix obtained from translating Sudoku into an exact cover problem, I can solve any Sudoku puzzle I want simply by writing it in this 729x324 matrix format and including some of the rows (the initial clues) in my partial solution.
My Implementation of Knuth's Algorithm X
I modified the algorithm a little: in order to reliably know which row is which, I started by appending an additional column at the start of the matrix with row ids:
Then, instead of ending the algorithm when A is empty, I end it when there's one column left: the column with the row ids. This way, I can then associate each row index in the partial solutions with a specific digit in a specific cell of the Sudoku grid. It's a bit of a cheap trick, but it does the job.
def solve(matrix, partial_solution, solutions):
    rows, columns = matrix.shape
    if columns == 1:
        # Only the id column is left: every constraint is satisfied
        solutions.append(partial_solution)
        return
    # Choose the column with the fewest 1s (skipping the id column)
    column_sums = np.sum(matrix[:, 1:], axis=0)
    min_col_idx = np.argmin(column_sums) + 1
    candidate_rows = [r for r in range(rows) if matrix[r, min_col_idx] == 1]
    if not candidate_rows:
        # A constraint can no longer be satisfied: dead end, backtrack
        return
    for row_idx in candidate_rows:
        new_partial_solution = partial_solution.copy()
        new_partial_solution.add(matrix[row_idx, 0])
        # choose_row deletes every row and column covered by row_idx
        reduced_matrix = choose_row(matrix, row_idx)
        solve(reduced_matrix, new_partial_solution, solutions)
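The actual matrix reduction happens in choose_row. Here's a minimal sketch of what it has to do, assuming the id column sits at index 0 and must survive every reduction:

```python
import numpy as np

def choose_row(matrix, row_idx):
    # Columns covered by the chosen row (skipping the id column at index 0)
    cols = np.where(matrix[row_idx, 1:] == 1)[0] + 1
    # Any row with a 1 in a covered column conflicts with the chosen row
    # (this includes the chosen row itself), so all of them get deleted
    conflicting = np.where((matrix[:, cols] == 1).any(axis=1))[0]
    reduced = np.delete(matrix, conflicting, axis=0)
    return np.delete(reduced, cols, axis=1)
```

Deleting the satisfied columns is what eventually leaves only the id column behind, which is exactly the termination condition solve checks for.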
But you might've noticed something: this is also a backtracking algorithm. It keeps calling itself until the problem is solved. How is this any better than the brute force method I described at the start of this article?
The secret is step 2: choosing a column. If in this step we choose the column with the fewest 1s in it, we guarantee that the number of guesses we have to make is much smaller, and therefore, our function will have to recurse a lot less. In Sudoku puzzle terms, the brute force backtracking algorithm tries the first valid digit in the first empty cell, while Algorithm X tries the first valid digit in the cell with the fewest possible digits to place. In this visualization, this becomes very apparent:
Comparing the two algorithms
To benchmark both algorithms, I generated 50 puzzles, each starting with around 22-27 filled cells. Both algorithms were then tasked with solving all the puzzles. Please note that the results come from my rather unoptimized Python code. I'm sure the times could be vastly improved. However, I believe these numbers effectively demonstrate the relative performance of each algorithm.
Note: When I refer to "Nodes," I mean recursive calls: essentially, the number of times each algorithm needs to backtrack before solving a puzzle.
| | Avg Time | Max Time | Min Time | Avg Nodes | Max Nodes | Min Nodes |
|---|---|---|---|---|---|---|
| Backtracking | 8.3997s | 125.1491s | 0.0643s | 124144.76 | 1883620 | 924 |
| Algorithm X | 0.0351s | 0.0527s | 0.0302s | 69.28 | 157 | 55 |
To me, the most fascinating result is the average number of recursive calls Algorithm X took to solve each puzzle: 69! By selecting the column with the fewest 1s during the algorithm, we almost entirely remove the need to ever take a guess. Comparing that to the 124145 average recursive calls the Backtracking algorithm needs doesn't even seem fair.
Here's both algorithms attempting to solve the same puzzle at the same time:
An interesting thing to note is the existence of clear outliers when solving with Backtracking: some puzzles require significantly more recursive calls than the rest. I think this plot makes that very clear.
Backtracking scatterplot
I find it funny how only one puzzle required over one million recursive calls, but that one needed almost two million! Over twice as many recursive calls as any other puzzle.
Solving lots of Sudoku with Algorithm X
Despite the many interesting conclusions to take away from solving puzzles with Backtracking, I was a lot more curious about the data I could get out of Algorithm X. Therefore, I had it solve 49151 puzzles I found in a repository online. Here are the results:
| Avg Time | Max Time | Min Time | Median Time |
|---|---|---|---|
| 0.0388s | 1.9746s | 0.0264s | 0.0341s |
The results for time are fairly consistent with the first fifty puzzles solved: the average time for the initial set was 0.0351s, and this time it increased slightly to 0.0388s. The median also seems to be around these values, so the number of outliers is probably not too significant.
| Avg Nodes | Max Nodes | Min Nodes | Median Nodes |
|---|---|---|---|
| 88.07 | 11498 | 64 | 64 |
On the other hand, the results for the nodes were fascinating. The first thing I noticed was that the Min nodes and Median nodes are the same! Upon closer inspection, I found that out of the 49151 puzzles, 28853 puzzles required exactly 64 nodes to be solved. That's 58.7% of all puzzles!
So what exactly is going on here? What's so special about the number 64? Well, the dataset had something consistent across all puzzles: all of them started out with 17 filled cells. Subtract that from 81 and you get 64. This meant that over half of the puzzles were solved without the algorithm having to take any guess. For sudoku enthusiasts, this means that over half of these 49151 puzzles were solvable using naked singles only: the method where you exclude all but one possible number from a cell based on the row, column, and box.
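Naked singles are easy to compute directly: collect the digits already used in a cell's row, column and box, and check whether exactly one candidate remains. A small illustrative sketch (not the solver's code), assuming a NumPy grid with 0 marking empty cells:

```python
import numpy as np

def candidates(grid, row, col):
    # Digits not yet used in the cell's row, column or 3x3 box
    if grid[row, col] != 0:
        return set()
    r, c = 3 * (row // 3), 3 * (col // 3)
    used = set(grid[row, :]) | set(grid[:, col]) | set(grid[r:r+3, c:c+3].flat)
    return set(range(1, 10)) - used

def naked_singles(grid):
    # Cells where exactly one digit can legally go
    singles = {}
    for r in range(9):
        for c in range(9):
            cands = candidates(grid, r, c)
            if len(cands) == 1:
                singles[(r, c)] = cands.pop()
    return singles
```

A puzzle solvable by naked singles alone never forces Algorithm X to guess: at every step there's a constraint column with a single 1 left in it.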
Another interesting find can be observed in this scatterplot:
Algorithm X scatterplot
Only four puzzles required over 1 second to solve! That's 0.008% of all puzzles. Algorithm X isn't easy to trick. So what do these hard puzzles actually look like?
Hardest puzzle for Algorithm X
This puzzle required the maximum amount of recursive calls (11498) and the most time to solve (1.9746s): Note: the visualization is much slower than the actual solve time.
It seems that a mostly empty bottom half of the puzzle, combined with some wrong guesses at the start of the solving process, drastically increases the number of nodes.
Back to generating Sudoku puzzles
Having finished the puzzle solver, I can generate my own puzzles reasonably fast! Here are a few of them:
Now that I've reached this point, I realize I've only scratched the surface of this topic. At this point, my only criterion is that the puzzles I generate have exactly one solution. But in order to generate a puzzle that's pleasant for a human to solve, there are a lot of other challenges: can the generated puzzles be solved without having to make any guesses? Can the algorithm generate puzzles of all difficulties?
While searching around for answers to my questions in this realm, I found this incredibly interesting comment from someone who seems to be the dev of a sudoku mobile app. It's a bit long, but I think the read is worth it.
Final Remarks
There are still plenty of loose ends to explore in this topic, but I thought this was a good time to pause and share what I've discovered so far.
Here are the next steps I'll take:
- Currently, my code still takes around 2-3 seconds to generate puzzles. I'd like to improve that and build an API that generates Sudokus instantly on demand, potentially using a lower-level language like Rust or C/C++.
- Implement Dancing Links, a more advanced version of Algorithm X by Donald Knuth that uses linked lists instead of matrices.
- Translate other puzzles to Exact Cover problems, like Futoshiki or the N-queens problem.
In case you want to check it out, here's the github repo in which I have been playing around with these algorithms: just FYI, it's a lot of very quick and dirty code that I used to deepen my understanding of the ideas in this blog post.