Solving Sudoku Puzzles with Matrices
Posted on
While working on a different project, I fell down the rabbit hole of trying to generate my own sudokus programmatically. Turns out, this rabbit hole had turns, crossings, twists and required a lot of backtracking.
Generating Sudokus
To generate Sudokus, I first needed to define my success criteria: a well-constructed puzzle must have exactly one solution. In order to achieve this, I landed on this algorithm.
Sudoku Generation Algorithm outline
- Start with a complete, valid board (filled with 81 numbers)
- Randomly select a cell that hasn't been tested yet and remove its number
- Use a sudoku solver to find all possible solutions
- If the current board only has one solution, go back to step 2 and repeat
- If the current board has more than one solution, undo the last removal and go back to step 2, continuing with the next untested cell
- Stop when all 81 positions have been tested
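The outline above can be sketched in a few lines of Python. This is just a sketch, not my actual generator: it assumes a hypothetical helper `count_solutions(grid, limit)` that counts a grid's solutions but stops early once `limit` of them are found.

```python
import random

def generate_puzzle(full_grid, count_solutions):
    """Carve a puzzle out of a complete, valid 9x9 grid (list of lists).

    count_solutions(grid, limit) is a hypothetical helper: it returns the
    number of solutions, stopping early once `limit` of them are found.
    """
    grid = [row[:] for row in full_grid]
    cells = [(r, c) for r in range(9) for c in range(9)]
    random.shuffle(cells)                    # test the 81 cells in random order
    for r, c in cells:
        removed = grid[r][c]
        grid[r][c] = 0                       # tentatively remove the number
        if count_solutions(grid, limit=2) != 1:
            grid[r][c] = removed             # uniqueness broken: undo removal
    return grid
```

Stopping the solution count at two is the important trick: we never care whether a board has three or three thousand solutions, only whether it has more than one.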
In order to actually implement the algorithm that generates Sudoku puzzles, I needed another ingredient: an algorithm that solved Sudoku puzzles.
Backtracking
The simplest algorithm you can write to solve a sudoku puzzle is a backtracking algorithm. All it does is attempt to fill the next empty square with the next valid number. If at some point there are no valid placements left, the algorithm must have made a mistake earlier. In that case, it backtracks (returns to the state before the mistake was made) and attempts the next valid number. It's as brute force as it gets.
Simple Python Sudoku Backtracking Solver
def is_valid_placement(sudoku_grid, row, col, num):
    # Validity check (referenced by the solver; num must not already
    # appear in the cell's row, column or 3x3 box)
    if num in sudoku_grid[row, :] or num in sudoku_grid[:, col]:
        return False
    r, c = 3 * (row // 3), 3 * (col // 3)
    return num not in sudoku_grid[r:r+3, c:c+3]

def solve_sudoku_backtracking(sudoku_grid):
    def backtrack(sudoku_grid):
        for row in range(9):
            for col in range(9):
                if sudoku_grid[row, col] == 0:
                    for num in range(1, 10):
                        if is_valid_placement(sudoku_grid, row, col, num):
                            sudoku_grid[row, col] = num
                            if backtrack(sudoku_grid):
                                return True
                            sudoku_grid[row, col] = 0
                    return False
        return True
    backtrack(sudoku_grid)
Turns out, this approach is actually really good at solving Sudoku. It solves most puzzles really fast! However, sometimes, the configuration of certain puzzles forces the algorithm to backtrack extensively, making the algorithm take much longer than we'd like.
Puzzle that is very hard for the Backtracking Solver
This puzzle will force the Backtracking algorithm to make almost two million recursive calls before being solved. Without the visualization, it takes my python code about two minutes to solve. With the visualization, which slows things down immensely, I doubt you can leave it running long enough to see it solved.
This annoying setback made me try to find more sophisticated algorithms that were better prepared to handle all kinds of edge cases. This is how I stumbled upon the Exact Cover Problem.
Exact Cover Problem
I'll give an example to explain the concept of an exact cover. Let's say we have a certain universe set U = {1, 2, 3, 4}. Along with U, we also have some subsets of U:
- S1 = {1, 2}, S2 = {3, 4}, S3 = {2, 3}, S4 = {1, 4}.
An exact cover of U is a set of these subsets such that every element of U appears in exactly one of them: the chosen subsets are pairwise disjoint and their union is U.
For the present example, we can actually find two exact covers: {S1, S2} and {S3, S4}.
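To make the definition concrete, here's a brute-force search for exact covers; the universe and subsets below are a minimal illustration chosen so that exactly two exact covers exist:

```python
from itertools import chain, combinations

universe = {1, 2, 3, 4}
subsets = {"S1": {1, 2}, "S2": {3, 4}, "S3": {2, 3}, "S4": {1, 4}}

def exact_covers(universe, subsets):
    covers = []
    names = list(subsets)
    # Try every combination of subsets; keep the ones that are pairwise
    # disjoint and whose union is the whole universe. Equal sizes plus
    # equal union guarantees no element is covered twice.
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            chosen = [subsets[n] for n in combo]
            sizes_match = sum(len(s) for s in chosen) == len(universe)
            if sizes_match and set(chain.from_iterable(chosen)) == universe:
                covers.append(set(combo))
    return covers
```

Running `exact_covers(universe, subsets)` finds both covers, {S1, S2} and {S3, S4}, and nothing else.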
So, how is finding exact covers in any way related to solving Sudoku puzzles?
In his paper about the Dancing Links algorithm, Donald Knuth provides a particularly instructive example of an exact cover problem. Given this matrix of 0s and 1s,

0 0 1 0 1 1 0
1 0 0 1 0 0 1
0 1 1 0 0 1 0
1 0 0 1 0 0 0
0 1 0 0 0 0 1
0 0 0 1 1 0 1

the goal is to find an exact cover: a set of rows such that they contain exactly one 1 in each column. Note that this is analogous to the example I outlined above. A different way of thinking about it is to find a set of rows such that, when summed, the result is exactly (1, 1, 1, 1, 1, 1, 1). In this present example, the only solution is the set of rows {1, 4, 5}. Note that (0, 0, 1, 0, 1, 1, 0) + (1, 0, 0, 1, 0, 0, 0) + (0, 1, 0, 0, 0, 0, 1) = (1, 1, 1, 1, 1, 1, 1).
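A few lines of NumPy can verify this kind of claim by brute force, checking every subset of rows of Knuth's 6x7 example matrix for one whose column sums are all exactly 1:

```python
import numpy as np
from itertools import combinations

# The 6x7 example matrix from Knuth's Dancing Links paper
A = np.array([
    [0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 1],
])

covers = []
for k in range(1, 7):
    for rows in combinations(range(6), k):
        # An exact cover has exactly one 1 in every column
        if (A[list(rows)].sum(axis=0) == 1).all():
            covers.append(rows)

print(covers)  # → [(0, 3, 4)], i.e. rows 1, 4 and 5 in 1-based terms
```

Of course, trying every subset of rows only works for toy matrices; the whole point of Algorithm X is to avoid this kind of exhaustive search.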
Turns out, any Sudoku puzzle can be described as a matrix of 1s and 0s. And even more interesting is that finding exact covers for that matrix is the same as finding solutions to the puzzle.
Translating Sudoku into an Exact Cover Problem
So how can this matrix of 1s and 0s be constructed?
I'll start by naming each row of the matrix in the following way: RxCy#z, where 1 ≤ x ≤ 9, 1 ≤ y ≤ 9 and 1 ≤ z ≤ 9; R stands for rows, C for columns and # for digits. There's a total of 9 × 9 × 9 = 729 rows. Here's a list of all the rows:
R1C1#1, R1C1#2, ..., R1C1#9
R1C2#1, R1C2#2, ..., R1C2#9
...
R9C9#1, R9C9#2, ..., R9C9#9
Note that each row represents putting a certain digit in a certain position of a Sudoku grid! If I pick 81 rows out of the 729, I might just have a completed sudoku puzzle.
Now the columns: It's easier to think of the columns as the constraints (rules) that exist in a Sudoku puzzle. Consider the same integers as before, x, y and z, each between 1 and 9. There are four different constraints to worry about:
- There needs to be exactly one digit in every cell of the sudoku board. I'll represent each of these constraints like this: RxCy.
- In every row, there can only be one of each digit: Rx#z.
- Analogous to the rows, there can only be one of each digit per column. Therefore, we get these constraints: Cy#z.
- Finally, there can only be one of each digit per box: Bb#z, where b (also between 1 and 9) represents each of the nine boxes.
In total, this adds up to 4 × 81 = 324 columns.
Alright: I've named the rows and the columns. Now where should the 1s in the matrix be? It's pretty intuitive. I'll focus on the following row: R1C1#1 (a 1 placed in the top-left cell). It'll take this form: a 1 in each of the columns R1C1, R1#1, C1#1 and B1#1, and a 0 everywhere else.
For each row, there will be exactly four 1s: one for each of the rules. Every other element of the row will be a 0. Note that this works! Let's compare the rows R1C1#1 and R1C1#2. Both of these rows have 1s in the column R1C1. Therefore, they can't both belong in the same solution. It makes sense: there can't be both a 1 and a 2 in cell R1C1.
Another important thing is that we can very easily transform a row index into an element in the grid and vice versa. For example, RxCy#z will be row 81(x - 1) + 9(y - 1) + (z - 1), counting from 0: R1C1#1 is row 0 and R9C9#9 is row 728. In case you're curious, here's the function I used to generate an empty Sudoku grid in matrix form:
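With the rows ordered cell by cell, and digit by digit within each cell, the conversion in both directions is a couple of divmods. A small sketch (assuming 0-based matrix row indices):

```python
def row_to_cell(r, size=9):
    # 0-based matrix row index -> (row, column, digit), all 1-based
    cell, digit = divmod(r, size)
    x, y = divmod(cell, size)
    return x + 1, y + 1, digit + 1

def cell_to_row(x, y, z, size=9):
    # RxCy#z -> 0-based matrix row index
    return (x - 1) * size**2 + (y - 1) * size + (z - 1)
```

This is what lets the solver report its answer: each row id in the final solution maps straight back to a digit in a cell.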
Generating 729x324 matrix (corresponding to empty Sudoku grid)
In these code snippets, "size" refers to the dimensions of the Sudoku grid. This approach works just as well for standard 9x9 puzzles as for larger grids like 16x16 or 25x25 (any size whose box dimension, the square root of size, is a whole number)!
from math import sqrt

import numpy as np

def _one_constraint(row, size):
    # Cell constraint: exactly one digit per cell
    return row // size

def _row_constraint(row, size):
    # Row constraint: one of each digit per row
    return size**2 + size * (row // (size**2)) + row % size

def _col_constraint(row, size):
    # Column constraint: one of each digit per column
    return 2 * (size**2) + (row % (size**2))

def _box_constraint(row, size):
    # Box constraint: one of each digit per box
    return (int(3 * (size**2)
            + (row // (sqrt(size) * size**2)) * (size * sqrt(size))
            + ((row // (sqrt(size) * size)) % sqrt(size)) * size
            + (row % size)))

def empty_sudoku_exact_cover(size=9):
    constraints, rows = 4 * (size**2), size**3
    matrix = []
    for r in range(rows):
        row = np.zeros(constraints, dtype=int)
        positions = [
            _one_constraint(r, size),
            _row_constraint(r, size),
            _col_constraint(r, size),
            _box_constraint(r, size)
        ]
        row[positions] = 1
        matrix.append(row)
    numbered_column = np.arange(1, rows + 1, dtype=int).reshape(-1, 1)
    return np.hstack((numbered_column, matrix))
But now we have a different problem. How do we find exact covers in this huge matrix?
Algorithm X
Algorithm X is the name of the algorithm presented by Donald Knuth to solve exact cover problems written in this matrix format. In his words:
Algorithm X is simply a statement of the obvious trial-and-error approach.
Here's how it works. Given a matrix A of 0s and 1s:
1. If A is empty, the problem is solved; terminate successfully
2. Otherwise, choose a column c
3. For each row r such that A[r, c] = 1:
    - Include r in the partial solution
    - For each j such that A[r, j] = 1:
        - For each i such that A[i, j] = 1, delete row i from matrix A
        - Delete column j from matrix A
4. Repeat this algorithm recursively on the reduced matrix A
And that's exactly what I implemented. Now, using the matrix obtained from translating Sudoku into an exact cover problem, I can solve any Sudoku puzzle I want simply by writing it in this 729x324 matrix format and including some of the rows (the initial clues) in my partial solution.
My Implementation of Knuth's Algorithm X
I modified the algorithm a little: in order to reliably know which row is which, I started by appending an additional column at the start of the matrix with row ids:
Then, instead of ending the algorithm when A is empty, I end it when there's one column left: the column with the row ids. This way, I can then associate each row index in the partial solutions with a specific digit in a specific cell of the Sudoku grid. It's a bit of a cheap trick, but it does the job.
def solve(matrix, partial_solution, solutions):
    rows, columns = matrix.shape
    if columns == 1:
        # Only the id column is left: every constraint is satisfied
        solutions.append(partial_solution)
        return
    # Choose the column with the fewest 1s (skipping the id column)
    column_sums = np.sum(matrix[:, 1:], axis=0)
    min_col_idx = np.argmin(column_sums) + 1
    candidate_rows = [r for r in range(rows) if matrix[r, min_col_idx] == 1]
    if not candidate_rows:
        # A constraint can no longer be satisfied: dead end, backtrack
        return
    for row_idx in candidate_rows:
        new_partial_solution = partial_solution.copy()
        new_partial_solution.add(matrix[row_idx, 0])
        # choose_row deletes every row and column covered by row_idx
        reduced_matrix = choose_row(matrix, row_idx)
        solve(reduced_matrix, new_partial_solution, solutions)
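The actual matrix reduction happens in choose_row. Here's a minimal sketch of what it has to do, assuming the id column sits at index 0 and must survive every reduction:

```python
import numpy as np

def choose_row(matrix, row_idx):
    # Columns covered by the chosen row (skipping the id column at index 0)
    cols = np.where(matrix[row_idx, 1:] == 1)[0] + 1
    # Any row with a 1 in a covered column conflicts with the chosen row
    # (this includes the chosen row itself), so all of them get deleted
    conflicting = np.where((matrix[:, cols] == 1).any(axis=1))[0]
    reduced = np.delete(matrix, conflicting, axis=0)
    return np.delete(reduced, cols, axis=1)
```

Deleting the satisfied columns is what eventually leaves only the id column behind, which is exactly the termination condition solve checks for.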
But you might've noticed something: this is also a backtracking algorithm. It keeps calling itself until the problem is solved. How is this any better than the brute force method I described at the start of this article?
The secret is step 2: choosing a column. If in this step we choose the column with the fewest 1s in it, we guarantee that the number of guesses we have to make is much smaller, and therefore, our function will have to recurse a lot less. In Sudoku puzzle terms, the brute force backtracking algorithm tries the first valid digit in the first empty cell, while Algorithm X tries the first valid digit in the cell with the fewest possible digits to place. In this visualization, this becomes very apparent:
Comparing the two algorithms
To benchmark both algorithms, I generated 50 puzzles, each starting with around 22-27 filled cells. Both algorithms were then tasked with solving all the puzzles. Please note that the results come from my rather unoptimized Python code. I'm sure the times could be vastly improved. However, I believe these numbers effectively demonstrate the relative performance of each algorithm.
Note: When I refer to "Nodes," I mean recursive calls: essentially, the number of times each algorithm needs to backtrack before solving a puzzle.
| | Avg Time | Max Time | Min Time | Avg Nodes | Max Nodes | Min Nodes |
|---|---|---|---|---|---|---|
| Backtracking | 8.3997s | 125.1491s | 0.0643s | 124144.76 | 1883620 | 924 |
| Algorithm X | 0.0351s | 0.0527s | 0.0302s | 69.28 | 157 | 55 |
To me, the most fascinating result is the average number of recursive calls Algorithm X took to solve each puzzle: 69! By selecting the column with the fewest 1s during the algorithm, we almost entirely remove the need to ever take a guess. Comparing that to the 124145 average recursive calls the Backtracking algorithm needs doesn't even seem fair.
Here's both algorithms attempting to solve the same puzzle at the same time:
An interesting thing to note is the existence of clear outliers when solving with Backtracking: some puzzles require significantly more recursive calls than the rest. I think this plot makes that very clear.
Backtracking scatterplot
I find it funny how only one puzzle required over one million recursive calls, but that one needed almost two million! Over twice as many recursive calls as any other puzzle.
Solving lots of Sudoku with Algorithm X
Despite the many interesting conclusions to take away from solving puzzles with Backtracking, I was a lot more curious about the data I could get out of Algorithm X. Therefore, I had it solve 49151 puzzles I found in a repository online. Here are the results:
| Avg Time | Max Time | Min Time | Median Time |
|---|---|---|---|
| 0.0388s | 1.9746s | 0.0264s | 0.0341s |
The results for time are fairly consistent with the first fifty puzzles solved: the average time for the initial set was 0.0351s, and this time it increased slightly to 0.0388s. The median also seems to be around these values, so the number of outliers is probably not too significant.
| Avg Nodes | Max Nodes | Min Nodes | Median Nodes |
|---|---|---|---|
| 88.07 | 11498 | 64 | 64 |
On the other hand, the results for the nodes were fascinating. The first thing I noticed was that the Min nodes and Median nodes are the same! Upon closer inspection, I found that out of the 49151 puzzles, 28853 puzzles required exactly 64 nodes to be solved. That's 58.7% of all puzzles!
So what exactly is going on here? What's so special about the number 64? Well, the dataset had something consistent across all puzzles: all of them started out with 17 filled cells. Subtract that from 81 and you get 64. This meant that over half of the puzzles were solved without the algorithm having to take any guess. For sudoku enthusiasts, this means that over half of these 49151 puzzles were solvable using naked singles only: the method where you exclude all but one possible number from a cell based on the row, column, and box.
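Naked singles are easy to compute directly: collect the digits already used in a cell's row, column and box, and check whether exactly one candidate remains. A small illustrative sketch (not the solver's code), assuming a NumPy grid with 0 marking empty cells:

```python
import numpy as np

def candidates(grid, row, col):
    # Digits not yet used in the cell's row, column or 3x3 box
    if grid[row, col] != 0:
        return set()
    r, c = 3 * (row // 3), 3 * (col // 3)
    used = set(grid[row, :]) | set(grid[:, col]) | set(grid[r:r+3, c:c+3].flat)
    return set(range(1, 10)) - used

def naked_singles(grid):
    # Cells where exactly one digit can legally go
    singles = {}
    for r in range(9):
        for c in range(9):
            cands = candidates(grid, r, c)
            if len(cands) == 1:
                singles[(r, c)] = cands.pop()
    return singles
```

A puzzle solvable by naked singles alone never forces Algorithm X to guess: at every step there's a constraint column with a single 1 left in it.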
Another interesting find can be observed in this scatterplot:
Algorithm X scatterplot
Only four puzzles required over 1 second to solve! That's 0.008% of all puzzles. Algorithm X isn't easy to trick. So what do these hard puzzles actually look like?
Hardest puzzle for Algorithm X
This puzzle required the maximum amount of recursive calls (11498) and the most time to solve (1.9746s): Note: the visualization is much slower than the actual solve time.
It seems that a mostly empty bottom half of the puzzle, combined with some wrong guesses at the start of the solving process, drastically increases the number of nodes.
Back to generating Sudoku puzzles
Having finished the puzzle solver, I can generate my own puzzles reasonably fast! Here are a few of them:
Now that I've reached this point, I realize I've only scratched the surface of this topic. At this point, my only criterion is that the puzzles I generate have exactly one solution. But in order to generate a puzzle that's pleasant for a human to solve, there are a lot of other challenges: can the generated puzzles be solved without having to make any guesses? Can the algorithm generate puzzles of all difficulties?
While searching around for answers to my questions in this realm, I found this incredibly interesting comment from someone who seems to be the dev of a sudoku mobile app. It's a bit long, but I think the read is worth it.
Final Remarks
There are still plenty of loose ends to explore in this topic, but I thought this was a good time to pause and share what I've discovered so far.
Here are the next steps I'll take:
- Currently, my code still takes around 2-3 seconds to generate puzzles. I'd like to improve that and build an API that generates Sudokus instantly on demand, potentially using a lower-level language like Rust or C/C++.
- Implement Dancing Links, a more advanced version of Algorithm X by Donald Knuth that uses linked lists instead of matrices.
- Translate other puzzles to Exact Cover problems, like Futoshiki or the N-queens problem.
In case you want to check it out, here's the github repo in which I have been playing around with these algorithms: just FYI, it's a lot of very quick and dirty code that I used to deepen my understanding of the ideas in this blog post.