The terror of debugging a chess program (and the unexpected benefits…)

Goal

The goal of this article is to provide an overview of how a chess program can be debugged when a problem occurs. The Huo Chess program is used for this tutorial – the program used anyway for all tutorials in this site.

About Huo Chess

Check Huo Chess – Chess programming learning tutorials for details about Huo Chess and the source code of the program.

Background story

When testing the Huo Chess Quick Basic edition, it occurred to me that the program did not play the anticipated moves in one specific variant of Grob opening.

In particular, when the computer played with white, in the following sequence of moves…

1. g4 e5
2. d3 d5
3. Bg2 Bxg4
4. c4 dc4

that resulted in the following position…

The position where the problem appeared…

…the computer continued the game with 5. f3? and did not capture the pawn on b7 with the bishop.

What was more disturbing was that the other editions of Huo Chess (namely the C# edition and the Java edition) did play the correct move 5. Bxb7 (well, that was also a bit reassuring in the sense that the other editions played well).

So the stage was set: Try to debug the Huo Chess QBasic edition and discover what it makes it play differently than the other editions!

Important note: For illustration purposes, I used screenshots from the version with more elaborate graphics for the chessboard. For this version I have re-used the graphics from the Deep Chess program (by Thomas McBurney). Please see the related article Programming a chess application in QBasic (QB64) for details.

Debugging AI

Before we delve into the details of debugging Huo Chess, a brief introduction about debugging is adue.

Debugging an application that presents a problem (bug) is always a difficult task. It involves looking at the source code again and again until the problem is detected, something that is becoming even more difficult if the source of the problem is not known or at least suspected somehow. Multiple tests are needed to check if potential solutions are able to fix the bug – tests that also must include regression testing to verify that the changes applied do not mess up with existing functionality that had no problems in the first place.

When coming to the debugging of a chess application things get even more complicated. If the problem lies in the UI or some additional features of the program (like viewing the sets of moves or taking back a move) then of course the problem is straightforward and the programmer knows where to look for problems. If the problem lies in the thinking mechanism of the chess program though, then you must dive deep into that thinking mechanism to try and understand why the computer did not play the move it was expected to play.

Artificial Intelligence has become a kind of a black box lately. That is because the increasing complexity of the programs make it hard or sometimes even impossible to determine why the computer made a specific decision and not a different one. (source) (source) A chess program is one of those examples where knowing exactly why the computer played this move and not the other is a really hard task, or in the case of neural networks close to impossible.

Debugging a chess program entails also taking a look at the logs that describe the thinking process. Such logs document all the moves checked by the chess program, the score of each position analyzed and the process of determining the best move of them all. These logs were key to the discovery of the problem as you will see later on. Note though that checking of the logs is possible because of the small size of Huo Chess and its output when analyzing moves. Checking the logs could be a daunting task when it comes down to checking the thinking process of computers which analyze millions of positions per second.

With that introduction in hand, let us describe how I tried to find the problem with Huo Chess.

Step -1: Determine the problem

As mentioned in the beginnning, the problem was that a specific position the computer did not play the optimum move. We know that it is a problem because the other editions of Huo Chess did play the correct move.

Step 0: Setup the environment

When doing anything, one must first the scene and get the right tools to do the job.

First of all I created a test copy of Huo Chess in a separate folder. That would be the version I would work on while trying to find the problem. It is crucial not to mess up with your existing ‘production’ version. No matter how serious the problem is, the fact is that you already have a version that is working and that is distributed to others. So this must stay intact.

What is more, in order to faciliate the testing of the changes I would make, I set up the program to start directly from the ‘problematic’ chessboard position. In that way I saved myself a lot of clicks until I reached the problematic situation that I needed to debug.

Last but not least, I improved the Huo Chess logs so that they included all the information I needed for the debugging.

Namely, the Huo Chess logs write in txt files the contents of all the nodes in the thinking mechanism tree (the first move, all the subsequent answers to that move, then all the subsequent answers to that move and so on), including the move analyzed in that node, the position score and the ‘parent’ node (i.e. the node under which this node belongs to). There are two different logs: The log of the nodes before the application of the MinMax algorithm and the log of the nodes as they changed with the application of the MinMax algorithm.

Step 1: List possible causes

If one does not know what the cause of an error might be, there is nothing else to do that start brainstorming possible causes. So I started by listing things that I knew were different between the C#/ Java and the QBasic editions. There are some features which I have not yet implemented in QBasic edition but which do not play a serious role in the thinking mechanism, like the detection of dangerous squares. To be sure, I listed all of them in a text file (fancy tools have no place in debugging).

The list at the end of this process looked something like this…

Differences from CS version

No dangerous squares detection
No stupid move detection
Do not copy table to table_after to pass to deeper level function
Position score is a global variable, not a variable returned from countScore
Could it be the fault of the i-j variables used in the thinking mechanism? (Use different in each SUB)
parentNode = -999: Possibly needed everywhere.
Pat detected: Remove the setting of best move variables!
Differences in MinMax algorithm
Differences in the countScore function
The rule of not thinking deeper if the ProsorinoKommati is not null could need revision. We lose impoortant moves there. But if we rely on the ProsorinoKommatiCM2 then we have memory problems.

With this list at hand, the next step is obvious.

Step 2: Test the possible causes

For each one of the possible causes, I started testing if they were indeed the cause of the problem. After doing a prioritization I started with the ones that seemed more likely to play a role in the problem, thus eliminating them one by one.

Long story short, the first things that I tried were unsucessful. Nothing could fix the problem.

At the end of the investigation, the list of causes looked something like this…

Differences from CS version

No dangerous squares detection
No stupid move detection
Do not copy table to table_after to pass to deeper level function [FIXED] [NO EFFECT - NOT NEEDED]
Position score is a global variable, not a variable returned from countScore
Could it be the fault of the i-j variables used in the thinking mechanism? (Use different in each SUB) [TESTED - NO EFFECT]
parentNode = -999: Possibly needed everywhere. [TESTED - NO EFFECT] [NEEDED] [NEEDED ALSO FOR CS EDITION]
Pat detected: Remove the setting of best move variables! [POSSIBLY NO EFFECT] [NEEDED ALSO FOR CS EDITION]
Differences in MinMax algorithm
Differences in the countScore function
The rule of not thinking deeper if the ProsorinoKommati is not null could need revision. We lose impoortant moves there. But if we rely on the ProsorinoKommatiCM2 then we have memory problems.

So I had to go where no chess application programmers dares to go: Into the dungeons of the thinking mechanism logs…

Step 3: Check the logs!

The logs are… big. And one must have patience to check them all. Most importantly, verifying all the steps of the thinking mechanism for a position is practially impossible since it entails thousands of positions and calculations. If one would be able to do all these calculations manually he would of course be able to detect the problem. But something like that is easier said than done.

So the best thing one can do is search the logs for indications of a problem. Even scrolling through the thousands of lines could help you detect an anomaly that could be the first clue on where to focus your attention to.

In this case, I knew the move I was looking for (5. Bxb7) so I searched for that move in the logs. So I searched for the move ‘7 2 -> 2 7’ (remember the logs write down the coordinates of the move).

I did found the move in various places, but something strike me as odd…

Huo Chess logs…

Even though the move was there, some moves after that move were missing from the nodes of the thinking mechanism. For example, a logical answer to 5. Bxb7 is 5… Nd7. Well, this sequence of moves (i.e. 5. Bxb7 Nd7) was not anywhere to be found.

Logs missing some moves…

Logs missed the 5. Bxb7 Nd7 (2 7 -> 7 2,  2 8 -> 4 7) sequence of moves!

Specific sequences of moves were not there…

That was the first indication of the source of the problem. Something in the program misbehaved and make the computer ‘miss’ some moves while thinking. Now at least we knew something about the problem we were facing.

Attempt 1

A new entry in the possible causes list…

ElegxosNomimotitas not working properly [POSSIBLE!]

At first I was worried that the ElegxosNomimotitas SUB that checks (‘Elegxo’ in Greek) the legality (‘Nomimotita’ in Greek) of moves is not working properly. So I played the game and made some moves with my knight to see if something blocked me from moving. Everything was fine. All legal knight moves were allowed and the illegal ones blocked as they should be. So the move legality function was working properly.

The possible cause was erased from the list.

Attempt 2

As looking at the logs, I noticed also that some other sequences of moves were missing.

Moves missing from the nodes in the thinking mechanism

By now it was clear that a major problem existed in the algorithm that made the computer think incorrectly. After checking the ElegxosNomimotitas SUB (sub-routine in QBasic, something similar to a function), it was time to seach the code again and see how this routine is called from within the program. Perhaps there was something there.

And indeed there was the problem…

The code invoking ElegxosNomimotas in e.g. HumanMove1 SUB was the following…

whoPlays$ = "Human"

CALL ElegxosNomimotitas(HM1Skakiera$(), 0, startingColumnHM1, startingRankHM1, finishingColumnHM1, finishingRankHM1, MovingPieceHM1$)

whoPlays$ = "HY"

That was all fine and well, but what about that whoPlays$ variable? It strike me as odd that I needed to change the value of this variable before and after calling ElegxosNomimotitas. So I tried to remember why did I use this variable.

It turned out that indeed the variable whoPlays$ needed to change before calling the SUB that checks the legality of a move, because in this specific case the thinking mechanism analyzes the potential moves of the human opponent. But one must never let go of a lead until he is sure that it does not lead anythwere…

Coiuld this variable create problems elsewhere?

Attempt 3

As a next logical step, I searched for the whoPlays$ variable inside the whole program to see where it is used and how.

Surprisingly enough, the whoPlays$ variable is used in the places where the program checks for the existence of check…

'Check the check

IF playerColor$ = "b" AND Nomimotita = 1 THEN

    'whoPlays$ = "HY"

    CALL checkBlackCheck(HM1Skakiera$())

    'Restore Nomimotita! (because it is 'broken' in the checkCheck function)

    Nomimotita = 1

    IF blackCheck = 1 THEN Nomimotita = 0

    'whoPlays$ = "Human"

END IF

IF playerColor$ = "w" AND Nomimotita = 1 THEN

    'whoPlays$ = "HY"

    CALL checkWhiteCheck(HM1Skakiera$())

    'Restore Nomimotita! (because it is 'broken' in the checkCheck function)

    Nomimotita = 1

    IF whiteCheck = 1 THEN Nomimotita = 0

    'whoPlays$ = "Human"

END IF

‘Check the check

IF playerColor$ = “b” AND Nomimotita = 1 THEN

    ‘whoPlays$ = “HY”

    CALL checkBlackCheck(HM1Skakiera$())

    ‘Restore Nomimotita! (because it is ‘broken’ in the checkCheck function)

    Nomimotita = 1

    IF blackCheck = 1 THEN Nomimotita = 0

    ‘whoPlays$ = “Human”

END IF

IF playerColor$ = “w” AND Nomimotita = 1 THEN

    ‘whoPlays$ = “HY”

    CALL checkWhiteCheck(HM1Skakiera$())

    ‘Restore Nomimotita! (because it is ‘broken’ in the checkCheck function)

    Nomimotita = 1

    IF whiteCheck = 1 THEN Nomimotita = 0

    ‘whoPlays$ = “Human”

END IF

This check seems to again perform an update of the WhoPlays$ value, even though in this case the reason was not as obvious as in the HumanMove1 SUB (see above). Why did we change the value from HY (HY denoted Computer, it comes from ‘Hle to Human and vice versa? Was the whoPlays$ variable used anywhere in the checkBlackCheck or checkWhiteCheck SUBs? The answer is No.

I started feeling that I am close to the solution.

Did the ElegxosNomimottas SUB use the whoPlays$ variable? The answer was Yes. In a specific point, the routine that checks the legality of the moves, checks who plays and whether the player has selected a piece of its own colour to move.

'If player moves a different colour piece then move is illegal

    IF whoPlays$ = "Human" THEN

        IF MID$(playerColor$, 1, 1) <> MID$(ENSkakiera$(startColumn, startRank), 1, 1) THEN NomimotitaEN = 0

    END IF

    IF whoPlays$ = "HY" THEN

        IF MID$(playerColor$, 1, 1) = MID$(ENSkakiera$(startColumn, startRank), 1, 1) THEN NomimotitaEN = 0

    END IF

If not, the move is illegal. So that was a clear and strong indication that this whoPlays$ variable which changed values for no reason had a effect on the legality of moves checked. Could that be the cause of the problem at hand?

The end

Attempting to solve the problem, I removed the updates of the whoPlays$ variable before and after the calling of the checkBlackCheck or checkWhiteCheck SUBs.

So the code changed to something like this…

'Check the check

IF playerColor$ = "b" AND Nomimotita = 1 THEN

    'whoPlays$ = "HY"

    CALL checkBlackCheck(HM1Skakiera$())

    'Restore Nomimotita! (because it is 'broken' in the checkCheck function)

    Nomimotita = 1

    IF blackCheck = 1 THEN Nomimotita = 0

    'whoPlays$ = "Human"

END IF

IF playerColor$ = "w" AND Nomimotita = 1 THEN

    'whoPlays$ = "HY"

    CALL checkWhiteCheck(HM1Skakiera$())

    'Restore Nomimotita! (because it is 'broken' in the checkCheck function)

    Nomimotita = 1

    IF whiteCheck = 1 THEN Nomimotita = 0

    'whoPlays$ = "Human"

END IF

I then executed the program.

IF playerColor$ = “b” AND Nomimotita = 1 THEN

    ‘whoPlays$ = “HY”

    CALL checkBlackCheck(HM1Skakiera$())

    ‘Restore Nomimotita! (because it is ‘broken’ in the checkCheck function)

    Nomimotita = 1

    IF blackCheck = 1 THEN Nomimotita = 0

    ‘whoPlays$ = “Human”

END IF

IF playerColor$ = “w” AND Nomimotita = 1 THEN

    ‘whoPlays$ = “HY”

    CALL checkWhiteCheck(HM1Skakiera$())

    ‘Restore Nomimotita! (because it is ‘broken’ in the checkCheck function)

    Nomimotita = 1

    IF whiteCheck = 1 THEN Nomimotita = 0

    ‘whoPlays$ = “Human”

END IF

I ran the program.

And… voila!

Bishop captures pawn at b7!!!

Problem solved!

Huo Chess playing the correct move after debugging!

Epilogue (and the benefits…)

Was it hard? Yes. The description of the process in a short article looks much nicer than the actual process of debugging. While trying to solve this I spent many sleepless nights full of anxiety trying to figure out why my program did not behave as it should. Every failure in the process increased the frustration and the pressure.

However one must always remember that a bug is at the end always discovered, like a murderer in a Poirot movie. There is no place to hide. Sooner or later the source of the problem is revealed because, simply enough, it… is there! 🙂

But what about the benefits of the debugging process? Are there any? Yes! And they are important!

You see, by checking the chess program to find the problem I gained a much better understanding of how it works (or remembered things that I had forgot). Now I am in a much better position to improve the program even further in the future! (as you can see at the possible causes list, I did discover some potential improvements as well) All this tiredness could come in handy after all…

Happy coding!

And remember…

Keep experimenting!

2 responses to “The terror of debugging a chess program (and the unexpected benefits…)”

  1. […] The terror of debugging a chess program (and the unexpected benefits…) […]

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: