Independent Studies

Since the release of eXtreme Gammon, there have been several independent studies of the program's strength. We'd like to thank everyone who put in the long, hard work behind them, particularly Michael Depreli.
Michael Depreli Study 2012
Michael Depreli Study 2010
Mike Corbett Study

 

Michael Depreli Study 2012: published on BGonline.org (on January 28th 2012)

The study compares the top programs at multiple strength levels. Over 500 money games, every decision where the programs disagree is analyzed in depth with a rollout, and the mistakes each program/level makes are accumulated. This is a long process: more than 5000 moves or cube decisions had to be rolled out. Rollouts were made using eXtreme Gammon 2 (rollout parameters: 3-ply for checker play, XGRoller for cube decisions, rolled until the 95% confidence interval of the equity is less than 0.005, with a minimum of 1296 trials). All numbers are in normalized equity.
The process does not take the search interval into account: each candidate was analyzed at the requested level regardless of the search interval used.
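A minimal sketch of the stopping rule described above. It assumes that "95% confidence of the equity is less than 0.005" means the half-width of a normal-approximation 95% confidence interval (1.96 standard errors) on the mean equity, and it ignores the variance-reduction techniques real rollouts use. The function name is hypothetical, not part of eXtreme Gammon.

```python
import math
from statistics import stdev


def rollout_should_stop(trial_equities, min_trials=1296, max_half_width=0.005, z=1.96):
    """Hypothetical stopping rule for the rollout of one candidate.

    Assumes the "95% confidence" criterion is the half-width of a
    normal-approximation confidence interval, z * s / sqrt(n), on the mean equity.
    """
    n = len(trial_equities)
    if n < min_trials:
        return False  # never stop before the minimum number of trials
    half_width = z * stdev(trial_equities) / math.sqrt(n)
    return half_width < max_half_width
```

With exactly 1296 trials, a per-trial standard deviation of about 0.1 gives a half-width just above the limit (1.96 x 0.1 / 36 is about 0.0054), so under this reading the rollout would keep going.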

From these results, eXtreme Gammon 2 at 3-ply loses about half as much equity as the previous version at the same level (and is faster).
eXtreme Gammon 2 XGRoller+, which is faster than Snowie 3-ply, loses about one seventh as much equity as Snowie (6.487 vs 45.003).
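The two ratios above can be checked against the published totals. The eXtreme Gammon 1 figure comes from the 2010 study, which used different rollout settings, so that comparison is only approximate; the constant and function names are illustrative.

```python
# Total equity lost, copied from the tables on this page.
XG2_XGROLLER_PLUS = 6.487  # 2012 study
SNOWIE_3PLY = 45.003       # 2012 table (Snowie figures carried over from 2010)
XG2_3PLY = 16.231          # 2012 study
XG1_3PLY = 35.689          # 2010 study, different rollout settings


def error_ratio(worse_total, better_total):
    """How many times more equity the weaker entry loses than the stronger one."""
    return worse_total / better_total


print(f"Snowie 3-ply vs XG2 XGRoller+: {error_ratio(SNOWIE_3PLY, XG2_XGROLLER_PLUS):.2f}x")
print(f"XG1 3-ply vs XG2 3-ply: {error_ratio(XG1_3PLY, XG2_3PLY):.2f}x")
```

The first ratio comes out at roughly 6.9 and the second at roughly 2.2.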

Program Level Checker play Missed Double Wrong Double Wrong Take Wrong Pass Total PR

eXtreme Gammon 2 XGRoller++ 3.538 0.113 0.223 0.196 0.028 4.097 0.11
eXtreme Gammon 2 XGRoller+ 5.341 0.477 0.260 0.389 0.020 6.487 0.18
eXtreme Gammon 2 5-Ply 6.891 0.967 0.392 0.615 0.104 8.969 0.25
eXtreme Gammon 2 4-Ply 9.701 0.355 1.158 0.562 0.160 11.936 0.33
GnuBg 1.00 4-ply 9.107 1.132 0.555 1.435 0.308 12.536 0.35
eXtreme Gammon 2 XGRoller 12.843 0.495 0.989 0.401 0.160 14.887 0.41
eXtreme Gammon 2 3-Ply 13.128 1.208 1.000 0.700 0.196 16.231 0.45
GnuBg 1.00 3-ply 12.865 0.778 1.814 0.898 0.421 16.775 0.46
XG mobile Champion 14.331 0.907 0.741 0.822 0.196 16.996 0.47
GnuBg 1.00 2-ply 16.619 1.063 1.270 1.579 0.420 20.951 0.58
eXtreme Gammon 2 3-Ply Red 19.689 1.424 1.160 0.704 0.196 23.173 0.64
BgBlitz 2.8.0 4-ply 23.586 2.698 1.640 3.860 0.701 32.485 0.90
BgBlitz 2.8.0 3-ply 29.747 2.050 1.760 3.464 0.466 37.487 1.04
Snowie 4* 3-ply 37.424 1.922 1.139 3.651 0.867 45.003 1.24
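Each Total should equal the sum of the five error columns. A hypothetical consistency check over the rows above, with a tolerance of 0.002 to absorb rounding of the published three-decimal figures:

```python
# (checker play, missed double, wrong double, wrong take, wrong pass, reported total)
ROWS_2012 = {
    "eXtreme Gammon 2 XGRoller++": (3.538, 0.113, 0.223, 0.196, 0.028, 4.097),
    "eXtreme Gammon 2 XGRoller+": (5.341, 0.477, 0.260, 0.389, 0.020, 6.487),
    "eXtreme Gammon 2 5-Ply": (6.891, 0.967, 0.392, 0.615, 0.104, 8.969),
    "eXtreme Gammon 2 4-Ply": (9.701, 0.355, 1.158, 0.562, 0.160, 11.936),
    "GnuBg 1.00 4-ply": (9.107, 1.132, 0.555, 1.435, 0.308, 12.536),
    "eXtreme Gammon 2 XGRoller": (12.843, 0.495, 0.989, 0.401, 0.160, 14.887),
    "eXtreme Gammon 2 3-Ply": (13.128, 1.208, 1.000, 0.700, 0.196, 16.231),
    "GnuBg 1.00 3-ply": (12.865, 0.778, 1.814, 0.898, 0.421, 16.775),
    "XG mobile Champion": (14.331, 0.907, 0.741, 0.822, 0.196, 16.996),
    "GnuBg 1.00 2-ply": (16.619, 1.063, 1.270, 1.579, 0.420, 20.951),
    "eXtreme Gammon 2 3-Ply Red": (19.689, 1.424, 1.160, 0.704, 0.196, 23.173),
    "BgBlitz 2.8.0 4-ply": (23.586, 2.698, 1.640, 3.860, 0.701, 32.485),
    "BgBlitz 2.8.0 3-ply": (29.747, 2.050, 1.760, 3.464, 0.466, 37.487),
    "Snowie 4 3-ply": (37.424, 1.922, 1.139, 3.651, 0.867, 45.003),
}


def check_totals(rows, tolerance=0.002):
    """Return {entry: difference} for rows whose five categories do not add up
    to the reported total; the tolerance absorbs rounding to three decimals."""
    mismatches = {}
    for name, (*categories, total) in rows.items():
        diff = sum(categories) - total
        if abs(diff) > tolerance:
            mismatches[name] = round(diff, 3)
    return mismatches


print(check_totals(ROWS_2012))
```

For this table the check returns an empty dict: several rows differ from their Total by 0.001, which is consistent with rounding of the individual entries.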

Program Level Total PR
eXtreme Gammon 2 XGRoller++ 4.097 0.11
eXtreme Gammon 2 XGRoller+ 6.487 0.18
eXtreme Gammon 2 5-Ply 8.969 0.25
eXtreme Gammon 2 4-Ply 11.936 0.33
GnuBg 1.00 4-ply 12.536 0.35
eXtreme Gammon 2 XGRoller 14.887 0.41
eXtreme Gammon 2 3-Ply 16.231 0.45
GnuBg 1.00 3-ply 16.775 0.46
XG mobile Champion 16.996 0.47
GnuBg 1.00 2-ply 20.951 0.58
eXtreme Gammon 2 3-Ply Red 23.173 0.64
BgBlitz 2.8.0 4-ply 32.485 0.90
BgBlitz 2.8.0 3-ply 37.487 1.04
Snowie 4* 3-ply 45.003 1.24

Legend:
 - (*) Data for this entry are not yet available; the figures shown are from the 2010 study.
 - Ply: search depth as defined by each program (GnuBG 2-ply is equivalent to other bots' 3-ply).
 - Total: total equity lost.
 - PR: Performance Rating.

Here is a chart that shows the relative strength based on this study (in Elo relative to XG 3-ply). The speed tests were performed by GameSite 2000 Ltd and are not from an independent source.
Speed was measured on a Core i7 computer analyzing a money session and a match. The speed tests used a search interval in which the last ply examines up to 4 moves within 0.080 equity (eXtreme Gammon: Huge for 3-ply; GnuBG 3-ply and 4-ply: Large).
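A search interval is a pruning rule: only the most promising moves from a shallower evaluation are examined at the deepest ply. The sketch below assumes a simple reading of the setting described above (at most 4 moves, each within 0.080 equity of the best). It is hypothetical and is not the filter code of eXtreme Gammon or GnuBG, whose move filters have more options.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (move label, equity from a shallower ply)


def prune_candidates(candidates: List[Candidate], max_moves: int = 4,
                     max_gap: float = 0.080) -> List[Candidate]:
    """Hypothetical search-interval filter for the deepest ply.

    Keeps at most `max_moves` candidates whose equity (from the mover's point
    of view, higher is better) is within `max_gap` of the best candidate.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_equity = ranked[0][1]
    close_enough = [c for c in ranked if best_equity - c[1] <= max_gap]
    return close_enough[:max_moves]


# Illustrative equities for six hypothetical moves A to F.
moves = [("A", 0.120), ("B", 0.105), ("C", 0.060), ("D", 0.020), ("E", -0.010), ("F", 0.050)]
print(prune_candidates(moves))  # keeps A, B, C and F; D trails the best by 0.100
```

A wider interval (more moves, larger gap) makes the deepest ply slower but less likely to miss the best move, which is why the speed tests fix the interval.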

Note about BGBlitz: as it cannot analyze a full match, its speed was determined from its rollout speed.


Michael Depreli Study 2010: published on BGonline.org (finished on April 25th 2010)

Important: the 2010 study was made using eXtreme Gammon 1. The newest version is noticeably stronger. Michael Depreli is in the process of rolling out the positions again using eXtreme Gammon 2, with much stronger settings than those used in 2010. The new results will also include eXtreme Gammon 2 and will be available soon.

The study compares the top programs at multiple strength levels. Over 500 money games, every decision where the programs disagree is analyzed in depth with a rollout, and the mistakes each program/level makes are accumulated. This is a long process: more than 4500 moves or cube decisions had to be rolled out, and the project was completed after 6 months of intense analysis. Rollouts were made using GnuBG (rollout parameters: GnuBG 2-ply World Class, 1296 trials, or fewer if a 2.33 JSD (98% confidence) threshold was reached sooner). All numbers are in normalized equity.
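The JSD cutoff can be sketched as follows, assuming "JSD" is the equity gap between the best move and a candidate divided by the joint standard deviation sqrt(se_best^2 + se_cand^2) of the two rollout means. The function names are hypothetical, and GnuBG's own implementation may differ in detail.

```python
import math


def jsd_distance(best_mean, best_se, cand_mean, cand_se):
    """Equity gap between the best move and a candidate, in joint standard deviations.

    Assumption: JSD = (best_mean - cand_mean) / sqrt(best_se**2 + cand_se**2),
    where the se values are the standard errors of the two rollout means.
    """
    joint_sd = math.sqrt(best_se ** 2 + cand_se ** 2)
    if joint_sd == 0:
        return math.inf if best_mean > cand_mean else 0.0
    return (best_mean - cand_mean) / joint_sd


def can_stop_candidate(best_mean, best_se, cand_mean, cand_se, trials,
                       max_trials=1296, jsd_limit=2.33):
    """Stop after max_trials, or sooner once the candidate trails the best
    move by more than jsd_limit JSDs (about 98% confidence, per the study)."""
    if trials >= max_trials:
        return True
    return jsd_distance(best_mean, best_se, cand_mean, cand_se) > jsd_limit
```

For example, a candidate at 0.05 against a best move at 0.10, each with a standard error of 0.01, trails by about 3.5 JSD and can be dropped well before 1296 trials.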
We'd like to commend Michael for his extraordinary dedication and all the hard work it took to complete the project.
The process does not take the search interval into account: each candidate was analyzed at the requested level regardless of the search interval used.

Program Level Checker play Missed Double Wrong Double Wrong Take Wrong Pass Total

eXtreme Gammon 1 XGR+ 13.397 1.088 0.658 0.970 0.241 16.354
eXtreme Gammon 1 XGR 22.269 1.661 0.783 2.173 0.264 27.150

eXtreme Gammon 1 5-ply 17.169 1.507 0.789 2.859 0.450 22.774
GnuBG 4-ply 21.599 2.663 0.644 4.061 0.127 29.094

eXtreme Gammon 1 4-ply 22.967 0.426 1.647 0.818 0.555 26.413
GnuBG 3-ply 29.313 0.903 10.276 0.775 5.880 47.147

eXtreme Gammon 1 3-ply 27.814 1.831 1.528 3.996 0.520 35.689
GnuBg 2-ply 33.247 2.763 1.670 4.261 0.476 42.417
Snowie 4 3-ply 37.424 1.922 1.139 3.651 0.867 45.003
BgBlitz 3-ply 41.286 1.692 10.864 4.168 2.159 60.169

Program Level Total
eXtreme Gammon 1 XGR+ 16.354
eXtreme Gammon 1 XGR 27.150
eXtreme Gammon 1 5-ply 22.774
GnuBG 4-ply 29.094
eXtreme Gammon 1 4-ply 26.413
GnuBG 3-ply 47.147
eXtreme Gammon 1 3-ply 35.689
GnuBg 2-ply 42.417
Snowie 4 3-ply 45.003
BgBlitz 3-ply 60.169

Legend:
 - Ply: search depth as defined by each program (GnuBG 2-ply is equivalent to other bots' 3-ply).
 - Total: total equity lost.
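The 2010 summary table groups entries rather than listing them strictly by total. Sorting by total equity lost gives the overall order; a minimal sketch using the totals as published (names are illustrative):

```python
TOTALS_2010 = {
    "eXtreme Gammon 1 XGR+": 16.354,
    "eXtreme Gammon 1 XGR": 27.150,
    "eXtreme Gammon 1 5-ply": 22.774,
    "GnuBG 4-ply": 29.094,
    "eXtreme Gammon 1 4-ply": 26.413,
    "GnuBG 3-ply": 47.147,
    "eXtreme Gammon 1 3-ply": 35.689,
    "GnuBg 2-ply": 42.417,
    "Snowie 4 3-ply": 45.003,
    "BgBlitz 3-ply": 60.169,
}


def rank_by_total(totals):
    """Return (rank, entry, total) tuples, lowest equity lost first."""
    ordered = sorted(totals.items(), key=lambda item: item[1])
    return [(rank, name, total) for rank, (name, total) in enumerate(ordered, start=1)]


for rank, name, total in rank_by_total(TOTALS_2010):
    print(f"{rank:2d}. {name:<24} {total:7.3f}")
```

Sorted this way, eXtreme Gammon 1 5-ply (22.774) comes ahead of XGR (27.150), and GnuBG 3-ply (47.147) falls behind both GnuBg 2-ply (42.417) and Snowie 4 3-ply (45.003), mostly because of its wrong doubles (10.276) and wrong passes (5.880).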

Here is a chart that shows the relative strength based on this study (in Elo relative to XG 3-ply). The speed tests were performed by GameSite 2000 Ltd and are not from an independent source.
Speed was measured on a Core i7 computer analyzing a money session and a match. The speed tests used a search interval in which the last ply examines up to 4 moves within 0.080 equity (eXtreme Gammon: Huge for 3-ply; GnuBG 3-ply and 4-ply: Large).

Note about BGBlitz: as it cannot analyze a full match, its speed was determined from its rollout speed.

Mike Corbett Study:


Phil Simborg ran a test, at Mike Corbett's request, on 10 positions from his book.

On 9 positions eXtreme Gammon version 1 did better than Snowie; GnuBG did better on 6 (all moves evaluated at 3-ply, or 2-ply for GnuBG).

Because the positions were chosen from those Snowie gets wrong, this test does not reflect the overall difference between eXtreme Gammon and Snowie. It does, however, show the difference between eXtreme Gammon and GnuBG.

Page Number eXtreme did better than Snowie Gnu did better than Snowie Error avoided by eXtreme Gammon
1 yes yes 0.059
2 yes yes 0.076
15 yes yes 0.024
23 yes no 0.001
61 yes yes 0.012
68 no no None
70 yes yes 0.054
83 yes no 0.051
87 yes yes 0.166
133 yes no 0.023
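The counts quoted above can be reproduced from the table. A short tally, where page 68's "None" is stored as Python's None and the variable names are illustrative:

```python
# (page, XG better than Snowie, GnuBG better than Snowie, error avoided by XG)
RESULTS = [
    (1, True, True, 0.059),
    (2, True, True, 0.076),
    (15, True, True, 0.024),
    (23, True, False, 0.001),
    (61, True, True, 0.012),
    (68, False, False, None),
    (70, True, True, 0.054),
    (83, True, False, 0.051),
    (87, True, True, 0.166),
    (133, True, False, 0.023),
]


def tally(results):
    """Count positions where each program beat Snowie and sum the error XG avoided."""
    xg_wins = sum(1 for _, xg, _, _ in results if xg)
    gnu_wins = sum(1 for _, _, gnu, _ in results if gnu)
    avoided = sum(err for *_, err in results if err is not None)
    return xg_wins, gnu_wins, round(avoided, 3)


print(tally(RESULTS))
```

It returns (9, 6, 0.466): the first two numbers match the text, and the last is the sum of the "Error avoided by eXtreme Gammon" column.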