Talk:Handicap
Latest revision as of 21:21, 23 August 2024
Any suggestions for how handicap play can be implemented are welcome. We need more discussion on this wiki. :)
What about moving some of the material about the winning ways on non-symmetrical boards to the theory page? Halladba 21:08, 3 February 2008 (CET)
OK, discussing you will get :-) Regarding fixed starting moves instead of swap as a handicap option, it would be lovely to get some empirical data. For instance, in a game where I place (and get to keep) the first piece at A1, just how much more do I lose than in a game I got to keep B2 instead? Vintermann 12:47, 5 February 2008 (CET)
As a general rule of thumb, c3 and the 2-2 obtuse corner (shown below) give Red approximately a 0.25 move advantage, at least for board sizes 13×13 to 19×19. Is there any evidence to support this? It seems to me to be at most an educated guess. For example, KataHex assigns these openings a winning probability above 94% for 13x13 (which may not translate into an actual probability). Selinger (talk) 20:43, 30 March 2023 (UTC)
I think the win probability converted to an Elo advantage (or equivalently, log odds) is more useful than the win probability itself for measuring the "strength of a stone." Empirically, if you set up various board positions with 2 (1 Red / 1 Blue) or 3 (2 Red / 1 Blue) stones and ask KataHex for its win percentage, the result pretty closely approximates the answer you'd get if you simply used Elo to linearly interpolate what "fraction of a full stone" each stone was. (If you're not convinced, I can try to elaborate on why I think so.) If you accept that premise, we can make inferences from bot swap maps, which I believe are more reliable than human game statistics; the latter are subject to tricky biases, such as the players being unevenly matched and the weaker player being more likely to play a questionable opening like c3.
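A minimal Python sketch of the conversion described above, plus one possible reading of the "linear interpolation" claim. The logistic Elo formula (400 points per factor of 10 in the odds) is the standard one; the additive stone model, the half-stone of tempo credited to the side to move, and the function names are assumptions made for illustration, not anything taken from KataHex itself.

```python
import math

def prob_to_elo(p: float) -> float:
    """Logistic Elo advantage for a win probability p (400 points per 10x in odds)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_to_prob(elo: float) -> float:
    """Inverse of prob_to_elo: expected win probability for an Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def predicted_red_win(red_fracs, blue_fracs, red_to_move: bool, full_stone_elo: float) -> float:
    """Hypothetical additive model (an assumption, not a bot's internals):
    each stone adds its fraction of a full stone's Elo for its owner, and the
    side to move holds half a stone of tempo. With Blue to move and a single
    Red stone, a pass (fraction 0) gives -E/2 and a full stone (fraction 1) +E/2."""
    tempo = 0.5 if red_to_move else -0.5
    stones = sum(red_fracs) - sum(blue_fracs) + tempo
    return elo_to_prob(full_stone_elo * stones)

# 14.4% is about -310 Elo.
print(round(prob_to_elo(0.144)))
# One Red stone worth 3/4 of a stone, Blue to move, assuming a full stone spans 800 Elo.
print(predicted_red_win([0.75], [], red_to_move=False, full_stone_elo=800.0))
```

With alternating moves, a 1 Red / 1 Blue position has Red to move and a 2 Red / 1 Blue position has Blue to move, so `red_to_move` follows from the stone counts. Working in Elo rather than raw probability keeps the contributions additive, which is the point of the log-odds scale.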
- leela_bot: Looking at leela_bot's swap map, the worst move i1 has a 100%-85.6%=14.4% win rate, which is -310 Elo using the logistic Elo formula. The move a1 is -206 Elo; c3 is +198; b12 is +180. I think we have to make an inference here (which you may not agree with) about what win rate a "pass" move would have. I think -400 is quite appropriate, since it implies a full stone is 800 Elo, and a1 is worth about twice as much as i1. If so, this would imply c3 and b12 are worth about 3/4 of a stone, hence giving Red a 0.25 move advantage. If you think a pass move is -450 or -350 Elo, that implies c3 is worth 72% or 78% of a stone respectively, still close to 3/4.
- We don't have to guess as much with KataHex. It's hard on 13×13 because the probabilities are so close to 100%, leading to huge error bars in Elo terms. But if we consider 15×15, the strongest move is about 97.2% (+616 Elo); treating that move as a full stone and a pass as its mirror image (-616 Elo), a full move is worth 1232 Elo. Then c3 is +301 Elo, or 74% of the way between -616 and +616. Similarly for 19×19. Hexanna (talk) 00:08, 31 March 2023 (UTC)
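The arithmetic in the two points above can be reproduced with a short Python sketch. The win rates and Elo figures are the ones quoted for leela_bot (13×13) and KataHex (15×15); the pass values (-400 Elo for leela_bot, the mirror of the strongest move for KataHex) are the assumptions stated in the discussion, and the helper names are hypothetical.

```python
import math

def prob_to_elo(p: float) -> float:
    """Logistic Elo advantage for win probability p (400 points per 10x in odds)."""
    return 400.0 * math.log10(p / (1.0 - p))

def stone_fraction(move_elo: float, pass_elo: float, full_elo: float) -> float:
    """Linear position of move_elo between a pass (0) and a full stone (1)."""
    return (move_elo - pass_elo) / (full_elo - pass_elo)

# leela_bot, 13x13 swap map. i1 is derived from its 85.6% win rate for the
# opponent; the other values are the Elo figures quoted above.
leela_elo = {
    "i1": prob_to_elo(1.0 - 0.856),  # about -310
    "a1": -206.0,
    "c3": 198.0,
    "b12": 180.0,
}

# Assumption: a pass is -400 Elo, so a full stone is +400 (an 800-Elo span).
leela_frac = {m: stone_fraction(e, -400.0, 400.0) for m, e in leela_elo.items()}

# Sensitivity of c3 to the assumed value of a pass (full stone = -pass).
c3_sensitivity = {p: stone_fraction(198.0, p, -p) for p in (-450.0, -350.0)}

# KataHex, 15x15: the strongest move (~97.2%) is taken as a full stone,
# and a pass as its mirror image.
full_stone = prob_to_elo(0.972)  # about +616
katahex_c3 = stone_fraction(301.0, -full_stone, full_stone)

for move, frac in leela_frac.items():
    print(f"leela_bot {move}: {frac:.0%} of a stone")
for p, frac in c3_sensitivity.items():
    print(f"c3 with pass = {p:.0f} Elo: {frac:.0%} of a stone")
print(f"KataHex 15x15 c3: {katahex_c3:.0%} of a stone")
```

Because both bots are read through the same linear interpolation, the only free choice is where the "pass" end sits; varying it, as in the sensitivity loop, shows how much the 3/4 estimate depends on that choice.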
OK, I see where you are coming from. I'm not sure how well I trust KataHex's win rates (or those of other bots) to be actual win rates, because as you pointed out elsewhere, it is easy to construct two demonstrably equivalent positions to which KataHex assigns vastly different win rates. Ultimately KataHex's training probably only rewards accuracy in relative win rates, i.e., predicting which move is better than another move in a given position, rather than accuracy in absolute win rates.
For my taste, "as a general rule of thumb" is not strong enough as a disclaimer, because the statement that follows is a bit more speculative than its formulation suggests. Perhaps something like "Based on the win rates predicted by some AIs on boards of size 13x13 to 19x19, we may speculate that..."? Selinger (talk) 01:57, 5 April 2023 (UTC)
I think you raise a good point that a bot could be miscalibrated in its win rates but still play very well, and I'll make the wording more moderate. I'm still inclined to trust their percentages on c3/b12:
- KataHex and leela_bot imply the same swap map (they think the same set of moves are over/under 50%) on 13×13. This could be a coincidence, but it's some evidence they're probably not too far off. I'd expect not-bad (but not necessarily great) calibration for a typical AI trained with a reasonable loss function. Interestingly, KataHex also thinks this position is quite balanced, which is consistent with c3 valued at 3/4 of a stone (Red 1 is a good opening move, Red 3 is a half-move mistake; if this were a normal game with swap, both sides would have made half a stone's worth of mistakes): https://hexworld.org/board/#15nc1,a15c3m2m13