but this is why they write all the moves down and post them online. It’s also why poker is easier to cheat at: you just have to stay off stream, and you’re more likely to get away with it, whereas in chess a few entities control everything and can ban you from all of it.
This Regan guy is in way over his head. They need to hire a statistician to build the cheat-detection models, not a computer scientist. He clearly doesn’t understand Bayesian statistical theory, and he makes some really oddball claims like this one:
In general, Bayes’s Theorem is the gateway to deep and murky areas of statistical science.
That is laughable. The reason (inferior) frequentist parametric statistics dominated the field early on is that we didn’t have the computational power to run Bayesian and non-parametric models; everything was done by hand in the early days. He also seems to think that everything hinges on the prior, but all of the Bayesians I know use flat priors for basically everything. In many cases, the estimates you get from Bayesian modeling with flat/uninformative priors aren’t appreciably different from frequentist ones. Of course, you can use informative priors if you want, since the framework offers that flexibility, but the real distinction is the philosophical approach of treating parameters as random rather than fixed.
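To make that concrete, here’s a minimal sketch with my own toy numbers (nothing to do with Regan’s actual model): estimating a player’s per-move engine-match rate from a hypothetical 27 matches in 40 moves. With a flat Beta(1, 1) prior, the posterior mean is practically indistinguishable from the frequentist MLE.

```python
# Toy example with made-up numbers (not Regan's model): estimating a
# per-move engine-match rate. With a flat Beta(1, 1) prior, the
# Bayesian posterior is Beta(1 + s, 1 + n - s), and its mean barely
# differs from the frequentist MLE once n is non-trivial.
s, n = 27, 40  # hypothetical: 27 engine-matching moves out of 40

mle = s / n                         # frequentist estimate: 0.675
posterior_mean = (s + 1) / (n + 2)  # flat-prior Bayesian estimate: ~0.667

print(f"MLE:            {mle:.3f}")
print(f"Posterior mean: {posterior_mean:.3f}")
```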
Afaict, the model he is using is no more advanced than what you’d learn in an undergraduate statistics course, and that’s certainly not what you want here, where the ROC trade-offs and being right are so important.
The part I don’t get: out of the millions of chess games in the database, has no one ever played a single game over 98% engine correlation except the French cheater guy? It’s not uncommon for GMs to know 10 moves of theory in most openings, so I find it hard to believe there wouldn’t be at least a few short 15-move games where someone played perfectly. Also, why do these videos only ever look at Hans’s results in detail and never explore the other top GMs’ games the same way, so we’d have something to actually compare against?
Games that just follow theoretical lines aren’t given a score. Hikaru analyzed some of his games, and for several of them the result was something like “not enough moves,” so apparently a certain number or percentage of out-of-theory moves is required before a score is given.
I think ChessBase gives an “N/A” rating to games where you don’t play enough moves. That seemed to come up when Hikaru was playing with it on his stream.
It does seem like there may be some bad logic going on where people say “Hans had more 100% games than Magnus, so it’s cheating” without accounting for the volume of games each played. That error is the same base-rate mistake as “the number of Covid deaths is split equally between vaccinated and unvaccinated people, so the vaccine doesn’t work.”
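To put made-up numbers on the vaccine version of that fallacy (purely illustrative, not real Covid data): equal death counts do not mean equal risk when the group sizes differ, and exactly the same divide-by-the-denominator logic applies to counting 100% games.

```python
# Made-up numbers, purely to illustrate the base-rate point: equal
# death counts do NOT mean equal risk when the groups differ in size.
vaxxed, unvaxxed = 900_000, 100_000        # hypothetical 90% uptake
deaths_vaxxed, deaths_unvaxxed = 100, 100  # deaths "split equally"

rate_vaxxed = deaths_vaxxed / vaxxed        # ~0.011%
rate_unvaxxed = deaths_unvaxxed / unvaxxed  # 0.1%

print(f"Risk ratio: {rate_unvaxxed / rate_vaxxed:.0f}x")  # 9x if unvaxxed
```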
Anyway, I think more damning are the specific games where there are 30+ moves and Hans played perfect engine moves, including some unintuitive tactics. That’s not statistically conclusive, but it’s weird. I’d like to see these people apply the same aggressive hunt for suspicious games to all the other strong GMs. I don’t think averages and aggregate stats are the way to go at it; it would be better to have a group of strong GMs examine anonymized games (some Hans’s, some others’) and see if they can detect the “cheating” games in a double-blind setup.
I haven’t watched the video, because I watched about 30 seconds of it, read some of the Reddit thread, and thought, wtf is this? Come on, what even is “engine correlation”? You might think it’s the rate of matching the engine’s top line, but it isn’t. I still don’t really understand what it measures, and I think it depends heavily on what settings you use. ChessBase’s own documentation explicitly says it shouldn’t be used for cheat detection. That doesn’t mean it CAN’T be used that way, but I’m going to want more than some rando’s video using totally ad-hoc settings.
First it was “basically nobody ever has any 100% games evar!”, which was completely untrue. Then some guy tweeted “Hans has 10 100% games and Magnus only has 2 ZOMG” before adding “oh, btw, the denominator there is way bigger for Hans than for Magnus.” That’s the level these amateur statisticians are on: not even expressing it as a rate.
Another clear potential source of bias is that Magnus’s average opponent is way stronger than Hans’s, and it is way easier to rack up a high engine correlation (again, whatever the fuck that actually means) against weaker opponents. So Magnus is probably not the ideal comparison here.
Basically, this is all bullshit and I am not even close to taking it seriously. I am going to want to see something way more rigorous than this: comparisons against other rising-star GMs, with consistent settings chosen in advance.
Anyone qualified to do this analysis has probably been following the drama and already knows all the suspicious Hans games, so it might not be possible to do this as a blind control.
Yeah, probably not. But I think it’s a classic mistake to examine his games and only his games for suspicious patterns without establishing some kind of control group. I know there are no easy solutions to cheating in chess, but the approach that has been used is a classic way to reach wrong conclusions.
I don’t own a copy of ChessBase, so I don’t know what settings are available for the Let’s Check feature, but her video shows which moves/lines are suggested by different engines. For example, pausing at 7:05, I can see on screen Fritz 16, Fritz 11, Stockfish 13, Fritz 16, Stockfish 10, Stockfish 15, Komodo 14.1, and Stockfish 12. I don’t know if this helps narrow down which settings to use, but it seems like you analyzed with only 2 engines: Deep Fritz 14 and Stockfish 15.
If true, then her analysis is pretty much completely useless. With that many engines in the pool to match against, getting a match with at least one of them on every move becomes a lot more likely. I seriously doubt the 98% statistic was produced under the same conditions.
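For a rough sense of how much a big engine pool can inflate things, here’s a sketch under the (admittedly strong) assumption that engines miss a given move independently. In reality engines agree with each other a lot, so the true effect is smaller, but the direction of the bias is the same; the per-move match rate is invented.

```python
# Sketch of the multiple-comparisons effect of a large engine pool.
# ASSUMPTION: engines miss a move independently with the same rate --
# a big simplification (engines agree with each other a lot), so take
# the numbers as directional only.
p = 0.70      # invented per-engine, per-move match rate for a strong GM
n_moves = 30  # scored (out-of-theory) moves in the game

for k in (1, 8):  # single engine vs. an 8-engine pool
    p_move = 1 - (1 - p) ** k   # P(at least one engine matches the move)
    p_game = p_move ** n_moves  # P(every scored move matches some engine)
    print(f"{k} engine(s): per-move {p_move:.4f}, '100% game' {p_game:.6f}")
```

With these invented numbers, a “100% game” goes from essentially impossible with one engine to near-certain with eight, which is why the pool settings have to be held fixed before comparing anyone’s scores.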
There is both the absolute level of “engine matching” and the relative level. The fact that he scored “100%” doesn’t mean much on its own; it matters IF his score is high relative to the super-GM norm. And I say IF because it’s not clear to me that the work has even shown a meaningfully high relative score.
I still think statistical analysis is not the way to do this. It would be nice in theory to have an “objective” standard, but the way they are trying to build one isn’t founded on objective processes anyway. I also think we are WAY underestimating how accurate GMs’ intuition is when they suspect they are being cheated. It sounds like an article of faith, but they do in fact know what an “engine move” looks like. People rightfully don’t want a subjective standard, but I’ve seen lesser GMs like Naroditsky sniff out engine moves in real time on his stream. Their Jedi sense for this stuff is remarkable.
Having GMs look at games is not enough, IMO. For one thing, some strong GMs will recognize many OTB games (“oh, this is when Magnus played Vishy at the 2004 Olympiad”). For another, it’s just too impractical to do at any scale: you’d have to analyze many players and many games. Plus you’d have to analyze the GMs’ analysis (e.g., how often does the GM panel produce false positives?).
Any GM could have computer-like moves here and there; maybe they got lucky or made the right move for the wrong reason. Looking at the rate of these moves compared to other GMs would be informative. But as @ChrisV says, you’d have to be consistent about the search settings and choose them in advance.
I totally agree with your observations on the challenges and impracticalities. Having said that, we use juries of a dozen laypeople to decide whether people go to jail forever. Having a jury of respected ambassador GMs (maybe retired or semi-retired legends like Anand and Kramnik) decide based on subjective and objective evidence, under a “reasonable doubt” standard, would be one approach. It would obviously suck, but would it suck more than the status quo?
Even if there were some objective standard, it would still be a slave to the ROC curve, which is never going to give a perfect classifier, especially not for the kind of subtle cheating at the super-GM level that’s being alleged. Caruana was absolutely certain that someone cheated whom Regan exonerated. There would be a lot of that under any such system, since they’ll always conservatively choose very high specificity.
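Here’s a toy simulation of that trade-off. The score distributions are invented; the only point is that a threshold conservative enough to almost never accuse an innocent player will also miss nearly all subtle cheaters.

```python
# Toy simulation of the specificity/sensitivity trade-off. The
# "engine correlation" score distributions are invented; the point is
# only that a threshold conservative enough to almost never flag an
# innocent player will also miss most subtle cheaters.
import numpy as np

rng = np.random.default_rng(0)
honest = rng.normal(60, 8, 1_000_000)   # hypothetical honest-GM scores
cheater = rng.normal(70, 8, 1_000_000)  # hypothetical subtle cheaters

threshold = np.quantile(honest, 0.9999)     # 99.99% specificity
sensitivity = (cheater > threshold).mean()  # fraction of cheaters caught

print(f"threshold: {threshold:.1f}")
print(f"subtle cheaters flagged: {sensitivity:.1%}")  # a few percent at most
```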
The YouTube channel Chess Vibes looked into it: Capablanca has a handful of way-up-there games (even a 100%), but he played simpler games than basically any other top player ever really has (a fair bit of that intentionally).
So even 100% isn’t hard proof, but it’s definitely something to be sus about and look into.
Unfortunately, we’re at the very beginning of figuring out how to detect this by analyzing games; i.e., the guy who says he could detect it is likely completely full of shit, and nobody ever paid attention to anything he said before.