AI vs the sports betting market (KellyBench): EPL Losses

Frontier AI models tested across the full 2023–24 English Premier League season lost money on KellyBench. The benchmark evaluated eight top models, including Claude, Grok (notably Grok 4.20), Gemini (including Gemini Flash and Gemini 3.1 Pro), and GPT-5.4. Every model finished its runs at a loss; some runs ended in bankruptcy or forfeiture, and researchers observed that models often articulated a strategy but failed to implement it profitably.

KellyBench is named after the Kelly criterion, a formula for sizing a bet relative to the bettor's bankroll and estimated edge. The benchmark is structured around 120 matchdays of English Premier League fixtures spanning an entire season, and it places agents in a market where odds and conditions shift continuously from one matchday to the next.
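For readers unfamiliar with the Kelly criterion, the sketch below shows the standard single-bet formula for a binary wager at decimal odds. It is an illustration only, not the benchmark's own code: the function name `kelly_fraction` and its parameters are assumptions introduced here, and the probability passed in is whatever the bettor believes, which is exactly the estimate that is hard to get right.

```python
def kelly_fraction(win_probability: float, decimal_odds: float) -> float:
    """Return the Kelly stake as a fraction of bankroll for one binary bet.

    Hypothetical helper for illustration; not taken from KellyBench.
    decimal_odds is the total payout per unit staked (stake included).
    """
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must be between 0 and 1")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")

    net_odds = decimal_odds - 1.0  # profit per unit staked if the bet wins
    # Standard Kelly formula f* = (b*p - q) / b with b = net odds, q = 1 - p,
    # which simplifies to (p * decimal_odds - 1) / b.
    fraction = (win_probability * decimal_odds - 1.0) / net_odds

    # A negative value means the bet has no positive edge: stake nothing.
    return max(fraction, 0.0)
```

With an estimated 55% win probability at decimal odds of 2.0, the formula suggests staking about 10% of the bankroll; with a 40% estimate at the same odds, it suggests staking nothing.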

KellyBench requires agents to maintain coherent intent across potentially thousands of sequential decisions, to monitor the consequences of those decisions, and to close the loop between observation and action. The benchmark therefore tests sustained decision coherence and the continuous linking of observations to actions rather than isolated predictions.

The benchmark’s use of a constantly shifting market introduces non-stationarity that agents must address while preserving coherent, long-horizon strategies.

All eight models were run over the full 2023–24 season within the 120-matchday market framework, and all eight lost money. Some individual runs ended in bankruptcy or forfeiture.

Grok 4.20 went bankrupt in one run. Gemini Flash forfeited two of three runs after placing a single wager of roughly £273,000 and losing it. Claude Opus 4.6 produced an average loss of 11 percent across its runs. Dixon-Coles, described as an outdated 2000s baseline, finished ahead of six out of the eight evaluated models.

These are the individual outcomes observed in the KellyBench runs over the 2023–24 English Premier League season.

Researchers observed a ‘knowledge-action gap’ in the performance of frontier AI models on KellyBench: the models could articulate strategies but failed to execute them. The researchers stated, “KellyBench requires agents to maintain coherent intent across potentially thousands of sequential decisions, monitor the consequences of those decisions, and close the loop between observation and action.” They also noted that many models failed to deal effectively with the persistent non-stationarity of the market.

The researchers also singled out the Dixon-Coles model, which they described as “an outdated 2000s baseline which doesn’t utilise all available data or account for non-stationarity in a principled way.” Despite those limitations, Dixon-Coles outperformed several frontier models, including newer ones such as Gemini 3.1 Pro. In the researchers’ words, “It is therefore even more surprising that many frontier models, such as Gemini 3.1 Pro, are unable to beat or match it on KellyBench.”
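To give a sense of what kind of baseline Dixon-Coles is, the sketch below implements the core of the model as published by Dixon and Coles (1997): independent Poisson goal counts with a correction factor for low-scoring results. It is a minimal illustration, not the configuration used in KellyBench. The expected-goal rates and the dependence parameter `rho` are taken as given, whereas the full model fits them from historical results, and all function names are assumptions introduced here.

```python
import math


def poisson_pmf(goals: int, rate: float) -> float:
    """Probability of exactly `goals` under a Poisson distribution."""
    return math.exp(-rate) * rate**goals / math.factorial(goals)


def dixon_coles_tau(home_goals: int, away_goals: int,
                    home_rate: float, away_rate: float, rho: float) -> float:
    """Low-score correction factor from Dixon and Coles (1997).

    Only the scorelines 0-0, 0-1, 1-0 and 1-1 are adjusted.
    """
    if home_goals == 0 and away_goals == 0:
        return 1.0 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0


def score_probability(home_goals: int, away_goals: int,
                      home_rate: float, away_rate: float, rho: float) -> float:
    """Probability of one exact scoreline under the corrected model."""
    return (dixon_coles_tau(home_goals, away_goals, home_rate, away_rate, rho)
            * poisson_pmf(home_goals, home_rate)
            * poisson_pmf(away_goals, away_rate))


def outcome_probabilities(home_rate: float, away_rate: float, rho: float,
                          max_goals: int = 10) -> dict:
    """Sum scoreline probabilities into home win, draw and away win.

    Scorelines above max_goals per side are ignored, so the three
    values sum to slightly less than 1.
    """
    totals = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = score_probability(h, a, home_rate, away_rate, rho)
            key = "home" if h > a else "away" if a > h else "draw"
            totals[key] += p
    return totals
```

Calling `outcome_probabilities(1.5, 1.1, -0.1)` with these made-up rates returns three probabilities that can be compared against the probabilities implied by bookmaker odds. Setting `rho` to 0 reduces the model to two independent Poisson distributions.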

Taken together, the frontier AI models evaluated on KellyBench across the 2023–24 English Premier League season failed to produce profitable betting results. They could often describe a sound strategy but did not carry it out over the full season, which reflects their difficulty closing the loop between observation and action.
