
CAISI Evaluation: DeepSeek V4 Pro Behind Frontier AI


On May 1, the Center for AI Standards and Innovation (CAISI), a unit of NIST, evaluated DeepSeek V4 Pro and found that it lags frontier AI models by approximately eight months. Among the models evaluated, GPT-5.5 achieved the highest IRT-estimated Elo score at 1,260, followed by Claude Opus 4.6 at 999. DeepSeek V4 Pro recorded approximately 800 (±28), while GPT-5.4 mini scored 749.
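To give a feel for what these rating gaps could mean, the sketch below converts the reported scores into expected head-to-head scores. It assumes the conventional logistic Elo formula with a 400-point scale. The article does not say whether CAISI's IRT-estimated Elo uses that convention, so the function name, the scale parameter, and the interpretation are illustrative assumptions rather than CAISI's method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional logistic Elo model.

    ASSUMPTION: the 400-point scale is the chess convention; the article does
    not state which scale CAISI's IRT-estimated Elo uses.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings as reported in the article (DeepSeek V4 Pro is approximate, ±28).
ratings = {
    "GPT-5.5": 1260,
    "Claude Opus 4.6": 999,
    "DeepSeek V4 Pro": 800,
    "GPT-5.4 mini": 749,
}

baseline = "DeepSeek V4 Pro"
for name, rating in ratings.items():
    if name == baseline:
        continue
    p = elo_expected_score(ratings[baseline], rating)
    print(f"{baseline} vs {name}: expected score {p:.3f}")
```

Under this assumed scale, equal ratings give an expected score of 0.5, and the expected scores of two opponents always sum to 1. The ±28 margin on DeepSeek V4 Pro's rating would shift each figure slightly.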

On public benchmarks, DeepSeek V4 Pro scored 90% on GPQA-Diamond, compared with 91% for Claude Opus 4.6. On OTIS-AIME-2025, PUMaC 2024, and SMT 2025, it scored 97%, 96%, and 96%, respectively. On SWE-Bench Verified, however, it scored 74%, below GPT-5.5’s 81%. The results show DeepSeek V4 Pro performing strongly on some benchmarks while trailing top-tier models on others.

In CAISI’s cost comparison, only GPT-5.4 mini cleared the cost bar. DeepSeek V4 Pro was nonetheless more cost-effective on five of the seven benchmarks evaluated, which gives it a pricing edge over other prominent AI models.

In its technical report, DeepSeek claims that V4 Pro performs comparably to both Claude Opus 4.6 and GPT-5.4, in contrast to the lag reported by CAISI. Ex0bit, a representative of DeepSeek, strongly denied that DeepSeek V4 Pro is eight months behind: “There’s no ‘gap’, and no one’s 8 months behind. We’ve been trolled on every closed U.S drop and flexed on with open weights.” The statement reflects skepticism toward CAISI’s methodology and insists on DeepSeek V4 Pro’s competitive standing.

On May 1, the Center for AI Standards and Innovation (CAISI), a unit of NIST, evaluated DeepSeek V4 Pro and concluded it lags the frontier by about eight months, reflected in IRT-estimated Elo scores: GPT-5.5 1,260; Claude Opus 4.6 999; DeepSeek V4 Pro approximately 800 (±28); and GPT-5.4 mini 749.

On public benchmarks, DeepSeek V4 Pro scored 90% on GPQA-Diamond (Opus 4.6 91%), 97% on OTIS-AIME-2025, 96% on PUMaC 2024, 96% on SMT 2025, and 74% on SWE-Bench Verified (GPT-5.5 81%).

DeepSeek’s technical report asserts V4 Pro matches Opus 4.6 and GPT-5.4, and Ex0bit rejected the assessment’s 8-month lag finding.

Crypto Fan (https://calipsu.com)