trade crypt

Stanford study: AI outperforms law professors in legal reasoning


A Stanford-led study found that AI outperformed law professors in legal reasoning, with models selected over human instructors in head-to-head assessments. The study reported 2,918 blinded comparisons between AI models and human instructors, involved 16 professors from 14 U.S. law schools, and used 40 contract law questions covering legal doctrine, case law, hypotheticals, and policy issues.

Two high-performing models were Google Gemini 2.5 Pro, which won 75.92% of its matchups against human instructors, and NotebookLM, which won 74.75%. AI models outperformed humans on recall questions, hypotheticals, and policy discussions.

Those win percentages came from direct, blinded comparisons between AI models and human instructors, not from automated benchmark scores.

Among additional models evaluated, Anthropic’s Claude Opus 4.7 ranked first, followed by OpenAI’s ChatGPT 5.4 and then Gemini 2.5 Pro. The study reported comparative rankings and win rates for multiple models.

The study cautioned that it did not measure alignment with any individual professor’s teaching preferences. It stated that AI responses may be generally acceptable rather than tailored to an individual instructor. Those caveats were reported alongside the numerical performance and ranking results.

Each of the study’s 2,918 comparisons presented AI-generated and human-written answers in a blinded format for professor evaluation. The 16 participating professors came from 14 U.S. law schools, and the 40 contract law questions the researchers created covered legal doctrine, case law, hypotheticals, and policy issues. The study recorded the outcome of each head-to-head assessment.
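As a rough illustration of how a per-model win rate can be tallied from blinded head-to-head records, here is a minimal Python sketch. The record format, the `win_rates` helper, and the toy data are all assumptions made for this example; they are not the study's data or its actual scoring code.

```python
from collections import defaultdict

def win_rates(comparisons):
    """Return the percentage of comparisons each model won.

    `comparisons` is an iterable of (model_name, ai_preferred) pairs,
    where ai_preferred is True when the evaluator picked the AI answer.
    This record layout is hypothetical.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for model, ai_preferred in comparisons:
        totals[model] += 1
        if ai_preferred:
            wins[model] += 1
    return {model: 100.0 * wins[model] / totals[model] for model in totals}

# Made-up records for two placeholder models (not study data).
toy = [
    ("model_a", True), ("model_a", True), ("model_a", False), ("model_a", True),
    ("model_b", False), ("model_b", True),
]
rates = win_rates(toy)
print(rates)  # model_a wins 3 of 4, model_b wins 1 of 2
```

A figure such as 75.92% would then simply be the share of a model's matchups in which the evaluator preferred the AI answer.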

To probe surface-level writing factors, the study engineered a set of lexico-syntactic features and measured their association with preference patterns. The features examined included answer length, structural organization, reasoning nuance, legal anchors, confidence tone, clarity, and pedagogical support. The analysis tested how much of the preference pattern these features could explain relative to substantive content. The study reported these feature analyses alongside the blinded comparison results.
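The article does not say how these features were computed. Purely as a sketch, the Python below shows how a few surface features could be extracted from an answer and compared between preferred and rejected answers. The feature proxies (word count, list markers, citation-like patterns, hedge words), the `HEDGES` word list, and both helper functions are assumptions for illustration, not the study's feature definitions.

```python
import re

# Hypothetical list of hedging words used as a crude "confidence tone" proxy.
HEDGES = {"may", "might", "could", "arguably", "possibly", "likely"}

def surface_features(answer: str) -> dict:
    """Compute a few crude surface-level features of an answer."""
    words = re.findall(r"[A-Za-z']+", answer)
    lines = answer.splitlines()
    return {
        # Answer length in alphabetic words.
        "length_words": len(words),
        # Structural organization: lines that start with a bullet or "1." style marker.
        "list_items": sum(1 for ln in lines if re.match(r"\s*(?:[-*]|\d+\.)\s+", ln)),
        # Legal anchors: section signs and "v. " as in case names.
        "legal_anchors": len(re.findall(r"§|\bv\.\s", answer)),
        # Confidence tone: number of hedging words.
        "hedge_words": sum(1 for w in words if w.lower() in HEDGES),
    }

def mean_gap(preferred, rejected, key):
    """Mean of feature `key` over preferred answers minus its mean over rejected ones."""
    p = [surface_features(a)[key] for a in preferred]
    r = [surface_features(a)[key] for a in rejected]
    return sum(p) / len(p) - sum(r) / len(r)

sample = "1. Under Hadley v. Baxendale, damages may be limited.\n2. See § 351."
features = surface_features(sample)
```

A large gap on a feature such as length would suggest that evaluators' preferences track that surface property, which is the kind of signal an analysis like this tries to separate from substantive content.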

The methodology section emphasized the blinded, comparative design and the engineered lexico-syntactic feature set. The study noted that this analysis aimed to differentiate surface-level writing style from substantive content. These procedural details were reported alongside the numerical comparison outcomes.

The study also reported harmfulness rates for AI-generated and human-written answers: Google Gemini 2.5 Pro recorded 3.41%, NotebookLM recorded 3.64%, and human instructors recorded 12.06%. These rates were presented alongside the models’ performance in the blinded head-to-head assessments.
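To put the three reported rates side by side, the short Python sketch below computes the gap between the human rate and each AI rate in percentage points and as a ratio. It uses only the figures quoted above; the dictionary layout and variable names are assumptions for this example, and how the study defined "harmfulness" is not stated in the article.

```python
# Harmfulness rates (percent) as quoted in the article.
harm_rates = {
    "Gemini 2.5 Pro": 3.41,
    "NotebookLM": 3.64,
    "Human instructors": 12.06,
}

human = harm_rates["Human instructors"]

# For each AI model: percentage-point gap and ratio relative to the human rate.
gaps = {
    name: {"points": round(human - rate, 2), "ratio": round(human / rate, 2)}
    for name, rate in harm_rates.items()
    if name != "Human instructors"
}

for name, gap in gaps.items():
    print(f"{name}: {gap['points']} points lower, human rate is {gap['ratio']}x")
```

On these figures the human rate is roughly three and a half times each AI rate, a gap of about 8.4 to 8.7 percentage points.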

As with the performance results, the harmfulness figures come with the caveat that the study did not measure alignment of AI responses with any individual professor’s teaching preferences. AI responses may be generally acceptable rather than tailored to an individual instructor.

Overall, the Stanford-led study found that AI models outperformed law professors in legal reasoning in blinded, head-to-head assessments. It presented these results cautiously, emphasizing its methodology, its engineered lexico-syntactic analyses, and its limitations. It also reported harmfulness comparisons and noted that it did not measure alignment with any individual professor’s teaching preferences, so AI responses may be generally acceptable rather than tailored.

Crypto Fan (https://calipsu.com)