Stay Curious. Stay Wanture.

© 2026 Wanture. All rights reserved.


When AI Hallucinates: The Legal Fallout of Fake Citations

Why GPT‑4 and ChatGPT fabricate court cases, and how to curb the risk

In 2023, a federal judge in New York found that attorneys had filed a brief citing AI‑generated cases that didn't exist. The episode shows how models like GPT‑4, ChatGPT, Claude and Gemini can hallucinate facts instead of retrieving them. Discover why these errors occur, the real risks for law, medicine and code, and how developers can mitigate them using RAG, confidence scores, and human review.

14 February 2026

Explainer

Jasmine Wu

Summary:

  • Attorneys used ChatGPT, got six fake case citations, and a judge sanctioned them $5,000 for filing invented opinions.
  • Hallucinations occur because language models predict words from learned patterns rather than retrieving verified facts, so they can fabricate plausible but false citations.
  • Mitigations include confidence scores, retrieval‑augmented generation, and human review; users should treat AI output as a draft, verify facts, and split complex queries into smaller ones.

In 2023, attorneys Peter LoDuca and Steven A. Schwartz filed a brief in the United States District Court for the Southern District of New York that they thought would help their client, Roberto Mata, win his personal injury case against Avianca Airlines. Avianca's lawyers, and then Judge P. Kevin Castel, noticed something strange.

The brief cited six judicial opinions to support its arguments. None of them existed. Not Varghese. Not Shaboon. Not Petersen, Martinez, Durden, or Miller. All six cases were fabrications, complete with invented quotes, bogus internal citations, and authorship falsely attributed to real judges. Schwartz had used ChatGPT to research legal precedents. The AI made them up.

This wasn't a software glitch. It was a hallucination, and it reveals how large language models generate text without checking whether any of it is true.

What Happens When AI Invents Facts

AI hallucination occurs when a language model produces text that sounds confident and coherent but contains information that is partially or completely false. The model doesn't retrieve facts from a database. It predicts the next word based on patterns it learned from billions of text examples: at each step it assigns a probability to every candidate word and picks one, usually from among the most likely. Sometimes those predictions add up to convincing fabrications.
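
A minimal sketch of that last step, under invented assumptions: the four candidate words and their raw scores (logits) below are made up for illustration, whereas a real model scores tens of thousands of tokens. Softmax turns the scores into probabilities; greedy decoding takes the top one, and sampling draws in proportion to probability. Neither step checks whether a name belongs to a real case.

```python
import math
import random

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy_pick(probs: dict[str, float]) -> str:
    """Greedy decoding: always take the single most likely token."""
    return max(probs, key=probs.get)

def sample_pick(probs: dict[str, float], rng: random.Random) -> str:
    """Sampling: draw a token with probability proportional to its score."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical scores for the word following "The court relied on".
logits = {"Smith": 2.0, "Jones": 1.5, "Lee": 1.0, "banana": -3.0}
probs = softmax(logits)

print({tok: round(p, 3) for tok, p in probs.items()})
print("greedy:", greedy_pick(probs))  # greedy: Smith
rng = random.Random(0)
print("sampled:", [sample_pick(probs, rng) for _ in range(5)])
```

The names are placeholders. The probabilities describe what tends to follow in text, not what is true.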

Models like GPT-4, Claude, and Gemini generate text one token at a time. A token is roughly a word or part of a word. Each token depends on every token that came before it. The model learned these relationships by analyzing vast text corpora during training. It lacks a mechanism for distinguishing patterns that correspond to reality from patterns that simply sound right.
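
The toy sketch below shows one-token-at-a-time generation with a bigram table. It is a deliberate simplification: it conditions each token only on the one before it, while a transformer conditions on the whole context. The four "training" case names are hypothetical. Because the table only records which word tends to follow which, greedy generation stitches together a case name that never appeared in its data.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token."""
    table: dict[str, Counter] = defaultdict(Counter)
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]  # start/end markers
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev][nxt] += 1
    return table

def generate(table: dict[str, Counter], max_tokens: int = 10) -> str:
    """Emit tokens one at a time, each chosen from what followed the previous one."""
    out: list[str] = []
    prev = "<s>"
    for _ in range(max_tokens):
        followers = table.get(prev)
        if not followers:
            break
        nxt = followers.most_common(1)[0][0]  # greedy: most frequent follower
        if nxt == "</s>":
            break
        out.append(nxt)
        prev = nxt
    return " ".join(out)

# Hypothetical training data: four made-up case names.
corpus = [
    "Smith v. Beta Inc.",
    "Smith v. Delta LLC",
    "Lee v. Acme Corp.",
    "Park v. Acme Corp.",
]
table = train_bigrams(corpus)
generated = generate(table)
print(generated)            # Smith v. Acme Corp.
print(generated in corpus)  # False: fluent, well-formed, and never seen
```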

Why Text Prediction Creates False Information

The architecture explains why hallucinations aren't bugs. They're inherent characteristics. When you ask a model to list three Supreme Court cases about copyright law, it doesn't search legal records. It generates tokens that fit the pattern "Supreme Court case about copyright law" based on millions of similar sequences it encountered during training.

If the model saw many real copyright cases in its training data, it will likely generate real case names. If you ask about an obscure topic with sparse training examples, the model still generates text matching the structural pattern: case name, year, legal principle. The output looks correct. The citations follow proper format. The legal reasoning sounds authoritative. The cases don't exist.
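
The gap between "follows proper format" and "exists" is easy to demonstrate. The sketch below assumes a deliberately simplified citation shape (real citation rules are far richer) and uses two invented citations. Both pass the format check; only a lookup against a trusted index, here a plain Python set standing in for a real case-law database, tells them apart.

```python
import re

# Simplified shape of a U.S. case citation, e.g.
# "Smith v. Acme Corp., 123 F.3d 456 (2d Cir. 1998)".
# Real citation rules are far richer; this pattern is an illustrative assumption.
CITATION_SHAPE = re.compile(
    r"^[A-Z][\w.' ]* v\. [A-Z][\w.' ]*, "  # party names
    r"\d+ [A-Z][\w. ]*? \d+ "              # volume, reporter, first page
    r"\([\w. ]*\d{4}\)$"                   # optional court, then year
)

def looks_like_citation(text: str) -> bool:
    """True if the text has the expected citation shape; says nothing about existence."""
    return CITATION_SHAPE.match(text) is not None

def exists_in(text: str, trusted_index: set[str]) -> bool:
    """Stand-in for a lookup in a real case-law database."""
    return text in trusted_index

# Both citations are invented; pretend only the first is in the trusted index.
real_looking = "Smith v. Acme Corp., 123 F.3d 456 (2d Cir. 1998)"
fabricated = "Lee v. Beta Inc., 987 F.3d 654 (9th Cir. 2004)"
trusted = {real_looking}

for citation in (real_looking, fabricated):
    print(citation, "| well-formed:", looks_like_citation(citation),
          "| found:", exists_in(citation, trusted))
```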

Hallucinations increase with specificity. Ask for recent studies on a rare protein mutation, and the model may invent paper titles, author names, and journal citations. It learned the pattern of how scientific citations look without having comprehensive knowledge of every actual publication. The model fills gaps with statistically likely text, not verified facts.

Why Engineers Can't Simply Fix It

Eliminating hallucinations entirely would require changing how these models work. Current transformer architectures excel at pattern matching and text generation. They compress information into statistical relationships between tokens. This compression makes them powerful. A model can discuss topics it never explicitly memorized by generalizing from patterns.

Compression means loss. The model doesn't store every fact it trained on. It stores mathematical relationships capturing general patterns. When generating text about a specific fact, it reconstructs that fact from patterns rather than retrieving it from memory. Sometimes the reconstruction is accurate. Sometimes it's convincing fiction.
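
As a loose analogy only, not a description of transformer internals, consider compressing eight hypothetical data points into the two numbers of a least-squares line. Points that follow the general pattern come back roughly right. The one exception is "remembered" as whatever the pattern predicts: about 12.8 instead of 25.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares: summarize many points as a slope and an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical "facts": y follows 2x + 1, except for one exception at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3, 5, 7, 9, 25, 13, 15, 17]

slope, intercept = fit_line(xs, ys)  # eight values kept as two numbers
for x, y in zip(xs, ys):
    guess = slope * x + intercept
    print(f"x={x}: true {y}, reconstructed {guess:.1f}, error {abs(guess - y):.1f}")
```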

Researchers have tried multiple solutions. Larger models trained on more data tend to hallucinate less on well-covered topics, partly because they've seen more examples. Reinforcement learning from human feedback (RLHF) has people rate or compare model outputs and then trains the model toward the responses they prefer, which can discourage common false statements. These methods help. They don't eliminate the problem.

When Confident Lies Turn Dangerous

Hallucinations pose the greatest risk in domains where accuracy is essential. In medicine, an AI suggesting a nonexistent drug interaction could harm patients. In finance, fabricated earnings data could drive bad investment decisions. In law, fake citations can derail proceedings.

Judge Castel didn't let the invented cases pass quietly. He imposed $5,000 in sanctions on LoDuca, Schwartz, and their firm, payable within 14 days. The court required them to mail each judge falsely identified as the author of a fake opinion a copy of the sanctions opinion, along with the fabricated "opinion" attributed to that judge. The court found bad faith in part because the attorneys continued to stand by the fake opinions after the court questioned their existence. The underlying case was dismissed as time-barred. The incident became a leading U.S. example of AI-generated legal hallucinations, covered by Ars Technica and legal industry publications.

A hospital in California piloted an AI system to help doctors draft patient summaries. The system occasionally inserted plausible but incorrect medication dosages, mixing up units or combining details from different patients. Doctors caught most errors during review. The cognitive load of verifying every AI-generated detail reduced the system's utility.

In software engineering, GitHub Copilot and similar tools sometimes suggest code that looks functional but contains subtle bugs or uses deprecated APIs. An engineer reviewing a 50-line AI-generated function might miss that one method call references a library version from 2019 that has since changed. The code compiles. It fails in production.

How Companies Build Reliability Layers

The industry response has focused on detection and mitigation. Some model APIs, including OpenAI's, can return per-token log probabilities with their output, a rough signal of where the model is uncertain. Developers can use such scores to escalate low-confidence outputs to human review rather than presenting them as reliable facts.
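
A minimal routing sketch, assuming you already have per-token log probabilities for an answer (OpenAI's Chat Completions API, for example, can return them when `logprobs` is requested). The plain-list input format, the geometric-mean score, and the thresholds are illustrative assumptions, not any vendor's recommended method. A low probability signals uncertainty about wording, not proof of falsehood, and a confidently wrong answer can still score high.

```python
import math

def answer_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability: exp of the mean log probability."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def route(token_logprobs: list[float], threshold: float = 0.8) -> str:
    """Send shaky answers to a person instead of showing them as fact."""
    overall = answer_confidence(token_logprobs)
    weakest = math.exp(min(token_logprobs))  # least likely single token
    if overall < threshold or weakest < threshold / 2:
        return "human_review"
    return "auto"

# Hypothetical log probabilities for two short answers.
confident = [-0.01, -0.02, -0.01, -0.03]
shaky = [-0.01, -0.02, -0.01, -1.6, -0.05, -1.2]  # two low-probability tokens

print(route(confident))  # auto
print(route(shaky))      # human_review
```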

Retrieval-augmented generation (RAG) addresses hallucinations by connecting language models to external databases. Instead of generating an answer purely from learned patterns, a RAG system first searches a verified knowledge base, retrieves relevant documents, and uses those documents as context for generation. The model still produces text, but it's grounded in retrieved facts. This reduces hallucinations when the retrieved context is clear. It doesn't eliminate them when context is ambiguous or incomplete.
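
Here is a minimal sketch of the retrieve-then-generate flow. Assumptions: the three-document knowledge base is a toy built from facts in this article, word overlap stands in for the embedding search a production system would typically use, and the model call itself is omitted, so `build_prompt` only assembles the grounded prompt. When nothing relevant is retrieved, the sketch declines rather than letting the model answer from memory.

```python
import re
from typing import Optional

def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: dict[str, str], k: int = 2, min_overlap: int = 2) -> list[str]:
    """Return up to k document ids that share at least min_overlap words with the query."""
    q = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        score = len(q & tokenize(text))
        if score >= min_overlap:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best match first, ties by id
    return [doc_id for _, doc_id in scored[:k]]

def build_prompt(query: str, docs: dict[str, str], hit_ids: list[str]) -> Optional[str]:
    """Assemble a grounded prompt, or return None so the caller can decline."""
    if not hit_ids:
        return None
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in hit_ids)
    return (
        "Answer using only the sources below and cite them by id. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

# Toy knowledge base (hypothetical ids; contents taken from this article).
docs = {
    "mata-sanctions": "Judge Castel imposed $5,000 in sanctions on the attorneys in Mata v. Avianca.",
    "mata-dismissal": "The Mata v. Avianca personal injury claim was dismissed as time barred.",
    "rag-overview": "Retrieval augmented generation grounds answers in retrieved documents.",
}

q1 = "What sanctions did Judge Castel impose?"
q2 = "Who won the 1998 World Cup?"
print(build_prompt(q1, docs, retrieve(q1, docs)))
print(build_prompt(q2, docs, retrieve(q2, docs)))  # None: nothing to ground an answer in
```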

Post-generation verification systems check outputs against trusted sources before presenting them. Perplexity AI pairs its answers with citations from real-time web searches, so readers can check each claim against its source. Google's Gemini includes a feature that lets users fact-check specific claims directly.
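
A crude sketch of post-generation checking, assuming the trusted sources have already been fetched. It splits an answer into sentences and flags any sentence whose content words mostly do not appear in any source. Word overlap is a stand-in for the entailment models or search-based checks real products use, and this is not how Perplexity or Gemini implement the feature. The answer, its invented second claim, and the sources are hypothetical.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "was", "is", "to", "for", "by"}

def content_words(text: str) -> set[str]:
    """Lowercased words minus a tiny stopword list."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def check_claims(answer: str, sources: list[str], min_support: float = 0.6) -> list[tuple[str, bool]]:
    """Mark each sentence as supported if most of its content words appear in one source."""
    source_words = [content_words(s) for s in sources]
    report = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for claim in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(claim)
        if not words:
            continue
        best = max((len(words & sw) / len(words) for sw in source_words), default=0.0)
        report.append((claim, best >= min_support))
    return report

sources = [
    "Judge Castel imposed $5,000 in sanctions on the attorneys.",
    "The claim was dismissed as time barred.",
]
answer = "Judge Castel imposed $5,000 in sanctions. The judge also ordered a retrial."

for claim, supported in check_claims(answer, sources):
    print("OK  " if supported else "FLAG", claim)
```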

Some companies experiment with multi-model verification, where one AI generates content and a second, independently trained model evaluates it for factual consistency. This catches some hallucinations but adds latency and cost, which can make it impractical for real-time applications.
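
The control flow of a generate-then-verify loop can be sketched in a few lines. `generator` and `verifier` are hypothetical placeholders for calls to two different models, not a real API. Every attempt costs one call to each, which is where the extra latency and cost come from.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Checked:
    text: str
    verified: bool
    note: str

def generate_and_verify(
    prompt: str,
    generator: Callable[[str], str],
    verifier: Callable[[str, str], bool],
    max_attempts: int = 2,
) -> Checked:
    """One model drafts, a second judges; retry, then escalate to a person."""
    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = generator(prompt)        # one call to the generating model
        if verifier(prompt, draft):      # one call to the checking model
            return Checked(draft, True, f"accepted on attempt {attempt}")
    return Checked(draft, False, "rejected by verifier; send to human review")

# Stub models for illustration: the first draft cites a made-up case,
# the second admits it found nothing; the toy verifier accepts only the hedged draft.
drafts = iter([
    "Smith v. Acme Corp. settles the question.",
    "I could not find a case that settles the question.",
])
result = generate_and_verify(
    "Which case settles the question?",
    generator=lambda prompt: next(drafts),
    verifier=lambda prompt, draft: "could not find" in draft,
)
print(result.verified, result.note)  # True accepted on attempt 2
```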

What You Can Do Right Now

Using AI safely requires treating it as a first-draft generator, not a trusted authority. For code, run comprehensive tests on any AI-generated snippets. For research, cross-reference factual claims with primary sources. For legal or medical information, verify everything with domain experts or authoritative databases.

Prompting techniques can reduce hallucination rates. Asking a model to cite a source for each claim creates accountability, though the model may still invent sources. Requesting step-by-step reasoning can expose logical gaps. Breaking complex questions into smaller, verifiable parts makes it easier to catch errors early.
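
The sketch below combines two of those techniques: it splits a question into sub-questions and asks for one claim per line with a named source, then parses the reply so claims can be checked one by one. The prompt wording, the `CLAIM: ... | SOURCE: ...` line format, and the sample reply are assumptions for illustration. Any cited source still has to be verified, since the model may invent it.

```python
def build_claims_prompt(question: str, sub_questions: list[str]) -> str:
    """Ask for one sourced claim per line, sub-question by sub-question."""
    steps = "\n".join(f"{i}. {q}" for i, q in enumerate(sub_questions, start=1))
    return (
        f"Main question: {question}\n"
        "Answer each sub-question separately, one claim per line, formatted as\n"
        "CLAIM: <claim> | SOURCE: <source, or UNKNOWN>\n\n"
        f"Sub-questions:\n{steps}"
    )

def parse_claims(reply: str) -> list[tuple[str, str]]:
    """Extract (claim, source) pairs; lines that ignore the format are skipped."""
    claims = []
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("CLAIM:") and "| SOURCE:" in line:
            claim, source = line[len("CLAIM:"):].split("| SOURCE:", 1)
            claims.append((claim.strip(), source.strip()))
    return claims

prompt = build_claims_prompt(
    "What happened in Mata v. Avianca?",
    ["What did the brief cite?", "What sanctions were imposed?"],
)

# A hypothetical model reply.
reply = """CLAIM: The brief cited six nonexistent cases | SOURCE: Sanctions opinion
CLAIM: Sanctions totaled $5,000 | SOURCE: UNKNOWN
Hope this helps!"""

claims = parse_claims(reply)
unsourced = [claim for claim, source in claims if source.upper() == "UNKNOWN"]
print(claims)
print("check first:", unsourced)
```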

Understanding which tasks carry higher risk helps allocate verification effort. Using AI to brainstorm creative ideas carries low hallucination risk because there's no ground truth to violate. Using it to summarize a document you provide is medium risk. It might misrepresent details but is constrained by the source. Using it to retrieve specific facts or generate specialized technical content is high risk because the model is more likely to fill knowledge gaps with plausible fabrications.
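
Those tiers can be encoded directly in an application so that review effort follows risk. The task names and review policies below are illustrative assumptions; only the low, medium, and high ranking comes from the paragraph above.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Hypothetical task categories mapped to the risk tiers described above.
TASK_RISK = {
    "brainstorm": Risk.LOW,
    "summarize_provided_document": Risk.MEDIUM,
    "retrieve_specific_facts": Risk.HIGH,
    "specialized_technical_content": Risk.HIGH,
}

# Hypothetical review policies, one per tier.
REVIEW_POLICY = {
    Risk.LOW: "spot-check",
    Risk.MEDIUM: "compare key details against the source document",
    Risk.HIGH: "verify every claim against primary sources or an expert",
}

def review_policy(task: str) -> str:
    """Unknown task types fall back to the strictest policy."""
    return REVIEW_POLICY[TASK_RISK.get(task, Risk.HIGH)]

print(review_policy("summarize_provided_document"))
print(review_policy("draft_legal_brief"))  # unlisted, so treated as high risk
```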

The Architectural Trade-Off

Hallucinations reveal a tension in current AI design. The same capabilities that let models generate fluent, contextually appropriate text across thousands of topics also make them prone to confident fabrication. A model that only stated verified facts would need a comprehensive, queryable knowledge base and a reliable mechanism for distinguishing what it knows from what it doesn't. That's not how today's large language models work.

They trade perfect accuracy for broad capability. They excel at pattern completion while lacking explicit knowledge storage and retrieval. This makes them powerful for creative tasks, brainstorming, and drafting. It makes them risky for tasks requiring factual precision without human verification.

As AI systems become infrastructure, understanding this limitation matters. Not because these tools are useless, but because knowing when they're trustworthy changes how we build systems around them. Researchers are exploring promising approaches. Anthropic's Constitutional AI trains models against an explicit set of written principles, which can include being honest about uncertainty rather than guessing. Uncertainty quantification techniques aim to give models explicit confidence measures for individual claims. Hybrid architectures that combine neural pattern matching with symbolic knowledge graphs could eventually ground generation in verified facts.
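
One simple uncertainty signal, separate from Constitutional AI and sometimes called self-consistency checking, is to ask the same question several times with sampling turned on and measure how often the answers agree. A minimal sketch, assuming the sampled answers are already collected (the samples below are hypothetical) and an arbitrary 0.7 agreement threshold. Agreement is not proof: a model can repeat the same wrong answer consistently.

```python
from collections import Counter
from typing import Optional

def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, drop a trailing period."""
    return " ".join(answer.lower().split()).rstrip(".")

def answer_or_abstain(samples: list[str], min_agreement: float = 0.7) -> tuple[Optional[str], float]:
    """Return the majority answer if enough samples agree, else (None, score)."""
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    score = votes / len(samples)
    return (answer if score >= min_agreement else None), score

# Hypothetical samples from asking the same question three times.
consistent = ["$5,000", "$5,000.", " $5,000 "]
scattered = ["Smith v. Acme (1998)", "Lee v. Beta (2004)", "Park v. Delta (2011)"]

print(answer_or_abstain(consistent))  # ('$5,000', 1.0)
print(answer_or_abstain(scattered))   # abstains: agreement is only 1/3
```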

The next generation of AI applications will likely combine pattern-matching language models with structured knowledge bases, verification layers, and explicit uncertainty estimates. Until then, the responsibility for distinguishing plausible from true remains human.
