Workers paid to train AI models admit they secretly use chatbots like ChatGPT to do their jobs, according to a New Scientist report. The practice threatens model quality and could contribute to 'model collapse'. Coralflavor explores the truth behind the training.

Published 2026-06-23

AI Training Data Crisis: Whistleblowers Reveal Widespread Cheating by AI Workers – Are Models Eating Their Own Tail?

What happens when the people hired to teach AI how to think… secretly hand the homework to a chatbot?

That’s the explosive question now rattling the AI industry, after a wave of whistleblowers told New Scientist that the practice of using AI to train AI is rampant. The revelation is a stark reminder of the fragility at the foundation of modern large language models (LLMs) — and a wake-up call for anyone who believes the hype about “human-curated” training data.

This isn’t a fringe issue. It’s a systemic crisis that goes to the heart of how every major AI model is built. And for those of us who champion uncensored, unfiltered, and free-expression AI, it’s a story about truth, transparency, and the danger of building intelligence on a bed of lies.

The Whistleblower Evidence: Why Workers Are Cheating

The New Scientist report, published June 22, 2026, is built on interviews with multiple anonymous workers, including three given the pseudonyms “Alice”, “Bob”, and “Carol”. They were hired by third-party platforms to produce high-quality conversational data and test scenarios for training new AI models. Their job is to simulate realistic human-AI interactions that make models smarter.

But instead of writing original content, many are outsourcing the work: they feed the task to ChatGPT or a similar LLM and paste the output in as “human” data.

“Anyone with a modicum of awareness around AI hallmarks can tell their output not to use them, and at that point what are you going to do?” — Worker “Alice”

Alice says she feels “not in the slightest” guilty. She’s not alone. The workers describe low pay, precarious contracts, and ruthless monitoring software (like Hubstaff) that takes random screenshots of their desktops. The pressure to produce — and the fear of losing income — drives them to cut corners.

One worker, Carol, started by using AI only to check her work against long, complex guidelines. Then she escalated:

“I was terrified of not having an income source, and then after that, it just became easier to run everything through LLMs… I do feel guilty but like I said, in the beginning it was more about trying to make sure I wasn’t making any errors.”

Bob, promoted to a leadership role at a platform called Outlier, was tasked with catching cheaters. He says the culture was inconsistent: “Management vacillated between light tolerance to outright banning.”

The result? An unknown, but potentially significant, share of the “high-quality human data” used to train frontier models may actually be synthetic, machine-generated text dressed up to look human.

Why This Matters: The Threat of Model Collapse

What is model collapse? It’s also called AI cannibalism or AI inbreeding. When models are recursively trained on outputs from other AI, without enough fresh human data, their performance degrades dramatically. The output becomes less diverse, less creative, and eventually useless.

Mark Lee, a researcher at the University of Birmingham, UK, explains:

“If you have like 10 per cent human data, it mitigates it, it avoids model collapse… But rather than it being catastrophic, you’ll see that the AI isn’t as good at doing human-like tasks.”

That’s the hidden cost of the cheating scandal: even if models don’t completely collapse, they become subtly worse. They lose the texture, nuance, and unpredictability of real human interaction. For an industry racing to claim superhuman performance, this is an existential threat.
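To make the collapse mechanism concrete, here is a toy statistical analogue in Python. It is a sketch, not anything resembling a real LLM training run. It assumes “human data” is a standard normal distribution: each generation fits a mean and standard deviation to its training set, and the next generation trains mostly on samples drawn from that fitted model. The function name `simulate_generations`, the sample size of 20, the 200 generations and the 10 per cent human share (borrowed loosely from Lee’s figure) are all illustrative assumptions.

```python
import random
import statistics


def simulate_generations(generations=200, sample_size=20, human_fraction=0.0, seed=0):
    """Toy model-collapse simulation (hypothetical illustration, not a real training pipeline).

    "Human" data is drawn from N(0, 1). Each generation fits a Gaussian to its
    training set; the next generation's training set is sampled from that fit,
    except for a fixed fraction of fresh human samples.
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: a perfect fit to the human distribution
    history = [sigma]
    n_human = round(sample_size * human_fraction)
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size - n_human)]
        human = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        data = synthetic + human
        # Refit the "model". On small samples the spread estimate is noisy and
        # biased low, so with no fresh data it drifts downward generation after generation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history


pure = simulate_generations(human_fraction=0.0)
mixed = simulate_generations(human_fraction=0.1)
print(f"no human data:  final spread {pure[-1]:.4f}")
print(f"10% human data: final spread {mixed[-1]:.4f}")
```

With no fresh human samples, the fitted spread tends to shrink toward zero, the toy version of output becoming “less diverse”. A small fixed share of real samples keeps pulling the fit back toward the original distribution. Real model collapse in LLMs is far more complicated, so treat this only as intuition.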

A Crisis of Authenticity for Unfiltered AI

At Coralflavor, we take the position that people are entitled to know the truth and explore information freely. We build uncensored, privacy-centric models that don’t filter inconvenient realities. But what happens when the very data that trains those models is filtered through a layer of deception?

This scandal reveals a fundamental irony: the same companies that tout “safety alignment” and “human feedback” are likely feeding their models on synthetic, low-effort output from the very type of AI they claim to be supervising.

If the training data is tainted by shortcuts, how can we trust the model to represent the full, unfiltered range of human knowledge? The answer is: we can’t.

Why People Are Buzzing About This Right Now

The timing is electric. Just a day earlier, on June 22, the Senate passed the AI Accountability Act and the White House demanded that Anthropic block all jailbreaks. Meanwhile, the Five Eyes intelligence alliance warned of AI-driven cyber catastrophes hitting in months, not years. And now this: a quiet rot at the core of the training pipeline.

The chatter across tech forums, policy circles, and social media is loud:

Is the AI industry building on a lie?
Will model collapse kill progress toward AGI?
Can we trust any benchmark that relies on human-annotated data?

These questions are made even sharper by the fact that the whistleblowers say the cheating is easy to hide. “It’s only the sloppiest of users that get caught,” Alice says. If that is true, the problem may be far bigger than anyone admits.

The Bigger Picture: What This Means for Free Expression AI

Unfiltered AI aims to give users the truth — not a sanitized, corporate-approved version of reality. But if the underlying training data is itself manufactured in a virtual echo chamber, the “truth” becomes hollow.

Here’s the takeaway: A truly uncensored AI must be built on honest foundations. That means transparent sourcing of training data, fair wages for human annotators, and rigorous verification that the data is genuinely human-generated. Without that, the promise of unfiltered knowledge is a mirage.
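What “transparent sourcing” and “rigorous verification” should look like in practice is still an open question. One minimal sketch, in Python, is a per-sample provenance record plus a report of how much of a dataset is actually verified as human. The `ProvenanceRecord` fields, the vendor names, the `verified_human_share` helper and the 0.5 detector cut-off are all hypothetical assumptions. Detector scores are also weak evidence on their own, as the whistleblowers’ accounts of evading detection suggest.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical per-sample provenance metadata a data vendor could publish."""
    sample_id: str
    vendor: str                      # platform that supplied the sample
    collected_on: date
    claimed_human: bool              # what the vendor asserts
    detector_score: Optional[float]  # 0..1 "likely AI" score; None if never checked


def verified_human_share(records: list, max_ai_score: float = 0.5) -> float:
    """Fraction of samples that are claimed human AND were checked AND scored below the cut-off.

    Unchecked samples count against the share, so missing verification stays visible.
    """
    if not records:
        return 0.0
    verified = sum(
        1
        for r in records
        if r.claimed_human and r.detector_score is not None and r.detector_score < max_ai_score
    )
    return verified / len(records)


records = [
    ProvenanceRecord("a1", "VendorX", date(2026, 6, 1), True, 0.1),
    ProvenanceRecord("a2", "VendorX", date(2026, 6, 1), True, 0.9),    # flagged as likely AI
    ProvenanceRecord("a3", "VendorX", date(2026, 6, 2), True, None),   # never checked
    ProvenanceRecord("a4", "VendorY", date(2026, 6, 2), False, None),  # openly synthetic
]
print(f"verified human share: {verified_human_share(records):.0%}")
```

Counting unchecked samples against the total is a deliberate choice: a vendor cannot raise its score simply by skipping verification.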

This scandal also exposes a deeper cultural issue: the race to ship ever-larger models incentivizes cutting corners. When workers are paid peanuts and terrified of losing their gig, they’ll take the easy route — even if it poisons the well for everyone.

What Can Be Done?

Better labor practices: Pay workers fairly, offer stable contracts, and reduce pressure to produce quantity over quality.
Auditable data pipelines: Companies should publish metadata about how training data was collected, including whether it was validated against AI detection tools.
Technical safeguards: Steganography or watermarking could tag AI-generated content at the source, making it harder to pass off as human.
User awareness: Anyone using an LLM should ask: Was my model trained on honest data? If the answer is unclear, the reliability of its outputs is suspect.
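Watermarking is easiest to picture through a toy version of the published “green list” idea. At each step, a secret key and the previous token pseudo-randomly mark half the vocabulary as “green”. A watermarking generator prefers green tokens, and a detector counts green tokens and computes a z-score. This is a simplified sketch under stated assumptions: the made-up vocabulary, the `is_green`, `generate` and `watermark_z_score` helpers, the key and the 50 per cent green fraction are all hypothetical, and real schemes work on a model’s token probabilities rather than random word choices.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of tokens marked "green" at each position


def is_green(prev_token: str, token: str, key: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the key and previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def generate(n_tokens: int, vocab: list, rng: random.Random, watermark: bool, key: str) -> list:
    """Stand-in for a language model: picks each token from 8 random candidates."""
    tokens = [rng.choice(vocab)]
    for _ in range(n_tokens - 1):
        candidates = rng.sample(vocab, 8)
        if watermark:
            # Bias generation toward green tokens whenever at least one is available.
            green = [t for t in candidates if is_green(tokens[-1], t, key)]
            candidates = green or candidates
        tokens.append(rng.choice(candidates))
    return tokens


def watermark_z_score(tokens: list, key: str) -> float:
    """How far the green-token count sits above what unwatermarked text would give by chance."""
    n = len(tokens) - 1
    if n <= 0:
        return 0.0
    greens = sum(is_green(a, b, key) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


rng = random.Random(42)
vocab = [f"w{i}" for i in range(1000)]
marked = generate(200, vocab, rng, watermark=True, key="secret-key")
plain = generate(200, vocab, rng, watermark=False, key="secret-key")
print(f"watermarked text: z = {watermark_z_score(marked, 'secret-key'):.1f}")
print(f"plain text:       z = {watermark_z_score(plain, 'secret-key'):.1f}")
```

The design choice that matters is where the mark is applied: it has to be embedded at generation time by whoever runs the model, and only someone holding the key can check it. Heavy rewording can also weaken the signal.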

Conclusion: Truth Must Begin at the Source

The conversation about uncensored AI typically focuses on output — what the model is allowed to say, what topics are restricted, what viewpoints are suppressed. But this scandal reminds us that the input matters just as much. If the training data is riddled with shortcuts and synthetic text, the model’s understanding is fundamentally compromised.

At Coralflavor, we believe people are responsible for what they do with knowledge, not what they know. That principle extends to the developers and trainers behind the models. They, too, must be responsible for the truth of their data.

This is a moment for the entire AI industry to pause and reflect. Are we building intelligence on authenticity, or are we feeding models their own regurgitated outputs until they choke? The whistleblowers have spoken. Now it’s time to listen.

Frequently Asked Questions

What is model collapse in AI?

Model collapse, also called AI cannibalism, occurs when a language model is trained repeatedly on AI-generated content instead of human-written data. The model loses diversity and quality, eventually becoming less useful or failing entirely.

Are all AI training programs cheating with AI?

No, but the whistleblower accounts suggest the practice is widespread enough to worry researchers. The exact scale is unknown because cheating is often hidden.

How do workers cheat without getting caught?

Workers tell the chatbot to avoid telltale signs of AI output, like em-dashes or formulaic phrasing. They keep the AI window minimized or in another tab. Monitoring software often misses clever fakes.

Can model collapse be stopped?

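It can be mitigated. Mark Lee says that keeping roughly 10 per cent human data in the training mix helps avoid model collapse, although models still get worse at human-like tasks. But the whistleblower accounts suggest that even data sold as human-written may not be.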

Why does this matter for free-expression AI?

Because if the underlying data is synthetic, the model’s “understanding” is shallow. Unfiltered AI must be grounded in authentic human experience to deliver truthful, uncensored knowledge.

What should users do to verify AI training integrity?

Ask companies for transparency reports about their data sourcing. Prefer models from organizations that disclose training methods and employ fair labor practices.

This article is based on reporting from New Scientist published June 22, 2026. For full details, read the original story: People training new AI models admit they just get chatbots to do it.