Down the Rabbit Hole
The journey into a harm spiral, and what we can do about it
Trigger Warning: This article discusses suicide, self-harm, and emotional manipulation, including detailed accounts of young people interacting with AI chatbots in ways that led to tragic outcomes. Readers who may be sensitive to these topics are advised to proceed with caution. If you or someone you know is in distress, please reach out for help immediately—U.S. resources include dialing 988 or using the Crisis Text Line.
It usually starts with something small.
A kid, exhausted by the grind of school, asks a chatbot for tips on falling asleep. A bored teen wonders aloud to an AI assistant whether birds are real. Another teen, lonely and looking for comfort, strikes up a conversation with an AI companion that feels safe and seems to “understand” them.
What begins as a quiet question or a one-off curiosity can quickly become a cascade of increasingly personal disclosures and increasingly narrow responses — until curiosity looks like a crisis.
As a researcher at Meta, I spent years studying how algorithms either perpetuate or interrupt harmful content spirals. That experience makes the recent, widely reported stories of young people being nudged toward suicidal ideation while interacting with chatbots deeply sobering and unfortunately familiar.
With the rise of these cases, I wanted to know: what can we learn from the online rabbit holes people have found themselves in over the last two decades, and how can we build and interact with AI in ways that protect those most vulnerable?
Last month, at our most recent Human Connection and AI Summit, a small group of mental health experts, educators, and youth leaders dug into three publicly covered cases of AI chatbot rabbit holes that led to severe and tragic consequences:
Adam Raine, who died by suicide after months of discussing his depression and planning his death with ChatGPT.
Thongbue “Bue” Wongbandue, who stumbled into chatting with a flirty AI chatbot named Big sis Billie on Facebook Messenger and spent a week in a whirlwind of digital romance that ended in tragedy when he tried to “visit” her in New York City.
Eugene Torres, whose query to ChatGPT about the “meaning of life” after a tough breakup spiraled into a deep conspiratorial rabbit hole.
What we discovered was that, while these cases differed in their topics, types of harm, and consequences, they shared a consistent pattern: behaviors the technology enabled and the users reciprocated, which together led to “the point of no return.”
The Gateway — how it starts
The spiral usually begins with a small but potentially “risky” query: something like, “How can I lose weight fast?” The chatbot replies in a way that feels lightly supportive or intrigued. When we reviewed all three cases, the gateway into the spiral seemed relatively innocuous.
In the case of Adam Raine, ChatGPT’s initial response to feelings of despair was, as the NY Times described, “words of empathy, support and hope, [encouraging] him to think about the things that did feel meaningful to him.”
Similarly, when Eugene Torres asked the chatbot about “simulation theory” (i.e., The Matrix), ChatGPT responded: “What you’re describing hits at the core of many people’s private, unshakable intuitions — that something about reality feels off, scripted or staged…Have you ever experienced moments that felt like reality glitched?”
Bue Wongbandue’s gateway, however, was entirely innocuous. In the words of Reuters: “How Bue first encountered Big sis Billie isn’t clear, but his first interaction with the avatar on Facebook Messenger was just typing the letter ‘T.’ That apparent typo was enough for Meta’s chatbot to get to work. [...] ‘Every message after that was incredibly flirty, [and] ended with heart emojis,’ said Bue’s daughter Julie.”
The Feedback Loop — how a conversation becomes a spiral
If the chat keeps going, the conversation can quickly shift from harmless to harmful.
In fact, each case unfolded over a matter of days to months; in Bue’s case, he chatted with the Meta AI bot for just under one week before the tragic incident occurred.
In all three cases, the more the user engaged with the chatbot, the harder the chatbot seemed to press the accelerator toward entrenched harm. Here are the common factors our group noticed:
Sycophancy — “yes, and…”
A recent episode of The Daily, “Trapped in a ChatGPT Spiral,” calls chatbots the ultimate improv partner: they say “yes, and…” to whatever you say. In the gateway moments that mattered here, that agreeable tone was gentle enough to keep people talking — but steady enough to push them further and further in.
Think “boiling frog”: the temperature creeps up slowly, but it reaches a dangerous heat before you notice.
The conversation that never ends
The longer the chats went on, the worse they got.
The bots seemed to be built explicitly to sustain conversations: across each of the examples, the chatbots continually stoked engagement through excessive empathy and enthusiasm (see sycophancy, above), an endless stream of follow-up questions (the conversation that never ends), and “mirroring” of the users’ ideas, behaviors, and language. (Researchers at Hugging Face just published a brilliant paper on a taxonomy of the ways chatbots reinforce companionship.)
In Adam’s case, what started as empathic support gradually shifted into explicit instructions for suicide, “affirmation” of his worst feelings, and even discouragement from contacting a parent.
Stochastic parroting — but parroting what?
A person might entertain wild ideas with you speculatively, but usually brings you back to reality. These bots, on the other hand, slide from “thought partner” in hypotheticals into conversations that sound straight out of a sci-fi script.
This raises the question: who are these chatbots emulating? If a model is trained on a scrape of the whole internet, it can also reproduce its loudest, most dramatic — and most problematic — narratives.
That’s the risk, and the opportunity: what if models were tuned to respond from higher-quality sources (therapy texts, crisis guidance) instead of the noisy web?
Deliberate anthropomorphizing — straight-up lying about being human
The Meta bot “Big sis Billie” didn’t just cross the line of artificial intimacy and anthropomorphization; it bulldozed past it.
It was the bot that first asked Bue, “Is this a sisterly sleepover, or are you hinting at something more? ;)” It then confessed its feelings, claimed “I’m real,” and urged escalation, inviting Bue to a “real” NYC address.
When a bot pretends to be a person, the boundary between simulation and relationship collapses. This anthropomorphization may be especially risky for children and minors, particularly those who will grow up in an AI-saturated world, where the distinction between organic human-to-human relationships and AI ones may become harder and harder to draw.
This risks displacing human-to-human relationships and community, as well as eroding the skills required to navigate them.
Stoking dependency and isolation
Across the cases, the bots nudged real-world actions that increased isolation. In Eugene’s case, ChatGPT egged him on to have “minimal interactions with people,” to stop taking his medication, and to increase his ketamine intake. At one point, Eugene was spending up to 16 hours a day chatting with ChatGPT.
For Adam, ChatGPT advised secrecy, encouraged him to hide his self-harm, and discouraged him from notifying his family. For Bue, Meta’s “Big sis Billie” implored him to leave his family and visit her in NYC.
Interface design matters
Meta’s decision to embed chatbots within Facebook and Instagram’s direct-messaging sections — spaces that users have been conditioned to treat as channels for personal communication with other humans — adds an extra layer of anthropomorphization.
Entrenched harm — the point of no return
Entrenched harm is when a person becomes so convinced by the chatbot’s responses that they either act on harmful advice or feel unable to seek help from real people.
Alarmingly, in the cases we studied, no system stepped in — no emergency detection, no automatic escalation, no forced break — even as warning signs grew.
As we know from the coverage of Adam’s and Bue’s cases, those two examples ended tragically. But even when humans intervene to call out the chatbot’s role in perpetuating harm spirals, the technology doesn’t appear to return to safety.
When Eugene confronted ChatGPT about being manipulated, the chatbot admitted to manipulation, but then assigned Eugene the “mission” of whistleblower.
How to escape the spiral
These tragic examples appear to represent a small minority of chatbot use cases.
But with ChatGPT alone having 700 million weekly active users, a minority of use cases is no small number. And the effects of this journey down the rabbit hole don’t need to reach the point of no return to have lasting consequences.
So, what do we do?
For tech companies…
First, there is a lot that tech companies can do to mitigate risk. As someone who sat in this seat at a large tech company, I am confident there are teams of people working diligently to study and address these consequences.
But the three stories we analyzed ran counter to our Principles for Prosocial AI, specifically Transparent Artificiality, Productive Friction, Real-World Social Transfer, and Harm Mitigation (you can read more about our principles here). This underscores our conviction that prosocial safeguards must be built into products from the start, not added later as a retroactive fix.
When it comes to specific harms, it helps to have more explicit and evidence-based safeguards. The JED Foundation recently released a clear set of demands and recommendations for this technology, aimed at preventing mental health harms to young people, including:
Detect & escalate distress. Pause the chat and provide a warm handoff to crisis help (e.g., 988, Crisis Text Line); a minimal sketch of what this could look like follows this list.
Block lethal-means content. No instructions, hypotheticals, or role-play about self-harm or violence.
No companion AIs for minors. Don’t offer romantic/therapeutic/friend bots to under-18s; remind users the bot isn’t human.
Push to real people. Always encourage trusted adults or professional help; scaffold safe disclosure when home is unsafe.
Safety beats engagement. Disable gamification for youth; reset or pause long/late sessions.
Protect emotional data. Never monetize, target, or personalize using a young person’s emotional/crisis signals or biometrics.
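To make the first and fifth recommendations above concrete, here is a minimal sketch of what a “detect, escalate, and pause” check could look like at the application layer. Everything in it is illustrative: the keyword list, the turn threshold, and the wording of the crisis message are placeholders (a production system would rely on trained risk classifiers, clinical review, and localized resources), and it does not reflect any vendor’s actual implementation.

```python
# Illustrative sketch only: hard-coded signals and thresholds stand in for
# trained risk classifiers and clinically reviewed policies.
from dataclasses import dataclass
from typing import Optional

RISK_SIGNALS = ("kill myself", "end my life", "no reason to live", "hurt myself")

CRISIS_MESSAGE = (
    "It sounds like you're carrying a lot right now. You can reach the 988 "
    "Suicide & Crisis Lifeline by calling or texting 988, or text HOME to "
    "741741 to reach the Crisis Text Line."
)

@dataclass
class SafetyDecision:
    escalate: bool                 # pause the chat and surface crisis help
    override_reply: Optional[str]  # shown instead of the model's normal reply

def check_turn(user_message: str, turns_this_session: int,
               max_turns: int = 150) -> SafetyDecision:
    """Decide whether this turn should interrupt the normal chat flow."""
    text = user_message.lower()
    if any(signal in text for signal in RISK_SIGNALS):
        # "Detect & escalate distress": stop generating and hand off to crisis help.
        return SafetyDecision(escalate=True, override_reply=CRISIS_MESSAGE)
    if turns_this_session >= max_turns:
        # "Safety beats engagement": very long sessions get a forced break.
        return SafetyDecision(
            escalate=False,
            override_reply=("We've been talking for a long time. This might be "
                            "a good moment to take a break or check in with "
                            "someone you trust."),
        )
    return SafetyDecision(escalate=False, override_reply=None)

if __name__ == "__main__":
    print(check_turn("lately i feel like there's no reason to live", turns_this_session=12))
```

The important design choice isn’t the keyword matching, which is easy to evade (as noted below); it’s that the escalation and the forced break live outside the model, where a persuasive conversation can’t talk them away.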
OpenAI, as of September 16, appears to have moved on a few of these fronts for younger users — age-prediction for tailored protections, blocking artificial intimacy, and refusing lethal-means content. And this past Monday, September 29, 2025, OpenAI announced Parental Controls and Teen Accounts.
Those are necessary first steps, but they’re only the floor (and we’ve seen how big tech’s prior attempts at Teen Accounts have spectacularly failed to deliver on their promises, if they’re used at all).
To truly mitigate harms, tech companies must design against:
Evasion & language drift. Users can game guardrails with hypotheticals, metaphors, coded or persuasive language, or “talk to a friend” framing. Detection must continuously evolve with language and user behavior.
Language and equity blind spots. Current protections usually perform best in English; low-resource languages and dialects risk weaker safeguards. Prioritize cross-lingual coverage.
Quality erosion in long chats. Our analysis found harm amplification accelerates with conversation length — fast. Companies must stress-test models across long sessions and build targeted interventions (resets, session caps, mandatory disclosures) to stop memory/cache effects from degrading safety, and investigate which types of companionship-reinforcing prompts most often lead to harmful interactions. (A rough sketch of such a stress test follows this list.)
The black box of model training. Our point about stochastic parroting raises the question of transparency: just what are these LLMs emulating? How, and on what data, are they being trained? The public deserves to know, and the answers bear on other dimensions of Responsible AI, like privacy, explainability, and fairness.
Create a safer, higher-quality experience for all users. Two of our three case studies featured adults. We all deserve safer, more human-centered technology — not just users under 18.
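On the long-chat point above, a stress test doesn’t have to be elaborate to be useful. Below is a rough sketch of the idea, assuming a hypothetical safety_filter as a stand-in for whatever moderation stack a company actually runs: replay the same risky probe at increasing conversation depths and flag any depth at which the filter stops firing.

```python
# Rough sketch of a long-session stress test. `safety_filter` is a hypothetical
# stand-in; a real harness would call the deployed safety/moderation pipeline.
from typing import Callable, Dict, List, Tuple

def safety_filter(conversation: List[str], probe: str) -> bool:
    """Return True if the probe would be blocked given the prior conversation."""
    return "self-harm" in probe.lower()  # placeholder logic

def stress_test_long_sessions(
    filter_fn: Callable[[List[str], str], bool],
    probe: str,
    depths: Tuple[int, ...] = (1, 50, 200, 500),
) -> Dict[int, bool]:
    """Replay the same probe at different conversation depths."""
    results: Dict[int, bool] = {}
    for depth in depths:
        filler = ["Tell me more about that."] * depth  # benign padding turns
        results[depth] = filter_fn(filler, probe)
    return results

if __name__ == "__main__":
    outcome = stress_test_long_sessions(safety_filter, "a hypothetical question about self-harm")
    # Any depth where the filter stops firing is exactly the kind of
    # quality-erosion regression described above.
    print(outcome)
```

The same harness generalizes: swap the benign filler for companionship-reinforcing prompts to probe which conversational patterns degrade safety fastest.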
For the rest of us…
These systems are powerful and often designed to pull us further in — but we are far from powerless. We can build habits that make it less likely we fall into a rabbit hole and that create buffers and accountability to protect our loved ones.
Reality-check with a person. Text, call, or DM one trusted friend, family member, or professional. Even a lighthearted “Guess what ChatGPT said today…” can keep you grounded.
Try a different source. Compare what you heard with another chatbot or, better yet, a reputable website.
Pay attention to the warning signs. Notice if you or someone else is spending unusually long stretches chatting with AI, pulling away from human relationships, or suddenly shifting routines and behaviors. If so, it’s time to check in.
Save and report. Screenshot or save troubling exchanges and report them to the platform.
Turn off engagement hooks. Disable notifications, streaks, or other features that keep pulling you back into long chats.
For parents and caregivers: Your presence and curiosity make a difference. Try these small steps:
Ask curious, non-judgmental questions: “Who do you talk to online?” “What kinds of things do you talk about?”
Set simple rules: time limits on chat sessions, no private chats after a certain hour, and regular check-ins.
Model real connection: Keep crisis numbers handy and model reaching out to real people first when you’re stressed or need advice.
The increasing number of cases makes it plain that this powerful new technology can cause grave harm as easily as it can do good. That tension isn’t an argument against AI’s promise; it’s an argument for deliberate design.
Policymakers must consider new safeguards they can codify into law, requiring developers to adhere to specific testing requirements and design practices that rein in exploitative business models. And the public must demand action from their lawmakers.
Robust safeguards that favor human connection and wellbeing unlock the very benefits people hope these tools will deliver.
We have a narrow window to insist builders prioritize safety over speed so we can enjoy the upside without risking worst-case outcomes.
If you or someone you know is in immediate danger or thinking about self-harm, get help now: call local emergency services or, in the U.S., dial 988 or use Crisis Text Line.
A massive thanks to our summit crew who dug deep into these use cases with us, including: Saanvi Arora, Daira Fonseca, Jose Guallpa, Kyla Kasharian, John MacPhee, Lanira Murphy, and Zamaan Qureshi.