Essay · May 2026

The Slop-Aware Researcher — Not Some Rando.

The moment

OpenAI announced on May 20, 2026 that an internal model autonomously disproved Erdős's planar unit distance conjecture, open since 1946. Fields Medalist Tim Gowers told mathematicians to "make sure you are sitting down before reading further."

Dr. Kareem Carr's response: "It's telling that this was achieved by OpenAI itself, not some rando with no math training using a public model. It suggests this level of result is expensive, requiring specialized infrastructure, mathematical expertise, and serious money. Good news for now."

I remember the feeling I had working at Skillit when Opus 4.6 was actually able to take a bug report through investigation, proposed plan, steering from me, solid initial implementation, more steering, and PR-ready work — all in a single arc.

Thesis

LLMs equip experts to contribute in fields they're unfit for on paper. Evaluate claims at the artifact level, not by the credentials of the human steering the model. A disciplined but novice researcher right-sizes their contributions to avoid wasting a domain expert's time.

Degrees of AI research

AI research will emerge in varying degrees of autonomy. I don't know whether OpenAI's result was purely autonomous research; their experts are still setting the goals, and setting them more clearly than a novice could. Vibe-driven work isn't limited to novices, either: expert software engineers can sling slop too.

A scene

A software engineer opens Claude Code. Asks Claude to find open problems that, if solved, would change the way experts think about their field. Claude returns some NP-complete problems. Claude outlines some theories in biology. A few quantum physics problems pique the engineer's interest.

He reads case studies of business operators' zero-to-one stories, then tries to pull threads on number-theory arguments until he realizes he's in over his head.

He opens his terminal, types claude --dangerously-skip-permissions, follows that up with /goal solve quantum gravity, and goes for a walk. After a stroll along the boardwalk and back up Cookman Ave, the terminal shows Claude confidently claiming the problem is solved. His first paper is published to arXiv.

Over the following weeks, a few physicists stumble on the paper. It's striking. It's the claim each of them has been missing to support their ideas. A bridge they've been waiting for seemingly came from nowhere.

They study the 20-page paper. It's exactly what they need. They dust off their old research notebooks and get back to work.

A few months later, another group finds itself in a similar place. New papers are published. The original paper is cited 47 times in two years.

There is an error buried on page 16. It is too subtle for a careful first read to catch, and the framing is so convincing that it fools even some experts. These experts have built infrastructure on a bridge assembled with vibes. Their arguments are hollow, but they won't find that out for a while.

The discipline

Generative. I bring a familiar frame to an unfamiliar problem. I think about energy in terms of money, arbitrage, and complacency — the model bridges the gap to a subject I can only intuit.

Adversarial. I have a rigorous, LLM-driven review process where swarms of agents roleplaying skeptical industry-leading experts red team my claims. Each round of review tightens the claim and makes it more defensible to a domain expert.

Discount-and-mine. Claude is often sycophantic, so I had to be skeptical of both its feedback and its excitement. That skepticism dragged out the discovery process, but it led to humbler, less hollow claims. I only kept what could be confirmed via code.

Empirical and withheld. It's not enough to make the claim. It's not enough to explain the claim rigorously. I provide the experiments I ran, as code, alongside my formalized claim. And I won't publish my work to sacred spaces like arXiv, so that I don't contribute slop if it turns out not to be defensible at an expert's level of rigor.
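
A minimal sketch of what "keep only what code can confirm" might look like in practice, under stated assumptions: each claim travels with an executable experiment, and anything whose experiment fails or crashes is dropped. The names Claim, confirmed, and is_prime, and both toy claims, are invented for illustration and do not describe any real tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


@dataclass
class Claim:
    """A claim paired with the experiment that is supposed to back it (hypothetical)."""
    statement: str
    experiment: Callable[[], bool]  # should return True only if the check passes


def confirmed(claims: list[Claim]) -> list[Claim]:
    """Keep only the claims whose experiment runs cleanly and returns True."""
    kept = []
    for claim in claims:
        try:
            ok = claim.experiment() is True
        except Exception:
            ok = False  # an experiment that crashes confirms nothing
        if ok:
            kept.append(claim)
    return kept


# Two toy claims: one holds, one only sounds convincing.
claims = [
    Claim(
        "The sum of the first 100 odd numbers is 100 squared",
        lambda: sum(range(1, 200, 2)) == 100 ** 2,
    ),
    Claim(
        "n*n + n + 41 is prime for every n below 50",
        lambda: all(is_prime(n * n + n + 41) for n in range(50)),
    ),
]

for claim in confirmed(claims):
    print("kept:", claim.statement)
```

The second claim is the interesting one: it holds for n from 0 through 39 and breaks at n = 40, where the value is 41 squared. It is the kind of claim that survives a confident read and fails the moment it is run. A passing check is still not a proof; the filter only removes claims the code can refute.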

What's at stake

I want to explore unsolved problems and bridge domains that otherwise wouldn't have been bridged. To gatekeep a field from LLM-equipped outsiders like me would be a mistake. Operators in other domains are slop-aware and want to contribute. If the gate stays shut, I never gain the confidence to explore the frontier, and I'll stay confined to the sandbox that I've been playing in for the last 10+ years.