# The Slop-Aware Researcher — Not Some Rando

> Essay, May 2026. LLMs equip experts to contribute in fields they're unfit for on paper. Evaluate claims at the artifact level, not the credentials of the human steering the model.

## The moment

[OpenAI announced](https://openai.com/index/model-disproves-discrete-geometry-conjecture/) on May 20, 2026 that an internal model autonomously disproved Erdős's planar unit distance conjecture, open since 1946. Fields Medalist [Tim Gowers](https://x.com/wtgowers/status/2057175727271800912) told mathematicians to "make sure you are sitting down before reading further."

> [Dr. Kareem Carr's](https://x.com/kareem_carr/status/2057219772492681268) response: "It's telling that this was achieved by OpenAI itself, not some rando with no math training using a public model. It suggests this level of result is expensive, requiring specialized infrastructure, mathematical expertise, and serious money. Good news for now."

I remember how it felt, working at Skillit, the first time Opus 4.6 took a bug report all the way through: investigation, a proposed plan, steering from me, a solid initial implementation, more steering, and finally PR-ready work, all in a single arc.

## Thesis

LLMs equip experts to contribute in fields they're unfit for on paper. Evaluate claims at the artifact level, not the credentials of the human steering the model. A disciplined but novice researcher right-sizes their contributions to avoid wasting a domain expert's time.

## Degrees of AI research

AI research will emerge in varying degrees of autonomy. I don't know whether OpenAI's result was really pure auto-research. Their experts were still setting clear goals, clearer than a novice could set. And vibe-driven work isn't limited to novices: expert software engineers can sling slop too.

## A scene

A software engineer opens Claude Code. He asks Claude to find open problems that, if solved, would change the way experts think about their field. Claude returns some NP-complete problems and outlines some theories in biology. A few quantum physics problems pique the engineer's interest.

He's used to studying business operators by reading case studies of their zero-to-one stories, so he tries pulling threads on number-theory arguments the same way, until he realizes he's in over his head.

He opens his terminal, types `claude --dangerously-skip-permissions`, follows that up with `/goal solve quantum gravity`, and goes for a walk. When he gets back from a stroll along the boardwalk and up Cookman Ave, the terminal shows Claude confidently claiming the problem is solved. His first paper goes up on arXiv.

Over the following weeks, a few physicists stumble on the paper. It's striking. It's the claim each of them has been missing to support their ideas. The bridge they've been waiting for seems to have come from nowhere.

They study the 20-page paper. It's exactly what they need. They dust off their old research notebooks and get back to work.

A few months later, another group finds itself in a similar place. New papers build on it. Within two years, the original paper has been cited 47 times.

There was an error buried on page 16. It was subtle enough to survive a careful first read, and the framing was convincing enough to fool even some experts. Those experts built on a bridge assembled from vibes. Their arguments are hollow, but they won't find that out for a while.

## The discipline

**Generative.** I bring a familiar frame to an unfamiliar problem. I think about energy in terms of money, arbitrage, and complacency — the model bridges the gap to a subject I can only intuit.

**Adversarial.** I run a rigorous, LLM-driven review process in which swarms of agents role-playing skeptical, industry-leading experts red-team my claims. Each round of review tightens the claim and makes it more defensible to a domain expert.

**Discount-and-mine.** Claude is often sycophantic, so I had to be skeptical of both its feedback and its excitement. That skepticism dragged out the discovery process, but it led to humbler, less hollow claims. I kept only what I could confirm in code.

**Empirical and withheld.** It's not enough to make the claim. It's not enough to explain the claim rigorously. I provide the experiments I ran in code alongside my formalized claim. And I won't publish my work in sacred spaces like arXiv, so that if it doesn't hold up at an expert's level of rigor, I haven't contributed slop.

## What's at stake

I want to explore unsolved problems and bridge domains that otherwise wouldn't have been bridged. Gatekeeping a field from LLM-equipped outsiders like me would be a mistake. Operators in other domains are slop-aware and want to contribute. If the door stays shut, I never gain the confidence to explore the frontier, and I stay confined to the sandbox I've been playing in for the last 10+ years.

---

This is a plain-text mirror of <https://tylerklose.com/slop-aware-researcher> for LLMs and agents.

