The Pipeline, Not the Model: What Actually Disproved the Unit Distance Conjecture

Mihir Wagle, May 31, 2026

Tags: ai, openai, epistemics

In May 2026, OpenAI announced that one of its internal reasoning models had disproved Erdős's planar unit distance conjecture, an eighty-year-old problem in discrete geometry. The headline making the rounds is "AI solved an open math problem." Tim Gowers, in the companion paper, wrote: "There is no doubt that the solution to the unit-distance problem is a milestone in AI mathematics: if a human had written the paper and submitted it to the Annals of Mathematics and I had been asked for a quick opinion, I would have recommended acceptance without any hesitation." That framing is accurate, and incomplete in a way that matters.

The result is real. The construction is novel. The proof is valid. But the headline asks you to credit a single agent (the model) for an outcome that was produced by a months-long pipeline of human design, human selection, and human rewriting. The agent is one component. The pipeline is the result.

This isn't a debunking. It's an attempt to credit the right components, because how we read this announcement determines how we read the next twenty.

What the announcement actually says

The facts as disclosed:

May 20, 2026. OpenAI publishes a result attributed to an internal general-purpose reasoning model.
The model produces a counterexample to the unit distance conjecture: an infinite family of point arrangements yielding more unit distances than the classical square-grid construction.
The construction relies on algebraic number theory: Golod-Shafarevich theory, Chebotarev density, infinite class field towers, work by Ellenberg-Venkatesh and Hajir-Maire-Ramakrishna.
Cost and runtime are not in OpenAI's official disclosure. Latent Space's breakdown puts them at under $1,000 and roughly 32 hours, but both figures are speculative.
The model's output, after editing, is approximately 125 pages.
A nine-mathematician team (Noga Alon, Thomas Bloom, W.T. Gowers, Daniel Litt, Will Sawin, Arul Shankar, Jacob Tsimerman, Victor Wang, Melanie Matchett Wood) publishes a 19-page paper titled "Remarks on the disproof of the unit distance conjecture" that they describe as "a human-digested, somewhat simplified, and somewhat generalized version of the AI proof."
Lijie Chen (OpenAI) authors the primary proof; Mark Sellke and Mehtaab Sawhney handle verification. The trio spent months building scaffolding before the winning run.
Will Sawin, post hoc, makes the lower bound explicit: at least n^{1+δ} unit distances, with δ ≥ 0.014114.

Hold those facts in your head. They contain everything you need to understand who did what.

The math, briefly

The unit distance problem asks: given n points in the plane, what is the maximum number u(n) of pairs separated by exactly one unit? Erdős posed it in 1946.

Three things were known going in:

Upper bound. Spencer, Szemerédi, and Trotter proved in 1984 that u(n) = O(n^{4/3}). That ceiling has stood for more than forty years.
Lower bound. Erdős's own construction, a √n × √n square grid with appropriate scaling, achieves roughly n^{1+c/log log n} unit distances for some c > 0. Slightly superlinear, but only just.
The conjecture. Erdős believed u(n) was essentially n^{1+o(1)}. That is, the grid was close to optimal and no construction would beat it by a polynomial factor.
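The grid lower bound is easy to poke at by brute force. Below is a minimal sketch, assuming nothing beyond the definition above: it counts, for a k × k integer grid, which squared distance is realised by the most pairs. Rescaling the grid so that distance becomes 1 turns those pairs into unit distances. The function names `distance_counts` and `best_grid_scaling` are illustrative, not taken from any of the papers.

```python
from collections import Counter
from itertools import combinations


def distance_counts(points):
    """Count how many unordered pairs of points realise each squared distance."""
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts


def best_grid_scaling(k):
    """For the k x k integer grid, return (squared distance, pair count)
    for the most frequently repeated distance."""
    grid = [(x, y) for x in range(k) for y in range(k)]
    return distance_counts(grid).most_common(1)[0]


for k in (2, 3, 5, 10):
    d2, pairs = best_grid_scaling(k)
    print(f"n = {k * k:3d} points: squared distance {d2} is realised by {pairs} pairs")
```

The brute-force loop is quadratic in the number of points, so this is only useful for small grids. It is meant to make the object concrete, not to probe the asymptotics.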

That conjecture is what was disproved. The OpenAI model produced a family of point sets with u(n) > n^{1.014}, a polynomial improvement over the grid, with Sawin's explicit value δ ≥ 0.014114.
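To see why this breaks the conjecture rather than nudging a constant, it helps to line the exponents up. This is a sketch, not a statement from the papers: the constants c and C are unspecified, the bounds are meant for large n and up to constant factors, and the new lower bound is read as holding along the infinite family of point-set sizes the construction supplies.

```latex
% Known before May 2026 (c, C > 0 unspecified constants, n large):
\[
  n^{1 + c/\log\log n} \;\le\; u(n) \;\le\; C\, n^{4/3}.
\]
% The grid's extra exponent tends to zero, so the grid is only n^{1+o(1)}:
\[
  \frac{c}{\log\log n} \longrightarrow 0 \qquad (n \to \infty).
\]
% Conjecture (Erdős): u(n) \le n^{1+o(1)}.
% Disproof: a fixed \delta > 0 (Sawin: \delta \ge 0.014114) with
\[
  u(n) \;\ge\; n^{1+\delta} \qquad \text{along the constructed family of } n,
\]
% so the gain over the grid grows without bound:
\[
  \frac{n^{1+\delta}}{n^{1 + c/\log\log n}}
  \;=\; n^{\,\delta - c/\log\log n} \;\longrightarrow\; \infty .
\]
```

The gap between the exponent 1 + δ and the 4/3 ceiling is untouched by any of this.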

The construction works through algebraic number theory rather than geometry. The idea, in outline: take a CM field K over a totally real field F. Project lattice vectors in K down to the plane. The squared length of the projection lands in F, so vectors with the same norm in F project to the same length in the plane. If you can construct a number field with many lattice vectors sharing a single norm, those vectors give you many pairs of points at the same distance: a unit distance configuration. Golod-Shafarevich class field towers supply an infinite sequence of such fields with controlled discriminants, which is what makes the family infinite and the improvement polynomial rather than asymptotic noise.
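The "many lattice vectors sharing a single norm" idea already shows up in the smallest possible case, and that case is small enough to compute. The sketch below is a toy analogue, not the AI's construction. It assumes K = Q(i) over F = Q, where the lattice is the Gaussian integers and the norm of a + bi is a² + b². The helper names `vectors_with_norm` and `pairs_in_box` are hypothetical.

```python
from math import isqrt


def vectors_with_norm(m):
    """All integer vectors (a, b) with a^2 + b^2 == m,
    i.e. Gaussian integers a + bi of norm m."""
    r = isqrt(m)
    return [
        (a, b)
        for a in range(-r, r + 1)
        for b in range(-r, r + 1)
        if a * a + b * b == m
    ]


def pairs_in_box(k, m):
    """Number of unordered pairs of points in the k x k integer grid
    whose squared distance is m."""
    total = 0
    for a, b in vectors_with_norm(m):
        # Points p such that both p and p + (a, b) lie in the grid.
        total += max(k - abs(a), 0) * max(k - abs(b), 0)
    return total // 2  # each pair is counted once for v and once for -v


for m in (3, 5, 65, 1105):
    print(f"norm {m}: {len(vectors_with_norm(m))} lattice vectors")
```

Norms built from several primes congruent to 1 mod 4, such as 1105 = 5 · 13 · 17, have many more representations than a single prime does. That is essentially how the classical grid bound is obtained. As described above, the new construction swaps this degree-two field for an infinite tower of larger fields; none of that machinery is modelled here.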

That is the result, mathematically: a new infinite family of constructions that beats the square grid by a polynomial factor, built from machinery that lives in algebraic number theory rather than combinatorial geometry.

The causal question the headline hides

When we read "AI solved an Erdős problem," the verb is doing causal work. It's asking us to attribute the result to a particular agent. Judea Pearl, in The Book of Why, would call this a rung-three claim: a counterfactual statement about what caused what.

The evidence we actually have is rung-one. These things co-occurred: a model output, months of scaffolding, a curated problem, nine mathematicians, a published proof. To upgrade from co-occurrence to causation we need to identify which components were necessary, which were sufficient, and which were incidental.

There are six plausible hypotheses for what produced the result. Each can be ruled in or out based on what the published "Remarks" paper says and what the mathematicians themselves have stated.

The verdict table

Hypothesis	Verdict	Key evidence
H1. The mathematical community was already close to a disproof	Partially in	Wood, in Section 11 of the Remarks paper: "if the level and type of human expertise that is represented on this note had been assembled to find a counterexample to this conjecture a month ago, and those people put in similar amounts of time working on it than they did to reading and thinking about Chat GPT's solution, the mathematicians would have found a counterexample." Bloom, in Section 4: "most of the human efforts spent on this problem have been on trying to prove the upper bound, rather than spending serious time on trying to disprove it." Sawin's near-instant extension. Active 2025 rigidity work on the unit distance problem (arXiv 2507.15679). The math was reachable; the bottleneck was direction of effort.
H2. Heavy selection on which problem to attempt	In	The OpenAI trio (Chen, Sellke, Sawhney) spent months building scaffolding calibrated toward this specific problem, with context weighting toward algebraic number theory. This wasn't a model thrown at a random open problem.
H3. Scaffolding did substantial work, not the model alone	Partially in	Critical commentary on the methodology describes an AI grading pipeline filtering large numbers of generated reasoning traces, with parallel-agent exploration of the prove/disprove fork. The "one-shot" claim refers to the winning trace inside a heavily curated pipeline.
H4. Disproofs are categorically easier than proofs for LLMs	In, strongly	The scaffolding "naturally rewarded branches where the AI was generating explicit algebraic constructions because those branches yielded tangible, calculable progress." Construction is search; proof is deduction. LLMs are stronger at the first. No comparable AI-generated positive theorem exists.
H5. Human absorption was upstream, not downstream	In	Bloom, in Section 4 of the Remarks paper: "while the original proof produced by AI was completely valid, it was significantly improved by the human researchers." The published paper is explicitly "a human-digested, somewhat simplified, and somewhat generalized version." Sawin found the AI's argument was "unnecessarily subtle" and replaced multiple primes with a single split prime.
H6. The construction is recombination of known techniques	Partially in	Wood, in Section 11 of the Remarks paper: "Chat GPT is in some sense 'familiar' with all the previous work," and yet failed to cite related literature appropriately. The underlying techniques (Ellenberg-Venkatesh, Golod-Shafarevich, Hajir-Maire-Ramakrishna) are not new. The application to this problem may be. Synthesis, not invention.

Each verdict is supported by direct evidence from the published paper or the mathematicians' own commentary. This is what the people involved have said about their own work, not hostile inference.

Reconstructing the pipeline

If we trace the actual sequence of events, here is what produced the disproof.

First, the scaffolding. Three mathematicians at OpenAI (Lijie Chen, Mark Sellke, Mehtaab Sawhney) spent several months building infrastructure around the internal model. This included prompting strategy, retrieval-augmented generation tuning weighted toward relevant mathematics, a verifier subsystem, and a grading pipeline that filtered model output. None of this appears in the headline. All of it was load-bearing.

Second, problem selection. The unit distance conjecture was not chosen at random. It has a particular shape that suits the pipeline: a counterexample is a constructive object, and the relevant techniques sit within the model's training distribution. Problem selection determines what kinds of results are possible. A pipeline tuned for algebraic constructions is not a pipeline that can prove general inequalities.

Third, the parallel fork. The scaffolding split the attack into two branches: try to prove the conjecture, try to disprove it. This is a structural choice humans rarely make on their own. Most mathematicians pick a direction based on their prior about which side is true. The parallel fork is what allowed the disproof direction to receive serious compute. This is the single most consequential design decision in the pipeline, and it was made by humans, not by the model.

Fourth, the run. The model reportedly executed for roughly 32 hours, producing a large volume of reasoning. The grading pipeline filtered the stream. One branch produced a valid construction. No external mathematician has seen the raw output, only an edited version of approximately 125 pages.

Fifth, the rewrite. Nine mathematicians took the 125-page edited document and produced a 19-page paper. Their own description of what they did: "a human-digested, somewhat simplified, and somewhat generalized version of the AI proof." Sawin identified that the AI's original argument used multiple primes where one split prime would suffice and replaced the construction with a cleaner version. This is mathematical work, not verification.

Sixth, the extension. Sawin further made the lower bound explicit. The original AI output gave an existence claim; the explicit constant δ ≥ 0.014114 came from human follow-up.

That is the pipeline. It includes a model as one component. The model is necessary. It is not sufficient.
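The fork-and-filter shape of steps three and four is easier to see on a toy. The sketch below is not OpenAI's pipeline, whose internals are undisclosed. It is a hypothetical miniature with invented function names, run on a textbook conjecture (that n² + n + 41 is prime for every n ≥ 0). It shows why a disproof branch has a natural stopping condition: a counterexample is a concrete object that an independent checker can confirm.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def claim_holds(n):
    """Toy conjecture: n^2 + n + 41 is prime for every n >= 0."""
    return is_prime(n * n + n + 41)


def prove_branch(limit):
    """Stand-in for a proof attempt: finite checking can only ever
    report 'no failure found below limit', never a proof."""
    return all(claim_holds(n) for n in range(limit))


def disprove_branch(limit):
    """Search for a counterexample; return the first failing n, else None."""
    for n in range(limit):
        if not claim_holds(n):
            return n
    return None


def verify(candidate):
    """Independent grader: confirm the candidate by exhibiting a proper divisor."""
    value = candidate * candidate + candidate + 41
    return any(value % d == 0 for d in range(2, isqrt(value) + 1))


candidate = disprove_branch(1000)
print("prove branch, first 40 cases pass:", prove_branch(40))
print("disprove branch candidate:", candidate, "verified:", verify(candidate))
```

The design point is the asymmetry. The prove branch can run forever without producing anything a grader can accept. The disprove branch either returns nothing or returns an object that is cheap to check. Whoever decides to fund both branches is making the call that matters.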

The pharma analogy

The closest analogy is drug discovery, not software engineering or chess.

When a new drug is announced as Pfizer's treatment for some condition, the molecule gets the headline. But a molecule, on its own, cures no one. What delivers a clinical outcome is a pipeline: target identification, medicinal chemistry, screening, lead optimization, preclinical safety, Phase I dosing, Phase II efficacy, Phase III, regulatory submission, manufacturing scale-up. The molecule sits near the front of that pipeline. Everything around it is what makes it real.

The OpenAI model is the molecule. The scaffolding, problem selection, grading pipeline, and nine-mathematician rewriting team are the trial machinery. A drug headline that named only the molecule would not be wrong, but it would mislead the reader about where the cost, the effort, and the institutional knowledge actually sit.

You don't congratulate the molecule. You congratulate the program.

The most interesting finding: the contribution was sociological, not cognitive

The verdict table contains a finding the headline cannot accommodate. Two of the nine mathematicians said something close to the same thing.

Bloom, in Section 4 of the Remarks paper, observed that "most of the human efforts spent on this problem have been on trying to prove the upper bound, rather than spending serious time on trying to disprove it." Erdős's conjectures rarely fail. The square-grid construction looked plausibly optimal. Effort allocation across the field was lopsided toward proof attempts.

Wood, in Section 11, went further. Two sentences worth quoting in full:

"I believe if the level and type of human expertise that is represented on this note had been assembled to find a counterexample to this conjecture a month ago, and those people put in similar amounts of time working on it than they did to reading and thinking about Chat GPT's solution, the mathematicians would have found a counterexample. However, without the claimed proof by Chat GPT, there is no particular reason anyone would have tried to look for a counterexample, assembled a group of experts with the appropriate expertise, or that the experts would have agreed to turn their attention to this problem."

That second sentence matters. The techniques were available. The community had the talent. What was missing was a reason to point that talent at disproof. The bottleneck was not mathematical capability. It was sociological: a field-wide allocation of effort that under-weighted one of two possible answers because of who originally proposed the conjecture.

It is tempting to read this as "the AI was neutral toward Erdős's reputation, and so it explored the disproof direction that humans avoided." That reading credits the model with an epistemic virtue: neutrality, freedom from authority, intellectual independence.

That reading is wrong. The model has no stance toward Erdős. The model has no stance toward anything. What removed the sociological filter was the scaffolding's structural choice to fork into both branches, a choice made by humans, embedded in the pipeline design.

There is a second sociological function in play, and Wood named it directly. The existence of an AI-claimed proof gave human mathematicians a reason to take the disproof direction seriously, not because the claim was trusted on faith but because verifying it produced a concrete artifact to engage with. The pipeline generated a prompt for attention, not just a result.

So the AI contribution is structural in two ways. The pipeline explored a direction the community had under-weighted (structural agnosticism). And the existence of an AI-claimed output gave humans a reason to allocate verification effort to it (structural catalysis). Both effects are about how mathematical attention gets allocated across open problems. Neither is about what the model can think.

The lesson is not "AI is capable of mathematical creativity." The lesson is "run pipelines that systematically attack the unfashionable side of open conjectures, because the existence of a claim is itself a reason for humans to look."

How to read the next announcement

The next "AI solves open math problem" announcement is already being prepared somewhere. Four questions to ask before sharing the headline:

What was the scaffolding? If the methodology involved months of custom infrastructure, the result is about the pipeline, not the model. Treat the pipeline as the unit of credit.
What was the denominator? How many problems were attempted before this one worked? Without the denominator, success on one problem tells you little about the base rate. We don't know OpenAI's denominator here.
Why was this problem selected? Was it chosen because it had a constructive answer reachable by techniques in the model's training distribution? Problem selection determines what kinds of results are possible.
What did humans rewrite from the raw output? If the published proof is described as "human-digested," "human-verified," or contains meaningful simplifications, the credit is shared. Substantive simplification (like Sawin's single-prime improvement) is mathematical work, not editing.

If an announcement does not answer these questions, treat the claim as associational rather than causal. Something useful happened. Crediting which component requires evidence the announcement has not provided.
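The denominator question has simple arithmetic behind it. The sketch below assumes independent attempts with the same per-problem success probability p. That is a simplification, every number in it is illustrative, and none of it reflects anything OpenAI has disclosed. It computes the chance of at least one headline-worthy success across a batch of attempts.

```python
def prob_at_least_one_success(p, attempts):
    """Chance of one or more successes in `attempts` independent tries,
    each succeeding with probability p."""
    return 1 - (1 - p) ** attempts


# Illustrative values only: a weak pipeline tried on many problems
# can look like a strong pipeline tried on one.
for p in (0.01, 0.10, 0.50):
    for attempts in (1, 10, 100):
        chance = prob_at_least_one_success(p, attempts)
        print(f"p = {p:.2f}, attempts = {attempts:3d}: P(at least one success) = {chance:.3f}")
```

A single reported success is compatible with a 50% pipeline tried once and with a 1% pipeline tried a hundred times. Only the denominator tells them apart.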

What survives

The result is genuine. The unit distance conjecture, an eighty-year-old problem, is disproved. The construction is novel. The proof is valid. Nine first-rate mathematicians have signed off. None of this is in question.

What is in question is what the result is evidence for. The headline reads it as evidence that AI has crossed a threshold in mathematical capability. The evidence supports a narrower and more interesting claim: a particular kind of human-AI pipeline can produce real mathematical contributions on problems that fit its shape.

That's not a smaller story. It's a different story. The exciting question is what the pipeline will become: what its operators can target next, how much of the human labor can be automated, and whether the rate-limiting component is the model or the scaffolding.

My bet, today, is that the scaffolding is. If that's right, the next generation of results will come from teams that get better at building pipelines, not from labs that get better at building models. That is a much more interesting frontier than "AI does math now," and it credits the people who actually did the work.

Data and sources

Remarks on the disproof of the unit distance conjecture: arxiv.org/abs/2605.20695. The 19-page paper by Alon, Bloom, Gowers, Litt, Sawin, Shankar, Tsimerman, Wang, and Matchett Wood. Source of the Bloom (Section 4) and Wood (Section 11) commentary quotes and the authors' own description of their paper as "a human-digested, somewhat simplified, and somewhat generalized version of the AI proof."
An explicit lower bound for the unit distance problem (Sawin, 2026): arxiv.org/abs/2605.20579. Follow-up paper making the lower bound explicit at δ ≥ 0.014114.
OpenAI announcement page: openai.com/index/model-disproves-discrete-geometry-conjecture.
Gil Kalai, "Amazing: Erdős' Unit Distance Problem was Disproved! It was achieved by AI!": gilkalai.wordpress.com/2026/05/21/amazing-erdos-unit-distance-problem-was-disproved-it-was-achieved-by-ai. Community reception and context.
Nature news coverage, "AI cracks 80-year-old mathematics challenge — researchers are astonished": nature.com/articles/d41586-026-01651-0.
Erdős's unit distance problem and rigidity (July 2025): arxiv.org/abs/2507.15679. The rigidity-based approach referenced in H1.
Latent Space, "OpenAI GPT-next disproves 80 year old Erdős planar unit distance problem for under $1000": latent.space/p/ainews-openai-gpt-next-disproves. Cost, runtime, and output-size figures.
AIchats, "OpenAI disproves the unit distance conjecture": aichats.substack.com/p/openai-disproves-the-unit-distance. Critical methodology analysis cited in H3.
Judea Pearl, The Book of Why (2018): source of the rung-one / rung-three causal ladder used to frame the causal question in this post.