AI and Proving Theorems

Paul Erdős famously said that mathematicians are machines that turn coffee into theorems. A couple of recent articles in the Wall Street Journal revealed the current state-of-the-art for AI to do the same.

In July, Ben Cohen published the article The High-Schoolers Who Just Beat the World’s Smartest AI Models. The focus of the article was the 2025 International Mathematical Olympiad, the pinnacle of the calendar for high school mathematics competitions. In the United States, the pathway to the IMO is first excellng at a sequence of increasing difficult exams: the AMC->12 (or possibly AMC->10), then the American Invitational Mathematics Exam (AIME), and then USA Mathematical Olympiad (USAMO) or USA Junior Mathematical Olympiad (USJAMO). The top USAMO and USJAMO participants then get invited to a special camp from which the participants in that year’s IMO are selected.

My personal story: back in high school, my score on the AMC->12 (then called the AHSME) qualified me for the AIME my sophomore and junior years, where my run in the competition ended with a resounding thud. My senior year, I caught lightning in a bottle and somehow qualified for the USAMO; I’m not sure what the cut-off is these days, but back then only 150 or so high school students qualified for the USAMO each year. My excitement at qualifying for the USAMO gave way to utter humiliation after I actually attempted the exam (to say that I “took” the exam is probably a misuse of the work “took”.) All this to say: I never came close to sniffing the IMO. From the Wall Street Journal article:

The famously grueling IMO exam is held over two days and gives students three increasingly difficult problems a day and more than four hours to solve them. The questions span algebra, geometry, number theory and combinatorics—and you can forget about answering them if you’re not a math whiz. You’ll give your brain a workout just trying to understand them. 

Because those problems are both complex and unconventional, the annual math test has become a useful benchmark for measuring AI progress from one year to the next. In this age of rapid development, the leading research labs dreamed of a day their systems would be powerful enough to meet the standard for an IMO gold medal, which became the AI equivalent of a four-minute mile. 

But nobody knew when they would reach that milestone or if they ever would—until now. 

The unthinkable occurred earlier this month when an AI model from Google DeepMind earned a gold-medal score at IMO by perfectly solving five of the six problems. In another dramatic twist, OpenAI also claimed gold despite not participating in the official event. The companies described their feats as giant leaps toward the future—even if they’re not quite there yet. 

In fact, the most remarkable part of this memorable event is that 26 students got higher scores on the IMO exam than the AI systems. 

A second article by Ben Cohen, The Math Legend Who Just Left Academic — for an AI Startup Run by a 24-Year-Old, might be a precursor of things to come. One of the two starts of the article is number theorist Dr. Ken Ono. From the article:

In recent years, Ono began tracking AI’s remarkable progress as it rapidly improved. He was intrigued, though not intimidated. AI was astonishing at cognitive tasks and solving problems it had already seen, but it struggled with the creative elements of his field, which require intuition and abstract thinking.

That creativity is so fundamental to pure mathematics that Ono figured his job would be safe for decades.

But last spring, he was one of 30 mathematicians invited to curate research-level problems as a test of the AI models. He left the symposium profoundly shaken by what he’d seen.

“The lead I had on the models was shrinking,” he said. “And in areas of mathematics that were not in my wheelhouse, I felt like the models were already blowing me away.”

For months afterward, Ono felt like he was grieving his identity. He didn’t know what to do next, knowing that AI models would only get smarter.

“Then I had an epiphany,” he said. “I realized what the models were offering was a different way of doing math.”

Dr. Ono is now taking an extended leave from the University of Virginia to join a new AI startup company called Axiom. From Tech Funding News:

Led by Carina Hong, Axiom Math is developing an AI system that not only solves complex math problems but also generates new mathematical knowledge by proposing conjectures: mathematical statements that have yet to be proven.

The model produces rigorous, step-by-step proofs that can be independently verified using proof assistants such as Lean and Coq. This approach aims to transform English-language math from textbooks and research papers into code, enabling the AI to create and validate new problems that push the boundaries of existing knowledge…

Currently, Axiom is working on models that can discover and solve new math problems. The researchers also hope to apply their work in areas like finance, aircraft design, chip design, and quantitative trading.

Beyond pure mathematics, Axiom’s AI tool is being tested for practical applications in fields requiring rigorous computational precision, including finance, aircraft and chip design, and quantitative trading.

Time will tell if the intersection of AI with mathematics can generate a profitable company. What I don’t doubt is that the previously unthinkable — original mathematical work by AI — will eventually happen, given enough time.

Different Ways of Expressing Small Proportions

I recently read the article Pipe Dreams about treating wastewater. I’m not an engineer and make no claims of expertise about the accuracy of the article. What did catch my attention, as a mathematician, is how the author chose to express small proportions. For example, the opening sentence:

Wastewater is 99.9 percent water, but boy, that last little bit.

Later in the article:

Orange County’s is an example of indirect potable reuse, where wastewater is cleansed to 99.9999999999 percent free of pathogens before it goes to an environmental buffer like a reservoir or an aquifer for further natural filtering and then to homes. 

And later:

After treating the water to even higher standards—demonstrating a 99.999999999999999999 percent removal rate of viruses and similarly high removal rates of protozoa—they may send the cleansed water directly into the water distribution system.

I was struck about the psychology of communicating all those consecutive 9s when expressing these proportions. For example, if the proportion of impurities instead of the proportion of water was given, the previous sentences could be rewritten as:

Only one part per thousand of wastewater is impurities.

Orange County’s is an example of indirect potable reuse, where impurities are reduced to one part per trillion before it goes to an environmental buffer like a reservoir or an aquifer for further natural filtering and then to homes. 

After treating the water to even higher standards, reducing impurities to one part per 100 million trillion, they may send the cleansed water directly into the water distribution system.

Of these two different ways of expressing the same information, it seems to me that the author’s original prose is perhaps most psychologically comforting. “One part per trillion” seems a little abstract, as most people don’t have an intuitive notion of just how big a trillion is. The phrase “99.9999999999 percent,” on the other hand, seems at first reading to be ridiculously close to 100 percent (which, of course, it is).

Bryan Bros and Units of Measurement

For the last couple years, one of my favorite sources of entertainment has been the wonderful world of YouTube Golf. Intending no disrespect to any other content creators, my favorite channels are the ones by Grant Horvat, the Bryan Bros (not to be confused with the twin tennis duo), Peter Finch, Bryson DeChambeau (of course), and Golf Girl Games (all of them absolutely, positively should have been in the Internet Invitational… but that’s another story for another day).

In a recent Bryan Bros video, my two interests collided. To make a long story short, a golf simulator projected that a tee shot on a par-3 ended 8 feet, 12 inches from the cup.

Co-host Wesley Bryan, to his great credit, immediately saw the computer glitch — this is an unusual way of saying the tee shot ended 9 feet from the cup. Hilarity ensued as the golfers held a stream-of-consciousness debate on the merits of metric and Imperial units. The video is below: the fun begins at the 21:41 mark and ends around 25:30.

My Mathematical Magic Show: Part 5e

As discussed earlier in this blog, here’s one of my favorite mathematical magic tricks. The trick works best when my audience has access to a calculator (including the calculator on a phone). The patter:

Write down any five-digit number you want. Just make sure that the same digit repeated (not something like 88,888).

(pause)

Now scramble the digits of your number, and write down the new number. Just be sure that any repeated digits appear the same number of times. (For example, if your first number was 14,232, your second number could be 24,231 or 13,422.)

(pause)

Is everyone done? Now subtract the smaller of the two numbers from the bigger, and write down the difference. Use a calculator if you wish.

(pause)

Has everyone written down the difference. Good. Now, pick any nonzero digit in the difference, and scratch it out.

(pause)

(I point to someone.) Which numbers did you not scratch out?

The audience member will say something like, “8, 2, 9, and 6.” To which I’ll reply in three seconds or less, “The number you scratched out was a 2.”

Then I’ll turn to someone else and ask which numbers were not scratched out. She’ll say something like, “3, 2, 0, and 7.” I’ll answer, “You scratched out a 6.”

green line

As discussed in a previous post, the difference found by the audience member must be a multiple of 9. Since the sum of the digits of a multiple of 9 must also be a multiple of 9, the magician can quickly figure out the missing digit. In the previous example, 3+2+0+7=12. Since the next multiple of 9 after 12 is 18, the magician knows that the missing digit is 18-12 = 6.

To speed things up (and to reduce the possibility of a mental arithmetic mistake), the magician doesn’t actually have to add up all of the digits. If the audience member gives a digit of either 0 or 9, then the magician can ignore that digit for purposes of the trick. Likewise, if the magician notices that some subset of the given digits add up to 9, then those digits can be effectively ignored. In the current example, the magician could ignore the 0 and also the 2 and 7 (since 2+7=9). That leaves only the 3, and clearly one needs to add 6 to 3 to get the next multiple of 9.

I was a little curious about how often this happens — how often the magician can get away with these shortcuts to find the missing digits. So I did some programming in Mathematica. Here’s what I found. If the audience starts with a 5-digit number, so that the difference must be some multiple of 9 between 9 and 99,999:

  • There are 690 multiples (out of 11,111, or about 6%) that do not reduce at all (for example, 57,888). So the magician can expect to do the full addition about one-sixth of the time.
  • There are 5535 multiples (about 50%) whose digits can be divided into subsets that sum to 9. So, about half the time, the magician can expect to quickly find the missing digit without having to add past 9.

I’ve put on my mathematical wish-list some kind of theorem about this splitting of digits of multiples of 9s.

Polynomial Long Division and Megan Moroney

A brief clip from Megan Moroney’s video “I’m Not Pretty” correctly uses polynomial long division to establish that 2x+3 is a factor of 2x^4+5x^3+7x^2+16x+15. Even more amazingly, the fact that the remainder is 0 actually fits artistically with the video.

And while I have her music on my mind, I can’t resist sharing her masterpiece “Tennessee Orange” and its playful commentary on the passion of college football fans.