As a junior at BYU, I was hired by a research group in the mathematics department headed by Dr. Mark Hughes, who specializes in low-dimensional topology, including knot theory. Knot theory involves more data than most areas of pure mathematics research—a theoretically infinite number of knots, thousands of which have been characterized and catalogued, each described by dozens of mathematical attributes (“invariants”)—so he created the research group to explore data-oriented approaches to the study of knots.
The question that came to occupy most of my time was this: could machine learning uncover unexpected interactions among knot features? In other words, could it identify large-scale patterns in knot invariants that, according to known theory, should not be there? If so, it would hint at latent mathematical structure that topologists could then investigate.
I used KnotInfo¹, a database of all 12,965 prime knots with crossing number up to 13 and all their known invariants.² (A prime knot is one that isn’t made by connecting two smaller prime knots; the crossing number is the minimum number of crossings in any diagram of the knot.) Most invariants are known for lower-crossing knots, but as the number of crossings grows, so does the difficulty of calculating invariants; thus, not every knot in the database has a value for every invariant. But there’s enough to work with. I also used Matlab code Dr. Hughes had previously written to pseudo-randomly generate knots of arbitrary crossing number, so our dataset could include knots larger than crossing number 13.

Here are the details for the approach that proved successful (meaning that among a lot of predictable, boring results I found a single unpredictable, interesting one). I set out to see if I could accurately predict any of a knot’s invariants using only its Jones polynomial.
Data Validation
First I filtered out knots for which the Jones polynomial was not definitively known—in KnotInfo, where the lower bound did not equal the upper bound, or in the generated dataset, where the Jones calculation returned “failed.”
Encoding the Features
Example Jones polynomial:
Since Jones polynomials are strings, I had to parse them into a format machine learning models could accept. I wrote a custom parser to turn them into lists where the index corresponds to the exponent, in the order 0, 1, -1, 2, -2, …, and the value is the coefficient. The example above would have become [3, -3, -3, 2, 3, -1, -1]. Each list index becomes its own feature column.
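As a rough sketch of this encoding (not the original parser, which isn’t reproduced here), the function below maps exponents 0, 1, -1, 2, -2, … to list indices 0, 1, 2, 3, 4, … and fills in the coefficients. The input string format (a variable named t, exponents written as ^3 or ^(-3), an optional * between coefficient and variable) is an assumption, and the names encode_jones and exponent_to_index are made up for illustration. The sample input is the trefoil’s Jones polynomial in one chirality convention.

```python
import re

# Assumed term format: optional sign, optional integer coefficient, optional "*",
# optional variable "t", optional exponent written as ^3, ^-3, or ^(-3).
TERM = re.compile(r"([+-]?)(\d*)\*?(t?)(?:\^\(?(-?\d+)\)?)?")


def exponent_to_index(exp):
    """Map exponents 0, 1, -1, 2, -2, ... to list indices 0, 1, 2, 3, 4, ..."""
    return 2 * exp - 1 if exp > 0 else -2 * exp


def encode_jones(poly):
    """Turn a polynomial string into a coefficient list in interleaved order."""
    coeffs = {}
    for sign, digits, var, exp in TERM.findall(poly.replace(" ", "")):
        if not digits and not var:
            continue  # empty regex match, not a real term
        coef = int(digits) if digits else 1
        if sign == "-":
            coef = -coef
        # No explicit exponent: "t" alone means power 1, a bare number means power 0.
        power = int(exp) if exp else (1 if var else 0)
        coeffs[power] = coeffs.get(power, 0) + coef
    if not coeffs:
        return [0]
    encoded = [0] * (max(exponent_to_index(e) for e in coeffs) + 1)
    for e, c in coeffs.items():
        encoded[exponent_to_index(e)] = c
    return encoded


print(encode_jones("t + t^3 - t^4"))  # [0, 1, 0, 0, 0, 1, 0, -1]
```

Interleaving positive and negative exponents keeps the low-degree terms, which every polynomial shares, in the first few columns regardless of how far the polynomial extends in either direction.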
Different knots have different Jones polynomial lengths, so I padded the end of each list, appending zeros until each was as long as the longest list in the dataset. Generally, filling missing values with zeros would be bad practice, but in this case it is an appropriate solution; each list element represents the coefficient of one term in a polynomial, so a list element being nonexistent is mathematically equivalent to it being zero (e.g., x^2 = x^2 + 0x + 0). This gave every input the same length, ready to feed into a neural net.
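The padding step is small enough to sketch in a few lines. The helper name pad_to_longest and the sample lists are illustrative only, not taken from the original code.

```python
def pad_to_longest(encoded_rows):
    """Right-pad each coefficient list with zeros to the longest list's length."""
    width = max(len(row) for row in encoded_rows)
    return [row + [0] * (width - len(row)) for row in encoded_rows]


# Three illustrative coefficient lists of different lengths.
rows = [[3, -3, -3, 2, 3, -1, -1], [0, 1, 0, 0, 0, 1, 0, -1], [1]]
padded = pad_to_longest(rows)

for row in padded:
    print(row)  # every row now has 8 entries
```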
Model Setup
Since I wanted to predict as many invariants as possible using only the Jones polynomial, I made it so I could pass in the name of an invariant and a model would train using Jones as the input and the named invariant as the target. This was a baseline framework I would have expanded had I stayed with the research group longer; every invariant, some with various possible encodings, could have been either an input or an output of this generalized model pipeline, and then I could have compared every 1:1 input:output combination, or checked which combinations of invariants were more predictive than any one alone.
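One way to picture the "pass in the name of an invariant" idea is a helper that assembles inputs and targets for whichever invariant is requested, skipping knots where that invariant is unknown. This is a sketch under assumptions: the record layout, the field names (jones_encoded, arf, genus_4d), and the function build_dataset are hypothetical, and the values are placeholders rather than real knot data.

```python
def build_dataset(knots, target_name, feature_key="jones_encoded"):
    """Return (X, y) for every knot that has both the features and the target."""
    X, y = [], []
    for knot in knots:
        features = knot.get(feature_key)
        target = knot.get(target_name)
        if features is None or target is None:
            continue  # this invariant is not known for this knot
        X.append(features)
        y.append(target)
    return X, y


# Placeholder records, not real knot data.
knots = [
    {"name": "toy_1", "jones_encoded": [1, 0, 0], "arf": 0, "genus_4d": 1},
    {"name": "toy_2", "jones_encoded": [0, 1, 1], "arf": 1, "genus_4d": None},
    {"name": "toy_3", "jones_encoded": None, "arf": 1, "genus_4d": 0},
]

# One dataset per target invariant; a separate model would be trained on each.
datasets = {name: build_dataset(knots, name) for name in ("arf", "genus_4d")}
```

Because the target is just a key, swapping the roles of input and output, or adding a new invariant, would only mean changing the arguments rather than writing a new loader.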
I first used Jones to predict all the integer invariants, since those were the easiest to encode; you can either treat them as numbers or treat them as categories using a label encoder, and both are simple to set up. My model architecture was a simple neural network using the Keras API over TensorFlow, trained independently for each target invariant. These were my prediction accuracies on each test set:

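The two ways of encoding an integer invariant mentioned above can be sketched without any libraries. The function names and sample values here are illustrative, not from the original code; scikit-learn’s LabelEncoder performs the same kind of category mapping and would be a natural substitute.

```python
def fit_label_encoder(values):
    """Map each distinct value to a class index, in sorted order."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def as_categories(values, mapping):
    """Treat the invariant as a class label (for a classification target)."""
    return [mapping[v] for v in values]


def as_numbers(values):
    """Treat the invariant as a plain number (for a regression target)."""
    return [float(v) for v in values]


# Illustrative integer targets, not real invariant values.
targets = [-2, 0, 2, 0, -4]
mapping = fit_label_encoder(targets)

print(as_categories(targets, mapping))  # [1, 2, 3, 2, 0]
print(as_numbers(targets))              # [-2.0, 0.0, 2.0, 0.0, -4.0]
```

Treating the values as categories suits invariants that take only a handful of distinct values, while the numeric form preserves their ordering.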
From here, my ability to contribute was nil; I had no training in topology, so I didn’t know what any of these invariants were or whether evidence of their connection to the Jones polynomial was significant. I emailed the results to Dr. Hughes, who said the connection between the Jones polynomial and the Arf invariant was well known, but the result for Genus-4D was a little more surprising. Additionally, he later discovered a bug in his Matlab code that was causing it to produce invalid knots, and once it was fixed, the accuracy of 4D genus predictions jumped even higher, to 97.7%.
Unfortunately, I left the research group soon afterward, but Dr. Hughes told me it continued to be a significant research question for him for some time. A group of physicists in South Africa he worked with expressed interest in the connection with the 4D genus, and he sent my results on to them for further testing. Meanwhile, I got a new job, and I lost the trail of the research. But it was fun while it lasted!
Code