Though proteins are the basic building blocks of life, mathematicians are only starting to formalize the fundamental concepts of structural biology. The key missing piece was a definition of a practical equivalence on (tertiary structures of) proteins embedded in 3-dimensional space. Since protein structures are determined in a rigid form, the strongest equivalence in practice is rigid motion, or isometry if reflections are also included. We can consider a protein either as an ordered sequence of alpha-carbons (the protein backbone) or as a cloud of unlabeled atomic centers, which allows us to compare any molecules under isometry.
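As a minimal illustration of equivalence under isometry for an ordered backbone, the sketch below compares two point sequences by their matrices of pairwise distances. For ordered points this matrix is a classical invariant that is unchanged by rotations, translations, and reflections; it is used here only as a stand-in and is not claimed to be the invariant from our work. The function names and the coordinates are hypothetical.

```python
import math

def distance_matrix(points):
    """Matrix of Euclidean distances between all ordered points."""
    return [[math.dist(p, q) for q in points] for p in points]

def same_up_to_isometry(a, b, tol=1e-6):
    """True if two ordered point sequences have equal distance matrices
    up to the tolerance tol, i.e. they match under some isometry."""
    if len(a) != len(b):
        return False
    da, db = distance_matrix(a), distance_matrix(b)
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(da, db)
               for x, y in zip(row_a, row_b))

def rotate_reflect_translate(points, angle, shift):
    """Rotate about the z-axis, mirror in the xy-plane, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0],
             s * x + c * y + shift[1],
             -z + shift[2]) for x, y, z in points]

# Hypothetical alpha-carbon coordinates, for illustration only.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
            (8.1, 4.2, 2.1), (9.0, 7.5, 3.3)]

moved = rotate_reflect_translate(backbone, 0.7, (1.0, -2.0, 0.5))
perturbed = list(backbone)
perturbed[2] = (5.0, 3.6, 1.0)  # displace a single atom

print(same_up_to_isometry(backbone, moved))      # isometric copy
print(same_up_to_isometry(backbone, perturbed))  # one atom displaced
```

The comparison needs no alignment step because the distances do not depend on the coordinate system. A cloud of unlabeled atomic centers has no fixed ordering, so this ordered comparison does not apply to it directly.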
Pairwise comparisons of all protein chains in the Protein Data Bank (PDB) by complete isometry invariants unexpectedly detected thousands of pairs that have identical coordinates of all alpha-carbon atoms (and often of all other atoms as well). More than 325 billion pairwise comparisons were completed in less than two days on a modest desktop, using an implementation by Alexey Gorelov in our joint work.
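To put these figures in perspective, the short calculation below derives the average throughput they imply. It uses only the two numbers quoted above; since the count is a lower bound and the time an upper bound, the result is a lower bound on the actual rate.

```python
# Figures quoted above: more than 325 billion comparisons in under two days.
comparisons = 325e9
seconds = 2 * 24 * 3600  # two days in seconds

rate = comparisons / seconds  # lower bound on comparisons per second
print(f"at least {rate / 1e6:.2f} million comparisons per second")
```

This works out to roughly 1.9 million comparisons per second on average.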
Some of these pairs of chains differ in the primary sequences of their amino acids, which seems physically impossible. We discussed the findings with the PDB validation team, and several authors confirmed that corrections in the PDB are needed. Using more flexible isometry invariants for rigid clouds of unlabeled atomic centers, we produced a continuous map revealing hot spots in the whole PDB.