First Bio-WIP Seminar of the Summer this Wednesday (May 25th)
Talk Title: "New Protein Homolog Detection Tool Shows Major Improvements in Accuracy and Speed"
Abstract: There are several hundred million known protein sequences, but the relationships among them are not fully captured by existing homolog detection methods. There is an essential need for an improved method that pushes homolog detection to lower levels of sequence identity. Our method relies on a language model to represent each protein numerically as a matrix (an embedding) and uses discrete cosine transforms to extract the most essential part of these embeddings, thereby significantly reducing the size of each representative matrix. The reduced matrices are used to efficiently compute distances between pairs of protein sequences, which yields homolog detection at significantly lower levels of identity than previously possible. Our Protein LAnguage model Search Tool (PLAST) is also significantly faster, with linear runtimes. Results are validated by the near-identity of the structures in each detected homolog pair. The number of detectable remote homologs is significantly increased, pushing effective sequence matches more deeply into the twilight zone. Tests on human protein sequences that presently have no assigned function enable function assignments for 96% of these proteins. The PLAST web server is accessible at https://mesihk.github.io/plast.
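For readers unfamiliar with the idea, the compression step described in the abstract can be illustrated with a minimal sketch. This is not the PLAST implementation: the random arrays stand in for real language-model embeddings, and the function names, the number of retained coefficients (keep_rows, keep_cols), and the use of an L1 distance are all assumptions made purely for illustration.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] *= 1.0 / np.sqrt(2.0)     # rescale the constant row for orthonormality
    return basis * np.sqrt(2.0 / n)


def compress_embedding(embedding: np.ndarray,
                       keep_rows: int = 5,
                       keep_cols: int = 8) -> np.ndarray:
    """Reduce a (sequence_length, dim) embedding to a fixed-size vector.

    Applies a 2D DCT and keeps only the low-frequency corner, so proteins
    of different lengths map to vectors of the same size.
    keep_rows and keep_cols are hypothetical values, not taken from the talk.
    """
    length, dim = embedding.shape
    coeffs = dct2_matrix(length) @ embedding @ dct2_matrix(dim).T
    block = np.zeros((keep_rows, keep_cols))
    r, c = min(keep_rows, length), min(keep_cols, dim)
    block[:r, :c] = coeffs[:r, :c]     # zero-pad if the input is smaller than the block
    return block.ravel()


def distance(a: np.ndarray, b: np.ndarray) -> float:
    """L1 distance between two compressed vectors (an assumed metric)."""
    return float(np.abs(a - b).sum())


# Random stand-ins for per-residue embeddings of two proteins of different length.
rng = np.random.default_rng(0)
protein_a = compress_embedding(rng.normal(size=(30, 16)))
protein_b = compress_embedding(rng.normal(size=(55, 16)))
print(protein_a.shape, protein_b.shape, distance(protein_a, protein_b))
```

Because every protein is reduced to a vector of the same small size, comparing a pair costs the same regardless of sequence length, which is the property that makes large-scale searches of this kind cheap.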
The Bio-WIP Seminars (formerly BBMB WIP seminars) are sponsored by the BBMB Graduate Learning Community. All are welcome to join!