Bioinformatics

K-mers for genomic analyses

Why we need to know about k-mers.


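A k-mer is simply a substring of k characters taken from a longer string. With k = 3, for example, we slide a window of three characters along the string one position at a time and collect each window, which gives a list of overlapping 3-mers.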
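To make the definition concrete, here is a minimal sketch in Python. The function name `kmers` and the toy DNA string are made up for illustration and are not taken from the original post.

```python
from typing import List


def kmers(sequence: str, k: int) -> List[str]:
    """Return every overlapping substring of length k, from left to right."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length L contains L - k + 1 k-mers (none if L < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical toy sequence, chosen only for illustration.
example = "ATGGACCAG"
print(kmers(example, 3))
# ['ATG', 'TGG', 'GGA', 'GAC', 'ACC', 'CCA', 'CAG']
```

A string of length 9 gives 9 - 3 + 1 = 7 k-mers here, and each one overlaps its neighbour by k - 1 characters.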

In bioinformatics, estimating k-mer abundance histograms, or simply counting the number of unique (distinct) k-mers and the number of singletons (k-mers that occur exactly once), is useful in many genome sequence analysis applications. These applications include predicting genome sizes, pre-processing data for de Bruijn graph assembly methods (for example, tuning the runtime parameters of analysis tools), detecting repeats, estimating sequencing coverage, and measuring sequencing error rates. Several methods for cardinality estimation in sequencing data have been developed in recent years.
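As a rough illustration of what a k-mer abundance histogram is, the sketch below counts k-mers exactly with a `Counter`. It assumes a handful of toy reads invented for this example, and it ignores reverse complements and sequencing errors. Dedicated tools estimate these numbers approximately on real data instead of storing every k-mer, so treat this as a picture of the quantities involved, not as how those tools work.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count how many times each k-mer occurs across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map each abundance to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Hypothetical toy reads, chosen only for illustration.
reads = ["ATGGACCAG", "GGACCAGAT", "ACCAGATAT"]

counts = count_kmers(reads, 3)
hist = abundance_histogram(counts)

n_distinct = len(counts)   # number of unique (distinct) k-mers
n_singletons = hist[1]     # k-mers that occur exactly once

print(n_distinct, n_singletons)   # 11 4
print(sorted(hist.items()))       # [(1, 4), (2, 4), (3, 3)]
```

In real sequencing data, singletons are often dominated by k-mers created by sequencing errors, which is one reason the singleton count is of interest.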

K-mers can also be used for genome comparison and analysis, for example by measuring how many k-mers two sequences have in common.
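One common way to compare two sequences with k-mers is the Jaccard similarity of their k-mer sets: the number of shared k-mers divided by the number of k-mers found in either sequence. The sketch below is an assumption about what such a comparison could look like, not a reconstruction of the original example. The function names and the toy sequences are invented for illustration.

```python
def kmer_set(sequence, k):
    """Return the set of distinct k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy sequences, chosen only for illustration.
seq_a = "ATGGACCAG"
seq_b = "GGACCAGAT"

similarity = jaccard(kmer_set(seq_a, 3), kmer_set(seq_b, 3))
print(round(similarity, 3))
```

A value of 1.0 means the two sequences have exactly the same k-mers, and 0.0 means they share none. For whole genomes the k-mer sets get very large, which is why sketching techniques such as MinHash are used to approximate this similarity.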

Hope it helps~~~

Thanks for reading my post.

PEACE!!!

Reference

https://hub.gke2.mybinder.org/user/dib-lab-sourmash-hbh66r0t/notebooks/doc/kmers-and-minhash.ipynb
