Bioinformatics

K-mers for genomic analyses

Why we need to know about k-mers.

Donald Le

A k-mer is simply a substring of length k, that is, a run of k consecutive characters taken from a longer sequence. For example, take this string:

AGCTTGACGTACT

With k = 3, sliding a window of three characters along the string one position at a time gives this list of k-mers:

AGC, GCT, CTT, TTG, TGA, GAC, ACG, CGT, GTA, TAC, ACT (from left to right)
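The sliding window can be sketched in a few lines of Python. This is a minimal illustration, not code from the original post; the function name `kmers` is chosen here for the example.

```python
def kmers(sequence, k):
    """Return every k-mer of `sequence`, from left to right.

    A sequence of length n yields n - k + 1 overlapping k-mers
    (none if the sequence is shorter than k).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slide a window of width k one character at a time.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    print(",".join(kmers("AGCTTGACGTACT", 3)))
    # prints AGC,GCT,CTT,TTG,TGA,GAC,ACG,CGT,GTA,TAC,ACT
```

The 13-character string gives 13 - 3 + 1 = 11 k-mers, which matches the list above.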

In bioinformatics, estimating the k-mer abundance histogram, or simply counting the number of unique k-mers and the number of singletons (k-mers that occur exactly once), is useful in many genome sequence analysis applications. These include predicting genome sizes, pre-processing data for de Bruijn graph assembly methods (for example, tuning the runtime parameters of analysis tools), detecting repeats, estimating sequencing coverage, and measuring sequencing error rates. Several methods for cardinality estimation in sequencing data have been developed in recent years.
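To make these quantities concrete, here is a minimal sketch that counts k-mers exactly with Python's `collections.Counter`. Exact in-memory counting is an assumption made for illustration and only suits small inputs; it is not one of the estimation methods mentioned above. The function name `abundance_summary` is hypothetical.

```python
from collections import Counter


def abundance_summary(sequence, k):
    """Return (distinct k-mers, singletons, abundance histogram).

    The histogram maps an abundance a to the number of distinct
    k-mers that occur exactly a times in `sequence`.
    """
    # Count how many times each k-mer occurs.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    distinct = len(counts)
    # Singletons are k-mers seen exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    # Count the counts to get the abundance histogram.
    histogram = dict(Counter(counts.values()))
    return distinct, singletons, histogram


if __name__ == "__main__":
    print(abundance_summary("AAAAC", 2))
```

For "AAAAC" with k = 2 the k-mers are AA, AA, AA, AC, so there are 2 distinct k-mers, 1 singleton (AC), and the histogram records one k-mer with abundance 3 and one with abundance 1.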

An example of using k-mers for genome comparison and analysis:

1.0
1.0
0.3333333333333333
0.0
0.5
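The code that produced these values is not included in this text. As a hedged illustration of one common way to compare sequences with k-mers, the sketch below computes the Jaccard similarity of two k-mer sets: the number of shared k-mers divided by the number of k-mers in either sequence. Jaccard similarity is an assumption here, not necessarily the measure used for the numbers above, and the function names and input strings are made up for the example.

```python
def kmer_set(sequence, k):
    """Return the set of distinct k-mers in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k):
    """Jaccard similarity of the k-mer sets of two sequences.

    Returns a value from 0.0 (no shared k-mers) to 1.0 (identical sets).
    """
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k; treat as no similarity.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    print(jaccard_similarity("AGCT", "AGCT", 2))   # identical k-mer sets
    print(jaccard_similarity("AAAA", "CCCC", 2))   # no shared k-mers
    print(jaccard_similarity("AGC", "AGCTT", 2))   # 2 shared out of 4
```

Because the measure works on sets, it ignores how often each k-mer occurs. A comparison that should reflect abundance would need counts rather than sets.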