K-mers for genomic analyses
A k-mer is simply a substring of length k taken from a longer sequence. For example, take this string:
AGCTTGACGTACT
With k = 3, we get the following list of k-mers:
AGC, GCT, CTT, TTG, TGA, GAC, ACG, CGT, GTA, TAC, ACT (from left to right)
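To make the sliding window concrete, here is a minimal Python sketch. The function name `kmers` is my own choice for illustration, and the input AGCTTGACGTACT is the string that yields the 3-mers listed above. A sequence of length n has n - k + 1 overlapping k-mers, so this 13-character string gives 11 of them.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, read from left to right."""
    # A window of width k starts at every position from 0 to len(seq) - k.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


example = "AGCTTGACGTACT"
print(kmers(example, 3))
```

If the sequence is shorter than k, the function simply returns an empty list.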
In bioinformatics, estimation of k-mer abundance histograms, or just enumerating the number of unique k-mers and the number of singletons, is desirable in many genome sequence analysis applications. The applications include predicting genome sizes, data pre-processing for de Bruijn graph assembly methods (tuning runtime parameters for analysis tools), repeat detection, sequencing coverage estimation, measuring sequencing error rates, etc. Different methods for cardinality estimation in sequencing data have been developed in recent years.
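For a toy sequence, these quantities can be computed exactly. The sketch below is only an illustration under my own assumptions: the function names and the input string are made up, and it counts everything in memory with `collections.Counter`. That does not scale to real sequencing data, which is where the estimation methods mentioned above come in. Here "singletons" means k-mers that occur exactly once.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count how many times each k-mer occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def abundance_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers that have it."""
    return Counter(counts.values())


counts = kmer_counts("ACGTACGTT", 3)  # hypothetical toy sequence
hist = abundance_histogram(counts)

distinct = len(counts)  # number of distinct k-mers
singletons = hist[1]    # number of k-mers seen exactly once

print(dict(counts))
print(dict(hist))
print(distinct, singletons)
```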
An example of using k-mers for genome comparison and analysis:
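As a minimal sketch of one way to do such a comparison, two sequences can be compared by the Jaccard similarity of their k-mer sets (shared k-mers divided by all k-mers seen in either sequence). This is my own choice of measure for illustration, not necessarily what any particular tool uses. The function names and the toy sequences are made up, and the sketch ignores reverse complements, which real genome comparisons usually have to handle.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq_a, seq_b, k):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    set_a, set_b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = set_a | set_b
    if not union:
        # Both sequences are shorter than k, so there is nothing to compare.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical toy sequences: seq_b differs from seq_a by one base,
# seq_c shares no 3-mer with seq_a.
seq_a = "AGCTTGACGTACT"
seq_b = "AGCTTGACGAACT"
seq_c = "TTTTTTTTTTTTT"

print(jaccard(seq_a, seq_b, 3))
print(jaccard(seq_a, seq_c, 3))
```

A single changed base alters up to k of the overlapping k-mers, so the similarity drops quickly as sequences diverge. That sensitivity is what makes k-mers useful for comparison.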
Hope it helps~~~
Thanks for reading my post.