Quality Control

Quality control of FASTQ files

It is necessary to understand, identify, and exclude the types of errors that may affect the interpretation of downstream analyses. Sequence quality control is therefore an essential first step in your analysis. Catching errors early saves time later on.


The FASTQ file format is the de facto standard for sequence reads generated by next-generation sequencing technologies. It evolved from FASTA: like FASTA it contains sequence data, but it also stores quality information for every base. Each FASTQ record also begins with a header line, but the FASTQ header starts with an @ character instead of the > used in FASTA. A single record (sequence read) spans four lines: the header line (@ followed by the read identifier), the nucleotide sequence, a separator line beginning with + (optionally followed by the identifier again), and the quality string, which has exactly one character per base of the sequence.
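
The four-line layout can be checked mechanically. The sketch below uses awk to walk the file four lines at a time and report records whose header does not start with @, whose separator does not start with +, or whose quality string is not the same length as the sequence. The function name validate_fastq is hypothetical (it is not part of any tool), and it assumes an uncompressed FASTQ file with one line per sequence and quality string (no line wrapping).

```shell
# validate_fastq: hypothetical helper, not part of any published tool.
# Assumes uncompressed FASTQ with each record on exactly four lines.
# Returns 0 if every record looks well formed, 1 otherwise.
validate_fastq() {
  local fastq="$1"
  awk '
    # Line 1 of each record: header must start with @
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "record %d: header does not start with @\n", (NR + 3) / 4; bad = 1
    }
    # Line 2: remember the sequence length
    NR % 4 == 2 { seq_len = length($0) }
    # Line 3: separator must start with +
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "record %d: separator does not start with +\n", (NR + 1) / 4; bad = 1
    }
    # Line 4: quality string must have one character per base
    NR % 4 == 0 && length($0) != seq_len {
      printf "record %d: sequence and quality lengths differ\n", NR / 4; bad = 1
    }
    END {
      if (NR % 4 != 0) { print "line count is not a multiple of 4"; bad = 1 }
      exit bad
    }
  ' "$fastq"
}
```

For example, validate_fastq mutant_R1.fastq && echo "structure OK" prints the message only when no problems were found.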

Check the total number of lines

The total number of lines in a FASTQ file should be divisible by 4, since every record spans exactly four lines. We can count the lines with the “wc -l” command in Unix/Linux.

wc -l mutant_R1.fastq
49920 mutant_R1.fastq
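
In the output above, 49920 / 4 = 12480, so the file holds 12,480 reads. A small wrapper can do the divisibility check and the division in one step. The function name count_reads is hypothetical. Note that wc -l counts newline characters, so a file whose last line lacks a trailing newline will appear one line short.

```shell
# count_reads: hypothetical helper. Prints the number of reads in an
# uncompressed FASTQ file, or warns and returns 1 if the line count
# is not a multiple of 4.
count_reads() {
  local fastq="$1" lines
  # Redirect input so wc prints only the number, not the file name
  lines=$(wc -l < "$fastq")
  if [ $((lines % 4)) -ne 0 ]; then
    echo "warning: $lines lines is not a multiple of 4" >&2
    return 1
  fi
  echo $((lines / 4))
}
```

For the file above, count_reads mutant_R1.fastq would print 12480.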

Quality Score

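The quality score for each sequence is a string of characters, one for each base of the nucleotide sequence, that encodes the probability that the base was called incorrectly. The score is encoded using the ASCII character table: in the Phred+33 encoding used by modern Illumina data, subtracting 33 from a character's ASCII code gives the Phred quality score Q, where Q = -10 log10(P) and P is the probability that the base call is wrong. For example, the character I (ASCII 73) encodes Q = 40, an error probability of 1 in 10,000.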
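
To make the encoding concrete, the sketch below decodes each quality string, assuming Phred+33 encoding (older Phred+64 files would need 64 subtracted instead). For every read it prints the read ID, the mean Phred score, and the expected number of errors, i.e. the sum of the per-base error probabilities P = 10^(-Q/10). The function name mean_quality is hypothetical.

```shell
# mean_quality: hypothetical helper, not part of FastQC or any other tool.
# Assumes Phred+33 encoding and uncompressed 4-line FASTQ records.
# Prints: read ID <tab> mean Phred score <tab> expected number of errors.
mean_quality() {
  local fastq="$1"
  awk '
    BEGIN {
      # awk has no ord(), so build a character -> ASCII code lookup table
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    # Header line: keep the read ID without the leading @
    NR % 4 == 1 { id = substr($1, 2) }
    # Quality line: decode each character to Q = ASCII code - 33
    NR % 4 == 0 && length($0) > 0 {
      sum_q = 0; exp_err = 0
      for (j = 1; j <= length($0); j++) {
        q = ord[substr($0, j, 1)] - 33
        sum_q += q
        exp_err += 10 ^ (-q / 10)   # per-base error probability
      }
      printf "%s\t%.2f\t%.4f\n", id, sum_q / length($0), exp_err
    }
  ' "$fastq"
}
```

The error probabilities are summed per base rather than derived from the mean score, because Q is a logarithmic scale: a single very poor base can dominate the expected error count while barely moving the average.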

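We can use the FastQC tool to check the quality of our FASTQ file. FastQC summarizes per-base quality scores, GC content, adapter content, and other statistics in an HTML report.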

FastQC: a tool to check the quality of FASTQ files
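
A minimal way to run FastQC from a script is sketched below, assuming fastqc is installed and on your PATH. The -o option sets the output directory, and FastQC expects that directory to exist already, so the sketch creates it first. The function name run_fastqc and the default directory name qc_reports are hypothetical choices for illustration.

```shell
# run_fastqc: hypothetical wrapper around the fastqc command-line tool.
# Usage: run_fastqc <fastq file> [output directory, default qc_reports]
run_fastqc() {
  local fastq="$1" outdir="${2:-qc_reports}"
  # Fail early with a clear message if FastQC is not installed
  if ! command -v fastqc > /dev/null 2>&1; then
    echo "fastqc not found on PATH" >&2
    return 127
  fi
  # FastQC does not create the output directory itself, so make it first
  mkdir -p "$outdir"
  fastqc -o "$outdir" "$fastq"
}
```

Running run_fastqc mutant_R1.fastq should leave mutant_R1_fastqc.html (the report to open in a browser) and mutant_R1_fastqc.zip in qc_reports.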

Thanks for reading my post.

Happy analysis!!


