Quality Control of FASTQ Files
It is necessary to understand, identify, and exclude the types of error that may affect the interpretation of downstream analyses. Sequence quality control is therefore an essential first step in your analysis: catching errors early saves time later on.
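FASTQ is the de facto file format for sequence reads generated by next-generation sequencing technologies. It evolved from FASTA: like FASTA it stores sequence data, but it also stores quality information for every base. Also like FASTA, each FASTQ record begins with a header line; the difference is that a FASTQ header starts with an @ character (FASTA uses >). A single record (sequence read) occupies four lines:
1. A header line beginning with @, followed by the read identifier and an optional description.
2. The sequence of base calls (A, C, G, T, and N for an undetermined base).
3. A separator line beginning with +, optionally followed by the header text again.
4. The quality string, with exactly one character per base in line 2.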
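To make the four lines concrete, here is a small bash sketch that writes one made-up record to a file called example.fastq (a hypothetical file name; the read is invented and not taken from mutant_R1.fastq) and then labels each line by its position within the record using awk.

```shell
#!/usr/bin/env bash
# Write one invented FASTQ record to a hypothetical example file.
cat > example.fastq <<'EOF'
@read1 example
GATTACA
+
IIIIHH#
EOF

# Label every line by its position within a 4-line record.
awk '{
  n = NR % 4
  if (n == 1) role = "header"
  else if (n == 2) role = "sequence"
  else if (n == 3) role = "separator"
  else role = "quality"
  print role ": " $0
}' example.fastq
```

The same NR % 4 logic works for a file with many records, because every record has exactly four lines.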
Check the total number of lines
Because every record occupies exactly four lines, the total number of lines in a FASTQ file should be divisible by 4. We can count the lines with the “wc -l” command in Unix/Linux; dividing the count by 4 gives the number of reads.
wc -l mutant_R1.fastq
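The check can be wrapped in a small bash function. This is a sketch that assumes an uncompressed file with strict 4-line records; the function name check_fastq_lines is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a FASTQ file's line count is a multiple of 4 and report
# the number of reads. Assumes an uncompressed file in which the sequence and
# quality string of each record are each on a single line.
check_fastq_lines() {
  local file="$1" lines
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  lines=$(( $(wc -l < "$file") ))   # arithmetic expansion drops any padding from wc
  if (( lines % 4 == 0 )); then
    echo "$file: $lines lines, $(( lines / 4 )) reads"
  else
    echo "$file: $lines lines is not a multiple of 4; the file may be truncated or corrupted" >&2
    return 1
  fi
}
```

Calling check_fastq_lines mutant_R1.fastq prints the line and read counts, or returns a non-zero status if the count is not a multiple of 4. Note that wc -l counts newline characters, so a file whose last line lacks a trailing newline will also be flagged.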
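The quality score for each sequence is a string of characters, one for each base of the nucleotide sequence, that expresses the probability that each base was mis-identified. Each character encodes a Phred quality score, Q = -10 × log10(P), where P is the probability that the base call is wrong: Q = 20 means a 1 in 100 chance of error and Q = 30 a 1 in 1,000 chance. The score is stored as a character from the ASCII table; in current Illumina data (Phred+33 encoding), the score is the character's ASCII code minus 33.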
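As an illustration, the bash sketch below decodes a quality string into per-base Phred scores and error probabilities. It assumes Phred+33 encoding (Sanger / Illumina 1.8 and later); older Phred+64 data would need a different offset. The function name decode_phred33 is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decode a Phred+33 quality string, one output line per base:
# character, Phred score Q, and error probability P = 10^(-Q/10).
# Assumes Phred+33 encoding; intended for single reads, not whole files.
decode_phred33() {
  local qual="$1" i char ascii q p
  for (( i = 0; i < ${#qual}; i++ )); do
    char="${qual:i:1}"
    printf -v ascii '%d' "'$char"     # numeric ASCII code of the character
    q=$(( ascii - 33 ))                # remove the Phred+33 offset
    p=$(awk -v q="$q" 'BEGIN { printf "%.4g", 10^(-q/10) }')
    printf '%s\tQ=%d\tP(error)=%s\n' "$char" "$q" "$p"
  done
}

decode_phred33 'II#5'
```

Decoding character by character like this is only practical for a handful of reads; for whole files, a dedicated tool such as FastQC summarizes these scores for you.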
We can use the FastQC tool to check the quality of our FASTQ file.
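A minimal bash sketch for running FastQC, assuming the fastqc executable is installed and on your PATH. The function name run_fastqc and the report folder name are hypothetical; the -o option tells FastQC where to write its reports, and that folder must already exist.

```shell
#!/usr/bin/env bash
# Sketch: run FastQC on one or more FASTQ files, writing reports to a chosen folder.
# Assumes the fastqc executable is installed and on PATH.
run_fastqc() {
  local outdir="$1"; shift             # first argument: report folder
  if ! command -v fastqc >/dev/null 2>&1; then
    echo "fastqc not found on PATH" >&2
    return 127
  fi
  mkdir -p "$outdir"                   # FastQC does not create the -o folder itself
  fastqc -o "$outdir" "$@"             # remaining arguments: FASTQ files to check
}
```

For example, run_fastqc fastqc_reports mutant_R1.fastq should leave an HTML report (typically mutant_R1_fastqc.html) and a zip archive of the results in fastqc_reports, which you can open in a browser.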
Thanks for reading my post.