
Bioinformatics

Parse GFF file with gffutils

Donald Le

To install gffutils:

pip install gffutils

gffutils lets us create a SQLite database from a GFF file.

import gffutils
gffutils.create_db(filename, database_filename)

Then we can open the database and query the data easily.

db = gffutils.FeatureDB(dbfn=database_filename)

For example, let’s say we need to work with a GENCODE GFF3 file that looks like this:

##gff-version 3
#description: evidence-based annotation of the human genome (GRCh38), version 35 (Ensembl 101)
#provider: GENCODE
#contact: gencode-help@ebi.ac.uk
#format: gff3
#date: 2020-06-03
##sequence-region chr1 1 248956422
chr1 HAVANA gene 11869 14409 . + . ID=ENSG00000223972.5;gene_id=ENSG00000223972.5;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;level=2;hgnc_id=HGNC:37102;havana_gene=OTTHUMG00000000961.2
chr1 HAVANA transcript 11869 14409 . + . ID=ENST00000456328.2;Parent=ENSG00000223972.5;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
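Each record has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase and attributes. The sketch below splits one shortened record by hand, only to show where each field lives. The helper name parse_gff3_line is hypothetical, and it skips details such as multi-valued attributes and escaping; gffutils does this parsing for us.

```python
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Split one GFF3 record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based integers.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # The last column is a ';'-separated list of key=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1) for pair in record["attributes"].split(";") if pair
    )
    return record


# The gene record from the sample above, with a shortened attributes column.
line = "\t".join([
    "chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
    "ID=ENSG00000223972.5;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1",
])
record = parse_gff3_line(line)
print(record["seqid"], record["start"], record["end"])
print(record["attributes"]["gene_name"])
```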

Now suppose we want to get the “seqid”, “start”, “end” and “attributes” of every feature of a given type. Sample code for this is below.

Running this on the GENCODE file will show us results like:

84127
LINC02455
chr12
752579
911452
WNK1
chr12
...

Thanks for reading my post.

~~PEACE~~
