Bioinformatics

Parse GFF file with gffutils

To install gffutils:

pip install gffutils

gffutils lets us create an SQLite database from a GFF file.

import gffutils

# filename: path to the GFF file; database_filename: path of the SQLite file to create
gffutils.create_db(filename, database_filename)

We can then open the database and query it easily.

db = gffutils.FeatureDB(dbfn=database_filename)

For example, suppose we are working with a GENCODE GFF3 file that looks like this:

##gff-version 3
#description: evidence-based annotation of the human genome (GRCh38), version 35 (Ensembl 101)
#provider: GENCODE
#contact: gencode-help@ebi.ac.uk
#format: gff3
#date: 2020-06-03
##sequence-region chr1 1 248956422
chr1 HAVANA gene 11869 14409 . + . ID=ENSG00000223972.5;gene_id=ENSG00000223972.5;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;level=2;hgnc_id=HGNC:37102;havana_gene=OTTHUMG00000000961.2
chr1 HAVANA transcript 11869 14409 . + . ID=ENST00000456328.2;Parent=ENSG00000223972.5;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
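To see how those records are laid out, here is a small standard-library sketch that splits one line into the nine GFF3 columns and turns the last column into a dictionary. In the actual file the columns are separated by tabs. The function name parse_gff3_line and the shortened sample line are assumptions for illustration; gffutils does this work for us, so the sketch is only meant to show what ends up in each field.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one tab-separated GFF3 record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Column 9 is a ';'-separated list of key=value pairs; a value may itself
    # be a ','-separated list, so every value is stored as a list.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


# Shortened version of the gene record above, with real tab separators.
SAMPLE_LINE = "\t".join([
    "chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
    "ID=ENSG00000223972.5;gene_name=DDX11L1;level=2;hgnc_id=HGNC:37102",
])

record = parse_gff3_line(SAMPLE_LINE)
print(record["seqid"], record["start"], record["end"])
print(record["attributes"]["gene_name"])
```

Storing each attribute value as a list mirrors how gffutils exposes attributes, which is why the gffutils examples index with [0] to get a single value.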

We want to get the “seqid”, “start”, “end”, and “attributes” fields of every feature of a given type, for example gene. Sample code for this is below.

Running it prints output like this:

84127
LINC02455
chr12
752579
911452
WNK1
chr12
...

Thanks for reading my post.

~~PEACE~~

