Short Reads ToolKit
Written by Administrator   
Monday, 07 December 2009
Short Reads Toolkit is a set of programs developed to work with RNA-seq data (short reads).

The package is a collection of Python and Perl scripts released under the GPLv3 license and is downloadable HERE.

coverage.pl


SYNTAX:

  coverage.pl <file.vmf> 

The script calculates the length of the short reads (R), the length of the aligned reference sequence covered by short reads (L), the number of mapped reads (N), and the depth of coverage (D).

 D = (N x R) / L
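
The formula above can be checked by hand with a few lines of Python. This is only a minimal sketch of the arithmetic, not the code of coverage.pl: the function name and the example numbers are made up for illustration.

```python
def depth_of_coverage(n_reads: int, read_length: int, covered_length: int) -> float:
    """Return D = (N x R) / L.

    n_reads        -- N, number of mapped reads
    read_length    -- R, length of one short read (bp)
    covered_length -- L, length of reference covered by reads (bp)
    """
    if covered_length <= 0:
        raise ValueError("covered_length (L) must be positive")
    return (n_reads * read_length) / covered_length


# Illustrative numbers only: 1,000,000 reads of 36 bp
# covering 12,000,000 bp of reference sequence.
print(depth_of_coverage(1_000_000, 36, 12_000_000))
```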

gff2knowngene.pl


SYNTAX:

 gff2knowngene.pl <annotation.gff> 

The script parses the file <annotation.gff> and prints out a knownGene.txt file.

res2fa.pl


SYNTAX:

 res2fa.pl <solexaOutput.txt> 

The script parses the file <solexaOutput.txt> and prints out a FASTA file.

runMaq.pl


SYNTAX:

 runMaq.pl <reference.bfa> <solexaFile1.txt> [solexaFile2.txt, ...] 

SYNTAX (paired-end):

 runMaq.pl <reference.bfa> -p <solexaFile1.txt> <solexaFile2.txt> 

The script runs all the commands needed to identify candidate SNPs from RNA-seq data.

mapreadslocation.py


SYNTAX:

 mapreadslocation.py [-onlyMulti] <DataBase> <file.vmf> 

The script maps each short read onto an exon, splice junction, intron/UTR, external exon or intergenic region. By default it works only with uniquely matching short reads; use the -onlyMulti option to work with the multiple-match short reads instead. The <DataBase> file is a sqlite3 database created by the createDB.py script.
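
The idea of assigning a read to a region can be illustrated with a much simplified sketch. It is not the logic of mapreadslocation.py: the function name, the half-open coordinates and the three labels used here are assumptions, and splice junctions, UTRs and external exons are left out.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end), assumed convention


def classify_read(read: Interval, transcript: Interval, exons: List[Interval]) -> str:
    """Classify one read against one transcript (hypothetical helper)."""
    r_start, r_end = read
    t_start, t_end = transcript

    # No overlap with the transcript at all.
    if r_end <= t_start or r_start >= t_end:
        return "INTERGENIC"

    # Overlaps at least one exon.
    for e_start, e_end in exons:
        if r_start < e_end and e_start < r_end:
            return "EXON"

    # Inside the transcript but between exons.
    return "INTRON"


# Hypothetical gene model with two exons.
transcript = (1000, 2000)
exons = [(1000, 1200), (1800, 2000)]
print(classify_read((1100, 1136), transcript, exons))
print(classify_read((1500, 1536), transcript, exons))
print(classify_read((5000, 5036), transcript, exons))
```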

createDB.py


SYNTAX:

 createDB.py <DataBase> <knownGene.txt> <''/geneSymbols> <nearDistance[0/x]>  

Parameters:

 knownGene.txt: the knownGene.txt file
 geneSymbols:   file containing the gene name conversion symbols (use '' if it does not exist)
 nearDistance:  the range distance used by erange (short reads mapped in this range will be marked as NEARGENE)
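
How a nearDistance value could translate into a NEARGENE mark is sketched below. This is an illustration only, under stated assumptions: the function name is hypothetical, coordinates are taken as half-open, and a value of 0 is assumed to switch the check off. The actual behaviour of createDB.py and erange may differ.

```python
def is_near_gene(read_start: int, read_end: int,
                 gene_start: int, gene_end: int,
                 near_distance: int) -> bool:
    """Return True if the read lies outside the gene but within
    near_distance bases of either gene boundary (hypothetical helper)."""
    if near_distance <= 0:
        # Assumption: 0 disables the NEARGENE check.
        return False

    if read_end <= gene_start:
        gap = gene_start - read_end      # read is upstream of the gene
    elif read_start >= gene_end:
        gap = read_start - gene_end      # read is downstream of the gene
    else:
        return False                     # read overlaps the gene itself

    return gap <= near_distance


# Hypothetical gene at 1000-2000 with a 100 bp flanking range.
print(is_near_gene(900, 936, 1000, 2000, 100))
print(is_near_gene(100, 136, 1000, 2000, 100))
```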

mapSNPlocation.py


SYNTAX:

 mapSNPlocation.py <DataBase> <snpDB.sqlite> <outputMAQ.snp>  

The script maps each candidate SNP reported by the MAQ analysis onto an exon, splice junction, intron/UTR, external exon or intergenic region. The <DataBase> file is a sqlite3 database created by the createDB.py script. The <snpDB.sqlite> file is a sqlite3 database created by the createSNPdb.py script.

createSNPdb.py


SYNTAX:

 createSNPdb.py <snp.txt> <snpDB.sqlite> 


The <snp.txt> file can be downloaded from the UCSC site.

A patched version of Cistematic 2.5 containing the vitisvinifera.py script is available HERE.

The set of scripts getallsplice.tgz needed to create a dataset of all possible splice junctions is available HERE. You need a knownGene.txt file to run it.

The project is also hosted on Google Code at http://code.google.com/p/shortreadstoolkit/

 

Other scripts

filter_oligos.py

Last Updated ( Thursday, 20 January 2011 )