SNP-effect annotates a set of human SNPs. The input data are either genome reads or a list of variations in one of the supported formats. Position numbering in such a list must correspond to the GRCh37/hg19 genome assembly. Variations in the output file can be filtered by user-defined criteria. The program evaluates single nucleotide polymorphisms (SNPs), multiple nucleotide polymorphisms (MNPs), and insertions and deletions (indels).
The following information is reported in the output file for each SNP, where applicable:
Items 5-14 are repeated for each gene affected by the SNP.
This is an example of running SNP-effect from the command line:
python SNP-effect_pipeline.py SRR701474_1.filt.fastq,SRR701474_2.filt.fastq reads_paired SRR701474.txt
python SNP-effect_pipeline.py infile infile_format outfile
infile - input file name
infile_format - input file format
outfile - output file name
Supported input file formats:
All SNP file formats are tab-delimited, but any sequence of tabs or spaces is treated as a column separator. The reference nucleotide is taken from the reference genome sequence, so any valid DNA nucleotide (A, C, G or T) is accepted in an SNP description string. All positions must correspond to the GRCh37/hg19 genome assembly, and base numbering starts at 1.
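The column rules above can be illustrated with a short Python sketch for checking or pre-converting input lines before a run. It is not SNP-effect's own parser: the function names are hypothetical, and the checks rely only on the rules stated here (any run of tabs or spaces separates columns, positions are 1-based, nucleotides are A, C, G or T). The converter additionally assumes that the first input example below follows VCF column order (CHROM POS ID REF ALT ...) and handles only plain A/C/G/T alleles.

```python
VALID_BASES = set("ACGT")


def split_columns(line: str) -> list[str]:
    """Split a record on any run of tabs or spaces.

    str.split() with no argument treats each whitespace run as a single
    separator and ignores leading and trailing whitespace.
    """
    return line.split()


def check_position(token: str) -> int:
    """Parse a 1-based GRCh37/hg19 position; 0 and negative values are rejected."""
    pos = int(token)
    if pos < 1:
        raise ValueError(f"positions are 1-based, got {pos}")
    return pos


def check_bases(token: str) -> str:
    """Accept a non-empty nucleotide string built only from A, C, G and T."""
    if not token or not set(token) <= VALID_BASES:
        raise ValueError(f"invalid nucleotide string: {token!r}")
    return token


def vcf_to_simple(vcf_text: str) -> list[str]:
    """Rewrite VCF-like records as 'chrom pos ref alt' lines (hypothetical helper).

    Assumes the column order CHROM POS ID REF ALT ...; '#' header lines and
    blank lines are skipped, and a comma-separated ALT column produces one
    output line per alternative allele.
    """
    out = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = split_columns(line)[:5]
        check_position(pos)
        check_bases(ref)
        for allele in alt.split(","):
            check_bases(allele)
            out.append(f"{chrom} {pos} {ref} {allele}")
    return out
```

Applied to the three VCF-like records in the first example below, vcf_to_simple returns the same three variations as the four-column example (chromosome, position, reference and alternative nucleotide).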
Input data examples:
1 10469 . C G 100 PASS . GT:AP 1|0:0.740,0.450
1 1477244 rs7290 T C 100 PASS . GT:AP 1|1:0.700,0.820
2 183699584 rs7775 G C 100 PASS . GT:AP 0|1:0.440,0.695
1 10469 10469 C G
1 1477244 1477244 T C comments: rs7290
2 183699584 183699584 G C
1 10469 C G
1 1477244 T C
2 183699584 G C
. 1 10469 CG
rs7290 1 1477244 CT
rs7775 2 183699584 CC
Example of output file:
#VARIATION chr1 10469 C --> G
INTERGENIC
Name: between FR137075 and uc010nxq.1
#VARIATION chr1 1477244 T --> C
dbSNP ID: rs7290
#INTERSECTED GENE
Name: uc001agd.3
Strand: -
Region: 1477053..1510262
CDS: 1477446..1509937
Exons: 1477053..1477547 1479249..1479367 1480243..1480382 1500153..1500296 1509858..1510262
Description: Homo sapiens SSU72 RNA polymerase II CTD phosphatase homolog (S. cerevisiae) (SSU72), mRNA.
Type of gene: Protein coding
Clinical significance: unknown
Variation location: Out of CDS, 3' UTR.
#VARIATION chr2 183699584 G --> C
FREQ: 0.669682
dbSNP ID: rs7775
OMIM link: http://omim.org/entry/605083#0001
#INTERSECTED GENE
Name: uc002upa.2
Strand: -
Region: 183698005..183731498
CDS: 183699576..183731280
Exons: 183698005..183699692 183702676..183702739 183703137..183703341 183707206..183707271 183723514..183723561 183730803..183731498
Description: Homo sapiens frizzled-related protein (FRZB), mRNA.
Type of gene: Protein coding
Clinical significance: osteoarthritis; colorectal cancer; Defects in FRZB are associated with susceptibility to osteoarthritis type 1(OS1);
Variation location: Exon 6
Position in protein: 324
Protein length: 325
Codon: CGC => GGC
Translation: R => G
Tolerance Score: 0.01 (damaging)
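The "Position in protein", "Protein length", "Codon" and "Variation location" fields follow from the Strand, CDS and Exons coordinates printed for each gene. The Python sketch below reproduces that arithmetic with a standard genomic-to-CDS coordinate mapping. It is not SNP-effect's own code: the function names are hypothetical, and it assumes inclusive 1-based intervals and a CDS range that includes the stop codon (the FRZB numbers in the example are consistent with this).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # base pairing, used to read minus-strand alleles


def coding_pieces(cds, exons):
    """Clip each exon (inclusive, 1-based) to the CDS; return pieces in genomic order."""
    cds_start, cds_end = cds
    pieces = []
    for start, end in sorted(exons):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:
            pieces.append((lo, hi))
    return pieces


def coding_length(cds, exons):
    """Number of coding bases, stop codon included (assumption)."""
    return sum(hi - lo + 1 for lo, hi in coding_pieces(cds, exons))


def locate_in_cds(pos, strand, cds, exons):
    """Return (codon_number, base_in_codon), both 1-based, or None if pos is not coding."""
    offset = 0  # coding bases between the genomic low end of the CDS and pos
    for lo, hi in coding_pieces(cds, exons):
        if lo <= pos <= hi:
            offset += pos - lo
            break
        offset += hi - lo + 1
    else:
        return None  # intron, UTR or outside the gene
    if strand == "-":
        # Minus-strand genes are read from high to low coordinates.
        offset = coding_length(cds, exons) - 1 - offset
    return offset // 3 + 1, offset % 3 + 1


def utr_side(pos, strand, cds):
    """Name the UTR of an exonic position that lies outside the CDS."""
    cds_start, cds_end = cds
    upstream = pos < cds_start if strand == "+" else pos > cds_end
    return "5' UTR" if upstream else "3' UTR"
```

For the FRZB gene, the clipped exons give 978 coding bases, so the protein length is 978 // 3 - 1 = 325, and position 183699584 maps to (324, 1): the first base of codon 324. Because the gene is on the minus strand, the genomic change G --> C reads as C --> G on the coding strand, which is the CGC => GGC change in the example. For SSU72, position 1477244 lies in the first exon below the CDS start of a minus-strand gene, so locate_in_cds returns None and utr_side reports the 3' UTR.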
The configuration file is named snp.ini.
Paths to the data files are set in the [global] section of snp.ini. Filtering parameters are set in the [sift_score_filter] and [add_filters] sections.
enabled=1 turns on the SIFT score filter; enabled=0 turns it off.
show_only_scored=yes skips variations that have no SIFT score.
If compare_less=0, SNPs with SIFT scores greater than compare_level are printed in the output file; if compare_less=1, SNPs with scores less than compare_level are printed. Small SIFT scores correspond to variations predicted to be damaging. The usual cutoff (compare_level) is 0.05.
With coding_region=1, only SNPs located in the coding regions of genes are printed in the output file.
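The filter rules above can be summarized as two predicates. This is a minimal Python sketch, not SNP-effect's implementation: the function names are hypothetical, comparisons with compare_level are assumed to be strict, variations without a score are assumed to be kept when show_only_scored is not yes, and the omim and snpedia switches are not modeled.

```python
def passes_sift_filter(score, enabled=True, show_only_scored=True,
                       compare_less=False, compare_level=0.05):
    """Return True if a variation with this SIFT score should be printed.

    score is None when no SIFT score is available. Strict comparison with
    compare_level is an assumption of this sketch.
    """
    if not enabled:
        return True  # filter switched off: keep everything
    if score is None:
        return not show_only_scored  # unscored variations kept only if allowed
    return score < compare_level if compare_less else score > compare_level


def passes_add_filters(in_coding_region, coding_region=True):
    """With coding_region on, keep only variations inside a gene's coding region."""
    return in_coding_region or not coding_region
```

For example, with compare_less=1 and compare_level=0.05 the FRZB variation in the example output (Tolerance Score 0.01, damaging) is printed, whereas with compare_less=0 it is filtered out.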
Example of snp.ini:
[global]
base_dir = ./data
pssm_dir = pssm
snp_dir = dbSNP
omim_dir = omim
seq_dir = ref_fa
annot_fname = knownGene.fg2
gaps_fname =gaps.kg2
repeats_fname =norepeats ;hg19_rmsk.txt
info_dir = annotations
info_fname = knownGene.ann8.chr
chr_list = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y ;comma separated
snpedia_fname=snpedia/snpedia.txt
gad_fname=gad/GAD.txt
freq_dir = frequencies
tpl_dir = ../templates
readsMap_dir = ./readsMap
[sift_score_filter]
enabled=1
show_only_scored=yes
compare_less=0
compare_level=0.05
[add_filters]
coding_region=1
omim=0
snpedia=0
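snp.ini uses INI syntax, so it can be read with Python's standard configparser module, for example to check the settings before a run. This is a sketch under assumptions: the helper names are hypothetical, ';' preceded by whitespace is treated as an inline comment (as in the repeats_fname and chr_list lines), and strict=False lets a repeated key overwrite the earlier value instead of raising an error. How SNP-effect itself resolves relative paths such as pssm_dir against base_dir is not stated here, so paths are left as plain strings.

```python
import configparser


def load_settings(text: str) -> configparser.ConfigParser:
    """Parse snp.ini text; ' ;' starts an inline comment (assumption)."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",), strict=False)
    parser.read_string(text)
    return parser


def filter_settings(cfg: configparser.ConfigParser) -> dict:
    """Collect the filter switches; getboolean accepts 1/0 and yes/no."""
    sift = cfg["sift_score_filter"]
    return {
        "enabled": sift.getboolean("enabled"),
        "show_only_scored": sift.getboolean("show_only_scored"),
        "compare_less": sift.getboolean("compare_less"),
        "compare_level": sift.getfloat("compare_level"),
        "coding_region": cfg["add_filters"].getboolean("coding_region"),
    }
```

To read the file from disk instead of a string, pass its contents, for example load_settings(open("snp.ini").read()). The chromosome list can then be obtained with cfg["global"]["chr_list"].split(",").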
Copyright © Softberry, Inc., 2014-2016