ExonScan Web Server

(provided by the Burge Lab)

Use the following features (in addition to splice sites):

ESEs
ESSs
intronic GGG

DNA sequence containing internal exons:
(This can be in FASTA format; lines beginning with '>' delimit new sequences.)



How it works

ExonScan expects a primary transcript sequence, preferably excluding the first and last exons. It currently also needs roughly 20 or more bases upstream of the first exon it can locate, and roughly 60 or more bases downstream of the last. Numbers in the sequence are discarded, so you can cut and paste from, e.g., a GenBank file.
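As an illustration of the input handling described above, here is a minimal sketch of how pasted text might be split into sequences and cleaned. The function name parse_sequences, the "unnamed" placeholder, and the upper-casing are assumptions made for this example; the server's actual parsing code is not shown on this page.

```python
def parse_sequences(text):
    """Split pasted text into (name, sequence) pairs.

    Lines starting with '>' begin a new sequence. Digits and whitespace
    are dropped from sequence lines, so GenBank-style numbered lines work.
    Upper-casing is an assumption of this sketch.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if name is not None or chunks:
                records.append((name or "unnamed", "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            cleaned = "".join(
                ch for ch in line if not ch.isdigit() and not ch.isspace()
            )
            chunks.append(cleaned.upper())
    if name is not None or chunks:
        records.append((name or "unnamed", "".join(chunks)))
    return records


if __name__ == "__main__":
    pasted = ">seq1\n  1 acgtacgt aa\n 11 ggg\n>seq2\nTTT\n"
    for name, seq in parse_sequences(pasted):
        print(name, seq)
```

Input without any '>' line is treated here as a single sequence named "unnamed".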

Splice sites are scored using a maximum entropy model described in [1]. Currently, ExonScan can only locate canonical splice sites: 5' sites where the intron begins with GT, and 3' sites where the intron ends with AG.
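The GT/AG restriction can be pictured with a short sketch that lists the positions where a candidate exon could start or end. This only enumerates canonical dinucleotides; it does not implement the maximum entropy scoring from [1]. The function name and the 0-based, end-exclusive coordinate convention are assumptions of this example.

```python
def candidate_splice_sites(seq):
    """Return (exon_starts, exon_ends) for canonical AG/GT sites.

    exon_starts: index of the first base after an AG (3' splice site).
    exon_ends:   index of the G of a GT (5' splice site), i.e. one past
                 the last exon base, so a candidate exon is seq[start:end].
    """
    seq = seq.upper()
    exon_starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    exon_ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    return exon_starts, exon_ends


if __name__ == "__main__":
    starts, ends = candidate_splice_sites("CCAGAAAGTCC")
    # Pair each possible start with every later possible end.
    pairs = [(s, e) for s in starts for e in ends if s < e]
    print(pairs)
```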

ESEs are determined using RESCUE-ESE and assigned a score based on the log-odds of frequency in exons versus introns. RESCUE-ESE is described in [2].

ESSs are taken from the FAS-hex3 set and are assigned a score based on the log-odds of frequency in pseudoexons versus exons. FAS is described in [3].
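Both the ESE and ESS scores are log-odds of a hexamer's frequency in one sequence class versus another. The sketch below shows that calculation and one possible way to total hexamer scores over a candidate exon. The base-2 logarithm, the overlapping-window scan, and the example frequencies and hexamer are illustrative assumptions, not values or conventions taken from ExonScan.

```python
import math


def log_odds(freq_foreground, freq_background):
    """Log-odds (base 2, an assumption) of two positive frequencies."""
    if freq_foreground <= 0 or freq_background <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(freq_foreground / freq_background)


def sum_hexamer_scores(exon, scores):
    """Sum the scores of every overlapping 6-mer of exon found in scores.

    scores maps hexamer -> score; hexamers not in the table contribute 0.
    """
    exon = exon.upper()
    return sum(scores.get(exon[i:i + 6], 0.0) for i in range(len(exon) - 5))


if __name__ == "__main__":
    # Hypothetical frequencies: 0.4% of exonic hexamers vs 0.1% of intronic.
    ese_score = log_odds(0.004, 0.001)
    table = {"GAAGAA": ese_score}
    print(sum_hexamer_scores("GAAGAAGAAG", table))
```

For an ESS the foreground would be the pseudoexon frequency and the background the exon frequency; how the resulting scores enter the exon total is not specified in this sketch.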

Intronic GGGs are given a small constant score when they occur 40 to 100 bases upstream of a candidate exon or 10 to 70 bases downstream of it.
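The two GGG windows can be sketched as follows. The bonus value of 0.5 is purely hypothetical (the page only says "small constant"), and the exact window boundaries, the counting of overlapping triplets, and the coordinate convention (0-based, exon_end exclusive) are assumptions of this example.

```python
def count_ggg(seq, lo, hi):
    """Count GGG triplets lying entirely within seq[lo:hi]."""
    lo = max(lo, 0)
    hi = min(hi, len(seq))
    return sum(1 for i in range(lo, hi - 2) if seq[i:i + 3] == "GGG")


def intronic_ggg_score(seq, exon_start, exon_end, bonus=0.5):
    """Score GGGs 40-100 bases upstream and 10-70 bases downstream.

    bonus is a hypothetical per-GGG constant, not ExonScan's actual value.
    """
    seq = seq.upper()
    upstream = count_ggg(seq, exon_start - 100, exon_start - 40)
    downstream = count_ggg(seq, exon_end + 10, exon_end + 70)
    return bonus * (upstream + downstream)


if __name__ == "__main__":
    # One GGG in each window around a 60-base exon at positions 100..159.
    seq = "A" * 20 + "GGG" + "A" * 77 + "C" * 60 + "A" * 20 + "GGG" + "A" * 80
    print(intronic_ggg_score(seq, 100, 160))
```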

A "candidate" exon has a total score which is the sum of the scores from the above features. A list of all candidate exons (which must be between 50 and 250 bases long) is sorted by decreasing score, with those below a given cutoff discarded. If a candidate falls within 60 bases of a higher-scoring candidate, it is discarded as well. What remains is the final list of "predicted" exons.
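The selection procedure above can be sketched as a greedy filter. This is an illustration under stated assumptions, not ExonScan's code: candidates are (start, end, score) tuples with end exclusive, "within 60 bases" is read as a gap of at most 60 bases between the two intervals (overlap counts as 0), and each candidate is compared only against higher-scoring candidates that have already been kept.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Number of bases between two intervals; 0 if they touch or overlap."""
    return max(0, b_start - a_end, a_start - b_end)


def select_exons(candidates, cutoff, min_len=50, max_len=250, min_gap=60):
    """Return predicted exons from (start, end, score) candidates."""
    eligible = [
        c for c in candidates
        if min_len <= c[1] - c[0] <= max_len and c[2] >= cutoff
    ]
    predicted = []
    # Visit candidates from highest to lowest score.
    for start, end, score in sorted(eligible, key=lambda c: c[2], reverse=True):
        too_close = any(
            interval_gap(start, end, k_start, k_end) <= min_gap
            for k_start, k_end, _ in predicted
        )
        if not too_close:
            predicted.append((start, end, score))
    return predicted


if __name__ == "__main__":
    candidates = [
        (100, 200, 9.0),
        (230, 330, 7.0),   # only 30 bases from the 9.0 candidate
        (400, 500, 5.0),
        (600, 630, 20.0),  # too short (30 bases)
        (700, 800, 1.0),   # below the cutoff used here
    ]
    print(select_exons(candidates, cutoff=2.0))
```

The cutoff of 2.0 in the usage example is arbitrary; the page does not state the value the server uses.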

Training set

To test ExonScan's effectiveness and set appropriate cutoffs, we used a set of 1,820 full-length CDSs that show no alternative splicing and were obtained from cDNA alignments. It is available here in the following formats:

Please send feedback to Mike Rolish (merolish at mit dot edu).

References

[1] Yeo, G. and Burge, C. B. (2004). Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comp. Biol. 11, 377-394.

[2] Fairbrother, W., Yeh, R.-F., Sharp, P. A. and Burge, C. B. (2002). Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007-1013.

[3] Wang, Z., Rolish, M. E., Yeo, G., Tung, V., Mawson, M. and Burge, C. B. (2004). Systematic identification and analysis of exonic splicing silencers. Cell 119, 831-845.
