Xmaxentscan for Short Sequence Motifs

Burge Lab

Xmaxentscan is a general way of building distributions over short sequence motifs that takes into account non-neighboring dependencies.


How to use Xmaxentscan:

All sequences must be the same length. Supply the input sequences as a FASTA file with each sequence on a single line (no line breaks within a sequence). Any letter other than A, T, C, or G will crash the program!

Example test file. By convention, this program uses "5" as the positive label and "0" as the negative label.
> 5
aagattg
> 0
cagaata
> 0
aagaaaa
...
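
The input rules above (equal lengths, only A, T, C, G, labels on ">" lines) can be checked before a file is submitted. The following is a minimal sketch, not part of Xmaxentscan itself; the function name `read_labeled_sequences` is hypothetical, and the check is case-insensitive on the assumption that lowercase input is accepted, as the example file suggests.

```python
def read_labeled_sequences(lines):
    """Parse '> label' / sequence pairs and validate them (hypothetical helper)."""
    records = []
    label = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after '>' is the label ("5" or "0").
            label = line[1:].strip()
            continue
        if label is None:
            raise ValueError("sequence line found before any '>' header")
        # Assumption: case does not matter, so compare in upper case.
        bad = set(line.upper()) - set("ACGT")
        if bad:
            raise ValueError(f"invalid letters {sorted(bad)} in {line!r}")
        records.append((label, line))
        label = None
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return records


example = ["> 5", "aagattg", "> 0", "cagaata", "> 0", "aagaaaa"]
records = read_labeled_sequences(example)
```

Running the check on the three example records returns them unchanged; a sequence containing, say, an N or a sequence of a different length raises a `ValueError` instead.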


Maximum Entropy Distribution for Short Sequence Motifs

Parameters

Input distribution parameters below:

sequence length (required)       The program currently does not handle lengths greater than 8. These are short motifs! Remember that all sequences (test and training) have to be the same length.
marginal order (default=1)       Refers to dependencies. Can go up to length-2.
marginal skip (default=0)       Applies only when marginal order is 2, i.e. to pairwise dependencies.
maximum skip (default=0)       Applies only when marginal order is 2.
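
To illustrate what a pairwise (order 2) marginal with a skip could look like, here is a minimal sketch. It rests on an assumption that this page does not confirm: that a skip of k means the two positions have k positions between them, so skip 0 pairs adjacent positions. The function name `pairwise_marginals` is hypothetical and this is not the program's own code.

```python
from collections import Counter


def pairwise_marginals(seqs, skip):
    """Empirical frequencies of letter pairs at positions (i, i + skip + 1).

    Assumption: 'skip' counts the positions lying between the two
    members of a pair, so skip=0 means adjacent positions.
    """
    length = len(seqs[0])
    gap = skip + 1
    marginals = {}
    for i in range(length - gap):
        j = i + gap
        counts = Counter((s[i], s[j]) for s in seqs)
        total = sum(counts.values())
        marginals[(i, j)] = {pair: n / total for pair, n in counts.items()}
    return marginals


seqs = ["AAGATTG", "CAGAATA", "AAGAAAA"]
adjacent = pairwise_marginals(seqs, skip=0)   # pairs (0,1), (1,2), ...
one_apart = pairwise_marginals(seqs, skip=1)  # pairs (0,2), (1,3), ...
```

Under this reading, a length-7 motif has six position pairs at skip 0 and five at skip 1.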

Distributions

Do you want the program to return Distributions? (Check if YES; default is NO)
Distributions are listed in lexicographic order, e.g. AAAAA, AAAAC, AAAAG ... TTTTT
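
To match a returned probability to its motif, the lexicographic ordering can be regenerated locally. A minimal sketch, assuming the alphabet order A, C, G, T implied by the example above; the function name `lexicographic_motifs` is hypothetical.

```python
from itertools import product


def lexicographic_motifs(length, alphabet="ACGT"):
    """All motifs of the given length in lexicographic order."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


motifs = lexicographic_motifs(5)
# Map each motif to its row position in the returned distribution.
index = {motif: i for i, motif in enumerate(motifs)}
```

For length 5 this yields 4^5 = 1024 motifs, starting with AAAAA, AAAAC, AAAAG and ending with TTTTT.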

Test Performance

Do you want to test the performance on test sequences? (Check if YES; default is NO)
Returns the following columns: threshold, true positive rate, false positive rate, specificity, approximate correlation, and correlation coefficient.
Enter TEST SEQUENCES filename:

Enter the RESULTS filename:
Enter the SCORE filename to output scored test sequences:
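
The true positive and false positive rates in the results can be reproduced from a scored test set. The following is a minimal sketch under two assumptions: a sequence is called positive when its score is at or above the threshold, and the scores shown are made-up illustration values, not program output. The remaining columns are left out because their exact definitions are not given here. The function name `rates_at_thresholds` is hypothetical.

```python
def rates_at_thresholds(scored, thresholds, positive_label="5"):
    """Return (threshold, true positive rate, false positive rate) rows.

    `scored` is a list of (label, score) pairs; labels follow the
    "5" = positive, "0" = negative convention.
    """
    positives = [score for label, score in scored if label == positive_label]
    negatives = [score for label, score in scored if label != positive_label]
    rows = []
    for t in thresholds:
        # Assumption: score >= threshold counts as a positive prediction.
        tp = sum(score >= t for score in positives)
        fp = sum(score >= t for score in negatives)
        tpr = tp / len(positives) if positives else 0.0
        fpr = fp / len(negatives) if negatives else 0.0
        rows.append((t, tpr, fpr))
    return rows


# Hypothetical scores for illustration only.
scored = [("5", 2.0), ("5", 0.5), ("0", 1.0), ("0", -1.0)]
rows = rates_at_thresholds(scored, thresholds=[0.0, 1.5])
```

Raising the threshold trades sensitivity for fewer false positives: in this toy set, moving from 0.0 to 1.5 removes the one false positive but also loses one of the two true positives.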


This program was developed by Gene Yeo (geneyeo@mit.edu) and Christopher Burge (cburge@mit.edu).