Overview
This tool models genetic diversity by summarizing a large input dataset
into an epitome, a short sequence capturing many overlapping
subsequences from the dataset.
For example, when the tool is applied to modeling the diversity of HIV,
the epitome yields relatively small vaccine immunogens that cover a
large number of immune system targets known as epitopes. Our experiments
have shown that the epitome includes more epitopes than other vaccine
designs of similar length, including cocktails of consensus strains,
phylogenetic tree centers, and observed strains.
The tool optimizes greedily: it iteratively lengthens the epitome by
appending a patch from the data (possibly with overlap) that maximally
increases the ratio of the summed weights of the included patches to the
length of the epitome. The process can be stopped once the desired
length is reached, rather than only when the entire set of patches is
included, as in the superstring problem.
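The greedy step can be illustrated with a minimal Python sketch. It is not the tool's implementation. It makes three simplifying assumptions: it grows a single string (the tool can also emit several comma-separated components), it tries each uncovered patch at either end with the largest possible overlap, and it scores a candidate by the covered weight gained per character added, which is one reading of the criterion described above. The names `overlap`, `covered_weight`, and `greedy_epitome` are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def covered_weight(epitome, weights):
    """Total weight of the patches that occur as substrings of epitome."""
    return sum(w for patch, w in weights.items() if patch in epitome)


def greedy_epitome(weights, max_length):
    """Grow one string greedily until no candidate fits within max_length."""
    epitome = ""
    while True:
        base = covered_weight(epitome, weights)
        best_score, best_candidate = 0.0, None
        for patch in weights:
            if patch in epitome:
                continue  # already covered, nothing to gain
            # Attach the patch on the right or on the left, reusing any overlap.
            right = epitome + patch[overlap(epitome, patch):]
            left = patch[:len(patch) - overlap(patch, epitome)] + epitome
            for candidate in (right, left):
                if len(candidate) > max_length:
                    continue
                added = len(candidate) - len(epitome)
                # Assumed score: covered weight gained per character added.
                score = (covered_weight(candidate, weights) - base) / added
                if score > best_score:
                    best_score, best_candidate = score, candidate
        if best_candidate is None:
            return epitome  # nothing fits or nothing improves coverage
        epitome = best_candidate
```

Called with the eight sample patches from the Input Format section and `max_length=21`, this sketch returns `EGATPQDLNTMLNKIVRMYSP`. Its intermediate steps need not match the tool's.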
Input Format
The tool accepts input in two formats. First, a text table:
Patch      Weight
NKIVRMYSP  167
LNKIVRMYS  167
PQDLNTMLN  166
QDLNTMLNT  166
GATPQDLNT  166
EGATPQDLN  166
ATPQDLNTM  165
TPQDLNTML  165
Separate the columns with a space, tab, or comma. The header row is
required. The first column contains patches for possible inclusion in
the epitome. The second column gives their relative weights.
This format is easily created in a spreadsheet program such as Excel.
Transfer data to the tool either by copying (Ctrl-C) and pasting
(Ctrl-V) or by saving the spreadsheet in text format and using the
tool's "Upload File" button.
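As an illustration of the table format, the following Python sketch reads such a table into a dictionary of weights. The function name `parse_patch_table` is hypothetical. Matching the header case-insensitively and summing the weights of repeated patches are assumptions, not documented behavior of the tool.

```python
import re


def parse_patch_table(text):
    """Parse a Patch/Weight table whose columns are split by spaces, tabs, or commas."""
    rows = [re.split(r"[,\s]+", line.strip())
            for line in text.splitlines() if line.strip()]
    # The header row is required; case-insensitive matching is an assumption.
    if not rows or [h.lower() for h in rows[0]] != ["patch", "weight"]:
        raise ValueError("expected a header row: Patch Weight")
    weights = {}
    for row in rows[1:]:
        if len(row) != 2:
            raise ValueError(f"expected two columns, got {row!r}")
        patch, weight = row
        # Assumption: a patch listed more than once has its weights summed.
        weights[patch] = weights.get(patch, 0.0) + float(weight)
    return weights
```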
The tool also accepts a second format: free text without weights. For
example,
Twinkle, twinkle, little star;
How I wonder what you are.
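How the tool turns free text into patches is not spelled out here. The sample output in the Output Format section shows upper-case words and coverage values that are multiples of 0.1 for this ten-word rhyme. That is consistent with each word occurrence counting as one equally weighted patch. The Python sketch below encodes that inference as an assumption; the function name `free_text_to_patches` is hypothetical.

```python
import re
from collections import Counter


def free_text_to_patches(text):
    """Upper-case the text and count each word; the count serves as the weight."""
    # Assumption: a patch is a maximal run of letters, with punctuation ignored.
    return Counter(re.findall(r"[A-Z]+", text.upper()))
```

For the rhyme above this yields nine distinct patches with a total weight of 10, because `TWINKLE` occurs twice.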
Output Format
When "Show Only Last" is unchecked, the tool shows the sequence of
epitomes created. This output is tab-delimited and suitable for copying
(Ctrl-A, Ctrl-C) and pasting (Ctrl-V) into a spreadsheet such as Excel.
Method  AminoAcidLength  numComponents  coverage  Vaccine
Greedy  9                1              0.125753  LNKIVRMYS
Greedy  10               1              0.251506  LNKIVRMYSP
Greedy  17               1              0.376506  PQDLNTMLNKIVRMYSP
Greedy  18               1              0.500753  TPQDLNTMLNKIVRMYSP
Greedy  19               1              0.625     ATPQDLNTMLNKIVRMYSP
Greedy  20               1              0.75      GATPQDLNTMLNKIVRMYSP
Greedy  21               1              0.875     EGATPQDLNTMLNKIVRMYSP
Greedy  30               2              1         EGATPQDLNTMLNKIVRMYSP,QDLNTMLNT
For the free-text example, the output is:
Method  AminoAcidLength  numComponents  coverage  Vaccine
Greedy  7                1              0.3       TWINKLE
Greedy  7                1              0.3       TWINKLE
Greedy  10               2              0.4       TWINKLE,HOW
Greedy  13               2              0.5       WHATWINKLE,HOW
Greedy  12               1              0.5       HOWHATWINKLE
Greedy  15               2              0.6       HOWHATWINKLE,YOU
Greedy  18               3              0.7       HOWHATWINKLE,YOU,ARE
Greedy  20               3              0.8       HOWHATWINKLE,YOU,STARE
Greedy  26               4              0.9       HOWHATWINKLE,YOU,STARE,LITTLE
Greedy  32               5              1         HOWHATWINKLE,YOU,STARE,LITTLE,WONDER
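The coverage column can be reproduced with a short Python sketch. It assumes that coverage is the weight of all patches found as a substring of at least one component, divided by the total weight. This definition is inferred from the sample values, and the function name `coverage` is hypothetical.

```python
def coverage(vaccine, weights):
    """Fraction of total patch weight found in at least one component."""
    components = vaccine.split(",")  # the Vaccine column is comma-separated
    covered = sum(w for patch, w in weights.items()
                  if any(patch in component for component in components))
    return covered / sum(weights.values())
```

With the eight sample patches (total weight 1328), `coverage("LNKIVRMYS", weights)` is 167/1328, about 0.125753, as in the first row. Under the same assumption, AminoAcidLength is the summed length of the components: 21 + 9 = 30 in the last row of the first table.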