ConDens: Kinase Substrate Predictor

The ConDens Predictor (or simply "ConDens") is a tool that implements the ConDens algorithm (see paper), which predicts functional conservation of short linear motifs. The program can be run with a graphical user interface using the command java -jar ConDens.jar (or by double-clicking ConDens.jar in Windows). It can also be run in command-line mode by typing java -jar ConDens.jar [insert input file] in a shell. See the instructions under Running in Command-Line below.

The program accepts five types of input:

Proteins
Multiple sequence alignments
Motifs
Output file path
Optional: A validation set that provides annotations on specific coordinates of the protein input

Protein input

**Figure 1**: Schematic of the ConDens program's user interface.

Example of a Custom Protein Input File

Gene
CDH1
CDH6
ORC2
ORC6
...

Multiple Sequence Alignment Input

Structure of Alignment Input

Motifs


Regular Expressions in ConDens
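
The consensus patterns that appear in the command-line example later in this document (such as (?<r>[ST])P for Cdk) are regular expressions with a named group r. As a rough illustration of how such a pattern locates sites in a sequence, here is a minimal Python sketch. It rests on assumptions that this document does not confirm: that X stands for any residue, that the group r marks the modified residue, and that positions are reported 1-based. The function names and the test sequence are hypothetical and not part of ConDens.

```python
import re


def to_python_regex(condens_regex):
    """Translate a ConDens-style consensus into a compiled Python pattern.

    Assumptions (not confirmed by the ConDens documentation):
      * "(?<r>...)" is a Java-style named group; Python spells it "(?P<r>...)".
      * "X" means "any residue" and never appears inside a character class.
    """
    pattern = condens_regex.replace("(?<r>", "(?P<r>")
    pattern = pattern.replace("X", ".")
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    return re.compile("(?=" + pattern + ")")


def modified_residue_positions(condens_regex, sequence):
    """Return 1-based positions of the residue captured by group r."""
    compiled = to_python_regex(condens_regex)
    return [match.start("r") + 1 for match in compiled.finditer(sequence)]


# Hypothetical sequence, for illustration only.
sequence = "MSPRRASTPQ"
print(modified_residue_positions("(?<r>[ST])P", sequence))     # Cdk  -> [2, 8]
print(modified_residue_positions("R[RK]X(?<r>S)", sequence))   # PKA  -> [7]
print(modified_residue_positions("(?<r>[ST])Q", sequence))     # Mec1 -> []
```

The translation step is needed only because Python's named-group syntax differs from Java's. ConDens itself is a Java program and reads the (?<r>...) form directly.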

Output file path

Structure of Data Output

Validation Data Input

Example of a Validation File

Gene  Position  Label
CDH1  335  positive
CDC6  56  positive
CDC6  75  unknown
CDC6  189  negative
...

If any one coordinate in the validation file has a label of "positive", the protein is given a label of "positive".
Otherwise, if all motifs on the protein are mapped to coordinates with a "negative" label, the protein is given a label of "negative".
For all other cases, the protein is given a label of "unknown".
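
The three labelling rules above can be sketched in Python as follows. This is an illustration of the rules as worded here, not the code ConDens runs. It assumes the validation file is whitespace-separated with a single header row, and it treats a protein with no motifs as "unknown". The function names and the motif positions passed in are hypothetical.

```python
from collections import defaultdict

# Rows copied from the example validation file above.
VALIDATION_TEXT = """\
Gene  Position  Label
CDH1  335  positive
CDC6  56  positive
CDC6  75  unknown
CDC6  189  negative
"""


def parse_validation(text):
    """Return {gene: {position: label}} from whitespace-separated rows."""
    annotations = defaultdict(dict)
    rows = [line for line in text.splitlines() if line.strip()]
    for row in rows[1:]:  # skip the header row
        gene, position, label = row.split()
        annotations[gene][int(position)] = label.lower()
    return dict(annotations)


def protein_label(site_labels, motif_positions):
    """Apply the three rules to one protein.

    site_labels:     {position: label} taken from the validation file
    motif_positions: positions of the motifs found on the protein
    """
    # Rule 1: any "positive" coordinate makes the protein positive.
    if "positive" in site_labels.values():
        return "positive"
    # Rule 2: every motif maps to a "negative" coordinate.
    # Assumption: a protein with no motifs falls through to "unknown".
    if motif_positions and all(
        site_labels.get(position) == "negative" for position in motif_positions
    ):
        return "negative"
    # Rule 3: everything else.
    return "unknown"


annotations = parse_validation(VALIDATION_TEXT)
print(protein_label(annotations["CDC6"], [56, 75, 189]))  # positive
```

Note that a motif at a coordinate missing from the validation file prevents a "negative" label under this reading, because that motif is not mapped to a "negative" coordinate.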

Running in Command-Line

java -jar ConDens.jar [insert input file]

Example of a Command-Line Input XML File


<settings>
  <proteins option="0" path="" />
  <msa option="0" path="" />
  <validation path="" />
  <output path="output" />
  <consensuses>
    <!-- "<" inside an attribute value must be written as &lt; for the file to be well-formed XML -->
    <consensus name="Cdk" regex="(?&lt;r>[ST])P" />
    <consensus name="Mec1" regex="(?&lt;r>[ST])Q" />
    <consensus name="Prk1" regex="[LIVM]XXXX(?&lt;r>T)G" />
    <consensus name="Ipl1" regex="[RK]X(?&lt;r>[ST])[LIV]" />
    <consensus name="PKA" regex="R[RK]X(?&lt;r>S)" />
    <consensus name="CKII" regex="(?&lt;r>[ST])[DE]X[DE]" />
    <consensus name="Ime2" regex="RPX(?&lt;r>[ST])" />
  </consensuses>
</settings>
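
A settings file with this layout can also be generated by a script, which avoids hand-escaping the consensus patterns. The Python sketch below mirrors the element and attribute names of the example. The helper name build_settings is hypothetical, the value option="0" is simply copied from the example (this document does not explain it), and the sketch assumes ConDens reads the file with a standard XML parser.

```python
import xml.etree.ElementTree as ET


def build_settings(consensuses, proteins_path="", msa_path="",
                   validation_path="", output_path="output"):
    """Return settings XML as a string.

    consensuses maps a motif name to its regular expression.
    Element and attribute names are copied from the example file;
    option="0" is kept as-is because its meaning is not documented here.
    """
    settings = ET.Element("settings")
    ET.SubElement(settings, "proteins", option="0", path=proteins_path)
    ET.SubElement(settings, "msa", option="0", path=msa_path)
    ET.SubElement(settings, "validation", path=validation_path)
    ET.SubElement(settings, "output", path=output_path)
    group = ET.SubElement(settings, "consensuses")
    for name, regex in consensuses.items():
        ET.SubElement(group, "consensus", name=name, regex=regex)
    # The serializer escapes "<" and ">" inside attribute values.
    return ET.tostring(settings, encoding="unicode")


xml_text = build_settings({"Cdk": "(?<r>[ST])P", "Mec1": "(?<r>[ST])Q"})

# Round trip: a parser restores the original pattern from the escaped form.
first = ET.fromstring(xml_text).find("consensuses/consensus")
print(first.get("name"), first.get("regex"))  # Cdk (?<r>[ST])P
```

The serializer writes the "<" in each pattern as &lt;, and any XML parser converts it back when the file is read, so the pattern ConDens receives is unchanged.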

Comments:

Make sure the protein input matches the multiple sequence alignment input. If the former uses UniProt names and the latter uses Ensembl names, the program will skip every protein in the list and output nothing. Likewise, a protein listed in the protein input will not be scanned if it is not mapped to an input alignment.
Users can create their own preset proteomes and alignments, and can even set their own defaults so that all file paths and motifs are loaded automatically when the program starts. Instructions can be found in Modifying Default Program Settings.
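
Because a naming mismatch between the protein input and the alignment input produces no output, it can be worth checking the names before a run. The Python sketch below is one way to do this outside ConDens. It assumes the protein file has one name per line under a "Gene" header, as in the example earlier, and that the alignment names have already been collected into a list, since this document does not describe how alignments are stored. Both function names and the alignment names are hypothetical.

```python
def read_protein_list(text):
    """Return protein names from a one-name-per-line file with a 'Gene' header."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    if names and names[0] == "Gene":
        names = names[1:]  # drop the header row
    return names


def unmatched_proteins(protein_names, alignment_names):
    """Return the proteins that have no alignment under the same name."""
    aligned = set(alignment_names)
    return [name for name in protein_names if name not in aligned]


proteins = read_protein_list("Gene\nCDH1\nCDH6\nORC2\nORC6\n")
alignments = ["CDH1", "ORC2", "ORC6"]  # hypothetical alignment names

missing = unmatched_proteins(proteins, alignments)
if len(missing) == len(proteins):
    print("No protein matches an alignment; check the naming scheme.")
elif missing:
    print("Proteins without an alignment:", ", ".join(missing))
```

If every protein is reported as unmatched, the two inputs most likely use different naming schemes, which is the situation described in the first comment above.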