ConDens: Kinase Substrate Predictor

Regular expressions are used to define the residue specificity of a motif. For technical reasons, however, the ConDens Predictor accepts only a restricted subset of regular expression syntax. Expressions should include only:

Alpha-numeric characters to denote amino acids. "X" is used to denote any amino acid residue
[] brackets to denote character classes
^ to denote negation of a character class
() brackets to encapsulate groups
{x,y} to denote the number of occurrences of a group or a character, where 0 <= x < y
| to denote "or" conditionals
A special ?<r> to label the capturing group of interest (i.e. the post-translationally modified residue)

Example Motifs

Motif	Regex	Description
Cdk	(?<r>[ST])P	Phosphorylation motif of cyclin-dependent kinases, where an S or T is phosphorylated and the phosphorylated residue is followed by a P.
KEN box	KEN	Degradation motif that is a K-E-N tripeptide
IAP-binding	[^M]{0,1}AX[AP]X	A binding motif that has 0 or 1 non-M residues followed by an A, then any amino acid residue, then A or P, and then any other amino acid residue
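
To try the example motifs outside the ConDens Predictor, they can be translated into the syntax of a general-purpose regex engine. The sketch below uses Python's re module and rests on two assumptions that are not part of ConDens itself: "X" is mapped to "." (any character), and the ?<r> label is rewritten to Python's (?P<r>...) named-group form. The helper names and the test sequence are made up for illustration.

```python
import re

def to_python_regex(motif):
    """Translate a ConDens-style motif into Python regex syntax (illustrative helper)."""
    # Python spells named groups (?P<name>...), and "." stands in for "any residue".
    return motif.replace("(?<r>", "(?P<r>").replace("X", ".")

def find_matches(motif, sequence):
    """Return (0-based position, matched text) pairs for a motif in a sequence."""
    pattern = re.compile(to_python_regex(motif))
    hits = []
    for m in pattern.finditer(sequence):
        if "r" in pattern.groupindex:
            # Report only the labelled residue of interest.
            hits.append((m.start("r"), m.group("r")))
        else:
            hits.append((m.start(), m.group()))
    return hits

sequence = "MKENLSPAAT"  # hypothetical protein fragment
print(find_matches("(?<r>[ST])P", sequence))  # [(5, 'S')]
print(find_matches("KEN", sequence))          # [(1, 'KEN')]
```

Positions are 0-based here; add 1 to obtain conventional residue numbering.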

**Figure 1**: Example of a regular expression that should not be used because it is decomposed into flattened expressions that overlap with each other.

Beyond these restrictions, there is an additional constraint to account for when using the ConDens Predictor. Whenever {} or | is used in a regular expression, the program decomposes the regex into a set of flattened regular expressions for reasons of computational efficiency. For example, A{1,2}XXE is decomposed into:

AXXE
AAXXE
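
The decomposition can be approximated with a few lines of Python. This is only a minimal sketch of the idea, not the ConDens implementation: the flatten function is hypothetical, it handles {x,y} on a single residue or character class plus top-level |, and it silently ignores () groups.

```python
import re
from itertools import product

# One residue or character class, optionally followed by {x,y}.
TOKEN = re.compile(r"(\[\^?[A-Z]+\]|[A-Z])(?:\{(\d+),(\d+)\})?")

def flatten(motif):
    """Expand {x,y} and top-level | into a list of flat expressions (sketch only)."""
    flat = []
    for branch in motif.split("|"):
        choices = []
        for unit, lo, hi in TOKEN.findall(branch):
            if lo:
                # A quantified unit contributes one alternative per repeat count.
                choices.append([unit * n for n in range(int(lo), int(hi) + 1)])
            else:
                choices.append([unit])
        # Every combination of alternatives is one flattened expression.
        flat.extend("".join(parts) for parts in product(*choices))
    return flat

print(flatten("A{1,2}XXE"))  # ['AXXE', 'AAXXE']
```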

The tricky part is that the flattened expressions must not intersect with each other, otherwise the ConDens model's calculations will not be correct (the limitation is imposed for computational reasons). In this example, AXXE and AAXXE intersect: the last four residues of any sequence matching AAXXE also match AXXE, so the latter is a subset of the former, which will cause the program to complain.

**Figure 2**: Example of a well-formed regular expression that can be used because it is decomposed into flattened expressions that do not overlap with each other.

To use A{1,2}XXE properly, it must be rephrased as AXXE|A[^A]X[^E]E. Because spotting intersections between decomposed expressions can be tedious, the ConDens Predictor has a built-in tool that checks for these issues, accessed through the "Analyze" button. However, it only detects problems and does not offer solutions.
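
For a rough check before pressing "Analyze", intersections between flattened expressions can be approximated in Python. The sketch below assumes that two flat expressions intersect when the shorter one, slid fully inside the longer one, is compatible with it at every position. This definition, the 20-letter alphabet, and the function names are assumptions for illustration; the check performed by the "Analyze" tool may differ and remains the authoritative one.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues (assumption)

def positions(flat):
    """Turn a flat motif such as 'A[^A]X[^E]E' into one residue set per position."""
    sets = []
    for token in re.findall(r"\[\^?[A-Z]+\]|[A-Z]", flat):
        if token == "X":
            sets.append(set(AMINO_ACIDS))             # any residue
        elif token.startswith("[^"):
            sets.append(AMINO_ACIDS - set(token[2:-1]))  # negated class
        elif token.startswith("["):
            sets.append(set(token[1:-1]) & AMINO_ACIDS)  # plain class
        else:
            sets.append({token})                      # single residue
    return sets

def overlaps(a, b):
    """True if the shorter motif fits inside the longer one at some offset."""
    short, long_ = sorted((positions(a), positions(b)), key=len)
    for offset in range(len(long_) - len(short) + 1):
        # Compatible if every aligned position shares at least one residue.
        if all(s & long_[offset + i] for i, s in enumerate(short)):
            return True
    return False

print(overlaps("AXXE", "AAXXE"))        # True
print(overlaps("AXXE", "A[^A]X[^E]E"))  # False
```

Under this definition, the rephrased expression passes because [^E] blocks the alignment at offset 0 and [^A] blocks it at offset 1.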