ConDens: Kinase Substrate Predictor

A proteome can comprise tens of thousands of proteins. Rather than taking that many alignment files as separate inputs, the ConDens software reads a single alignment mapping file. This is a two-column, tab-delimited text file: the first column holds the gene/protein name, and the second holds the path to that protein's sequence alignment file.

Example of a Mapping File

Gene FilePath
CDH1 alignments/alignment15.aln
CDH6 alignments/alignment7.aln
ORC2 alignments/alignment1292.aln
ORC6 alignments/alignment854.aln
...
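To illustrate the format, here is a minimal Python sketch of reading a mapping file into a dictionary. It is not ConDens's own code: `read_mapping` is a hypothetical helper, and it assumes columns are separated by a single tab, that the first line is a header (as stated in the comments below), and that rows with an empty path are dropped because those proteins are not analyzed.

```python
import io
from typing import Dict, TextIO


def read_mapping(handle: TextIO) -> Dict[str, str]:
    """Hypothetical reader: map each gene name to its relative alignment path.

    Assumes a tab-delimited, two-column file whose first line is a header.
    """
    mapping: Dict[str, str] = {}
    next(handle, None)  # skip the header line (e.g. "Gene\tFilePath")
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        gene = fields[0].strip()
        path = fields[1].strip() if len(fields) > 1 else ""
        if gene and path:  # skip blank lines and proteins with no file path
            mapping[gene] = path
    return mapping


# "NOPATH" is a made-up entry with an empty second column.
example = (
    "Gene\tFilePath\n"
    "CDH1\talignments/alignment15.aln\n"
    "CDH6\talignments/alignment7.aln\n"
    "NOPATH\t\n"
)
print(read_mapping(io.StringIO(example)))
# {'CDH1': 'alignments/alignment15.aln', 'CDH6': 'alignments/alignment7.aln'}
```

The paths returned are still relative. They must be resolved against the mapping file's directory before the alignments can be opened.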

File paths must be relative to the directory that contains the mapping file. For example, if the mapping file is located at /usr/john/msa_mapping.txt and CDH1's alignment is located at /usr/john/msa/alignment15.aln, the file path should be msa/alignment15.aln (see Figure 1).

**Figure 1**: Structure of an alignment mapping file. In this example, the mapping file is located in the folder /usr/john/. CDH1's alignment is located at /usr/john/msa/alignment15.aln, so its file path must be written as msa/alignment15.aln, which is the path of alignment15.aln relative to /usr/john/, the folder that contains the mapping file.
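The relative-path rule can be checked with Python's `posixpath` module. The sketch below is an illustration, not ConDens's implementation: `mapping_entry_for` (which path to write into the mapping file) and `resolve_alignment_path` (where that entry points) are hypothetical helpers, and Unix-style paths are assumed, as in the example.

```python
import posixpath


def mapping_entry_for(mapping_file: str, alignment_file: str) -> str:
    """Hypothetical helper: the FilePath value to write for an alignment file."""
    base_dir = posixpath.dirname(mapping_file)  # folder containing the mapping file
    return posixpath.relpath(alignment_file, start=base_dir)


def resolve_alignment_path(mapping_file: str, relative_path: str) -> str:
    """Hypothetical helper: the full path a FilePath value refers to."""
    base_dir = posixpath.dirname(mapping_file)
    # Note: posixpath.join discards base_dir if relative_path is absolute,
    # which is one reason the entries are required to be relative.
    return posixpath.normpath(posixpath.join(base_dir, relative_path))


print(mapping_entry_for("/usr/john/msa_mapping.txt", "/usr/john/msa/alignment15.aln"))
# msa/alignment15.aln
print(resolve_alignment_path("/usr/john/msa_mapping.txt", "msa/alignment15.aln"))
# /usr/john/msa/alignment15.aln
```

Because the entries are relative, the mapping file and its alignment folder can be moved together without editing any paths.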

Comments:

- Proteins that are not assigned a file path will not be analyzed by the software.
- The first line of the mapping file is treated as a header and is not read, so it should not contain a protein entry.
- The MSA files must be in FASTA format.
- Optionally, the species of each sequence can be indicated in the FASTA file by appending a pipe character '|' after the sequence name, followed by the species name, e.g. ">CDH1|S. cerevisiae".
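The optional species annotation can be separated from the sequence name as sketched below. `parse_msa_fasta` is a hypothetical stdlib-only helper, not ConDens's parser; it assumes the species (if any) follows the first '|' in a FASTA header, and that aligned sequences may wrap over several lines. The sequences in the example are toy strings, not real data.

```python
from typing import List, Optional, Tuple


def parse_msa_fasta(text: str) -> List[Tuple[str, Optional[str], str]]:
    """Hypothetical parser returning (name, species or None, aligned sequence)."""
    entries: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            entries.append((line[1:], []))  # start a new record
        elif entries:
            entries[-1][1].append(line)  # sequence lines may wrap
    records: List[Tuple[str, Optional[str], str]] = []
    for header, chunks in entries:
        # Split on the first '|': "CDH1|S. cerevisiae" -> ("CDH1", "S. cerevisiae")
        name, sep, species = header.partition("|")
        records.append((name.strip(), species.strip() if sep else None, "".join(chunks)))
    return records


toy_msa = ">CDH1|S. cerevisiae\nMSK-LT\nQR\n>seq2\nMS--LT\nQK\n"
print(parse_msa_fasta(toy_msa))
# [('CDH1', 'S. cerevisiae', 'MSK-LTQR'), ('seq2', None, 'MS--LTQK')]
```

Headers without a '|' simply get `None` as their species, matching the optional nature of the annotation.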