Proteomes tend to involve tens of thousands of genes. Rather than overloading the program with this many input files, the ConDens software reads an alignment mapping file instead. This is a two-column, tab-delimited file in which the first column is the gene/protein name and the second column is the path to the sequence alignment file for that protein.
Example of a Mapping File
File paths must be relative to the directory that contains the mapping file. In other words, if the mapping file is located at /usr/john/msa_mapping.txt and CDH1's alignment is located at /usr/john/msa/alignment15.aln, then the file path should be msa/alignment15.aln (see Figure 1).
Gene	FilePath
CDH1	alignments/alignment15.aln
CDH6	alignments/alignment7.aln
ORC2	alignments/alignment1292.aln
ORC6	alignments/alignment854.aln
...
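To make the format concrete, the following Python sketch reads a mapping file and resolves each relative path against the mapping file's own directory. It is an illustration only, not ConDens's actual parser: the function name `read_mapping` is hypothetical, and skipping rows without a path is an assumption about how unassigned proteins might be handled.

```python
import csv
from pathlib import Path


def read_mapping(mapping_path):
    """Return {gene: alignment path} from a 2-column tab-delimited mapping file."""
    mapping_path = Path(mapping_path)
    # Relative paths are interpreted with respect to the mapping file's directory.
    base_dir = mapping_path.parent
    alignments = {}
    with mapping_path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # the first line is treated as a header and skipped
        for row in reader:
            # Assumption: rows with a missing gene name or file path are ignored.
            if len(row) < 2 or not row[0].strip() or not row[1].strip():
                continue
            alignments[row[0].strip()] = base_dir / row[1].strip()
    return alignments
```

For a mapping file at /usr/john/msa_mapping.txt containing the row `CDH1	msa/alignment15.aln`, `read_mapping` would map CDH1 to /usr/john/msa/alignment15.aln.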
- Proteins that are not assigned a file path will not be analyzed by the software.
- The first line of the input file is assumed to be a header and, as a result, will not be read.
- The MSA files must be in FASTA format.
- Optionally, the user can indicate the species of each sequence in the FASTA file by appending a pipe character '|' to the sequence name, followed by the species name.
e.g. "CDH1|S. cerevisiae"
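The optional species annotation can be illustrated with a small Python sketch that splits a FASTA header into a sequence name and a species. The function name `split_header` is hypothetical, and splitting at the first pipe character is an assumption; the exact parsing rules ConDens applies are not stated here.

```python
def split_header(header):
    """Split a FASTA header into (sequence name, species or None)."""
    # Drop the leading '>' that marks a FASTA header line, if present.
    text = header.strip().lstrip(">").strip()
    # Assumption: everything after the first '|' is the species name.
    name, sep, species = text.partition("|")
    species = species.strip()
    return name.strip(), (species if sep and species else None)
```

Under these assumptions, a header of `>CDH1|S. cerevisiae` yields the name CDH1 and the species S. cerevisiae, while a header without a pipe yields no species.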