• Required running environment
- 1. Linux operating system;
- 2. Python 2.7 or higher;
- 3. R 2.7 or higher;
- 4. d2Tools;
- 5. Molecular Evolutionary Genetics Analysis (MEGA 5.0 or higher).
• Processing procedure
- Step 1: Download the source code of this pipeline to your workspace directory.
- Step 2: Count the k-tuples of length 1-10 for each sample by scanning its short reads with d2Tools, where $i is the tuple length. The command is as follows:
python ./TupleCount10bp.py -l samplelist.txt -k $i -t TuplecountFiles
1. -l: List file of sample names. Write the file names of the sequencing data to a text file (sample_list.txt) with the following format:
2. -t: Output directory for the k-tuple frequency files. The path can be absolute (like "/home/Meta/TestData/TupleCountFiles") or relative (like "./TupleCount_Files/").
3. -k: The length of the k-tuple (1-10 for short k-tuples).
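To make the counting step concrete, the following is a minimal sketch of what counting k-tuples over short reads means. It is not the code of TupleCount10bp.py or d2Tools: the function name count_k_tuples is hypothetical, and the choices to count a single strand only and to skip windows containing ambiguous bases are assumptions made for illustration.

```python
from collections import Counter


def count_k_tuples(reads, k):
    """Count every length-k substring (k-tuple) across a collection of reads.

    Assumptions of this sketch: only the given strand is counted, and any
    window containing a character other than A, C, G or T is skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k along the read.
        for start in range(len(read) - k + 1):
            word = read[start:start + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


# Two toy reads; the second one contains an ambiguous base (N).
reads = ["ACGTAC", "GTACNA"]
counts_k2 = count_k_tuples(reads, 2)
for word in sorted(counts_k2):
    print("%s\t%d" % (word, counts_k2[word]))
```

Calling the function once for each k from 1 to 10 mirrors running the command above with each value of $i.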
Note: The tuple-count files generated for k = 1 to 10 should be concatenated into a single file for each sample.
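One possible way to combine the ten per-k files of a sample is sketched below. The file naming scheme (<sample>_k<k>.txt), the merged file name, and the helper names per_k_paths and concatenate_files are all hypothetical; adapt them to the names the counting script actually writes. The demo at the end uses placeholder content in a temporary directory, not real pipeline output.

```python
import os
import tempfile


def per_k_paths(sample, directory, k_values=range(1, 11)):
    """Build per-k file paths. Hypothetical naming: <directory>/<sample>_k<k>.txt"""
    return [os.path.join(directory, "%s_k%d.txt" % (sample, k)) for k in k_values]


def concatenate_files(input_paths, output_path):
    """Write the contents of each input file, in order, into output_path."""
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as src:
                text = src.read()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")


# Demo with placeholder files in a temporary directory.
workdir = tempfile.mkdtemp()
paths = per_k_paths("sampleX", workdir)
for k, path in enumerate(paths, start=1):
    with open(path, "w") as handle:
        handle.write("placeholder line for k=%d" % k)  # no trailing newline on purpose

merged = os.path.join(workdir, "sampleX_merged.txt")  # hypothetical output name
concatenate_files(paths, merged)
with open(merged) as handle:
    merged_lines = handle.read().splitlines()
print(len(merged_lines))
```

Keeping the files in ascending order of k makes the merged file easier to inspect, although whether the later steps depend on that order is not stated here.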
Output file format: The format of the output k-tuple frequency files is as follows:
- Step 3: Compute the probability of the i-th k-tuple under the variable-length Markov model (VLMC). The command is as follows:
python ./VLMC327proliulincontext.py -i sampleXtuplecount.txt -t TuplecountFiles -K 120.0 -p MarkovProbability_Files
1. -i: File name of the generated tuple-count file for each sample.
2. -t: Path of the directory holding the k-tuple frequency files. The path can be absolute (like "/home/Meta/TestData/TupleCountFiles") or relative (like "./TupleCount_Files/").
3. -K: The threshold value for pruning.
4. -p: Output directory for the probability files of the VLMC model. The path can be absolute (like "/home/Meta/TestData/TupleCountFiles") or relative (like "./TupleCount_Files/").
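As a rough illustration of how tuple counts turn into tuple probabilities, the sketch below estimates the probability of a word under a fixed-order Markov chain. This is a deliberate simplification and not the VLMC script: a variable-length model chooses the context length per tuple and prunes contexts using the -K threshold, which is not reproduced here. The function names count_tuples and markov_probability are hypothetical.

```python
def count_tuples(sequence, max_k):
    """Count all substrings of length 1..max_k in a single sequence."""
    counts = {}
    for k in range(1, max_k + 1):
        for start in range(len(sequence) - k + 1):
            word = sequence[start:start + k]
            counts[word] = counts.get(word, 0) + 1
    return counts


def markov_probability(word, counts, order):
    """Probability of `word` under a fixed-order Markov chain.

    `counts` must hold counts for tuples of length 1..order+1.
    Transition probabilities are estimated as N(context + base) / N(context + any base).
    """
    if len(word) <= order:
        # Word is shorter than the context: fall back to its plain frequency.
        total = sum(c for w, c in counts.items() if len(w) == len(word))
        return counts.get(word, 0) / float(total) if total else 0.0
    if order == 0:
        prob = 1.0
    else:
        total = sum(c for w, c in counts.items() if len(w) == order)
        if total == 0:
            return 0.0
        prob = counts.get(word[:order], 0) / float(total)
    for i in range(order, len(word)):
        context = word[i - order:i]
        context_total = sum(counts.get(context + base, 0) for base in "ACGT")
        if context_total == 0:
            return 0.0
        prob *= counts.get(word[i - order:i + 1], 0) / float(context_total)
    return prob


counts = count_tuples("AACGTTACG", 2)
p_ac_order1 = markov_probability("AC", counts, 1)
p_ac_order0 = markov_probability("AC", counts, 0)
print(p_ac_order1, p_ac_order0)
```

With order 0 the probability is just the product of single-base frequencies; with order 1 each base is conditioned on the base before it.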
Output file format: The format of the output probability files is as follows:
- Step 4: Produce the dissimilarity matrix for the input sequencing datasets with d2Tools. The command is as follows:
python ./calculatedissimiliraty.py -l samplelist.txt -k $i -d d2 -m MarkovProbabilityFiles -o DissmilarityMatrixFiles/d2/output_d2
1. -l: List file of sample names. Write the file names of the sequencing data to a text file (sample_list.txt) with the following format:
2. -k: The length of the k-tuple (1-10 for short k-tuples).
3. -d: The dissimilarity measure to use: d2, Eu, Ma, Ch, d2S, or d2Star.
4. -m: The directory of the probability files produced by MarkovProbabilityZeroToThree.py.
5. -o: The directory in which to store the dissimilarity matrices computed between pairs of input datasets.
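The sketch below shows how four of these measures can be computed from two k-tuple count tables and assembled into a pairwise matrix. It follows the definitions commonly given in the alignment-free comparison literature and reads Eu, Ma and Ch as Euclidean, Manhattan and Chebyshev distances on relative frequencies; both points are assumptions, and the script may normalize differently. d2S and d2Star are left out because they also need the Markov probabilities from Step 3. The function names are hypothetical.

```python
import math


def dissimilarity(counts_x, counts_y, measure):
    """Dissimilarity between two k-tuple count dictionaries."""
    words = sorted(set(counts_x) | set(counts_y))
    x = [float(counts_x.get(w, 0)) for w in words]
    y = [float(counts_y.get(w, 0)) for w in words]
    if measure == "d2":
        # Half of (1 - cosine similarity) of the raw count vectors.
        dot = sum(a * b for a, b in zip(x, y))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
        return 0.5 * (1.0 - dot / norm)
    # The remaining measures compare relative frequencies.
    total_x, total_y = sum(x), sum(y)
    diffs = [abs(a / total_x - b / total_y) for a, b in zip(x, y)]
    if measure == "Eu":
        return math.sqrt(sum(d * d for d in diffs))
    if measure == "Ma":
        return sum(diffs)
    if measure == "Ch":
        return max(diffs)
    raise ValueError("unsupported measure in this sketch: %s" % measure)


def dissimilarity_matrix(samples, measure):
    """Return sample names and the symmetric matrix of pairwise dissimilarities."""
    names = sorted(samples)
    matrix = [[dissimilarity(samples[a], samples[b], measure) for b in names]
              for a in names]
    return names, matrix


# Toy count tables standing in for real samples.
samples = {
    "s1": {"AA": 2, "AC": 2},
    "s2": {"AA": 2, "AC": 2},
    "s3": {"CC": 4},
}
names, d2_matrix = dissimilarity_matrix(samples, "d2")
for name, row in zip(names, d2_matrix):
    print(name + "\t" + "\t".join("%.3f" % value for value in row))
```

Identical samples give a d2 value of approximately 0, and samples sharing no k-tuples give 0.5, the largest value this measure can take for non-negative counts.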
Output file format: The format of the output dissimilarity files is as follows: