Tutorial¶
Installation¶
Like many other Python packages and bioinformatics softwares, MAmotif can be obtained easily from PyPI or Bioconda.
Prerequisites¶
Python >= 3.6
MAnorm >= 1.3.0
motifscan >= 1.2.0
numpy >= 1.15
scipy >= 1.0
Install with pip¶
The latest release of MAmotif is available at PyPI, you can install via pip
:
$ pip install mamotif
Install with conda¶
You can also install MAmotif with conda through Bioconda channel:
$ conda install -c bioconda mamotif
Usage of MAmotif¶
To check whether MAmotif is properly installed, you can inspect the version of
MAmotif by the -v/--version
option:
$ mamotif --version
Configuration¶
Before running MAmotif, you need to configure the genome and motif data files for MotifScan:
Please refer to the QuickStart section of MotifScan for the details.
Run complete MAmotif workflow¶
MAmotif provide a console script mamotif
for running the program, the
mamotif run
sub-command is used to run complete MAmotif workflow
(MAnorm + MotifScan + Integration).
$ mamotif run --p1 sampleA_peaks.bed --p2 sampleB_peaks.bed --r1 sampleA_reads.bed --r2 sampleB_reads.bed -g <genome>
–m <motif_set> -o <output_dir>
Tip
The run
sub-command only provides basic MAnorm/MotifScan options.
If you want to control other advanced options (MAnorm normalization
options or MotifScan scanning options), please run them independently and
call MAmotif integration module with the mamotif integrate
sub-command.
Options¶
- -h, --help
Show help message and exit.
- --verbose
Enable verbose log output.
- --p1, --peak1
[Required] Peak file of sample A.
- --p2, --peak2
[Required] Peak file of sample B.
- --pf, --peak-format
Format of the peak files. Default: bed
- --r1, --read1
[Required] Read file of sample A.
- --r2, --read2
[Required] Read file of sample B.
- --rf, --read-format
Format of the read files. Default: bed
- --n1, --name1
Name of sample A.
- --n2, --name2
Name of sample B.
- --s1, --shiftsize1
Single-end reads shiftsize of sample A. Default: 100
- --s2, --shiftsize2
Single-end reads shiftsize of sample B. Default: 100
- --pe, --paired-end
Paired-end mode.
- -m
[Required] Motif set to scan for.
- -g
[Required] Genome name.
- -p
P value cutoff for motif scores. Default: 1e-4
- -t, --threads
Number of processes used to run in parallel.
- --mode
Which sample to perform MAmotif on {both,A,B}. Default: both
- --split
Split genomic regions into promoter/distal regions and run separately.
- --upstream
TSS upstream distance for promoters. Default: 4000
- --downstream
TSS downstream distance for promoters. Default: 2000
- --correction
Method for multiple testing correction {benjamin,bonferroni}. Default: benjamin
- -o, --output-dir
Directory to write output files.
Integrate MAnorm and MotifScan results¶
The mamotif integrate
sub-command is used when users have already got the
MAnorm and MotifScan results, and only run the final integration procedure.
Suppose you have the MAnorm result (sample A vs sample B), and the MotifScan results for both samples:
To find cell type-specific co-factors for sample A:
$ mamotif integrate -i A_MAvalues.xls -m A_motifscan/motif_sites_numbers.xls -o <path>
Convert M=log2(A/B) to -M=log2(B/A) and find co-factors for sample B:
$ mamotif integrate -i B_MAvalues.xls -m B_motifscan/motif_sites_numbers.xls -n -o <path>
Options¶
- -h, --help
Show help message and exit.
- --verbose
Enable verbose log output.
- -i
MAnorm result for sample A or B (A/B_MAvalues.xls).
- -m
MotifScan result for sample A or B (motif_sites_number.xls).
- -n, --negative
Convert M=log2(A/B) to -M=log2(B/A). Required when finding co-factors for sample B.
- -g
Genome name. Required if –split is enabled.
- --split
Split genomic regions into promoter/distal regions and run separately.
- --upstream
TSS upstream distance for promoters. Default: 4000
- --downstream
TSS downstream distance for promoters. Default: 2000
- --correction
Method for multiple testing correction {benjamin,bonferroni}. Default: benjamin
- -o, --output-dir
Directory to write output files.
MAmotif Output¶
After finished running MAmotif, all output files will be written to the directory you specified with “-o” argument.s
The MAmotif output table includes the following columns:
1. Motif Name
2. Target Number: Number of motif-present peaks
3. Average of Target M values: Average M-value of motif-present peaks
4. Std. of Target M values: M-value Std. of motif-present peaks
5. Non-target Number: Number of motif-absent peaks
6. Average of Non-target M-value: Average M-value of motif-absent peaks
7. Std. of Non-target M-value: M-value Std. of motif-absent peaks
8. T-test Statistics: T-Statistic for M-values of motif-present peaks against motif-absent peaks
9. T-test P-value: Right-tailed P-value of T-test
10. T-test P-value By Benjamin/Bonferrroni correction
11. RanSum-test Statistic
12. RankSum-test P-value
13. RankSum-test P-value By Benjamin/Bonferroni correction
14. Maximal P-value: Maximal corrected P-value of T-test and RankSum-test