Try it out now!
diffTF runs on Linux and macOS and, when combined with Singularity, is even independent of the operating system. The following quick start briefly summarizes the steps necessary to install and use it.
In principle, there are two ways of installing diffTF and the required tools:
1a. The “easy” way: Using Singularity and our preconfigured diffTF containers, which contain all necessary tools, R, and R libraries.
You only need to install Snakemake (see below for details) and Singularity. You can check whether you already have Singularity installed by typing
singularity --version
Snakemake requires at least Singularity version 2.4. If your version is older, please update to the latest Singularity version.
Note
Make sure to read the section Adaptations and notes when running with Singularity properly!
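As a minimal sketch of the version check described above, the following shell snippet compares the installed Singularity version against 2.4. The helper names version_ge and extract_version are hypothetical and not part of diffTF. The parsing assumes that singularity --version prints a string containing a dotted version number, and sort -V requires GNU coreutils or a recent BSD sort.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Uses version sort: the smaller of the two versions is printed first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract the first dotted number from a version string
# (assumed formats: "2.6.1-dist" or "singularity version 3.5.2").
extract_version() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9.]*\).*/\1/p'
}

if command -v singularity >/dev/null 2>&1; then
    installed=$(extract_version "$(singularity --version)")
    if version_ge "$installed" "2.4"; then
        echo "Singularity $installed is recent enough"
    else
        echo "Singularity $installed is too old, please update"
    fi
else
    echo "Singularity is not installed"
fi
```

If the reported version string has a different format on your system, adapt extract_version accordingly.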
1b. The “more complicated” way: Install the necessary tools (Snakemake, samtools, bedtools, Subread, and R along with various packages).
Note
Note that all tools require Python 3.
We recommend installing all tools except R via conda, in which case the installation becomes as easy as
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install snakemake bedtools samtools subread
If conda is not yet installed, follow the installation instructions. Installation is quick and easy. Make sure to open a new terminal after installation, so that conda is available.
Note
You do not need to uninstall other Python installations or packages in order to use conda. Even if you already have a system Python, another Python installation from a source such as the macOS Homebrew package manager and globally installed packages from pip such as pandas and NumPy, you do not need to uninstall, remove, or change any of them before using conda.
If you want to install the tools manually and outside of the conda framework, see the following instructions for each of the tools: snakemake, samtools, bedtools, Subread.
In addition, R is needed along with various packages (see below for details).
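After installing the tools, a quick check along the following lines may help confirm that everything is available on your PATH. This is only a sketch: the helper missing_tools is hypothetical, and it assumes that Subread is detected via its featureCounts executable and R via Rscript.

```shell
#!/bin/sh
# Print every command from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# featureCounts ships with Subread, Rscript ships with R.
missing=$(missing_tools snakemake samtools bedtools featureCounts Rscript)
if [ -z "$missing" ]; then
    echo "All required tools were found"
else
    echo "Missing tools:"
    echo "$missing"
fi
```

Note that this only checks for the presence of the executables, not for their versions or for the required R packages.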
Clone the Git repository:
git clone https://git.embl.de/grp-zaugg/diffTF.git
If you receive an error, Git may not be installed on your system. If you run Ubuntu, try the following command:
sudo apt-get install git
For macOS, there are multiple ways of installing it. If you already have Homebrew (http://brew.sh) installed, simply type:
brew install git
Otherwise, consult the internet on how to best install Git for your system.
To run diffTF with an example ATAC-Seq / RNA-seq dataset for 50 TFs, perform the following steps (see section Example dataset for dataset details):
Change into the example/input directory within the Git repository:
cd diffTF/example/input
Download the data via the download script:
sh downloadAllData.sh
To test if the setup is correct, start a dry run via the first helper script:
sh startAnalysisDryRun.sh
Once the dry run is successful, start the analysis via the second helper script:
sh startAnalysis.sh
If you want to include Singularity (which we strongly recommend), edit the file startAnalysis.sh and add the --use-singularity and --singularity-args command line arguments in addition to the other arguments (see the Snakemake documentation and the section Adaptations and notes when running with Singularity for more details). The command you execute should then look like this:
snakemake --snakefile ../../src/Snakefile --cores 2 --configfile config.json \
  --use-singularity --singularity-args "--bind /your/diffTF/path"
Read the section Adaptations and notes when running with Singularity to learn about the --bind option and what /your/diffTF/path means here; it is actually very easy!
You can also run the example analysis with all TFs instead of only 50. For this, modify the TFs parameter and set it to the special word all, which tells diffTF to use all recognized TFs instead of a specific list (see section TFs for details).
- To run your own analysis, modify the files config.json and sampleData.tsv. See the instructions in the section Run your own analysis for more details.
- If your analysis finished successfully, take a look into the FINAL_OUTPUT folder within your specified output directory, which contains the summary tables and visualizations of your analysis. If you received an error, see the section Handling errors to troubleshoot.
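The dry run and the actual analysis from the steps above can be chained so that the analysis only starts if the dry run succeeds. The following is a minimal sketch; the function run_after_dryrun is hypothetical and not part of diffTF, and it assumes that the helper scripts signal failure via a non-zero exit status.

```shell
#!/bin/sh
# Run the dry-run command first; start the full command only if it succeeds.
# Both arguments are command strings that are executed via "sh -c".
run_after_dryrun() {
    dryrun_cmd=$1
    full_cmd=$2
    if sh -c "$dryrun_cmd"; then
        sh -c "$full_cmd"
    else
        echo "Dry run failed, not starting the analysis" >&2
        return 1
    fi
}

# Usage from within diffTF/example/input:
# run_after_dryrun "sh startAnalysisDryRun.sh" "sh startAnalysis.sh"
```

The usage line is commented out on purpose so that nothing is started by accident when the snippet is sourced.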
Prerequisites for the “easy” way
The only prerequisite here is that Snakemake and Singularity must be installed on the system on which you want to run diffTF. See above for details on the supported versions. For details on how to install Snakemake, see below.
Prerequisites for the “manual” way
Note that most of this section is only relevant if you use Snakemake without Singularity. This section lists the required software and how to install it. As outlined in Section Try it out now!, the easiest way is to install all of it via conda. However, it is of course also possible to install the tools separately.
Snakemake
Please ensure that you have at least version 5.3 installed. There are multiple ways to install Snakemake. We recommend installing it, along with all the other required software, via conda.
samtools, bedtools, Subread
In addition, samtools, bedtools and Subread are needed to run diffTF. We recommend installing them, along with all the other required software, via conda.
R and R packages
A working R installation is needed, and a number of packages from either CRAN or Bioconductor have to be installed. Type the following in R to install them:
install.packages(c("checkmate", "futile.logger", "tidyverse", "reshape2", "RColorBrewer", "ggrepel", "lsr", "modeest", "boot", "grDevices", "pheatmap", "matrixStats", "locfdr"))
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("limma", "vsn", "csaw", "DESeq2", "DiffBind", "geneplotter", "Rsamtools", "preprocessCore", "apeglm"))
Run your own analysis
Running your own analysis is almost as easy as running the example analysis (see section Example dataset). Carefully read and follow the following steps and notes:
- Copy the files config.json and startAnalysis.sh to a directory of your choice.
- Modify the file config.json accordingly. For example, we strongly recommend running the analysis for all TFs instead of just 50 as in the example analysis. For this, simply change the parameter “TFs” to “all”. See Section General configuration file for details about the meaning of the parameters. Do not delete or rename any parameters or sections.
- Create a tab-separated file that defines the input data, analogous to the file sampleData.tsv from the example analysis, and refer to it in the file config.json (parameter summaryFile).
- Adapt the file startAnalysis.sh if necessary (the exact command line call to Snakemake and the various Snakemake-related parameters). If you run with Singularity, see the section below for modifications.
- Since running the pipeline is often computationally demanding, read Section Executing diffTF - Running times and memory requirements and decide on which machine to run the pipeline. In most cases, we recommend running diffTF in a cluster environment (see Section Running diffTF in a cluster environment for details). The pipeline is written in Snakemake, and we strongly suggest also reading Section Running diffTF to get a basic understanding of how the pipeline works.
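The change of the “TFs” parameter to “all” mentioned in the steps above can also be scripted. The following is only a sketch under assumptions: the helper set_tfs_all is hypothetical, the TF names in the demonstration are placeholder values, and the approach assumes that the parameter appears in config.json as a simple "TFs": "..." string entry on a single line. For anything more complex, editing the file by hand or with a JSON-aware tool is the safer choice.

```shell
#!/bin/sh
# Read a JSON configuration on stdin and print it with the value of
# the "TFs" entry replaced by "all". All other lines pass through unchanged.
set_tfs_all() {
    sed 's/"TFs"[[:space:]]*:[[:space:]]*"[^"]*"/"TFs": "all"/'
}

# Demonstration on a single line with placeholder TF names:
printf '%s\n' '"TFs": "CTCF,CEBPB",' | set_tfs_all

# Possible usage with a real file (writes a modified copy, keeps the original):
# set_tfs_all < config.json > config.all.json
```

Writing to a copy rather than editing in place keeps the original configuration intact and avoids differences between sed implementations.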
Adaptations and notes when running with Singularity
With Singularity, each rule will be executed in pre-configured isolated containers that contain all necessary tools. To enable it, you only have to add the following arguments when you execute Snakemake:
--use-singularity: Just type it like this!
--singularity-args: You must make all directories containing files referenced in the diffTF configuration file available within the container as well. By default, only the directory from which you start the analysis and its subdirectories are automatically mounted inside the container. Since the diffTF source code is outside the input folder for the example analysis, however, at least the root directory of the Git repository has to be mounted. This is actually quite simple! Just use --singularity-args "--bind /your/diffTF/path" and replace /your/diffTF/path with the root path in which you cloned the diffTF Git repository (the one that has the subfolders example, src etc.). If you reference additional files, simply add one or multiple directories to the bind path (use a comma to separate them). For example, if you reference the files /g/group1/user1/mm10.fa and /g/group2/user1/files/bla.txt in the configuration file, you may add /g/group1/user1,/g/group2/user1/files or even just /g to the bind path (as all files you reference are within /g).
Note
We note again that within a Singularity container, you cannot by default access paths outside of the directory from which you started executing Snakemake. If you receive errors in the checkParameterValidity rule that a directory does not exist even though you can cd into it, you most likely forgot to include the path of this folder or a parent path as part of the bind option.
--singularity-prefix /your/directory (optional): You do not have to, but you may want to add the --singularity-prefix argument to store all Singularity containers in a central place (here: /your/directory) instead of the local .snakemake directory. If you intend to run multiple diffTF analyses in different folders, you can save space and time because the containers won’t have to be downloaded each time and stored in multiple locations.
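To illustrate how the bind path described above can be assembled and checked, here is a minimal sketch. The helpers join_bind_paths and is_covered are hypothetical and not part of diffTF or Snakemake, and all paths are the placeholder paths used in this section. The snippet only prints the resulting Snakemake command and does not execute it.

```shell
#!/bin/sh
# Join all arguments with commas, as expected by the --bind option.
join_bind_paths() {
    (IFS=,; printf '%s\n' "$*")
}

# Succeed if path $1 equals, or lies below, one of the remaining directories.
is_covered() {
    path=$1
    shift
    for dir in "$@"; do
        case "$path" in
            "$dir"|"$dir"/*) return 0 ;;
        esac
    done
    return 1
}

# Warn about a referenced file that no bind directory would make visible.
if ! is_covered /g/group2/user1/files/bla.txt /your/diffTF/path /g/group1/user1; then
    echo "bla.txt is not covered, add /g/group2/user1/files to the bind path"
fi

bind=$(join_bind_paths /your/diffTF/path /g/group1/user1 /g/group2/user1/files)
echo snakemake --snakefile ../../src/Snakefile --cores 2 --configfile config.json \
    --use-singularity --singularity-args "\"--bind $bind\"" \
    --singularity-prefix /your/directory
```

A check like is_covered is purely textual: it compares path prefixes and does not resolve symbolic links, so a linked directory may still need its real location added to the bind path.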
Please read the following additional notes and warnings related to Singularity:
Warning
If you use Singularity version 3, make sure you have at least version 3.0.3 installed, as there was an issue with Snakemake and particular Singularity versions. For more details, see here.