#############################################################
#
# MetaBakery: Customized BioBakery wmgx workflow
# http://metabakery.fe.uni-lj.si
#
# Quick start guide; details are given in the users' manual
# and in the config template:
#
#    http://metabakery.fe.uni-lj.si/metabakery_manual.pdf
#    http://metabakery.fe.uni-lj.si/config_template.txt
#
#############################################################

PREREQUISITES (need to be dealt with only once)

1. Singularity and squashfs file system support must be installed on the
   computer that will run MetaBakery.

   Installation through the repository of your Linux distribution is
   recommended; other installation methods (like Conda) generally do NOT
   work, since Singularity is tightly coupled with the underlying
   operating system. Recent Ubuntu Linux releases no longer provide
   Singularity through their own repository, but installation is still
   possible (please see the MetaBakery users' manual).

2. Download the MetaBakery edition of your choice:

   edition 4, size 72 GB:
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_2b_edit4.sif

   edition 3, size 67 GB:
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_2b_edit3.sif

   edition 2, size 30 GB:
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_2b_edit2.sif

   ------------------------------------------------------------------------
   NOTE: if large files make downloading difficult, please take a look at:
   http://metabakery.fe.uni-lj.si/download_smaller_pieces.txt
   ------------------------------------------------------------------------

SPECIFIC TO AN ACTUAL ANALYSIS

3. Collect an arbitrary number of paired (R1 & R2) and/or unpaired (R1)
   fastq reads into a subdirectory of your choice under your home
   directory.

   PLEASE NOTE: input files must be stored under your home directory in
   order to be accessible by MetaBakery. Other possibilities exist, but
   they depend on the actual configuration of the Singularity system.

   Instead of your own input files, the following demo files are also
   suitable for the task:

   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_A_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_A_R2.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_B_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_B_R2.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_C_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_C_R2.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_D_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_D_R2.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_E_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_E_R2.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_F_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_F_R2.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_G_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_G_R2.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_H_R1.fq.gz
   https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery/mbakery_test_H_R2.fq.gz

   Processing of these inputs takes a while (see step 5). A first
   impression of MetaBakery can be obtained by analyzing only a subset of
   the above inputs; in fact, any single fastq pair suffices. However,
   with a reduced input set some processing steps (Strainphlan, MelonnPan
   and Mothur calculators) may not deliver results. Nonetheless, the
   overall working of MetaBakery can still be demonstrated.

   NOTE: with this dataset the following warning may appear (depending on
   the MetaBakery edition) due to the insufficient viral content of the
   set. This is not an error.

      Failed Mothur summary.single for k__Viruses ...

4. Start MetaBakery in a terminal by entering the following command,
   adapted to your edition and paths:

   singularity run /abs/path/mbakery_2b_[edition].sif /home/your_login/path_to/directory_with_fastq

5. MetaBakery should start processing and eventually deliver results in
   a subdirectory of the directory containing the input fastq files.

   Execution time for the entire demo fastq dataset listed in step 3 is
   about three to four hours on a computer with 64 CPUs and is longer on
   less powerful hardware (execution time does not scale linearly with
   the number of processors, since some steps are poorly parallelized).

   Execution time may be shortened by reducing the input set as hinted in
   step 3. Also, it is much more pleasant to see your own fastq inputs in
   action instead of the demo ones. Please note that sufficiently rich
   inputs are needed to obtain results from the Strainphlan, MelonnPan
   and Mothur calculators.
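
EXAMPLE (a sketch, not part of MetaBakery itself)

   Steps 3 to 5 can be wrapped in a few shell helper functions, roughly as
   sketched below. The function names (demo_urls, fetch_demo,
   run_metabakery) are hypothetical and were made up for this
   illustration; the sketch also assumes that wget is installed. The demo
   URLs and the "singularity run" invocation are the ones given in this
   guide. The home-directory check is a simplification: as noted in step 3,
   other locations may work depending on the Singularity configuration.

```shell
#!/bin/sh
# Sketch only: helpers for fetching the demo reads and starting MetaBakery.
# Function names are hypothetical; URLs follow the pattern listed in step 3.

BASE_URL="https://vision.fe.uni-lj.si/~bostjanm/raid/mbakery"

# Print the demo fastq URLs for the given sample letters (default: A to H).
demo_urls() {
    [ "$#" -gt 0 ] || set -- A B C D E F G H
    for sample in "$@"; do
        for mate in R1 R2; do
            printf '%s/mbakery_test_%s_%s.fq.gz\n' "$BASE_URL" "$sample" "$mate"
        done
    done
}

# Download the selected demo samples into directory $1.
# Remaining arguments are sample letters; none means the whole set.
fetch_demo() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    # -c resumes partial downloads, -P sets the target directory,
    # -i - reads the list of URLs from standard input.
    demo_urls "$@" | wget -c -P "$dest" -i -
}

# Run MetaBakery: $1 is the absolute path to the .sif file,
# $2 is the directory holding the fastq inputs.
run_metabakery() {
    sif=$1
    input_dir=$2
    case "$input_dir" in
        "$HOME"/*) ;;
        *) echo "input directory should be under $HOME" >&2; return 1 ;;
    esac
    command -v singularity >/dev/null 2>&1 || {
        echo "singularity not found; see PREREQUISITES, step 1" >&2
        return 1
    }
    singularity run "$sif" "$input_dir"
}
```

   For a quick first impression with a single fastq pair, one might call:

      fetch_demo "$HOME/mbakery_demo" A
      run_metabakery /abs/path/mbakery_2b_[edition].sif "$HOME/mbakery_demo"

   As explained in step 3, such a reduced input set may not yield results
   from the Strainphlan, MelonnPan and Mothur calculators.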