Quick Start¶
Install MicrocosM¶
Nextflow¶
Check Java availability
java -version
If missing, install Java with SDKMAN:
Install SDKMAN:
curl -s https://get.sdkman.io | bash
Open a new terminal.
Install Java:
sdk install java 17.0.10-tem
Confirm that Java is installed correctly:
java -version
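To check the installed version in a script instead of reading the output by eye, the major version can be parsed from the `java -version` text. The sketch below is one way to do this; the helper name `java_major_version` is hypothetical and not part of MicrocosM or Nextflow. Compare the result against the minimum Java version listed in the Nextflow documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from `java -version`-style text.
# Handles both the legacy "1.8.0_392" scheme and the modern "17.0.10" scheme.
java_major_version() {
    # $1: the full text printed by `java -version`
    local version
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$version" in
        "")  return 1 ;;                                   # no version string found
        1.*) version=${version#1.}; printf '%s\n' "${version%%.*}" ;;
        *)   printf '%s\n' "${version%%[.+-]*}" ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr, so redirect it into the capture.
    major=$(java_major_version "$(java -version 2>&1)") || major=""
    echo "Detected Java major version: ${major:-unknown}"
else
    echo "java not found on PATH"
fi
```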
Install Nextflow
Download Nextflow:
curl -s https://get.nextflow.io | bash
Make Nextflow executable from everywhere:
# Make it executable
chmod +x nextflow
# Move it into an executable path, for example /usr/local/bin
sudo mv nextflow /usr/local/bin/
Note
Any directory that is included in your PATH variable will work. You can check your PATH variable by running echo $PATH in the terminal. If you want to use your own preferred directory, you can add it to your PATH variable by adding the command export PATH="/path/to/your/nextflow:$PATH" to your shell configuration file, such as ~/.bashrc or ~/.zshrc.
Confirm that Nextflow is installed correctly:
nextflow info
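If `nextflow info` reports that the command is not found, the directory holding the `nextflow` launcher is probably missing from `PATH`. A minimal sketch of that check follows; the function name `dir_on_path` is hypothetical, and `/usr/local/bin` is just the example directory used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if directory $1 is one of the entries
# in the colon-separated PATH-style string $2.
dir_on_path() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_on_path /usr/local/bin "$PATH"; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is not on PATH; add it in ~/.bashrc or ~/.zshrc"
fi

# Show which nextflow launcher the shell would run, if any.
command -v nextflow || echo "nextflow not found on PATH"
```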
Hint
Follow the Nextflow documentation for full instructions and troubleshooting.
If you’re new to Nextflow, we also recommend working through the official tutorial for a solid understanding of the platform and its features.
Conda or Mamba¶
If Conda/Mamba is not yet available, install Mamba via Miniforge following their installation guide.
MicrocosM¶
Pull the pipeline with Nextflow:
nextflow pull tnmquann/metaflow
# Check pipeline info
nextflow info tnmquann/metaflow
Prepare taxonomy databases (minimal pre-built versions)¶
sourmash¶
Download a pre-built database from ctbrown’s farm.
We recommend using the version built on GTDB-RS226.
YACHT¶
Download a pre-built database from Zenodo.
We recommend using the pretrained database on GTDB-RS226 (ANI threshold 0.995), which is available here.
Important
Choose pre-built databases for sourmash and YACHT that were built on the same GTDB or GenBank release.
Prepare samplesheet input¶
Create a samplesheet file (CSV format) containing paths to your FASTQ files and sample information.
The file must contain at least 5 comma-separated columns, with the following headers:
sample_id,group,short_reads_1,short_reads_2,long_reads
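As an illustration, the snippet below writes a small samplesheet for paired-end short reads and runs a basic structural check on it. The sample IDs, group names, and file paths are placeholders, and leaving `long_reads` empty when no long reads are available is an assumption rather than documented behavior. The `check_samplesheet` function is a hypothetical helper, not part of MicrocosM; it verifies only the header and the column count, not that the FASTQ files exist.

```shell
#!/usr/bin/env bash
# Write an example samplesheet. All IDs, groups, and paths are placeholders.
# Assumption: long_reads is left empty for samples with short reads only.
cat > samples.csv <<'EOF'
sample_id,group,short_reads_1,short_reads_2,long_reads
sample1,control,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,
sample2,treated,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,
EOF

# Hypothetical helper: check that the header matches the expected one and
# that every line has at least 5 comma-separated fields.
# Exits non-zero at the first problem found.
check_samplesheet() {
    awk -F',' '
        NR == 1 && $0 != "sample_id,group,short_reads_1,short_reads_2,long_reads" {
            print "unexpected header: " $0; bad = 1; exit
        }
        NF < 5 { print "line " NR ": expected at least 5 fields, found " NF; bad = 1; exit }
        END { exit bad }
    ' "$1"
}

check_samplesheet samples.csv && echo "samplesheet looks well-formed"
```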
Run MicrocosM (on paired-end short reads)¶
Read-based workflow¶
nextflow run tnmquann/metaflow \
--input /path/to/your/samples.csv \
--input_format csv \
-profile conda \
--outdir /path/to/output/directory \
--sourmash_database /path/to/your/sourmash_database \
--yacht_database /path/to/your/yacht_database.json \
--enable_readbase
Assembly-based workflow¶
nextflow run tnmquann/metaflow \
--input /path/to/your/samples.csv \
--input_format csv \
-profile conda \
--outdir /path/to/output/directory \
--sourmash_database /path/to/your/sourmash_database