Quick Start

Install MicrocosM

Nextflow

Check Java availability

java -version

If missing, install Java with SDKMAN:

Install SDKMAN:
curl -s https://get.sdkman.io | bash
Open a new terminal.

Install Java:
sdk install java 17.0.10-tem
Confirm that Java is installed correctly:
java -version
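
The Java check above can also be scripted. The following is a minimal sketch, assuming bash and the usual `java -version` banner format; the helper name `java_major_version` is hypothetical. It only reports the major version, so compare the result against the Java version your Nextflow release requires.

```shell
#!/usr/bin/env bash
# Extract the major Java version from the first line of `java -version` output.
# Handles both modern ("17.0.10") and legacy ("1.8.0_392") version strings.
java_major_version() {
    local first_line="$1"
    local version
    # Pull out the quoted version string, e.g. 17.0.10
    version=$(printf '%s\n' "$first_line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    [ -n "$version" ] || return 1
    case "$version" in
        1.*) version=${version#1.} ;;   # legacy scheme: 1.8.0 -> 8
    esac
    # Keep everything before the first '.', '+', '_' or '-'
    printf '%s\n' "${version%%[.+_-]*}"
}

# Usage: java prints its version banner to stderr, hence 2>&1.
if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1 | head -n 1)
    echo "Detected Java major version: $(java_major_version "$banner")"
else
    echo "java not found on PATH"
fi
```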

Install Nextflow
1. Download Nextflow:
```
curl -s https://get.nextflow.io | bash
```
1. Make Nextflow executable from anywhere:
```
# Make it executable
chmod +x nextflow

# Move it into an executable path, for example /usr/local/bin
sudo mv nextflow /usr/local/bin/
```
Note

Any directory listed in your PATH variable will work. To see which directories it contains, run echo $PATH in the terminal. To use a different directory, add the line export PATH="/path/to/your/nextflow:$PATH" to your shell configuration file, such as ~/.bashrc or ~/.zshrc, where /path/to/your/nextflow is the directory that contains the nextflow executable.
1. Confirm that Nextflow is installed correctly:
```
nextflow info
```
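
If `nextflow info` reports that the command cannot be found, the install directory is probably not on your PATH. Here is a small sketch of how to check this, assuming bash; the function name `path_contains` is hypothetical and `/usr/local/bin` is simply the example directory used in the steps above.

```shell
#!/usr/bin/env bash
# Return 0 if directory $1 appears as an entry in the colon-separated list $2
# (defaults to the current PATH).
path_contains() {
    local dir="${1%/}"            # ignore one trailing slash
    local path_list="${2-$PATH}"
    case ":$path_list:" in
        *":$dir:"* | *":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: check the directory used in the steps above.
if path_contains /usr/local/bin; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is not on PATH; add it in ~/.bashrc or ~/.zshrc"
fi
```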

Hint

Follow the Nextflow documentation for full instructions and troubleshooting.

If you’re new to Nextflow, we also recommend working through the official tutorial to get a solid understanding of the platform and its features.

Conda or Mamba

If neither Conda nor Mamba is available yet, install Mamba via Miniforge by following the Miniforge installation guide.

MicrocosM

Install the pipeline through Nextflow:

nextflow pull tnmquann/metaflow

# Check pipeline info
nextflow info tnmquann/metaflow

Prepare taxonomy databases (minimal pre-built versions)

sourmash

Download a pre-built database from ctbrown’s farm.

We recommend using the version built on GTDB-RS226.

YACHT

Download a pre-built database from Zenodo.

We recommend using the pretrained database on GTDB-RS226 (ANI threshold 0.995), which is available here.

Important

Choose pre-built databases for sourmash and YACHT that were built on the same GTDB or GenBank release.
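
A quick sanity check on the downloaded files can catch a release mismatch early. The sketch below assumes bash and assumes that the GTDB release appears in each file name as "rs" followed by digits; the file names and both function names are hypothetical. It compares names only and does not inspect the database contents.

```shell
#!/usr/bin/env bash
# Extract a GTDB release tag such as "rs226" from a file name (case-insensitive).
# Assumes the release appears in the name as "rs" followed by digits.
gtdb_release_tag() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | grep -oE 'rs[0-9]+' | head -n 1
}

# Return 0 if both file names carry the same, non-empty release tag.
same_gtdb_release() {
    local a b
    a=$(gtdb_release_tag "$1")
    b=$(gtdb_release_tag "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage with hypothetical file names:
if same_gtdb_release "sourmash_gtdb-rs226.zip" "yacht_gtdb-RS226_ani0.995.json"; then
    echo "databases appear to share a GTDB release"
else
    echo "release tags differ or could not be detected"
fi
```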

Prepare samplesheet input

Create a samplesheet file (CSV format) containing paths to your FASTQ files and sample information.

The file must contain at least 5 comma-separated columns, with the following headers:

sample_id,group,short_reads_1,short_reads_2,long_reads
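
As an illustration, the sketch below writes a two-sample samplesheet and checks that its first line matches the header above. It assumes bash; the sample names, group labels, and FASTQ paths are placeholders, and `check_samplesheet_header` is a hypothetical helper. Leaving the long_reads field empty for short-read-only samples is an assumption, not something this guide confirms.

```shell
#!/usr/bin/env bash
# Expected header, copied from the column list above.
expected_header='sample_id,group,short_reads_1,short_reads_2,long_reads'

# Return 0 if the first line of the file matches the expected header.
check_samplesheet_header() {
    local header
    IFS= read -r header < "$1" || [ -n "$header" ]
    header=${header%$'\r'}        # tolerate Windows line endings
    [ "$header" = "$expected_header" ]
}

# Write an illustrative samplesheet (sample names and paths are placeholders;
# the empty last field for long_reads is an assumption).
cat > samples.csv <<'EOF'
sample_id,group,short_reads_1,short_reads_2,long_reads
sampleA,control,/path/to/sampleA_R1.fastq.gz,/path/to/sampleA_R2.fastq.gz,
sampleB,treated,/path/to/sampleB_R1.fastq.gz,/path/to/sampleB_R2.fastq.gz,
EOF

check_samplesheet_header samples.csv && echo "header OK"
```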

Run MicrocosM (on paired-end short reads)

Read-based workflow

nextflow run tnmquann/metaflow \
   --input /path/to/your/samples.csv \
   --input_format csv \
   -profile conda \
   --outdir /path/to/output/directory \
   --sourmash_database /path/to/your/sourmash_database \
   --yacht_database /path/to/your/yacht_database.json \
   --enable_readbase

Assembly-based workflow

nextflow run tnmquann/metaflow \
   --input /path/to/your/samples.csv \
   --input_format csv \
   -profile conda \
   --outdir /path/to/output/directory \
   --sourmash_database /path/to/your/sourmash_database
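
For scripted runs, one option is to assemble the command in a bash array, which avoids relying on line-continuation backslashes. This is a minimal sketch, assuming bash; it uses only the flags shown above, the function name `build_run_command` is hypothetical, and all paths are placeholders. As written it prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Assemble the assembly-based command from the flags shown above.
# All paths are placeholders to replace with your own.
build_run_command() {
    local input="$1" outdir="$2" sourmash_db="$3"
    cmd=(
        nextflow run tnmquann/metaflow
        --input "$input"
        --input_format csv
        -profile conda
        --outdir "$outdir"
        --sourmash_database "$sourmash_db"
    )
}

build_run_command /path/to/your/samples.csv /path/to/output/directory /path/to/your/sourmash_database

# Dry run: print the command; replace this line with "${cmd[@]}" to execute it.
printf '%q ' "${cmd[@]}"; echo
```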