Quick Start

Install MicrocosM

Nextflow

  1. Check Java availability

java -version

If missing, install Java with SDKMAN:

    1. Install SDKMAN:

    curl -s https://get.sdkman.io | bash

    2. Open a new terminal.

    3. Install Java:

    sdk install java 17.0.10-tem

    4. Confirm that Java is installed correctly:

    java -version

  2. Install Nextflow

    1. Download Nextflow:

    curl -s https://get.nextflow.io | bash
    
    2. Make Nextflow available from any directory:

    # Make it executable
    chmod +x nextflow
    
    # Move it into an executable path, for example /usr/local/bin
    sudo mv nextflow /usr/local/bin/
    

    Note

    Any directory listed in your PATH variable will work. To check it, run echo $PATH in the terminal. To use a different directory, add the line export PATH="/path/to/your/nextflow:$PATH" to your shell configuration file, such as ~/.bashrc or ~/.zshrc, where /path/to/your/nextflow is the directory that contains the nextflow executable.

    3. Confirm that Nextflow is installed correctly:

    nextflow info
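If nextflow info reports "command not found", the directory holding the executable is probably not on your PATH. The following is a minimal sketch of that check; dir_on_path is a hypothetical helper name (not part of Nextflow), and /usr/local/bin is simply the example directory used above.

```shell
# Return 0 if directory $1 is one of the colon-separated entries of $PATH.
dir_on_path() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the example install directory used in the steps above.
if dir_on_path /usr/local/bin; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is not on PATH; add it in ~/.bashrc or ~/.zshrc"
fi
```

Wrapping PATH in colons lets the pattern match whole entries only, so /usr/local does not count as a match for /usr/local/bin.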
    

Hint

Follow the Nextflow documentation for full instructions and troubleshooting.

Also, if you’re new to Nextflow, we recommend going through the official tutorial for a solid understanding of the platform and its features.

Conda or Mamba

If neither Conda nor Mamba is available yet, install Mamba via Miniforge by following its installation guide.

MicrocosM

Pull the pipeline with Nextflow:

nextflow pull tnmquann/metaflow

# Check pipeline info
nextflow info tnmquann/metaflow

Prepare taxonomy databases (minimal pre-built versions)

sourmash

Download a pre-built database from ctbrown’s farm.

We recommend using the version built on GTDB-RS226.

YACHT

Download a pre-built database from Zenodo.

We recommend using the pretrained database on GTDB-RS226 (ANI threshold 0.995), which is available here.

Important

Choose pre-built databases for sourmash and YACHT that were built on the same GTDB or GenBank release.
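One quick way to catch a mismatch is to compare the release tag in the two file names. The sketch below assumes the downloaded files carry a tag such as rs226 in their names; release_tag is a hypothetical helper, and both file names are placeholders rather than the actual names of the published databases.

```shell
# Extract a GTDB release tag such as "rs226" from a file name
# (assumes the tag appears in the name; this is a naming assumption).
release_tag() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | grep -oE 'rs[0-9]+' | head -n 1
}

# Hypothetical file names; substitute the files you actually downloaded.
sourmash_db="gtdb-rs226-k31.zip"
yacht_db="gtdb-rs226-k31_0.995_pretrained.json"

sm_tag="$(release_tag "$sourmash_db")"
yt_tag="$(release_tag "$yacht_db")"

if [ -n "$sm_tag" ] && [ "$sm_tag" = "$yt_tag" ]; then
    echo "release tags match: $sm_tag"
else
    echo "release tags differ or are missing; use databases from the same release" >&2
fi
```

A matching tag is only a sanity check on file names; the database documentation remains the authoritative source for which release each file was built on.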

Prepare samplesheet input

Create a samplesheet file (CSV format) containing paths to your FASTQ files and sample information.

The file must contain at least 5 comma-separated columns, with the following headers:

sample_id,group,short_reads_1,short_reads_2,long_reads
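As an illustration, the snippet below writes a two-sample samplesheet and checks its shape. The sample names, group labels, and FASTQ paths are placeholders, and leaving long_reads empty for short-read-only samples is an assumption; check the pipeline documentation for how missing long reads should be encoded.

```shell
# Write a two-sample samplesheet; names, groups, and paths are placeholders.
cat > samples.csv <<'EOF'
sample_id,group,short_reads_1,short_reads_2,long_reads
sampleA,control,/path/to/sampleA_R1.fastq.gz,/path/to/sampleA_R2.fastq.gz,
sampleB,treated,/path/to/sampleB_R1.fastq.gz,/path/to/sampleB_R2.fastq.gz,
EOF

# Check that the header matches the required column names.
expected="sample_id,group,short_reads_1,short_reads_2,long_reads"
[ "$(head -n 1 samples.csv)" = "$expected" ] || echo "unexpected header" >&2

# Report any row with fewer than 5 comma-separated fields.
awk -F',' 'NF < 5 { printf "line %d has %d fields\n", NR, NF; bad = 1 } END { exit bad }' samples.csv
```

The trailing comma on each sample row keeps the fifth (long_reads) column present even when it is empty.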

Run MicrocosM (on paired-end short reads)

Read-based workflow

nextflow run tnmquann/metaflow \
   --input /path/to/your/samples.csv \
   --input_format csv \
   -profile conda \
   --outdir /path/to/output/directory \
   --sourmash_database /path/to/your/sourmash_database \
   --yacht_database /path/to/your/yacht_database.json \
   --enable_readbase

Assembly-based workflow

nextflow run tnmquann/metaflow \
   --input /path/to/your/samples.csv \
   --input_format csv \
   -profile conda \
   --outdir /path/to/output/directory \
   --sourmash_database /path/to/your/sourmash_database
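Both commands depend on several file paths, and a mistyped path typically surfaces only after the run has started. A minimal pre-flight sketch follows; require_path is a hypothetical helper, and the paths are the placeholders from the commands above, so the check reports them as missing until you substitute real ones.

```shell
# Report a required path that does not exist and return non-zero.
require_path() {
    if [ ! -e "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
}

# Placeholder paths from the commands above; replace with your own.
samplesheet=/path/to/your/samples.csv
sourmash_db=/path/to/your/sourmash_database

if require_path "$samplesheet" && require_path "$sourmash_db"; then
    echo "inputs found; ready to launch nextflow run"
else
    echo "fix the paths above before launching" >&2
fi
```

For the read-based workflow, the same check can be applied to the file passed to --yacht_database.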