Preprocessing

Before running GraphBin, we have to assemble our read data into contigs and bin the contigs.

Assembly

Reads can be assembled into contigs using any of the following three assemblers.

metaSPAdes

SPAdes is an assembler based on the de Bruijn graph approach, and metaSPAdes is its dedicated metagenomic assembler. Use metaSPAdes (SPAdes in metagenomics mode) to assemble reads into contigs.

SGA

SGA (String Graph Assembler) is an assembler based on the overlap-layout-consensus (more recently, string graph) approach. Use SGA to assemble reads into contigs.

MEGAHIT

MEGAHIT is an assembler based on the de Bruijn graph approach. Use MEGAHIT to assemble reads into contigs.

If you are using MEGAHIT assemblies, please refer to the section "Before using MEGAHIT assemblies" on the Support page before you run GraphBin.

Initial Binning

Once you have obtained the assembly output, you can run a metagenomic binning tool such as MaxBin2, CONCOCT, MetaBAT2 or VAMB to get an initial binning result.

You can use the prep_result.py support script to format an initial binning result into a .csv file that lists contig identifiers and bin IDs. Contigs keep their original identifiers, and bins are numbered according to the FASTA file names. You can run prep_result.py as follows.

python prep_result.py --binned /path/to/folder_with_binning_result --output /path/to/output_folder

You can see the usage options of prep_result.py by typing python prep_result.py -h on the command line.
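To make the formatting step concrete, here is a minimal Python sketch of the kind of transformation described above: each FASTA file in a folder is treated as one bin, and every contig header in it becomes a (contig identifier, bin ID) row. This is not the source of prep_result.py. The function name contig_bin_rows, the accepted file extensions, and the choice to number bins from 1 in alphabetical order of file name are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of FASTA extensions; the real script may accept others.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def contig_bin_rows(binned_folder):
    """Return (rows, bin_ids) for a folder holding one FASTA file per bin.

    rows    : list of (contig_id, bin_number) pairs
    bin_ids : list of (bin_number, fasta_file_name) pairs

    Hypothetical helper: bins are numbered from 1 in sorted file-name order.
    """
    fasta_files = sorted(
        p for p in Path(binned_folder).iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    )
    rows, bin_ids = [], []
    for bin_number, fasta in enumerate(fasta_files, start=1):
        bin_ids.append((bin_number, fasta.name))
        with fasta.open() as handle:
            for line in handle:
                if not line.startswith(">"):
                    continue  # sequence line, not a contig header
                fields = line[1:].split()
                if fields:
                    # Keep the original identifier: header text up to the
                    # first whitespace character.
                    rows.append((fields[0], bin_number))
    return rows, bin_ids
```

For real data, prefer prep_result.py itself; the sketch is only meant to show how the contig-to-bin rows relate to the FASTA files produced by a binning tool.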

The formatted binning result will be stored in a file named initial_contig_bins.csv in the output folder provided. The bin IDs and the corresponding FASTA file for each bin will be recorded in a file named bin_ids.csv in the same output folder.
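Before moving on, it can be useful to sanity-check the formatted result. The sketch below counts how many contigs were assigned to each bin. It assumes that initial_contig_bins.csv has two comma-separated columns (contig identifier, then bin ID) and no header row; check your own file if its layout differs. The function name count_contigs_per_bin is hypothetical.

```python
import csv
from collections import Counter


def count_contigs_per_bin(csv_path):
    """Count contigs per bin in a two-column (contig_id, bin_id) CSV file.

    Assumes no header row; blank or incomplete rows are skipped.
    """
    counts = Counter()
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # blank or malformed line
            bin_id = row[1].strip()
            counts[bin_id] += 1
    return counts
```

For example, calling count_contigs_per_bin("/path/to/output_folder/initial_contig_bins.csv") returns a Counter keyed by bin ID. A bin with very few contigs, or far fewer bins than expected, may be worth inspecting before running GraphBin.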

Now we are all set to run GraphBin.