CCycDB-Usage and Examples


A guide to analyzing metagenomic data with CCycDB

Step 0. Prepare your files

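Three types of input files are supported, corresponding to the -situation values used in Step 2: read-based, assembly-based, and tabular.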

Step 1. Download

You can download the database directly from https://zenodo.org/records/10045943 or with a third-party download tool.

$ git clone https://github.com/ccycdb/CCycDB.PL

Step 2. Annotation

Usage:

perl GetFun_CCycdb.pl [-situation read-based|assembly-based|tabular] [-wd work_directory] [-m diamond|usearch|blast] [-f filetype] [-s seqtype] [-id] [-e] [-tpm] [-norm xx] [-rs xx] [-thread xx] [-od xx]

[Options:]

-situation The situation for input files (read-based|assembly-based|tabular)
-wd Work directory. Ensure that the files downloaded in Step 1 and your input files are in this directory.
-od Output directory. This directory may or may not already exist.
-m Database searching program you plan to use (diamond|usearch|blast).
-f Specify the extension of your sequence files (e.g. fastq, fastq.gz, fasta, fasta.gz, fq, fq.gz, fa, fa.gz) or (faa, fna) or (diamond|usearch|blast).
When using "-situation tabular", -f supports "diamond|usearch|blast".
Ensure that the file type is supported by the tool selected with the -m option
(e.g., with "-m usearch" the supported file types for -f are "fastq|fasta", and with "-m blast" they are "fasta|fa").
-s (nucl|prot) Sequence type.
-tpm (0|1)  "1" requires $sample.tpm to exist in the work directory (default: 0).
"-situation assembly-based" is a prerequisite for this option.
-id Minimum identity to report an alignment (default: 30).
-e Maximum e-value to report alignments (default: 1e-5).
-norm (0|1) 0: no random subsampling; 1: perform random subsampling.
-rs The number of sequences for random subsampling (default: the lowest number of sequences).
Note: "-norm 1" is a prerequisite for this parameter.
-thread Number of threads (default: 2)
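
Two of the options in the list above have prerequisites: "-tpm 1" requires "-situation assembly-based", and -rs requires "-norm 1". The following is a minimal sketch of a pre-flight check for these two rules. It is not part of CCycDB: the function name check_ccycdb_opts and its positional arguments are hypothetical, and it encodes only the prerequisites stated in the option list.

```shell
# Hypothetical helper (not shipped with CCycDB).
# Arguments: situation, tpm (0|1), norm (0|1), rs (empty string if unset)
check_ccycdb_opts() {
    situation=$1; tpm=$2; norm=$3; rs=$4

    # -situation must be one of the three documented values.
    case "$situation" in
        read-based|assembly-based|tabular) ;;
        *) echo "error: unknown -situation '$situation'" >&2; return 1 ;;
    esac

    # "-tpm 1" is only valid together with "-situation assembly-based".
    if [ "$tpm" = "1" ] && [ "$situation" != "assembly-based" ]; then
        echo "error: -tpm 1 requires -situation assembly-based" >&2
        return 1
    fi

    # -rs is only meaningful when "-norm 1" is set.
    if [ -n "$rs" ] && [ "$norm" != "1" ]; then
        echo "error: -rs requires -norm 1" >&2
        return 1
    fi

    echo "ok"
}
```

For example, check_ccycdb_opts assembly-based 1 0 "" prints "ok", while check_ccycdb_opts read-based 1 0 "" reports an error and returns a non-zero status.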

Examples
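
As an illustration, the sketch below assembles a read-based DIAMOND run from the options listed in Step 2 and prints the command for review instead of executing it. The directories ./work and ./work/out and the option values shown are placeholder assumptions, not recommended settings.

```shell
# Placeholder paths (assumptions): adjust to your own layout.
WD=./work        # must contain the Step 1 downloads and your input files
OD=./work/out    # output location passed to -od

# Build the command using only options documented in Step 2.
CMD="perl GetFun_CCycdb.pl -situation read-based -wd $WD -m diamond -f fastq.gz -s nucl -id 30 -e 1e-5 -norm 1 -thread 4 -od $OD"

# Dry run: print the assembled command for inspection.
echo "$CMD"
```

To run the command, paste the printed line into your shell from the directory that contains GetFun_CCycdb.pl.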

 

Depending on the tools used, you may also want to cite:

DIAMOND: Buchfink B, Xie C, Huson D H. Fast and sensitive protein alignment using DIAMOND[J]. Nature methods, 2015, 12(1): 59-60.

BLASTX: Boratyn G M, Camacho C, Cooper P S, et al. BLAST: a more efficient report with usability improvements[J]. Nucleic acids research, 2013, 41(W1): W29-W33.

USEARCH: Edgar R C. Search and clustering orders of magnitude faster than BLAST[J]. Bioinformatics, 2010, 26(19): 2460-2461.

CSVTK: Csvtk—CSV/TSV Toolkit. Available online: https://bioinf.shenwei.me/csvtk/

SEQKIT: Shen W, Le S, Li Y, et al. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation[J]. PloS one, 2016, 11(10): e0163962.