Introduction to AcaFinder

AcaFinder is the first automated genome mining tool for reliable screening of anti-CRISPR-associated proteins (Acas). To identify Acas more confidently in a genome or metagenome-assembled genome, AcaFinder implements two approaches. The first approach is based on guilt-by-association (GBA): we first identify homologs of anti-CRISPR proteins (Acrs) and then search for helix-turn-helix (HTH)-containing proteins in the acr gene neighborhood. The second approach builds an HMM (hidden Markov model) database from training data of the 12 known Aca families, and then searches for Aca homologs with this Aca-HMMdb instead of Pfam HTH HMMs. In addition to these two approaches, AcaFinder integrates a CRISPR-Cas search tool (CRISPRCasTyper), a prophage search tool (VIBRANT), and an in-house self-targeting spacer searching tool (STSS), providing users with detailed information vital to the assessment of Aca predictions.

Workflow of AcaFinder

Step 1) The input faa file will be used as the query for a DIAMOND blastp search against the built-in dbAcr to find Acr homologs (coverage > 60%, E-value < 1e-3). Once Acr homologs are determined, the input fna file will be scanned for short-gene-operons (SGOs) using the following criteria: (i) all genes < Acr homolog length (99.9% of Acrs in dbAcr are shorter than 200 aa); (ii) all intergenic distances < 250 bp; (iii) all genes on the same strand; and (iv) at least one Acr homologous gene in the SGO.
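The Step 1 criteria can be illustrated with a minimal Python sketch. This is not AcaFinder's actual code: the `Gene` class, `passes_acr_filter`, and `find_sgos` are hypothetical names, the 200 aa default is borrowed from the parenthetical note on dbAcr and is only an assumption about how criterion (i) is applied, and the text does not say whether coverage refers to the query or the subject. The sketch also assumes all genes come from a single contig.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    gene_id: str
    start: int          # 1-based start coordinate on the contig (bp)
    end: int            # 1-based inclusive end coordinate (bp)
    strand: str         # "+" or "-"
    length_aa: int      # protein length in amino acids
    is_acr_homolog: bool = False


def passes_acr_filter(coverage_pct: float, evalue: float) -> bool:
    """Thresholds quoted in Step 1: coverage > 60%, E-value < 1e-3."""
    return coverage_pct > 60.0 and evalue < 1e-3


def find_sgos(genes: List[Gene], max_len_aa: int = 200,
              max_gap_bp: int = 250) -> List[List[Gene]]:
    """Group genes of one contig into candidate short-gene-operons."""
    sgos: List[List[Gene]] = []
    run: List[Gene] = []

    def close_run(current: List[Gene]) -> None:
        # Criterion (iv): keep the run only if it holds an Acr homolog.
        if any(g.is_acr_homolog for g in current):
            sgos.append(current)

    for gene in sorted(genes, key=lambda g: g.start):
        # Criterion (i): a long gene ends the current run and is skipped.
        if gene.length_aa >= max_len_aa:
            close_run(run)
            run = []
            continue
        if run:
            gap = gene.start - run[-1].end - 1
            # Criteria (ii) and (iii): short gap and same strand.
            if gene.strand != run[-1].strand or gap >= max_gap_bp:
                close_run(run)
                run = []
        run.append(gene)
    close_run(run)
    return sgos


example_genes = [
    Gene("g1", 100, 400, "+", 100),
    Gene("g2", 450, 800, "+", 116, is_acr_homolog=True),
    Gene("g3", 1200, 1500, "+", 100),                      # gap too large
    Gene("g4", 1550, 1900, "-", 116, is_acr_homolog=True),  # strand switch
    Gene("g5", 1950, 3000, "-", 350),                      # too long
]
for sgo in find_sgos(example_genes):
    print([g.gene_id for g in sgo])
```

With this toy input, g1 and g2 form one SGO and g4 forms a second, single-gene SGO. Whether a single-gene SGO should be retained is not stated in the text, so the sketch applies no minimum size.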

Step 2) Each SGO will be scanned for HTH-containing genes with hmmscan, using either the HTH-HMMdb or its subset HTH-HMM_strictdb as the database.

Step 3) HTH-containing proteins from SGOs will be output as candidate Acas. All non-Aca and non-Acr genes in SGOs will be further annotated against the Pfam database using PfamScan.
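The split described in Step 3 can be sketched as a simple partition of the genes in one SGO. The function name `partition_sgo` and its arguments are hypothetical, and the sketch assumes that a gene already identified as an Acr homolog is not reported as a candidate Aca even if it carries an HTH domain; the text does not specify this case.

```python
from typing import Iterable, List, Set, Tuple


def partition_sgo(gene_ids: Iterable[str], acr_ids: Set[str],
                  hth_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (candidate Acas, genes left for Pfam annotation)."""
    candidate_acas: List[str] = []
    to_annotate: List[str] = []
    for gene_id in gene_ids:
        if gene_id in acr_ids:
            continue  # Acr homologs are neither candidates nor re-annotated
        if gene_id in hth_ids:
            candidate_acas.append(gene_id)  # HTH-containing, so candidate Aca
        else:
            to_annotate.append(gene_id)     # non-Aca, non-Acr gene
    return candidate_acas, to_annotate


acas, others = partition_sgo(["a", "b", "c", "d"],
                             acr_ids={"b"}, hth_ids={"c"})
print(acas, others)
```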

In addition to the identification of Acas and Acr-Aca operons, AcaFinder will also scan the input fna file for prophages, CRISPR-Cas systems, and self-targeting spacers (STSs).