Predicted Acas

Using AcaFinder (with the “--HTH-HMM_strictdb” flag) and the GBA approach, Acas were screened from 15201 complete bacterial and 961 archaeal genomes of the NCBI RefSeq database, and from 142809 viral contigs of the human Gut Phage Database (GPD). To further increase the quality of the predictions, we restricted predicted Acas from prokaryotes to those found in genomes with complete CRISPR-Cas systems and located within prophage regions. To group the predicted Acas into potential families, they were clustered with CD-HIT at a 40% sequence identity threshold, on the basis that proteins above this threshold are more likely to share structural and/or functional similarities. Only clusters with more than 3 members were retained, which filters out singletons and other small clusters. Together, these steps yielded a total of 1422 Aca clusters, which can be considered to have high potential of representing novel Aca families. Information on the 1422 clusters and all associated protein sequences is provided in the table and protein FASTA file below.

  • 1422 predicted Aca cluster table, download here.
  • 1422 Aca cluster associated protein sequences, download here.
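
The cluster-size filter described above can be illustrated with a short sketch. It is not the code used to build this dataset; it only assumes that the clustering output is available as text in the `.clstr` layout that CD-HIT typically writes (a `>Cluster N` header followed by one line per member, with the sequence ID between `>` and `...`). The function names `parse_clstr` and `keep_large_clusters` and the sequence IDs in the demo are hypothetical.

```python
import re
from typing import Dict, List

# Assumed member-line layout: "0<TAB>283aa, >seq_id... *"
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(text: str) -> Dict[str, List[str]]:
    """Map each cluster name (e.g. 'Cluster 0') to its member sequence IDs."""
    clusters: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line: start a new, empty cluster.
            current = line[1:]
            clusters[current] = []
        elif current is not None:
            # Member line: pull out the sequence ID, skip lines that do not match.
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def keep_large_clusters(
    clusters: Dict[str, List[str]], min_exclusive: int = 3
) -> Dict[str, List[str]]:
    """Keep clusters with strictly more than `min_exclusive` members."""
    return {
        name: members
        for name, members in clusters.items()
        if len(members) > min_exclusive
    }


if __name__ == "__main__":
    # Hypothetical toy input: one 4-member cluster and one singleton.
    demo = "\n".join(
        [
            ">Cluster 0",
            "0\t120aa, >acaA... *",
            "1\t118aa, >acaB... at 62.10%",
            "2\t121aa, >acaC... at 55.40%",
            "3\t119aa, >acaD... at 48.70%",
            ">Cluster 1",
            "0\t95aa, >acaE... *",
        ]
    )
    for name, members in keep_large_clusters(parse_clstr(demo)).items():
        print(name, len(members), ",".join(members))
```

With the default `min_exclusive=3`, a cluster needs at least 4 members to be kept, which matches the "more than 3" criterion stated in the text.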