Amplicon sequence variant

An amplicon sequence variant (ASV) is a single DNA sequence recovered from a high-throughput marker gene analysis. ASVs are obtained after the erroneous sequences generated during PCR and sequencing have been removed, which allows them to distinguish sequence variation down to a single nucleotide change. ASVs are used to classify groups of species based on DNA sequences, to find biological and environmental variation, and to determine ecological patterns.

For many years the standard unit for marker gene analysis was the operational taxonomic unit (OTU), which is generated by clustering sequences based on a shared similarity threshold. These traditional units are constructed either by clustering sequencing reads according to their similarity to one another (de novo OTUs) or by clustering reads against a reference database that defines and labels each OTU (closed-reference OTUs). Rather than resolving exact sequence variants (single nucleotide changes), OTUs are delimited by a user-chosen dissimilarity threshold, most commonly 3%, so the sequences within one unit have to share 97% of their DNA sequence.

ASV methods, by contrast, resolve sequence differences of as little as a single nucleotide, which lets them avoid similarity-based operational clustering units altogether. ASVs therefore provide a more precise measurement of sequence variation, because they reflect differences in the DNA itself rather than user-created OTU boundaries. ASVs are also referred to as exact sequence variants (ESVs), zero-radius OTUs (zOTUs), sub-OTUs (sOTUs), haplotypes, or oligotypes.^[1] ^[2]
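The contrast between a 97% similarity threshold and single-nucleotide resolution can be illustrated with a minimal Python sketch. It assumes two pre-aligned reads of equal length and measures identity as the fraction of matching positions; real OTU pipelines use alignment-based identity and a clustering procedure, and the function names and example reads here are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def same_otu(a: str, b: str, threshold: float = 0.97) -> bool:
    """Threshold-based grouping: reads at or above the similarity cut-off share a unit."""
    return identity(a, b) >= threshold


def same_asv(a: str, b: str) -> bool:
    """Exact-sequence grouping: any single-nucleotide difference separates the reads."""
    return a == b


# Two hypothetical 100-nucleotide reads that differ at exactly one position.
seq_a = "ACGT" * 25
seq_b = seq_a[:50] + "A" + seq_a[51:]  # position 50 is "G" in seq_a

print(identity(seq_a, seq_b))   # 0.99
print(same_otu(seq_a, seq_b))   # True: both reads fall into one 97% OTU
print(same_asv(seq_a, seq_b))   # False: they are two distinct ASVs
```

With a 3% threshold, a 100-nucleotide read can differ from its neighbor at up to three positions and still be placed in the same unit, which is the variation that exact-sequence methods keep separate.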

A comparison of ASVs and OTUs. The chart uses a check mark to indicate whether each marker-gene analysis method is precise, traceable, reproducible, or comprehensive.

This graph shows a real sequence that was sequenced over a hundred times. The black dots form the error cloud, with the Y-axis showing how many times each specific error appeared in the data set. The red vertical line represents the 3% cut-off: everything to the right of this line is treated as new biology and everything to the left as error. This demonstrates how errors or new biology can be missed when using OTUs, since OTUs absorb both into the 3% dissimilarity threshold.

This is the same real sequence, sequenced over a hundred times, as in the graph above. The black dots again form the error cloud, with the Y-axis showing how many times each specific error appeared in the data set. Here the diagram shows how ASVs keep the errors associated with OTUs out of the data set: dots below the curved black line are treated as errors, and dots above it as new biology. This means that ASVs are more exact in measuring differences among sequences.

This diagram shows how OTUs pick up erroneous amplicon reads created during PCR and sequencing. When sequences are grouped into clustered units, these errors are picked up and placed into the units as well. OTUs therefore capture a wider set of data points and can accidentally group two distinct DNA sequences into the same unit, as seen here: only two colors (DNA sequences) are recovered as OTUs instead of four.

This diagram shows how ASVs remove and correct errors from PCR, in contrast to the OTU diagram above. ASVs are able to create a group for each of the four colors (DNA sequences) observed, which makes them more precise in finding sequence variation.

OTU benefits

Although ASVs allow a more precise and accurate measurement of sequence variation, OTUs remain an acceptable and valuable approach. In a study by Glassman and Martiny, the researchers demonstrated the validity of OTUs for broad-scale diversity analyses. They concluded that OTUs and ASVs gave similar ecological results, with ASVs enabling slightly stronger detection of fungal and bacterial diversity. The study showed that, even though ASVs now allow a more accurate measurement of species diversity, scientists should not question the validity of well-constructed studies in which OTUs were used to demonstrate broad-scale diversity patterns.^[3]

ASV benefits

The introduction of ASV methods spurred a debate among researchers regarding their utility. Some have argued that ASVs should replace OTUs in marker gene analysis. Arguments in favor of ASVs focus on the precision, tractability, reproducibility and comprehensiveness that they bring to marker gene analysis. Finer sequence resolution (precision) and the ability to compare sequences easily between different studies (tractability and reproducibility) make ASVs the better option for analyzing sequence differences.

The units within OTUs can change between researchers, experiments, and databases, because they are operational units and depend on who chose the similarity threshold and on the data being clustered. ASVs, by contrast, are exact nucleotide sequences, so differences between past and present experiments can be traced more easily to biological differences rather than to differences in unit clustering. In practice, researchers can compare new results directly with their own data from two years earlier: ASVs do not depend on clusters biased by a database or a researcher, but represent detectable biological variation with consistent labels across all data sets. For the same reason, ASV tables capture sequence variation more precisely and comprehensively than OTU tables, whose operational units vary between experiments, researchers, and reference databases. Although the validity of OTUs has been demonstrated, ASVs are more precise, reusable, comprehensive, and reproducible for marker gene sequencing.^[4] ^[5]
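The consistent-labeling argument can be made concrete with a small Python sketch. Because an ASV is identified by its exact sequence, count tables from independent studies can be combined by using the sequence itself as the key, with no re-clustering step. The table layout, function name, and counts below are hypothetical and only illustrate the idea.

```python
from collections import Counter


def merge_asv_tables(*tables: dict) -> dict:
    """Sum read counts across studies, keyed by the exact ASV sequence."""
    merged = Counter()
    for table in tables:
        merged.update(table)  # Counter.update adds counts for matching keys
    return dict(merged)


# Hypothetical ASV tables from two independent studies (sequence -> read count).
study_a = {"ACGTACGT": 120, "ACGTACGA": 15}
study_b = {"ACGTACGT": 80, "TTGTACGA": 7}

merged = merge_asv_tables(study_a, study_b)
print(merged)  # {'ACGTACGT': 200, 'ACGTACGA': 15, 'TTGTACGA': 7}
```

De novo OTU labels such as "OTU_1" carry no such meaning outside the data set in which they were clustered, so the same merge would first require clustering the pooled reads again.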

ASV methods

Popular methods for resolving ASVs include DADA2,^[6] Deblur,^[7] MED,^[8] and UNOISE.^[9] These methods work broadly by generating an error model tailored to an individual sequencing run and employing algorithms that use the model to distinguish between true biological sequences and those generated by error.
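The general idea of using an error model to separate true sequences from errors can be sketched in Python as follows. This is a toy illustration and not the algorithm of any of the tools named above: it assumes a single fixed per-base substitution rate instead of a model fitted to the run, equal-length reads, and a simple Poisson test of whether a sequence is too abundant to be explained as an error of a more abundant sequence. All names and parameter values (`denoise`, `error_rate`, `alpha`) are hypothetical.

```python
import math


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def denoise(counts: dict, error_rate: float = 0.001, alpha: float = 1e-6) -> dict:
    """Greedy toy denoiser: visit sequences from most to least abundant."""
    variants = {}  # accepted sequence -> corrected read count
    for seq, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        explained_by, best_lam = None, 0.0
        for parent in variants:
            d = hamming(seq, parent)
            # Expected number of reads of `seq` produced as errors of `parent`.
            lam = (counts[parent] * (error_rate / 3) ** d
                   * (1 - error_rate) ** (len(seq) - d))
            if poisson_tail(n, lam) >= alpha and lam > best_lam:
                explained_by, best_lam = parent, lam
        if explained_by is None:
            variants[seq] = n            # too abundant to be an error: new variant
        else:
            variants[explained_by] += n  # consistent with error: fold into parent
    return variants


def mutate(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the base at pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


true_a = "ACGT" * 5
true_b = mutate(true_a, 10, "T")  # real variant, one nucleotide from true_a
error = mutate(true_a, 3, "A")    # rare read, one nucleotide from true_a

result = denoise({true_a: 10000, true_b: 400, error: 3})
print(result)
```

In this example the sequence seen 400 times is far more abundant than the assumed error rate can explain, so it is kept as a second variant, while the sequence seen 3 times is consistent with error and its reads are assigned to the abundant sequence it most likely came from.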

References

[1] Porter, Teresita M.; Hajibabaei, Mehrdad (2018). "Scaling up: A guide to high-throughput genomic approaches for biodiversity analysis". Molecular Ecology. 27 (2): 313–338. doi:10.1111/mec.14478. ISSN 1365-294X. PMID 29292539.
[2] Callahan, Benjamin J.; McMurdie, Paul J.; Holmes, Susan P. (December 2017). "Exact sequence variants should replace operational taxonomic units in marker-gene data analysis". The ISME Journal. 11 (12): 2639–2643. doi:10.1038/ismej.2017.119. ISSN 1751-7370. PMC 5702726.
[3] Glassman, Sydney I.; Martiny, Jennifer B. H. (29 August 2018). "Broadscale Ecological Patterns Are Robust to Use of Exact Sequence Variants versus Operational Taxonomic Units". mSphere. 3 (4). doi:10.1128/mSphere.00148-18. ISSN 2379-5042.
[4] Callahan, Benjamin J.; McMurdie, Paul J.; Holmes, Susan P. (December 2017). "Exact sequence variants should replace operational taxonomic units in marker-gene data analysis". The ISME Journal. 11 (12): 2639–2643. doi:10.1038/ismej.2017.119. ISSN 1751-7370. PMC 5702726.
[5] Callahan, Benjamin J.; McMurdie, Paul J.; Holmes, Susan P. (December 2017). "Exact sequence variants should replace operational taxonomic units in marker-gene data analysis". The ISME Journal. 11 (12): 2639–2643. doi:10.1038/ismej.2017.119. ISSN 1751-7370. PMC 5702726.
[6] Callahan, Benjamin J.; McMurdie, Paul J.; Rosen, Michael J.; Han, Andrew W.; Johnson, Amy J.; Holmes, Susan P. (2015-08-06). "DADA2: High resolution sample inference from amplicon data". bioRxiv (preprint). doi:10.1101/024034.
[7] Amir, Amnon; McDonald, Daniel; Navas-Molina, Jose A.; Kopylova, Evguenia; Morton, James T.; Zech Xu, Zhenjiang; Kightley, Eric P.; Thompson, Luke R.; Hyde, Embriette R. (2017-04-25). Gilbert, Jack A. (ed.). "Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns". mSystems. 2 (2). doi:10.1128/mSystems.00191-16. ISSN 2379-5077. PMC 5340863. PMID 28289731.
[8] Eren, A. Murat; Morrison, Hilary G.; Lescault, Pamela J.; Reveillaud, Julie; Vineis, Joseph H.; Sogin, Mitchell L. (2014-10-17). "Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences". The ISME Journal. 9 (4): 968–979. doi:10.1038/ismej.2014.195. ISSN 1751-7362. PMC 4817710. PMID 25325381.
[9] Edgar, Robert C. (2016-10-15). "UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing". bioRxiv (preprint). doi:10.1101/081257.
