Nucleic acid nomenclature
Molecular biologists use several shorthand conventions when referring to nucleic acid molecules such as DNA and RNA. These conventions are collectively referred to as nucleic acid nomenclature.
The most common is the representation of the bases as single letters: an adenine nucleotide is abbreviated as A, guanine as G, cytosine as C, thymine as T, and, in RNA, uracil as U.[1]
Additionally, the carbon atoms of the sugar in the backbone of the nucleic acid chain (ribose in RNA, deoxyribose in DNA) are numbered, and these numbers are used to indicate the direction of a nucleic acid strand (5'→3' versus 3'→5'). This is referred to as directionality.[1]
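As a small illustration of directionality, the sketch below rewrites a strand as read from its opposite end. The function name `flip_direction` and the labelled format `5'-GATC-3'` are assumptions made for this example, not part of any standard named in this article.

```python
def flip_direction(labelled: str) -> str:
    """Rewrite a labelled strand such as "5'-GATC-3'" as read from its other end.

    Assumes the hypothetical format <end>-<letters>-<end> with exactly two hyphens.
    """
    start, letters, end = labelled.split("-")
    # Reading the same strand from the other end reverses the letter order
    # and swaps the two end labels; the molecule itself is unchanged.
    return f"{end}-{letters[::-1]}-{start}"


print(flip_direction("5'-GATC-3'"))
```

Both strings describe the same strand; only the reading direction differs.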
Expanded letter code
In addition to the conventional G, A, T, and C symbols, there is an expanded letter code for indicating a position in a sequence that may be occupied by more than one nucleotide.[1]
Letter | Nucleotide(s) included |
---|---|
A | A |
T | T |
G | G |
C | C |
U | U |
R | G or A |
Y | T or C |
M | A or C |
K | G or T |
S | G or C |
W | A or T |
H | A or C or T |
B | G or T or C |
V | G or C or A |
D | G or T or A |
N | G or T or A or C |
For example, if the sequences known to bind protein X are AAAAAAGAAA, AAAAAACAAA, AAAAAATAAA, and AAAAAAAAAA, they can be written collectively as AAAAAANAAA.
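The table and the example above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `IUPAC`, `BASES_TO_CODE`, `matches`, and `consensus` are chosen for this example, and the ambiguity letters are assumed to apply to DNA sequences written with T, exactly as listed in the table.

```python
from typing import List

# Expanded letter code from the table above: letter -> nucleotides it includes.
IUPAC = {
    "A": "A", "T": "T", "G": "G", "C": "C", "U": "U",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GTA", "N": "GTAC",
}

# Reverse lookup: set of nucleotides -> the single letter that stands for it.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if each nucleotide in `sequence` is allowed by `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, sequence))


def consensus(sequences: List[str]) -> str:
    """Collapse equal-length sequences into one expanded-code string."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*sequences) walks the sequences column by column (position by position).
    return "".join(BASES_TO_CODE[frozenset(column)] for column in zip(*sequences))


binding_sites = ["AAAAAAGAAA", "AAAAAACAAA", "AAAAAATAAA", "AAAAAAAAAA"]
print(consensus(binding_sites))
print(matches("AAAAAANAAA", "AAAAAACAAA"))
```

Here `consensus` goes from known sequences to the shorthand, while `matches` goes the other way and checks whether a concrete sequence is covered by a shorthand pattern.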
Triple Helix Base Pairing
Watson and Crick base pairs are indicated by a "•" or a "-" or a "." (example: A•T, or poly(rC)•2poly(rC)).
Hoogsteen triple helix base pairs are indicated by a "*" or a ":" (example: C•G*G+, or T•A*T, or C•G*G, or T•A*A).
See also
- DNA replication
- Nucleic acids
- Nucleotides
References
1. Cornish-Bowden A (May 1985). "Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984". Nucleic Acids Research. 13 (9): 3021–30. doi:10.1093/nar/13.9.3021. PMC 341218. PMID 2582368.