Genetics: Glossary of terms

Authors: Benjamin A Raby, MD, MPH; Robert D Blank, MD, PhD
Section Editor: Anne Slavotinek, MBBS, PhD
Deputy Editor: Jennifer S Tirnauer, MD

Literature review current through: Dec 2022. | This topic last updated: Jun 01, 2022.

INTRODUCTION — One of the greatest obstacles clinicians experience in reading about and understanding genetics is the extensive use of technical language and jargon. It should be noted that genetic terms are frequently used imprecisely in published clinical literature. The following is a compilation of some of the most important technical terms.

A more extensive discussion of terms can be accessed in standard genetics reference texts [1]. In addition, a guide for the conventions regarding the proper names of genes and alleles in humans can be found at www.genenames.org/about/guidelines/.

Glossaries of epidemiological terms and terms that apply to systematic reviews and meta-analyses are presented separately in UpToDate. (See "Glossary of common biostatistical and epidemiological terms" and "Systematic review and meta-analysis", section on 'Glossary of terms'.)

DEFINITIONS

Allele — An allele is one of a series of alternative forms (genotypes) at a locus (a specific region of a chromosome). At the DNA level, different alleles have different base sequences.

Allelic fraction — The allelic fraction can be defined as the number of times a mutated base is observed, divided by the total number of times any base is observed at the locus [2]. Allelic fraction is generally applied to a single mutation in a tumor, and thus is distinct from allele frequency, which describes the frequency of an allele in a population (see 'Allele frequency' below). Mutation fraction can be defined as the ratio between mutant and wild-type alleles in a tumor sample.
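
As a rough illustration of the definitions above, the following Python sketch computes an allelic fraction and a mutant-to-wild-type ratio from observation counts. The function names and example counts are hypothetical, and treating sequencing read counts as the "number of times a base is observed" is an assumption made for illustration only.

```python
def allelic_fraction(mutant_observations: int, total_observations: int) -> float:
    """Mutated-base observations divided by all observations at the locus."""
    if total_observations <= 0:
        raise ValueError("total_observations must be positive")
    if not 0 <= mutant_observations <= total_observations:
        raise ValueError("mutant_observations must lie between 0 and the total")
    return mutant_observations / total_observations


def mutant_to_wildtype_ratio(mutant_observations: int, wildtype_observations: int) -> float:
    """Ratio between mutant and wild-type alleles in a single sample."""
    if mutant_observations < 0:
        raise ValueError("mutant_observations must not be negative")
    if wildtype_observations <= 0:
        raise ValueError("wildtype_observations must be positive")
    return mutant_observations / wildtype_observations


# Hypothetical tumor sample: 30 mutant observations out of 120 at one locus.
fraction = allelic_fraction(30, 120)        # 0.25
ratio = mutant_to_wildtype_ratio(30, 90)    # 30 mutant : 90 wild-type
print(fraction, round(ratio, 3))
```

Note that the two quantities use different denominators (all observations versus wild-type observations only), so they are not interchangeable.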

Allele frequency — The proportion of chromosomes, loci, or genes in a population harboring a specific allele. "Minor allele frequency" typically refers to the less common variant at a biallelic locus and is usually used to refer to the frequency of a single nucleotide polymorphism (SNP) (see 'Single nucleotide polymorphism (SNP)' below). This population frequency is distinguished from allelic ratio, which applies to a single person (eg, with a malignancy).
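
A minimal Python sketch of the population-level calculation described above, assuming diploid genotypes recorded as pairs of allele labels. The function names and the four example genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Proportion of chromosomes in the sample that carry each allele."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one genotype is required")
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele at a biallelic locus."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) != 2:
        raise ValueError("minor allele frequency is defined here for biallelic loci only")
    return min(freqs.values())


# Hypothetical sample of four people (eight chromosomes) at one biallelic locus.
sample = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A")]
print(allele_frequencies(sample))        # A: 5 of 8 chromosomes, G: 3 of 8
print(minor_allele_frequency(sample))    # 0.375
```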

Allelic heterogeneity — Allelic heterogeneity refers to the common occurrence of multiple pathogenic variants in one gene that all result in the same disease or syndrome. As an example, more than 1500 variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene cause cystic fibrosis. Note that this term differs from genetic heterogeneity, in which variants in multiple genes can cause the same disease phenotype. (See 'Genetic heterogeneity' below.)

Allelic ratio — Allelic ratio measures the relative abundance of mutated to normal or wildtype alleles within a tumor. Higher allelic ratios (ie, a greater fraction of mutant alleles) have been reported to be associated with poorer prognosis. Unlike allele frequency, which is a characteristic of a population (see 'Allele frequency' above), allelic ratio is a property of cells within a tumor in a single individual. Allelic ratio is of necessity an inexact concept because it is rare (for solid tumors at least) to avoid substantial contamination by non-tumor cells from blood, stroma, and vasculature. Amplification of mutant sequences in a tumor can also have a large impact on allelic ratios.

Aneuploidy — The state of having an abnormal number of chromosomes. A euploid human karyotype has 46 chromosomes (figure 1). Aneuploidy can affect the entire somatic cell population, as in trisomy 21, or it can affect a subset of cells, as in a tumor.

Anticipation — A phenomenon whereby the symptoms of a genetically-based condition appear at an earlier age, or with greater severity, in successive generations. Expansion of trinucleotide repeats is a known molecular cause for specific diseases (such as myotonic dystrophy, fragile X syndrome, Huntington’s chorea) that manifest anticipation.

Association — Genetic association is a property of alleles. It refers to the non-random relationship between an allele and a phenotype in a population. Genetic association between a marker allele and a phenotype can result because the allele is a direct causal variant, because the allele is in linkage disequilibrium with (ie, segregates with) a causal variant in close proximity, or because of stratification of the population. Association may be determined in a genome-wide association study. (See 'Genome-wide association study (GWAS)' below.)

Autosome — A chromosome other than X or Y. The human genome has 44 autosomes (22 pairs of autosomes) (figure 1).

Autosomal — A gene is autosomal if it is located on an autosome rather than a sex chromosome. A gene's inheritance pattern is also referred to as autosomal if the pattern corresponds to that of known autosomal genes (rather than sex-linked). (See 'Sex-linked' below.)

Benign variant — (See 'Variant' below.)

Biome — Humans are colonized by a multitude of microorganisms, which vary by age and location in the body. The biome (or microbiome) is the totality of colonizing microorganisms in a specific environmental milieu. Biomes may be studied genetically using metagenomics. (See 'Metagenomics' below.)

Carrier — An individual who is heterozygous for a risk or disease allele. The term is typically used to describe someone who is heterozygous for a gene variant that causes autosomal recessive or X-linked recessive disease, but in clinical discussions, it is also used to describe heterozygotes for risk alleles or deleterious alleles that predispose to disease, regardless of inheritance type.

Carrier rate — The frequency of carriers in a population.

Carrier testing — Clinical method of genotyping at risk populations or family members to identify individuals, usually asymptomatic, who have a pathogenic or likely pathogenic variant for an autosomal recessive or X-linked disorder. One example is prenatal screening for Tay-Sachs disease-associated variants in people of Ashkenazi Jewish ancestry. (See "Genetic testing" and "Preconception and prenatal carrier screening for genetic disease more common in people of Ashkenazi Jewish descent and others with a family history of these disorders".)

Cascade testing — Refers to a testing approach in which at-risk first-degree relatives are tested for a familial genetic variant; if these individuals test positive, then their first-degree relatives are tested. Cascade testing allows testing to be focused on a specific variant and reduces unnecessary testing of relatives who are not at risk.

Centromere — A condensed chromosome region that mediates attachment of chromosomes to the microtubules of the mitotic or meiotic spindle. The centromere is important in preserving normal chromosome number.

Chimerism — When referring to an individual, a state in which two or more populations of genetically distinct cells are present that arose from the fusion of two or more fertilized eggs. Contrasted with mosaicism. (See 'Mosaicism' below.)

Also used in patients post-allogeneic hematopoietic stem cell transplant to refer to a state of two genetically distinct populations of hematopoietic cells (one from the donor and one from the recipient). (See "Hematopoietic support after hematopoietic cell transplantation", section on 'Chimerism'.)

Chromatid — One of two replications, or copies, of a chromosome formed prior to cell division and joined together at their centromeres. The centromere is the last portion of a chromosome to replicate during cell division. Sister chromatids are a pair of chromatids attached at the centromere.

Chromatin — A complex structure composed of DNA, RNA, and proteins that facilitates efficient packaging of DNA in cells. The primary structure of chromatin is the nucleosome, consisting of double-stranded DNA coiled around a core of histone proteins. Nucleosomes packed tightly together form a "bead-on-string" configuration, which in turn assembles in hierarchical looping structures to create densely-packaged chromatin. The regulation of gene transcription is governed by the uncoiling of packed chromatin (heterochromatin) into exposed DNA (euchromatin). (See "Principles of epigenetics".)

Clonal — Arising from a single clone, or cell. Examples include clonal selection of lymphocytes during immune development and clonal origin of leukemia cells or other tumor cells. (See "Immunoglobulin genetics" and "Pathogenesis of acute myeloid leukemia".)

Cloning — Production of a genetically identical copy. Can refer to a single gene or to an entire organism.

Coding region — Portion of a gene that encodes a protein.

Coding mutation or polymorphism — A genetic variation in the open reading frame (protein-encoding region) of a gene. Coding variants that alter amino acid composition of a protein are called non-synonymous or missense variants (figure 2). Variants that do not alter amino acid composition are called synonymous variants. Nonsense variants are coding variants that result in the introduction of a stop codon (figure 3). A frameshift mutation results from an insertion or deletion of a number of bases not divisible by three, resulting in shifting of the reading frame (figure 4). Variants are also classified according to their pathogenicity. (See 'Variant' below.)

Codon — A three-nucleotide sequence that codes either for a specific amino acid or for chain initiation or termination during protein synthesis.

Complementation — The restoration of normal phenotype by gene replacement. The replaced gene can either be an intact copy of a defective gene (direct replacement), or an alternate gene with function that can compensate for the defective gene's aberrant function.

Complex trait/complex disease — Trait or disease for which interactions among more than one gene and/or environmental factors play a role in the phenotype.

Compound heterozygote — An individual bearing two different pathogenic variants in the same gene that together are sufficient to manifest an autosomal recessive phenotype. This differs from "homozygote," which refers to an individual in whom both pathogenic variants are the same, and from "double heterozygote," which refers to an individual who is heterozygous for pathogenic variants at two separate genetic loci, which together manifest disease. The inheritance pattern for double heterozygosity is referred to as digenic inheritance. (See 'Digenic inheritance' below.)

Consanguinity — Reproduction between two individuals from the same bloodline (eg, first cousin, second cousin). Consanguineous parentage increases the probability of a rare recessive disease, resulting from higher probability of both parents sharing the same rare deleterious sequence variant.

Copy number variation (CNV) — The most prevalent type of chromosomal structural variation, in which the number of copies of a large chromosomal or DNA segment (usually measuring thousands to millions of bases) varies between individuals. (See "Genomic disorders: An overview", section on 'Copy number variations'.)

Coupling — The presence of two specified alleles at two linked loci on the same homologous chromosome (ie, "in cis"), and the two alternative alleles on the other chromosome. For illustration, in the case of dominant and recessive alleles, the coupling gametes formed are AB and ab (figure 5). In contrast, repulsion refers to the presence of the specified alleles at two linked loci on different chromosomes (ie, in trans). (See 'Repulsion' below.)

CRISPR — CRISPR (clustered regularly interspaced short palindromic repeats; pronounced "crisper"; sometimes referred to as CRISPR-Cas9) is a component of a bacterial defense system that has been adapted for use in combination with an endonuclease such as Cas9 or Cpf1 for genome editing. (See 'Genome editing' below.)

Crossing-over — The exchange of chromosome segments through the process of recombination that occurs between two homologous chromosomes during meiosis. The site on the chromosome where the exchange occurs is called a crossover.

De novo mutation — A novel genetic sequence variant introduced by a germline mutation in the proband's DNA. Often used to distinguish familial from sporadic cases of genetic disease.

Digenic inheritance — Diseases caused by co-inheritance of variants (mutations) at two distinct genetic loci (ie, in two different genes). Individuals with digenic inheritance may also be called "double heterozygotes," which is distinct from compound heterozygotes (individuals with two different mutations in the same gene). (See 'Compound heterozygote' above.)

Diploid — Possessing two copies of each autosomal chromosome and two sex chromosomes. Most human cells are diploid. Hepatocytes are frequently polyploid (tetraploid or greater). Gametes are haploid (one copy of each autosome and one sex chromosome) (figure 1). (See 'Haploid' below and 'Ploidy' below.)

DNA — DNA (deoxyribonucleic acid) is the primary molecular constituent of chromosomes that stores the genetic information of most living organisms, including humans. The genetic information is encoded by the sequence of the four bases adenine, guanine, thymine, and cytosine. Adenine and guanine are purines; thymine and cytosine are pyrimidines. DNA is usually present as a double-stranded antiparallel polymer composed of an outer phosphodeoxyribose backbone with central nucleotide side chains (figure 6). The two strands are held together by hydrogen bonds that link each purine to a pyrimidine (between adenine and thymine and between guanine and cytosine). The energy of the hydrogen bonds is low enough to enable localized strand separation under physiologic conditions. (See "Basic genetics concepts: DNA regulation and gene expression", section on 'DNA and RNA'.)

DNA barcoding — A collection of methods developed to facilitate the analysis of complex mixtures of pooled samples, whereby short, unique DNA sequences (referred to as tags or barcodes) are added to each of the pooled DNA samples (eg, from distinct individuals). Barcoding is used routinely in next-generation sequencing applications, including single-cell RNA sequencing and exome sequencing.

Barcoding also refers to methods for determining the species of origin of a DNA sample on the basis of the DNA sequence itself. A clinical example is identification of the ingredients in an herbal preparation.

Dominant negative — Dominant negative alleles are alleles that cause an abnormal phenotype or disease by a mechanism that depends on the presence of an abnormal gene product interfering with the function of the products from a normal gene. In other words, the variant allele confers a loss of function by interfering with the remaining normal allele. In contrast to most loss-of-function variants that confer phenotype only when both alleles are defective (ie, recessive inheritance), dominant-negative mutations act dominantly, meaning that only a single allele with the mutation is sufficient to cause the disease phenotype.

Double heterozygote — An individual who is heterozygous for two mutations at two separate genetic loci that together are sufficient to manifest a phenotype. Differs from compound heterozygote.

Embryonic stem cell (ESC) — A pluripotential (pluripotent) cell derived from the inner cell mass of an early-stage embryo that is capable of differentiating into cells derived from all three germ layers. (See "Overview of stem cells".)

Enhancer — A region of DNA, upstream (5') or downstream (3') of a gene, that regulates gene expression. Enhancer function relies on binding of specific regulatory proteins (transcription factors). (See "Basic genetics concepts: DNA regulation and gene expression", section on 'Gene expression'.)

Enhancer hijacking — Use by one gene of another gene's enhancer. Often due to changes in three-dimensional genome structure that place one region of DNA adjacent to another region. Can explain the mechanism of certain diseases involving aberrant gene expression.

Epigenetic change — A modification of chromatin that does not alter the nucleotide base sequence, but alters the expression of a gene. Epigenetic changes may be stable in an individual, but may be reversed during gametogenesis or development. DNA methylation and histone acetylation are common epigenetic changes. Epigenetic changes form the mechanistic basis of imprinting. Some medications alter epigenetic regulation (eg, histone deacetylase [HDAC] inhibitors). (See "Principles of epigenetics".)

Epigenetic modifications are removed when cells are treated in the laboratory to generate induced pluripotent stem cells (iPSCs). (See 'Induced pluripotent stem cell (iPSC)' below.)

Epistasis — The process by which variations at two or more genetic loci interact to produce phenotypes different from the individual effects of each variant. This process is often referred to as either a gene-gene interaction or a genetic modifier effect.

Exome — The portion of the genome that consists of exons. (See 'Exon' below.)

Exome sequencing — A sequencing strategy that provides the DNA sequence corresponding to all exons (which represent approximately 1 to 2 percent of the genome), excluding introns and noncoding genomic sequence. Though the complete exome includes noncoding 5’ and 3’ untranslated regions (UTRs), most exome sequencing assays are enriched for the coding exons and largely exclude the noncoding regions.

Exon — A segment of DNA that is transcribed and present in mature messenger RNA (mRNA). Many exons encode a portion of a protein, but noncoding exons also exist. This is in contrast to an intron, the DNA sequence between exons that does not become part of mature mRNA. Exons constitute only a small percent of the genome (about 1 to 2 percent).

Expressivity — A parameter used in genetic models that quantifies the degree to which an inherited characteristic is expressed in an organism.

Frameshift mutation — A frameshift mutation is a change in DNA sequence that results from an insertion or deletion of a number of bases that is not divisible by three, resulting in a shift of the reading frame (figure 4) and thus altering synthesis of the protein.
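
The "not divisible by three" rule can be sketched in a few lines of Python. The helper names and the 12-base example sequence are hypothetical and chosen only to show how a one-base deletion changes every downstream codon while a three-base deletion does not.

```python
def is_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length is a multiple of three."""
    if indel_length <= 0:
        raise ValueError("indel_length must be a positive number of bases")
    return indel_length % 3 != 0


def codons(sequence: str):
    """Split a coding sequence into complete triplets, dropping any trailing partial codon."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


reference = "ATGGCCAAGTAA"                    # ATG GCC AAG TAA
one_base_deleted = reference[:3] + reference[4:]   # deletion of 1 base
three_bases_deleted = reference[:3] + reference[6:]  # deletion of 3 bases

print(codons(reference))            # ['ATG', 'GCC', 'AAG', 'TAA']
print(codons(one_base_deleted))     # ['ATG', 'CCA', 'AGT']  (frame shifted, stop codon lost)
print(codons(three_bases_deleted))  # ['ATG', 'AAG', 'TAA']  (one codon removed, frame kept)
```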

Fusion gene — A fusion gene is a functional gene product that results from the fusion of DNA segments from two physically distinct genes. The fusion occurs as a consequence of chromosomal rearrangements such as translocations, inversions, segmental deletions, or duplications. Examples include the BCR-ABL and the FIP1L1-PDGFRA oncogenes.

Gene — A gene is a unit of DNA sequence that encodes specific function. Classical definitions limit genes to those elements that code for proteins. However, non-protein coding genes (such as noncoding RNAs or pseudogenes) are also genes.

Gene editing — Gene editing refers to the use of nucleases to alter the DNA sequence of a gene, as discussed in more detail below. (See 'Genome editing' below.)

Genetic heterogeneity — Genetic heterogeneity refers to a phenomenon in which variants in different genes result in the same phenotype or disease. Examples include the multiple genetic causes of sensorineural deafness. This differs from allelic heterogeneity, in which multiple variants in the same gene can lead to the same phenotype. (See 'Allelic heterogeneity' above.)

Genetic polymorphism — A genetic polymorphism is a DNA segment for which two or more alternate forms can be found in a population. The common types of polymorphisms include single nucleotide variants (single base pair changes, also called single nucleotide polymorphisms [SNPs]), indels (insertion/deletion polymorphisms), and larger structural changes such as copy number variants. Most commonly, genetic polymorphism refers to a common single base-pair change or single nucleotide polymorphism (SNP). (See 'Polymorphism' below and 'Single nucleotide polymorphism (SNP)' below.)

Genetic risk score — An estimate of an individual's genetic risk for a specific polygenic phenotype [3]. Genetic risk scores are calculated using the cumulative contribution of all known risk alleles carried by the individual. This is in contrast to polygenic risk scores, which model genetic risk using a larger number of loci, including many that do not meet genome-wide significance criteria in association studies. (See "Principles of complex trait genetics" and 'Polygenic risk (PGR) score' below.)
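
One simple way to picture a "cumulative contribution" is a sum over known risk alleles, sketched below in Python. This is an illustration under stated assumptions rather than the method of any particular study: the variant identifiers are hypothetical, and the optional per-allele weights (for example, effect sizes from an association study) are an assumption not specified in the definition above.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum the contributions of risk alleles carried by one individual.

    risk_allele_counts maps each variant to 0, 1, or 2 copies of the risk allele.
    weights, if given, maps each variant to a per-allele weight (assumed here);
    without weights the score is a plain count of risk alleles.
    """
    score = 0.0
    for variant, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: a diploid genotype carries 0, 1, or 2 risk alleles")
        weight = 1.0 if weights is None else weights[variant]
        score += weight * count
    return score


# Hypothetical individual genotyped at three known risk variants.
counts = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
example_weights = {"variant_1": 0.5, "variant_2": 0.25, "variant_3": 1.0}

print(genetic_risk_score(counts))                    # 3.0 (unweighted allele count)
print(genetic_risk_score(counts, example_weights))   # 1.25
```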

Genotype — A genotype is the combination of two alleles at one genomic location (locus) or base pair in an individual (figure 5).

Genome editing — Genome editing refers to the use of nucleases to insert or remove DNA from a genome. There are several common technologies that make use of genome editing, including clustered regularly interspaced short palindromic repeats (CRISPR), transcription activator-like effector nucleases (TALENs) and zinc finger nucleases (ZFNs). CRISPR is increasingly employed and is an RNA-guided gene editing method that uses a bacterially-derived protein (Cas9) and a specifically-designed synthetic guide RNA (gRNA; also known as a small guide RNA [sgRNA] or a single guide RNA) to introduce a double-strand break at a precise location in the gene of interest. The sgRNA directs the position of the double-strand break by hybridization to its matching sequence. Genome editing is used as a tool for genetic perturbation in research. Therapeutic applications for the correction of inherited genetic variation are under investigation. (See "Overview of gene therapy, gene editing, and gene silencing".)

Genome-wide association study (GWAS) — A GWAS (pronounced "gee-wass") is a type of genetic mapping study design that assesses for evidence of association between genetic variants and heritable traits across the entire genome. Typical studies consist of genotyping hundreds of thousands of common SNPs, using DNA microarrays or other methodologies in large case-control populations, with the goal of identifying specific risk alleles that are more prevalent in cases than in controls. (See "Tools for genetics and genomics: Gene expression profiling".)

Germline — Germline refers to the gametes (ova and spermatozoa and their precursors) that have the capacity to give rise to offspring. In the context of pathogenic variants, germline mutations refer to those that arose in germline cells as opposed to somatic mutations that were acquired in a specific tissue.

Haploid — Cells or organisms possessing one copy of each autosomal chromosome and one sex chromosome (and therefore effectively one copy of each gene). Gametes (ova and sperm) are haploid. Fertilization of a haploid ovum by a haploid sperm results in formation of a diploid embryo. Many microorganisms are haploid. In contrast, diploid organisms possess two of each autosome and two sex chromosomes. (See 'Diploid' above.)

Haploinsufficiency — Having only a single functional copy of a gene due to inactivation of the second allele by a deleterious variant. In a diploid cell, the single functional copy of the gene does not produce sufficient protein, resulting in disease. All haploinsufficient loci are hemizygous, but not all hemizygous loci are haploinsufficient. (See 'Hemizygous' below.)

Haplotype — The physical combination or sequence of alleles present on a single chromosome. By definition, alleles on one haplotype are in "cis" (figure 5).

Hemizygous — The state of carrying only one copy of a genomic region due to deletion or altered function of the corresponding region on the other chromosome. Carriers of large-scale deletions are hemizygotes. Hemizygosity can confer disease if having one normally functioning copy is insufficient for normal cellular function (haploinsufficiency), but if a single functional copy of the gene is sufficient for normal cellular function, the phenotype may not be abnormal. Hemizygosity can also confer disease if a pathogenic mutation is present within the hemizygous region. (See 'Haploinsufficiency' above.)

Heritability — The proportion of phenotypic variation that is explained by genetic (or in some cases, epigenetic) factors.

Heteroplasmy — The occurrence in a single cell of more than one different population of mitochondrial DNA sequence.

Identity by descent — Alleles are identical by descent if they can be traced back to a common ancestor. Identity by descent is a more stringent classification than identity by state (see 'Identity by state' below). Identity by descent is the basis for establishing linkage.

Identity by state — Alleles are identical by state if the assay being used to distinguish alleles determines that they are identical.

Imprinting — Gamete-specific gene silencing, in which only the allele from the mother or only the allele from the father is expressed, leading to observed parent-of-origin effects in offspring. Examples include the Prader-Willi syndrome and Angelman syndrome locus and a gene involved in pseudohypoparathyroidism. (See "Prader-Willi syndrome: Clinical features and diagnosis" and "Congenital cytogenetic abnormalities".)

Indel — A class of common polymorphism or deleterious sequence variant defined by an extra copy or a missing copy of a short genetic or chromosomal sequence. (See "Chromosomal translocations, deletions, and inversions".)

Induced pluripotent stem cell (iPSC) — A pluripotent cell derived by in vitro reprogramming of a somatic cell that is capable of both self-renewal and differentiation to mature lineages. (See "Overview of stem cells", section on 'Induced pluripotent stem (iPS) cells'.)

Intron — A segment of DNA between two exons that is transcribed to pre-mRNA, but is removed through the process of splicing and is therefore not part of mature mRNA. Introns may contain regulatory DNA or serve other functions.

Inversion — A chromosomal rearrangement characterized by rotation and reintegration of a DNA segment, resulting in an inverted orientation of the segment relative to its typical state.

Karyotype — Karyotype refers to the complete set of chromosomes in an organism or tumor. Karyotype is determined by visual examination and counting of condensed chromosomes from several representative cells to determine the number of copies of each chromosome as well as any translocations, sub-chromosomal deletions, or duplications. Determination of the karyotype of a tumor is also called "cytogenetic analysis." (See "Tools for genetics and genomics: Cytogenetics and molecular genetics".)

Likely benign variant — (See 'Variant' below.)

Likely pathogenic variant — (See 'Variant' below.)

Linkage — The relationship that exists between two loci that violate the Mendelian law of independent assortment and therefore segregate in families in a non-random fashion. Non-independent assortment results because linked loci reside together on the same chromosome (ie, they are syntenic). However, most syntenic loci are not linked due to mandatory recombination during meiosis. Linkage therefore implies the linked loci are in close physical proximity to each other. The genetic linkage distance is expressed as the recombination fraction, which is measured in centiMorgans (cM). Note that this is not necessarily proportional to the physical distance (base pairs) separating the loci.

Linkage analysis — Method of gene mapping that tests for the non-random segregation of disease phenotypes with discrete chromosomal segments. Identification of linked regions implies the existence of disease-causing (pathogenic) variants within or proximal to the linked region. The process of disease-gene identification within this region is termed positional cloning.

Linkage disequilibrium — The non-random association of alleles at two or more loci in a population. Linkage disequilibrium is present when the observed haplotype distribution of two or more markers in a population is significantly different from the expected haplotype distribution (which can be derived from the cross-product of observed allele frequencies) (figure 7).
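
The comparison of observed and expected haplotype frequencies can be sketched as follows for two biallelic loci. The coefficient D (observed frequency of one haplotype minus the product of its allele frequencies) is a commonly used summary that is not named in the definition above; the function names and example frequencies are hypothetical, and the sketch does not test whether a difference is statistically significant.

```python
HAPLOTYPES = ("AB", "Ab", "aB", "ab")


def expected_haplotype_frequencies(observed):
    """Haplotype frequencies expected if alleles at the two loci were independent."""
    p_A = observed["AB"] + observed["Ab"]   # frequency of allele A at locus 1
    p_B = observed["AB"] + observed["aB"]   # frequency of allele B at locus 2
    return {
        "AB": p_A * p_B,
        "Ab": p_A * (1 - p_B),
        "aB": (1 - p_A) * p_B,
        "ab": (1 - p_A) * (1 - p_B),
    }


def ld_coefficient(observed):
    """D = observed frequency of haplotype AB minus its expected frequency."""
    total = sum(observed[h] for h in HAPLOTYPES)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return observed["AB"] - expected_haplotype_frequencies(observed)["AB"]


# Hypothetical populations: complete disequilibrium versus equilibrium.
complete = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
equilibrium = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}

print(ld_coefficient(complete))      # 0.25
print(ld_coefficient(equilibrium))   # 0.0
```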

Locus — A locus (plural, loci) is a specific chromosomal or genomic location.

LOD score — The "logarithm of the odds" (LOD) score is a quantitative measure of the statistical evidence of linkage between two genes. The LOD score depends on both the probability of cosegregation of the two genes during meiosis and the size and structure of the population in which the linkage analysis is performed. By convention, LOD scores >3 are considered to be evidence of linkage in human studies. In some studies, the threshold LOD scores for linkage can be established via permutation testing.
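
For intuition, the Python sketch below computes a LOD score in the simplest textbook setting: fully informative, phase-known meioses in which each one can be scored as recombinant or non-recombinant. That setting, the function name, and the example counts are assumptions for illustration; real linkage analyses use pedigree likelihoods that are considerably more involved. The score is the base-10 logarithm of the likelihood of the data at a given recombination fraction (theta) divided by the likelihood under no linkage (theta = 0.5).

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and the number of meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")   # any recombinant excludes complete linkage
        log_likelihood_linked = 0.0
    else:
        log_likelihood_linked = (recombinants * math.log10(theta)
                                 + non_recombinants * math.log10(1.0 - theta))
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Hypothetical family data: 10 informative meioses, no recombinants observed.
print(round(lod_score(0, 10, 0.0), 2))   # 3.01, just above the conventional threshold of 3
# One recombinant in 10 meioses, evaluated at theta = 0.1.
print(round(lod_score(1, 10, 0.1), 2))
```

In this simplified setting, ten non-recombinant meioses are the minimum needed to exceed a LOD score of 3, which illustrates why the size and structure of the studied families matter.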

Lyonization — (See 'X-inactivation' below.)

Manhattan plot — A type of plot used to display results of a GWAS (see 'Genome-wide association study (GWAS)' above). Genomic coordinates are shown on the X-axis and the negative logarithm of the P-value for each SNP on the Y-axis. SNPs with the strongest association will have the lowest P-values, and hence the tallest profiles. Named for the appearance of the skyline in Manhattan in the United States (figure 8).
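
The Y-axis transformation can be sketched in Python as follows. The function name and the three example P-values are hypothetical, and the use of a base-10 logarithm is an assumption (it is the usual convention, but the definition above does not specify the base).

```python
import math


def manhattan_heights(p_values):
    """Convert P-values to plotted heights: smaller P-value, taller point."""
    heights = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("each P-value must be greater than 0 and at most 1")
        heights.append(-math.log10(p))
    return heights


# Hypothetical P-values for three SNPs, from weakest to strongest association.
example_p_values = [0.05, 0.001, 5e-8]
for p, height in zip(example_p_values, manhattan_heights(example_p_values)):
    print(p, round(height, 2))
```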

Marker — A locus with alternative alleles that can be used in genetic mapping experiments.

Meiosis — The cell division process in germline cells by which the chromosomal complement is reduced from the diploid to the haploid number (figure 9).

Mendelian inheritance — A trait is said to have Mendelian inheritance if its genetic transmission can be explained by a Mendelian model of inheritance, such as autosomal dominant, autosomal recessive, or X-linked recessive or dominant inheritance. This is in contrast to non-Mendelian inheritance patterns such as digenic inheritance, or quantitative traits. (See 'Digenic inheritance' above.)

Mendelian randomization — A study design in which genotypes serve as a proxy for an epidemiologic exposure(s). The rationale for undertaking such investigations is that alleles segregate independently and are therefore immune to biases that cannot be overcome in observational studies. The alleles included in the study must be associated with the exposure being tested, and several additional stringent assumptions also must be satisfied. (See "Mendelian randomization".)

Metagenomics — The study of complex microbial populations (biomes) using genomic approaches. Human tissues such as the skin and gut have multiple heterogeneous populations of microorganisms that differ from each other with respect to phyla composition and abundance in a tissue-specific manner. These abundances can be estimated by sequencing the mixed population of microorganisms, either through targeted sequencing of 16S ribosomes (for bacterial characterization) or whole-genome approaches (for bacteria, viruses, fungi, and other organisms).

Methylation — The addition of methyl groups to cytosine bases in DNA or to lysine residues in the tails of histones. Methylation followed by deamination is a major pathway for mutation of cytosine to thymine. Methylation is a form of epigenetic regulation that correlates with reduced gene transcription and is an important mechanism for gene imprinting and X-inactivation. (See 'Epigenetic change' above and 'Imprinting' above and 'X-inactivation' below.)

Micro-RNA (miR) — A small, noncoding RNA that regulates the stability or translation of a set of mRNAs.

Microsatellite — A tandem array of short sequences of DNA (typically two to four bases). Microsatellites are numerous and widely distributed in the genome. There is often polymorphism in their length, making them useful markers in genetic studies, including genome mapping and family-based linkage analysis. Microsatellites are also known as short tandem repeat markers (STRs) or short tandem repeat polymorphisms (STRPs).

Mitochondrial genome — The genetic material carried within mitochondria, known as mitochondrial DNA (mtDNA). At fertilization, all the mitochondria are derived from the egg, so mitochondrial genes display maternal inheritance.

Mitosis — The process of cell division occurring in somatic cells, in which each daughter cell receives a full chromosome complement.

Monogenic trait/monogenic disease — Trait or disease with inheritance that can be explained by a single gene. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)

Monogenic traits are contrasted with polygenic and complex diseases. (See 'Polygenic trait/polygenic disease' below.)

Mosaicism — When referring to an individual, a state in which two or more populations of genetically distinct cells are present that arose from a single fertilized egg. Mosaicism can arise through a variety of mechanisms including chromosome nondisjunction, anaphase lag, endoreplication, and post-fertilization mutation. A common instance occurs in Klinefelter syndrome, in which post-fertilization nondisjunction causes some but not all cells to harbor an XXY karyotype.

Contrasted with chimerism. (See 'Chimerism' above.)

Mutation, mutant — An alteration in a gene, or the altered version of a gene, typically in such a manner that affects function, but not always (eg, a "silent" mutation that changes the DNA sequence but not the protein sequence). These terms are used in several different senses, depending on context:

●In human genetics, a mutant is a genetic variant of low population frequency, in contrast to a polymorphism (often a single nucleotide polymorphism [SNP]) with an allele frequency of 1 percent or greater. (See 'Single nucleotide polymorphism (SNP)' below.)

Types of gene mutations include:

•Nonsense mutation – Creates premature stop codon (figure 3)

•Missense mutation – Creates amino acid change (figure 2)

•Synonymous mutation – Does not change protein sequence

•Frameshift mutation – Shifts the reading frame of the DNA, in turn altering the triplet codons for protein translation, creating an entirely new protein sequence downstream of the mutation (figure 4)

●In human disease, mutation is commonly used to imply a change associated with abnormal function (eg, sickle cell mutation of the hemoglobin beta chain). However, the preferred term in this case is pathogenic variant. (See 'Variant' below.)

●When used in the context of inheritance, mutation implies a recent sequence change (either germline or somatic), in contrast to inheritance from a carrier parent.

●When used to refer to a non-human organism or population of non-human organisms, a mutant refers to a population that harbors a specific, atypical variant (eg, antibiotic-resistant mutants). This term should not be used for people.

For the meanings that refer to human traits and diseases, there has been a shift in terminology to use the term variant rather than mutation; variants are further classified according to their pathogenicity. (See 'Variant' below.)
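The types of gene mutations listed above (nonsense, missense, synonymous, and frameshift) can be illustrated by comparing the translation of a reference coding sequence with that of an altered sequence. This is a simplified sketch: the function names and the 15-base reference sequence are hypothetical, the standard genetic code is assumed, and the classifier considers only coding sequence that begins at the first base of a codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate the complete codons of a coding sequence; '*' marks a stop."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def first_stop(protein):
    """Index of the first stop codon, or the protein length if none."""
    index = protein.find("*")
    return len(protein) if index == -1 else index

def classify(reference, altered):
    """Classify a change by its effect on the encoded protein."""
    length_change = len(altered) - len(reference)
    if length_change % 3 != 0:
        return "frameshift"
    if length_change != 0:
        return "in-frame insertion/deletion"
    ref_protein, alt_protein = translate(reference), translate(altered)
    if alt_protein == ref_protein:
        return "synonymous"
    if first_stop(alt_protein) < first_stop(ref_protein):
        return "nonsense"
    return "missense"

reference = "ATGGAGAAGTGGTAA"                   # hypothetical; encodes M E K W stop
print(classify(reference, "ATGGTGAAGTGGTAA"))   # GAG changed to GTG
print(classify(reference, "ATGGAAAAGTGGTAA"))   # GAG changed to GAA
print(classify(reference, "ATGGAGAAGTGATAA"))   # TGG changed to TGA
print(classify(reference, "ATGGGAAGTGGTAA"))    # one base deleted
```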

Mutation fraction — Synonymous with allelic fraction or allelic ratio. (See 'Allelic fraction' above and 'Allelic ratio' above.)

Next-generation sequencing — Any of several high-throughput DNA sequencing methods that rely on parallel analysis of multiple DNA fragments (eg, whole genome sequencing, exome sequencing). These methods have resulted in dramatic decreases in the cost and time needed for sequencing projects and are used in some clinical settings. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

Noncoding variant — Genetic variation that does not map to gene regions that code for protein. These variants can be functional if they reside in and disrupt functional elements, such as noncoding RNA sequences or regulatory sites (eg, promoters, enhancers, suppressors, or splice-sites).

Nucleic acid vaccine — Refers to a synthetic nucleic acid sequence (either DNA or RNA) packaged in a lipid soluble particle that transfects human cells to produce antigenic viral proteins to induce an antiviral immune response. Examples include FDA-approved RNA vaccines for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19).

Oncogene — Gene that contributes to the production of cancer. Oncogenes typically act in a dominant manner (ie, an oncogenic mutation at one allele is sufficient to promote tumorigenesis). In contrast, tumor suppressor genes typically act in a recessive manner. (See 'Tumor suppressor gene' below.)

Pathogenic variant – Genetic change associated with disease or strongly suspected of being associated with disease. (See 'Variant' below.)

Pedigree — A diagram or other graphic representation of a family that shows the family relationships, sex of each family member, and presence or absence of one or more diseases in each individual (figure 10).

Penetrance — The probability that an individual harboring a pathogenic variant will develop the associated disease or condition. Incomplete (or variable) penetrance occurs when an individual with a pathogenic variant does not manifest features of the disorder. There are many causes of incomplete penetrance, including absence of environmental or genetic co-factors, epigenetic effects such as imprinting, sex-specific effects, or age-related expression differences. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)", section on 'Penetrance and expressivity'.)

Phenotype — A characteristic of an organism (as opposed to the organism’s genotype). Phenotypes are sensitive to the assays used to assign or measure them. They may be categorical, such as presence or absence of a disease; or quantitative, such as systolic blood pressure. Further complexities in phenotypic description involve the physiological state of the organism at the time of measurement, age, or use of provocative stimuli. Most phenotypes are variable, and this variability leads to the concepts of penetrance and expressivity.

Pleiotropy — The association of variant(s) in a single gene with multiple phenotypic effects, often in different tissues or organs. An example is Marfan syndrome, in which mutations in the fibrillin 1 (FBN1) gene can cause cardiac, ocular, and connective tissue findings.

Ploidy — The number of sets of chromosomes present in an organism or cell. Ploidy varies among different organisms, including those that are always haploid (eg, bacteria), either haploid or diploid (eg, Saccharomyces species [yeast]), consistently diploid (eg, mammals), or polyploid (eg, hexaploid wheat). (See 'Diploid' above.) Different tissues in multicellular organisms may have different ploidies (eg, mammalian hepatocytes may be tetraploid). The gametes (ova and sperm) are haploid. (See 'Haploid' above.) The designation of ploidy is based on the predominant ploidy of cells in the organism.

Polygenic risk (PGR) score — An estimate of an individual's genetic risk for a specific polygenic phenotype that is derived from weights of alleles from hundreds to thousands of loci. Allele-specific weights are estimated using specialized linear regression methods. The scores are typically generated in a model-building population, then validated in additional independent test populations. Synonymous with polygenic score; contrasts with a genetic risk score, which calculates the contribution of only the known risk alleles carried by an individual. (See 'Genetic risk score' above and "Principles of complex trait genetics", section on 'Polygenic risk scores'.)
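One common way to combine allele weights into a score is a weighted sum of the number of effect alleles carried at each locus. The sketch below assumes that formulation; the function name, locus names, and weights are hypothetical, and published scores involve many more loci plus additional steps such as standardization against a reference population.

```python
def polygenic_score(dosages, weights):
    """Sum of (effect-allele count x per-allele weight) over scored loci."""
    return sum(
        dosages[locus] * weight
        for locus, weight in weights.items()
        if locus in dosages  # loci without a genotype are skipped
    )

# Hypothetical per-allele weights from a model-building population.
weights = {"locus_1": 0.12, "locus_2": -0.05, "locus_3": 0.30}
# Hypothetical individual: number of effect alleles (0, 1, or 2) at each locus.
dosages = {"locus_1": 2, "locus_2": 1, "locus_3": 0}

print(polygenic_score(dosages, weights))
```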

Polygenic trait/polygenic disease — In contrast to monogenic diseases, polygenic diseases are those for which the inherited trait(s) is explained by more than one gene. (See 'Monogenic trait/monogenic disease' above.)

Polymerase chain reaction (PCR) — A method of specifically amplifying a unique target sequence (DNA or RNA) in the laboratory. PCR uses specific primers and repeated cycles of heating and cooling with a heat-stable DNA polymerase to replicate the template material exponentially. (See "Tools for genetics and genomics: Polymerase chain reaction".)
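The exponential nature of the amplification can be shown with a simple calculation. This sketch assumes an idealized reaction in which a fixed fraction of templates (the hypothetical `efficiency` parameter) is copied in every cycle; in practice, efficiency falls in later cycles as reagents are consumed.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after a number of cycles of idealized PCR."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the template count by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles

for n_cycles in (10, 20, 30):
    print(n_cycles, pcr_copies(1, n_cycles))
```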

Polymorphism — Polymorphism can refer to a genetic polymorphism. (See 'Genetic polymorphism' above.)

It can also refer to any biologic marker (DNA, RNA, or protein) with two or more states. Protein polymorphisms (varying amino acid sequence) can result from DNA polymorphisms or from differential RNA splicing (different isoforms), which in turn can result from sequence variation, epigenetic phenomena, or temporal/spatial/environmental differences.

Quantitative traits and quantitative trait loci (QTL) — "Quantitative" traits are distinguished from discrete traits. The population varies continuously for quantitative traits and falls into obvious phenotypic classes for discrete traits. Quantitative traits are sometimes referred to as "complex" traits, reflecting the fact that multiple genes, the environment, and gene-environment interactions all contribute to an individual's trait value. Many traits are quantitative, and their inheritance is much more challenging to unravel than discrete traits. A quantitative trait locus (QTL) is a genomic region linked or associated with a quantitative trait.

Read depth — In genomic or gene sequencing, the number of independent times each base in a targeted region has been sequenced. Typically expressed as an average X coverage (for example 20X = an average of 20 sequence reads per base). A minimum read depth of 30X is often required for clinical-grade sequencing. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)
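Read depth can be computed directly from the positions of aligned reads. The following sketch uses hypothetical function names and toy coordinates, and it assumes each read is described by a half-open (start, end) interval on the same reference; it ignores considerations such as duplicate reads and mapping quality.

```python
def per_base_depth(reads, region_start, region_end):
    """Count how many reads cover each base of the targeted region."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Only the part of the read that overlaps the region is counted.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth

def mean_depth(depth):
    """Average coverage across the region (the 'X' value)."""
    return sum(depth) / len(depth)

# Three hypothetical reads over a 10-base target region.
reads = [(0, 6), (2, 10), (4, 8)]
depth = per_base_depth(reads, 0, 10)
print(depth)
print(mean_depth(depth))
```

Reporting the minimum per-base depth alongside the average is often useful, since an acceptable average can mask poorly covered bases.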

Reading frame — The starting point in translating the DNA sequence to protein. Since each codon includes three nucleotides, the reading frame can be initiated at one of three nucleotides. Offsetting the reading frame changes the amino acid composition of the encoded protein.
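The three possible reading frames of a single strand can be seen by grouping the same sequence into codons from each of the three starting positions. The function name and the nine-base sequence are hypothetical and chosen only for illustration.

```python
def codons_in_frame(seq, offset):
    """Split a sequence into complete codons starting at offset 0, 1, or 2."""
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1, or 2")
    usable = seq[offset:]
    complete = len(usable) - len(usable) % 3  # drop trailing partial codon
    return [usable[i:i + 3] for i in range(0, complete, 3)]

sequence = "ATGGCCTAA"  # hypothetical
for offset in range(3):
    print(offset, codons_in_frame(sequence, offset))
```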

Recombinant — Recombinant has different meanings in different contexts. For inheritance patterns, recombinant refers to offspring whose genotype and phenotype combinations differ from those of their parents, implying genetic recombination between the loci under study.

For laboratory techniques, recombinant technologies (also called genetic engineering) are molecular genetic approaches that use the process of homologous recombination to manipulate genotypes for experimental purposes. Examples include transgenic models in which specific genetic loci are either knocked out (removed) or knocked in (introduced) to enable study of the locus, recombinant inbred mouse strains, and recombinant viral transfection for synthesis of protein.

Recombination — The process of exchanging DNA sequence between two homologous chromosome regions. Mandatory recombination occurs at least once per aligned chromosome pair during meiosis. The exchange results in the creation of novel haplotypes that are combinations of the grandparental haplotypes present in a diploid cell. Exchange of unequal sequence content (ie, non-homologous recombination) can introduce DNA gains and losses of thousands or millions of bases. These gains and losses result in structural genetic variation and copy number variants (CNVs). (See 'Copy number variation (CNV)' above.)

Repulsion — The state in which alleles at two distinct loci are on physically opposing chromosomal strands. By definition, these variants are not part of the same haplotype (figure 5). In the example of dominant and recessive alleles, repulsion gametes formed are Ab and aB. The opposite relationship is coupling. (See 'Coupling' above.)

Risk allele — An allele associated with a disease phenotype that typically acts in combination with other genetic or environmental factors. Though a risk allele is often that which is least common (ie, the minor allele), risk alleles associated with some complex traits may be the more common allele.

RNA — RNA (ribonucleic acid) is a polymer consisting of a phosphoribose backbone and the bases adenine, guanine, uracil, and cytosine as side chains. Many viruses use RNA rather than DNA as the principal form of genetic information (RNA viruses).

There are several different types of RNAs that have diverse structures and functions.

●mRNA – Messenger RNA (mRNA) is transcribed from the template strand of DNA, so its sequence matches that of the coding strand (with uracil in place of thymine). It transmits the genetic information to the protein synthesis machinery, serving as an intermediary between a gene's DNA sequence and its encoded protein.

●rRNA – Ribosomal RNA (rRNA) is an integral component of ribosomes, the organelles responsible for protein synthesis.

●tRNA – Transfer RNAs (tRNAs) carry specific amino acids and recognize the corresponding codons of the mRNA during protein synthesis.

●Regulatory RNAs – There are several types of regulatory RNAs such as micro RNAs (miRs), long noncoding RNAs (lncRNAs), small nuclear RNAs (snRNAs, components of the ribonucleoproteins involved in mRNA splicing), and PIWI-interacting RNAs (piRNAs). PIWI (P-element-induced wimpy testis) designates a class of proteins that may regulate stem cells and appear to be aberrantly expressed in some cancers [4]. (See "Basic genetics concepts: DNA regulation and gene expression", section on 'Transcription'.)
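The relationship between an mRNA and the two strands of its gene can be sketched in a few lines. The function names and the six-base sequence are hypothetical, and both inputs are assumed to be written 5' to 3'. The mRNA has the same sequence as the coding strand with uracil replacing thymine, and it is the reverse complement of the template strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def mrna_from_coding_strand(coding_5to3):
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_5to3.upper().replace("T", "U")

def mrna_from_template_strand(template_5to3):
    """mRNA is the reverse complement of the template strand, with U for T."""
    reverse_complement = template_5to3.upper().translate(COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")

coding = "ATGGCC"      # hypothetical coding strand, 5' to 3'
template = "GGCCAT"    # its reverse complement, also written 5' to 3'
print(mrna_from_coding_strand(coding))
print(mrna_from_template_strand(template))
```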

RNA interference (RNAi) — A ubiquitous intracellular process mediated by small RNA species, whereby specific RNAs are targeted for editing, degradation, or clearance. RNAi has important roles in the regulation of gene expression, developmental processes, cellular defense, and epigenetic effects.

RNAi technology (also called antisense technology) has been used in the laboratory to test the function of a gene by preventing its expression. Its use has been attempted clinically as a means of posttranscriptional gene silencing to reduce the expression of viral or cancer genes, or to lower cholesterol. The specific therapies are sometimes referred to as antisense oligonucleotides (ASOs; AS-ODNs). Early attempts at developing therapeutic applications are ongoing in the fields of hematology, oncology, and neurodegenerative disease. (See "Overview of gene therapy, gene editing, and gene silencing".)

Sequencing — Determination of the nucleotide base sequence of a gene or collection of genes that determines the amino acid sequence of a protein. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

Sex chromosomes — Refers to the X and Y chromosomes, which are different in females (XX) and males (XY).

Sex-linked — A gene is sex-linked if it is located on a sex chromosome rather than on an autosome. A gene's inheritance pattern is also referred to as sex-linked if the pattern corresponds to that of known sex-linked genes (rather than autosomal genes). (See 'Autosomal' above.)

Silencing — Regulation that prevents the expression of a gene. Mechanisms of silencing include gene methylation (see 'Methylation' above), destruction of messenger RNA, or prevention of protein translation.

Single nucleotide polymorphism (SNP) — A single nucleotide polymorphism (SNP; pronounced "snip") is a polymorphism (sequence difference) that affects a single base pair. This terminology was previously used to refer to variation that had a population frequency of at least 1 percent. The term SNP is commonly used in research, such as in genome-wide association studies (see 'Genome-wide association study (GWAS)' above). In clinical diagnostic testing, the term "variant" with a qualifier about pathogenicity is preferred (although use is inconsistent).

SNP may also be used to refer to polymorphisms in a testing platform such as a SNP array. (See "Genetic association and GWAS studies: Principles and applications", section on 'Single nucleotide polymorphisms' and "Tools for genetics and genomics: Cytogenetics and molecular genetics", section on 'Allele specific oligonucleotide hybridization' and "Tools for genetics and genomics: Cytogenetics and molecular genetics", section on 'Array comparative genomic hybridization'.)
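The 1 percent population frequency convention mentioned above can be checked from genotype counts at a biallelic site. This is a minimal sketch with a hypothetical function name and made-up counts; it assumes a diploid autosomal locus, so each person contributes two alleles.

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Frequency of the less common allele at a biallelic diploid locus."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_frequency = (2 * n_alt_hom + n_het) / total_alleles
    return min(alt_frequency, 1 - alt_frequency)

# Hypothetical genotype counts in 1000 people.
maf = minor_allele_frequency(n_ref_hom=900, n_het=95, n_alt_hom=5)
print(maf)
print(maf >= 0.01)  # meets the historical 1 percent convention?
```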

Somatic — Referring to tissues that are not within the germline. Somatic mutations arise in somatic tissues and are therefore not passed from parent to offspring. Somatic mutations are common in neoplasms.

Structural genetic variation — A term that encompasses a variety of large-scale genomic aberrations, including segmental rearrangements, translocations, or inversions and copy-number variants (CNVs) (see 'Copy number variation (CNV)' above). Large rearrangements or deletions can be visualized through karyotyping. Smaller variants, particularly CNVs, segmental duplications, and interchromosomal interstitial rearrangements, are assessed by array comparative genomic hybridization (array CGH) or SNP arrays.

Syntenic — Describing genetic loci that reside on the same chromosome. As an example, the genes causing Birt-Hogg-Dubé syndrome (Folliculin [FLCN], at chromosome 17p11) and early-onset breast cancer (BRCA1, at chromosome 17q21) are syntenic to each other on chromosome 17. However, because they are far apart from each other, they are not linked. (See 'Linkage' above.)

Telomere — Region at the ends of a chromosome that prevents the loss of genetic material or the accidental fusion of two chromosomes together during cell division. Telomeres of chromosomes in most cells shorten as an individual ages. Telomere length is maintained by the enzyme telomerase. (See 'Telomerase' below.)

Telomerase — Multicomponent enzyme that extends the length of telomeres. Telomerase mutations are seen in some inherited "telomere syndromes." (See "Dyskeratosis congenita and other telomere biology disorders".)

Translocation — A translocation is a structural chromosomal abnormality whereby chromosome segments are exchanged (swapped) between two non-homologous chromosomes. This form of rearrangement can be balanced, when the translocation does not result in any significant loss or gain of genetic material in the resultant gamete or cell; or unbalanced, when there is a gain or loss of genetic material in the resultant gamete or cell. (See "Chromosomal translocations, deletions, and inversions", section on 'Translocations'.)

Tumor suppressor gene — A tumor suppressor gene is a gene that protects against the development or growth of tumors. Tumor suppressor genes typically act in a recessive manner (ie, both normal copies must be lost for a tumor to develop). In contrast, oncogenes typically act in a dominant manner. (See 'Oncogene' above.)

Uniparental disomy — The inheritance of two copies of a chromosome (or part of a chromosome) from one parent, and no copy from the other parent, due to nondisjunction errors during the first or second phase of meiosis, or to chromosomal alterations in early fetal development. Nondisjunction during the first phase of meiosis (meiosis I) will result in inheritance of each of the grandparental chromosomes from one parent, termed "heterodisomy." In contrast, nondisjunction during meiosis II results in inheritance of two identical copies of one grandparental chromosome, termed "isodisomy."

Variant — The term variant is used to refer to a specific change in either DNA or protein sequence.

In microbiology, a variant refers to an organismal isolate whose genetic sequence varies from that of its reference organism. (See 'Viral variant' below.)

For germline variants, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology have recommended use of a five-tier terminology system for the clinical classification of genetic variants (table 1), consisting of the following designations [5]:

●Pathogenic variant (PV) – A disease-causing variant, as determined by very strong genetic and experimental evidence, including consistent familial co-segregation with disease and definitive functional studies.

●Likely pathogenic variant (LPV) – A variant with strong, but not definitive, evidence of pathogenicity based on its similarity to known pathogenic variants, co-segregation with disease in families or populations, and functional evidence.

●Variant of uncertain significance (VUS) – A variant for which the specific criteria for the other four categories are not met, or for which contradictory lines of evidence in support of both benign and pathogenic classifications are present. Also called variant of unknown significance.

●Likely benign variant – A variant with multiple supporting (but not conclusive) lines of evidence to suggest it is not disease-causing.

●Benign variant – A variant with conclusive evidence as not disease-causing, as determined typically (but not only) by a high prevalence of the variant in the general (healthy) population, at a prevalence that exceeds that of the suspected disease.

For clinical testing and management, these terms are preferred over "mutation." Mutation remains appropriate in certain contexts such as when referring to a pathophysiologic process or to specific changes in a region of DNA (or less commonly, a protein). (See 'Mutation, mutant' above.)

Additional information about this classification and its application to genomic testing is presented separately. (See "Secondary findings from genetic testing", section on 'Definitions and classification of variants'.)

Variant allele frequency — See allele frequency. (See 'Allele frequency' above.)

Variant of uncertain significance (VUS) — A classification term used in clinical DNA sequencing reports to signify genetic polymorphisms (variants) for which the pathogenicity (likelihood of causing disease) cannot be determined easily and that cannot be readily classified as "pathogenic," "likely pathogenic," "benign," or "likely benign." Also called variant of unknown significance. (See 'Variant' above.)

Viral variant — A viral isolate with a genome sequence that differs from that of the reference virus, regardless of whether the sequence variant alters the virus's phenotype. A viral strain is a viral variant with a sequence change that confers a unique viral phenotype (eg, altered replication rate, infectivity, or lethality). In contrast, variants with an impact limited to antigenicity are referred to as having different serotypes rather than as different strains. (See "COVID-19: Epidemiology, virology, and prevention", section on 'Variants of concern'.)

Whole genome sequencing — A sequencing strategy that provides the DNA sequence for the entire genome, including exons, introns, and other noncoding sequence. In contrast, exome sequencing only determines the sequence of gene-coding regions.

X-inactivation — An epigenetic process that occurs in all female mammalian cells, whereby one of the two X chromosomes is randomly rendered inactive, such that all subsequent gene expression is derived from the other (active) X chromosome. This is sometimes called lyonization, after Mary Lyon, who did important early work on this phenomenon. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)", section on 'Sex-linked' and "Principles of epigenetics", section on 'Types of processes that are regulated'.)

SUMMARY

●Definitions – Commonly used genetics terms are defined above. (See 'Definitions' above.)

●Genetics concepts – Basic genetics concepts are discussed in separate topic reviews. (See "Basic genetics concepts: DNA regulation and gene expression" and "Basic genetics concepts: Chromosomes and cell division" and "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)" and "Genomic disorders: An overview" and "Principles of complex trait genetics" and "Principles of epigenetics".)

●Clinical applications – Use of genetic information in clinical care is also discussed in separate topic reviews. (See "Genetic testing" and "Genetic counseling: Family history interpretation and risk assessment" and "Personalized medicine" and "Secondary findings from genetic testing".)

●Genetics tools – Specific methods used in clinical genetic testing and research are also discussed separately.

•DNA sequencing – (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

•PCR – (See "PCR testing for the diagnosis of herpes simplex virus in patients with encephalitis or meningitis".)

•Cytogenetics – (See "Tools for genetics and genomics: Cytogenetics and molecular genetics".)

•Gene expression profiling/genome-wide association studies (GWAS) – (See "Tools for genetics and genomics: Gene expression profiling" and "Genetic association and GWAS studies: Principles and applications".)

•Animal models – (See "Tools for genetics and genomics: Specially bred and genetically engineered mice" and "Tools for genetics and genomics: Model systems".)

REFERENCES

1. Brenner's Encyclopedia of Genetics, 2nd ed, Maloy S, Hughes K (Eds), Elsevier, 2013.
2. Van Loo P, Campbell PJ. ABSOLUTE cancer genomics. Nat Biotechnol 2012; 30:620.
3. Cooke Bailey JN, Igo RP Jr. Genetic Risk Scores. Curr Protoc Hum Genet 2016; 91:1.29.1.
4. Han YN, Li Y, Xia SQ, et al. PIWI Proteins and PIWI-Interacting RNA: Emerging Roles in Cancer. Cell Physiol Biochem 2017; 44:1.
5. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015; 17:405.

Topic 2898 Version 55.0
