1. Why be skeptical of contemporary conceptions of the gene?
The contemporary, classical-molecular understanding of “gene” is that of a segment of double-stranded DNA that codes for, or causes the production of, a functional polypeptide chain. This process has two essential steps: transcription and translation. During transcription, one strand of DNA serves as a template for a single strand of RNA whose base sequence is complementary to that template. RNA may be divided into three main groups: transfer RNA (tRNA), ribosomal RNA (rRNA), and messenger RNA (mRNA). RNA strands are complementary to the DNA template, and in mRNA each triplet of nucleotides, or codon, corresponds to a specific amino acid; during translation these amino acids are joined into polypeptide chains. Of the three types, however, only mRNA molecules are translated into polypeptides. tRNA carries amino acids to the ribosome during translation, and rRNA becomes part of the ribosomes themselves, which are found free in the cytoplasm or bound to the rough endoplasmic reticulum of a cell. Even this simplified sketch reveals an obvious problem with the above conception: it ignores instances in which RNA molecules are not translated into polypeptides. It is therefore fitting, as textbooks generally do, to modify the definition to include RNA molecules along with polypeptides. Even when this is allowed, however, other issues remain.
The problems that arise in properly defining “gene” are broadly concerned with, though not necessarily a criticism of, the pluralistic way biologists tend to use the term (Falk 1986; Stotz et al. 2004). “Gene” may be used to denote the entire coding region, that is, the entire sequence of DNA that ultimately codes for a polypeptide; sometimes, however, the term describes only part of a coding region. “Gene” may also describe portions of a coding region together with regulatory regions, that is, segments of DNA that are not translated into a polypeptide but help regulate the expression of the polypeptide that is coded for. Finally, “gene” may indicate the entire coding and regulatory regions found within a segment of DNA (Waters 2007).
Further worries arise from observations such as mRNA editing, alternative cis-splicing, trans-splicing, overlapping genes, and antisense transcription, which have led some to define genes as whatever competent biologists choose to call genes or need for their experiments (Falk 1986, p. 169; Kitcher 1992, p. 131; Fogle 2001). Gene skeptics tend to argue that this renders the concept hopelessly ambiguous and that it should be abandoned altogether (Portin 1993; Fogle 2001; Kitcher 1992; Burian 1986). I do not think, however, that a pluralistic account of genes should lead to such a drastic conclusion; at least within a given research program, researchers seem entitled to some leeway in defining what they are studying. I therefore suggest that the above, classical-molecular definition of “gene” be kept, but defined pluralistically from research program to research program.
Here, I will elaborate on the observations mentioned above, sorting them into three categories that lead to the gene concept being thought ambiguous: issues concerning predictability, issues related to the ambiguous nature of DNA, and issues that arise over gene boundaries. I will then explain various attempts to resolve the problems these observations create for the gene concept. Taking these attempts into consideration, I will conclude that although ambiguities arise within the gene concept, they are not a reason for concern so long as researchers are clear about what they consider a gene within their particular programs.
2. Observations conflicting with the gene concept
The conflicting observations mentioned above may be split into three types: (1) issues involving predictability, (2) those related to the ambiguous nature of DNA, and (3) those concerned with gene boundaries. (1) arises primarily after transcription, during RNA processing such as editing and splicing. It concerns predictability in the following sense: if a particular DNA segment is thought of as coding for a polypeptide, it seems to follow that the DNA gives predictive power over which polypeptide will be translated. With mechanisms such as mRNA editing, alternative cis-splicing, and trans-splicing in place, however, it is not always possible to inspect a segment of DNA and accurately predict its resulting polypeptide chain. Problems of predictability seem to arise when biologists talk about “genes for” a particular polypeptide, protein, or trait in general. The worry is that if genes are segments of DNA, an observed reading sequence should accurately predict a particular trait, yet these mechanisms can render such predictions false. (2) involves observations such as gene overlap and antisense transcription, in which a segment of DNA is capable of coding for more than one product. I consider these ambiguities inherent in the DNA itself, because the capacity to yield more than one product is present before the DNA is transcribed. This creates problems for the gene concept when a reading sequence is viewed as discrete, coding for only one polypeptide. (3) is primarily concerned with regulatory and promoter regions, without which the primary reading sequence cannot be transcribed, and in turn translated. Below, I describe these conflicting observations.[i]
DNA is made up of four nucleotides: adenine (A), guanine (G), cytosine (C), and thymine (T). G pairs with C, and A pairs with T. In RNA, uracil (U) takes the place of T. During mRNA editing, however, a C may be converted to U, and when this occurs a stop codon, that is, an mRNA triplet that signals translation to stop, may be introduced (Stotz & Griffiths 2004). Thus, if an mRNA codon is originally CAA, CAG, or CGA and its first C is converted, the codon becomes UAA, UAG, or UGA, each of which is a stop codon (the same conversion turns CGG into UGG, which codes for tryptophan rather than signaling a stop). If a novel stop codon is introduced, what would have been a longer polypeptide x becomes a shorter polypeptide y. Keeping with the underlying issue, this creates problems if an observed sequence of DNA is predicted to code for x but, because the stop codon has been introduced, y is the polypeptide observed. There is no issue with the above definition so long as it implies only that the DNA segment codes for some polypeptide; problems arise, however, if the DNA is expected to allow prediction of a particular polypeptide.
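The codon arithmetic above can be checked with a short Python sketch. It assumes the standard genetic code; the transcript `AUGGCUCAAGGUUGA` and the helper names `translate` and `edit_c_to_u` are hypothetical, invented only for illustration. Real editing is site-specific and enzyme-mediated, which the sketch does not model.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, amino acids listed in UCAG x UCAG x UCAG codon order;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate an mRNA reading frame codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def edit_c_to_u(mrna: str, position: int) -> str:
    """Hypothetical model of C-to-U editing at one (0-based) position of the transcript."""
    if mrna[position] != "C":
        raise ValueError("C-to-U editing requires a C at this position")
    return mrna[:position] + "U" + mrna[position + 1:]


if __name__ == "__main__":
    transcript = "AUGGCUCAAGGUUGA"  # hypothetical transcript
    print(translate(transcript))                  # MAQG
    print(translate(edit_c_to_u(transcript, 6)))  # MA
    for codon in ("CAA", "CAG", "CGA", "CGG"):
        edited = "U" + codon[1:]
        print(codon, "->", edited, CODON_TABLE[edited])
```

Editing the C of the codon CAA (position 6) turns it into the stop codon UAA, so the predicted product MAQG is cut short to MA, a toy version of the longer x becoming the shorter y. The loop also shows that CGG becomes UGG (W, tryptophan) rather than a stop.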
Similar complications arise from the fact that introns, non-coding sequences, are found within a given coding region. Introns are initially transcribed into what is called precursor mRNA (pre-mRNA), but they are eventually excised and the remaining regions, exons, are spliced together (Stotz et al. 2004, p. 650; Stotz & Griffiths 2004). Here again, the polypeptide that the original coding region appeared to code for, read straight through, is not the one produced. This is further complicated by alternative cis-splicing, in which the exons of a pre-mRNA can be spliced together in different combinations, giving rise to a number of different polypeptides from one strand of DNA; that is, several gene products may be made from what is considered a single gene (Stotz et al. 2004, p. 650). These cases involve splicing within a single transcript; matters are complicated still further by trans-splicing, in which RNA transcripts from different DNA coding regions are spliced into a single mRNA that is then translated into a single polypeptide (Stotz et al. 2004, p. 650; Stotz & Griffiths 2004). Here the issue is not just that a particular coding region may fail to predict a corresponding polypeptide. Even if a coding region accurately predicts a certain polypeptide, the mRNA it gives rise to may ultimately be trans-spliced with an mRNA strand from a different DNA coding sequence, yielding a different polypeptide. With the issues relating to predictability summarized, what of the ambiguous nature of DNA?
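Exon skipping, the simplest pattern of alternative cis-splicing, and trans-splicing can both be modeled as string operations on a list of transcript segments. In this Python sketch the transcripts, their sequences, and the functions `cis_isoforms` and `trans_splice` are hypothetical; the model assumes the first and last exons are always retained and that trans-splicing joins the first exon of one transcript to the later exons of another.

```python
from itertools import combinations

# Hypothetical pre-mRNAs as lists of (segment type, sequence); all sequences are invented.
PRE_MRNA_A = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUAG"),
    ("exon", "CCAGAA"),
    ("intron", "GUCCAG"),
    ("exon", "UUCGGA"),
    ("intron", "GUUUAG"),
    ("exon", "AAAUGA"),
]
PRE_MRNA_B = [("exon", "AUGCCC"), ("intron", "GUAAAG"), ("exon", "GGGUAA")]


def exons(pre_mrna):
    """Excise the introns, keeping the exons in their original order."""
    return [seq for kind, seq in pre_mrna if kind == "exon"]


def cis_isoforms(pre_mrna):
    """Every mature mRNA obtainable by skipping any subset of internal exons.

    Assumes at least two exons; the first and last are always kept. Exon skipping
    is only one of several alternative-splicing patterns.
    """
    ex = exons(pre_mrna)
    internal = range(1, len(ex) - 1)
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            chosen = [0, *kept, len(ex) - 1]
            isoforms.append("".join(ex[i] for i in chosen))
    return isoforms


def trans_splice(upstream, downstream):
    """Join the first exon of one transcript to the remaining exons of another."""
    return exons(upstream)[0] + "".join(exons(downstream)[1:])


if __name__ == "__main__":
    for isoform in cis_isoforms(PRE_MRNA_A):
        print(isoform)
    print(trans_splice(PRE_MRNA_A, PRE_MRNA_B))
```

With two internal exons, the single hypothetical pre-mRNA yields four distinct mature mRNAs, and the trans-spliced product matches neither parent transcript: the predictability problem in miniature.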
Overlapping genes, in which two or more DNA coding regions overlap, threaten the above conception of genes because a given strand of DNA may contain more than one reading frame, each coding for a different product. Genes are not lined up like beads on a string; one coding region may begin within another (Stotz & Griffiths 2004). In these cases the same segment of DNA may be treated as two or more different genes, possibly, depending on how much sequence is shared, with very different products (Stotz et al. 2004, p. 650; Stotz & Griffiths 2004). Another way to think of this is to consider a two-headed person standing in line among other people.[ii] The line as a whole is analogous to a given strand of DNA, and the two-headed person to overlapping genes. A problem arises when counting the people in line: does the two-headed person count as two people or as one? To answer this, there needs to be a definition of “person”; once it is defined, the people in line seem easily countable, and likewise, if “gene” is defined, genes are countable. With this, however, genes seem necessarily to be defined in terms of gene products, or at least in terms of what they do, and if there are ambiguities in the classical-molecular conception, which defines genes in this way, a problem remains. Antisense transcription similarly results in two, possibly very different, products, but in a different way. During transcription a DNA template strand is read from its 3’ end toward its 5’ end, and since the two strands of DNA run antiparallel, each has its own 3’ and 5’ ends and either may serve as a template. So it is possible for one strand to code for one product and the other strand to code for a very different one (Stotz & Griffiths 2004). These cases differ from those above in that even if a coding region in a DNA strand has predictive power over its products, it still potentially codes for more than one product.
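A naive open-reading-frame scan makes both kinds of ambiguity concrete. The Python sketch below assumes, for illustration only, that a candidate product is any stretch running from an ATG start codon to the first in-frame stop codon; real gene identification also depends on promoters, splicing, and much else. The helper names and both test sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """The opposite (antiparallel) strand, written in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def open_reading_frames(dna: str):
    """List (strand, frame, start, orf) for every ATG...stop stretch on either strand.

    Every ATG is tried, so reading frames on the same strand, or on opposite
    strands, are free to share nucleotides.
    """
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for start in range(len(seq) - 2):
            if seq[start:start + 3] != "ATG":
                continue
            for end in range(start + 3, len(seq) - 2, 3):
                if seq[end:end + 3] in STOP_CODONS:
                    orfs.append((strand, start % 3, start, seq[start:end + 3]))
                    break
    return orfs


if __name__ == "__main__":
    # Hypothetical sequence with two overlapping frames on the same strand.
    print(open_reading_frames("ATGAATGCCTAACTGA"))
    # Hypothetical sequence with a sense ORF and an antisense ORF inside it.
    print(open_reading_frames("ATGTCACATTAA"))
```

The first sequence yields two ORFs in frames 0 and 1 of the same strand that share nucleotides 4 to 11, like the two-headed person in line. The second yields one ORF on each strand, and the antisense ORF lies entirely within the stretch occupied by the sense ORF.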
So far I have considered instances in which DNA is transcribed and eventually translated. But consider a segment of DNA that cannot be transcribed without the aid of a promoter or regulatory region found upstream of the primary reading sequence. In the lac operon of E. coli, for example, the promoter and regulatory regions are found immediately upstream of the primary reading sequence. In many eukaryotes, however, such regions may be found at much greater distances (Stotz et al. 2004, p. 650). It may seem easy to dismiss such regions as mere promoter or transcription-factor binding sites; without them, however, the primary reading sequence will not be transcribed. They are thus necessary if the DNA is eventually to express the polypeptide it codes for, so on the “causes the production of” reading of the above definition of “gene”, they must be considered part of the gene. If, however, the definition counts only a particular DNA coding sequence as a gene, then regulatory regions lying outside that sequence are not part of the gene. A paradox concerning what counts as a gene, or at least concerning gene boundaries, thus seems to arise.[iii]
3. Dealing with conflicting observations
The complications arising from the above observations have been dealt with in a number of ways; here I describe two extremes. As mentioned above, some advocate discarding notions of “gene” so that focus can turn instead to the underlying biochemical processes that produce what has so far been known as the gene. Others embrace these conflicting accounts, suggesting that biologists retain the gene concept while incorporating the conflicts. I take a different approach: so long as a biologist adequately defines, within her research program, the unit she is working with, issues concerning the gene are not as damaging as they may first appear. Below I describe various ways philosophers have dealt with conflicts within the gene concept, concluding that these issues can be incorporated into the concept with minimal harm, so long as genes are defined adequately within a given research program and focus is retained on what has classically and molecularly been defined as a gene. I begin with a suggestion to discard the notion.
Thomas Fogle suggests that what biologists consider a gene is really a social construct of what a gene is supposed to be, formed by observing parts that have been stereotypically specified as belonging to a gene; he terms this the “consensus gene” (2001). On this view, a gene is a DNA sequence that contains enough features, e.g. a TATA box, an RNA transcript, an open reading frame, etc., to conform to what has historically counted as a gene (Fogle 2001; Stotz & Griffiths 2004). Fogle argues that problems with the gene concept are a product of combining functional and structural features, which ultimately keeps biologists from recognizing the diversity of functions one structure can perform and the diversity of structures that can perform one function (Fogle 2001; Stotz & Griffiths 2004). On this account, the gene is unwarrantedly stereotyped as a segment of DNA that ultimately causes the production of polypeptides, and when conflicting observations are made, the stereotype is said to be in peril. Fogle seems to suggest that if such a stereotype had never been formed, there would be no problems with the gene concept: if individual biochemical processes were all that were observed and described, no conflict with the standard gene concept could arise, because there would be no gene concept (2001). There does, however, seem to be a basis for forming such a stereotype: genes generally do code for an RNA transcript that is eventually translated into a polypeptide chain, they generally have a promoter region (often including a TATA box) that recruits the enzymes responsible for transcription, they are subject to various forms of RNA processing, and the like. So it does not seem particularly unfair to suggest that most genes fit a stereotype, and it may initially be argued that the standard gene concept should remain.
If the stereotypical gene is described as above, however, as a segment of DNA coding for a polypeptide, it is easy to show the trouble this leads to. Consider alternative cis-splicing, in which, as described above, a single segment of DNA may code for multiple polypeptides. This case conflicts with the stereotypical gene, characterized as a segment of DNA coding for a polypeptide, and so do the other cases described above. It could be argued that the stereotype may be revised to accommodate such instances, but each revision invites new exceptions, so the stereotype becomes caught in something akin to an infinite regress. It would therefore be simpler, on this view, to discard the gene concept altogether so that attention can be paid to the underlying processes (Fogle 2001).
Instead of doing away with the gene concept outright, Kenneth Waters recommends that “gene” be redefined as “a linear sequence in a product at some stage of genetic expression” (1994, p. 178). With genes characterized in this manner, seemingly conflicting observations are simply incorporated into the concept: for any such observation, one asks whether it involves “a linear sequence in a product at some stage of genetic expression”, and most if not all of the conflicting cases seem to qualify (p. 178). Though I admire Waters’ redefinition for accommodating the conflicting observations found in genetics, it seems strange in that it treats genes as if they were events. His use of the word “stage” seems ill-suited to observable instances pertaining to genes; he seems to suggest that genes are everything that occurs within the timeframe from DNA replication to translation.
Raphael Falk’s conclusion that biologists define genes according to their particular research needs anticipates my earlier claims, though he suggests that present debates on this matter are not as helpful as those between early geneticists and cytologists (1986, p. 169; Stotz et al. 2004, p. 651). As suggested, I contend that “gene” should be defined in a manner that corresponds to a particular research program, and I do not think this is far from what biologists actually do. I side with Fogle in that underlying biochemical processes should be the focal point of biologists’ definitions; unlike Fogle, however, I think the classical-molecular conception should also be retained. Genes seem ultimately to be viewed as difference makers or causes; in the above definition, genes are segments of DNA causing the production of RNA strands or polypeptide chains. Setting the underlying mechanisms aside, then, what biologists ultimately seem concerned with is the difference genes make, and segments of DNA ultimately make a difference to polypeptides or RNA. So a gene may still be considered a segment of DNA that codes for a polypeptide chain or RNA segment, but within a given research program attention must be paid to all of the underlying processes that occur, or may occur, in the production of the RNA or polypeptide in question. I stipulate this focus on underlying biochemical processes to avoid the confusions that may arise when conflicting observations are made. This treatment of genes works as follows. Suppose a biologist is studying DNA coding sequence x, and x is known usually to cause the production of RNA segment y, which goes on to produce polypeptide z. Within her definition of x she must include everything x may possibly produce. Say that x is sometimes subject to alternative cis-splicing, and when this occurs x eventually produces polypeptide a rather than z.
In that case, the biologist must include the production of a as a possible exception to z in her definition; the same holds if x is subject to trans-splicing, mRNA editing, and the like. Because this treatment of “gene” retains the classical-molecular definition, it also accounts for issues that result from gene overlap. As hinted above, if segment x overlaps with another coding region, defining “gene” in terms of products allows genes to be counted. Also, if x has a regulatory region upstream, it seems necessary to include that region in the biologist’s definition as well. As suggested above, without such a regulatory region what is coded for will not be produced, and if x is considered ultimately to produce z, the region should also be described in the definition of x. So this treatment of genes ultimately accounts for the conflicting observations mentioned above.
A criticism of treating genes in this manner, however, is that if biologists define genes relative to their research programs, “gene” remains ultimately ambiguous. This is analogous to ambiguous words such as “bank” or “jerk”, each of which has more than one meaning. “Jerk” may mean a rude person or a sudden, sharp motion; likewise, if some biologists use “gene” to include instances of gene overlap (or other conflicting observations) while other research programs exclude such cases, the term is still ambiguous. Such a criticism, however, ignores the stipulation that biologists concern themselves with DNA as well as its products, which establishes a commonality among all of the individual research programs’ conceptions of genes. As suggested above, this stipulation is necessary for “gene” to have roughly the same meaning across research programs. Even so, this remains a rather convoluted way of dealing with problems in the gene concept.
[i] For a full list of observations that conflict with the standard gene concept, see Peter Portin (1993).
[ii] This example, which I think is most fitting, is credited solely to David Harker.
[iii] This may be thought of in classical-genetic terms. Suppose a mutation is induced in a strand of DNA, resulting in the failure to produce a particular trait. The mutation, however, occurred in the regulatory region for this trait’s coding region rather than in the coding region itself. The researcher would in turn identify the regulatory region, rather than the coding region, as the gene for this trait.