I mulled over some of the suggestions in my request for basic topics to cover, and I realized that there is no such thing as a simple concept in biology. Some of the ideas required a lot of background in molecular biology, others demand understanding of the philosophy of science, and what I am interested in is teetering way out at the edge of what we know, where definitions often start to break down. Sorry, I have to give up.
Seriously, though, I think that what does exist are simple treatments of complex subjects, so that is what I’m aiming for here: I talk a lot about genes, so let’s just step way back and give a useful definition of a gene. I admit right up front, though, that there are two limitations: I’m going to give a very simplified explanation that fits with a molecular genetics focus (pure geneticists define genes very differently), and I’m going to talk only about eukaryotic/metazoan genes. I tell you right now that if I asked a half dozen different biologists to help me out with this, they’d rip into it and add a thousand qualifiers, and it would never get done. So let’s plunge in and see what a simple version of a gene is.
First, let me cite a single source that I used to pull this out: Modern Genetic Analysis: Integrating Genes and Genomes by Anthony J.F. Griffiths, Richard C. Lewontin, Jeffrey H. Miller, and William M. Gelbart. It’s an excellent genetics textbook, well worth the $116 if you’ve got the loose change jangling about. Here’s their definition of a gene:
A gene is an operational region of the chromosomal DNA, part of which can be transcribed into a functional RNA at the correct time and place during development. Thus, the gene is composed of the transcribed region and adjacent regulatory regions.
So we have long strings of DNA organized into chromosomes in each of our cells, and certain portions of that DNA will be copied or transcribed into RNA strands by various proteins in the nucleus. Which parts will be transcribed will depend in part on what proteins are present in a particular cell; the proteins have to bind to specific regions in the DNA to initiate the protein machinery to do the work of copying, and that machinery also recognizes certain regions of the DNA as places to stop copying. We have approximately 25,000 genes; the emphasis is on the “approximately” because one of the ways we identify genes is by looking for the punctuation marks of the start and stop regions, and there’s a lot of random punctuation scattered throughout the genome. The hypothetical designer must be a very poor copy editor.
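To get a feel for how much random punctuation there is, here is a minimal Python sketch. It is only an analogy: it uses the start and stop codons of translation (ATG and TAA/TAG/TGA) as stand-ins for the start and stop signals discussed above, and it scans a made-up random sequence rather than a real genome. The function name `find_orfs` and the `min_codons` parameter are my own inventions for illustration.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=1):
    """Return (start, end) index pairs of ATG...stop runs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current run, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count the codons between the ATG (inclusive) and the stop (exclusive).
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

# Hypothetical data: 3,000 bases of pure noise, no biology involved.
random.seed(0)
noise = "".join(random.choice("ACGT") for _ in range(3000))
print(len(find_orfs(noise)), "start-to-stop runs in random sequence")
print(len(find_orfs(noise, min_codons=100)), "of them at least 100 codons long")
```

Short start-to-stop runs turn up by chance all over the place; long ones are much rarer, which is one reason a length threshold is a common (if crude) filter when guessing at genes.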
Here’s a simple picture of a eukaryotic gene.
It has a few general parts. It’s on a strand of DNA, which you’ll have to imagine going off the screen to the left and right for a few miles in either direction. There is a regulatory region for transcription initiation (more about that in a little bit) which, if we include various enhancers and repressors, may stretch for many thousands of base pairs, with important short areas for regulation scattered throughout; one serious flaw with this diagram is that it is not drawn to scale, since the regulatory regions comprise roughly twice as much DNA as the coding regions.
The part of the gene that is actually transcribed is broken up into regions called introns and exons. Introns aren’t going to be part of the final gene product, usually; enzymes are going to cut them out of the RNA and splice together those dark green exons to make the final functional RNA.
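The cutting and splicing can be sketched in a few lines of Python. This is a toy under stated assumptions: the transcript and the exon coordinates are hypothetical, and the function `splice` simply joins slices of a string, which is nothing like the real enzymatic machinery.

```python
def splice(pre_rna, exons):
    """Join the exon slices of a transcript, dropping everything between them.

    `exons` is an ordered list of (start, end) half-open index pairs.
    """
    return "".join(pre_rna[start:end] for start, end in exons)

# Hypothetical transcript: exons in upper case, introns in lower case.
pre_rna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guacgcuuag" + "UGA"
exons = [(0, 6), (18, 24), (34, 37)]

print(splice(pre_rna, exons))  # AUGGCCGAUUACUGA
```

The made-up introns here start with "gu" and end with "ag", which is the usual pattern in real ones, but everything else about the sequence is invented.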
Let’s look at a specific example of a gene. The Online Mendelian Inheritance in Man database makes it easy to look up human genes with known functional roles, and I arbitrarily picked CFTR, the cystic fibrosis transmembrane conductance regulator. Follow that link, and you’ll learn far more than you ever wanted to about this gene that transports ions across cell membranes, and which is responsible for cystic fibrosis when it fails to work. It’s not basic, I’m afraid.
One of the things you can do from that gene entry, though, is take a look at a graphic portrayal of the gene on the chromosome. CFTR is one gene among many on the long arm of chromosome 7. You can also find the nucleotide sequence for the coding region here. Isn’t this informative?
1 aattggaagc aaatgacatc acagcaggtc agagaaaaag ggttgagcgg caggcaccca 61 gagtagtagg tctttggcat taggagcttg agcccagacg gccctagcag ggaccccagc 121 gcccgagaga ccatgcagag gtcgcctctg gaaaaggcca gcgttgtctc caaacttttt 181 ttcagctgga ccagaccaat tttgaggaaa ggatacagac agcgcctgga attgtcagac 241 atataccaaa tcccttctgt tgattctgct gacaatctat ctgaaaaatt ggaaagagaa 301 tgggatagag agctggcttc aaagaaaaat cctaaactca ttaatgccct tcggcgatgt 361 tttttctgga gatttatgtt ctatggaatc tttttatatt taggggaagt caccaaagca 421 gtacagcctc tcttactggg aagaatcata gcttcctatg acccggataa caaggaggaa 481 cgctctatcg cgatttatct aggcataggc ttatgccttc tctttattgt gaggacactg 541 ctcctacacc cagccatttt tggccttcat cacattggaa tgcagatgag aatagctatg 601 tttagtttga tttataagaa gactttaaag ctgtcaagcc gtgttctaga taaaataagt 661 attggacaac ttgttagtct cctttccaac aacctgaaca aatttgatga aggacttgca 721 ttggcacatt tcgtgtggat cgctcctttg caagtggcac tcctcatggg gctaatctgg 781 gagttgttac aggcgtctgc cttctgtgga cttggtttcc tgatagtcct tgcccttttt 841 caggctgggc tagggagaat gatgatgaag tacagagatc agagagctgg gaagatcagt 901 gaaagacttg tgattacctc agaaatgatt gaaaatatcc aatctgttaa ggcatactgc 961 tgggaagaag caatggaaaa aatgattgaa aacttaagac aaacagaact gaaactgact 1021 cggaaggcag cctatgtgag atacttcaat agctcagcct tcttcttctc agggttcttt 1081 gtggtgtttt tatctgtgct tccctatgca ctaatcaaag gaatcatcct ccggaaaata 1141 ttcaccacca tctcattctg cattgttctg cgcatggcgg tcactcggca atttccctgg 1201 gctgtacaaa catggtatga ctctcttgga gcaataaaca aaatacagga tttcttacaa 1261 aagcaagaat ataagacatt ggaatataac ttaacgacta cagaagtagt gatggagaat 1321 gtaacagcct tctgggagga gggatttggg gaattatttg agaaagcaaa acaaaacaat 1381 aacaatagaa aaacttctaa tggtgatgac agcctcttct tcagtaattt ctcacttctt 1441 ggtactcctg tcctgaaaga tattaatttc aagatagaaa gaggacagtt gttggcggtt 1501 gctggatcca ctggagcagg caagacttca cttctaatgg tgattatggg agaactggag 1561 ccttcagagg gtaaaattaa gcacagtgga agaatttcat tctgttctca gttttcctgg 1621 attatgcctg gcaccattaa agaaaatatc atctttggtg tttcctatga tgaatataga 1681 tacagaagcg tcatcaaagc 
atgccaacta gaagaggaca tctccaagtt tgcagagaaa 1741 gacaatatag ttcttggaga aggtggaatc acactgagtg gaggtcaacg agcaagaatt 1801 tctttagcaa gagcagtata caaagatgct gatttgtatt tattagactc tccttttgga 1861 tacctagatg ttttaacaga aaaagaaata tttgaaagct gtgtctgtaa actgatggct 1921 aacaaaacta ggattttggt cacttctaaa atggaacatt taaagaaagc tgacaaaata 1981 ttaattttgc atgaaggtag cagctatttt tatgggacat tttcagaact ccaaaatcta 2041 cagccagact ttagctcaaa actcatggga tgtgattctt tcgaccaatt tagtgcagaa 2101 agaagaaatt caatcctaac tgagacctta caccgtttct cattagaagg agatgctcct 2161 gtctcctgga cagaaacaaa aaaacaatct tttaaacaga ctggagagtt tggggaaaaa 2221 aggaagaatt ctattctcaa tccaatcaac tctatacgaa aattttccat tgtgcaaaag 2281 actcccttac aaatgaatgg catcgaagag gattctgatg agcctttaga gagaaggctg 2341 tccttagtac cagattctga gcagggagag gcgatactgc ctcgcatcag cgtgatcagc 2401 actggcccca cgcttcaggc acgaaggagg cagtctgtcc tgaacctgat gacacactca 2461 gttaaccaag gtcagaacat tcaccgaaag acaacagcat ccacacgaaa agtgtcactg 2521 gcccctcagg caaacttgac tgaactggat atatattcaa gaaggttatc tcaagaaact 2581 ggcttggaaa taagtgaaga aattaacgaa gaagacttaa aggagtgctt ttttgatgat 2641 atggagagca taccagcagt gactacatgg aacacatacc ttcgatatat tactgtccac 2701 aagagcttaa tttttgtgct aatttggtgc ttagtaattt ttctggcaga ggtggctgct 2761 tctttggttg tgctgtggct ccttggaaac actcctcttc aagacaaagg gaatagtact 2821 catagtagaa ataacagcta tgcagtgatt atcaccagca ccagttcgta ttatgtgttt 2881 tacatttacg tgggagtagc cgacactttg cttgctatgg gattcttcag aggtctacca 2941 ctggtgcata ctctaatcac agtgtcgaaa attttacacc acaaaatgtt acattctgtt 3001 cttcaagcac ctatgtcaac cctcaacacg ttgaaagcag gtgggattct taatagattc 3061 tccaaagata tagcaatttt ggatgacctt ctgcctctta ccatatttga cttcatccag 3121 ttgttattaa ttgtgattgg agctatagca gttgtcgcag ttttacaacc ctacatcttt 3181 gttgcaacag tgccagtgat agtggctttt attatgttga gagcatattt cctccaaacc 3241 tcacagcaac tcaaacaact ggaatctgaa ggcaggagtc caattttcac tcatcttgtt 3301 acaagcttaa aaggactatg gacacttcgt gccttcggac ggcagcctta ctttgaaact 3361 ctgttccaca aagctctgaa tttacatact 
gccaactggt tcttgtacct gtcaacactg 3421 cgctggttcc aaatgagaat agaaatgatt tttgtcatct tcttcattgc tgttaccttc 3481 atttccattt taacaacagg agaaggagaa ggaagagttg gtattatcct gactttagcc 3541 atgaatatca tgagtacatt gcagtgggct gtaaactcca gcatagatgt ggatagcttg 3601 atgcgatctg tgagccgagt ctttaagttc attgacatgc caacagaagg taaacctacc 3661 aagtcaacca aaccatacaa gaatggccaa ctctcgaaag ttatgattat tgagaattca 3721 cacgtgaaga aagatgacat ctggccctca gggggccaaa tgactgtcaa agatctcaca 3781 gcaaaataca cagaaggtgg aaatgccata ttagagaaca tttccttctc aataagtcct 3841 ggccagaggg tgggcctctt gggaagaact ggatcaggga agagtacttt gttatcagct 3901 tttttgagac tactgaacac tgaaggagaa atccagatcg atggtgtgtc ttgggattca 3961 ataactttgc aacagtggag gaaagccttt ggagtgatac cacagaaagt atttattttt 4021 tctggaacat ttagaaaaaa cttggatccc tatgaacagt ggagtgatca agaaatatgg 4081 aaagttgcag atgaggttgg gctcagatct gtgatagaac agtttcctgg gaagcttgac 4141 tttgtccttg tggatggggg ctgtgtccta agccatggcc acaagcagtt gatgtgcttg 4201 gctagatctg ttctcagtaa ggcgaagatc ttgctgcttg atgaacccag tgctcatttg 4261 gatccagtaa cataccaaat aattagaaga actctaaaac aagcatttgc tgattgcaca 4321 gtaattctct gtgaacacag gatagaagca atgctggaat gccaacaatt tttggtcata 4381 gaagagaaca aagtgcggca gtacgattcc atccagaaac tgctgaacga gaggagcctc 4441 ttccggcaag ccatcagccc ctccgacagg gtgaagctct ttccccaccg gaactcaagc 4501 aagtgcaagt ctaagcccca gattgctgct ctgaaagagg agacagaaga agaggtgcaa 4561 gatacaaggc tttagagagc agcataaatg ttgacatggg acatttgctc atggaattgg 4621 agctcgtggg acagtcacct catggaattg gagctcgtgg aacagttacc tctgcctcag 4681 aaaacaagga tgaattaagt ttttttttaa aaaagaaaca tttggtaagg ggaattgagg 4741 acactgatat gggtcttgat aaatggcttc ctggcaatag tcaaattgtg tgaaaggtac 4801 ttcaaatcct tgaagattta ccacttgtgt tttgcaagcc agattttcct gaaaaccctt 4861 gccatgtgct agtaattgga aaggcagctc taaatgtcaa tcagcctagt tgatcagctt 4921 attgtctagt gaaactcgtt aatttgtagt gttggagaag aactgaaatc atacttctta 4981 gggttatgat taagtaatga taactggaaa cttcagcggt ttatataagc ttgtattcct 5041 ttttctctcc tctccccatg atgtttagaa acacaactat 
attgtttgct aagcattcca 5101 actatctcat ttccaagcaa gtattagaat accacaggaa ccacaagact gcacatcaaa 5161 atatgcccca ttcaacatct agtgagcagt caggaaagag aacttccaga tcctggaaat 5221 cagggttagt attgtccagg tctaccaaaa atctcaatat ttcagataat cacaatacat 5281 cccttacctg ggaaagggct gttataatct ttcacagggg acaggatggt tcccttgatg 5341 aagaagttga tatgcctttt cccaactcca gaaagtgaca agctcacaga cctttgaact 5401 agagtttagc tggaaaagta tgttagtgca aattgtcaca ggacagccct tctttccaca 5461 gaagctccag gtagagggtg tgtaagtaga taggccatgg gcactgtggg tagacacaca 5521 tgaagtccaa gcatttagat gtataggttg atggtggtat gttttcaggc tagatgtatg 5581 tacttcatgc tgtctacact aagagagaat gagagacaca ctgaagaagc accaatcatg 5641 aattagtttt atatgcttct gttttataat tttgtgaagc aaaatttttt ctctaggaaa 5701 tatttatttt aataatgttt caaacatata taacaatgct gtattttaaa agaatgatta 5761 tgaattacat ttgtataaaa taatttttat atttgaaata ttgacttttt atggcactag 5821 tatttctatg aaatattatg ttaaaactgg gacaggggag aacctagggt gatattaacc 5881 aggggccatg aatcaccttt tggtctggag ggaagccttg gggctgatgc agttgttgcc 5941 cacagctgta tgattcccag ccagcacagc ctcttagatg cagttctgaa gaagatggta 6001 ccaccagtct gactgtttcc atcaagggta cactgccttc tcaactccaa actgactctt 6061 aagaagactg cattatattt attactgtaa gaaaatatca cttgtcaata aaatccatac 6121 atttgtgtga aa
That sequence will be translated into a protein in the cytoplasm of the cell, which will then go on to be incorporated into the membrane, where it will work to regulate the secretion of chloride and other ions. I confess, I’m often not so much interested in what the coding region of the gene does as I am in how the gene is turned on or off in the first place, so let’s look at how that’s done.
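If you want to see what "translated" means mechanically, here is a small Python sketch that reads codons three letters at a time using the standard genetic code. The input is the 27 letters that begin at the first "atg" of the coding region in the listing above (position 133); the helper names `CODON_TABLE` and `translate` are mine, and a real cell is of course doing this with ribosomes, not dictionaries.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    coding_dna = coding_dna.upper()
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE[coding_dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)

# First nine codons of the coding region in the listing.
print(translate("atgcagaggtcgcctctggaaaaggcc"))  # MQRSPLEKA
```

Each letter in the output is the one-letter abbreviation for an amino acid, so the protein begins methionine, glutamine, arginine, serine, and so on.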
Here’s another cartoon of a gene. The green part is the piece of DNA that is to be copied into an RNA transcript, and the piece of protein machinery that is going to do that job is called RNA polymerase, the pink rectangle. RNA polymerase is going to advance sequentially along the DNA, matching each DNA nucleotide on one strand with a complementary RNA nucleotide, and catalyzing the linkage of each RNA nucleotide to its neighbor. RNA polymerase needs to know where to start, though — it doesn’t just land on a random part of the genome and start copying away — and it looks for a region called a promoter (in red).
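The base-matching step is simple enough to write down as code. Here is a minimal Python sketch, assuming a made-up nine-base template strand read in the 3' to 5' direction; the names `PAIRING` and `transcribe` are hypothetical, and the sketch ignores everything about how the polymerase finds its place and moves.

```python
# Which RNA nucleotide pairs with each DNA nucleotide on the template strand.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Pair each template base, read 3'->5', with its RNA complement (RNA comes out 5'->3')."""
    return "".join(PAIRING[base] for base in template_3_to_5.upper())

# Hypothetical template strand, written 3'->5'.
print(transcribe("TACGTCTCC"))  # AUGCAGAGG
```

Note that the RNA uses U wherever the other DNA strand would have a T; otherwise the transcript reads the same as the non-template strand.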
Part of the promoter is a relatively simple sequence called the TATA box, because it contains lots of A and T nucleotides. The TATA box is bound by a whole constellation of transcription initiation proteins, though, building up a complex that promotes the binding and activity of RNA polymerase. The DNA itself has a 3-dimensional structure that folds around and allows sequences called enhancers and silencers to play a role in controlling transcription by way of intermediary proteins called activators and repressors. Turning on a gene is a family affair, requiring the participation of many proteins.
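Because the TATA box is a short, recognizable motif, looking for it can be sketched with a regular expression. This is a rough illustration only: I am assuming the commonly cited consensus TATA(A/T)A(A/T), the upstream sequence below is invented, and a match in real DNA is far from proof of a working promoter, since that depends on all those proteins.

```python
import re

# Assumed consensus for illustration: TATA, then A or T, then A, then A or T.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")

def find_tata_boxes(dna):
    """Return the start index of each non-overlapping TATA-box-like match."""
    return [match.start() for match in TATA_BOX.finditer(dna.upper())]

# Hypothetical stretch of DNA upstream of a gene.
upstream = "GGCGCCTATAAAAGGCGCGCATGCAG"
print(find_tata_boxes(upstream))  # [6]
```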
The really complicated part of the diagram above, of course, is that each of those colored blobs is a protein, which is in turn the product of expression of a gene elsewhere in the genome, which has in turn its own promoter and enhancers and silencers. The coding region in this cartoon could, for instance, be for one of the components of that RNA polymerase complex in action here. Genes can make gene products that affect the expression of other genes by binding to the regulatory regions or to the proteins that are involved in the regulatory complex.
One last thing: I also took a look at the other common web source for definitions of basic concepts, Wikipedia. Here’s the first line of the Wikipedia entry for “gene”:
A gene is the unit of heredity, with each gene determining one inherited feature of an organism.
That is completely wrong. “One gene, one character” is a false idea of the relationship of genes to inheritance, since many genes contribute to the appearance of a single feature, and one gene will play a role in many different features. Apparently, the next basic ideas I should summarize are polygeny and pleiotropy.
Kagehi says
“One gene, one character” is a false idea of the relationship of genes to inheritance
Perhaps we need to compare it to a different sort of language. Anyone know of one where individual characters “do not” denote a specific sound, but only in combination? Can’t think of any, unless you included something complicated like Unicode. But, in principle, I mean dealing with concrete meanings, i.e.:
a = means something alone.
e = doesn’t.
ae = still doesn’t.
ea = still doesn’t.
But in each case the “sound” may change and each word that uses those has unique meanings. Sorry, closest I can think of. Though.. Maybe Mayan… I understand someone is finally breaking their knot code. It’s a numerical system, sort of binary like, but with patterned knots and reversals, which change the specifics of the meaning and definition, though what some of those are is still unknown. A knot, in and of itself, lacks context, so doesn’t even necessarily qualify as a number.
Ron says
One interesting point about gene structure and regulation is that metazoans do things differently than the other large groups of eukaryotic organisms (plants and fungi). For instance, promoter regions in plants seem to be much shorter than promoter regions in animals (depending on where you sit on the definition of promoter, of course). And, of course, fungi have their own little weirdness. :-)
G. Tingey says
Where are all these definitions and explanatory articles going to be “stored”/made accessible/given a URL?
I think you may have said when the first idea came around, but a repeat – and every time one of these definitions comes up – would be a good idea.
Or is Seed, or someone, going to set up a Wiki, with strict editorial controls?
Rosie Redfield says
If we think the Wikipedia definition is wrong we should fix it.
Bob says
Cool beans. Nice job, PZ.
Anyone know where this dude can fill out a W-4?
:)
Greg Laden says
Nice post
Regarding the “one gene -> one character” issue. This brings up an interesting feature of genetics in the modern world. My wife (who sees biology much more from the perspective of the INSIDE of a cell than I do) and I were reviewing the Minn. Science Museum race DVD last night. (Written review coming to a blog near you in a few days). The narration mentioned that the vast majority of genetic differences do not show up as trait differences. I sensed Amanda wince, but actually, we never got to talk about it because we got onto other things (the meaning of genetic variation a la Lewontin and this “85% of the variation happens in your own village” thing…)
If one third of mutations are in silent bases (same amino acid) then that means two thirds are visible. But only if you are looking at the protein.
In other words, from the inside-the-cell perspective, one gene = one trait (but wait, I have a caveat below). We just need to modify our understanding of “trait” to mean, most of the time, “protein”.
Caveat: I’m talking about the gene as an expressed entity. Zany things like transcribing a gene backwards or MHC genes that have dynamic genomes and stuff aside….
Suezboo says
At the risk of being derided again, PZ, could you please explain to a political scientist who never studied any science, the following :
Can you tell me in descending or ascending order of size the relationship between: atom, nucleus, cell, nucleus, molecule, gene, chromosome, amino acids, proteins and all the others I have forgotten.
I mean, I can read a post like the one on the Hox gene and it makes sense to me as I read but I need to know its connection to the other discrete things I read about. I know most people learn this in elementary school but for me that was 50 years ago in a Convent where Ladies did not Learn that Science stuff. Help.
JohnA says
This is fascinating stuff. True enough, the Wikipedia article does describe the gene in terms of “one gene = one feature”. From my understanding of PZ’s article, this could not be further from the truth. Unfortunately the issue has come up before on the article and a poster named Opabinia regalis removed the portions that explain how a gene can be removed from a phenotypic effect. This occurred on Dec 10th, and went unchallenged by the Wikipedia community…evidence that many people don’t understand genes very well.
By all means more informed posters than myself should get in there and rebuild that article. If a person types “gene” into Google, that page is the #1 result. It must be made to be correct…I’ll help if I can but I don’t think I am the expert here.
Steve LaBonne says
Not even inside the cell, Greg. Mutations in a lot of genes have extensively pleiotropic effects; it rarely makes sense in any way to try to associate one and only one “function” with each protein. (Not to mention that with alternate splicing even the definition of “one expressed entity” gets fuzzy around the edges.) No matter how you slice it, “one gene – one trait” is a hopelessly outdated and inaccurate slogan.
sparc says
The ENSEMBL genome browser gives a much nicer overview (http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000001626). Click one of the links under genomic locations for a zoomable map. I strongly encourage you to play around with all the features and links to other species. Have fun.
Keith Douglas says
One thing that is never made clear in characterizations of genes is: can they be discontinuous? i.e. does the biochemistry permit the following
gene 1 —— gene 2 —— gene 1
where the “two” gene 1s contribute to the same RNA strands and gene 2 to others?
Adam Cuerden says
Tend to hang around the Wikipedia articles strictly on Evolution and Gilbert and Sullivan-related subjects, but I’ve flagged up the problem with the Gene article on its talk page.
Bruce says
“each of those colored blobs is a protein, which is in turn the product of expression of a gene elsewhere”
so it’s turtles all the way down…
Steve LaBonne says
Keith- I’m not quite clear what your example is trying to say, but do a Google search on trans-splicing. It is possible for a messenger RNA to be stitched together from two separate transcripts. AFAIK most of the known examples are in “weird” organisms like trypanosomes. PZ probably is much more up to date on this than I am and hopefully will comment.
Griststone says
But this *is* the functional definition of the gene as portrayed by Dawkins in the Selfish Gene. He essentially argues that biologists derive genes backwards from different phenotypes, and, elsewhere, that a gene is best understood as the potential difference between two otherwise identical organisms:
The metaphor matters. It must map to the more scientific definition; otherwise, popular science books like Selfish Gene are inherently deceptive. I have to assume that Dawkins “believes” in “one gene, one character,” even if he “knows” better.
Greg Laden says
Not even inside the cell, Greg. Mutations in a lot of genes have extensively pleiotropic effects; it rarely makes sense in any way to try to associate one and only one “function” with each protein.
You are absolutely correct in what you say, but I was not clear in what I said, so your disagreement is about something else.
A gene codes for a protein (let’s ignore talk of exceptions to this). If you only look at proteins, you will see that two identical genes code for identical proteins. Two different genes will code for two different proteins.
Most of the time.
As I mentioned, there are cases where a single gene codes for more than one protein, but that does not obviate the incorrectness of “most genetic differences are invisible” from the protein perspective.
And the two proteins may not be different in how they function. But at the organism level, we do not say that heritable variation in a trait is not there if it does not have a function. It is still there.
Is this more clear?
I’m not actually advocating the use of the equation. Instead, I’m pointing out how it is interesting that this “antiquated” formulation is in some ways more true rather than less true, now that we have the technology to examine the proteins directly.
Greg Laden says
gene 1 —— gene 2 —— gene 1
where the “two” gene 1s contribute to the same RNA strands and gene 2 to others?
Yes, sort of. Two genes can transcribe two different RNAs that are then joined together into a single molecule that translates into a protein. This is called trans-splicing.
That is not really two genes combining but it is two genes coding for what one would think by looking at it would be coded for by one gene.
In addition to that, many many proteins are made up of the primary products of more than one gene, stuck together to make a higher-order protein. That is very common.
Tim Vickers says
I wrote the original Wikipedia definition. I was hoping to produce a simplified definition that could be understood by the general public, since we had much criticism of the original article being far too technical. I’ve gone back to a transcript-based definition, since that will also cover functional RNA-encoding genes. This is indeed more accurate, but I worry that many people who are trying to find out what this “gene” thing their newspaper keeps talking about will stop reading immediately.
If anybody wants to help, please feel free to contribute.
jeff says
Is it common to conflate a gene and its protein? You wrote …this gene that transports ions across cell membranes… Isn’t it the protein that does the transport, not the gene? Since, as other comments explain, genes and proteins are not one-to-one, why does this usage continue?
Steve LaBonne says
Greg- I still get hives at the thought of equating a protein with a “trait”. So while I think I follow what you’re trying to say, I still think the formulation in terms of “traits” is highly misleading to the uninitiated and is best left alone altogether.
Tim Vickers says
No, that bit of the article isn’t mine and is quite wrong. This page needs a great deal of work and any contributions from people reading this blog would be most welcome.
Steve LaBonne says
Griststone- this is why I don’t even like the word “gene” and wish we could somehow get rid of it (I know, fat chance). Dawkins is simply talking like a classical geneticist in that passage, and as PZ warns in his post the referent of “gene” in that world maps very imperfectly indeed onto what molecular biologists think of as a gene.
AJ Milne says
Can you tell me in descending or ascending order of size the relationship between: atom, nucleus, cell, nucleus, molecule, gene, chromosome, amino acids, proteins and all the others I have forgotten…
Knowing full well I’m inviting a million quibbles for the exceptions and nuances I’m not going to cover, a (very) quick primer to set off your necessary Google searches…
First, descending order of size is difficult. There’s some overlap, as some of those are classes of things. Proteins and genes are variable in size. And both are specific types of molecules. But roughly:
cell, nucleus (1), chromosome, gene is one easy sequence (large to small), and
protein, amino acid is another, and finally
molecule, atom, nucleus (2) is a last one.
… note that nucleus (1) and nucleus (2) are entirely different things. See the last thing below.
Quick descriptions:
Atom: one or more protons in combination with zero or more neutrons (the nucleus of an atom), surrounded by a cloud of electrons. The hydrogen atom has one proton, one electron, and (usually) zero neutrons. The oxygen atom has eight protons, eight electrons, and (usually) eight neutrons. Neutron counts vary within a given element; two atoms with the same proton count and different neutron counts are called isotopes of that element. I won’t get into this further here.
Molecule: A combination of two or more atoms, chemically bonded to one another. Water is a chemical compound composed of two hydrogen atoms, each bonded to the same oxygen atom.
Organic molecule: Technically, just a molecule containing carbon (an element with six protons in its nucleus, and which can easily form interesting chains and branches and rings in a molecule). Organic molecules are interesting in biology because of that branching/chaining thing, and because most of biology concerns their interaction, probably again because of that branching/chaining thing. Inorganic molecules are also involved in biology, however. Water is technically an inorganic molecule (contains no carbon), but it’s very important to biochemistry.
Amino acid: A class of organic molecule. There are about 20 or so different types in a typical organism. Pictures should be available anywhere. They’re mostly pretty small. But a chain of these is called a peptide chain, or a polypeptide. Typically, if the polypeptide is big enough (and they can get quite big), and biologically interesting, we call it a protein.
Protein: See above. A longish, usually biologically interesting chain of amino acids. Note that while they start, structurally, as chains, due to the different amino acids involved, which have various electronegativities and sizes, they tend to fold/clump/twist into interesting shapes as they’re assembled. Biological organisms do a lot with them. A very large number of the enzymes that carry out metabolism are proteins. There are also structural proteins. Cells make them by reading mRNAs (see below) and translating each codon of three nucleotides (also see below) into a corresponding amino acid, building the protein next to the mRNA. Google ‘genetic code’ for more.
mRNA. Messenger RNA. A single stranded RNA copied off a DNA template. Google ‘nucleic acid complementary’ to get an idea of what’s going on there. Generally, an organism makes an mRNA from a DNA template, then a protein from the mRNA.
RNA. Ribonucleic acid. A big, chained molecule made up of a bunch of nucleotides. RNA and DNA differ a bit in their ‘backbone’… DNA is missing one of the oxygens at each link that you find in RNA.
DNA. Deoxyribonucleic acid. Like RNA, a big, chained molecule made up of a bunch of nucleotides. Chromosomes are made up of DNA. Genes are physically DNA sequences.
Nucleotide. Another class of organic molecule. The small things out of which the nucleic acids RNA and DNA are made. Each carries one of five bases that appear between RNA and DNA: adenine, thymine, guanine, cytosine, uracil.
Nucleus: one of two things. Either a bunch of protons and neutrons in the centre of an atom, or the central, membrane-bounded organelle in a eukaryotic cell in which you find all the DNA. The latter is vastly bigger than the former, as it’s made up of a whole lot of molecules, themselves each containing the former. Both are pretty small, compared to, say, a fruit fly, tho’.
Greg Laden says
Greg- I still get hives at the thought of equating a protein with a “trait”. So while I think I follow what you’re trying to say, I still think the formulation in terms of “traits” is highly misleading to the uninitiated and is best left alone altogether.
Right. I don’t think we are disagreeing here. But I’ll take another go at it anyway.
Let’s say I want to study the evolution of the holes made by woodpeckers. That could have applications, but I really should also look at the beaks and the bodies of the woodpeckers.
But then, of course, someone is going to discover that what the beaks are made out of .. how woodpecker beaks develop … is special and related to their hole-making behavior. And so on. At some point you get all the way back to the genes themselves. My original study of the holes assumed an underlying genetic pattern, of course, but I was happy with different hole shapes and position to be the trait. And those things are traits. Nonetheless, at some point you get to the genes.
I quickly add I am not being reductionist here. I’m sure you lose all kinds of stuff when you head down to the genes, not the least of which in this case is the tree the holes are in and the grubs the woodpecker is going after and the woodpecker’s competitors, etc. etc.
Anyway, all these other, underlying things … beaks, talons, keratin microstructures in the beak, etc. are all traits as well.
All the way down to the proteins, which are the last stop on this journey through traitland before we get to the gene (which is not a trait … it’s a gene).
So if you are looking at proteins, you see proteins. Not the other stuff. Not even beaks. In protein land, the statement that the film made “the vast majority of genetic variation is not observed in traits” is silly. That’s all I meant.
So, I’m not equating a protein with a trait. I’m saying that at one very primary level a protein is a trait. But the protein is not the hole in the tree any more than the beak is.
Indeed, the VERY MOMENT a gene is expressed, the very FIRST thing that you get … the most basic trait … is always already a complex interaction of different elements including other genes. After all, the very expression of a gene in a multicelled organism depends on what cell it is in, and that was determined already by a process that involved numerous genes.
I quickly add that even at that level, the gene-> trait link is too simple (as I’ve been noting all along).
As far as being misleading to the uninitiated, so far I don’t think we’ve come even close to dealing with that! PZ’s post is too much. Tim’s definition in Wikipedia is too little.
And this brings us to the crux of the problem. You can’t do it. Explaining “gene” simply is like explaining “the details of the antilock braking system” simply during a TV commercial for a new car. Simply can’t be done.
What is needed is a recognition of the complexity when the whole gene thing is invoked in the media.
Greg Laden says
Knowing full well I’m inviting a million quibbles for the exceptions and nuances I’m not going to cover …
No kidding!!! But you did a great job. Dividing the problem into different lists was key. Brilliant.
Tim Vickers says
OK, the new Wikipedia definition, comments?
“A gene is the unit of heredity and genes determine inherited features. In an organism, the set of genes in the genome interact to direct physical development and behavior. Genes are nucleic acid molecules such as DNA or RNA, and carry information. This information is contained in the base sequence of the nucleic acid molecule.”
Suezboo says
AJ Milne, a million thanks. At least I am starting to get these basic structures straight in my thinking. Thank you for taking the time to explain all this to a total rookie.
I think part of my problem is the specialisation. Like PZ is into development at a molecule level (right??) – molecular biology – and I have read books about what happens inside a cell – nuclear physics (??). But what I need is an elementary book that shows me the whole picture from the proton to the environment (ecology??). Is there such a book anyone can recommend? Otherwise I feel like I am floundering around.
Steviepinhead says
Suezboo–
AJ Milne’s response was right on, but a “purely verbal” description can be difficult to visualize…
The following approach may not assist you with integrating the emergent properties found at the different levels of nature, from quarks up to quasars, but it is a fun and visual way to start.
Google “Powers of Ten.” The first results page will display several different versions–slide shows, web pages, java animations–to assist you with visualizing nature at various scales based on multiplying “up” or dividing “down” by ten.
This may help you begin to weave together or overlap a visual AND a verbal sense of what “fits” within what: how sub-atomic particles make up atoms of the different elements, which combine into the molecules of inorganic chemistry (which combine into the solids, liquids, gases and plasmas at all scales of the physical world, from individual crystals and motes to rocks, oceans, atmospheres, planets, stars, and galaxies); the atomic elements also make up simple “organic” molecules, which can then hook up into the lengthy twisted chains and side-chains of organic molecules (amino acids, nucleic acids, proteins), which are the constituents of organelles and cells (and the intracellular medium) in living critters, which form into the tissues and organs of multicellular lifeforms, who can congregate into couples, herds, sects, and societies…
From there, if it helps, feel free to google whatever topics this approach suggests.
It’s so refreshing to come across people who actually want to LEARN about this amazing universe, rather than flaunt the false security of their ignorance.
David Marjanović says
There probably are such books, but I don’t know any…
No, cell biology/molecular biology/…
BTW, keep in mind that proteins and DNA strands are humongous molecules made from the combination of smaller molecules (with water as a byproduct in both cases).
David Marjanović says
“Powers of Ten” is good, but quite brief.
John says
I think this was an exceptional posting.
I note with interest that, as I write this, I find a total absence of the normal Pharyngulean ideological or trollish comments. Ahem.
Another point of interest is that, (intentionally?) phrasing such as:
is ideal for naive “quote mining”.
qetzal says
Keith Douglas asked:
Yes.
Refer to PZ’s first figure above. See where a gene can be divided into coding regions (exons), and intervening regions (introns)? When the gene in that figure gets copied into RNA, the introns get cut out of the RNA copy, and the exons get connected back together to form one continuous piece of RNA. That ‘spliced’ RNA is the actual template for making a protein. So, the individual exons in the gene correspond to your “gene 1s.”
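The cut-and-rejoin step described above can be sketched in a few lines of Python. This is only a toy illustration: the sequence and the exon coordinates are invented, and the function name `splice` is hypothetical rather than taken from any library.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments, dropping everything between them (the introns).

    exons: half-open (start, end) index pairs into pre_mrna, in 5'->3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Invented toy transcript: uppercase marks exons, lowercase marks introns
# (the case is only there to make the example readable).
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guaugcaaacag" + "CCCUAA"
exons = [(0, 6), (18, 24), (36, 42)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCUUUGGACCCUAA
```

The real spliceosome finds the exon boundaries from signals in the RNA itself; here the coordinates are simply handed in.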
Interestingly, it’s quite possible for a different gene to be embedded within one of the introns. That would correspond to your “gene 2.”
Search for terms like “overlapping genes” or “nested genes” if you want examples.
Greg Laden says
There is also a simulation with “Cells Alive” called “how big” that starts with a hand holding a pin. Then you zoom in and on the pin are a mite (I guess) and a dot. You zoom in and the dot turns out to be some pollen and other stuff, and as you zoom in the other stuff turns out to be a lymphocyte, and a bacterium next to a walking stick.
Then you zoom in and the walking stick is an Ebola virus with some crud next to it.
Then … you guessed it … a rhinovirus.
So this is good for the larger end of the spectrum (compared to molecules)
Suezboo says
David – I can’t believe I made a mistake right there ! I actually meant “inside the atom” but my mind is buzzing with all these terms.
Stevie – Thank You – that is all clear to me ! Amazing ! I glanced at Powers of Ten and it may be just what I need. I can google on from there.
Thank you. Retired politicos (like me and Al Gore) need to learn some science. I do enjoy learning this wholly new stuff so much.
Tukla in Iowa says
Genes? Nonsense. It’s turtles all the way down!
Jim Lund says
The NCBI bookshelf has free, searchable biology textbooks at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books.
It has the previous version of Modern Genetic Analysis,
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mga.section.132
AJ Milne says
You’re welcome Suezboo, and thanks, Greg.
Darby says
So, does anyone know the rationale behind calling the excised bits introns and the included bits exons? I just realized this past semester that I’d been teaching them backwards, and I have no idea how long ago I snapped to the logical connections from the actual ones.
Good news is it probably doesn’t matter – no one’s remembering details like this from their first semester anyway…
AndyS says
The original post is excellent. Combined with the comments from non-experts, experts, and the tie-in to improving the Wikipedia article it is truly the best use of the blogging for science education. Thanks to everybody. I learned a lot.
This — if the comments are included — is certainly a candidate for the next ScienceBlog compilation.
Greg Laden says
The terms were invented by Walter Gilbert. I think the introns are called introns because they are confined in the loop that is spliced out of the RNA.
It does sound backwards though. Try “The INtrons are thrown IN the trash leaving only the EXellent EXons.”
Kevin Bryant says
This doesn’t tell me anything about poutine and sushi!!
sparc says
Dear all,
I don’t understand the fuss about defining genes. IMO all definitions are operational, i.e., they are formulated in a way that one can work with them. Indeed, genes were first identified as phenotypic traits and for many questions (e.g. description of the inheritance of human genetic diseases) this approach is still valid. Of course there have been many additions to the original definition and on the molecular level genes are much more complicated. However, all the different gene definitions can only be understood in the light of the biological knowledge at the time they were coined.
If one wants to be picky one will find several points in PZ’s post for which exceptions are known. Some examples:
What about intron retention in alternative splicing when a non-spliced intron becomes part of the transcript? Does this deny the definition of introns and exons? What about overlapping genes? What about TATA-less promoters? What about the differences between Pol I, II and III genes? What about promoter definitions? Where do they start and where do they end? Have transcription termination points for mammalian genes been identified finally (I did not follow this)?
Come on, this is biology.
So you will find an exception for any seemingly well defined issue. And for me this is what makes biology fun. One just has to know which definitions are appropriate for the issues one is working on.
sparc says
BTW, the sequence above is not the CFTR cds but part of its mRNA sequence that contains the cds and parts of the 5′ and 3′ UTRs. You may search the correct ORF by looking for initiation and stop codons. Or click on the link PZ has given and look for cds on the page you are directed to.
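For anyone who wants to try the "look for initiation and stop codons" approach by hand, here is a rough Python sketch. It assumes the sequence is written in DNA letters (as database entries usually are), scans only the given strand, and uses an invented toy sequence rather than the CFTR one; `find_orfs` and `min_codons` are names made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans of open reading frames.

    An ORF here runs from an ATG to the first in-frame stop codon,
    stop included. Only the strand given is scanned, in all three frames.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        # count codons before the stop
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop
                        break
            i += 3
    return sorted(orfs)


toy = "ccgATGGCTTTCTAAgg"  # invented: 5' flank, ATG GCT TTC TAA, 3' flank
print(find_orfs(toy))       # [(3, 15)]
print(toy[3:15])            # ATGGCTTTCTAA
```

On a real mRNA the longest ORF is usually the first candidate to check against the annotated cds, but that is a rule of thumb and not a guarantee.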
Larry Moran says
My definition of a gene is simply “a DNA sequence that is transcribed.” I don’t include regulatory regions in my definition of a gene. Those regions control the expression of the gene but they aren’t part of the gene itself.
It’s like a car and the keys to a car. The keys aren’t part of the definition of the car.
I’ve got a little essay on the definition of a gene that includes all of the problems and exceptions. Maybe I’ll post it to give a different perspective. My definition includes prokaryotic genes. :-)
Greg Laden says
My definition of a gene is simply “a DNA sequence that is transcribed.” I don’t include regulatory regions in my definition of a gene. Those regions control the expression of the gene but they aren’t part of the gene itself.
This is good because it is in line with the gene-phenotype thing so it is a useful way to characterize a gene. However, why stop there? Why not say:
“A sequence of DNA that is transcribed and translated”
This way we also get rid of the RNA templates from the definition, which is kind of handy.
PZ Myers says
Well, sure…this is MY definition, though, and since I’m most interested in a) regulation and b) metazoans, that tends to bias my perspective. In fact, you read some development papers, and the part of the gene you emphasize in your definition isn’t so important — you can replace it with β-gal and it’s still interesting. I think my second paragraph admits that there are lots of other good ways to look at a gene.
Translation shouldn’t be part of the definition. There are too many interesting bits of RNA floating around that don’t need to be translated to have a function — we don’t want to exclude them.
Greg Laden says
Translation shouldn’t be part of the definition. There are too many interesting bits of RNA floating around that don’t need to be translated to have a function — we don’t want to exclude them.
I’m not interested in EXCLUDING them. They will be allowed to remain. But how they get there and what they do is very different than what proteins do. They are probably linked to fitness in a different way than protein coding genes.
By including translation, you get to say “A gene codes for a protein” without anybody getting in a tizzy. That is a powerful tool in pedagogy. (you could still say it but it would be yet one more white lie among so many that already exist).
But, as you say, that’s your definition. I will simply have to post my own if that’s the way it’s going to be…
sparc says
This may be sufficient for your work. However, this may not be true for other issues. E.g., your approach would only allow poor gene prediction in genome databases. Again, different questions require different gene definitions.
thwaite says
For a completely operational definition of ‘gene’ I’ve always been fond of George Williams’ (1966):
… this is a conceptually challenging encapsulation even when one knows that the endogenous changes he’s referring to are those like crossover and mutation. But it’s a useful and generalized definition, and so was cited by Dawkins in his Extended Phenotype book, and at least implicit throughout the Selfish Gene – which nowhere details any specific gene. It didn’t have to – this conceptual gene suffices.
sparc says
In addition Larry: What about the genes encoded by RNA viruses?
Greg Laden says
In addition Larry: What about the genes encoded by RNA viruses?
Good point, but it may be a little like saying that a definition of the digestive system has to include tape worms.
Griststone says
Not to be daft, but don’t these operational definitions fail to account for theory? What is interesting about a gene is what it does, and how, not how much of a DNA snippet it is. Circularities might be nice for those working within specific disciplines, but they don’t do much to justify ideas about how genes relate to heritable traits and natural selection. After all, I can adopt an operational definition of the solar system that says the sun goes around the earth every 24 hours, and it will get me home before dark every time, but it won’t be true, and I’ll have a bitch of a time landing a rocket on the moon.
lytefoot says
Ah, so that’s what an intron is… no chance of turning Mr. Barclay into a giant spider, then? (Or was that something else?) Yes, I fear I got way, way too much of my introduction to scientific notions from Star Trek.
Yeah, that Powers of Ten thing is spiffy… it reminds me of a book I had as a little kid, called “Cosmic Scale” or “Cosmic View” or something like that. It starts at 1-1 scale with a picture of a kid sitting on a hill, then zooms out… then goes back to the kid, zooms in to the mosquito sitting on his hand, then the blood cells, and so on until it shows an atom; this was a kids’ book from the eighties, so it stopped there, didn’t delve into the nucleus.
Chris Surridge says
The concept of the gene is certainly fascinating and the history of the concept has involved some of the truly great thinkers in biology. Harvey, Goethe, Bateson, Boveri, etc etc. Seems like the big problem with the concept is that everyone defines it differently depending what branch of biology you are working in. The best book I have read on the subject is Jacob’s Ladder by Henry Gee. Lots to disagree with in there but a hell of a read and a real page turner.
sparc says
So Mendel’s view of Anlagen would be sufficient? I guess we really do need a definition that relates to the genetic material and its structural and regulatory elements. E.g., I have been involved in the generation of mouse mutants in which the respective genes were inactivated by mutations in their promoters (conventionally one rather targets coding exons but this would have been more complicated because of the intron/exon structure of these genes that were not completely elucidated at that time).
In addition, there are efforts to describe genes of unknown function in terms of their promoter structure and transcription factor binding sites and to develop models that would allow one to predict where and when a gene is expressed. The hope is that such models will predict in which pathways a certain gene product is involved.
Please have a look how much of mammalian promoters is conserved in different species. Unfortunately, one can not use synonymous vs. non-synonymous ratios to distinguish positive, neutral and negative selection. But indeed interindividual differences in promoter sequence may lead to differences in gene expression and account for heritable traits.
Again, the gene definition you need really depends on what you want to investigate.
MartinC says
It took a long time for someone to mention RNA virus encoded genes, or even functional non (protein)coding ‘genes’.
So a microRNA encoded by an RNA virus is not a gene ?
What is it then ? Are a lot of us wasting our time doing lab experiments on such entities ?
The central dogma (DNA – mRNA – protein) is easy for teaching purposes but is not really the best current understanding of what constitutes a gene.
In my opinion a gene is a set of instructions for the production of an expressed biological product with a functional effect.
Regulatory elements such as promoters or enhancers, which also encode information, are not expressed and are usually not included in the definition of ‘gene’. To date this information has only been found in nucleic acids although some other biomolecule(s) may have predated this in evolutionary history (and may still exist if we look hard enough or in the correct places).
Andrew Brown says
Matt Ridley’s book, “Nature via Nurture” — which may have been given a different title in the States — has seven different definitions of “gene” in the course of a very clear discussion.
But that presentation was beautiful, PZ, and an example of all that’s best about pharyngula.
Larry Moran says
sparc says,
What? The prediction programs can find “genes” whether or not you include regulatory sequences in the definition. I don’t understand your objection. Are you saying that if we remove regulatory sequences from the definition of a gene then we can’t use them to locate the real gene?
That doesn’t make sense.
Larry Moran says
MartinC says,
That’s because your version of the Central Dogma is wrong! :-)
See Basic Concepts: The Central Dogma of Molecular Biology.
As for RNA genes, they are an exception to the rule. There are lots of exceptions. This is biology. Biology is messy.
I’m going to try and post something on the definition of a gene in order to explain the complications. John Wilkins can help since his current boss is one of the world’s experts on the topic.
sparc says
We may not be able to agree on a common gene definition. However, the discussion proves that commenters here are able and indeed willing to question definitions rather than just accepting arguments from PZ’s authority. Quite entertaining and instructive compared to the lame self-referential, reverential ‘discussions’ on UD.
Unfortunately though, I guess we who took part in this discussion may benefit more than those guys who may be redirected to this thread during the search for some basics in the future because we had the privilege to see the discussion develop. I don’t know how effective this thread will be for people who will just read it in the future.
MartinC says
Larry, it’s not exactly my version of the central dogma, it’s essentially the 1965 Jim Watson textbook version and yes, of course it is now seen as an outdated way of seeing information flow in biological systems.
However, RNA genes may be exceptions to rule ?
I completely disagree.
They may not be DNA encoded but the question is what is a gene, not what is the currently best understood or most common form of the gene.
Just because us DNA based individuals have taken over the establishment doesn’t mean we should look down on our probable RNA based ancestors as somehow geneless.
sparc says
You may be able to identify a gene’s location just with the information of the cds. However, when you look in ENSEMBL you will see that for many genes, especially from animals for which only poor cDNA libraries are available, the annotation often starts with the translation initiation codon. Even with your definition the gene does require at least a 5′-UTR. And the 5′-UTR starts at the transcription start point(s). In addition, TSPs don’t have to be located at the 5′-end of the exon that contains the ATG. Indeed I have been working with three different genes with untranslated first exons, one of which even contained four different untranslated alternatively used initiation exons, one of which had two different TSPs. To identify the different TSPs I had to use primer extension assays, which indeed was a pain. Thus, I really do appreciate the efforts to identify TSPs and untranslated first exons (which by your definition are part of genes) by sequence analysis, and these analyses include investigations of the untranscribed upstream sequences. E.g., if you identify a TATA-box some distance upstream of a splice donor site it is well possible that the TSP is located ca. 30 bp downstream of that TATA sequence (this example may be oversimplified and more parameters have to be taken into consideration, but in principle such programs work that way).
Still, if your definition is sufficient for your work, keep it.
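To make the TATA-box example concrete, here is a deliberately oversimplified Python sketch of that kind of prediction. Everything in it is an assumption for illustration: the consensus pattern is a crude stand-in, the 30 bp offset is measured from the start of the match, and `candidate_tsps` is a made-up name, not a function from any real gene prediction program.

```python
import re

# Crude TATA-box consensus, assumed for illustration only.
TATA_PATTERN = re.compile(r"TATA[AT]A")


def candidate_tsps(upstream: str, offset: int = 30) -> list[int]:
    """Return 0-based positions of candidate transcription start points.

    Each candidate sits `offset` bp downstream of the start of a TATA match.
    Candidates that would fall beyond the end of the sequence are dropped.
    """
    upstream = upstream.upper()
    hits = []
    for match in TATA_PATTERN.finditer(upstream):
        pos = match.start() + offset
        if pos < len(upstream):
            hits.append(pos)
    return hits


# Invented sequence: 10 bp of filler, a TATA box, then 40 bp of filler.
toy = "G" * 10 + "TATAAA" + "C" * 40
print(candidate_tsps(toy))  # [40]
```

Real programs weigh many more signals (and have to cope with TATA-less promoters), so this only shows the shape of the idea.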
Griststone says
Sparc,
I probably wasn’t sufficiently clear in my previous post. I understand that biologists can use a definition of “gene” that works for their specific experimental needs, and that this may differ from the definition used by a geneticist. But the fact that different disciplines use such divergent operational definitions begins to suggest that there really is no such “thing” as the gene. In other words, it’s a model we are straining to preserve in the language, based on old atomistic ideas about matter and biology.
For the layman, genetics is increasingly more difficult to conceptualize, as it grows increasingly important as a matter of policy, e.g. stem cell research, gene therapy, genetic engineering, not to mention the gene-centric evolutionary views of Mr. Dawkins, which, intentionally or not, have an impact on the way we view humanity.
One is tempted to draw the conclusion that the paradigmatic shift in genetics and molecular biology has already occurred, but no one has taken the time to inform the lay community, perhaps because explaining it would be such a difficult task. I say this half tongue-in-cheek, but with the serious suggestion that the “gene” concept has done its share of heavy lifting and should presently yield to a new model.
CJ says
Agreed and I also agree with the inclusion of regulatory regions in the definition of a gene. However, I take issue with
So if either the temporal or spatial expression (or both) of a gene is incorrect – the sequence no longer represents a gene?
Torbjörn Larsson says
Inability to find an encompassing definition isn’t peculiar to biology but the existence here of at least 7 definitions of a gene (Ridley, apparently) and 26 definitions of a species (Wilkins) points to the real difficulties to get a feel for this elephant.
But IMHO we should not expect that all phenomena are simple and amenable for a specific reduction. It isn’t exceptional if a more exact description will contain models based on a set of different definitions.
As a layman, I would find it helpful if it is made clear which of these definitions is referred to in posts. Actually, I would expect it from a more serious treatment as well.
The market of ideas should weed out bad or unnecessary definitions. Some other considerations could help. Moran pointed out when defining evolution that it is good to distinguish mechanisms from the definitions that describe them, making the latter general and flexible terms.
Btw, that reminded me of the distinction and separation of syntax and semantics in logics, math, computer science – perhaps also here the application (the semantics, what does the definition mean) could be kept out of it too to make it even more flexible. For example, why “biological evolution” instead of “evolution” when some of the same mechanisms can be used for software applications?
Torbjörn Larsson says
“what does the definition mean” – what does the definition refer to
sparc says
Don’t get me wrong but I see genetics and molecular biology as vivid parts of biology?
As I pointed out in one of my earlier comments I understand all gene definitions as operational. The question remains though which one is appropriate for presentations to laymen. And how aware are different laypersons of different gene definitions?
Back in the late 70’s I was taught the one-gene-one-protein story in school and we had to learn the model of the lac operon which inherently contains some definition of a gene as an entity comprising regulatory, transcribed and coding sequences. Ok, this was higher education in Germany but I can’t imagine that biology lessons in your country are so bad that pupils leave school without any bit of such knowledge. Or have I just been lucky?
Natalie says
Another way to remember the difference between introns and exons is introns are the intragene portions while exons are expressed.
And I must say, while this is a very interesting read/discussion, it won’t help the nonscientist figure out what a gene is. I was hoping to get a new definition for my intro Human Bio class, but this is waaayyy beyond them! Interesting for me, though!
sparc says
Sorry but I can’t resist:
Gene (formed 1993, disbanded 2004) were a British indie/rock quartet who rose to prominence in the mid-90s. (from en.wikipedia.org/wiki/Gene_(band))
BTW: You can search definitions in Google directly:
Just type in ‘define:gene’ in the search field
Greg Laden says
I’ve gone ahead and generated a “definition of a gene” of my own (something I will need for a handout in a couple of weeks anyway).
It is very different than that of PZ Myers in style and presentation. Frankly, I’d send my students to the PZ-Gene Def first. Mine is more concise and has no pictures and doesn’t really tell you any details about transcription and translation.
What I tried to do was to be fairly bullet proof and hierarchical with respect to generality and variation in the model.
It is here:
Greg Laden says
I was hoping to get a new definition for my intro Human Bio class, but this is waaayyy beyond them! Interesting for me, though!
Well, yes, we have utterly failed, true. But as scientists and science symps we know when to admit that we’ve failed, and that’s the important part!!!
Greg Laden says
I was hoping to get a new definition for my intro Human Bio class, but this is waaayyy beyond them! Interesting for me, though!
Try this one. (Distilled from my two pager) for the glossary, then send them to PZ’s post (above) for pictures and more details:
A gene is a unit of information stored in a DNA or RNA molecule. Most of the information specifies the order in which amino acids are attached to synthesize a protein (“protein synthesis” or “gene expression”). Some of the information regulates the process of protein synthesis. The encoded information to synthesize a protein is processed through two steps: Transcription, in which a messenger RNA molecule is formed, and Translation in which the messenger RNA molecule specifies the order of assembly of the amino acids.
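For readers who like to see the two steps spelled out, here is a toy Python sketch of transcription followed by translation. It assumes the input is the coding strand, uses only a handful of codons from the standard genetic code, and works on an invented sequence; the function names are hypothetical.

```python
# A small subset of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGA": "Gly", "GCC": "Ala", "CCC": "Pro",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(coding_strand: str) -> str:
    """The mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        # "Xaa" stands for any codon missing from the toy table above.
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if amino_acid is None:  # stop codon reached
            break
        peptide.append(amino_acid)
    return peptide


mrna = transcribe("ATGGCCTTTGGACCCTAA")  # invented toy gene
print(mrna)             # AUGGCCUUUGGACCCUAA
print(translate(mrna))  # ['Met', 'Ala', 'Phe', 'Gly', 'Pro']
```

Splicing, regulation and everything else discussed in this thread are left out on purpose.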
Then a bunch of stuff happens and you get traits. The details are either too esoteric for the average person to care about, or are too sketchy to provide a good description o
MartinC says
Greg, that definition is SO twentieth century. A gene doesn’t necessarily have to encode a protein. What about non-coding RNA genes like microRNAs or cis-antisense genes ?
sparc says
What do you expect? We just arrived in 2007!
‘What is a gene?’ will still be a valid question in 50 years and the answer will indeed contain much 20th century stuff in the same way as modern evolutionary biology relates to 19th century biology.
Larry Moran says
MartinC says,
I think you missed the point. There is no definition of a gene that covers all the known examples. The best we can do is propose a definition that accounts for most of the common examples. The others will be exceptions.
We could, I suppose, propose that “a gene is a DNA (or RNA) sequence that’s transcribed” but this doesn’t begin to encompass all of the qualifiers that are necessary. Why focus on the few examples of RNA genes and ignore more significant exceptions, such as operons?
Larry Moran says
sparc, you missed the point of my question. You can still use regulatory sequences to identify the location of a gene even if a gene is defined as a DNA sequence that’s transcribed. I don’t see why eliminating regulatory sequences from the definition all of a sudden makes it more difficult to find genes in a genome.
You often hear scientists talk about regulatory sequences that control transcription of the gene. Do you ever hear scientists refer to regulatory sequences that control transcription of part of a gene? Why not, if regulatory sequences are thought to be included in the definition of a gene? In that case, wouldn’t the second statement be accurate and the first inaccurate?
sparc says
Yes.
1. Have a look at the 5′-end of mammalian igf-2 genes. Human and sheep contain four differentially used promoters/first exons. In rodents three of them are conserved.
2. Interleukin 1 receptor antagonist in mammals: There are two isoforms, a secreted and an intracellular form, respectively, that are formed by differential transcription of the parts of the gene.
3. The murine androgen receptor gene contains a second functional promoter within its first exon.
MartinC says
Larry, I’m not ignoring operons, I’m simply stating that any definition of a basic term like gene should be able to include up to date information about what we now know about these molecular units of heredity and should be inclusive (of both operons and RNA encoded RNA genes) rather than exclusive. As you pointed out earlier biology is messy, and it may indeed be the case that there is no simple descriptive definition for gene – what we end up with may not sound very clear to those unfamiliar with genetics or molecular biology (compared to the old ‘DNA that encodes an RNA that produces a protein’ type definition). Unfortunately we probably can’t help that, any better than quantum physicists can define their subject in ways the public can easily understand. What you seem to be arguing for is a definition of the most commonly accepted forms of genes (‘DNA encoded genes’ or ‘protein encoding genes’ for example), and I have no problem with that so long as it is not seen as the comprehensive definition of all molecular units of heredity. I prefer my gene exceptions to test the rule rather than defy it.
Peter Ellis says
I don’t think “A unit of DNA that is transcribed” is a useful definition – for the simple reason that there’s too much of it! There are plenty of documented examples of intergenic transcription, and plenty of examples of transcribed pseudogenes.
But by the same token, we can’t restrict it to protein coding genes. The majority of functional RNA in the cell is non-coding – even if you restrict yourself just to the ribosomal RNAs and tRNAs, let alone the various other types of ncRNA species.
I think that we have to ultimately fall back on function – we could define a gene as a stretch of DNA that is transcribed and which has a phenotypic effect when transcribed.
That would include protein coding genes and non-coding genes, but exclude transcribed pseudogenes. Unless we find that the transcribed pseudogene has an effect (e.g. on regulation of the cognate coding gene) – in which case I certainly think we *should* upgrade its status from pseudogene to non-coding gene.
One final qualification is necessary. Consider that intergenic transcription is thought to have effects on chromatin structure (one hypothesis is that a way of clearing out old histones and replacing them with new variants is to send down an RNA polymerase to “bulldoze” the old ones out the way). Under the above description, this would be included as a “gene” – a useless definition as some substantial fraction of the entire genome is transcribed at some point (albeit at extremely low levels).
A way of distinguishing genic from intergenic transcription is that intergenic transcription is thought to lead to fairly general chromatin effects, not dependent on the DNA sequence. Any phenotypic effect thus wouldn’t be expected to change due to mutations in the intergenic region.
So, a final description, and the best I can offer, is that “A gene is a unit of DNA which is transcribed, has a phenotypic effect when transcribed, and whose phenotypic effect is altered by mutation”
Note that this takes us to something resembling the Dawkins definition – which you’re all misinterpreting hideously. He is *not* saying that “one gene, one function” is always true – it’s simply that at the start of the paragraph he explicitly says that he’s considering the case of single-gene effects, presumably for simplicity’s sake.
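The three-part definition above can be read as a simple conjunction of tests. Here is a minimal sketch of that reading, assuming each stretch of DNA can be summarised by three yes/no observations; the `Region` class, its field names, and the example entries are hypothetical illustrations, not data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A stretch of DNA summarised by three yes/no observations (hypothetical fields)."""
    name: str
    transcribed: bool
    phenotypic_effect_when_transcribed: bool
    effect_altered_by_mutation: bool


def is_gene(region: Region) -> bool:
    """Return True only if all three criteria of the proposed definition hold."""
    return (
        region.transcribed
        and region.phenotypic_effect_when_transcribed
        and region.effect_altered_by_mutation
    )


# Illustrative cases taken from the discussion above (assumed values).
examples = [
    Region("protein-coding gene", True, True, True),
    Region("non-coding RNA gene", True, True, True),
    Region("transcribed pseudogene with no known effect", True, False, False),
    Region("intergenic transcription with sequence-independent effect", True, True, False),
]

for region in examples:
    verdict = "gene" if is_gene(region) else "not a gene"
    print(f"{region.name}: {verdict}")
```

Under the same toy encoding, an untranscribed promoter would be rejected at the first test even if mutating it abolishes gene function, which is the objection raised in the next comment.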
sparc says
We observed a phenotypic effect (complete loss of gene function in the homozygous state) when we placed a loxP site in the promoter of a murine gene. However, by your definition the promoter would count as intergenic.
I am fully aware that the definitions of promoters (as well as of transcription termination signals at the 3′-ends of genes) are quite weak themselves, and that it is hardly possible to define the 5′-end of a promoter and the 3′-end of a transcribed sequence (not to be mistaken for polyA signals) in mammals.
I am also aware that what I call a gene is described by other people as a ‘transcription unit’ or ‘locus’, or, where different variants are present in diploid sexually reproducing organisms, as ‘alleles’. In my work I prefer ‘allele’ because this allows the distinction of the wild type allele from alleles that we have mutated. Unfortunately, ‘allele’ (and ‘homozygous state’, as you suggested) is not usable for prokaryotes.
Greg Laden says
MartinC: Greg, that definition is SO twentieth century. A gene doesn’t necessarily have to encode a protein. What about non-coding RNA genes like microRNAs or cis-antisense genes?
The more I think about it the more I deeply disagree with you. To be fair (to myself) the definition I posted here is the watered down simplified version without the details which is the one being requested. The fuller definition is on my blog, and addresses your issues with a sort of compromise. (that I don’t actually like)
But here is why I don’t agree with expanding the definition of a gene to include non-coding RNA etc. etc.:
That expanded definition is a dog being wagged by the tail. Once upon a time, in the 20th century, we figured out that the units of inheritance were stored in the DNA molecule. So, the definition of the gene included DNA, details of the encoding of the gene in the DNA, and details of the expression of the DNA as a product related to the traits that we knew were inherited … in other words, gene expression roughly equaled the information end of protein synthesis.
Everything was fine until science progressed, so now in the 21st century we know that the DNA system is more complex.
The dog is the definition of the gene, the tail that is wagging it is comprised of two things: Anything anybody figures out that DNA does, and every detail and exception to any central definition or model. Forget about the second aspect and focus on the first.
Do you know that the total amount of DNA in a species-typical cell correlates (no, not just correlates, relates functionally to, so I don’t wanna hear any “correlation does not equal etc…” remarks) to the size of the cell, and thus to the metabolic efficiency of the cell, and thus with metabolic implications for the whole organism? This is probably why some animals have genomes where the genes take up most of the space and there is little or no “junk” etc. Even some of their genes may be shorter and they may have fewer functional duplicate genes. Birds that fly have nil junk DNA, birds that don’t fly have junk DNA. Bats have nil junk DNA. Etc.
If the definition of the gene requires that everything known about DNA is included in it, then this apparent fact needs to be somehow shoehorned into the definition of a gene. How the heck do you do that? What does that get you?
If you find out (the following is very made up:) that some DNA is used to store energy… so in a muscle cell that is stressed for energy some of the DNA is sliced out and used for energy … would you feel the need to incorporate that into the definition of a gene? I would hope not. That would be included in your definition or description of DNA.
It is simply not the case that the definition of a gene needs to be the definition or description of all of the properties of DNA. The genes are the units of inheritance. In order for a gene to be expressed there needs to be a supply of amino acids, there needs to be a bunch of enzymes, there needs to be specific features of organelle structure and arrangement to get at the DNA, build the protein … The nuclear membrane, even the cell membrane are involved in some way in this expression. Among this list of things are the non-coding RNAs. There are many things related ultimately to expression. We do not include foraging behavior that leads to ingestion of specific amino acids in our definition of a gene. Yet, by not including that in the definition of a gene, it is not the case that we don’t know it is happening. We do not feel the need to incorporate all of biology into the definition of the gene!
If the non-coding RNAs were actually separate organelles that simply duplicated themselves during cell division and had no template in the DNA, the approach you are taking would not require their existence to be accounted for in the definition of the gene other than in their functional role. If DNA was used for energy sometimes then according to the way you are building a definition you would have to include energy storage in the definition of a gene.
The DNA tail is wagging the gene definition dog.
Genes are units of inheritance. DNA is a big ol’ molecule. They are not exactly the same thing. This is not a 20th century idea!
Requiring every single thing that DNA does in the definition of the gene is like … requiring our definition of intestines to include tapeworms and sausages. We can certainly discuss tapeworms and sausages alongside intestines, but they are not part of the basic description.
MartinC says
Greg, I think we both agree that genes are units of inheritance but I think you are mistaken to limit your particular description of the term gene to one particular unit of inheritance and not another. Nobody in this debate, as far as I can see, is arguing to include weird undiscovered things like energy containing DNA as part of the description of gene, but it may be a mistake to exclude biological entities that we know exist, that code for specific intracellular molecular functions, and which are the results of and subject to evolutionary change.
Your tapeworm analogy is highly debatable. Nobody claims a tapeworm is an integral part of a stomach, but try typing ‘microRNA gene’ into PubMed to see if any scientists seem happy with that wacky juxtaposition.
Keith Douglas says
Suezboo, I have also noted that many good introductory level college textbooks do at least the “chemistry to ecology” swath, and a chemistry book at the same level often does atomic constituents to biochemistry. (All simplified, of course, but they together should cover the big picture.) Note the overlap, too. I still have my Campbell and Zumdahl from my intro courses, and they are still valuable (despite being, no doubt, half a dozen versions or so behind or something).
Griststone says
Larry Moran said:
Sparc said:
If these statements are true, I would argue that this portends the end of the “gene” as a definable scientific concept, and all that remains is for a Copernicus to step forward and usher in the new model.
Greg Laden said:
The definition should not require this if a gene differs in some important way from DNA considered generally, which surely it does. What is it that distinguishes the genic unit of heritability from the larger set of genetic material?
Peter Ellis said:
I like this definition. It is concise and comprehensive. I am looking for some way to visualize a gene that does not resort to the atomic fallacy, where the gene is some little pellet in our cells that makes our eyes blue, or gives us Alzheimer’s. I suppose I ask for too much, as genic influence is obviously a maddeningly complex process.
Greg Laden says
MartinC:
I am not convinced that DNA templates for RNA molecules should NOT be included in the definition of a gene (as I have said). I think a perfectly functional definition can exist that excludes them, and it is a neater, easier to use definition for some purposes. The definition I wrote states that they are sometimes included and sometimes not.
But I still want the argument to be considered and taken seriously: it is a legitimate question. A “regular” (as opposed to “regulatory”!) gene is transcribed and translated into a protein. The “expression” and function of non-coding RNA is vastly different in many important ways. That is reason to consider narrowing the definition of a gene and adding new terms or concepts to our overall model of DNA to accommodate new findings.
You might be right about regulatory sequences being better in the definition. But let’s be clear that my suggestion is not 20th century, which implies something fairly negative. I mean, you really hurt my feelings with that and it’s going to take a while to get over that… (may require years of therapy) I’m trying to help the definition work better as we move into the next century, thank you.
My point is not whether we should include biological entities that exist, but rather that the mere fact that a biological entity exists and is part of DNA or RNA molecules is not in and of itself sufficient justification for being in the “gene definition” … any more than a “corporate report to the stockholders” of a corporation that makes cars should be in the definition of “automobile.” I doubt we are disagreeing on this.
If being subject to evolutionary change is a criterion for being in a gene then there are things that not even the most inclusionist (?) definer would ever include, such as the protein coats of Giardia. I agree with you that if an entity is not subject to evolutionary change it probably should not be in the definition.
MicroRNA, ribosomes, antisense RNA etc. and all non-coding RNA molecules can fall into a category of regulatory molecules that are templated on the DNA. They come into existence in a way that is utterly different than how a protein comes into existence. As a whole they have dramatically different kinds of functions than regular gene coded proteins. They are so important and so different that they deserve better…
Going back to guts: Exactly: nobody claims a tapeworm is part of a gut, yet microRNA templates are called genes. This is indeed my point! Sticking with guts: Calling regulatory RNA molecules genes is like calling the pancreas part of the intestines. The pancreas is related to digestion. But it is not the intestines. (don’t know why these analogies come to mind … just a gut feeling, I guess)
Cheers,
GTL
MartinC says
Greg, I guess we may be talking at cross purposes here. My own personal view as to what a term like ‘gene’ should encompass is much more similar to Dawkins’s ‘replicator’ definition than the central dogma style DNA that codes for an mRNA that produces a specific protein. Most functional RNAs are regulated at a transcriptional level in a manner not too different from protein encoding transcripts. Indeed many non-coding RNAs carry out functions analogous to proteins (forming scaffolding for ribonucleoprotein complexes, modifying target mRNAs etc). They are non-protein-encoding; however, they are genes.
Greg Laden says
I totally agree with having, and firming up, the concept of a replicator. Once you take replication out of protein synthesis, of course, you have to change the way you think about selection, neutral theory, etc. mechanistically (i.e., Lamarck returns from the dead, etc.) but that’s all good.
Another thought I had earlier but have not worked into anything yet is to define the product of a gene as a polymer. Some of these polymers are proteins, others are RNAs.
MartinC says
Greg, I think the central problem with the DNA/protein idea is that these particular biomolecules probably became involved in biological processes rather later than other polymers such as RNA. The discovery that the catalytic reaction that forms peptide bonds within ribosomes involves RNAs rather than protein encoded enzymes certainly suggests as much. To involve protein encoding as a central point in the definition of a gene is a bit like talking about the first Americans as Columbus and his crew and forgetting that there was a lot going on before these late arrivers turned up on the scene.
Greg Laden says
Good point. I’d like to know more about these late action polymers.
sparc says
Another reason why I would include regulatory sequences in the gene definition:
Upon gene duplication the resulting paralogs should develop different functions or different spatial and/or temporal expression patterns because otherwise one of them is likely to be turned into a pseudogene. Different expression patterns can be achieved by mutations in the regulatory regions of either gene. Even if the coding sequences remained unchanged everyone would recognize them as separate genes.
Greg Laden says
Sparc:
This is a good point, and relates to one thing I had referred to earlier: The moment one passes beyond the very first steps in gene expression, one involves many many other components. A quarterback is nothing unless Spalding makes a ball, Wilson makes a helmet, there is an offensive coach, somebody builds a stadium, etc. etc. Especially if he is a Viking… (no stadium, no coach, I think the ball is defective…)
sparc says
You may find quite some links to gene definitions at http://www.genomicglossaries.com/content/gene_def.asp
(it is not necessary to fill the registration form, you will find the gene definition part below it)
Mouse Genome Informatics at the Jackson labs for example applies the following definitions:
Obviously, these definitions are quite methodology driven.
Please note 4., which refers to Immunoglobulin heavy and light chain v-genes and T-cell receptor genes that encode functional products only after successful rearrangement.
Jonathan Vos Post says
I find this an extremely good first cut at definition. I think that a difficulty comes from a paradigm shift going on. Mathematical Systems Biology and Computational Systems Biology are providing a change of emphasis, and probably something deeper. To give just one word from the new paradigm: “eigengene.”
The definition in this blog will probably look as silly in 2107 (which may have human reproductive cloning, Strong Artificial Intelligence, and full-blown Nanotechnology) as “one gene, one protein” does in 2007. But from where we are today, it is a big step beyond what’s in most college textbooks.
sparc says
Another issue one should keep in mind is that genes are categorized according to the polymerase used to transcribe them (PolI, PolII and PolIII genes). Since each polymerase requires distinct promoter sequences (and different termination signals) it appears useful to include regulatory sequences in the gene definition.
sparc says
Another gene categorization that refers to the promoters of genes is the distinction of housekeeping and tissue specific genes although the distinction between housekeeping and tissue specific genes was originally made on the basis of expression patterns. In addition, the differentiation based on promoter structures derives from a 1986 paper when only few gene sequences were known (Dynan WS. Promoters for housekeeping genes. Trends Genet. (2) 196-197, you will find it here: http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6TCY-47PRCP2-3M-1&_cdi=5183&_user=104184&_orig=search&_coverDate=12%2F31%2F1985&_sk=999989999&view=c&wchp=dGLbVlz-zSkzk&md5=cb810b442b5704651c6320036c8bb456&ie=/sdarticle.pdf
Strangely though, you will not find it in Medline or Scopus)
Unfortunately, later on GC-richness and lack of a TATA-box were occasionally used to ascribe housekeeping function to genes for which no expression data were available and that later turned out to be expressed tissue-specifically. Thus, I must admit that besides the distinction of PolI, II and III genes, promoter based gene definitions are currently not satisfying. However, promoter sequences can be quite telling. E.g., crystallins in vertebrates are derived from completely different ancestral genes encoding different enzymes without any optical functions. However, AFAIK all crystallin promoters contain Pax6 binding sites, as do other genes expressed in eyes.
sparc says
Sorry, I gave you the wrong link. It should have been this one:
http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6TCY-47DT94T-2S-1&_cdi=5183&_user=104184&_orig=search&_coverDate=12%2F31%2F1986&_sk=999979999&view=c&wchp=dGLbVtz-zSkzS&md5=3f82acdec4fa4764f6654a8e135c4dbb&ie=/sdarticle.pdf
Peter Ellis says
Another issue one should keep in mind is that genes are categorized according to the polymerase used to transcribe them (PolI, PolII and PolIII genes). Since each polymerase requires distinct promoter sequences (and different termination signals) it appears useful to include regulatory sequences in the gene definition.
Er, what the hey?
You can categorise genes many ways. Don’t mistake defining what the category is for defining what the gene is. In this case, the promoter sequences and termination signals obviously form part of the definition of the category.
But to say we should include that as part of the definition of the gene makes no more sense than to say we should include size as part of the definition of “a book”, simply because I happen to file some of my books by size.
sparc says
What you describe is a pile of paper glued together. IMO a book requires a cover besides pages.
Hard covers and paperbacks are different categories. Still, a definition of books should allow one to identify members of both categories.
Henry Gee says
“I realized that there is no such thing as a simple concept in biology”
Evolution by Natural Selection is as simple as simple can be. Heritable variation, oversupply of offspring, and lots of time. What could be simpler?
Makes you wonder what the fuss is all about, really :)
leejoe says
this is just one part of gne to made.there’re trillions trllions and trillions genes not find ways
j
Candy DeBerry says
The legend for the figure of the molecular apparatus controlling transcription in human cells is incorrect. It states “(The numbered proteins are the names of subunits of RNA Polymerase II. Each subunit is named according to its molecular mass in kilodaltons.)”
The numbered proteins are NOT subunits of RNA Pol II. They are different TAFII proteins (TATA-binding protein Associated Factors for RNA Pol II). Together, the TATA binding protein (TBP) and the TAFIIs make up general transcription factor TFIID.