General Biochemical Patterns in Purine Biosynthesis and Protein Relatives. Part 3

Stage 1, formation of the first purine ring (a 5-membered ring with 3 carbons and 2 nitrogens is an imidazole), is not only complete, but I have also covered a bunch of protein families I can refer back to.

There is one family whose relatives I did not expand on, and I will do that in this post. The preATPgrasp domain relatives are a group I wasn’t yet sure how I wanted to discuss.

Metabolism broadly.

This is also a good point to mention that purine biosynthesis merges with 2 other metabolic pathways here. AIR can be made into the 6-membered ring (a pyrimidine) of thiamine, HMP. What I did not put on the big drawing this time was that AIR can also be made into the “lower ligand” of vitamin B12. A ligand is an atom or molecule that binds to a molecule or protein to help carry out its function (the cyanide groups in the last post were ligands). B12 itself is a chelator of cobalt. But there are other lower ligands (potentially less origins related) and I was running out of space.

Stage 2: binding and movement of bicarbonate.

Life has 2 ways of doing the first part of stage 2 but I used the more complicated route because the direct binding of carbon dioxide to where bicarbonate moves to (by class 2 PurEs) is only in animals.

Stage 2.1: bicarbonate binding.

PurK is the protein that binds a bicarbonate to the nitrogen sticking out from the ring. And we have already seen these protein subdomains. It’s preATPgrasp, ATPgrasp, and Carbon monoxide dehydrogenase molybdoprotein N-like again. So for most of what is interesting I will refer to a previous post.

As an observation I would point out that all that is really needed to make these reactions work so far is phosphate. Every reaction but one so far uses a phosphate in active site chemistry, and the exception is the PurN/PurT step where life can add formate without phosphate, and yet PurT uses phosphate anyway in other organisms (E. coli has both).

What I will discuss this time are the relatives of the preATPgrasp domain. This fragment and its family members are in their own topology bin and there are 13 other topology siblings within a larger Rossmann-related homology group, inside of a Rossmann-like bin, inside of the A/B 3 layer sandwiches.

There are a lot of Rossmann-related domains. If I had to stereotype the group based on the associations I would label them “nucleotide binders with a bias for adenine-containing cofactors”. But that is not based on structure, just on looking in the bins, and they don’t exclusively bind nucleotides.

The first 4 topology groups to mention are: a NAD(P) (niacin) binding Rossmann-fold domain, a FAD (riboflavin)/NAD(P) binding domain, a DHS-like NAD/FAD binding domain, and a “Nucleotide binding domain” bin that seems to do all of the above. These are all electron/proton dispensers/acceptors, and the first 2 contain many families, too many to go over in detail. Glycosyltransferase Mag is flagella related so I left it out.

But there is a whole area of biochemistry related to “nucleotides as handles” that I saw inside of the above bins that I haven’t covered on the big drawing. You see Adenosine (adenine-ribose-phosphate) used as a handle for cofactors all over the big drawing, and some of the other bases are used as handles in the molybdenum cofactor part of the drawing. But all of the bases have jobs as handles and there is a range of things related to membranes and sugars that adenosine, guanosine, uridine, cytidine, and thymidine are involved in as handles.

The 5th topology bin is S-Adenosyl-Methionine-dependent (SAM) methyltransferases. This also contains many families and the adenine-ribose-phosphate would be a common denominator for the first 5 bins so far. SAM is not just for methyl transfer though, the adenine-ribose can be an “adenylyl” modification (part of the molybdenum cofactor biosynthesis, sulfur carrier step). Most of methionine can be bound to something as homoserine leaving methyl-sulfur-ribose-adenine as well. As far as I can see this domain is all about SAM methyl transfer.

Tubulin nucleotide binding domain-like is very interesting. The cytoskeleton uses GTP hydrolysis on a timer to determine if subunits like tubulin are monomers or polymers. And microtubules are an intracellular highway for cell contents.

The NagB/RpiA/CoA transferase-like domain has a lot of interesting things. There are some transcription repressors and 5,10-methenyltetrahydrofolate synthetase. Related to fatty acid synthase there are coenzyme-A transferases and acetyl-CoA hydrolase (makes acetate). Citrate lyase is TCA cycle related. IF-2B is a translation initiation factor. Finally there is glucosamine-6P isomerase and, very interestingly, ribose-5P isomerase. That interconverts ribose and ribulose in the pentose phosphate pathway.

The MurCD/PglD bin contains things that work with UDP-sugars and derivatives. MurC and MurD attach an alanine and a glutamate to UDP-N-acetylmuramate in making bacterial cell walls.

The activating enzymes of ubiquitin-like proteins, or “E1 enzymes” are mostly studied in eukaryotes where ubiquitin targets proteins for degradation in the proteasome. But the general machinery exists in prokaryotes too and the activation process involves an adenylyl modification of the ubiquitin-like protein (UBL), where the adenylyl is replaced by the E1 (E1-UBL link) prior to attachment of the UBL to another protein, “activation” of the UBL. ThiF (thiamine biosynthesis) is in here, and it activates ThiS (on the big drawing) in a similar manner to exchange the adenylate with sulfur. That is something I want to add to the big drawing like I did with molybdenum cofactor biosynthesis, and MoeB (molybdenum cofactor biosynthesis) is in this bin too. No NAD(P)/FAD/SAM in here.

Ubiquitin-like Protein Conjugation: Structures, Chemistry, and Mechanism Cappadocia 2017

The formate/glycerate dehydrogenase catalytic domain-like topology bin is interesting because every catalytic region could be relevant, and especially for smaller, more origins relevant molecules. There is S-adenosyl-L-homocysteine hydrolase (methionine biosynthesis), and alanine dehydrogenase (alanine to pyruvate). Lysine oxidoreductase and dipicolinate synthase are both in lysine metabolism. There are glycerate, formate and lactate dehydrogenases. Also something called nicotinamide nucleotide transhydrogenase. All of these require NAD(P) so this may still just be cofactor binding regions.

The UDPG/MGDP dehydrogenase C-term topology bin doesn’t have much but it’s all nucleotide-sugar proteins. UDP-glucose is central to carbohydrates and polymers of them for storage and membrane parts. And at least one requires NAD.

Finally there are the aspartate/ornithine carbamoyl transferases. PyrB is in the pyrimidine pathway.

Stage 2.2: moving bicarbonate over 2 atoms.

PurE1 doesn’t use phosphate. Mutase reactions change the structure of a molecule without adding or removing things, so it’s a low energy process and doesn’t need ATP. The protein and its various amino acid side chains can hold the molecule just right, and alter charge distributions such that the bicarbonate moves over. And this is a protein domain family that I’ve already covered: the A/B3LS/Flavodoxin-like/GATase-like domain is the entire protein.

And bicarbonate is finally something in common with the pyrimidine pathway. Bicarbonate would have been in primordial oceans and would have seeped into the rock with water as well as contact with the outside of hydrothermal vent minerals.

General Biochemical Patterns in Purine Biosynthesis and Protein Relatives. Part 2.

In this post I will be looking at the rest of the proteins in stage 1 of purine biosynthesis, the related PurL and PurM. To this point an ammonia dispenser in PurF provided an ammonia that replaced the diphosphate on ribose in PRPP, an ATPgrasp ATP phosphate dispenser used a phosphate to add a glycine, and then a formate was added to the nitrogen on glycine with PurT/PurU, or PurN. This leaves us with FGAR, formylglycinamideribonucleotide.

Now I add steps 4 and 5. Swapping the glycine carbonyl with an imine, and closing the first purine ring.

Step 4: PurL and swapping the glycine carbonyl with an imine via phosphate.

PurL is a large protein, the largest on the drawing. To me large and usually complex means old. This protein uses a phosphate from ATP to swap the glycine carbonyl oxygen (C=O) for an imine (C=N). The ammonia again comes from glutamine. Even though it is probably old, parts of it can be deleted for fun if I assume ammonia is in the vent.

PurL has 5 individual domains, 2 present in duplicate. I consider some parts more important than others. In the vent the entire glutamine ammonia dispenser might be unnecessary.

  • An Alpha Arrays/RuvA-C-like domain (needs to be corrected from Beta Barrels on the drawing), the PurL linker domain.
  • An A+B2L/Alpha-Beta Plaits domain, the PurS-like domain.
  • An A+B3L/Bacillus chorismate mutase-like/PurM-N-like domain (present in duplicate).
  • An A+B2L/Alpha-Beta Plaits/PurM-C-like domain (present in duplicate).
  • An A/B3LS/Flavodoxin-like/Glutamine amidotransferase class 1 domain (GATase1).
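
The "5 individual domains, 2 present in duplicate" bookkeeping in the list above can be checked with a few lines of Python. This is only a minimal sketch: the short labels are my own shorthand for the list entries (not ECOD identifiers), and the order is the order of the list, not necessarily the order along the protein sequence.

```python
from collections import Counter

# Domain copies in PurL as listed above; a domain present in duplicate appears twice.
# Labels are shorthand for this sketch only, not ECOD identifiers.
purl_domains = [
    "RuvA-C-like (linker)",
    "PurS-like",
    "PurM-N-like",
    "PurM-N-like",
    "PurM-C-like",
    "PurM-C-like",
    "GATase1",
]

copies = Counter(purl_domains)

# Number of different domain types, and number of domain copies overall.
distinct_domains = len(copies)
total_copies = sum(copies.values())

# The domain types that occur twice.
duplicated = sorted(name for name, n in copies.items() if n == 2)

print(distinct_domains, total_copies, duplicated)
```

So 5 kinds of domain give 7 domain copies in total, and the 2 duplicated ones are exactly the PurM-N-like and PurM-C-like pair.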

RuvA-C-like domain

Starting with the Alpha Arrays/RuvA-C-like domain. This is a linker domain that acts in long range communication between parts of the protein. It’s all by itself in its homology group so that’s that.

PurS-like domain

This second domain in PurL is an Alpha + Beta 2 Layers domain called “PurS-like” because it is sometimes a separate protein, but always required for PurL function, so in that case they call it “PurS”. Its function isn’t well known but it isn’t the part that makes ammonia or holds FGAR; maybe it just holds the other 2 domains together. This is also in its own homology group all alone.

PurM-N-like domain

This domain is in the “bacillus chorismate mutase-like” X group, which may be related to the N-terminal part of PurM/parts of PurL. And maybe not. There is an interesting set of things in here and I consider this part and the following PurM-C-like domain to be the most important.

First is HypE which is involved in maturation of hydrogen utilization proteins with complex iron-nickel centers, hydrogenases. They can turn H2 into 2 protons and 2 electrons and back again. When it comes to the concept of energy, cells need to have and control a pool of protons and electrons, using things like niacin and riboflavin.

HypE binds a carboxamido (N-C=O) on its C-terminal cysteine, which came from carbamoyl phosphate (as in pyrimidine biosynthesis) and was delivered by HypF. ATP is then used to dehydrate it to a cyanide (nitrogen triple bonded to carbon). This cyanide and a second are then bound to an iron atom (on HypC-HypD) used to make the finished iron center that is combined with a nickel atom.

Structural Insight into [NiFe] Hydrogenase Maturation by Transient Complexes between Hyp Proteins Miki 2020

Selenide, water dikinase, also known as selenophosphate synthetase, SelD. Synthases and synthetases both ligate (connect) things together but synthetases need a source of energy like ATP.

This combines selenium and phosphate to make selenophosphate which is the form used by cells. Selenophosphate is then mostly used to make the 21st amino acid in a system that replaces serine with selenocysteine (Sec, U) on the tRNA, or to make an RNA modification. The serine translation system is used and individual Sec replacements are made during translation, hence the amino acid selenocysteine. This is in bacteria, archaea and eukarya.

The selenophosphate synthetase family: A review Manta 2022

Finally thiamine monophosphate kinase (ThiL, on the big drawing) makes mature thiamine diphosphate.

I type mature because as far as I have found thiamine and thiamine monophosphate do nothing. It is thiamine diphosphate that is mostly used by cells, with some thiamine triphosphate and adenosine thiamine diphosphate. Those last 2 are somehow associated with glucose availability and amino acid starvation, and the opposite, amino acids and no glucose, respectively.

Update on Thiamine Triphosphorylated Derivatives and Metabolizing Enzymatic Complexes Bettendorff 2021

PurM-C-like domain.

And just like that we’re done with this domain because again we have PurL, PurM, HypE, selenophosphate synthetase, and ThiL. The only difference is this part of the protein is an Alpha+Beta 2 Layers and then in the Alpha-Beta Plaits X group, within which there is no guarantee of relationships outside of the general alpha helix and beta sheet organization.

Class 1 glutamine amidotransferase domain.

This is the glutamine ammonia dispenser for this protein. GATase class 2 we saw with PurF, a GATase2. The larger family this is a part of, the X: Flavodoxin-like and H: GATase1-like A/B3LS domains, appears many times on the big drawing. There is a lot to point out here. 20 related topology bins are siblings with the GATase1-like bin. And that topology bin has a lot besides the GATase family.

First the GATase1-like topology group itself has a lot of things already on the drawing. The current PurL, GuaA, CarA, PyrG, AroE, TrpG, PabA, FolD, HisH, MetA and Pdx2. There are more but they are in different topology bins. This topology group also has protease 1, peptidase S51, peptidase C26, peptidase S66, catalase 1 (breaks down peroxide), isocyanide dehydratase, and beta-galactosidase.

The same topology bin as AroE and FolD (amino acid dehydrogenase-like-N) also contains Glu/Phe/Trp/Leu/Val (amino acids) dehydrogenases, malate oxidoreductase (makes pyruvate from a TCA cycle intermediate), and at least 2 pterin (folate relative) proteins, methylene tetrahydromethanopterin dehydrogenase and tetrahydrofolate dehydrogenase/cyclohydrolase.

Pterin related, in its own topology bin is F420 dependent methylene tetrahydromethanopterin dehydrogenase.

In separate topology bins and also on the big drawing we have PurE, RibH, AroB (in a bin with glycerol dehydrogenase), and AroQ.

Things that grab things and move them is a theme for the Periplasmic Binding Proteins-like 1 and Chelatase-like bins. In bacteria with double membranes the space between them is called the periplasm. Periplasmic binding proteins bind things like amino acids (leucine here) and sugars (glucose, galactose, arabinose here). Chelatases bind and insert metal ions into tetrapyrroles like heme and chlorophyll, which in turn bind, or “chelate”, those ions. This bin also contains iron-molybdenum binding proteins, zinc transporters, cobalt chelators, vitamin B12 transporters, and iron chelators.

Relatedly in its own topology bin is uroporphyrinogen 3 synthase, or HemD. This protein turns a linear chain of 5-membered rings into a circular ring heme/chlorophyll precursor, uroporphyrinogen 3. These often feature a chelated metal ion in mature form.

A last grabber/holder is the ATC-like topology bin, where the “ATC domain” contains a bunch of cysteines in the sequence which likely hold iron-sulfur centers, and is related to aspartate and glutamate racemases.

In a topology bin by themselves are phosphofructokinase and NAD (niacin) kinase. That is a glycolysis protein and the protein that makes NADP from NAD, and thus switches niacin to the anabolic form from the catabolic form.

The CheY-like bin is named after a chemotaxis protein, and contains pili related proteins, HydB (a NiFe hydrogenase), sensor kinases (RcsC), a vitamin B12 binding domain which is in MetH (big drawing), and ornithine decarboxylase.

In a topology bin by themselves there is fucose and arabinose isomerase. Another separate topology bin contains YfiR which makes cyclic di-GMP. Yet another topology bin contains IlvD (isoleucine, valine, and coenzyme-A biosynthesis) and 6-phosphogluconate dehydratase.

And finally the interesting FabD/lysophospholipase-like topology bin contains acyl carrier (FabD) and transfer proteins. It also has parts of fatty acid synthase. Binding and carrying coenzyme-A is a big theme. Lysophospholipases separate fatty acids from the glycerol backbone.

Step 5: PurM

PurM closes the ring via a phosphate from ATP, and the same basic reaction mechanism as PurL.

Thanks to PurL, I’m done with PurM since it’s the same 2 fragments shared with HypE, selenophosphate synthetase, and ThiL.

That’s it for stage 1 of purine biosynthesis, forming the first ring.

General Biochemical Patterns in Purine Biosynthesis and Protein Relatives. Part 1.

In this post I want to start expanding on the kind of analysis I did in a previous post with the protein that adds glycine in purine biosynthesis, PurD. I looked at what proteins are related to each of the subdomains and noted the kinds of chemistry and molecules involved, and did some fun speculation. A family of phosphate dispensers and very small molecules with HO-C=O as part of their structure.

And these are still puzzle pieces so it’ll be like a list in places where there might be significance, but I don’t know much about them. First some words about the kind of data on the site I primarily use to look at the relatives of parts of proteins.

Evolutionary Classification Of protein Domains and bin importance: AXHTF.

ECOD and similar archives of data relating to the relatedness of nucleotides or proteins have hierarchies of bins to group by relatedness. The largest bin at ECOD is the Architecture or “A” bin which groups by the most general arrangements of protein 3D structure, the Alpha Helix or the Beta Sheet, and how those relate.

Top Left: water is tetrahedral, polarized (protons+, electrons-), and constantly jostling around. Top Center-Left: a tripeptide has free rotations around its bonds and isn’t just abstract and straight, things rotate around in the water. Top-Right: a dipeptide showing more parts of the molecule than I typically use to indicate where positively and negatively polarized regions are (“charged” is for ions). Center: helices (like the Alpha Helix) are one common shape charges allow polypeptides to take. Bottom: sheets (like the Beta Sheet) are another common shape charges allow polypeptides to take. Most structures at this level are kinds of sheets and helices.

So you get things like: Alpha/Beta 3 layer sandwiches defined as “repeating beta-alpha units form a sandwich with a mainly parallel beta-sheet layer stacked between two alpha-helix layers” or Alpha+Beta 2 layers. These outermost levels can be all related but they can just as easily be independently arrived at shapes. The same is true of the “X” group which is “possible homologs” where homologs are related by sequence. The same problem exists in A and X but things might be related in X.

Below the X group we get the “H” or homology group where things are explicitly related based on sequence. Within H there is “T” or topology, which separates homologs whose structures have diverged: members of one T bin share the same topology (how the helices and strands are connected). Finally “F” or family are protein subdomains most closely related to one another.

HTF are where I look the most.
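
To make the nesting of the bins concrete, here is a minimal Python sketch of the A, X, H, T, F hierarchy as I described it above. The class name, the function names, and the bin labels ("A1", "H1", and so on) are all made up for illustration; they are not real ECOD identifiers and this is not how the ECOD site itself stores its data.

```python
from dataclasses import dataclass

# The five ECOD levels in order from broadest to narrowest.
LEVELS = ("A", "X", "H", "T", "F")


@dataclass(frozen=True)
class DomainBin:
    """Where one protein domain sits in the hierarchy (labels are hypothetical)."""
    A: str
    X: str
    H: str
    T: str
    F: str

    def path(self, level):
        """Return the bin labels from A down to and including the given level."""
        depth = LEVELS.index(level) + 1
        return tuple(getattr(self, name) for name in LEVELS[:depth])


def deepest_shared_level(d1, d2):
    """Return the narrowest level at which two domains share a bin, or None."""
    shared = None
    for level in LEVELS:
        # Bins are nested, so the whole path down to this level must match.
        if d1.path(level) != d2.path(level):
            break
        shared = level
    return shared


# Two hypothetical domains: same homology group, different topology bins.
domain_a = DomainBin("A1", "X1", "H1", "T1", "F1")
domain_b = DomainBin("A1", "X1", "H1", "T2", "F9")
print(deepest_shared_level(domain_a, domain_b))
```

Two domains that come back with "H", "T" or "F" are the kind of relatives I spend most of my time on. A result of "A" or "X" only says the overall shapes are alike, which may or may not mean they are related.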

Purine biosynthesis, Adenosine and Guanosine.

I see 5 stages and 13 steps in purine biosynthesis.

  1. Formation of the first ring (5 steps).
  2. Carbon addition and movement (2 steps).
  3. Aspartate addition and fumarate removal (2 steps).
  4. Formation of the second ring (2 steps).
  5. Adenosine or Guanosine monophosphate formation (2 steps, separate paths).
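
As a quick check on the arithmetic, the stage list above can be written as data and summed. This is just a bookkeeping sketch: the running step numbers are my own convenience, and for stage 5 the 2 steps sit on separate paths, so numbering them one after the other is only a way to count them.

```python
# (stage name, number of steps) in the order given in the list above.
stages = [
    ("Formation of the first ring", 5),
    ("Carbon addition and movement", 2),
    ("Aspartate addition and fumarate removal", 2),
    ("Formation of the second ring", 2),
    ("Adenosine or Guanosine monophosphate formation", 2),
]


def step_ranges(stage_list):
    """Give each stage its first and last step number, counting from step 1."""
    ranges = []
    next_step = 1
    for name, count in stage_list:
        ranges.append((name, next_step, next_step + count - 1))
        next_step += count
    return ranges


total_steps = sum(count for _, count in stages)

for name, first, last in step_ranges(stages):
    print(f"steps {first}-{last}: {name}")
print("total steps:", total_steps)
```

Counted this way the first ring is steps 1 to 5, and the carbon addition and movement stage is steps 6 and 7.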

I will be limiting myself in this post to most of Stage 1: Ribose phosphate, glycine, formate and ammonia join up. This post got longer than I thought it would for all of stage 1 but building up ring closing as a specific thing could be useful.

The first 3 steps of purine biosynthesis.

Step 1: PurF

This protein exchanges an ammonia for a diphosphate from Phospho-ribose-diphosphate (PRPP), making phospho-ribosyl-amine. This pattern of using phosphate to swap for other things occurs over and over. PRPP was made from ribose-5-phosphate with ATP by the gene Prs, which is related to the second domain in this protein. Cells have a PRPP pool because as the drawing shows, PRPP has multiple uses.

Phosphoribosyl Diphosphate (PRPP): Biosynthesis, Enzymology, Utilization, and Metabolic Significance. Hove-Jensen 2016

Also known as amidophosphoribosyltransferase (you see why I prefer short protein names? I usually like descriptive names), PurF has 2 subdomains:

  • An Alpha+Beta 4 layer (A+B4L)/NTN hydrolase domain at the end with the nitrogen (N-terminal, defined as the beginning of the sequence)
  • An Alpha/Beta 3 Layer Sandwich (A/B3LS)/PRTase-like domain at the carboxy or C-terminal end.

The “amido” refers to the fact that the ammonia comes from the amino acid glutamine, which becomes the amino acid glutamate after removing the nitrogen in the first step of PurF activity carried out by the NTN-hydrolase domain, making phospho-ribosyl-amine (PRA).

The A+B4L subdomain is in the N-terminal nucleophile aminohydrolase (NTN hydrolase) homology group, and class 2 glutamine amidotransferase topology and family groups (GATase2). In terms of primordial chemistry, the chemistry of an end of the protein chain is a significant thing. This N-terminal nitrogen protein superfamily has an affinity for nuclei (nucleophile) and breaking bonds with water (hydrolase). Without the need for ATP this protein domain removes the ammonia and that reactive molecule floats over to PRPP which is reactive towards ammonia.

The A/B3LS domain at the C-terminal end is in a transferase subgroup of these proteins, PRTases, where Prs is a synthase. This subdomain essentially just holds the ribose-5P end. In fact at this point PRTases are just ribose phosphate holders to me. This particular family has one other synthase for OMP, and a bunch for purine and pyrimidine bases in general. The interesting part for PRTases is that the ones for niacin, histidine, and tryptophan are in different protein subdomain families on ECOD.

The rest of the entry for PurF will be on the A+B4L subdomain and what’s interesting there. In the sibling homology group to the NTN aminohydrolases there are protein serine/threonine phosphatase 2C members, specifically the part of the protein that does the removal, the catalytic domain. Phosphatases are proteins that remove phosphate from things, here serine or threonine. With all of the phosphate around a related family of phosphate removers is interesting. Looking at them more closely is a plan.

Inside the NTN hydrolase homology group there are 6 related topology groups, one of which is the GATase 2 group. Outside of GATases there are some interesting things. “Proteasome subunits” contains parts of the cellular trash can, the proteasome, and individual peptidases which are things that cut peptide bonds in proteins. This group also includes PurO which is in stage 4 of purine biosynthesis. There are some antibiotic biosynthesis enzymes for penicillin and carbapenam, and things that break C-N bonds other than peptide bonds. I’ll probably ignore antibiotics going forward, there are lots of them and they are unlikely to be usefully origins related.

Glycosyl-asparaginase removes aspartate from glucose-asparagine leaving glucose-amine (glycoprotein degradation). Gamma-glutamyl-transpeptidases move glutamate to things and from things.

Inside of the GATase2 group is asparagine synthase, which makes that amino acid from aspartate (using ammonia from glutamine), and the protein that makes glutamate from 2-oxoglutarate. There is a protein that makes glucosamine-6P from glutamine and fructose-6P.

That’s it for PurF for now.

Step 2: PurD

I first introduced this protein with the big drawing.

This protein has 2 steps in its reaction. First the phosphate dispenser part releases a phosphate from ATP which binds the carboxy on glycine making glycine-P. This is reacted with PRA and the phosphate is swapped for the nitrogen on the ribose like in PurF, making glycinamide-ribonucleotide (GAR). As I mentioned previously this is the closest molecule to both nucleotides and polypeptides as it has peptide bond features.

This protein contains 3 subdomains in order:

  • An A/B3LS/Rossmann-related preATPgrasp domain (thought to be substrate specificity)
  • An Alpha/Beta Complex Topology (A/BC.T.) ATPgrasp domain (the phosphate dispenser domain)
  • An A/BC.T./Alpha Beta hammerhead barrel hybrid sandwich domain related to the carbon monoxide dehydrogenase molybdenum protein (CODH MO-N-LIKE) and biotin carboxylases

There are patterns to the phosphate dispenser with and without the substrate specificity part, and patterns to CODH MO-N with the rest and by itself.

First of all there are 3 more ATPgrasp proteins in purine biosynthesis, and the ATPgrasp domain by itself is related to purine biosynthesis protein PurC, SAICAR synthase. Both of those are sibling homology groups with a bunch of protein kinases, things that put phosphate on serine, threonine, or tyrosine in a polypeptide chain, usually to change the function of a protein somehow.

When all 3 subdomains are together you have PurD, PurT, PurK (Pur=purine biosynthesis), L-amino acid ligase (attaches an L-amino acid to things), and a set of things including biotin, acetyl-CoA, and pyruvate carboxylases.

PurP only has the preATPgrasp and ATPgrasp domains. Also pyrimidine pathway CarB.

When the phosphate dispenser is alone, no preATPgrasp, you have a collection of interesting things. RNA and DNA Ligase (protein that attaches things). Ligases for glutamate (GshA, from the ancestral fragment post) or tyrosine and tubulin. Citrate lyase, succinyl-CoA synthase, both related to the TCA cycle. PEP (phosphoenolpyruvate) synthetase, a protein involved in the first step of gluconeogenesis (making glucose instead of breaking it down for energy) in some prokaryotes.

When the CODH MO-N-LIKE bit is by itself you have purine breakdown protein xanthine dehydrogenase chain B, which adds an oxygen. There’s the CODH MO-N itself. There is aldehyde oxidoreductase.
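
The "which proteins have which combination of subdomains" sorting above is easy to sketch as a lookup table. This is a minimal Python illustration, not a database query: the table only holds a handful of the examples named above, the entries are typed in by hand from this post, and the variable and function names are my own.

```python
from collections import defaultdict

PRE, GRASP, CODH = "preATPgrasp", "ATPgrasp", "CODH MO-N-like"

# Subdomain content per protein, transcribed by hand from the groupings above.
# Only a few examples from each grouping are included.
architectures = {
    "PurD": {PRE, GRASP, CODH},
    "PurT": {PRE, GRASP, CODH},
    "PurK": {PRE, GRASP, CODH},
    "L-amino acid ligase": {PRE, GRASP, CODH},
    "PurP": {PRE, GRASP},
    "CarB": {PRE, GRASP},
    "PEP synthetase": {GRASP},
    "succinyl-CoA synthase": {GRASP},
    "xanthine dehydrogenase chain B": {CODH},
    "aldehyde oxidoreductase": {CODH},
}


def group_by_architecture(table):
    """Group protein names by their exact set of subdomains."""
    groups = defaultdict(list)
    for protein, domains in table.items():
        groups[frozenset(domains)].append(protein)
    return {combo: sorted(names) for combo, names in groups.items()}


def proteins_with(table, domain):
    """List every protein in the table that contains the given subdomain."""
    return sorted(name for name, domains in table.items() if domain in domains)


for combo, names in group_by_architecture(architectures).items():
    print(" + ".join(sorted(combo)), "->", ", ".join(names))
```

Grouping by the exact set of subdomains reproduces the paragraphs above, while `proteins_with(architectures, "ATPgrasp")` pulls together everything with a phosphate dispenser regardless of what else is attached.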

This CODH MO-N-LIKE fragment gets more interesting when you zoom out one level. This is the longest part of this post.

First we have MoaE, molybdopterin synthase. That’s on the drawing! And there’s another on the drawing with the PRTase that attaches ribose-5P to the ring in making niacin. There are a couple of proteins from the large ribosomal subunit too, L10 and L27. The L27 topology bin also has parts of the “exosome complex” which degrades RNA, another trash can.

The “duplicated hybrid motif” bin contains 3 interesting things. The first is part of a system that uses the phosphate from PEP and passes it along several other proteins to a membrane protein (the PTS E2A fragment here) that phosphorylates a cargo, usually a sugar, during import. The second is a family of peptidases including one that cuts between glycines (LytM). The third is the RNA polymerase beta prime subunit.

Finally there is the “single hybrid motif” which has a lot of things. First there is the RNA polymerase beta subunit (as opposed to beta prime). Second, the protein domains that bind the cofactors lipoate and biotin are related to each other and are in here. Those are newer cofactors so I haven’t done much with them. Third is the glycine cleavage system H protein; this is very significant as this is where cells get glycine and serine through interconversion. Fourth, AstE_AspA is succinylglutamate desuccinylase/aspartoacylase. The first is an arginine degradation protein, and the second is too but seems limited to eukaryotes (or isn’t studied in prokaryotes due to the human disease emphasis).

New paragraph due to theme difference. The fifth thing in the “single hybrid motif” is the V-type ATP synthase alpha chain. The sixth is RnfC, a protein involved in anaerobic electron transport with niacin and pumping protons or sodium ions. Relatedly the seventh is cytochrome F, something that passes electrons in electron transport chains. NqrA is similarly involved in niacin and electron/sodium transport.

I’ll finish up this with the eighth and ninth, NusG and Rrp15p. They are both involved with ribosomes. NusG is a major transcription (copying DNA) termination factor, and Rrp15p is required for cells to make the large ribosomal subunit.

Step 3: PurT and PurU or PurN

In this step a formate is removed from folate and attached to the nitrogen at the end of the glycine that was just attached, to make formyl-glycin-amide-ribonucleotide, FGAR.

This is something I will have to fix on the big drawing but life has more than one protein for the third step of purine biosynthesis. I have PurT and PurU, but not PurN. We’ve technically seen the relations of step 3 PurT since it is also an ATPgrasp protein (all 3 domains).

PurU is the protein that removes the formate and passes it to PurT, where the phosphate from ATP is used to make a reactive formyl-P that, similarly to the previous steps, is used to attach a carbon to a nitrogen.

This is where things get easier for PurN because PurU is related to PurN so I can cover them together, though PurU has more pieces, suggesting the most important part is the one they share.

PurU has 2 domains:

  • An Alpha/Beta 3 layer sandwich/formyltransferases domain (A/B3LS).
  • An Alpha+Beta 2 layer (A+B2L)/Alpha-Beta Plaits domain.

The first domain is related to PurN in its entirety. The homology and topology bins are formyltransferases. The most interesting ones are methionyl-tRNA Fmet formyl transferase (adds formate to the initiator methionine in translation) and 10-formyltetrahydrofolate dehydrogenase (removes formate from folate as CO2 while adding a proton to niacin).

The second domain in PurU is related to a lot of things but I’m thinking I can ignore those given PurN.

That’s it for now. It would have easily been over twice as long if I included PurL and PurM. The Alpha+Beta 2 Layers category has an “Alpha-Beta Plaits” with lots of things. It’ll be practice at sorting out general biochemical themes.

LGBTQ+ People Are Not Going Back.

I’m still generally fine with the sentiments in Julia Serano’s post.

1) I will not tolerate any backpedaling on LGBTQ+ rights whatsoever, and

2) If my representatives fail to strongly stand up against these attacks on LGBTQ+ rights, then I will take my vote elsewhere next election.

Words aren’t enough. Actions. Please contact your representatives to tell them the same.

Planned politics is good.

I’m passing this along.

“I propose that on Tuesday, December 3rd, 2024 (the first day that both the House and Senate are back in session), all of us who are invested in this issue and have a platform (whether it be a blog, newsletter, column, podcast, YouTube, TikTok, Instagram, etc.) publish a piece with the shared title: “LGBTQ+ People Are Not Going Back.” Yes, I know, it’s a cheesy title, but it holds Democrats accountable to their own talking points and makes it clear that backsliding on LGBTQ+ rights is nonnegotiable for us.”

Planned Action for LGBTQ+ & Allies in Response to Democrats Capitulating on Trans Rights by Julia Serano

Abortion

I try to keep my abortion position as independent of law as possible and the most important part of this post concerns this. That being said I’ll include legal and religious angles at the end that I think are useful. But these should not be used to debate, too often forced-birthers steal the political atmosphere with debate. Delight in giving orders, and stating things as the way things are. Learn to enjoy their opportunity to figure out their negative feelings about your actions.

What freedom looks like.
My position can be summarized as there is no right to force someone to give birth or stop someone from offering the services. Without forced-birthers people would just get abortions. That’s the freedom and liberty part.
As a group forced-birthers get by because they are a group, a mob, people who can force others to give birth themselves or through government. Relatedly any forced-birther is a stand-in for the people they put into power when it comes to shaming and criticism.

That’s it. They assert the power to use force on others with no good reason or rights to do so. If one were to want a legal position I’ve been confused as to why the 9th amendment has not been used as a reason, “The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.” Including the right to go get or offer an abortion which harms the rights of no one else. They have to go to “spectral evidence” and speak for fetuses eventually, and I don’t give them that power to make up words on behalf of the unborn without minds or words.

I don’t even believe the Christians among forced-birthers have their own theology right. When I was raised in that environment a big value was placed on sin and choice, yet they have to use force. They don’t even believe what they say. And it’s a fine for killing the unborn of another in their book. They have manipulation of language only.

Finally I’ve a way of dealing with their “baby talk”. It’s untested but rhetorically in the moment give them “baby” or “child” mockingly, but make the conceptus “baby 0.01”, the embryo “baby 0.5”. Be clear that you aren’t serious and that they simply emotionally require and depend on recalling feelings AFTER birth and can’t deal with reality. You can bet they’d destroy embryology to get their way if they had to.

The chirality issue

Note: the second figure has been replaced with the correct one, twice. A xylulose bond was in the wrong direction. This is hard.

There are a couple of other origin of life issues that I haven’t focused on yet. One that I was hoping would reveal itself is the issue of “chirality”, or the side of a molecule a bond or atom is on relative to the rest of the molecule in 3D. The strict definition includes things with identical chemical formulas having non-superimposable structures.
I don’t tend to draw in 3D to simplify things. I also don’t include hydrogen to simplify things.

I meant to address this in a previous post where I posted this figure of thiamine and parts of the pentose phosphate pathway. It’s why there is a single hydroxyl on xylulose-5P drawn in 3D as if it’s coming at you out of the page. I’ll get to that at the end.

I’ve drawn ribose and xylulose in 3D. Both looking down the length of the molecule and from the side. Carbon actually binds 4 things and is tetrahedral like a pyramid. So I put the hydrogens back into the upper structures and used conventional dashed and filled wedges to represent bonds going into and out of the plane of the page respectively.

Each of those hydroxyls could be where the hydrogen is on the carbon instead. If you flipped the hydrogen and hydroxyl on ribose you can see how the formula would be the same but the structures would not be superimposable. Things on different sides are designated with R/S or D/L distinctions that refer to single atoms on the structure. D-ribose refers to the stereochemistry around the chiral carbon farthest from the aldehyde group in the form of ribose used by life.
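The flip described above can be checked numerically. This is a minimal sketch, assuming an idealized tetrahedral carbon with 4 made-up substituent labels ("OH", "H", "C_up", "C_down") and made-up coordinates. The fixed label order is arbitrary and is not the official R/S priority rule. It only shows that swapping 2 groups, or mirroring the molecule, changes the handedness while rotating it does not.

```python
def signed_volume(a, b, c, d):
    """Signed volume (times 6) of the tetrahedron with corners a, b, c, d."""
    u = [a[i] - d[i] for i in range(3)]
    v = [b[i] - d[i] for i in range(3)]
    w = [c[i] - d[i] for i in range(3)]
    # Determinant of the 3x3 matrix with rows u, v, w.
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def handedness(center):
    """Return +1 or -1 for a carbon with four labeled substituents."""
    order = ["OH", "H", "C_up", "C_down"]  # arbitrary fixed order, NOT the CIP rules
    volume = signed_volume(*(center[name] for name in order))
    return 1 if volume > 0 else -1


# Hypothetical coordinates: the four corners of a regular tetrahedron.
original = {
    "OH": (1, 1, 1),
    "H": (1, -1, -1),
    "C_up": (-1, 1, -1),
    "C_down": (-1, -1, 1),
}

# Swap the hydrogen and the hydroxyl: same formula, different arrangement.
swapped = dict(original, OH=original["H"], H=original["OH"])

# Rotate 90 degrees about the z axis: the same molecule seen from another angle.
rotated = {name: (-y, x, z) for name, (x, y, z) in original.items()}

# Reflect through a mirror: the other "hand".
mirrored = {name: (-x, y, z) for name, (x, y, z) in original.items()}

print(handedness(original), handedness(rotated))   # same sign
print(handedness(swapped), handedness(mirrored))   # opposite sign
```

No rotation can turn the swapped version back into the original, which is what non-superimposable means here.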

Amino acids are all L- relative to the side chains aside from glycine which has no atoms with stereochemistry.

So why is life one way and not the other when it comes to these bond directions? I don’t know but I noticed that the only sugar with a bond in the other configuration is xylulose-5P so I drew that one bond with 3D on the thiamine figure. The rest are in the other configuration. And xylulose is the donor for the pentose phosphate pathway 2C molecules. This isn’t a solution to the chirality problem but it is a clue.

Ancestral fragment?

This post is going to look at something I found from the results of 2 studies looking at the ribosome, the molecular machine that translates the genetic code into a string of amino acids that fold into the 20,000+ proteins that build and make us.

The first paper Evolution of the ribosome at atomic resolution peels the large subunit ribosome apart stem-loop by stem-loop. They have a second paper with the small subunit which I took some figures from. A stem-loop is a loop-shaped structure where you have a base paired nucleotide “stem” and above that the rest of the structure forms a “loop”, often with more stem parts. These are how ribosomes tend to evolve. Sequence changes have more of a negative effect than insertion of something that adds new functions.

In the top portion of the figure from their second paper they compare a similar region of ribosomal RNA (rRNA) from different species where you can see how new loops are added.

They went through the ribosome and figured out what stem-loops were added where and peeled additions away down to a first rRNA “hairpin”, named “AES1” for “ancestral expansion segment 1” in capital letters for the large subunit and lower case letters for the small subunit. I’m interested in the large subunit because it is older. More recent segments are AES2…AES3…

This second figure shows the base pairing in the large subunit. The top is numbered by their expansion segments and the bottom is colored by 6 phases of evolution they identify. The part I’m interested in is #1 in the upper right by the “PTC” in the lower right.

AES1 is associated with the Peptidyl Transferase Center (PTC). The region that does the actual connection between one amino acid and another in creating a new protein.

The second paper The ribosome as a missing link in the evolution of life looked at the sequence of rRNA and did a search of other genes to look for similarities and found some very interesting things. I should emphasize that the rRNA is not translated into protein products as far as is known. But that may not have always been the case.

They find the transfer RNAs, tRNAs, that escort the amino acids to the ribosome. They find lots of parts of ribosome and RNA related proteins. They find ribosomal proteins, polymerases, and more.

From these data sets I identified the sequences associated with the oldest fragment of the large subunit, AES1, and went and looked at what was in it.

I found parts of 5 things:
1) GshA Glutamate–cysteine ligase
2) Protease 4
3) RecF a DNA replication and repair protein
4) PqqL a probable zinc protease
5) RplB/Ribosomal large subunit protein 2

Things that attach and cut amino acids, proteases. Interesting. A protein that binds single stranded DNA during repair and replication, RecF. And one of the first ribosomal proteins to bind the large subunit rRNA, one that accesses the PTC during translation, RplB. Very interesting.

Again but better.

I’m not really functional for human behavior related posting, but thinking about the molecules is a hobby and I’ve been able to improve on the drawing I did in my previous post. I actually might have enough origin of life material for 4 or 5 posts. This post will be limited to the overall patterns in the drawing and the most interesting thing I’ve found in the subdomains so far. All of these posts will be like collecting puzzle pieces to the origin of life.

After more time with my drawing I’ve decided that it’s better described as a metabolic alignment of everything related to DNA that I can fit into a canvas. The file is too large for me to post on WordPress so I linked a Google documents version again. To properly figure out biological origins the sequences of genes and proteins aren’t enough. The very metabolism of these things should be considered. I’ve been careful about the structure of the alignment. Even now I’m thinking about how it could be improved with more room or simpler representation. There’s only so much I can do to compact the information down.
It’s a complex set of interconnected notes useful to anyone interested in the origin of life. This one is even cleaned up and error checked.

Cells are relationships between nucleic acids and proteins. Ribosomes and a genome. They make everything between them with the help of some atoms other than CHONPS. That is complex, but not endless. Ribose, 5 bases (A G U C dT), and 20 amino acids.

There is a legend in the top left of the linked drawing. It shows that each molecule is positioned with the protein(s) that use it. So the molecule is the product of something tangential to it shown by arrows, and the MAIN product of the reaction carried out by the protein(s) is also tangential.
Positioned right to above the molecule are other reactants and catalysts, and positioned left to below are products of the reaction.
The name of the protein(s) is given with the length in amino acids. Below that is every identified subdomain in the protein in order, and the protein domain superfamilies they belong to according to the Evolutionary Classification Of protein Domains database (ECOD).
Occasionally the molecule is below a protein when it’s a product of something already present on the figure nearby, like 10- formyl tetrahydrofolate (thought to be a storage form of formyl groups).

There is also a list of molecules that make an appearance in the top right. From 1 to 5 carbons and larger molecules.

In addition to the Purines (A G) and Pyrimidines (U C dT) are:
The amino acids Histidine, Methionine, Phenylalanine, Tyrosine, and Tryptophan.
The cofactors Thiamine (vitamin B1), Flavin (vitamin B2), Coenzyme-A (vitamin B5), the vitamin B6 group (pyridoxal…), Niacin, Folate, S-Adenosyl-Methionine and Molybdenum Cofactor.
These are boxed in blue for the amino acid product or first usable form of cofactor. Molecules that follow this box are different forms of the cofactor used by proteins made from the basic form (folate, molybdenum cofactor, vitamin B6, flavin too but I haven’t included FAD yet).

A cofactor, or coenzyme is a molecule that participates in what the protein is doing. The protein is just the polypeptide part. You can think of them as tools that often have to be reloaded when they dispense things. ATP itself is a cofactor as “energy currency”. The amino acid glutamate is a cofactor for dispensing ammonia. Cofactors also act as platforms to build things (coenzyme-A), or move things to other things (Thiamine), or move electrons around with charge so chemistry can happen (vitamin B6).

Overall Patterns

The whole thing is roughly centered on the purine (A G) pathway going from left to right and bent downward halfway through. Ribose-5P sits in the upper left corner in a small red box. The purine pathway is boxed in red as well. The products are marked with a large black A and G.

The pyrimidines are lined up and go down where the purines bend down. This is deliberate because parts of both pathways involving the amino acid aspartate and bicarbonate (HCO3) are close to one another. Large black letters mark the products, similarly to the purines. From there continuing to the right I have niacin, methionine (and SAM), and coenzyme-A allowing a similar grouping of aspartate in those pathways.

Thiamine is above the purines and I’ve lined its glycine up with the one in the purine pathway.

Under ribose-5P and traveling downward are the vitamin B6 group and “aromatics” which are tyrosine, phenylalanine, tryptophan, and part of folate called PABA. All from the same pathway. The B6 group uses ribose-5P and the aromatics use erythrose-4P so they made sense together and near ribose. 5C sugar, 4C sugar…

Histidine, folate, flavin, and molybdenum cofactor branch off from the ends of the purine pathway where A and G separate. They are made from A and G, where many of the previous molecules have A attached as part of the structure.
This whole section has the feelings of “partly deconstructed purines” to it, and deconstructed ribose in the case of tryptophan.

On the whole there is a feeling of there being an aspartate accumulation after early purine evolution (if the pathways can be read like that). Maybe aspartate was the first nitrogen delivery molecule. That it’s a part of the purine pathway, used for so many cofactors and amino acids, and delivers purine nitrogen is significant.

Purines and Pyrimidines, General Patterns.

To simplify discussion of some of this I drew a separate purine and pyrimidine pathway that just focuses on the overall patterns.

A simple alignment of the purine (bottom 3/4) and pyrimidine (top 1/4) pathways with gained and lost molecules included.

The purine pathway that makes Adenosine monophosphate and Guanosine monophosphate is unique in that it is constructed from smaller parts relative to all of the other pathways presented, outside of the ribose-5P. The biggest thing is glycine, the smallest amino acid, in the formation of the smaller 5-membered ring with ammonia and formate.

Aspartate, the amino acid used in making pyrimidines, temporarily joins in making the second ring. It attaches by its nitrogen and leaves as fumarate (tricarboxylic acid cycle). Bicarbonate (or carbon dioxide in later organisms) is the only other thing in the pathway left to mention. And the way it’s attached to the former glycine makes a 3-carbon segment. It’s a 2C-3C transition. Maybe it means something. Carbon fixation would be different things at different metabolic levels.

The pyrimidines have the opposite relationship to ribose-P as the purines. The ring is made first and then it is mounted on the ribose via PRPP, the form of ribose used to make nucleotides (and niacin, tryptophan, folate…). A bicarbonate and aspartate are combined and made into a 6-membered ring like the second ring of the purines. Getting aspartate and bicarbonate together in the figure makes sense when it comes to positioning the pathways together.

Both pathways make an intermediate that isn’t used for anything else. Inosine monophosphate and orotidine monophosphate. After that it’s smaller changes to get each of the final nucleotides.

The pyrimidines also lose parts of structure. Niacin removes 2 hydrogens and their electrons making a double bond. After orotate binds ribose it loses a carbon dioxide (decarboxylation). And the only place DeoxyriboNucleic Acid shows up is deoxythymidine, a loss of a hydroxyl. That’s done with purines too but I didn’t need to show deoxyribonucleic acid until dT.

Maybe there’s a progression here with 5 ring ancestors and a 6 ring newcomer. Early base pairing between them. I certainly think accumulation of aspartate is a theme in both the evolution of pyrimidines and much more. In fact I think modern genomes and ribosomes are descended from something like the 5-membered thiazole in thiamine on a ribose maybe. With chemistry and base pairing between strands.

The purine pathway splits to thiamine biosynthesis at AIR, but the molecule is turned into the 6-membered ring of HMP. Not the 5-membered ring that moves carbon in making ribose (pentose phosphate pathway), the part that assists in the chemistry (more on that at the end).

Purines

I think there is a level where one can tentatively delete whole parts of proteins when thinking about precellular life in the vent. Formate, methane, bicarbonate, acetate, ammonia, phosphate were just tentatively available. There is sufficient literature support for these molecules in the vent from my point of view. As well as acetate, pyruvate, and fatty acids that make membranes.

Extreme accumulation of ammonia on electroreduced mackinawite: An abiotic ammonia storage mechanism in early ocean hydrothermal systems

Generation of long-chain fatty acids by hydrogen-driven bicarbonate reduction in ancient alkaline hydrothermal vents

Bio-inspired CO2 conversion by iron sulfide catalysts under sustainable conditions

Phosphate availability and implications for life on ocean worlds

I’ll only mention the most interesting thing I have found in the protein subdomains in this post.

Focusing on the purine pathway there is another maybe link to the origin of life. 3 polymers are important in life and origins: polynucleotides (DNA RNA), polypeptides (proteins), and fatty acids (~16-20 carbons, membranes and cofactors like biotin). Not only are nucleotides being made but step 2 of the purine pathway has a product, phospho-ribosyl-glycin-amide, that looks like a peptide (protein) bond on ribose between the glycine and the ribose-amide.

Top: purine intermediate PRA and an aspartate-glycine dipeptide. Middle-Bottom: the reaction PurD carries out and the glycine phosphate intermediate.

Maybe it doesn’t mean anything but get enough interesting things together and there is more room for thought.

The protein that does this step, PurD, is an example of something that I can cut a lot of things off of, just for fun. PurD is a member of the ATPgrasp family of proteins. These proteins bind an ATP, and pop a single phosphate off making an ADP.

When a protein uses ATP as “energy currency” in this way it either participates in a chemical reaction directly or it drives physical movement of a protein to do something else including chemistry. The metaphor is about doing work. Here the phosphate directly participates in attaching glycine to the nitrogen added in step 1 with a glycine-P intermediate.
The ATP-Grasp Enzymes

Structural biology of the purine biosynthetic pathway

Here is where the chopping out can happen. In the vent there is phosphate. The ATPgrasp family is essentially a bunch of phosphate dispensers and just for fun can be deleted in terms of what is more or less interesting here. That being said phosphate control seems to have evolved first/early.
Structural Phylogenomics Reveals Gradual Evolutionary Replacement of Abiotic Chemistries by Protein Enzymes in Purine Metabolism

And the ATPgrasp family is related to the protein that puts aspartate on CAIR in the purine pathway which is interesting.

That’s most of the protein deleted, except for the C2 domain. It’s an interesting little piece and its relatives are involved in some very origins related chemistry. At the Evolutionary Classification of Protein Domains database this domain is in the “Alpha helix + beta sheet complex topology” bin. And in there it’s in the “alpha/beta-Hammerhead Barrel-sandwich hybrid” X, H and T (homology, topology) groups.

That whole T group is filled with interesting stuff but the PurD C-terminal fragment is in the “CO dehydrogenase molybdoprotein N-domain-like” F or family group. Its relatives include not only the above CODH MO-N fragment (1C+1C) but also fragments in biotin carboxylase (1C manipulation), acetyl-CoA carboxylase (1C+2C, on the drawing), and pyruvate carboxylase (1C+3C).

Nowhere else do I see a numerical progression like that. Where did aspartate come from for pyrimidines? Today cells can use pyruvate carboxylase to make oxaloacetate in initiating gluconeogenesis (glucose synthesis), which is also made into aspartate. Interesting.

This “CODH MO-N fragment” is also in 3 other purine biosynthesis genes, PurT, PurK, and PurP. They involve formate, and bicarbonate. Carboxylations use bicarbonate. The fragment is also in L Amino acid Ligase (LAL). And it is present in a few proteins without the ATPgrasp part: xanthine dehydrogenase chain B, Carbon monoxide dehydrogenase large chain (so 2 parts of CODH), aldehyde oxidoreductase, quinoline 2-oxidoreductase large subunit, and 4-hydroxybenzoyl-CoA reductase large subunit. I don’t know what to think about the things in this paragraph yet but I mention them for completeness.
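One way to keep track of which proteins carry which fragment is a simple lookup table. This is a rough sketch, assuming only what the paragraph above says: the domain sets are simplified to 2 labels, the names `PROTEIN_DOMAINS` and `split_by_domain` are made up for illustration, and nothing here is pulled from ECOD itself.

```python
FRAGMENT = "CODH MO-N-like"
GRASP = "ATPgrasp"

# Simplified domain content, as described in the text (not a database export).
PROTEIN_DOMAINS = {
    "PurD": {GRASP, FRAGMENT},
    "PurT": {GRASP, FRAGMENT},
    "PurK": {GRASP, FRAGMENT},
    "PurP": {GRASP, FRAGMENT},
    "LAL": {GRASP, FRAGMENT},
    "xanthine dehydrogenase chain B": {FRAGMENT},
    "carbon monoxide dehydrogenase large chain": {FRAGMENT},
    "aldehyde oxidoreductase": {FRAGMENT},
    "quinoline 2-oxidoreductase large subunit": {FRAGMENT},
    "4-hydroxybenzoyl-CoA reductase large subunit": {FRAGMENT},
}


def split_by_domain(proteins, required, partner):
    """Split proteins that have `required` into those with and without `partner`."""
    with_partner, without_partner = [], []
    for name, domains in sorted(proteins.items()):
        if required not in domains:
            continue
        (with_partner if partner in domains else without_partner).append(name)
    return with_partner, without_partner


with_grasp, without_grasp = split_by_domain(PROTEIN_DOMAINS, FRAGMENT, GRASP)
print("Fragment + ATPgrasp:", with_grasp)
print("Fragment alone:", without_grasp)
```

Adding more subdomains per protein later only means adding labels to each set, so the same query works as the notes grow.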

The last thought on this fragment is that the general shape of the carboxy HO-C=O, or even the carbonyl C=O itself is maybe what it is about. I need to look at protein structures.

Ribose

I’ll finish with some thoughts on ribose. I’ve been hoping the origin of ribose comes out of this. There are different variables to consider in the abstract. The polynucleotides are the most complex part of this. Only counting genomes ribose:
1) Has 5 carbons with hydroxyls on 4 and an aldehyde on one end. Each oriented the same (chirality).
2) Polymerizes using phosphate on the carbon on the end opposite the aldehyde (#5) and 2 carbons over (#3). So a 3-carbon segment for the backbone.
3) Provides a site for nitrogenous base binding at the aldehyde carbon (#1). This also provides a site for base pairing between strands.

Each of those variables is modifiable hypothetically.

Maybe there were things with erythrose-4P or shorter in the backbone of the polymer. This still allows for base pairing. With glyceraldehyde-3P maybe you still get the same but no ring.

But it’s the predictability of the ring that evolution can respond to. There is a huge diversity in things that bind ribose or things with it. Erythrose is only used in a few other places outside of the pentose phosphate pathway and the pathway that makes part of folate, tryptophan, phenylalanine, and tyrosine. And a 2-carbon segment in the backbone might be too close for 2 phosphates, steric hindrance (physically interfering).

If you get larger than ribose then you worsen the problem of self splicing, poly-RNAs of a certain size looping out and removing part of themselves and sealing the break. The reactive hydroxyl on carbon #2, the one removed in deoxyribonucleotide genomes, is bad enough. 2 of them would allow for more chemical possibilities than the relative stability of 1.

Getting outside of genomes carbons #2 and #3 bind amino acids in translation with class 1 and 2 aminoacyl-tRNA synthetases. Maybe there’s evidence of an erythrose backbone and only carbon #2 binding amino acids in the past there, but probably not.
Carbon #3 also binds phosphate in coenzyme-A and niacin in its anabolic mode (the NAD/NADP difference).

Carbon #1 binds lots of cofactors, mostly through at least 1 phosphate as the drawing shows. It also binds lots of other things for transport like sugars, membrane components, and more. Carbon #1 also binds all amino acids during the process of aminoacyl transfer in creating transfer RNA loaded with an amino acid, again on one phosphate, aminoacyl-AMP.

These carbons are also used in making signalling versions of nucleotides like cyclic AMP and cyclic GMP. The signalling functions of nucleotides is an area I’ve yet to get into in depth and integrate into this.

Finally the metabolism of ribose on the figure may provide clues to its origin. In the cofactor pathways ribose is often partially dismantled with pieces leaving, such that 4 or fewer carbons from ribose remain in the final molecule. At the least it makes me think about the assembly of ribose and its own abiotic concentration.

3C+2C?
4C+1C?
6C-1C?
3C+3C with loss of OCO?

I think the first and maybe last follows the most likely routes. 3 comes after 2 and if 2 is already around… It’s maybe thiamine related.
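The carbon bookkeeping for those 4 routes is simple enough to check in a few lines. This is only a sketch of the arithmetic, assuming each piece is counted by its carbons and each loss (including the OCO, one carbon) is written as a negative number. It says nothing about which route is chemically plausible.

```python
RIBOSE_CARBONS = 5

# Each route is a list of carbon counts; losses are negative.
ROUTES = {
    "3C+2C": [3, 2],
    "4C+1C": [4, 1],
    "6C-1C": [6, -1],
    "3C+3C with loss of OCO": [3, 3, -1],
}


def carbons_in_product(pieces):
    """Total carbons left after combining pieces and subtracting losses."""
    return sum(pieces)


for name, pieces in ROUTES.items():
    total = carbons_in_product(pieces)
    print(f"{name}: {total} carbons, matches ribose: {total == RIBOSE_CARBONS}")
```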

Common ancestry between the purine and thiamine pathways is possible and likely given the use of AIR in making HMP. Attaching glyceraldehyde and glycolaldehyde could be a route. Now how would acetate and pyruvate become sugars? Something between oxaloacetate, gluconeogenesis and the pentose phosphate pathway maybe.

Nested sets of chickens and eggs.

Update

I’m much better in terms of hypervigilance. But it’s still useful to keep my number of social spaces to a minimum while I think about social variables around me. “Social firmware update”? It makes sense that groups would complicate that. “Gender-Null” still makes sense. It’s easy to just be “like everyone” when the group is more sensitive to the words than me. I make choices with the words. Relative to tourette syndrome maybe it’s an “intense words” thing. I make choices with lots of words that other people feel intensely about. This is not to say that tourette syndrome and non-binary or gender-null are connected, rather these parts might have connections for me.

I can’t say this is me going back to blogging on a regular basis either. This is more like I have a creative project connected to a lot of feelings about what I wish I was doing professionally, origin of life research. Obsessing over brain science is more of a defence mechanism, this is an older obsession and it’s not only interesting but I may get some benefits out of it. Lots of feelings here.

I have spent a lot of time drawing metabolic pathways until I can do it from memory as a way of internalizing them. More body parts and sensations involved enhances memory. Drawing anatomy is helpful too. And I’ve had a lot of time to think about the patterns. I’ve come back to one project over and over and I recently got a tablet with a good screen for drawing and transferred the latest version of a group of pathways to Sketchbook. I’ve also started using Kingdraw for molecular structures.

BUT, I can’t just post this drawing by itself. If you have a strong background in molecular biology you can skip to the end (check out the transcription and translation diagram though, I integrated aaRS class information in with the genetic code) but for everyone else I need some figures to make this somewhat more meaningful. An attempt to go from public perception to hardcore molecular biology.

This is for fun. I’m playing with the ideas and concepts. So there is a lot of me imagining things in here but I try to be clear about what is data and what is me speculating. Other views are welcome, including data that suggests I’m wrong, or people who have made the same observations that I missed. This is a work in progress, but keep in mind that this is about fun.

From public perception to molecular biology.

The A G U C T nucleic acids are usually understood by their relationship with amino acids through the genetic code and the making of “proteins”. Equating the metaphorical “hard drive” with its products hides the full range of relationships that must be taken into account, even pinning things relative to DNA sequence or protein sequence.

DNA “base pairing” on the left, RNA on the right.

The letters pop up all over culture, but usually not the “U”. Here’s what that looks like in detail, the drawing includes the pathways that make these molecules. And metabolically related molecules.

A DNA/RNA hybrid example. The first 4 pairs from the top down are DNA, the bottom 1 is RNA. Arrows point to the hydroxyls (-OH) removed in De-oxy-Ribo-Nucleotides. The green dashes between bases are base pairing hydrogen bonds.

Briefly, the Deoxy-Ribo-Nucleic acids are “duplex chains” of -ribose-phosphate-ribose-phosphate- with Adenine, Guanine (purines), Cytosine, or Thymine (pyrimidines) attached to the Ribose by the 1′ Carbon (read as “one prime carbon”). The structure of the “nitrogenous bases” is such that attractions (hydrogen bonds) between molecules allow A:T and G:C pairing between the strands.

Triplets in those chains (with 3 reading frames on each strand) are a basis of the “genetic code” used to synthesize the proteins in a cell. A “protein” is a chain of amino acids.
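To make the “3 reading frames on each strand” concrete, here is a minimal sketch that pairs a DNA strand with its partner using the A:T and G:C rules and then cuts both strands into triplets at each of the 3 possible offsets. The example sequence and the function names are made up for illustration.

```python
# Base pairing rules between the two DNA strands.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Partner strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def triplets(strand, offset):
    """Cut a strand into complete triplets starting at the given offset."""
    return [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]


def six_frames(strand):
    """All 6 reading frames: 3 offsets on each of the 2 strands."""
    frames = {}
    for label, seq in (("+", strand), ("-", reverse_complement(strand))):
        for offset in range(3):
            frames[(label, offset)] = triplets(seq, offset)
    return frames


example = "ATGGCCTAA"  # hypothetical 9-base sequence
for frame, codons in six_frames(example).items():
    print(frame, codons)
```

Only one of those 6 frames is normally the one a given gene is read in. The other 5 are the same bases grouped differently.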

Below is the simplest amino acid, Glycine, and water in even more detail so I can show some more molecular features that can be referenced in the future. The previous figure omitted them and following ones will as well to simplify things.

The previous base-pairing diagram omitted parts of molecules worth mentioning that aren’t often represented.

  • Ionic charges. Many of these molecules are actually ions in water, amino acids lose a proton (a hydrogen nucleus) on one end, and gain one on the other (center versus right). This comes down to some atoms having more “electronegativity” than others (nuclei attract electrons more strongly). Most of the time I don’t do this to save effort.
  • “lone pairs”. Oxygen and nitrogen atoms have electrons in “non-bonding orbitals” (the places electrons go). These regions are more negative while not being full ionic charges, called “polar”. That’s why nitrogen atoms can either attract or repel one another in the base pairing. If a hydrogen is sticking out from one base and a lone pair from another they attract.
  • Water also has lone pairs, and with them counted it’s actually a pyramid shaped, highly polar molecule. There is no “hydrophobic”, water is attracted to itself and crowds non-polar molecules away as it is attracted to itself. I also mostly omit lone pairs to save effort.

Now for all of the amino acids your body, and all cells, use to make proteins.

Cells use these 20 amino acids to chain together “poly-peptides” (the bonds are called peptide bonds) that do biochemistry or make structures. Essentially they are glycine + something at a specific carbon. (There’s a layer called chirality I’m not getting into yet, the hydrogens on that carbon aren’t the same chemically and biology uses one here).

This is my own take on how to organize them. There are 2 organizing principles; metabolic relationships, and atoms in the side chain.

The untitled block on the left roughly acts as precursors for ones to the right. There’s a system that can generate serine from glycine and back as the cell needs. Serine itself can be used to make cysteine. Larger amino acids can be broken down into smaller ones too.

The “aromatic amino acids” (Phe, Tyr, Trp, named for the benzene ring) are complicated because they are made from 3 carbon and 4 carbon molecules. So I put them on a different level.

Tryptophan and histidine are close because they both use ribose in their biosynthesis.

Making proteins from DNA.

This is a simple protein made with the previous “proteinogenic amino acids” (cells use these). The next figure will explain how, and why methionine is first.

This is how cells make proteins with DNA sequences.

This figure starts on the top left with the DNA genome, so named because it contains the “genes” expressed into proteins (now more complicated due to RNA genes). Some cells have “linear” chromosomes of 100’s of millions of base pairs (genome segments), and bacteria tend to have small genomes of a few million.

The DNA is separated into its 2 strands and one strand acts as a template for copying a part or all of the genome. That copy of a part is made from RNA in a process called Transcription. The copy is called an “mRNA” or “messenger RNA”. Many proteins are involved.

The mRNA is escorted to the smaller half of a massive 2 subunit complex called the Ribosome where the nucleic acid sequence is transformed into a poly-peptide sequence. The small subunit (SSU) contains an RNA called rRNA or “ribosomal RNA” and 20+ proteins in bacteria.

After the mRNA docks with the SSU a large subunit (LSU) made from 2 rRNAs and 30+ proteins in bacteria docks with the small subunit.

Small (roughly 70-90 nucleotides, nts) RNAs called transfer RNAs (tRNA) bring in amino acids one at a time and a 3 nucleotide sequence in the mRNA called a “codon” determines what amino acid is placed through binding to an “anticodon” on the tRNA. (In the bottom-left I show an intermediate in the aminoacylation of tRNA, an aminoacyl-AMP intermediate.)

Both protein and RNA are involved in aminoacylation.

A 3-nucleotide codon system gives 64 possibilities (4 x 4 x 4) and the way those are assigned to amino acids is the genetic code. 3 of these are reserved for STOP: UAA, UAG, and UGA. Methionine doubles as START with AUG.
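The counting above can be reproduced directly. This is a minimal sketch using the standard genetic code written as one 64-letter string in U, C, A, G order (one-letter amino acid codes, `*` for STOP). The `translate` helper and the example mRNA are made up for illustration, and the helper ignores everything a real cell does to choose a start site.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG x UCAG x UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every 3-base codon to its amino acid (or "*" for STOP).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
START_CODON = "AUG"


def translate(mrna):
    """Read codons from the first AUG until a STOP codon or the end."""
    start = mrna.find(START_CODON)
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(len(CODON_TABLE), "codons")
print("STOP:", STOP_CODONS)
print("START:", START_CODON, "->", CODON_TABLE[START_CODON])
print(len(set(AMINO_ACIDS) - {"*"}), "amino acids")
print(translate("GGAUGGGCUCUUAAGG"))  # hypothetical mRNA
```

With 61 codons left for 20 amino acids, most amino acids end up with more than one codon.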

There are 2 classes of aminoacyl-tRNA synthetases (aaRS), the systems that attach amino acids to tRNA. Class 1 involves attachment of amino acids to the 2′ hydroxyl of ribose, class 2 involves attachment of amino acids to the 3′ hydroxyl of ribose. On the figure I have listed the classes and subclasses under the relevant hydroxyls and listed their codons next to them.

The ribosome subunits hold the mRNA, and the tRNAs in place and one amino acid is added at a time while empty tRNAs leave. Eventually a STOP is reached and a new poly-peptide/protein is released from the exit tunnel of the large subunit.

These proteins are then often modified in many ways and transported to many places to do their structural or catalytic jobs.

The genetic code relationship plus ribose metabolism.

That is not ONLY how I see these things. I also see the individual biochemical reactions that cells use to make nucleic acids. Starting with RNA. (Anabolism=construction, catabolism=breaking down and I will need to look at catabolism at some point).

DNA as a physical substrate for molecular biology, our “hard drive” is based in RNA metabolism. And so the metabolism of RNA, not deoxy-ribose is context to understand these relationships. (Deoxy-nucleotides are definitely part of the story though.)

Ribose centered metabolism.

PRPP is circled and pathways are aligned relative to PRPP. The Purine intermediate AIR is in a diamond. The products of each pathway are in a box.

Note that this is at lower resolution but still ok for display. A full resolution version can be found HERE.

This is a Ribose centered metabolic diagram. A tool for thinking about how metabolism can tell us where nucleic acids and proteins came from. They have metabolic relationships beyond the DNA>RNA>Protein one. These are parts of that relationship.

All of the hydrogens are implied to save space and work. Each protein name is typed instead of handwritten so they stand out. Each protein is positioned with the molecules it interacts with, which are products of the previous protein(s).

This is a prokaryote centered metabolic diagram. It’s been difficult to get interested in us bacterial-archaeal fusions (eukaryotes) here. The blending complicates the picture.

Beneath each protein is the length in amino acids, usually from E.coli strain K12 unless E.coli was the exception relative to cells in general. I’ll get around to citing everything eventually but I can get specific things if someone wants something. I’ll put some sources of information at the end, and here and there in the text.

Along with the length I list each known subdomain within the protein and the protein domain families and superfamilies to which the domains belong. These parts of proteins are related to one another and their relationships are part of the puzzle. I go over an example with PurF in the resources section. These bits of related protein often have themes to what they do, and can potentially lead to primordial subdomains.

Black arrows point in the direction of biosynthesis.

This is a very unfinished diagram. Once I did the pathways that produce the bases in nucleic acids I kept adding Ribose related pathways and I’m not done. This is maybe the 20th iteration and I keep adding elements and reorganizing.

My intention is to upload and comment on new versions, as well as adding new categories of information as big as protein subdomains (below). There are likely errors here (PurF is a GATase class 2!!! I keep putting class 1. I used PurF as an example in the resources section with the correct info, and this will be my first correction), and once this is up I’ll do another error sweep before doing more. There is fading and smudging from copying and pasting. And I’m still making decisions on how I’m going to represent things so there may be inconsistent bits.

The following pathways use ribose in the form of Phospho-Ribosyl-PyroPhosphate, PRPP.

  • The Purines: produced as Adenosine-MonoPhosphate (AMP), and Guanosine-MonoPhosphate (GMP). (Starts center left with PurF, runs right and down, and forks left into AMP and GMP)
  • The Pyrimidines: produced as Uridine-MonoPhosphate (UMP), Cytidine-MonoPhosphate (CMP), and deoxy-Thymidine-MonoPhosphate (dTMP) (Starts at the bottom-left, continues to the middle, and forks into CMP and dTMP after UMP)
  • Amino acids: Tryptophan (starts top-left, goes down, turns right to the center, forks right into Trp, and forks up into Phe and Tyr), and Histidine (starts top-left, moves right)
  • Cofactors: Thiamine (Vitamin B1, starts upper right under histidine, pathway moves right), Nicotinamide-Adenine-Dinucleotide-Phosphate (niacin, starts lower-middle-left and moves right), Cobalamin (Vitamin B12, starts to the right of NADP, this only covers the part of B12 derived from AIR), and Tetra-Hydro-Folate (to be added).

If someone is wondering why I haven’t added the cofactor that donates the formate for Purine biosynthesis to the diagram, that is an example of going too far with treating things like dispensers of things available in earlier environments. Some organisms use formyl-phosphate. But it still needs adding. Ultimately I’m betting THF, and things like glutamine (the NH3 donor, mostly) are providing things that were environmentally available. THF is a remodeled GMP (like how histidine is a remodeled AMP?) A GMP with an intermediate from the Phe/Tyr/Trp pathway (PABA) added.

Purines and Pyrimidines

The Purine pathway produces A and G. The Pyrimidine pathway produces U, C, and T. These 2 pathways interact with Ribose in opposite fashion. A and G are built onto Ribose bit by bit out of small things and rings are closed. Pyrimidines are made by building a ring first and then putting it on Ribose. PRPP is also used with whole nitrogenous bases that are already finished.

It might be useful to look at the pathway like this.

Purines: PRPP + NH3 + Glycine + CH2O + NH3 + CHO3 + Aspartate – Fumarate + CH2O = IMP (pathway fork).

That aspartate-fumarate is effectively a 2-step, or indirect, amination (ammonia addition) that releases a TCA cycle intermediate.

IMP + Aspartate – Fumarate = AMP

And there’s that weird amination again. It happens in arginine production too.

I have ideas of different places in time where aspartate and AMP accumulation occurred. Maybe glycine accumulation earlier too. How to define the “start” of things?

IMP + H2O + NH3 = GMP.

Pyrimidines: CHO3 + Aspartate – H2 + PRPP – CO2 = UMP

Maybe there was aspartate accumulation with more than one use. A transition from a 5 to a 6 ring polymer (Purine intermediate AIR) leading to Pyrimidines and other things? A step is shared with arginine biosynthesis (CHO3, bicarbonate use through an intermediate called carbamoyl-phosphate).

UMP + NH3 = CMP

UMP + PO4 – O + PO4 – 2 PO4 + CH4 = dTMP

Thymidine stands apart from the other nucleotides in that it is made from Uridine-DiPhosphate (UDP) that is dehydroxylated before being methylated. Since I’m a bit RNA focused I haven’t pursued Thymidine much.
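
The shorthand sums above can also be written down as data, which makes it easy to ask which pathways share an input. This is a minimal sketch: the inputs and releases are copied from my informal tallies (they are not balanced chemical equations), the dTMP line is left out, and the names `PATHWAYS`, `users_of`, and `usage` are made up for the illustration.

```python
from collections import Counter

# Inputs ("plus") and releases ("minus") copied from the shorthand sums in the text.
# These are informal tallies, not balanced chemical equations.
PATHWAYS = {
    "IMP": {"plus": ["PRPP", "NH3", "Glycine", "CH2O", "NH3", "CHO3", "Aspartate", "CH2O"],
            "minus": ["Fumarate"]},
    "AMP": {"plus": ["IMP", "Aspartate"], "minus": ["Fumarate"]},
    "GMP": {"plus": ["IMP", "H2O", "NH3"], "minus": []},
    "UMP": {"plus": ["CHO3", "Aspartate", "PRPP"], "minus": ["H2", "CO2"]},
    "CMP": {"plus": ["UMP", "NH3"], "minus": []},
}


def users_of(molecule):
    """Return the products whose shorthand sum lists `molecule` as an input."""
    return sorted(product for product, eq in PATHWAYS.items() if molecule in eq["plus"])


# How many times each input shows up across all of the sums.
usage = Counter(m for eq in PATHWAYS.values() for m in eq["plus"])

print(users_of("Aspartate"))  # ['AMP', 'IMP', 'UMP']
print(users_of("CHO3"))       # ['IMP', 'UMP']
print(usage["NH3"])           # 4
```

Aspartate turning up in three of the five sums is the "more than one use" pattern mentioned above.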

Amino Acids: Histidine and Tryptophan

In histidine biosynthesis a ribose from PRPP is connected to a nitrogen on ATP. The same nitrogen provided by aspartate. Then the ribose leaves with a C-N, and the rest of the ATP is a late intermediate of Purine biosynthesis. Adding an NH3 and some processing gives histidine.

A molecule that is a 4-carbon clone of Ribose-5-phosphate, Erythrose-4-phosphate, is combined with a 3-carbon glycolysis intermediate, Phospho-Enol-Pyruvate, to make a 7-carbon molecule that is processed into a ring before being attached to ribose via an added nitrogen. After that the resulting molecule loses carbons 1-3 of ribose as the glycolysis intermediate Glyceraldehyde-3-Phosphate, a molecule like a 3-carbon Erythrose-4-phosphate (or ribose-5-phosphate) (histidine takes the ones ribose loses here…). That G3P is replaced with a serine to make tryptophan.

Cofactors: Thiamine, NAD, and THF

A cofactor is like an “attached functional module” for proteins. Like different heads on a power tool.

There’s Thiamine (VB1), a carbon group extractor/dispenser. Interestingly AIR is made into the part of Thiamine that isn’t grabbing carbon chains. But it’s rearranged into HMP in an interesting set of reactions where the ribose is dismantled and parts are attached to the carbon-nitrogen ring to make a 6-membered ring. And the thiazole-P that HMPP is combined with looks a lot like the ring on AIR but with sulfur instead of nitrogen.

Thiamine is part of many complexes and it acts to move carbon chains from one molecule to another. In the Pentose-Phosphate-Pathway Thiamine is one way ribose-5-P is made by moving 2-carbon units between other carbohydrates (carbon with water, hydrocarbons are carbon with hydrogen). Maybe we’re descended from the thiazole?

And while it’s fuzzy on the diagram ThiO/ThiH are two routes to make what looks like glycine, but there’s a double bond on the nitrogen (an imine as opposed to an amine). Like the double bond on FGAM, the molecule right before AIR (and see NADP). There are some interesting things with glycine-like molecules here and there.

Then there’s NAD(P) (the P is a phosphate that changes Nicotinamide-Adenine-Dinucleotide’s metabolic role, on the 2′ hydroxyl, interesting…), a proton extractor/dispenser. It’s effectively the same for electrons. Aspartate is involved again and now it gets an imine before getting combined with DiHydroxyAcetonePhosphate, another glycolysis intermediate, before ring synthesis and attachment to ribose.

Then there’s cobalamin (VB12), this molecular monstrosity is a rearranger (isomerase), methyl extractor/dispenser, and halogen extractor. I need to add it but the initial rearrangement of AIR opens ribose like with histidine and Tryptophan, and the 1′ carbon is lost.

Tetra-Hydro-Folate, a methyl (-CH3), methylene (-CH2-), methenyl (-CH=), formyl (-CH=O), or formimino (-CH=NH) extractor/dispenser. This molecule is part of many 1 carbon processes. It dispenses the formate in purine biosynthesis. It’s involved with the transformation of serine to glycine and back. And it’s made from GTP. I’ll add it but the 5-membered ring of GTP is opened first between the nitrogens and the carbon between is lost as formate before ribose opens and all of the ribose is used in following molecules.

Lastly I added a part of Sulfur metabolism, the Iron-Sulfur cluster part that provides the sulfur in the Non-PRPP part of Thiamine biosynthesis in the form of carrier proteins bringing the sulfur where it is needed. Sulfur chemistry is a big part of this, and yet it’s not very directly involved in nucleic acid biosynthesis. It is critical in cofactors used to move carbon around (Thiamine, Biotin, coenzyme-A…).

PurF has what may be a vestigial iron-sulfur cluster though. Structural Biology of the Purine Biosynthetic Pathway: “The first type, present in higher vertebrates, plants, flies, cyanobacteria, and Gram-positive bacteria, contains a structural Fe4S4 cluster and a cleavable N-terminal propeptide, and is exemplified by Bacillus subtilis PurF in literature [10,16,17].”

Plans.

Add info about proteins and nucleotides as adaptors and dispensers/extractors of other molecules.

Add info about thematic differences among NTPs as P or PP donors.

Add more AARS (aminoacyl tRNA synthetase) info.

Attempt to organize the relationships by intermediates, and protein subdomain.

Add the urea cycle due to the connections between Purine (Asp amination), Pyrimidine (CP) and Arginine biosynthesis.

Add more reaction intermediates like aminoacyl-AMP. For example the glycine in purine biosynthesis exists as phospho-glycine. The aspartate as phospho-aspartate.

Other Observations.

Molecular “dispenser/extractor” and “adaptor”.

ATP, THF, glutamate, and other things ultimately may be providing things that were primordially available. So to an extent they, and some protein subdomains of associated enzymes, can be excluded from the picture. In fact aspartate and even glycine itself are ammonia donors in transamination reactions.

Similarly the same exclusion of parts can be done for molecules that act as adaptors, even tRNAs. I have ideas about coenzyme-A without the Adenosine-phosphate used to carry it around the cell.

Ideas.

Pre-cellular life. When looking at the NMPs and NDPs as adaptors the Pyrimidines seem to have a bias towards membrane components and systems. CDP-glycerol and UDP and this part of bacterial cell wall biosynthesis.

G3P>E4P>R5P progression in backbone = each nucleotide in codon triplet?

Lysine is the only aa with a C1 and C2 AARS. Connection to Lysine looping steps in Lys biosynthesis via symmetry at the codon stem?

Glycinamide and bicarbonate, ammonia, or hydrogen sulfide as significant to abiogenesis.

Did Ribose originally come from a 3C+2C source? G3P + the activated aldehyde on Thiamine?

Issues.

The medium allows a never-ending cycle of reorganizing by different principles. On one level this is good. Copy and paste my way to unpacking metabolism.

Good and Bad. Having all the things in my visual field is a potential benefit with ADHD. There are drawbacks.

Resources and how I use them.

Biochemical Pathways: An Atlas of Biochemistry and Molecular Biology. 2nd ed. Gerhard Michal and Dietmar Schomburg, 2012. General reference.

Roche free molecular and cell biology posters. Sometimes I like to just stare at the whole mess.

UniProt. This is a resource for protein sequence and functional information. This is my first stop in getting information about a protein, example PurF. After getting length and other info I scroll down to family and domain databases and check the InterPro links.

InterPro. This is a resource for classifying proteins into families and predicting domains and other important sites. Here’s PurF again. Scrolling down to entry matches for the protein you can see what domains have been defined so far, 2 subdomains. Once I have the subdomains I go to ECOD.

ECOD. This is a resource that is a “… hierarchical classification of protein domains according to their evolutionary relationship.” Proteins aren’t just related to one another, their subparts are related to one another. Often separately. There are 4 levels of comparison X: possible homology level, H: homology level (big families), T: topology level (structural similarity), F: Families.

The first domain of PurF is in the alpha helices + beta sheets 4-layers group. X: Ntn/PP2C (N-terminal nucleophile aminohydrolases/Protein serine/threonine phosphatases 2C), H: Ntn, T: Class 2 glutamine amidotransferases, F: GATase_2_1st. You can click right into it.

The second domain of PurF is in the alpha helix/beta sheet 3-layered sandwich group. X: PRTase-like (PhosphoRibosyl-Transferase), H: PRTase-like, T: PRTase-like, F: Pribosyltran.
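
One way to keep track of these classifications is as small records that can be compared level by level. The sketch below is my own bookkeeping, not anything ECOD provides: the class name `EcodDomain` and the method `shared_level` are made up, the two PurF entries use the X/H/T/F values listed above, and the third entry is a purely hypothetical placeholder added only to show a partial match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcodDomain:
    architecture: str  # the broad structural bin, not used in the comparison
    x_group: str       # X: possible homology level
    h_group: str       # H: homology level
    t_group: str       # T: topology level
    f_group: str       # F: family

    def shared_level(self, other):
        """Return the deepest level (X, H, T, or F) at which two domains match, or None."""
        deepest = None
        levels = [
            ("X", self.x_group, other.x_group),
            ("H", self.h_group, other.h_group),
            ("T", self.t_group, other.t_group),
            ("F", self.f_group, other.f_group),
        ]
        for label, mine, theirs in levels:
            if mine != theirs:
                break
            deepest = label
        return deepest


# The two PurF domains, with the values given in the text.
purf_first = EcodDomain("alpha helices + beta sheets, 4 layers",
                        "Ntn/PP2C", "Ntn",
                        "Class 2 glutamine amidotransferases", "GATase_2_1st")
purf_second = EcodDomain("alpha helix/beta sheet 3-layered sandwich",
                         "PRTase-like", "PRTase-like", "PRTase-like", "Pribosyltran")

# HYPOTHETICAL placeholder entry (not a real ECOD record), only to show a partial match.
hypothetical_relative = EcodDomain("alpha helices + beta sheets, 4 layers",
                                   "Ntn/PP2C", "Ntn",
                                   "hypothetical other topology", "hypothetical_family")

print(purf_first.shared_level(purf_second))            # None
print(purf_first.shared_level(hypothetical_relative))  # H
```

The two halves of PurF share no level at all, which is the point about subparts of a protein being related to other proteins separately.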

With these 2 domains and boxes of their close relatives I can look for patterns that suggest general functions of the group, and how they relate to one another. There are many studies looking at relatedness among for example ATPgrasp enzymes and their subdomains.

And with all of these relationships in ribose metabolism visible on the diagram lots of patterns may become visible. It’s like a Rubik’s Cube/Matryoshka Doll/Where’s Waldo combo. There are probably multiple Waldos.

KEGG. Sometimes it’s useful to look at more specific metabolic pathways. This is also good for finding everything a molecule like uracil might be involved in.

Pubmed. A major archive of journal articles.

Wikipedia. Often a good place to get general info.