General Biochemical Patterns in Purine Biosynthesis and Protein Relatives. Part 4

Ammonia, glycine, and formate formed a ring on ribose-P, bicarbonate bound a nitrogen and was moved over 2 atoms. And little but phosphate was required for all these steps (externally, the protein still affects the process).

It’s worth mentioning here that this is another place where purine biosynthesis splits and joins another metabolic pathway. Previously thiamine and B12 joined at the first ring closure. Here aspartate is going to bind the nitrogen that used to be a glycine carbonyl, and then most of that aspartate will be removed as fumarate effectively making this a 2-step equivalent to glutamine as an ammonia dispenser. Aspartate as primordial ammonia dispenser?

After the removal of aspartate the product AICAR is a product of histidine biosynthesis as the big drawing shows, or should show if I remembered to draw an arrow (above). That explains why most of the molecule made with ATP and PRPP suddenly disappeared. Histidine is considered to be a late evolving amino acid, and AICAR feeds into purine biosynthesis instead of purine biosynthesis feeding into the connected metabolic pathway. Maybe it’s as simple as early purine biosynthesis being old and late purine biosynthesis being younger? Maybe one can just read age into the pathways somewhere.

Stage 3.1: addition of aspartate.

This is a protein domain family that I’ve already covered in general, but it has a few relations worth mentioning.  SAICAR synthase, PurC, is related to the ATPgrasp domain, and a bunch of serine/threonine/tyrosine protein kinases. And phosphate is used in attaching the nitrogen of aspartate to the bicarbonate moved in the previous step by PurE1 (the phosphate attaches to CAIR by the bicarbonate first, and is replaced by aspartate).

While the ATPgrasp and protein kinases are topology siblings, PurC has some family siblings and they are all inositol kinases.

Inositol. Every hydroxyl (-OH) can be phosphorylated.

Phosphatidylinositol 5-phosphate 4-kinase type-2 alpha, Inositol-trisphosphate 3-kinase A, and Inositol-pentakisphosphate 2-kinase. This one may not be that origins related as inositol is not used by many prokaryotes. It’s often a membrane lipid modification, and the various forms of phosphorylated inositol can act as signalling molecules on and off the lipids. Since inositol can have 6 phosphates it has been used as phosphate storage in plants. I tend to think inositol isn’t something I should pursue with respect to origins but that could change.
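To get a feel for how many phosphorylated forms of inositol are possible, here is a minimal counting sketch. It assumes the 6 ring positions are numbered 1 to 6 and treated as distinct, and it ignores any symmetry of the ring, so the numbers are an upper bound on patterns rather than a list of molecules that cells actually make. The names `POSITIONS` and `phosphorylation_patterns` are made up for this illustration.

```python
from itertools import combinations

# The six ring carbons of inositol, each carrying one hydroxyl (-OH).
# Assumption: positions are treated as distinct; ring symmetry is ignored.
POSITIONS = (1, 2, 3, 4, 5, 6)


def phosphorylation_patterns(n_phosphates):
    """Return every set of ring positions that could carry n phosphates."""
    return list(combinations(POSITIONS, n_phosphates))


# Count the patterns for 0 through 6 phosphates.
counts = {n: len(phosphorylation_patterns(n)) for n in range(7)}
total = sum(counts.values())

for n, c in counts.items():
    print(f"{n} phosphate(s): {c} pattern(s)")
print("total patterns:", total)
```

The kinases named above each add one specific phosphate (at position 4, 3, or 2), which is one way to picture how cells move between a few of these patterns.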

Otherwise what is interesting is that with the arrival of aspartate we have all of the basic ingredients for pyrimidines, bicarbonate and aspartate (and ammonia and phosphate).

The reactions from PurK to PurC are a place to root thinking about the appearance of a second family of nucleotides, and the identity of any early purines with a single ring with multiple extensions as the pathway progresses. Did NCAIR and CAIR ever extend as nucleotides themselves? AIR? Was a SAICAR ever part of a chain of nucleotides?

And all that has been needed chemically for purines was a phosphate. And phosphate control evolved early. Earlier than purine biosynthesis directly. The following paper looked at kinds of metabolism and parts of purine metabolism relating to phosphate control are among the oldest parts.

Structural Phylogenomics Reveals Gradual Evolutionary Replacement of Abiotic Chemistries by Protein Enzymes in Purine Metabolism. Caetano-Anollés 2013

The following refers to Figure 1 in the results, the image is too large for WordPress to let me upload.

The most ancient enzymes (colored in deep red) are located in a horizontal transect of the subnetwork that is responsible for nucleotide interconversion, supporting previous evidence that metabolism originated in enzymes harboring the P-loop hydrolase fold and responsible for these kinase functions

Stage 3.2: aspartate leaves as fumarate, leaves its nitrogen.

No phosphate is needed for PurB to do its work. Stage 3 is a 2-step version of what the glutamine ammonia dispensers did previously. This way of delivering nitrogen occurs in arginine biosynthesis too. I can’t help but think of an aspartate accumulation step in history, between that and all of the other things on the big drawing that use or start with aspartate.

And what is fumarate? Fumarate is an intermediate in the TriCarboxylic Acid (TCA) cycle, or “citric acid cycle” or “Krebs cycle”.

The Tricarboxylic acid cycle. I have added some cofactors and molecules that are gained or lost in the reactions.
The pyruvate and 2-oxoglutarate dehydrogenase complexes are multi protein macromolecular complexes where carbon skeletons are passed from thiamine to lipoamide to coenzyme-A. I think part of our history involves being macromolecular complexes in mineral systems.

This cycle is where the precursor to glutamate comes from (2-oxoglutarate), and the precursor to aspartate is involved (oxaloacetate). Add glycine at succinyl-CoA and you get the pathway leading to heme and chlorophyll (not shown). If I had to stereotype the TCA cycle it would be about converting a 4C (oxaloacetate) and a 2C (acetyl-CoA) into a 5C (2-oxoglutarate). It also produces protons via NAD(P) (can be used to regenerate ATP), and GTP/ATP from phosphate and GDP/ADP. So the removal of fumarate may tie purines to the TCA cycle. This cycle has a reverse version, and like glycolysis and gluconeogenesis there are separate enzymes that do the irreversible steps (arrows that just go in one direction).
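The carbon bookkeeping behind that stereotype can be written out as simple arithmetic. This is only a sketch: it counts carbons, ignores cofactors, and adds two things not spelled out in the paragraph above, namely citrate as the 6C product of joining the 4C and 2C pieces, and the release of one CO2 at each of the next two carbon-losing steps. The variable names are mine.

```python
# Carbon counts only; cofactors and hydrogens are ignored in this sketch.
oxaloacetate = 4   # the 4C starting piece
acetyl = 2         # the 2C acetyl group carried by coenzyme-A
co2 = 1            # one carbon leaves each time a CO2 is released

# Assumption stated in the lead-in: the two pieces join to give 6C citrate.
citrate = oxaloacetate + acetyl

# Losing one CO2 gives the 5C precursor of glutamate.
two_oxoglutarate = citrate - co2

# Losing a second CO2 gives the 4C succinyl group (on succinyl-CoA).
succinyl = two_oxoglutarate - co2

print("citrate:", citrate, "C")
print("2-oxoglutarate:", two_oxoglutarate, "C")
print("succinyl:", succinyl, "C")
```

The rest of the cycle (succinate, fumarate, malate, oxaloacetate) stays at 4C, which is why the cycle can close on itself.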

Fumarate also works as an electron acceptor in electron transport chains.

As for the protein domains making up PurB, the situation is reminiscent of PurM. PurB is divided into 3 subdomains and most of the same proteins are in all 3. But the 3 subdomains may be more closely related to each other in PurB: they are all in the same architecture bin, whereas the N and C ends of PurM are in different architecture bins.

(As I type this ECOD seems to have a problem where it somehow has Alpha Arrays architecture in the Beta Barrels bin. When I first did this all 3 domains were in Alpha Arrays, now the C-terminal region is in Beta Barrels. There are no barrels made from Beta Sheets in this domain so I will keep it as Alpha Arrays (arrays of Alpha Helices). There are a few mistakes on the big drawing due to this issue which I will correct. Specifically some Beta Barrels are actually Alpha Arrays. Otherwise the relatives of this domain all seem there.)

PurB is in a group called “L-Aspartase-like” and there are “L-Aspartase-like-N”, “L-Aspartase-like-M” (middle), and “L-Aspartase-like-C” X bins. In these bins are aspartase (useful under nitrogen starvation), histidine ammonia lyase (histidine breakdown), phenylalanine ammonia lyase (plants), tyrosine and phenylalanine amino mutases (make amino acids of opposite chirality), adenylosuccinate lyase (PurB), delta crystallin (eye proteins), and fumarate hydratase (TCA cycle protein). The exceptions seem to be 3-carboxy-cis,cis-muconate cycloisomerase (benzoate degradation), which is absent from the N-terminal bin, and the histidine, phenylalanine, and tyrosine proteins, which seem absent from the C-terminal bin, but I need to look at things with another archive to be sure.

Stage 4.1: formylation of the aspartate derived nitrogen.

This stage and the following step are a place where life has more than one path. In some organisms the bifunctional PurHJ (sometimes just called PurH) carries out the steps sequentially. In other organisms PurP and PurO carry out the steps. The following paper covers how widely these proteins are distributed among prokaryotes, and the authors named the PurH associated formylation protein PurV and the following protein PurJ when the functions are in separate proteins.

Different Ways of Doing the Same: Variations in the Two Last Steps of the Purine Biosynthetic Pathway in Prokaryotes Cruz 2019

PurJ is in the A+B3LS architecture bin, and the MGS-like X, homology, and topology bins (methylglyoxal synthase) so these are all related. PurJ is listed in ECOD as bifunctional protein PurH. CarB of pyrimidine biosynthesis is in here too. Lastly there is methylglyoxal synthase which provides an alternate way to metabolize 3C-phosphates when phosphate is limiting. Notably PurV activity does not require phosphate but does use folate like PurN.

PurP is familiar because it is an ATPgrasp enzyme. It also has the preATPgrasp domain, but not the C-terminal CODH-MO-N-like domain that I find so interesting. So similarly to Stage 1 life has ways of formylating with and without phosphate. But in this case PurP seems to be limited to archaea. The above paper found no bacteria using PurP. Most bacteria use PurH (the whole thing) and some have separate PurJ and PurV/PurO. Some bacteria and archaea have no identified proteins for Stage 4, autocatalytic?

Stage 4.2: closure of the second ring and the formation of IMP.

PurV is in the A+B3L architecture bin (not a sandwich), and cytidine deaminase-like X, homology and topology bins. There are a lot of proteins in here and modification of poly RNA or DNA seems to be a theme, editing enzymes. There is a whole level where RNA can be edited after transcription to change protein sequence or make other function changes. I’m amused by the tRNA adenosine deaminase TadA (TA-DA!). There are other adenine deaminases. There are cytosine deaminases, and guanosine deaminases. There is a pre-mRNA splicing factor, Prp8. RadC which is thought to be a nuclease (cuts polynucleotides).

Isopeptidase activity is associated with this group. An isopeptidase is a protein that cuts a protein outside of the main protein sequence, the relevant example here being ubiquitin-like proteins that are attached to a lysine residue (or attached to another side chain in other isopeptidases). The system where ubiquitin tags proteins destined to be broken down in the proteasome (the cellular trash can) is eukaryotic, but it is related to systems in bacteria and archaea. Prokaryotic relatives of the ubiquitin system are something I will need to read more about.

LpxL is in here and that is a UDP-2,3-diacylglucosamine pyrophosphatase (removes diphosphate at some point). That is interesting due to sugars and lipid related things bound to nucleotide handles in the previous post, but is a minor bit of chemistry in this group.

FdhD is interesting. That is a sulfur carrier required for the function of formate dehydrogenase. This isn’t the first sulfur carrier/ubiquitin-like system connection I have seen. ThiS and MoaD are ubiquitin-like. Maybe there is a sulfur carrier link to isopeptidase-like (deubiquitinase-like) proteins as well?

PurO

PurO is in a group I have already covered, the A+B4L architecture bin, the NTN/PP2C X bin, and NTN (N-Terminal amiNohydrolase) homology bin. It’s a bit like coming full circle at Inosine MonoPhosphate (IMP) since PurF was at the beginning and contains one of these domains. PurO is in the proteasome subunits family bin. This is interesting because while PurO joins parts to make a ring, proteasome subunits cut proteins.

Now, with the second ring closure, we have something that gets a one-letter abbreviation like A, G, C, and T: I, for inosine (IMP is the nucleotide). I is never put in directly during transcription, but some of those editing enzymes can turn something into I. That affects codon base pairing in translation, leading to different effects. tRNAs can contain I.

Nucleotides as intracellular signalling molecules.

Inosine is also an example of a whole level I haven’t read about and integrated into my thinking yet, it is an intracellular signalling molecule like cyclic AMP or GMP derivatives. AICAR is a signalling molecule too. Guanosine tetraphosphate is a signalling molecule.

And these are all signalling molecules (“alarmone” is used with many of these) in addition to AICAR (called ZMP and ZTP as signalling molecules) and IMP. ZTP and IMP act in regulation of steps of purine biosynthesis. I found a paper looking for these systems in some archaea with a big table.

Putative Nucleotide-Based Second Messengers in the Archaeal Model Organisms Haloferax volcanii and Sulfolobus acidocaldarius. Braun 2021

Many are thought to be ancient.

Alarmones as Vestiges of a Bygone RNA World. Hernández-Morales 2019

Cyclic AMP (cAMP) shifts metabolism in response to things. In bacteria it is high when things other than glucose are carbon sources. Glucose is like an ancestral default sugar maybe. cAMP has many other functions like osmotic pressure and DNA damage. The rest of them are hard to find stereotyped roles for. It’s the cutting edge of this research.

Cyclic GMP (cGMP) is involved in UV stress adaptation and cyst (suspended animation) formation. There is a cIMP, cCMP and cUMP too but very little is known about their role.

c-di-AMP is also involved in osmotic pressure. c-di-GMP is involved in transition from motile to sessile living. cGAMP is only found in metazoa, so no origins role. But I drew a 2,3 cGAMP and there is a 3,3 cGAMP that is involved in viral infection.

Ap(n)A has different forms and Ap4A is best known. Ap4A is an alarmone for many stress conditions. I couldn’t find anything on Gp(n)G. (p)ppGpp is also involved in stress signalling.

That’s it for stages 3, 4 and an area of nucleotide biochemistry I still need to look at.

General Biochemical Patterns in Purine Biosynthesis and Protein Relatives. Part 3

Stage 1, formation of the first purine ring (a 5-membered C and N ring with 2 nitrogens is an imidazole) is not only complete, I have covered a bunch of protein families I can refer back to.

I did not expand on the relatives of one family that I will expand on in this post though. The preATPgrasp domain relatives are a group I wasn’t sure how I wanted to discuss yet.

Metabolism broadly.

This is also a good point to mention that purine biosynthesis merges with 2 other metabolic pathways here. AIR can be made into the 6-membered ring (a pyrimidine) of thiamine, HMP. What I did not put on the big drawing this time was that AIR can also be made into the “lower ligand” of vitamin B12. A ligand is an atom or molecule that binds to a molecule or protein to help carry out its function (the cyanide groups in the last post were ligands). B12 itself is a chelator of cobalt. But there are other lower ligands (potentially less origins related) and I was running out of space.

Stage 2: binding and movement of bicarbonate.

Life has 2 ways of doing the first part of stage 2 but I used the more complicated route because the direct binding of carbon dioxide to where bicarbonate moves to (by class 2 PurEs) is only in animals.

Stage 2.1: bicarbonate binding.

PurK is the protein that binds a bicarbonate to the nitrogen sticking out from the ring. And we have already seen these protein subdomains. It’s preATPgrasp, ATPgrasp, and Carbon monoxide dehydrogenase molybdoprotein N-like again. So for most of what is interesting I will refer to a previous post.

As an observation I would point out that all that is really needed to make these reactions work so far is phosphate. Every reaction but one so far uses a phosphate in active site chemistry, and the exception is the PurN/PurT step where life can add formate without phosphate, and yet PurT uses phosphate anyway in other organisms (E. coli has both).
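That observation can be tallied as a small table. This is a rough sketch built only from how the steps are described across these posts; the `Step` class, the field names, and the short descriptions are mine, and "uses phosphate" here just means phosphate (or diphosphate) takes part in the step as described, not a claim about detailed mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    enzyme: str
    what_it_does: str
    uses_phosphate: bool


# Steps covered so far, in pathway order (PurN taken as the formate step).
STEPS_SO_FAR = [
    Step("PurF", "ammonia replaces the diphosphate on PRPP", True),
    Step("PurD", "glycine is added", True),
    Step("PurN", "formate is added (folate route)", False),
    Step("PurL", "glycine carbonyl is swapped for an imine", True),
    Step("PurM", "first ring is closed", True),
    Step("PurK", "bicarbonate is bound", True),
]

# The alternative enzyme for the one exception does use phosphate.
ALTERNATIVES = {"PurN": Step("PurT", "formate is added (ATP route)", True)}

exceptions = [s.enzyme for s in STEPS_SO_FAR if not s.uses_phosphate]
print("steps without phosphate:", exceptions)

for name in exceptions:
    alt = ALTERNATIVES[name]
    print(f"{name} has an alternative, {alt.enzyme}; "
          f"uses phosphate: {alt.uses_phosphate}")
```

Keeping the alternative route in a separate dictionary makes it easy to see that the single exception disappears if PurT is substituted for PurN.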

What I will discuss this time are the relatives of the preATPgrasp domain. This fragment and its family members are in their own topology bin and there are 13 other topology siblings within a larger Rossmann-related homology group, inside of a Rossmann-like bin, inside of the A+B 3 layer sandwiches.

There are a lot of Rossmann-related domains. If I had to stereotype the group based on the associations I would label them “nucleotide binders with a bias for Adenine containing cofactors”. But that is not based on structure, just looking in the bins, and they don’t exclusively bind nucleotides.

The first 4 topology groups to mention are: a NAD(P) (niacin) binding Rossmann-fold domain, a FAD (riboflavin)/NAD(P) binding domain, a DHS-like NAD/FAD binding domain, and a “Nucleotide binding domain” bin that seems to do all of the above. These are all electron/proton dispensers/acceptors, and the first 2 contain many families, too many to go over in detail. Glycosyltransferase Mag is flagella related so I left it out.

But there is a whole area of biochemistry related to “nucleotides as handles” that I saw inside of the above bins that I haven’t covered on the big drawing. You see Adenosine (adenine-ribose-phosphate) used as a handle for cofactors all over the big drawing, and some of the other bases are used as handles in the molybdenum cofactor part of the drawing. But all of the bases have jobs as handles and there is a range of things related to membranes and sugars that adenosine, guanosine, uridine, cytidine, and thymidine are involved in as handles.

The 5th topology bin is S-Adenosyl-Methionine-dependent (SAM) methyltransferases. This also contains many families and the adenine-ribose-phosphate would be a common denominator for the first 5 bins so far. SAM is not just for methyl transfer though, the adenine-ribose can be an “adenylyl” modification (part of the molybdenum cofactor biosynthesis, sulfur carrier step). Most of methionine can be bound to something as homoserine leaving methyl-sulfur-ribose-adenine as well. As far as I can see this domain is all about SAM methyl transfer.

Tubulin nucleotide binding domain-like is very interesting. The cytoskeleton uses GTP hydrolysis on a timer to determine if subunits like tubulin are monomers or polymers. And microtubules are an intracellular highway for cell contents.

The NagB/RpiA/CoA transferase-like domain has a lot of interesting things. Some transcription repressors, 5,10-methenyltetrahydrofolate synthetase. Related to fatty acid synthase there are coenzyme-A transferases and acetyl-CoA hydrolase (makes acetate). Citrate lyase is in the TCA cycle. IF-2B is a translation initiation factor. Finally there is glucosamine-6P isomerase and very interestingly ribose-5P isomerase. That interconverts ribose and ribulose in the pentose phosphate pathway.

The MurCD/PglD bin contains things that work with UDP-sugars and derivatives. MurC and MurD attach an alanine and a glutamate to UDP-N-acetylmuramate in making bacterial cell walls.

The activating enzymes of ubiquitin-like proteins, or “E1 enzymes” are mostly studied in eukaryotes where ubiquitin targets proteins for degradation in the proteasome. But the general machinery exists in prokaryotes too and the activation process involves an adenylyl modification of the ubiquitin-like protein (UBL), where the adenylyl is replaced by the E1 (E1-UBL link) prior to attachment of the UBL to another protein, “activation” of the UBL. ThiF (thiamine biosynthesis) is in here, and it activates ThiS (on the big drawing) in a similar manner to exchange the adenylate with sulfur. That is something I want to add to the big drawing like I did with molybdenum cofactor biosynthesis, and MoeB (molybdenum cofactor biosynthesis) is in this bin too. No NAD(P)/FAD/SAM in here.

Ubiquitin-like Protein Conjugation: Structures, Chemistry, and Mechanism Cappadocia 2017

The formate/glycerate dehydrogenase catalytic domain-like topology bin is interesting because every catalytic region could be relevant, and especially for smaller, more origins relevant molecules. There is S-adenosyl-L-homocysteine hydrolase (methionine biosynthesis), and alanine dehydrogenase (alanine to pyruvate). Lysine oxidoreductase and dipicolinate synthase are both in lysine metabolism. Glycerate, formate and lactate dehydrogenases. Also something called nicotinamide nucleotide transhydrogenase. All of these require NAD(P) so this may still just be cofactor binding regions.

The UDPG/MGDP dehydrogenase C-term topology bin doesn’t have much but it’s all nucleotide-sugar proteins. UDP-glucose is central to carbohydrates and polymers of them for storage and membrane parts. And at least one requires NAD.

Finally there are the aspartate/ornithine carbamoyl transferases. PyrB is in the pyrimidine pathway.

Stage 2.2: moving bicarbonate over 2 atoms.

PurE1 doesn’t use phosphate. Mutase reactions change the structure of a molecule without adding or removing things, so it’s a low energy process and doesn’t need ATP. The proteins and their various amino acid side chains can hold the molecule just right, and alter charge distributions such that the bicarbonate moves over. And this is a protein domain family that I’ve already covered, the A+B3LS/Flavodoxin-like/GATase-like domain is the entire protein.

And bicarbonate is finally something in common with the pyrimidine pathway. Bicarbonate would have been in primordial oceans and would have seeped into the rock with water as well as contact with the outside of hydrothermal vent minerals.

General Biochemical Patterns in Purine Biosynthesis and Protein Relatives. Part 2.

In this post I will be looking at the rest of the proteins in stage 1 of purine biosynthesis, the related PurL and PurM. To this point an ammonia dispenser in PurF provided an ammonia that replaced the diphosphate on ribose in PRPP, an ATPgrasp ATP phosphate dispenser used a phosphate to add a glycine, and then a formate was added to the nitrogen on glycine with PurT/PurU, or PurN. This leaves us with FGAR, formylglycinamideribonucleotide.

Now I add steps 4 and 5. Swapping the glycine carbonyl with an imine, and closing the first purine ring.

Step 4: PurL and swapping the glycine carbonyl with an imine via phosphate.

PurL is a large protein, the largest on the drawing. To me large and usually complex means old. This protein uses a phosphate from ATP to swap the glycine carbonyl oxygen (C=O) for an imine (C=N). The ammonia again comes from glutamine. Even though it is probably old parts of it can be deleted for fun if I assume ammonia is in the vent.

PurL has 5 individual domains, 2 present in duplicate. I consider some parts more important than others. In the vent the entire glutamine ammonia dispenser might be unnecessary.

  • An Alpha Arrays/RuvA-C-like domain (needs corrected from Beta Barrels on the drawing), the PurL linker domain.
  • An A+B2L/Alpha-Beta Plaits domain, the PurS-like domain.
  • An A+B3L/Bacillus chorismate mutase-like/PurM-N-like domain (present in duplicate).
  • An A+B2L/Alpha-Beta Plaits/PurM-C-like domain (present in duplicate).
  • An A/B3LS/Flavodoxin-like/Glutamine amidotransferase class 1 domain (GATase1).

RuvA-C-like domain

Starting with the Alpha Arrays/RuvA-C-like domain. This is a linker domain that acts in long range communication between parts of the protein. It’s all by itself in its homology group so that’s that.

PurS-like domain

This second domain in PurL is an Alpha + Beta 2 Layers domain called “PurS-like” because it is sometimes a separate protein, but always required for PurL function, so in that case they call it “PurS”. Its function isn’t well known but it isn’t the part that makes ammonia or holds FGAR, maybe it just holds the other 2 domains together. This is also in its own homology group all alone.

PurM-N-like domain

This domain is in the “bacillus chorismate mutase-like” X group, which may be related to the N-terminal part of PurM/parts of PurL. And maybe not. There is an interesting set of things in here and I consider this part and the following PurM-C-like domain to be the most important.

First is HypE which is involved in maturation of hydrogen utilization proteins with complex iron-nickel centers, hydrogenases. They can turn H2 into 2 protons and 2 electrons and back again. When it comes to the concept of energy, cells need to have and control a pool of protons and electrons, using things like niacin and riboflavin.

HypE binds a carboxamido (N-C=O) on its C-terminal cysteine, which came from carbamoyl phosphate (as in pyrimidine biosynthesis) and was delivered by HypF. ATP is then used to dehydrate it to a cyanide (nitrogen triple bonded to carbon). This cyanide and a second are then bound to an iron atom (on HypC-HypD) used to make the finished iron center that is combined with a nickel atom.

Structural Insight into [NiFe] Hydrogenase Maturation by Transient Complexes between Hyp Proteins Miki 2020

Selenide, water dikinase, also known as selenophosphate synthetase, SelD. Synthases and synthetases both ligate (connect) things together but synthetases need a source of energy like ATP.

This combines selenium and phosphate to make selenophosphate, which is the form used by cells. Selenophosphate is then mostly used to make the 21st amino acid, selenocysteine (Sec, U), in a system that replaces serine with Sec on the tRNA, or to make an RNA modification. The serine translation system is used and individual Sec replacements are made during translation. This is in bacteria, archaea and eukarya.

The selenophosphate synthetase family: A review Manta 2022

Finally thiamine monophosphate kinase (ThiL, on the big drawing) makes mature thiamine diphosphate.

I type mature because as far as I have found thiamine and thiamine monophosphate do nothing. It is thiamine diphosphate that is mostly used by cells, with some thiamine triphosphate and adenosine thiamine diphosphate. Those last 2 are somehow associated with glucose availability and amino acid starvation, and the opposite, amino acids and no glucose, respectively.

Update on Thiamine Triphosphorylated Derivatives and Metabolizing Enzymatic Complexes Bettendorff 2021

PurM-C-like domain.

And just like that we’re done with this domain because again we have PurL, PurM, HypE, selenophosphate synthetase, and ThiL. The only difference is this part of the protein is an Alpha+Beta 2 Layers and then in the Alpha-Beta Plaits X group, within which there is no guarantee of relationships outside of the general alpha helix and beta sheet organization.

Class 1 glutamine amidotransferase domain.

This is the glutamine ammonia dispenser for this protein. GATase class 2 we saw with PurF, a GATase2. The larger family this is a part of, the X:Flavodoxin-like and H: GATase1-like A+B3LS domains, appears many times on the big drawing. There is a lot to point out here. 20 related topology bins are siblings with the GATase1-like bin. And that topology bin has a lot besides the GATase family.

First the GATase1-like topology group itself has a lot of things already on the drawing. The current PurL, GuaA, CarA, PyrG, AroE, TrpG, PabA, FolD, HisH, MetA and Pdx2. There are more but they are in different topology bins. This topology group also has protease 1, peptidase S51, peptidase C26, peptidase S66, catalase 1 (breaks down peroxide), isocyanide dehydratase, and beta-galactosidase.

The same topology bin as AroE and FolD (amino acid dehydrogenase-like-N) also contains Glu/Phe/Trp/Leu/Val (amino acids) dehydrogenases, malate oxidoreductase (makes pyruvate from a TCA cycle intermediate), and at least 2 pterin (folate relative) proteins, methylene tetrahydromethanopterin dehydrogenase and tetrahydrofolate dehydrogenase/cyclohydrolase.

Pterin related, in its own topology bin is F420 dependent methylene tetrahydromethanopterin dehydrogenase.

In separate topology bins and also on the big drawing we have PurE, RibH, AroB (in a bin with glycerol dehydrogenase), and AroQ.

Things that grab things and move them is a theme for the Periplasmic Binding Proteins-like 1 and Chelatase-like bins. In bacteria with double membranes the space between them is called the periplasm. Periplasmic binding proteins bind things like amino acids (leucine here) and sugars (glucose, galactose, arabinose here). Chelatases bind and insert tetrapyrroles like heme and chlorophyll, which in turn often bind, “chelate”, metal ions. This bin also contains iron-molybdenum binding proteins, zinc transporters, cobalt chelators, vitamin B12 transporters, and iron chelators.

Relatedly in its own topology bin is uroporphyrinogen 3 synthase, or HemD. This protein turns a linear chain of 5-membered rings into a circular ring heme/chlorophyll precursor, uroporphyrinogen 3. These often feature a chelated metal ion in mature form.

A last grabber/holder is the ATC-like topology bin, where the “ATC domain” contains a bunch of cysteines in the sequence which likely hold iron-sulfur centers, and is related to aspartate and glutamate racemases.

In a topology bin by themselves is phosphofructokinase and NAD (niacin) kinase. A glycolysis protein and the protein that makes NADP from NAD, and thus switches niacin to the anabolic form from the catabolic form.

The CheY-like bin is named after a chemotaxis protein, and contains pili related proteins, HydB (a NiFe hydrogenase), sensor kinases (RcsC), a vitamin B12 binding domain which is in MetH (big drawing), and ornithine decarboxylase.

In a topology bin by themselves there is fucose and arabinose isomerase. Another separate topology bin contains YfiR which makes cyclic di-GMP. Yet another topology bin contains IlvD (isoleucine, valine, and coenzyme-A biosynthesis) and 6-phosphogluconate dehydratase.

And finally the interesting FabD/lysophospholipase-like topology bin contains acyl carrier (FabD) and transfer proteins. It also has parts of fatty acid synthase. Binding and carrying coenzyme-A is a big theme. Lysophospholipases separate fatty acids from the glycerol backbone.

Step 5: PurM

PurM closes the ring via a phosphate from ATP, and the same basic reaction mechanism as PurL.

Thanks to PurL, I’m done with PurM since it’s the same 2 fragments shared with HypE, selenophosphate synthetase, and ThiL.

That’s it for stage 1 of purine biosynthesis, forming the first ring.

General Biochemical Patterns in Purine Biosynthesis and Protein Relatives. Part 1.

In this post I want to start expanding on the kind of analysis I did in a previous post with the protein that adds glycine in purine biosynthesis, PurD. I looked at what proteins are related to each of the subdomains and noted the kinds of chemistry and molecules involved, and did some fun speculation. A family of phosphate dispensers and very small molecules with HO-C=O as part of their structure.

And these are still puzzle pieces so it’ll be like a list in places where there might be significance, but I don’t know much about them. First some words about the kind of data on the site I primarily use to look at the relatives of parts of proteins.

Evolutionary Classification Of protein Domains and bin importance: AXHTF.

ECOD and similar archives of data relating to the relatedness of nucleotides or proteins have hierarchies of bins to group by relatedness. The largest bin at ECOD is the Architecture or “A” bin which groups by the most general arrangements of protein 3D structure, the Alpha Helix  or the Beta Sheet, and how those relate.

Top Left: water is tetrahedral, polarized (protons+, electrons-), and constantly jostling around. Top Center-Left: a tripeptide has free rotations around its bonds and isn’t just abstract and straight, things rotate around in the water. Top-Right: a dipeptide showing more parts of the molecule than I typically use to indicate where positively and negatively polarized regions are (“charged” is for ions). Center: helices (like the Alpha Helix) are one common shape charges allow polypeptides to take. Bottom: sheets (like the Beta Sheet) are another common shape charges allow polypeptides to take. Most structures at this level are kinds of sheets and helices.

So you get things like: Alpha/Beta 3 layer sandwiches defined as “repeating beta-alpha units form a sandwich with a mainly parallel beta-sheet layer stacked between two alpha-helix layers” or Alpha+Beta 2 layers. These outermost levels can be all related but they can just as easily be independently arrived at shapes. The same is true of the “X” group which is “possible homologs” where homologs are related by sequence. The same problem exists in A and X but things might be related in X.

Below the X group we get the “H” or homology group where things are explicitly related based on sequence. Within H there is “T” or topology which means they have the same sequence but there are differences in structure. Finally “F” or family are protein subdomains most closely related to one another.

HTF are where I look the most.
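The bin hierarchy can be sketched as a small lookup: walk the levels from broadest to narrowest and stop at the first one where two domains land in different bins. This is only an illustration. The bin labels below are placeholders paraphrased from this post, not ECOD's actual identifiers, and `deepest_shared_level` is a made-up helper name.

```python
# ECOD levels from broadest to narrowest, as described above.
LEVELS = ["A", "X", "H", "T", "F"]

def deepest_shared_level(a, b):
    """Return the narrowest level at which two domains share a bin, or None."""
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared

# Placeholder labels (assumptions), loosely following the PurF discussion.
purf_n_domain = {"A": "a+b four layers", "X": "Ntn", "H": "Ntn hydrolase",
                 "T": "GATase2", "F": "GATase2"}
proteasome_subunit = {"A": "a+b four layers", "X": "Ntn", "H": "Ntn hydrolase",
                      "T": "Proteasome subunits", "F": "Proteasome"}

print(deepest_shared_level(purf_n_domain, proteasome_subunit))  # H
```

Two domains that only share A or X may have arrived at the same shape independently; sharing H or deeper is where relatedness is actually being claimed.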

Purine biosynthesis, Adenosine and Guanosine.

I see 5 stages and 13 steps in purine biosynthesis.

  1. Formation of the first ring (5 steps).
  2. Carbon addition and movement (2 steps).
  3. Aspartate addition and fumarate removal (2 steps).
  4. Formation of the second ring (2 steps).
  5. Adenosine or Guanosine monophosphate formation (2 steps, separate paths).
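As a quick check on the bookkeeping, the stage list above can be tallied directly. This follows the post's own counting, where the last stage contributes 2 steps even though the A and G paths are separate.

```python
# Stage names and step counts copied from the list above.
stages = [
    ("Formation of the first ring", 5),
    ("Carbon addition and movement", 2),
    ("Aspartate addition and fumarate removal", 2),
    ("Formation of the second ring", 2),
    ("Adenosine or Guanosine monophosphate formation", 2),
]

total_steps = sum(steps for _, steps in stages)
print(len(stages), total_steps)  # 5 13
```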

I will be limiting myself in this post to most of Stage 1: Ribose phosphate, glycine, formate and ammonia join up. This post got longer than I thought it would for all of stage 1 but building up ring closing as a specific thing could be useful.

The first 3 steps of purine biosynthesis.

Step 1: PurF

This protein exchanges an ammonia for a diphosphate from Phospho-ribose-diphosphate (PRPP), making phospho-ribosyl-amine. This pattern of using phosphate to swap for other things occurs over and over. PRPP was made from ribose-5-phosphate with ATP by the gene Prs, which is related to the second domain in this protein. Cells have a PRPP pool because as the drawing shows, PRPP has multiple uses.

Phosphoribosyl Diphosphate (PRPP): Biosynthesis, Enzymology, Utilization, and Metabolic Significance. Hove-Jensen 2016

Also known as amidophosphoribosyltransferase (you see why I prefer short protein names? I usually like descriptive names), PurF has 2 subdomains:

  • An Alpha+Beta 4 layer (A+B4L)/NTN hydrolase domain at the end with the nitrogen (N-terminal, defined as the beginning of the sequence)
  • An Alpha/Beta 3 Layer Sandwich (A/B3LS)/PRTase-like domain at the carboxy or C-terminal end.

The “amido” refers to the fact that the ammonia comes from the amino acid glutamine, which becomes the amino acid glutamate after removing the nitrogen in the first step of PurF activity carried out by the NTN-hydrolase domain, making phospho-ribosyl-amine (PRA).

The A+B4L subdomain is in the N-terminal nucleophile aminohydrolase (NTN hydrolase) homology group, and class 2 glutamine amidotransferase topology and family groups (GATase2). In terms of primordial chemistry, the chemistry of an end of the protein chain is a significant thing. This N-terminal nitrogen protein superfamily has an affinity for nuclei (nucleophile) and breaking bonds with water (hydrolase). Without the need for ATP this protein domain removes the ammonia and that reactive molecule floats over to PRPP which is reactive towards ammonia.

The AB3LS domain at the C-terminal end is in a transferase subgroup of these proteins, PRTases, where Prs is a synthase. This subdomain essentially just holds the ribose-5P end. In fact at this point PRTases are just ribose phosphate holders to me. This particular family has one other synthase for OMP, and a bunch for purine and pyrimidine bases in general. The interesting part for PRTases is that the ones for niacin, histidine, and tryptophan are in different protein subdomain families on ECOD.

The rest of the entry for PurF will be on the A+B4L subdomain and what’s interesting there. In the sibling homology group to the NTN aminohydrolases there are protein serine/threonine phosphatase 2C members, specifically the part of the protein that does the removal, the catalytic domain. Phosphatases are proteins that remove phosphate from things, here serine or threonine. With all of the phosphate around a related family of phosphate removers is interesting. Looking at them more closely is a plan.

Inside the NTN hydrolase homology group there are 6 related topology groups, one of which is the GATase 2 group. Outside of GATases there are some interesting things. “Proteasome subunits” contains parts of the cellular trash can, the proteasome, and individual peptidases which are things that cut peptide bonds in proteins. This group also includes PurO which is in stage 4 of purine biosynthesis. There are some antibiotic biosynthesis enzymes for penicillin and carbapenam, and things that break C-N bonds other than peptide bonds. I’ll probably ignore antibiotics going forward, there are lots of them and they are unlikely to be usefully origins related.

Glycosyl-asparaginase removes aspartate from glucose-asparagine leaving glucose-amine (glycoprotein degradation). Gamma-glutamyl-transpeptidases move glutamate to things and from things.

Inside of the GATase2 group is asparagine synthase, which makes that amino acid from aspartate (with glutamine as the ammonia source), and the protein that makes glutamate from 2-oxoglutarate. There is also a protein that makes glucosamine-6P from glutamine and fructose-6P.

That’s it for PurF for now.

Step 2: PurD

I first introduced this protein with the big drawing.

This protein has 2 steps in its reaction. First the phosphate dispenser part releases a phosphate from ATP which binds the carboxy on glycine making glycine-P. This is reacted with PRA and the phosphate is swapped for the nitrogen on the ribose like in PurF, making glycinamide-ribonucleotide (GAR). As I mentioned previously this is the closest molecule to both nucleotides and polypeptides as it has peptide bond features.

This protein contains 3 subdomains in order:

  • An A/B3LS/Rossmann-related preATPgrasp domain (thought to be substrate specificity)
  • An Alpha/Beta Complex Topology (A/BC.T.) ATPgrasp domain (the phosphate dispenser domain)
  • An A/BC.T./Alpha Beta hammerhead barrel hybrid sandwich domain related to the carbon monoxide dehydrogenase molybdenum protein (CODH MO-N-LIKE) and biotin carboxylases

There are patterns to the phosphate dispenser with and without the substrate specificity part, and patterns to CODH MO-N with the rest and by itself.

First of all there are 3 more ATPgrasp proteins in purine biosynthesis, and the ATPgrasp domain by itself is related to purine biosynthesis protein PurC, SAICAR synthase. Both of those are sibling homology groups with a bunch of protein kinases, things that put phosphate on serine, threonine, or tyrosine in a polypeptide chain, usually to change the function of a protein somehow.

When all 3 subdomains are together you have PurD, PurT, PurK (Pur=purine biosynthesis), L-amino acid ligase (attaches an L-amino acid to things), and a set of things including biotin, acetyl-CoA, and pyruvate carboxylases.

PurP only has the preATPgrasp and ATPgrasp domains, as does the pyrimidine pathway protein CarB.
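The domain bookkeeping for the purine proteins above can be sketched as ordered lists, which makes "who has which pieces" a simple query. The short domain names are shorthand for the bins described in this section, and the helper names are made up for illustration.

```python
# Domain order per protein, as described in this section (shorthand labels).
ARCHITECTURES = {
    "PurD": ["preATPgrasp", "ATPgrasp", "CODH MO-N-like"],
    "PurT": ["preATPgrasp", "ATPgrasp", "CODH MO-N-like"],
    "PurK": ["preATPgrasp", "ATPgrasp", "CODH MO-N-like"],
    "PurP": ["preATPgrasp", "ATPgrasp"],
}

def proteins_with(*domains):
    """Proteins whose architecture contains every requested domain."""
    wanted = set(domains)
    return sorted(name for name, arch in ARCHITECTURES.items()
                  if wanted <= set(arch))

def proteins_without(domain):
    """Proteins whose architecture lacks the given domain."""
    return sorted(name for name, arch in ARCHITECTURES.items()
                  if domain not in arch)

print(proteins_with("preATPgrasp", "ATPgrasp", "CODH MO-N-like"))
# ['PurD', 'PurK', 'PurT']
print(proteins_without("CODH MO-N-like"))
# ['PurP']
```

Keeping the lists ordered rather than using sets throughout preserves the N-terminal to C-terminal order, which matters when comparing where a shared fragment sits in each protein.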

When the phosphate dispenser is alone, no preATPgrasp, you have a collection of interesting things. RNA and DNA Ligase (protein that attaches things). Ligases for glutamate (GshA, from the ancestral fragment post) or tyrosine and tubulin. Citrate lyase, succinyl-CoA synthase, both related to the TCA cycle. PEP (phosphoenolpyruvate) synthetase, a protein involved in the first step of gluconeogenesis (making glucose instead of breaking it down for energy) in some prokaryotes.

When the CODH MO-N-LIKE bit is by itself you have purine breakdown protein xanthine dehydrogenase chain B, which adds an oxygen. There’s the CODH MO-N itself. There is aldehyde oxidoreductase.

This CODH MO-N-LIKE fragment gets more interesting when you zoom out one level. This is the longest part of this post.

First we have MoaE, molybdopterin synthase. That’s on the drawing! And there’s another on the drawing with the PRTase that attaches ribose-5P to the ring in making niacin. There are a couple of proteins from the large ribosomal subunit too, L10 and L27. The L27 topology bin also has parts of the “exosome complex” which degrades RNA, another trash can.

The “duplicated hybrid motif” bin contains 3 interesting things. The first is part of a system that uses the phosphate from PEP and passes it along several other proteins to a membrane protein (the PTS E2A fragment here) that phosphorylates a cargo, usually a sugar, during import. The second is a family of peptidases including one that cuts between glycines (LytM). The third is the RNA polymerase beta prime subunit.

Finally there is the “single hybrid motif” which has a lot of things. First there is the RNA polymerase beta subunit (as opposed to beta prime). Second, the protein domains that bind the cofactors lipoate and biotin are related to each other and are both in here. Those are newer cofactors so I haven’t done much with them. Third is the glycine cleavage system H protein, which is very significant as this is where cells get glycine and serine through interconversion. Fourth, AstE_AspA are succinylglutamate desuccinylase/aspartoacylase. The first is an arginine degradation protein, and the second is too but seems limited to eukaryotes (or isn’t studied in prokaryotes due to the human disease emphasis).

New paragraph due to theme difference. The fifth thing in the “single hybrid motif” is the V-type ATP synthase alpha chain. The sixth is RnfC, a protein involved in anaerobic electron transport with niacin and pumping protons or sodium ions. Relatedly the seventh is cytochrome F, something that passes electrons in electron transport chains. NqrA is similarly involved with niacin and electron/sodium transport.

I’ll finish up this with the eighth and ninth, NusG and Rrp15p. They are both involved with ribosomes. NusG is a major transcription (copying DNA) termination factor, and Rrp15p is required for cells to make the large ribosomal subunit.

Step 3: PurT and PurU or PurN

In this step a formate is removed from folate and attached to the nitrogen at the end of the glycine that was just attached, to make formyl-glycin-amide-ribonucleotide, FGAR.
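The first three steps can be written down as a chain where each product is the next step's substrate. This is only a sketch of the bookkeeping using the abbreviations from the text (PRPP, PRA, GAR, FGAR); the dictionary layout and the function name are illustrative, not any standard format.

```python
# Steps 1-3 as described above: protein, main substrate, what is added, product.
steps = [
    {"step": 1, "protein": "PurF", "substrate": "PRPP",
     "added": "ammonia (from glutamine)", "product": "PRA"},
    {"step": 2, "protein": "PurD", "substrate": "PRA",
     "added": "glycine", "product": "GAR"},
    {"step": 3, "protein": "PurT and PurU, or PurN", "substrate": "GAR",
     "added": "formate", "product": "FGAR"},
]

def is_linear_chain(steps):
    """True when every product feeds directly into the following step."""
    return all(a["product"] == b["substrate"] for a, b in zip(steps, steps[1:]))

print(is_linear_chain(steps))            # True
print([s["added"] for s in steps])
# ['ammonia (from glutamine)', 'glycine', 'formate']
```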

This is something I will have to fix on the big drawing but life has more than one protein for the third step of purine biosynthesis. I have PurT and PurU, but not PurN. We’ve technically seen the relations of step 3 PurT since it is also an ATPgrasp protein (all 3 domains).

PurU is the protein that removes the formate and passes it to PurT, where the phosphate from ATP is used to make a reactive formyl-P that, similarly to the previous steps, is used to attach a carbon to a nitrogen.

This is where things get easier for PurN because PurU is related to PurN so I can cover them together, though PurU has more pieces, suggesting the most important part is the one they share.

PurU has 2 domains:

  • An Alpha/Beta 3 layer sandwich/formyltransferases domain (A/B3LS).
  • An Alpha+Beta 2 layer (A+B2L)/Alpha-Beta Plaits domain.

The first domain is related to PurN in its entirety. The homology and topology bins are formyltransferases. The most interesting ones are methionyl-tRNA Fmet formyl transferase (adds formate to the initiator methionine in translation) and 10-formyltetrahydrofolate dehydrogenase (removes formate from folate as CO2 while adding a proton to niacin).

The second domain in PurU is related to a lot of things but I’m thinking I can ignore those given PurN.

That’s it for now. It would have easily been over twice as long if I included PurL and PurM. The Alpha+Beta 2 Layers category has an “Alpha-Beta Plaits” with lots of things. It’ll be practice at sorting out general biochemical themes.

LGBTQ+ People Are Not Going Back.

I’m still generally fine with the sentiments in Julia Serano’s post.

1) I will not tolerate any backpedaling on LGBTQ+ rights whatsoever, and

2) If my representatives fail to strongly stand up against these attacks on LGBTQ+ rights, then I will take my vote elsewhere next election.

Words aren’t enough. Actions. Please contact your representatives to tell them the same.

Planned politics is good.

I’m passing this along.

“I propose that on Tuesday, December 3rd, 2024 (the first day that both the House and Senate are back in session), all of us who are invested in this issue and have a platform (whether it be a blog, newsletter, column, podcast, YouTube, TikTok, Instagram, etc.) publish a piece with the shared title: “LGBTQ+ People Are Not Going Back.” Yes, I know, it’s a cheesy title, but it holds Democrats accountable to their own talking points and makes it clear that backsliding on LGBTQ+ rights is nonnegotiable for us.”

Planned Action for LGBTQ+ & Allies in Response to Democrats Capitulating on Trans Rights by Julia Serano

Abortion

I try to keep my abortion position as independent of law as possible and the most important part of this post concerns this. That being said I’ll include legal and religious angles at the end that I think are useful. But these should not be used to debate, too often forced-birthers steal the political atmosphere with debate. Delight in giving orders, and stating things as the way things are. Learn to enjoy their opportunity to figure out their negative feelings about your actions.

What freedom looks like.
My position can be summarized as there is no right to force someone to give birth or stop someone from offering the services. Without forced-birthers people would just get abortions. That’s the freedom and liberty part.
As a group forced-birthers get by because they are a group, a mob, people who can force others to give birth themselves or through government. Relatedly any forced-birther is a stand-in for the people they put into power when it comes to shaming and criticism.

That’s it. They assert the power to use force on others with no good reason or rights to do so. If one were to want a legal position I’ve been confused as to why the 9th amendment has not been used as a reason, “The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.” Including the right to go get or offer an abortion which harms the rights of no one else. They have to go to “spectral evidence” and speak for fetuses eventually, and I don’t give them that power to make up words on behalf of the unborn without minds or words.

I don’t even believe the christians among forced-birthers have their own theology right. When I was raised in that environment a big value was placed on sin and choice, they have to use force. They don’t even believe what they say. And it’s a fine for killing the unborn of another in their book. They have manipulation of language only.

Finally I’ve a way of dealing with their “baby talk”. It’s untested but rhetorically in the moment give them “baby” or “child” mockingly, but make the conceptus “baby 0.01”, the embryo “baby 0.5”. Be clear that you aren’t serious and that they simply emotionally require and depend on recalling feelings AFTER birth and can’t deal with reality. You can bet they’d destroy embryology to get their way if they had to.

The chirality issue

Note: the second figure has been replaced with the correct one, twice. A xylulose bond was in the wrong direction. This is hard.

There are a couple of other origin of life issues that I haven’t focused on yet. One that I was hoping would reveal itself is the issue of “chirality”, or the side of a molecule a bond or atom is on relative to the rest of the molecule in 3D. The strict definition includes things with identical chemical formulas having non-superimposable structures.
I don’t tend to draw in 3D to simplify things. I also don’t include hydrogen to simplify things.

I meant to address this in a previous post where I posted this figure of thiamine and parts of the pentose phosphate pathway. It’s why there is a single hydroxyl on xylulose-5P drawn in 3D as if it’s coming at you out of the page. I’ll get to that at the end.

I’ve drawn ribose and xylulose in 3D. Both looking down the length of the molecule and from the side. Carbon actually binds 4 things and is tetrahedral like a pyramid. So I put the hydrogens back into the upper structures and used conventional dashed and filled wedges to represent bonds going into and out of the plane of the page respectively.

Each of those hydroxyls could be where the hydrogen is on the carbon instead. If you flipped the hydrogen and hydroxyl on ribose you can see how the formula would be the same but the structures would not be superimposable. Things on different sides are designated with R/S or D/L distinctions that refer to single atoms on the structure. D-ribose refers to the stereochemistry around the atom farthest from the aldehyde group in the form of ribose used by life.
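To get a feel for how many ways those flips can combine, here is a minimal sketch that enumerates R/S assignments. It assumes the open-chain form of ribose with 3 stereocenters (carbons #2, #3 and #4) and the standard (2R,3R,4R) assignment for D-ribose; neither detail is stated in the post itself, and the simple 2^n count ignores special cases such as meso compounds.

```python
from itertools import product

def stereoisomers(n_centers):
    """Every R/S assignment over n independent stereocenters."""
    return list(product("RS", repeat=n_centers))

def mirror(config):
    """The mirror image flips every stereocenter."""
    flip = {"R": "S", "S": "R"}
    return tuple(flip[c] for c in config)

aldopentoses = stereoisomers(3)   # open-chain ribose and its relatives
ketopentoses = stereoisomers(2)   # open-chain xylulose and its relatives
print(len(aldopentoses), len(ketopentoses))  # 8 4

d_ribose = ("R", "R", "R")        # assumed (2R,3R,4R) assignment
print(mirror(d_ribose))           # ('S', 'S', 'S')
```

So the same formula covers 8 distinct open-chain 5-carbon aldehyde sugars, and life's ribose is just one of them.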

Amino acids are all L- relative to the side chains aside from glycine which has no atoms with stereochemistry.

So why is life one way and not the other when it comes to these bond directions? I don’t know but I noticed that the only sugar with a bond in the other configuration is xylulose-5P so I drew that one bond with 3D on the thiamine figure. The rest are in the other configuration. And xylulose is the donor for the pentose phosphate pathway 2C molecules. This isn’t a solution to the chirality problem but it is a clue.

Ancestral fragment?

This post is going to look at something I found from the results of 2 studies looking at the ribosome, the molecular machine that translates the genetic code into a string of amino acids that fold into the 20,000+ proteins that build and make us.

The first paper Evolution of the ribosome at atomic resolution peels the large subunit ribosome apart stem-loop by stem-loop. They have a second paper with the small subunit which I took some figures from. A stem-loop is a loop-shaped structure where you have a base paired nucleotide “stem” and above that the rest of the structure forms a “loop”, often with more stem parts. These are how ribosomes tend to evolve. Sequence changes have more of a negative effect than insertion of something that adds new functions.

In the top portion of the figure from their second paper they compare a similar region of ribosomal RNA (rRNA) from different species where you can see how new loops are added.

They went through the ribosome and figured out what stem loops were added where and peeled additions away down to a first rRNA “hairpin”, named “AES1” for “ancestral expansion segment 1” in capital letters for the large subunit and lower case letters for the small subunit. I’m interested in the large subunit because it is older. More recent segments are AES2…AES3…

This second figure shows the base pairing in the large subunit. The top is numbered by their expansion segments and the bottom is colored by 6 phases of evolution they identify. The part I’m interested in is #1 in the upper right by the “PTC” in the lower right.

AES1 is associated with the Peptidyl Transferase Center (PTC), the region that does the actual connection between one amino acid and another in creating a new protein.

The second paper The ribosome as a missing link in the evolution of life looked at the sequence of rRNA and did a search of other genes to look for similarities and found some very interesting things. I should emphasize that the rRNA is not translated into protein products as far as is known. But that may not have always been the case.

They find the transfer RNAs, tRNAs, that escort the amino acids to the ribosome. They find lots of parts of ribosome and RNA related proteins. They find ribosomal proteins, polymerases, and more.

From these data sets I identified the sequences associated with the oldest fragment of the large subunit, AES1, and went and looked at what was in it.

I found parts of 5 things:
1) GshA Glutamate–cysteine ligase
2) Protease 4
3) RecF a DNA replication and repair protein
4) PqqL a probable zinc protease
5) RplB/Ribosomal large subunit protein 2

Things that attach and cut amino acids, proteases. Interesting. A protein that binds single stranded DNA during repair and replication, RecF. And one of the first ribosomal proteins to bind the large subunit rRNA, one that accesses the PTC during translation, RplB. Very interesting.

Again but better.

I’m not really functional for human behavior related posting, but thinking about the molecules is a hobby and I’ve been able to improve on the drawing I did in my previous post. I actually might have enough origin of life material for 4 or 5 posts. This post will be limited to the overall patterns in the drawing and the most interesting thing I’ve found in the subdomains so far. All of these posts will be like collecting puzzle pieces to the origin of life.

After more time with my drawing I’ve decided that it’s better described as a metabolic alignment of everything related to DNA that I can fit into a canvas. The file is too large for me to post on WordPress so I linked a Google documents version again. To properly figure out biological origins the sequences of genes and proteins aren’t enough. The very metabolism of these things should be considered. I’ve been careful about the structure of the alignment. Even now I’m thinking about how it could be improved with more room or simpler representation. There’s only so much I can do to compact the information down.
It’s a complex set of interconnected notes useful to anyone interested in the origin of life. This one is even cleaned up and error checked.

Cells are relationships between nucleic acids and proteins. Ribosomes and a genome. They make everything between them with the help of some atoms other than CHONPS. That is complex, but not endless. Ribose, 5 bases (A G U C dT), and 20 amino acids.

There is a legend in the top left of the linked drawing. It shows that each molecule is positioned with the protein(s) that use it. So the molecule is the product of something tangential to it shown by arrows, and the MAIN product of the reaction carried out by the protein(s) is also tangential.
Positioned right to above the molecule are other reactants and catalysts, and positioned left to below are products of the reaction.
The name of the protein(s) is given with the length in amino acids. Below that is every identified subdomain in the protein in order, and the protein domain superfamilies they belong to according to the Evolutionary Classification Of protein Domains database (ECOD).
Occasionally the molecule is below a protein when it’s a product of something already present on the figure nearby, like 10-formyl tetrahydrofolate (thought to be a storage form of formyl groups).

There is also a list of molecules that make an appearance in the top right. From 1 to 5 carbons and larger molecules.

In addition to the Purines (A G) and Pyrimidines (U C dT) are:
The amino acids Histidine, Methionine, Phenylalanine, Tyrosine, and Tryptophan.
The cofactors Thiamine (vitamin B1), Flavin (vitamin B2), Coenzyme-A (vitamin B5), the vitamin B6 group (pyridoxal…), Niacin, Folate, S-Adenosyl-Methionine and Molybdenum Cofactor.
These are boxed in blue for the amino acid product or first usable form of cofactor. Molecules that follow this box are different forms of the cofactor used by proteins made from the basic form (folate, molybdenum cofactor, vitamin B6, flavin too but I haven’t included FAD yet).

A cofactor, or coenzyme, is a molecule that participates in what the protein is doing. The protein is just the polypeptide part. You can think of them as tools that often have to be reloaded when they dispense things. ATP itself is a cofactor as “energy currency”. The amino acid glutamine is a cofactor for dispensing ammonia. Cofactors also act as platforms to build things (coenzyme-A), or move things to other things (Thiamine), or move electrons around with charge so chemistry can happen (vitamin B6).

Overall Patterns

The whole thing is roughly centered on the purine (A G) pathway going from left to right and bent downward halfway through. Ribose-5P sits in the upper left corner in a small red box. The purine pathway is boxed in red as well. The products are marked with a large black A and G.

The pyrimidines are lined up and go down where the purines bend down. This is deliberate because parts of both pathways involving the amino acid aspartate and bicarbonate (HCO3) are close to one another. Large black letters mark the products similarly to the purines. From there continuing to the right I have niacin, methionine (and SAM), and coenzyme-A allowing a similar grouping of aspartate in those pathways.

Thiamine is above the purines and I’ve lined its glycine up with the one in the purine pathway.

Under ribose-5P and traveling downward are the vitamin B6 group and “aromatics” which are tyrosine, phenylalanine, tryptophan, and part of folate called PABA. All from the same pathway. The B6 group uses ribose-5P and the aromatics use erythrose-4P so they made sense together and near ribose. 5C sugar, 4C sugar…

Histidine, folate, flavin, and molybdenum cofactor branch off from the ends of the purine pathway where A and G separate. They are made from A and G, where many of the previous molecules have A attached as part of the structure.
This whole section has the feelings of “partly deconstructed purines” to it, and deconstructed ribose in the case of tryptophan.

On the whole there is a feeling of there being an aspartate accumulation after early purine evolution (if the pathways can be read like that). Maybe aspartate was the first nitrogen delivery molecule. That it’s a part of the purine pathway, used for so many cofactors and amino acids, and delivers purine nitrogen is significant.

Purines and Pyrimidines, General Patterns.

To simplify discussion of some of this I drew a separate purine and pyrimidine pathway that just focuses on the overall patterns.

A simple alignment of the purine (bottom 3/4) and pyrimidine (top 1/4) pathways with gained and lost molecules included.

The purine pathway that makes Adenosine monophosphate and Guanosine monophosphate is unique in that it is constructed from smaller parts relative to all of the other pathways presented, outside of the ribose-5P. The biggest thing is glycine, the smallest amino acid, in the formation of the smaller 5-membered ring with ammonia and formate.

Aspartate, the amino acid used in making pyrimidines, temporarily joins in making the second ring. It attaches by its nitrogen and leaves as fumarate (tricarboxylic acid cycle). Bicarbonate (or carbon dioxide in later organisms) is the only other thing in the pathway left to mention. And the way it’s attached to the former glycine makes a 3-carbon segment. It’s a 2C-3C transition. Maybe it means something. Carbon fixation would be different things at different metabolic levels.
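The "built from small parts" point can be tallied atom by atom. The sketch below uses the standard textbook assignment of purine ring positions to their sources; the position labels (N1, C2, ...) are an addition here, since the post only names the sources, and "ammonia" stands for the nitrogen dispensed by glutamine.

```python
from collections import Counter

# Textbook source of each of the 9 purine ring atoms (positions are an
# assumption added for illustration; the sources match the text above).
RING_SOURCES = {
    "N1": "aspartate",
    "C2": "formate",
    "N3": "ammonia",
    "C4": "glycine",
    "C5": "glycine",
    "C6": "bicarbonate",
    "N7": "glycine",
    "C8": "formate",
    "N9": "ammonia",
}

atoms_per_source = Counter(RING_SOURCES.values())
carbons = [atom for atom in RING_SOURCES if atom.startswith("C")]
nitrogens = [atom for atom in RING_SOURCES if atom.startswith("N")]

# The 3-carbon segment: the two glycine carbons plus the bicarbonate carbon.
three_carbon_segment = [atom for atom in carbons
                        if RING_SOURCES[atom] in ("glycine", "bicarbonate")]

print(atoms_per_source["glycine"], len(carbons), len(nitrogens))  # 3 5 4
print(three_carbon_segment)  # ['C4', 'C5', 'C6']
```

Glycine is the largest single contributor at 3 atoms, and aspartate leaves behind only one nitrogen, which fits the "ammonia dispenser" reading.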

The pyrimidines have the opposite relationship to ribose-P as the purines. The ring is made first and then it is mounted on the ribose via PRPP, the form of ribose used to make nucleotides (and niacin, tryptophan, folate…). A bicarbonate and aspartate are combined and made into a 6-membered ring like the second ring of the purines. Getting aspartate and bicarbonate together in the figure makes sense when it comes to positioning the pathways together.

Both pathways make an intermediate that isn’t used for anything else. Inosine monophosphate and orotidine monophosphate. After that it’s smaller changes to get each of the final nucleotides.

The pyrimidines also lose parts of structure. Niacin removes 2 hydrogens and their electrons making a double bond. After orotate binds ribose it loses a carbon dioxide (decarboxylation). And the only place DeoxyriboNucleic Acid shows up is deoxythymidine, a loss of a hydroxyl. That’s done with purines too but I didn’t need to show deoxyribonucleic acid until dT.

Maybe there’s a progression here with 5 ring ancestors and a 6 ring newcomer. Early base pairing between them. I certainly think accumulation of aspartate is a theme in both the evolution of pyrimidines and much more. In fact I think modern genomes and ribosomes are descended from something like the 5-membered thiazole in thiamine on a ribose maybe. With chemistry and base pairing between strands.

The purine pathway splits to thiamine biosynthesis at AIR, but the molecule is turned into the 6-membered ring of HMP. Not the 5-membered ring that moves carbon in making ribose (pentose phosphate pathway), the part that assists in the chemistry (more on that at the end).

Purines

I think there is a level where one can tentatively delete whole parts of proteins when thinking about precellular life in the vent. Formate, methane, bicarbonate, acetate, ammonia, phosphate were just tentatively available. There is sufficient literature support for these molecules in the vent from my point of view. As well as acetate, pyruvate, and fatty acids that make membranes.

Extreme accumulation of ammonia on electroreduced mackinawite: An abiotic ammonia storage mechanism in early ocean hydrothermal systems

Generation of long-chain fatty acids by hydrogen-driven bicarbonate reduction in ancient alkaline hydrothermal vents

Bio-inspired CO2 conversion by iron sulfide catalysts under sustainable conditions

Phosphate availability and implications for life on ocean worlds

I’ll only mention the most interesting thing I have found in the protein subdomains in this post.

Focusing on the purine pathway there is another maybe link to the origin of life. 3 polymers are important in life and origins: polynucleotides (DNA RNA), polypeptides (proteins), and fatty acids (~16-20 carbons, membranes and cofactors like biotin). Not only are nucleotides being made but step 2 of the purine pathway has a product, phospho-ribosyl-glycin-amide, that looks like a peptide (protein) bond on ribose between the glycine and the ribose-amide.

Top: purine intermediate PRA and an aspartate-glycine dipeptide. Middle-Bottom: the reaction PurD carries out and the glycine phosphate intermediate.

Maybe it doesn’t mean anything but get enough interesting things together and there is more room for thought.

The protein that does this step, PurD, is an example of something that I can cut a lot of things off of, just for fun. PurD is a member of the ATPgrasp family of proteins. These proteins bind an ATP, and pop a single phosphate off making an ADP.

When a protein uses ATP as “energy currency” in this way it either participates in a chemical reaction directly or it drives physical movement of a protein to do something else including chemistry. The metaphor is about doing work. Here the phosphate directly participates in attaching glycine to the nitrogen added in step 1 with a glycine-P intermediate.
The ATP-Grasp Enzymes

Structural biology of the purine biosynthetic pathway

Here is where the chopping out can happen. In the vent there is phosphate. The ATPgrasp family is essentially a bunch of phosphate dispensers and just for fun can be deleted in terms of what is more or less interesting here. That being said phosphate control seems to have evolved first/early.
Structural Phylogenomics Reveals Gradual Evolutionary Replacement of Abiotic Chemistries by Protein Enzymes in Purine Metabolism

And the ATPgrasp family is related to the protein that puts aspartate on CAIR in the purine pathway which is interesting.

That’s most of the protein deleted, except for the C2 domain. It’s an interesting little piece and its relatives are involved in some very origins related chemistry. At the Evolutionary Classification of Protein Domains database this domain is in the “Alpha helix + beta sheet complex topology” bin. And in there it’s in the “alpha/beta-Hammerhead Barrel-sandwich hybrid” X, H and T (possible homologs, homology, topology) groups.

That whole T group is filled with interesting stuff but the PurD C-terminal fragment is in the “CO dehydrogenase molybdoprotein N-domain-like” F or family group. Its relatives include not only the above CODH MO-N fragment (1C+1C) but also fragments in biotin carboxylase (1C manipulation), acetyl-CoA carboxylase (1C+2C, on the drawing), and pyruvate carboxylase (1C+3C).
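The carbon counts in that list can be laid out as a small table to make the progression explicit. The numbers follow the labels in the paragraph above; encoding "1C manipulation" as an acceptor of 0 carbons is an assumption made only so the row fits the same shape as the others.

```python
# (protein, carbons added, carbons in the acceptor), following the labels above.
# The acceptor size of 0 for biotin carboxylase is an illustrative assumption.
FRAGMENT_HOSTS = [
    ("biotin carboxylase", 1, 0),
    ("CODH MO-N", 1, 1),
    ("acetyl-CoA carboxylase", 1, 2),
    ("pyruvate carboxylase", 1, 3),
]

def total_carbons(added, acceptor):
    """Carbons in the combined product of a 1C addition."""
    return added + acceptor

acceptor_sizes = [acceptor for _, _, acceptor in FRAGMENT_HOSTS]
totals = {name: total_carbons(added, acceptor)
          for name, added, acceptor in FRAGMENT_HOSTS}

# A "numerical progression" here means acceptor sizes 0, 1, 2, 3 with no gaps.
is_progression = acceptor_sizes == list(range(len(FRAGMENT_HOSTS)))

print(totals["pyruvate carboxylase"], is_progression)  # 4 True
```

The 4-carbon total for pyruvate carboxylase corresponds to oxaloacetate, the molecule that can go on to become aspartate.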

Nowhere else do I see a numerical progression like that. Where did aspartate come from for pyrimidines? Today cells can use pyruvate carboxylase to make oxaloacetate in initiating gluconeogenesis (glucose synthesis), which is also made into aspartate. Interesting.

This “CODH MO-N fragment” is also in 3 other purine biosynthesis genes, PurT, PurK, and PurP. They involve formate, and bicarbonate. Carboxylations use bicarbonate. The fragment is also in L Amino acid Ligase (LAL). And it is present in a few proteins without the ATPgrasp part: xanthine dehydrogenase chain B, Carbon monoxide dehydrogenase large chain (so 2 parts of CODH), aldehyde oxidoreductase, quinoline 2-oxidoreductase large subunit, and 4-hydroxybenzoyl-CoA reductase large subunit. I don’t know what to think about the things in this paragraph yet but I mention them for completeness.

The last thought on this fragment is that the general shape of the carboxyl group, HO-C=O, or even the carbonyl C=O itself, is maybe what it recognizes. I need to look at protein structures.

Ribose

I’ll finish with some thoughts on ribose. I’ve been hoping the origin of ribose comes out of this. There are different variables to consider in the abstract. The polynucleotides are the most complex part of this. Counting only genomes, ribose:
1) Has 5 carbons with hydroxyls on 4 and an aldehyde on one end. Each hydroxyl is oriented the same way (chirality).
2) Polymerizes using phosphate on the carbon on the end opposite the aldehyde (#5) and 2 carbons over (#3). So a 3-carbon segment for the backbone.
3) Provides a site for nitrogenous base binding at the aldehyde carbon (#1). This also provides a site for base pairing between strands.

Each of those variables is hypothetically modifiable.

Maybe there were things with erythrose-4P or shorter in the backbone of the polymer. This still allows for base pairing. With glyceraldehyde-3P maybe you still get the same but no ring.

But it’s the predictability of the ring that evolution can respond to. There is a huge diversity in things that bind ribose or things with it. Erythrose is only used in a few other places outside of the pentose phosphate pathway and the pathway that makes part of folate, tryptophan, phenylalanine, and tyrosine. And a 2-carbon segment in the backbone might put 2 phosphates too close together, causing steric hindrance (the groups physically interfering with each other).

If you get larger than ribose then you worsen the problem of self-splicing: poly-RNAs of a certain size looping out, removing part of themselves, and sealing the break. The reactive hydroxyl on carbon #2, the one that is removed in deoxyribonucleotide genomes, is bad enough. 2 such hydroxyls would allow for more chemical possibilities than the relative stability of 1.

Getting outside of genomes, carbons #2 and #3 bind amino acids in translation via the class 1 and class 2 aminoacyl-tRNA synthetases. Maybe there’s evidence there of an erythrose backbone, with only carbon #2 binding amino acids in the past, but probably not.
Carbon #3 also binds phosphate in coenzyme A, and carbon #2 binds phosphate in the niacin cofactor in its anabolic mode (the NAD/NADP difference).

Carbon #5 binds lots of cofactors, mostly through at least 1 phosphate as the drawing shows. It also binds lots of other things for transport, like sugars, membrane components, and more. Carbon #5 also binds all amino acids during the process of aminoacyl transfer that creates transfer RNA loaded with an amino acid, again through one phosphate, as aminoacyl-AMP.

These carbons are also used in making signalling versions of nucleotides like cyclic AMP and cyclic GMP. The signalling functions of nucleotides are an area I’ve yet to get into in depth and integrate into this.

Finally the metabolism of ribose on the figure may provide clues to its origin. In the cofactor pathways ribose is often partially dismantled with pieces leaving, such that 4 or fewer carbons from ribose remain in the final molecule. At the least it makes me think about the assembly of ribose and its own abiotic concentration.

3C+2C?
4C+1C?
6C-1C?
3C+3C with loss of OCO?
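As a quick sanity check, the four candidate routes can be written as signed carbon counts and summed. This is only bookkeeping under the assumption that “loss of OCO” removes exactly 1 carbon. The names `ROUTES` and `net_carbons` are made up for illustration, and nothing here says which route is chemically plausible.

```python
# Each route is a list of signed carbon counts: positive for a fragment
# that is joined, negative for carbon that leaves.
RIBOSE_CARBONS = 5

ROUTES = {
    "3C+2C": [3, 2],
    "4C+1C": [4, 1],
    "6C-1C": [6, -1],
    "3C+3C with loss of OCO": [3, 3, -1],  # assumes OCO carries off 1 carbon
}


def net_carbons(fragments):
    """Net carbon count left after joining and losing the listed fragments."""
    return sum(fragments)


for name, fragments in ROUTES.items():
    total = net_carbons(fragments)
    verdict = "matches ribose" if total == RIBOSE_CARBONS else "does not match"
    print(f"{name}: {total} carbons, {verdict}")
```

All four routes balance to 5 carbons, so the carbon count alone cannot rank them.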

I think the first and maybe the last follow the most likely routes. 3 comes after 2 and if 2 is already around… It’s maybe thiamine related.

Common ancestry between the purine and thiamine pathways is possible, and likely given the use of AIR in making HMP. Attaching glyceraldehyde and glycolaldehyde could be a route. Now how would acetate and pyruvate become sugars? Something between oxaloacetate, gluconeogenesis, and the pentose phosphate pathway maybe.