In this post I want to start expanding on the kind of analysis I did in a previous post on PurD, the protein that adds glycine in purine biosynthesis. There I looked at which proteins are related to each of its subdomains, noted the kinds of chemistry and molecules involved, and did some fun speculation. What turned up was a family of phosphate dispensers and very small molecules with HO-C=O as part of their structure.
These are still puzzle pieces, so in places this will read like a list of things that might be significant but that I don’t know much about yet. First, some words about the kind of data on the site I primarily use to look at the relatives of parts of proteins.
Evolutionary Classification Of protein Domains and bin importance: AXHTF.
ECOD and similar archives of data relating to the relatedness of nucleotides or proteins have hierarchies of bins to group by relatedness. The largest bin at ECOD is the Architecture or “A” bin which groups by the most general arrangements of protein 3D structure, the Alpha Helix or the Beta Sheet, and how those relate.
So you get things like: Alpha/Beta 3 layer sandwiches defined as “repeating beta-alpha units form a sandwich with a mainly parallel beta-sheet layer stacked between two alpha-helix layers” or Alpha+Beta 2 layers. Members of these outermost bins can all be related, but they can just as easily be shapes that were arrived at independently. The same is true of the “X” group, which is “possible homologs”, where homologs are related by sequence. The same problem exists in both A and X, but members of an X group at least might be related.
Below the X group we get the “H” or homology group, where things are explicitly related based on sequence. Within H there is “T” or topology, where members are still related by sequence but differ in structure. Finally, “F” or family holds the protein subdomains most closely related to one another.
HTF are where I look the most.
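One way to picture the nesting of these bins is as a path of labels, one per level, where two domains are compared by how far down the path they still agree. The sketch below is only an illustration in Python: the class and function names are made up for this post, and the labels marked as placeholders are not real ECOD names.

```python
from dataclasses import dataclass

# ECOD levels from broadest to narrowest, in the order described above.
LEVELS = ("A", "X", "H", "T", "F")

# Levels where relatedness is explicit rather than just possible.
INFORMATIVE_LEVELS = {"H", "T", "F"}


@dataclass(frozen=True)
class DomainPath:
    """The bin a domain falls into at each level (illustrative labels)."""
    name: str
    bins: tuple  # one label per level, in LEVELS order


def deepest_shared_level(a, b):
    """Return the narrowest level at which two domains share a bin, or None."""
    shared = None
    for level, bin_a, bin_b in zip(LEVELS, a.bins, b.bins):
        if bin_a != bin_b:
            break
        shared = level
    return shared


def is_informative(level):
    """True if sharing a bin at this level means explicit relatedness."""
    return level in INFORMATIVE_LEVELS


# The X and F labels below are placeholders (assumptions), not ECOD names.
purf_n = DomainPath(
    "PurF N-terminal domain",
    ("a+b four layers", "x-placeholder", "Ntn hydrolase", "GATase2", "GATase2"),
)
proteasome = DomainPath(
    "proteasome subunit",
    ("a+b four layers", "x-placeholder", "Ntn hydrolase",
     "proteasome subunits", "f-placeholder"),
)

level = deepest_shared_level(purf_n, proteasome)
print(level, is_informative(level))
```

Two domains that only agree at A or X would come back as not informative, which is the reason for looking mostly at H, T and F.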
Purine biosynthesis, Adenosine and Guanosine.
I see 5 stages and 13 steps in purine biosynthesis.
- Formation of the first ring (5 steps).
- Carbon addition and movement (2 steps).
- Aspartate addition and fumarate removal (2 steps).
- Formation of the second ring (2 steps).
- Adenosine or Guanosine monophosphate formation (2 steps, separate paths).
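The tally above can be checked with a few lines of Python. This is just a bookkeeping sketch that follows the counting used in this post (the last stage is counted as 2 steps even though the paths split); the names `STAGES` and `step_ranges` are made up for the example.

```python
# Stage names and step counts exactly as listed in this post.
STAGES = [
    ("Formation of the first ring", 5),
    ("Carbon addition and movement", 2),
    ("Aspartate addition and fumarate removal", 2),
    ("Formation of the second ring", 2),
    ("Adenosine or Guanosine monophosphate formation", 2),
]


def step_ranges(stages):
    """Turn per-stage counts into (name, first_step, last_step) tuples."""
    ranges = []
    first = 1
    for name, count in stages:
        last = first + count - 1
        ranges.append((name, first, last))
        first = last + 1
    return ranges


total_steps = sum(count for _, count in STAGES)
print(total_steps)
for name, first, last in step_ranges(STAGES):
    print(f"steps {first}-{last}: {name}")
```

With this numbering, steps 1 to 3 covered in this post all fall inside the first stage.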
I will be limiting myself in this post to most of Stage 1: Ribose phosphate, glycine, formate and ammonia join up. This post got longer than I thought it would for all of stage 1 but building up ring closing as a specific thing could be useful.
Step 1: PurF
This protein exchanges an ammonia for a diphosphate from Phospho-ribose-diphosphate (PRPP), making phospho-ribosyl-amine. This pattern of using phosphate to swap for other things occurs over and over. PRPP was made from ribose-5-phosphate with ATP by the product of the gene Prs, which is related to the second domain in this protein. Cells have a PRPP pool because, as the drawing shows, PRPP has multiple uses.
Phosphoribosyl Diphosphate (PRPP): Biosynthesis, Enzymology, Utilization, and Metabolic Significance. Hove-Jensen 2016
Also known as amidophosphoribosyltransferase (you see why I prefer short protein names? I usually like descriptive names), PurF has 2 subdomains:
- An Alpha+Beta 4 layer (A+B4L)/NTN hydrolase domain at the end with the nitrogen (N-terminal, defined as the beginning of the sequence)
- An Alpha/Beta 3 Layer Sandwich (A/B3LS)/PRTase-like domain at the carboxy or C-terminal end.
The “amido” refers to the fact that the ammonia comes from the amino acid glutamine, which becomes the amino acid glutamate once the nitrogen is removed. That removal is the first step of PurF activity and is carried out by the NTN-hydrolase domain. The freed ammonia then reacts with PRPP to make phospho-ribosyl-amine (PRA).
The A+B4L subdomain is in the N-terminal nucleophile aminohydrolase (NTN hydrolase) homology group, and class 2 glutamine amidotransferase topology and family groups (GATase2). In terms of primordial chemistry, the chemistry of an end of the protein chain is a significant thing. This N-terminal nitrogen protein superfamily has an affinity for nuclei (nucleophile) and breaking bonds with water (hydrolase). Without the need for ATP this protein domain removes the ammonia and that reactive molecule floats over to PRPP which is reactive towards ammonia.
The A/B3LS domain at the C-terminal end is in a transferase subgroup of these proteins, PRTases, where Prs is a synthase. This subdomain essentially just holds the ribose-5P end. In fact, at this point PRTases are just ribose phosphate holders to me. This particular family has one other synthase for OMP, and a bunch for purine and pyrimidine bases in general. The interesting part for PRTases is that the ones for niacin, histidine, and tryptophan are in different protein subdomain families on ECOD.
The rest of the entry for PurF will be on the A+B4L subdomain and what’s interesting there. In the sibling homology group to the NTN aminohydrolases there are protein serine/threonine phosphatase 2C members, specifically the part of the protein that does the removal, the catalytic domain. Phosphatases are proteins that remove phosphate from things, here serine or threonine. With all of the phosphate around a related family of phosphate removers is interesting. Looking at them more closely is a plan.
Inside the NTN hydrolase homology group there are 6 related topology groups, one of which is the GATase 2 group. Outside of GATases there are some interesting things. “Proteasome subunits” contains parts of the cellular trash can, the proteasome, and individual peptidases, which are things that cut peptide bonds in proteins. This group also includes PurO, which is in stage 4 of purine biosynthesis. There are some antibiotic biosynthesis enzymes for penicillin and carbapenam, and things that break C-N bonds other than peptide bonds. I’ll probably ignore antibiotics going forward: there are lots of them and they are unlikely to be usefully origins-related.
Glycosyl-asparaginase removes aspartate from glucose-asparagine leaving glucose-amine (glycoprotein degradation). Gamma-glutamyl-transpeptidases move glutamate to things and from things.
Inside the GATase2 group is asparagine synthase, which makes that amino acid from aspartate (using glutamine as the nitrogen source), and the protein that makes glutamate from 2-oxoglutarate. There is also a protein that makes glucosamine-6P from glutamine and fructose-6P.
That’s it for PurF for now.
Step 2: PurD
I first introduced this protein with the big drawing.
This protein has 2 steps in its reaction. First the phosphate dispenser part releases a phosphate from ATP which binds the carboxy on glycine making glycine-P. This is reacted with PRA and the phosphate is swapped for the nitrogen on the ribose like in PurF, making glycinamide-ribonucleotide (GAR). As I mentioned previously this is the closest molecule to both nucleotides and polypeptides as it has peptide bond features.
This protein contains 3 subdomains in order:
- An A/B3LS/Rossmann-related preATPgrasp domain (thought to be substrate specificity)
- An Alpha/Beta Complex Topology (A/BC.T.) ATPgrasp domain (the phosphate dispenser domain)
- An A/BC.T./Alpha Beta hammerhead barrel hybrid sandwich domain related to the carbon monoxide dehydrogenase molybdenum protein (CODH MO-N-LIKE) and biotin carboxylases
There are patterns to the phosphate dispenser with and without the substrate specificity part, and patterns to CODH MO-N with the rest and by itself.
First of all, there are 3 more ATPgrasp proteins in purine biosynthesis. The ATPgrasp domain by itself is also related to the purine biosynthesis protein PurC (SAICAR synthase). Both of those are in sibling homology groups with a bunch of protein kinases, things that put phosphate on serine, threonine, or tyrosine in a polypeptide chain, usually to change the function of a protein somehow.
When all 3 subdomains are together you have PurD, PurT, PurK (Pur=purine biosynthesis), L-amino acid ligase (attaches an L-amino acid to things), and a set of things including biotin, acetyl-CoA, and pyruvate carboxylases.
PurP only has the preATPgrasp and ATPgrasp domains. The same is true of CarB from the pyrimidine pathway.
When the phosphate dispenser is alone, no preATPgrasp, you have a collection of interesting things. RNA and DNA Ligase (protein that attaches things). Ligases for glutamate (GshA, from the ancestral fragment post) or tyrosine and tubulin. Citrate lyase, succinyl-CoA synthase, both related to the TCA cycle. PEP (phosphoenolpyruvate) synthetase, a protein involved in the first step of gluconeogenesis (making glucose instead of breaking it down for energy) in some prokaryotes.
When the CODH MO-N-LIKE bit is by itself you have purine breakdown protein xanthine dehydrogenase chain B, which adds an oxygen. There’s the CODH MO-N itself. There is aldehyde oxidoreductase.
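The patterns above amount to grouping proteins by which of the three PurD-type domains they carry. Here is a small Python sketch of that bookkeeping. It only tracks the three domains discussed here (the real proteins may have other domains that are ignored), it uses a handful of the examples named above rather than all of them, and the names `ARCHITECTURES`, `group_by_architecture` and `proteins_with` are invented for the example.

```python
from collections import defaultdict

# Domain content as described in this post, limited to the three
# PurD-type domains. Other domains in the real proteins are not tracked.
ARCHITECTURES = {
    "PurD": ("preATPgrasp", "ATPgrasp", "CODH MO-N-LIKE"),
    "PurT": ("preATPgrasp", "ATPgrasp", "CODH MO-N-LIKE"),
    "PurK": ("preATPgrasp", "ATPgrasp", "CODH MO-N-LIKE"),
    "PurP": ("preATPgrasp", "ATPgrasp"),
    "CarB": ("preATPgrasp", "ATPgrasp"),
    "DNA ligase": ("ATPgrasp",),
    "succinyl-CoA synthase": ("ATPgrasp",),
    "xanthine dehydrogenase chain B": ("CODH MO-N-LIKE",),
}


def group_by_architecture(architectures):
    """Map each domain combination to the proteins that have exactly it."""
    groups = defaultdict(list)
    for protein, domains in architectures.items():
        groups[domains].append(protein)
    return dict(groups)


def proteins_with(domain, architectures):
    """List proteins carrying a given domain, in any combination."""
    return sorted(p for p, doms in architectures.items() if domain in doms)


for domains, proteins in group_by_architecture(ARCHITECTURES).items():
    print(" + ".join(domains), "->", ", ".join(proteins))
```

Asking `proteins_with("ATPgrasp", ARCHITECTURES)` returns everything except the lone CODH MO-N-LIKE protein, which is one way to see the phosphate dispenser as the common thread.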
This CODH MO-N-LIKE fragment gets more interesting when you zoom out one level. This is the longest part of this post.
First we have MoaE, molybdopterin synthase. That’s on the drawing! And there’s another on the drawing with the PRTase that attaches ribose-5P to the ring in making niacin. There are a couple of proteins from the large ribosomal subunit too, L10 and L27. The L27 topology bin also has parts of the “exosome complex” which degrades RNA, another trash can.
The “duplicated hybrid motif” bin contains 3 interesting things. The first is part of a system that uses the phosphate from PEP and passes it along several other proteins to a membrane protein (the PTS E2A fragment here) that phosphorylates a cargo, usually a sugar, during import. The second is a family of peptidases including one that cuts between glycines (LytM). The third is the RNA polymerase beta prime subunit.
Finally there is the “single hybrid motif” which has a lot of things. First there is the RNA polymerase beta subunit (as opposed to beta prime). Second, the protein domains that bind the cofactors lipoate and biotin are related to each other and are both in here. Those are newer cofactors so I haven’t done much with them. Third is the glycine cleavage system H protein, which is very significant as this is where cells get glycine and serine through interconversion. Fourth, AstE_AspA are Succinylglutamate desuccinylase/aspartoacylase. The first is an arginine degradation protein, and the second is too but seems limited to eukaryotes (or isn’t studied in prokaryotes due to the human disease emphasis).
New paragraph due to theme difference. The fifth thing in the “single hybrid motif” is the V-type ATP synthase alpha chain. The sixth is RnfC, a protein involved in anaerobic electron transport with niacin and pumping protons or sodium ions. Relatedly, the seventh is cytochrome F, something that passes electrons in electron transport chains. NqrA is similarly involved with niacin and electron/sodium transport.
I’ll finish up this with the eighth and ninth, NusG and Rrp15p. They are both involved with ribosomes. NusG is a major transcription (copying DNA) termination factor, and Rrp15p is required for cells to make the large ribosomal subunit.
Step 3: PurT and PurU or PurN
In this step a formate is removed from folate and attached to the nitrogen at the end of the glycine that was just attached, to make formyl-glycinamide-ribonucleotide, FGAR.
This is something I will have to fix on the big drawing but life has more than one protein for the third step of purine biosynthesis. I have PurT and PurU, but not PurN. We’ve technically seen the relations of step 3 PurT since it is also an ATPgrasp protein (all 3 domains).
PurU is the protein that removes the formate and passes it to PurT, where the phosphate from ATP is used to make a reactive formyl-P that, similarly to the previous steps, is used to attach a carbon to a nitrogen.
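To see how steps 1 to 3 hand molecules to one another, the reactions can be written down as sets of substrates and products and then checked for what each step passes to the next. This is a simplified Python sketch: water is left out, ADP is added as the leftover of ATP (it is not named in the text above), step 3 is written for the PurT route only, and the `Step` type and function names are made up for the example.

```python
from typing import NamedTuple


class Step(NamedTuple):
    enzyme: str
    substrates: frozenset
    products: frozenset


# Simplified reactions for steps 1-3 (water omitted, PurT route for step 3).
STEPS = [
    Step("PurF",
         frozenset({"PRPP", "glutamine"}),
         frozenset({"PRA", "glutamate", "diphosphate"})),
    Step("PurD",
         frozenset({"PRA", "glycine", "ATP"}),
         frozenset({"GAR", "ADP", "phosphate"})),
    Step("PurT",
         frozenset({"GAR", "formate", "ATP"}),
         frozenset({"FGAR", "ADP", "phosphate"})),
]


def carried_intermediates(steps):
    """For each consecutive pair, list what one step makes and the next uses."""
    return [sorted(a.products & b.substrates) for a, b in zip(steps, steps[1:])]


def atp_users(steps):
    """Names of the enzymes that take ATP as a substrate."""
    return [s.enzyme for s in steps if "ATP" in s.substrates]


print(carried_intermediates(STEPS))
print(atp_users(STEPS))
```

The only things carried forward are PRA and then GAR, and the ATP users are the two ATPgrasp proteins, which fits PurF not needing ATP.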
This is where things get easier for PurN because PurU is related to PurN so I can cover them together, though PurU has more pieces, suggesting the most important part is the one they share.
PurU has 2 domains:
- An Alpha/Beta 3 layer sandwich (A/B3LS)/formyltransferases domain.
- An Alpha+Beta 2 layer (A+B2L)/Alpha-Beta Plaits domain.
The first domain is related to PurN in its entirety. The homology and topology bins are formyltransferases. The most interesting ones are methionyl-tRNA Fmet formyl transferase (adds formate to the initiator methionine in translation) and 10-formyltetrahydrofolate dehydrogenase (removes formate from folate as CO2 while adding a proton to niacin).
The second domain in PurU is related to a lot of things but I’m thinking I can ignore those given PurN.
That’s it for now. It would have easily been over twice as long if I included PurL and PurM. The Alpha+Beta 2 Layers category has an “Alpha-Beta Plaits” group with lots of things. It’ll be practice at sorting out general biochemical themes.