Science and opinions at the LC
What is a protein?
Proteins are the doers of the cell. They move muscle fibrils so we can stand up, they shuttle oxygen from our lungs to the rest of the body when we breathe, and they detect light by subtly changing shape so we can see. There are twenty thousand different types of proteins and fifty million total proteins per human cell, all working together.
A protein is composed of amino acids, like lysine, tryptophan, phenylalanine, methionine, threonine, valine, isoleucine, and leucine. There are 20 natural amino acids. String a specific combination of them together, watch it fold in on itself, and you get a protein. The sequence of amino acids determines the final shape of a protein. The folded shape of that string of amino acids determines what job the protein will do.
Now would be a good time to remind ourselves that the proteins discussed here are nanoscopic molecular machines, the same molecules that, en masse, make up the nutritious protein we eat in fish, soybeans, or chicken. A single protein molecule has about one ten-billion-billionth the mass of a soybean.
The shape and motions of a protein give it its function. Understanding the physics and chemistry that connect a protein’s sequence of amino acids to its folded shape can therefore help us understand and explain protein function. Once function is understood, malfunction, and the diseases that arise from it, can be explained and potentially fixed.
Our genome encodes the combination of amino acids for each protein in our body. With genome sequencing becoming ever more accurate and cheap, being able to predict the function of a protein from our genome will lead to a paradigm-shifting advance in medicine.
What is protein-folding?
Over the past 50 years, scientists from diverse backgrounds (physics, mathematics, biology, chemistry, and computer science) have sought to understand how and why the sequence of amino acids in a protein (its primary sequence) determines its shape.
In the 1960s, Christian Anfinsen’s experiments on ribonuclease A showed that the final, intricately folded state of a protein is “encoded” in its primary sequence. This was an awesome result (Anfinsen won the Nobel Prize in Chemistry in 1972). His experiment not only showed that the primary sequence encodes the folded state of a protein, and thus its function, but also raised the protein-folding problem. How do proteins find their way to the folded state so quickly instead of sampling every possible conformation? A purely random search would take longer than the age of the universe.
Not all proteins fold completely by themselves. Some proteins have chaperones that help drive them to the folded state, or free them from misfolded states. But the majority of proteins fold all by themselves.
What then drives a protein to fold? An intuitive reason is that some amino acids in the protein chain don’t like water: they’re hydrophobic (Greek, in fear of water). Why are they hydrophobic? It’s all about water’s free energy. A water molecule’s freedom to explore the solution (entropy) is diminished when hydrophobic amino acid side chains are exposed, because water can’t hydrogen bond (enthalpy) with a hydrophobe as it can with other water molecules. Water minimizes this big penalty by squeezing hydrophobic amino acids together, away from the water, while pulling polar amino acids outward. This is a big driver of protein folding.
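One common way to put numbers on hydrophobicity is a hydropathy scale. The sketch below uses the published Kyte-Doolittle scale (positive values are more hydrophobic) and averages it over a sliding window to find the stretch of a chain most likely to be buried. The sequence is a made-up toy example, not a real protein, and the window size of 5 is an arbitrary choice for illustration.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
# Positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Mean hydropathy of every full window of residues along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical toy sequence: a greasy stretch flanked by charged residues.
toy = "KKDEEILVLLAVIDEKKR"
window = 5
profile = hydropathy_profile(toy, window)

# Index of the window with the highest mean hydropathy.
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window: {toy[peak:peak + window]} "
      f"(starts at residue {peak + 1}, mean {profile[peak]:.2f})")
```

A profile like this only captures the "doesn't like water" part of the story. It says nothing about hydrogen bonding or side chain packing, so it hints at which segments prefer to be buried but is not a prediction of the folded structure.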
With a finer view, it’s not just about hydrophobicity. The actual atomic structure of proteins and how the atoms are bonded together are also important. The amide links of the amino acid chain are rigid, but the chain can rotate about the bonds on either side of each alpha carbon, where the side chain is attached. Depending on how these bonds are rotated, the amide hydrogen in one link can hydrogen bond with the carbonyl oxygen of another link, forming helical or sheet secondary structural elements.
Different side chains provide a menu of physical properties that help determine the final shape, motions, and kinds of chemical interactions or reactions the protein can participate in. In larger proteins with multiple secondary structural elements, the side chains of these elements interact, driving the formation of tertiary structure. The overall structure of ribonuclease A in the figure at the top of this blog is a tertiary structure: the final folded, native state of ribonuclease A.
But this explanation of protein-folding is unsatisfactory because I haven’t told you the sequence of events in folding ribonuclease A. That is because, with advances in computer science, physics, and mathematics enabling atomistic protein-folding simulations, some now believe that the exact steps taken in folding one ribonuclease molecule may not be the same as those taken by another molecule with the same sequence. Many forces contribute to folding, and those described here are just a sample.
The protein-folding problem has grown from blue-skies research into a full-fledged field with profound importance for medicine. But where do we go from here? What paths will researchers take to understand the paths proteins take?
From chemistry, we need small molecule spectroscopic probes that minimally perturb protein energy landscapes in vivo (bioorthogonal); we need new technologies to visualize improbable events corresponding to transition states of protein folding and unfolding.
From physics, we need new NMR methods that have even higher resolution in space and time to monitor transitions; we need new models of protein-substrate interactions like drug binding or DNA binding; we need tighter coupling of models to experimental observables or new models to explain old data; we need ways to “watch” a drug actually binding to a protein while measuring the affinity, in a single approach.
From computer science, we need new algorithms for creating, storing, and manipulating data arrays that can deal with massive data (e.g. TB arrays); we need to leverage CPUs, GPUs, and MICs (many-integrated-core processors) in parallel, so that the strengths of each offset the weaknesses of the others.
From mathematics, we need to understand why some functions like parabolas are so seemingly fundamental in physics (e.g. why do harmonic oscillators model so many things so well?); can we rewrite molecular Hamiltonians to contain electronic phenomena?
Blue skies research? Yes.
Figure made using Protein Data Bank file 5RSA (Comparison of two independently refined models of Ribonuclease A, Wlodawer, A., et al., Acta Crystallogr., Sect. B, 42: 379, 1986).