Lessons to be learnt in how to maximise your protein expression systems.
Getting the most out of your DNA!
Some people seem to be able to build muscle with relative ease, but for most it is a painful experience of lifting weights and protein shakes, and even then, the determining factor is probably genetics.
Making proteins in mammalian expression systems can also be a painful experience. Sometimes the proteins just don’t express, and even when they do, the yields are often lower than you would like for bioproduction purposes. So where do you go next when faced with this problem?
Part of the challenge, we often find, is that no distinction is made between a standard cloning plasmid and a plasmid used for large-scale protein manufacture.
The two tasks (basic research vs bio-production) are very different. Yet plasmids made simply to get an experiment done quickly, with the gene inserted in the most ‘convenient’ way rather than the ‘optimal’ way, frequently make their way to the bioproduction team, leaving them with major headaches and significant expression challenges. In essence, the genetics are simply poor.
A mission to build something universally powerful
We decided that instead of creating just ‘another expression vector’, we would develop a dedicated bioproduction protein expression vector with the following key features:
- The ability to easily change the vector to suit the bio-production needs (i.e. a modular design)
- Reduced protein-to-protein variability, so if you express 10 proteins the question becomes ‘how much did I get?’ not ‘did it express?’
- Protein yields that exceed those of standard, commonly available plasmids in both HEK-293 and CHO systems
- A defined locus for gene insertion, from the start codon to the stop codon, every time, so that optimised sequences outside the gene (e.g. untranslated regions and the Kozak site) can be used consistently to reduce variability
- Minimal size to improve transfection
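The defined insertion locus described in the list above can be illustrated with a minimal Python sketch, assuming a cassette in which every element outside the coding sequence (CDS) is held constant and only the start-to-stop region changes from gene to gene. The `Cassette` class, its field names and the `N` placeholder UTRs are hypothetical; GCCACC is the widely used Kozak consensus preceding the ATG, not necessarily the Kozak site used in SnapFast Pro.

```python
from dataclasses import dataclass

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


@dataclass(frozen=True)
class Cassette:
    """Hypothetical fixed elements flanking a start-to-stop insertion locus."""
    five_prime_utr: str
    kozak: str
    three_prime_utr: str

    def assemble(self, cds: str) -> str:
        """Insert a CDS running exactly from ATG to stop between the fixed flanks."""
        cds = cds.upper()
        if len(cds) % 3 != 0:
            raise ValueError("CDS length is not a multiple of 3")
        if not cds.startswith("ATG"):
            raise ValueError("CDS must begin with the ATG start codon")
        if cds[-3:] not in STOP_CODONS:
            raise ValueError("CDS must end with a stop codon")
        # Every codon before the final one must be a sense codon.
        for i in range(0, len(cds) - 3, 3):
            if cds[i:i + 3] in STOP_CODONS:
                raise ValueError(f"in-frame stop codon at position {i + 1}")
        # The flanks are identical for every gene; only the CDS varies.
        return self.five_prime_utr + self.kozak + cds + self.three_prime_utr


# Placeholder flanks: "N" stands in for real UTR sequence.
cassette = Cassette(five_prime_utr="N" * 10, kozak="GCCACC", three_prime_utr="N" * 10)
construct = cassette.assemble("ATGGCTTAA")
print(construct)  # NNNNNNNNNNGCCACCATGGCTTAANNNNNNNNNN
```

Because the flanks never change in this sketch, yield differences between two constructs are meant to reflect the genes themselves rather than how each one happened to be cloned.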
The approach above excludes the gene itself. That’s not to say that the gene isn’t the most important part, but we had already been developing algorithms for gene optimisation for a number of years, and we were pretty happy with how things had gone. We will discuss this further in a later article.
One thing we really wanted to avoid was making a plasmid that’s great for expressing luciferase or a GFP reporter gene but not for antibodies, GPCRs and cytokines! Increasing luciferase expression seemed like shooting fish in a barrel, but we had to start somewhere, so we initially used reporter genes to get an insight into which DNA sequences merited further testing.
We screened thousands of DNA sequences, including recombinant promoter libraries (5000 variants! See figure 1), untranslated regions, polyadenylation sequences, UCOEs (ubiquitous chromatin opening elements), MARs (matrix attachment regions), episomal systems, Kozak libraries and stop codon libraries (yep, we even tested whether stop codons matter...). We also screened thousands of genes from other organisms to find out whether they could modulate cell biology to improve protein yields.
And we found out a number of important things. Some related to specific DNA features that worked amazingly for one protein but not another, or only in certain cell types.
Some features provided a significant benefit wherever they were positioned; for others, position mattered. We also found some strange cell line dependencies. For example, in one cell line we tested, the stop codon made no difference at all, but in another, a library of 30 stop codon variants showed that the same stop codon, repeated once, twice or three times, always performed best!
Needless to say we now use three copies of that stop codon in our bio-production designs.
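The stop codon finding above amounts to a simple sequence edit: replacing the terminal stop codon of a CDS with several tandem copies of one chosen stop codon. A minimal Python sketch follows; `set_stop_codons` is a hypothetical helper, and because the article does not say which stop codon performed best, `TAA` is used purely as a placeholder.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def set_stop_codons(cds: str, stop: str, copies: int = 3) -> str:
    """Replace the terminal stop codon of an in-frame CDS with `copies` tandem copies of `stop`."""
    cds, stop = cds.upper(), stop.upper()
    if stop not in STOP_CODONS:
        raise ValueError(f"{stop} is not a stop codon")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS must be in frame and end with a stop codon")
    # Drop the original stop codon, then append the repeated one in frame.
    return cds[:-3] + stop * copies


# TAA is a placeholder; the article does not name the winning stop codon.
print(set_stop_codons("ATGGCTTGA", "TAA"))  # ATGGCTTAATAATAA
```

With `copies=1` the same helper simply swaps the stop codon identity, so looping over both parameters would give one, two and three copies of each stop codon.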
So just how universal is it?
The problem was simple: whenever we showed our potential customers data on our improvements, they would say ‘but I don’t work on antibodies’ or ‘we don’t use GPCRs’. So how do you prove that the vector you’ve made is universally beneficial for protein expression?
Not an easy one to solve! However, we have some great bioinformatics guys here at Oxford Genetics, so we asked them to identify the most popular human genes in scientific study, rank them by citations over time (PubMed hits, 2000-2016) and give us the top 150 for synthesis and expression testing.
The first problem? Stupid human gene names! Who calls a gene ‘LARGE’ just because it’s a very long gene?! Once we had removed the ‘MAX’ and ‘CAT’ artefacts, we were left with a group of genes from diverse parts of the human cell (secreted, membrane-bound, cytosolic, nuclear and mitochondrial) for expression testing.
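The ranking step, and the gene-name problem that came with it, can be sketched in a few lines of Python. This is a minimal illustration, assuming citation counts have already been retrieved for each gene symbol. The article does not describe how the artefacts were removed, so the manual exclusion list in the hypothetical `top_genes` function is just one possible approach, and the counts in the example are invented for illustration, not real PubMed data.

```python
# Gene symbols whose literature counts are inflated by ordinary English usage.
AMBIGUOUS_SYMBOLS = frozenset({"LARGE", "MAX", "CAT"})


def top_genes(hits: dict[str, int], n: int,
              exclude: frozenset[str] = AMBIGUOUS_SYMBOLS) -> list[str]:
    """Return the n most-cited gene symbols, skipping ambiguous ones."""
    kept = {sym: count for sym, count in hits.items() if sym.upper() not in exclude}
    # Sort by descending citation count, breaking ties alphabetically.
    ranked = sorted(kept, key=lambda sym: (-kept[sym], sym))
    return ranked[:n]


# Invented counts for illustration only.
hits = {"TP53": 900, "MAX": 5000, "TNF": 700, "CAT": 3000, "EGFR": 800}
print(top_genes(hits, 3))  # ['TP53', 'EGFR', 'TNF']
```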
We batch-optimised and synthesised them all in one go, then cloned the genes into our new expression system, which we had now christened ‘SnapFast Pro V1’. For comparison, we also cloned all of the genes into the commonly used expression vector pCDNA3.1.
Each plasmid pair (SnapFast Pro vs pCDNA3.1) for each gene was then transfected into HEK-293 and CHO cells and analysed by automated western blotting. SnapFast Pro consistently outperformed pCDNA3.1 by a significant margin (see figure 3); in some cases, no protein at all could be detected from pCDNA3.1.
As such, we are now confident in the ability of SnapFast Pro V1 to consistently improve protein expression, and in our ability to design modular DNA systems that reduce the nightmare of vector- and gene-dependent expression variability, so often the bottleneck in early-stage pre-clinical development or bioproduction.
SnapFast Pro V1 is the next generation of protein bioproduction plasmids, allowing maximal yields in a versatile plasmid system into which thousands of other DNA components can be easily inserted with minimal steps using a modular approach.
The vector system is now available for commercial use under licence, and our experienced DNA designers are here to help with your project. Please contact our team with any questions.