Control Theory in Synthetic Gene Circuits: A Review

Sarrah Rose
students x students
16 min read · Jun 25, 2021



Gene circuits are surprisingly similar to the electronic circuits we see in our phones, computers and other electronic devices.

They’re essentially regulatory programmes on a cellular scale: they take in specific molecular inputs, funnel that information through intermediate processing steps and produce an output. Drawing on the fact that our bodies have mastered these intricate biomolecular networks, researchers now seek to replicate these properties through synthetic gene circuits.

Why? These genetic circuits allow us to access specific genes, turning different “knobs” to control gene expression within the host organism. Through these circuits, researchers possess more precise control over gene expression in a temporal & context dependent manner, allowing them to determine whether gene expression should be initiated, interrupted or terminated at specific levels.

This level of control is incredible, especially when you consider the MAJOR effects it could have in revolutionizing healthcare (e.g. biosensors to detect chemical signatures of cancer cells, reprogramming bacteria as vehicles for drug delivery) and the environment (e.g. bioremediation of pollutants, programming microbes to convert feedstock into biofuels).

1. Problems in Genetic Circuits

As amazing as these potential applications are, we currently face major problems in scaling up these circuits to reach desired levels of complexity:

1.1 Noise

Far from being the auditory nuisance we experience in daily life, noise in the context of biological systems refers to the heterogeneity of cell composition & action, which leads to inconsistent gene expression. Intrinsic noise refers to the inherent randomness of biomolecular processes within a given cell: whether two molecules bind at any given moment is a matter of probability, causing variability in cellular interactions. Extrinsic noise, meanwhile, refers to variations in identically-regulated quantities between two different cells (e.g. gene expression, gene copy numbers).

Intuitively, cellular noise limits our ability to precisely control gene expression. This problem is often compounded by noise propagation in downstream signal cascades, deteriorating circuit performance and occasionally leading to complete circuit failure.

1.2 Retroactivity

Retroactivity refers to how a downstream genetic element can retroactively affect upstream ones, by changing circuit behaviour due to interactions from downstream modules. For instance, signalling molecules generated by an upstream circuit may later be involved in chemical interactions downstream. As a result, the signalling molecule becomes temporarily unavailable to interact with the upstream circuit, changing its dynamics and creating a “disturbance” signal.

1.3 Resource Competition

Resource competition occurs much like how we observe it in the wild. There exists a limited pool of resources, such as transcriptional & translational machinery (e.g. RNA polymerases & ribosomes) and building blocks like amino acids, that these circuits compete for. As the synthetic circuit grows, it can overload the host’s resources, resulting in cellular toxicity or a delay/reduction in circuit activity.

An important principle here is that resource loading is often at odds with the concept of “modularity”, oft referenced as a key principle of Synthetic Biology. This is because the mere presence of additional modules affects the input-output (I/O) properties of the circuits already in the host, since they now operate in a different context.

A prime example is the use of synthetic gene circuits to overexpress σ factors, proteins involved in the initiation of transcription in bacteria. As the circuit size increases, the σ factors compete to bind the limited pool of free core RNA polymerase (RNAP). This competition can couple their activities to one another and, by depleting the pool, potentially disrupt host processes.

3. Basic Equations Governing Transcriptional Networks

3.1 Transcriptional Networks

For the purpose of this review, we’ll focus on transcriptional networks — networks which demonstrate the interactions between transcription factors and genes.

Cells are bombarded with all kinds of information from the environment and their own internal activities; transcription factors help to summarise & represent these conditions in a manner that’s comprehensible to the cell. Each active transcription factor is able to bind to DNA to regulate the rate at which specific gene targets are transcribed into mRNA, which is then translated into proteins, which can then act on the environment.

Transcription is the process by which RNA polymerase produces mRNA which corresponds to the gene’s coding sequence. Promoters, regulatory sequences which precede the gene, control the rate of transcription, i.e. the number of mRNA molecules produced per unit time. Transcription factors bind to specific regions of the promoter sequence, which in turn alters the probability that RNA polymerase will bind to the promoter, and hence affects the rate at which RNA polymerase initiates transcription of the gene.

A key feature of transcription networks is the separation of timescales. On the smallest end of the scale, input factors can change transcription factor activities on a sub-second timescale. Transcription factors bind to DNA within seconds. The key processes of transcription and translation take minutes, while the accumulation of the protein product can take from minutes to hours. A consideration for these varying timescales is incredibly important when coordinating the responses between the different components of a gene circuit.

3.2 Hill Function

Positive control occurs when the transcription factor increases the rate of transcription when it binds to the promoter (activator). Comparatively, negative control is when the transcription factor suppresses the rate of transcription upon binding to the promoter (repressor).

The extent of this effect on the transcription rate is determined by the input function. If X regulates Y, the quantity of protein Y produced per unit time is a function of the concentration of X in its active form, X*:

Rate of production of Y = f(X*)

The most commonly used input function, one we’ll be seeing throughout this paper, is called the Hill function. For an activator, it takes the standard form:

f(X*) = β·(X*)^n / (K^n + (X*)^n)

In the case of an activator, i.e. transcription factor binds to promoter increasing its output, the Hill input function is seen as an S-shaped curve, that rises from zero and approaches a maximal saturated level.

K is called the activation coefficient: it sets the concentration of active X needed to significantly activate expression, with half-maximal expression reached when X* = K (loosely similar to how activation energy sets a threshold!). K’s value is hardwired, dependent on the chemical affinity between X and the promoter as well as promoter strength.

The second parameter in this function is β, the maximal expression level of the promoter. It makes sense mathematically that this level is reached at high concentrations of X*, since X* then has a high probability of being bound to the promoter, stimulating RNA polymerase to produce many mRNA transcripts per unit time.

Finally, the Hill coefficient, n, decides how steep the curve will be: a larger n leads to a more step-like function. It should be noted that the Hill function doesn’t increase indefinitely, instead approaching a limiting value at high concentrations of X*. This occurs because the probability that the activator binds to the promoter cannot exceed 1, regardless of how high X* is.

Alternatively, since a repressor allows strong transcription of a gene only when it is not bound to the promoter, the input function of a repressor is derived by considering the probability that the promoter is unbound by X. In its standard form:

f(X*) = β / (1 + (X*/K)^n)

Logically, the maximal production rate, β, is then achieved at the point at which the repressor does not bind the promoter, i.e. when X* = 0. Here K is the repression coefficient, the concentration of X* at which expression falls to half of β, again tied to promoter strength & binding affinity. Finally, n, the Hill coefficient, determines the steepness of the input function.
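
To make the roles of β, K and n concrete, here is a minimal Python sketch of the two input functions. It assumes the standard textbook Hill forms, β·X*^n / (K^n + X*^n) for an activator and β / (1 + (X*/K)^n) for a repressor. The function names and the parameter values are illustrative choices, not taken from any particular circuit.

```python
def hill_activator(x_active, beta, K, n):
    """Production rate of Y when X* acts as an activator."""
    return beta * x_active**n / (K**n + x_active**n)


def hill_repressor(x_active, beta, K, n):
    """Production rate of Y when X* acts as a repressor."""
    return beta / (1.0 + (x_active / K) ** n)


# Illustrative (assumed) parameters: maximal rate 10, threshold K = 2.
beta, K = 10.0, 2.0
for n in (1, 2, 4):
    # Below, at, and above the threshold K.
    rates = [hill_activator(x, beta, K, n) for x in (0.5, 2.0, 8.0)]
    print(f"n={n}:", [round(r, 3) for r in rates])
```

Whatever the value of n, both functions pass through β/2 at X* = K. Increasing n only sharpens the transition around that point.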

3.3 Dynamic Gene Regulation Equations

The production of protein Y is balanced by 2 mechanisms: protein degradation (its specific destruction by a specialised protein, the proteasome, in the cell) and dilution (a decrease in concentration of the protein as the cell volume continually grows).

The total rate of reduction can be modelled as follows, where αdeg represents the rate of protein degradation while αdil represents the rate of protein dilution.

αT = αdil + αdeg

The change in concentration of Y is due to the difference between its production and degradation/dilution:

dY/dt = β − αT·Y
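
As a quick numerical check of this balance, the sketch below integrates dY/dt = β − αT·Y with a simple Euler scheme. The parameter values are arbitrary assumptions chosen for illustration. Two standard results for this linear equation are worth seeing: the steady state is β/αT, and Y reaches half of that steady state after a time ln(2)/αT.

```python
import math


def simulate_protein(beta, alpha_total, y0=0.0, t_end=10.0, dt=0.001):
    """Euler integration of dY/dt = beta - alpha_total * Y, returning Y(t_end)."""
    y = y0
    for _ in range(int(round(t_end / dt))):
        y += dt * (beta - alpha_total * y)
    return y


# Assumed values: production 5 per unit time, equal dilution and degradation.
beta, alpha_dil, alpha_deg = 5.0, 0.5, 0.5
alpha_total = alpha_dil + alpha_deg

y_steady = beta / alpha_total            # analytical steady state
t_half = math.log(2) / alpha_total       # analytical response time
y_at_t_half = simulate_protein(beta, alpha_total, t_end=t_half)

print("steady state:", y_steady)
print("Y at t_half :", round(y_at_t_half, 3))
```

Note that the response time depends only on αT, not on β: a stronger promoter raises the final level but does not get there any faster.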

4. How are we dealing with these problems? Control Theory.

Control theory comes from the field of engineering: it is a set of strategies designed to improve the stability, robustness & performance of physical systems. Crucially, a feedback control system seeks to regulate the behaviour of a given dynamical system. We achieve regulation by (i) manipulating an actuated variable (ii) so that the system output follows a desired behaviour.

4.1 Early Inspirations of Feedback Mechanisms

Fundamentally, a big part of control theory relies on feedback mechanisms in a closed-loop system. Tying this into the nascent stages of Synthetic Biology nearly 20 years ago, scientists implemented two circuits built on feedback: the toggle switch & the repressilator.

4.1.1 Toggle Switch

The toggle switch uses two mutually repressing genes. Because each gene represses its own repressor, the pair forms a positive feedback loop: once one protein gains the upper hand, it reinforces its own production. This in turn leads to a bistable system that can switch between 2 possible states under certain conditions (e.g. the introduction of an inducer molecule which interferes with one of the repressor proteins).
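
Bistability is easy to see in a simulation. The sketch below assumes a common dimensionless two-gene model, du/dt = α/(1 + v^n) − u and dv/dt = α/(1 + u^n) − v, which is a modelling assumption rather than something stated in this article. The values α = 10 and n = 2 are illustrative.

```python
def simulate_toggle(u0, v0, alpha=10.0, n=2.0, t_end=50.0, dt=0.01):
    """Euler integration of two mutually repressing genes u and v."""
    u, v = u0, v0
    for _ in range(int(round(t_end / dt))):
        du = alpha / (1.0 + v**n) - u   # v represses production of u
        dv = alpha / (1.0 + u**n) - v   # u represses production of v
        u, v = u + dt * du, v + dt * dv
    return u, v


# Same circuit, two different starting points.
state_a = simulate_toggle(u0=5.0, v0=1.0)   # u starts ahead
state_b = simulate_toggle(u0=1.0, v0=5.0)   # v starts ahead

print("started with u ahead:", [round(s, 2) for s in state_a])
print("started with v ahead:", [round(s, 2) for s in state_b])
```

The same equations settle into opposite states depending only on which protein had the early lead, which is the “memory” that makes the toggle switch useful.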

4.1.2 Repressilator

The repressilator consists of 3 transcription factors that repress one another in a ring (each represses the next, and the last represses the first), forming a negative feedback system. The state of each transcription factor (e.g. “on”/“off”) determines the states of the next 2 factors, turning each gene on & off at repeated time intervals. Because of the time lag around the loop, the network oscillates, which can be observed by expressing a reporter gene such as GFP.

Using x1, x2 & x3 to represent the concentrations of the three proteins, we can model the system with the following equations. Here, a represents the maximal protein production rate constant and n is the cooperativity of the protein.
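
One commonly used dimensionless, protein-only form of this model is dx1/dt = a/(1 + x3^n) − x1, dx2/dt = a/(1 + x1^n) − x2 and dx3/dt = a/(1 + x2^n) − x3. That form is an assumption here, since it may differ in detail from the equations the article originally displayed. The sketch below integrates it with illustrative values a = 50 and n = 3, and compares against n = 1, where the feedback is too shallow to sustain oscillations.

```python
def simulate_repressilator(a=50.0, n=3.0, x0=(1.0, 2.0, 3.0), t_end=100.0, dt=0.01):
    """Euler integration of a three-gene repression ring; returns the x1 trace."""
    x1, x2, x3 = x0
    trace = []
    for _ in range(int(round(t_end / dt))):
        d1 = a / (1.0 + x3**n) - x1   # x3 represses x1
        d2 = a / (1.0 + x1**n) - x2   # x1 represses x2
        d3 = a / (1.0 + x2**n) - x3   # x2 represses x3
        x1, x2, x3 = x1 + dt * d1, x2 + dt * d2, x3 + dt * d3
        trace.append(x1)
    return trace


def late_amplitude(trace):
    """Peak-to-trough swing of x1 over the second half of the run."""
    late = trace[len(trace) // 2:]
    return max(late) - min(late)


amplitude = late_amplitude(simulate_repressilator(n=3.0))
amplitude_shallow = late_amplitude(simulate_repressilator(n=1.0))
print("n=3 swing:", round(amplitude, 2))
print("n=1 swing:", round(amplitude_shallow, 6))
```

In this simplified model, sustained oscillations need sufficiently cooperative repression (n above 2) and a large enough a. Otherwise the ring relaxes to a steady state.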

4.2 Negative Autoregulation

Autoregulation refers to the regulation of a gene by its own gene product. More specifically, negative autoregulation occurs when transcription factor X represses its own transcription. This often looks like X binding to its own promoter to inhibit the production of mRNA. Consequently, the higher the concentration of X, the lower its production rate.

Research has shown that negative autoregulation can reduce cell-cell variability (intrinsic noise) and attenuate extrinsic noise.

Other studies have also found that negative autoregulation allows for greater robustness to uncertain parameters. Robustness is a principle derived from the field of engineering: a function is robust when it stays essentially unchanged despite variation in the biochemical parameters that tend to differ from cell to cell.
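
A small experiment illustrates the robustness claim. The sketch assumes a simple model, dX/dt = β/(1 + (X/K)^n) − α·X for the autoregulated gene and dX/dt = β − α·X for the unregulated one. It then asks how much the steady state moves when the promoter strength β doubles, a stand-in for cell-to-cell parameter variation. All parameter values are illustrative.

```python
def steady_state(beta, alpha=1.0, K=1.0, n=2.0, autoregulated=True,
                 t_end=30.0, dt=0.01):
    """Integrate to steady state with or without negative autoregulation."""
    x = 0.0
    for _ in range(int(round(t_end / dt))):
        production = beta / (1.0 + (x / K) ** n) if autoregulated else beta
        x += dt * (production - alpha * x)
    return x


def fold_change(autoregulated):
    """Ratio of steady states when beta doubles from 10 to 20."""
    low = steady_state(beta=10.0, autoregulated=autoregulated)
    high = steady_state(beta=20.0, autoregulated=autoregulated)
    return high / low


print("simple regulation :", round(fold_change(False), 3))
print("neg. autoregulation:", round(fold_change(True), 3))
```

Without feedback, doubling β doubles the output. With negative autoregulation, the same perturbation shifts the steady state by a much smaller factor.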

4.3 Integral Controllers

An integral controller is able to eliminate the steady-state error that a proportional controller alone leaves behind, driving the system towards a constant set point. In this case, the steady-state error refers to the difference between the input (the desired value) and the output as time tends to infinity. The proportional controller, meanwhile, is a controller whose response is proportional to the current error.

This often takes place in 2 steps:

  1. The controller measures a quantity which reflects some undesirable deviation and integrates that quantity over time
  2. It then uses that integral to drive processes which correct the imbalance, driving the deviation to zero (e.g. through negative feedback)
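
The two steps above can be sketched on a toy system. The plant dx/dt = −x + u + d, the gain and the disturbance d are all assumptions picked for illustration. The point is only to compare a proportional law, u = gain·error, with an integral law, u = gain·∫error dt.

```python
def run_controller(kind, setpoint=1.0, disturbance=0.5, gain=2.0,
                   t_end=60.0, dt=0.01):
    """Simulate dx/dt = -x + u + disturbance under P or I control."""
    x, integral = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        error = setpoint - x
        integral += dt * error                    # step 1: integrate the deviation
        if kind == "proportional":
            u = gain * error
        else:
            u = gain * integral                   # step 2: act on the integral
        x += dt * (-x + u + disturbance)
    return x


x_p = run_controller("proportional")
x_i = run_controller("integral")
print("proportional final output:", round(x_p, 4))
print("integral final output    :", round(x_i, 4))
```

The proportional controller settles at (gain·setpoint + d)/(1 + gain), short of the set point, whereas the integral controller keeps adjusting until the error is exactly zero.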

Consequently, researchers have begun looking to synthesise these biomolecular integral controllers in vivo, improving the circuit’s robustness to uncertainty & disturbances. A potential example is antithetic integral controllers.

The controller network relies on 2 controller species, Z1 & Z2, which annihilate each other. Z1 actuates the network through X1, which in turn activates Xl. Z2 is then produced at a rate proportional to Xl and goes on to annihilate the control species Z1. This annihilation can take place either through the removal of biological activity or through the formation of a high-affinity, biologically inert dimer between Z1 & Z2.

A natural manifestation of this can be seen in an endogenous circuit that regulates the housekeeping genes (constitutive genes required for the maintenance of basic cellular function) in E. coli. The sigma factor σ70 binds to RNA polymerase, forming a complex that controls the transcription of housekeeping genes. Simultaneously, it controls production of the anti-σ70 factor, Rsd, which in turn binds σ70 with high affinity, sequestering it from the RNA polymerase enzyme and altering the transcriptional program of the cell through a negative feedback loop.

Their dynamics can be modelled with the following equation:

In this system, u represents the transcription factor while xo represents the regulated output. θ is the rate constant of the annihilation (mutual degradation) reaction between the two controller species, z1 + z2 → ∅.
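
To see why mutual annihilation gives integral action, here is a minimal sketch of an antithetic controller wrapped around a one-species process. The equations are a simplified, assumed form: dz1/dt = μ − θ·z1·z2, dz2/dt = k_sense·x − θ·z1·z2 and dx/dt = k_act·z1 − γ·x. The names mu, k_sense, k_act and gamma are hypothetical labels introduced here. Subtracting the first two equations gives d(z1 − z2)/dt = μ − k_sense·x, so at steady state the output must sit at μ/k_sense no matter what γ is.

```python
def simulate_antithetic(gamma, mu=2.0, k_sense=1.0, k_act=1.0, theta=10.0,
                        t_end=60.0, dt=0.005):
    """Euler integration of an antithetic controller regulating output x."""
    z1 = z2 = x = 0.0
    for _ in range(int(round(t_end / dt))):
        annihilation = theta * z1 * z2        # z1 + z2 -> nothing
        dz1 = mu - annihilation               # reference production of z1
        dz2 = k_sense * x - annihilation      # z2 senses the output
        dx = k_act * z1 - gamma * x           # z1 actuates the process
        z1, z2, x = z1 + dt * dz1, z2 + dt * dz2, x + dt * dx
    return x


# Same controller, two different decay rates (a parameter "disturbance").
x_nominal = simulate_antithetic(gamma=1.0)
x_perturbed = simulate_antithetic(gamma=2.0)
print("output with gamma=1:", round(x_nominal, 4))
print("output with gamma=2:", round(x_perturbed, 4))
```

Doubling the decay rate of x changes how hard the controller has to work (z1 rises), but the output returns to the same set point, μ/k_sense = 2 in this example.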

4.4 Robustness Tracking

As we move towards more multi-functional systems, the creation of genetic circuits with a capacity to track dynamic biomolecular signals will be crucial. Hsiao et al. have presented a potential system to implement this: an in vivo Protein Concentration Tracker. With this, they were able to demonstrate how negative feedback could result in “tracking behaviour”, referencing the “proportional modulation of a given protein concentration relative to that of the reference protein, over a range of reference induction levels”.

The scaffold protein recruits HK & RR, 2 specialised proteins, forming a ternary complex that results in the phosphorylation of CusR. The phosphorylated CusR acts as a transcription factor, binding to a promoter which initiates the production of the target protein, the antiscaffold. The antiscaffold, which is expressed together with GFP (a fluorescent reporter), in turn possesses domains which sequester free scaffold protein. Sequestering the scaffold leaves less of it available to drive phosphorylation, which reduces further production of the antiscaffold and closes a negative feedback loop. In this way, the negative feedback is the mechanism that generates real-time tracking behaviour.

4.5 Insulation

An insulation device could be designed to minimise the loading effects from the downstream system on the signal, u, that is received by the upstream system.

A 2 step device was proposed. As shown above, the system takes kinase z, as an input to convert inactive substrate yin into active substrate which regulates the downstream system. Phosphatase P converts yin back into y. The output cycle of the device is designed to be a high-gain negative feedback system, where high gains are realised through large substrate and phosphatase amounts. Meanwhile, the input cycle is designed to have lower amounts of substrate z and phosphatase P, such that the loading (and subsequent retroactivity) applied to u remains small.

The low-gain input system therefore utilises timescale separation, as the dynamics of a phosphorylation cycle are much faster than protein expression. Consequently, load-induced delays occur at the faster time scale of the z-cycle (seconds) and are therefore negligible in the time scale of the input (minutes to hours).

The dynamics of such a system could be modelled as below:

This ultimately shows that as the gain G increases, the contribution of s (retroactivity to the output) becomes negligible when compared to the contribution of u (the signal from the upstream system) to y.
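
A stripped-down way to see this claim is the static high-gain feedback relation y = G·(u − y) + s, which solves to y = (G·u + s)/(1 + G). This abstraction is an assumption made for illustration and is not necessarily the model the article originally displayed. The sketch simply evaluates it for growing G.

```python
def insulated_output(u, s, gain):
    """Output of y = gain * (u - y) + s, solved for y."""
    return (gain * u + s) / (1.0 + gain)


u, s = 1.0, 0.5   # assumed upstream signal and retroactivity disturbance
for gain in (1.0, 10.0, 1000.0):
    y = insulated_output(u, s, gain)
    print(f"G={gain:>6}: y={y:.4f}, deviation from u={abs(y - u):.4f}")
```

The weight on s is 1/(1 + G) while the weight on u is G/(1 + G), so raising the gain pushes the output towards the upstream signal and away from the load.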

4.6 Incoherent Feed Forward Loops

A feed forward loop is composed of 3 nodes: a transcription factor X regulates a second transcription factor Y, and both X and Y regulate gene Z. Thus, the feed-forward loop has two parallel regulation paths from the input to the output: a direct path from X to Z and an indirect path that goes through Y.

In incoherent feedforward loops, the sign of the indirect path is opposite to that of the direct path. This allows one path to compensate for the input transmitted by the other, so that the output can approximately reject constant disturbances in the system.

For example, the plasmid copy number (i.e. the average expected number of plasmid copies per host cell), d, “activates” the protein, y, that the plasmid expresses. Simultaneously, d results in the expression of an intermediate protein, x, which in turn represses the protein of interest, y. If the two feedforward branches are balanced, their opposite directions result in a null contribution from d to y, resulting in complete disturbance rejection.

This is modelled by the following equations:

Protein x is a transcriptional repressor of protein y, where k represents the dissociation constant (used to evaluate and rank order strengths of bimolecular interactions) of the binding of protein x to the promoter site regulating protein y. Hence, y remains dependent on d unless k is negligible compared with the steady-state level of x, which itself scales with d. This demonstrates how the two branches need to be in perfect equilibrium for the disturbance to be completely rejected.
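
The balance condition can be checked with a few lines of arithmetic. The sketch assumes a simple steady-state model in which x is produced in proportion to d (x = α·d/δ) and y is produced in proportion to d but repressed by x (y = β·d / (δ·(1 + x/k))). The symbols α, β and δ are hypothetical rate constants introduced here for illustration.

```python
def iffl_steady_state(d, alpha=1.0, beta=1.0, delta=1.0, k=0.01):
    """Steady state of an incoherent feed-forward loop driven by copy number d."""
    x = alpha * d / delta                        # repressor level scales with d
    y = beta * d / (delta * (1.0 + x / k))       # d activates y, x represses y
    return x, y


def output_ratio(k):
    """How much y changes when the copy number doubles from 10 to 20."""
    _, y_low = iffl_steady_state(d=10.0, k=k)
    _, y_high = iffl_steady_state(d=20.0, k=k)
    return y_high / y_low


ratio_tight = output_ratio(k=0.01)    # strong repression: k much smaller than x
ratio_weak = output_ratio(k=100.0)    # weak repression: k much larger than x
print("tight binding:", round(ratio_tight, 4))
print("weak binding :", round(ratio_weak, 4))
```

When repression is tight, doubling the copy number barely moves y. When it is weak, y follows d almost proportionally and the disturbance passes straight through.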

4.7 Multicellular coordination

In control theory terms, multi-cellular control can be viewed as a type of cooperative control, in which a multi-agent system coordinates through cell-cell communication and the decisions of the individual agents make up the collective behaviour of the population. Studies have shown that a multitude of environments / input signals can be used to trigger programmed collective cell behaviour, such as spatial patterning, dark-light edge detection, and microbial consortia (dynamics of interactions between multiple cell strains).

Multicellular coordination has also been shown to reduce population-level heterogeneity in gene expression through population averaging, improving the functionality of these circuits.

4.7.1 Population Control

A key mechanism for multi-cellular coordination is the internal quorum sensing system. A specialised protein within the cell is constantly produced, to catalyse the formation of acyl-homoserine lactones (AHLs). AHLs act as “signalling” molecules for the cell, with an increase in intracellular AHL concentration being “sensed” by the cell as a proxy for an increase in cell population. Once the cell population has surpassed a certain threshold, the cells execute a consensus-like protocol, collectively carrying out a specific cellular response.

For instance, multicellular coordination can be programmed for negative feedback. In certain cell systems, AHL binds to a specialised protein, LuxR, to activate a “killer gene” which initiates cell lysis. As a result, an increase in the intracellular concentration of AHL leads to an increased activation of the killer gene, decreasing cell population size, and closing the feedback loop.
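
A toy model shows how this feedback caps the population. The sketch assumes logistic growth with an AHL-dependent killing term, dN/dt = r·N·(1 − N/Nmax) − c·A·N, and AHL produced by the cells, dA/dt = v·N − dA·A. The killer protein is lumped into the AHL term for simplicity, and all names and values are illustrative assumptions.

```python
def simulate_population(kill_rate, growth=1.0, capacity=1.0,
                        ahl_production=2.0, ahl_decay=1.0,
                        t_end=50.0, dt=0.01):
    """Euler integration of cell density n under AHL-mediated killing."""
    n, ahl = 0.01, 0.0
    for _ in range(int(round(t_end / dt))):
        dn = growth * n * (1.0 - n / capacity) - kill_rate * ahl * n
        dahl = ahl_production * n - ahl_decay * ahl   # more cells, more AHL
        n, ahl = n + dt * dn, ahl + dt * dahl
    return n


n_circuit_off = simulate_population(kill_rate=0.0)
n_circuit_on = simulate_population(kill_rate=2.0)
print("density without circuit:", round(n_circuit_off, 3))
print("density with circuit   :", round(n_circuit_on, 3))
```

Without the circuit the culture grows to its carrying capacity. With it, the population settles at a lower density set by the circuit parameters rather than by the nutrients.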

4.7.2 Pattern Formation

Synthetic circuits designed for pattern formation could form the biomaterials of the future: materials that can self-organise into patterns of biological entities. One key example was a circuit, reported in 2005, pre-programmed to form a bullseye-like pattern. Sender cells are placed in the centre of the plate and constantly produce AHL molecules. AHL diffuses into the nearby “receiver cells”, binding to LuxR, an AHL-dependent transcriptional regulator, and activating the expression of the lambda repressor and Lac repressor (responsible for the repression of the GFP gene).

The concentration of AHL can be used as a proxy for the distance of the receiver cells from the sender cells. At high concentrations of AHL, i.e. in the ring of receiver cells closest to the centre, cellular levels of the lambda repressor and Lac repressor are high, repressing GFP expression. At intermediate concentrations of AHL, the lambda repressor, which has a higher repression efficiency, still shuts down the second copy of LacI that it controls, while the AHL-induced Lac repressor is too scarce to switch off GFP, so these cells fluoresce. Finally, at low AHL concentrations, the lambda repressor & AHL-induced Lac repressor are expressed only at basal levels, so the second copy of LacI is expressed freely, which in turn suppresses GFP expression.
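
The band-detect logic described above can be captured with three nested Hill functions. This is a steady-state caricature with made-up thresholds, not the published model: one LuxR-driven output sets both the lambda repressor and the AHL-induced LacI, the lambda repressor tightly represses a second LacI copy, and total LacI represses GFP.

```python
def activation(x, K, n=2.0):
    """Normalised Hill activation (0 to 1)."""
    return x**n / (K**n + x**n)


def repression(x, K, n=2.0):
    """Normalised Hill repression (1 to 0)."""
    return 1.0 / (1.0 + (x / K) ** n)


def receiver_gfp(ahl):
    """Relative GFP level of a receiver cell at a given AHL concentration."""
    luxr_output = activation(ahl, K=1.0)          # AHL-bound LuxR activity
    lambda_repressor = luxr_output
    laci_induced = luxr_output                    # AHL-induced LacI copy
    laci_second_copy = repression(lambda_repressor, K=0.05)  # tight repression
    total_laci = laci_induced + laci_second_copy
    return repression(total_laci, K=0.3)


for ahl in (0.01, 0.4, 10.0):
    print(f"AHL={ahl:>5}: GFP={receiver_gfp(ahl):.3f}")
```

GFP is only high in the intermediate band, where the lambda repressor has already silenced one LacI source but the other has not yet built up. On a plate, that band appears as a fluorescent ring at a fixed distance from the senders.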

5. Conclusion

The potential of synthetic gene circuits is incredibly exciting, as they could enable so many revolutionary breakthroughs in improving human health and conserving the environment. The future of this field is still not entirely clear: we’ve managed to build simple synthetic circuits such as oscillators & toggle switches, but still face immense difficulty in scaling up and crafting increasingly complex circuits due to cellular noise, retroactivity and resource competition. Control theory thus presents itself as an incredibly important conceptual field to integrate into the design of these circuits, decreasing the uncertainty of these systems and in turn improving reproducibility and predictability. If we eventually do achieve a precise quantification of these synthetic circuits, their potential to alter the way we engineer biology will be incredible.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Hey, I’m Sarrah Rose! A 17-year-old deeply passionate about utilising Synthetic Biology & Artificial Intelligence to solve major problems in the world today. If you enjoyed this article or would just like to chat, I’d love to hear from you:

email: sarrahrose04@gmail.com || twitter || Linkedin
