Linearity and linear systems are important in science and engineering for two reasons:
Linear systems provide a rich foundation for understanding and are natural to think about and design, even when the underlying physics is nonlinear. Furthermore, we can and often do design electronic and other systems to provide linear control of something that is nonlinear underneath.
Perhaps most importantly, many nonlinear systems can be analyzed by breaking them down into a number of locally linear systems, as we'll see shortly.
Linear might be defined as "in a line" or "in proportion to," but we can think about linearity at a number of levels. All of these layers of conceptual depth give rise to techniques for understanding and designing electronics, and for engineering at large.
Let's build from the simplest cases upwards.
Consider the slope-intercept form of a line in the 2D x-y plane:
$$y=m x + b$$
This equation has explicit independent ($x$) and dependent ($y$) variables, input and output. It has two constants $m, b$ which define the line. We can intuitively understand the relationship by inspection:
The slope-intercept form is convenient because we can plug in our independent variable and solve "forwards" (doing just a multiplication and addition) to get the other variable.
However, if you've studied algebra, you've also seen the standard form of the equation of a line:
$$A x + B y = C$$
This form is nice because it keeps all the terms with variables on the left-hand side, and keeps the fixed constant $C$ on the right. It extends well to situations with more than just the two variables here.
In this equation, there are two variables $x,y$ within one equation. One equation means one constraint. The solution space has 2 variables - 1 equation = 1 degree of freedom. (We'll build on this concept in Systems of Equations.)
The standard form of the line is equivalent to a matrix equation:
$$ \begin{align} \mathbf{G} \mathbf{w} & = \mathbf{H} \\ \begin{bmatrix}A & B\end{bmatrix} \begin{bmatrix}x \\ y\end{bmatrix} & = \begin{bmatrix}C\end{bmatrix} \end{align} $$
In the standard form, the relationship between input and output (between independent and dependent variables) is not as obvious as it was in the slope-intercept form. Regardless, it still describes exactly the same line and has exactly the same solutions.
As an example, consider the line:
$$y = 5 x + 2$$
We can transform this equation from slope-intercept form to standard form and matrix form:
$$ \begin{align} -5 x + y & = 2 \\ \begin{bmatrix}-5 & 1\end{bmatrix} \begin{bmatrix}x \\ y\end{bmatrix} & = \begin{bmatrix}2\end{bmatrix} \end{align} $$
This equation describes the entire line, not just a particular point on it. There are infinitely many solutions; there are infinitely many points on a line. That means that $\begin{bmatrix}0 \\ 2\end{bmatrix}, \begin{bmatrix}1 \\ 7\end{bmatrix}, \begin{bmatrix}2 \\ 12\end{bmatrix}$, and any other points on that line, are all possible solutions to this matrix equation.
Let's shift our focus from all of the solutions on the line to isolating one particular solution.
How do we find the value of $y$ when $x = 3$? In the slope-intercept form, this was obvious enough: just substitute in the number, multiply, and add.
In the matrix case, we can instead add a very simple new equation $x=3$ to the system, which adds a new row to both the $\mathbf{G}$ and $\mathbf{H}$ matrices:
$$\begin{bmatrix}-5 & 1 \\1 & 0\end{bmatrix} \begin{bmatrix}x \\ y\end{bmatrix} = \begin{bmatrix}2 \\ 3\end{bmatrix}$$
In the next section Systems of Equations, we'll talk more directly about how to solve this, but for now it suffices to say that given a square $N \times N$ matrix $\mathbf{G}$ and an $N \times 1$ vector of constants $\mathbf{H}$, it is possible to solve $\mathbf{G} \mathbf{w} = \mathbf{H}$ for a unique solution vector $\mathbf{w}$ (if a single unique solution exists). Note that, depending on $\mathbf{G}$, there are cases in which there are zero or infinitely many solutions.
In the case above, the only solution is:
$$\mathbf{w} = \begin{bmatrix}x \\ y \end{bmatrix} = \begin{bmatrix}3 \\ 17\end{bmatrix}$$
This single point is a unique solution to the 2x2 system of equations above.
This technique works equally well when we want to solve "backwards", for example to find the value of $x$ on our line when $y=5$. We'd just set up the matrix equation:
$$\begin{bmatrix}-5 & 1 \\0 & 1\end{bmatrix} \begin{bmatrix}x \\ y\end{bmatrix} = \begin{bmatrix}2 \\ 5\end{bmatrix}$$
and solve to find the solution:
$$\begin{bmatrix}x \\ y\end{bmatrix} = \begin{bmatrix}\frac{3}{5} \\ 5\end{bmatrix}$$
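The two $2 \times 2$ solves above can be checked with a short sketch. It uses Cramer's rule, a swapped-in method that only suits tiny systems and is not necessarily the approach used in the Systems of Equations section, together with Python's `fractions.Fraction` so that $\frac{3}{5}$ comes out exactly. The helper name `solve_2x2` is hypothetical.

```python
from fractions import Fraction

def solve_2x2(G, H):
    """Solve G w = H for a 2x2 matrix G using Cramer's rule.

    Raises ValueError when det(G) == 0, i.e. when there are zero or
    infinitely many solutions rather than a unique one.
    """
    (a, b), (c, d) = G
    h1, h2 = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("no unique solution: det(G) == 0")
    x = Fraction(h1 * d - b * h2, det)
    y = Fraction(a * h2 - c * h1, det)
    return x, y

# The line -5x + y = 2, plus the extra constraint x = 3 (solving "forwards").
print(solve_2x2([[-5, 1], [1, 0]], [2, 3]))   # (Fraction(3, 1), Fraction(17, 1))

# The same line, plus the constraint y = 5 (solving "backwards").
print(solve_2x2([[-5, 1], [0, 1]], [2, 5]))   # (Fraction(3, 5), Fraction(5, 1))
```

A zero determinant corresponds to the cases mentioned above where the system has zero or infinitely many solutions.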
Even in this simplest case where we started with just a single equation of a line, we can encapsulate both variables $x$ and $y$ into a single vector $\mathbf{w} = \begin{bmatrix}x \\ y\end{bmatrix}$ which describes possible values within the space of the system.
The big idea here is that even a simple single equation for a line can be treated more generally as a system of equations. A matrix equation is a very compact way of writing a system of linear equations. If we treat it as "easy" to solve the matrix equation $\mathbf{G} \mathbf{w} = \mathbf{H}$ (and it is easy, especially for computers), then terms like "input", "output", "dependent variable" and "independent variable" fall away. What remains is a bigger concept, a system, which is itself linear and can be built upon to higher levels of complexity.
Consider this system of four equations and four variables:
$$ \begin{align} V_1 & = 5 \\ I_2 & = \frac {1} {10} V_1 \\ I_3 & = 2 \\ I_1 & = I_2 - I_3 \end{align} $$
If you've studied electronics before you'll see these equations represent a simple electronic circuit:
(We'll talk in a future section Solving Circuit Systems about how to go from this schematic to this set of equations, so don't worry about that yet if you're confused! We'll also dig into how node voltages, branch currents, and terminal currents are related and named in Labeling Voltages, Currents, and Nodes.)
Can you solve this system of equations? This particular system was chosen to be very easy because you can just read down the list and solve for one variable at a time. (This isn't usually the case, but we designed it to make this example easier!)
$$ \begin{align} V_1 & = 5 && \text{(directly)} \\ I_2 & = 0.5 && \text{(from plugging in}\ V_1\text{)} \\ I_3 & = 2 && \text{(directly)}\\ I_1 & = -1.5 &&\text{(from plugging in}\ I_2\ \text{and}\ I_3\text{)} \end{align} $$
We've hidden the idea of independent and dependent variables here, but now suppose that the first equation $V_1 = 5$ is an input that we can control. For example, perhaps we have a knob on an adjustable voltage source and can change the voltage from 5 to 5.5 or 6.
Now, what happens to $I_1$ when $V_1$ is increased by a small amount, let's call it $\Delta x$? We can solve through algebra, simply substituting $V_1 = (5 + \Delta x)$ in place of $V_1 = 5$ and following through:
$$ \begin{align} V_1 & = 5 + \Delta x \\ I_2 & = \frac {1} {10} (5 + \Delta x) \\ I_3 & = 2 \\ I_1 & = \frac {1} {10} (5 + \Delta x) - 2 \end{align} $$
We can look at things in terms of these "deltas," or changes, to see how much the other variables change in response to the one we're adjusting. We do that by subtracting each variable's original reference value (from before we added $\Delta x$):
$$ \begin{align} \Delta V_1 & = \Delta x \\ \Delta I_2 & = \frac {1} {10} \Delta x \\ \Delta I_3 & = 0 \\ \Delta I_1 & = \frac {1} {10} \Delta x \end{align} $$
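The same perturbation can be done numerically instead of algebraically. This minimal sketch evaluates the example equations (reading down the list, as above) at $V_1 = 5$ and at $V_1 = 5 + \Delta x$, then divides each difference by $\Delta x$. The helper `solve_circuit` and the step size `dx` are illustrative choices, not part of the original circuit.

```python
def solve_circuit(v1):
    """Read down the four example equations, solving one variable at a time."""
    i2 = v1 / 10          # I2 = (1/10) V1
    i3 = 2.0              # I3 = 2 (fixed current source)
    i1 = i2 - i3          # I1 = I2 - I3
    return {"V1": v1, "I2": i2, "I3": i3, "I1": i1}

dx = 0.5
base = solve_circuit(5.0)
perturbed = solve_circuit(5.0 + dx)

# Each delta divided by dx is the sensitivity of that variable to V1.
for name in base:
    delta = perturbed[name] - base[name]
    print(name, delta / dx)
```

Because the system is linear, these ratios do not depend on the size of `dx` (apart from floating-point rounding), which is why a fairly large step works here.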
With all these deltas, we now know that if we increase $V_1$ by a little bit, we can say how that is going to affect all the other variables in the system.
We can use CircuitLab's Frequency Domain simulation mode to plot the constants associated with these deltas directly:
Interactive Exercise Click the circuit, click "Simulate," and "Run Frequency-Domain Simulation." The lines for $I_2$ and $I_1$ will both be plotted at $\frac {1} {10}$, which is exactly the slope $\frac {\Delta I_2} {\Delta x} = \frac {\Delta I_1} {\Delta x}$ we found in our deltas relationship above, while $I_3$ will be at 0. Though we aren't using any frequency-dependent circuit elements (such as capacitors or inductors), this shows the power of the frequency-domain simulation mode to quickly analyze the linearized relationships between circuit variables.
We can also come up with a single algebraic relationship between any dependent variable and our independent variable (hiding the rest of the system) because everything is linear.
Suppose $V_1 = c$. Can you write $I_1$ as a function of only $c$ and some constants?
$$I_1 = \frac {1} {10} c - 2$$
If you open the same circuit again, you can plot that line from the simulation:
Interactive Exercise Click the circuit, click "Simulate," then click "DC Sweep" to open a mode where the simulator evaluates the circuit for a range of different input values, and finally "Run DC Sweep." You'll see exactly that line plotted, with various values of $V_1$ on the x-axis and the current $I_1$ on the y-axis. Mouse over the graph to see that at $V_1=5$, we have $I_1=-1.5$, and look at the slope of the relationship.
We can and will consider systems with multiple inputs and multiple outputs, but the overall linear system concept is that:
In matrix form we can write:
$$\mathbf{y} = \mathbf{A} \mathbf{x} + \mathbf{b}$$
where $\mathbf{x}$ is an $N \times 1$ column vector of independent variables (inputs), $\mathbf{A}$ is an $M \times N$ matrix of coefficients, $\mathbf{b}$ is an $M \times 1$ column vector of constants, and $\mathbf{y}$ is the $M \times 1$ column vector of dependent variables (outputs). If you're scared of matrices, don't be! This is just a compact way to write:
$$ \begin{equation} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1N} \\ a_{21} & a_{22} & \ldots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M1} & a_{M2} & \ldots & a_{MN} \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix} \end{equation} $$
For the circuit above, if we selected $V_1$ to be our independent variable, we could write this "matrix slope-intercept form" like this:
$$ \begin{equation} \begin{bmatrix} I_2 \\ I_3 \\ I_1 \end{bmatrix} = \begin{bmatrix} \frac {1} {10} \\ 0 \\ \frac {1} {10} \end{bmatrix} \begin{bmatrix} V_1 \end{bmatrix} + \begin{bmatrix} 0 \\ 2 \\ -2 \end{bmatrix} \end{equation} $$
Alternatively, it would look like this if we decided that both $V_1$ and $I_3$ were our independent variables:
$$ \begin{equation} \begin{bmatrix} I_2 \\ I_1 \end{bmatrix} = \begin{bmatrix} \frac {1} {10} & 0\\ \frac {1} {10} & -1 \end{bmatrix} \begin{bmatrix} V_1 \\ I_3 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation} $$
This is just a linear equation in a higher-dimensional space, which means it contains multiple simultaneously-true equations. It may look advanced, but it's really saying the same thing as the four equations above (although we've pulled out $V_1=5$ and $I_3=2$ as independent input variables $\mathbf{x} = \begin{bmatrix}V_1 \\ I_3\end{bmatrix} = \begin{bmatrix}5 \\ 2\end{bmatrix}$).
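As a small sketch of the matrix slope-intercept form, here is a hypothetical helper `affine` that evaluates $\mathbf{y} = \mathbf{A} \mathbf{x} + \mathbf{b}$ with plain Python lists, applied to both choices of independent variables above. Exact `Fraction` arithmetic keeps $\frac{1}{10}$ from picking up floating-point rounding.

```python
from fractions import Fraction

def affine(A, x, b):
    """Return y = A x + b, with A as a list of rows and x, b as plain lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

tenth = Fraction(1, 10)

# One input, x = [V1]; outputs y = [I2, I3, I1].
A_one = [[tenth], [0], [tenth]]
b_one = [0, 2, -2]
print(affine(A_one, [5], b_one))      # [Fraction(1, 2), 2, Fraction(-3, 2)]

# Two inputs, x = [V1, I3]; outputs y = [I2, I1].
A_two = [[tenth, 0],
         [tenth, -1]]
b_two = [0, 0]
print(affine(A_two, [5, 2], b_two))   # [Fraction(1, 2), Fraction(-3, 2)]
```

Both choices give $I_2 = 0.5$ and $I_1 = -1.5$, matching the values found by reading down the list of equations.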
No matter how we rearrange it, and no matter what we choose to include as dependent or independent variables, the system itself is linear. Just as we did in Level 1, rearranging between the input-output form $\mathbf{y} = \mathbf{A} \mathbf{x} + \mathbf{b}$ and the standard form is possible.
We can rearrange all equations with all the multiplicative terms -- regardless of whether they're dependent or independent variables -- on the left hand side, and just a single constant on the right. Here are the four original equations where we've just subtracted to move all coefficient-times-variable terms to one side and all constants to the other:
$$ \begin{align} V_1 & = 5 \\ I_2 - \frac {1} {10} V_1& = 0 \\ I_3 & = 2 \\ I_1 - I_2 + I_3 & = 0 \end{align} $$
This can be set up as a matrix equation $\mathbf{G} \mathbf{w} = \mathbf{H}$, where $\mathbf{w}$ includes both the dependent and independent variables.
$$ \begin{equation} \begin{bmatrix} 1 & 0 & 0 & 0 \\ -\frac {1} {10} & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} V_1 \\ I_2 \\ I_3 \\ I_1 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \\ 2 \\ 0 \end{bmatrix} \end{equation} $$
This is in a form that's very easy for a computer to solve.
Notice that this particular problem was chosen to be an easy toy problem that you could solve by hand, just reading down the list of equations one at a time. That gives this particular matrix $\mathbf{G}$ a special shape: it's a "lower triangular" matrix, because all the entries above and to the right of the main diagonal are 0. $\mathbf{G}$ is not lower triangular in general; we picked our problem here intentionally. The shape depends not just on the equations, but also on the order we put them in and the order in which we map variables to columns. We'll talk more about triangular matrices and how they're used for solving any system in the Systems of Equations section.
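Because this $\mathbf{G}$ is lower triangular, it can be solved by forward substitution, which is just the "read down the list" procedure written as a loop. This is a minimal sketch assuming a lower-triangular matrix with nonzero diagonal entries; general matrices need a more complete solver, and `forward_substitution` is a hypothetical helper name.

```python
from fractions import Fraction

def forward_substitution(G, H):
    """Solve G w = H for lower-triangular G with nonzero diagonal entries."""
    n = len(G)
    w = [Fraction(0)] * n
    for i in range(n):
        # Subtract the contributions of the already-solved unknowns w[0..i-1].
        known = sum(G[i][j] * w[j] for j in range(i))
        w[i] = (Fraction(H[i]) - known) / G[i][i]
    return w

tenth = Fraction(1, 10)
# Columns are ordered [V1, I2, I3, I1], as in the matrix equation above.
G = [[1,      0, 0, 0],
     [-tenth, 1, 0, 0],
     [0,      0, 1, 0],
     [0,     -1, 1, 1]]
H = [5, 0, 2, 0]

V1, I2, I3, I1 = forward_substitution(G, H)
print(V1, I2, I3, I1)   # 5 1/2 2 -3/2
```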
From an electronics perspective, look at which conditions generate which equations, especially after you read through Chapter 2. For example, the voltage and current sources generate the 1st and 3rd rows, with nonzero constant terms in $\mathbf{H}$:
$$ \begin{align} V_1 & = 5 \\ I_3 & = 2 \end{align} $$
And Kirchhoff's Current Law summing currents produces the 4th equation, with several $\pm 1$ coefficients on currents summing to zero:
$$I_1 - I_2 + I_3 = 0$$
And finally, the second equation came from Ohm's Law, also summing to zero constant right-hand side:
$$I_2 - \frac {1} {10} V_1 = 0$$
The big idea here is that systems of multiple simultaneous linear equations can be written in many ways, but the standard matrix equation form $\mathbf{G} \mathbf{w} = \mathbf{H}$ encompasses all of them without rearranging to any particular input-output form. Computers are really good at quickly solving this linear equation (if $\mathbf{G}$ is square and certain other conditions are met) even if there are 100 or 1000 simultaneous equations and unknowns.
The full space of possible $\mathbf{w}$ values (in multiple dimensions) is constrained by each equation to obey some linear relationship between variables. As each constraint between variables (i.e. each equation) is linear, the system as a whole must also be linear, regardless of how big or complicated-looking it gets!
Thinking of the whole system or network as linear (rather than just a single linear input-output relationship) is important because many analysis techniques we'll use rely on us redefining inputs and outputs on the fly as we examine pieces of our circuits.
Continuing with our example circuit from above, we have an idea of what happens if we change $V_1$ from 5 to 6; we can simply rewrite our equations and matrices and solve again.
But what if the input is changing as a function of time? We can introduce another variable $t$ which is not an unknown in our matrix, but is a parameter to the input (and therefore will be a parameter to the output too). Let's choose our input to be a signal, which can be defined as a function of time:
$$V_1(t) = 5 + \sin(2 \pi t)$$
We now write $V_1(t)$ instead of just $V_1$ to remind ourselves that this is a function of time.
Given this input, what is our output $I_1(t)$? In this particular case, our system is memoryless, which means nothing depends on any previous value of the state of the system (such as an integral or derivative). Additionally, the system is time invariant because nothing in the system depends on the parameter $t$ itself. (For the future, note that most electronic systems of interest will be time invariant, but will not be memoryless, because of the presence of derivative and/or integral terms from capacitors or inductors. However, a memoryless system reaches its new steady-state equilibrium instantly, making it easy to think about for this example.) Because these two properties are true, our original solution:
$$I_1 = \frac {1} {10} V_1 - 2$$
can be translated directly into treating these variables as functions of time:
$$I_1(t) = \frac {1} {10} V_1(t) - 2$$
Now, if we substitute in our definition of $V_1(t)$ above:
$$ \begin{align} I_1(t) & = \frac {1} {10} \big( 5 + \sin(2 \pi t) \big) - 2 \\ I_1(t) & = \frac {1} {10} \sin(2 \pi t) - 1.5 \end{align} $$
Try it in this simulation:
Interactive Exercise Click the circuit above, then click "Simulate," and "Run Time-Domain Simulation." Note that you can change the voltage signal to be whatever you want. The system is perfectly linear, and nothing within the system itself depends on time as a variable, so the shape of the signal doesn't matter: the output will always follow $I_1(t) = \frac {1} {10} V_1(t) - 2$.
The big idea here is that if the system obeys certain properties, then we can broaden our thinking: from reasoning about our system as a function $x \rightarrow F(x) \rightarrow y$ where $x,y$ are numbers, to thinking about our system as a capital-S System which takes a signal (a function of time) in and outputs another signal, $x(t) \rightarrow F(x(t)) \rightarrow y(t)$.
Let's take a slight detour and hint at some amazing things to come. Instead of having $V_1(t)$ be a single sine wave, what if it were the sum of two different sine waves at two different frequencies?
$$V_1(t) = 5 + \sin(2 \pi t) + 0.1 \sin(2 \pi 10 t)$$
We've now added a small amplitude (0.1) but faster (10 Hz) wave on top of our previous one. Since the system is still linear, so is the output:
$$ \begin{align} I_1(t) & = \frac {1} {10} \big( 5 + \sin(2 \pi t) + 0.1 \sin(2 \pi 10 t) \big) - 2 \\ & = \frac {1} {10} \sin(2 \pi t) + \frac {1} {100} \sin(2 \pi 10 t) - 1.5 \end{align} $$
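To see the system acting on whole signals, this minimal sketch samples the two-tone $V_1(t)$, applies the memoryless relationship $I_1 = \frac {1} {10} V_1 - 2$ sample by sample, and compares the result with the closed-form $I_1(t)$ just derived. The 1 kHz sampling rate and 2-second span are arbitrary choices for illustration.

```python
import math

def system(v1):
    """Memoryless example circuit: I1 = V1/10 - 2, applied to one sample."""
    return v1 / 10 - 2

def v1_signal(t):
    return 5 + math.sin(2 * math.pi * t) + 0.1 * math.sin(2 * math.pi * 10 * t)

def i1_expected(t):
    # Closed form: (1/10) sin(2 pi t) + (1/100) sin(2 pi 10 t) - 1.5
    return (math.sin(2 * math.pi * t) / 10
            + math.sin(2 * math.pi * 10 * t) / 100
            - 1.5)

# Sample 2 seconds at 1 kHz (arbitrary choices for this sketch).
times = [k / 1000 for k in range(2000)]
i1_samples = [system(v1_signal(t)) for t in times]

max_err = max(abs(i - i1_expected(t)) for i, t in zip(i1_samples, times))
print(max_err)  # limited only by floating-point rounding
```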
In addition to showing this combined input algebraically, we can show it schematically:
Compare both schematics above. They both do the same thing.
This represents a combination of inputs in the frequency domain. For a linear time-invariant system like this one, no new frequencies are created: the output contains only the frequencies present in the input.
This particular example has a flat frequency response, which means it doesn't matter whether the input is 1 Hz or 10 Hz: each sinusoidal component of the input is scaled by the same factor, $\frac {1\ \text{A}} {10\ \text{V}}$, in either case. However, even in other cases with non-flat frequency response, the idea of a linear combination of sine waves is a useful one.
If you've seen Laplace and/or Fourier Transforms, you may see where this line of thought is going, but we'll put it away for now.
The big idea here is that if we take a linear system and put in the sum of two sine waves at two different frequencies, our output will also have the sum of two sine waves at those same two frequencies. (The amplitude and/or phase of each may be different depending on each frequency.)
Let's take a closer look at a few definitions:
In the earlier Level 3 example, we considered a memoryless system which had no derivative or integral terms, but in many physical situations there is a differential equation that includes both $x(t)$ and $\frac {dx}{dt}$.
The derivative $\frac {dx} {dt}$ needs to know the variable's own earlier value $x(t - \Delta t)$ (even if only an infinitesimally small $\Delta t$ earlier) in order to have a derivative, so the situation is no longer memoryless. The derivative has memory, so prior values of the input will affect the present value of the output.
Taking a derivative is a linear operator. This might surprise you. You might have done some derivatives and know that, for example:
$$\frac {d} {dt} \sin(t) = \cos(t)$$
Since $\sin(t)$ and $\cos(t)$ aren't proportional to each other, how are they possibly linear? But that's not what linear means when we talk about a linear operator.
If $F$ is a linear operator, then:
$$F \big( a \cdot x(t) + b \cdot y(t)\big) = a \cdot F \big( x(t) \big) + b \cdot F \big( y(t) \big)$$
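A linear operator scales with a constant factor (scaling the input by a constant $a$ scales the output by the same $a$), and it distributes over a sum (the output for a sum of inputs is the sum of the individual outputs).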
A derivative follows these properties, so a derivative is a linear operator.
Suppose we go back to our single equation from Level 1 without its intercept, $y = m x$, and change it to be:
$$y(t) = m \cdot x(t) + n \cdot \frac {dx(t)} {dt}$$
We now have a single equation that describes how a system's input $x(t)$ is related to its output $y(t)$, but we now have a coefficient for the time derivative of the input. This is extremely practical in electronics because, as we will see later, capacitors and inductors create time derivatives of their inputs in equations, which we call differential equations.
Is this really linear? For example, suppose we have a particular input signal:
$$x_1(t) = \sin(2 \pi t)$$
Then, taking the derivative of the input:
$$\frac {dx_1(t)} {dt} = 2 \pi \cos(2 \pi t)$$
And substituting in:
$$y_1(t) = m \cdot \sin(2 \pi t) + 2 \pi n \cos(2 \pi t)$$
Clearly, if $n \neq 0$ then we won't have a "linear" relationship between the numerical values $x_1(t)$ and $y_1(t)$ at a given time $t$.
But again, that is not what we're talking about! We're talking about linearity in terms of the whole signal, the whole function, over all time.
We have changed our perspective in an important way: instead of the input being a single number, now the input is a signal, a function of time, and the output signal is also a function of time.
The important linear aspect here is that if we consider a second signal, for example $x_2(t) = 5$, we can find $y_2(t) = m \cdot 5$.
And now, when we create any linear combination of those two input signals:
$$x_3(t) = c_1 \cdot x_1(t) + c_2 \cdot x_2(t)$$
For any values of the two linear combination constants $c_1$ and $c_2$, the output will be:
$$y_3(t) = c_1 \cdot y_1(t) + c_2 \cdot y_2(t)$$
That's the meaning of linearity at an operator level, and taking a derivative is a linear operator.
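Here is a numerical spot-check of that superposition property for $y(t) = m \cdot x(t) + n \cdot \frac {dx(t)} {dt}$. It is a sketch under stated assumptions: the derivative is approximated by a central finite difference with step `h` (not exact differentiation), and the values of $m$, $n$, $c_1$, $c_2$ are arbitrary. Signals are represented as Python functions of $t$, so `system` maps a whole function to a whole function.

```python
import math

m, n = 2.0, 0.5          # arbitrary system coefficients
h = 1e-5                 # finite-difference step (assumption)

def ddt(f, t):
    """Central-difference approximation of df/dt at time t."""
    return (f(t + h) - f(t - h)) / (2 * h)

def system(x):
    """Map an input signal x(t) to the output signal y(t) = m x + n dx/dt."""
    return lambda t: m * x(t) + n * ddt(x, t)

x1 = lambda t: math.sin(2 * math.pi * t)
x2 = lambda t: 5.0
c1, c2 = 3.0, -0.7       # arbitrary linear-combination constants
x3 = lambda t: c1 * x1(t) + c2 * x2(t)

y1, y2, y3 = system(x1), system(x2), system(x3)

# The response to the combination equals the combination of the responses.
for t in [0.0, 0.1, 0.37, 0.9]:
    print(t, y3(t), c1 * y1(t) + c2 * y2(t))
```

Note that `y2(t)` is the constant $m \cdot 5$, since the derivative of a constant input is zero, matching $y_2(t) = m \cdot 5$ above.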
Note that it was not particularly important to choose a $\sin(\dots)$ function here. However, $\sin$ and $\cos$ have a nice property which is that their derivative is always another wave of the same frequency, although it may have different amplitude and phase. (This will be useful later!)
Additionally, $\sin$ and $\cos$ have another very nice property: any sum of $\alpha_k \cos(\omega t)$ and $\beta_k \sin(\omega t)$ terms, as long as they all have the same frequency $\omega$, can be collapsed into a single function $\gamma \cos(\omega t + \phi)$ with some single overall amplitude $\gamma$ and phase $\phi$. Just as $\sin$ and $\cos$ have a geometric interpretation in terms of tracing the path of the unit circle in the x-y plane, there is a geometric interpretation here. For more on this topic, review the Complex Numbers section.
The big idea here is that when we talk about linear systems, we're not talking about just mapping an input value (a single number) to an output value (a different number). We're really talking about something that takes an input signal (a function of time) and gives an output signal (a different function of time). We've defined linearity more rigorously, and any system we've previously written in the form $\mathbf{G} \mathbf{w} = \mathbf{H}$ is linear in this sense, regardless of how we define inputs or outputs within $\mathbf{w}$.
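A minimal sketch of that collapse uses the standard identity $\gamma \cos(\omega t + \phi) = \gamma \cos\phi \cos(\omega t) - \gamma \sin\phi \sin(\omega t)$. Matching coefficients gives $\gamma = \sqrt{\alpha^2 + \beta^2}$ and $\phi = \operatorname{atan2}(-\beta, \alpha)$, where $\alpha$ and $\beta$ are the summed cosine and sine amplitudes. The amplitudes below are arbitrary, and `combine` is a hypothetical helper name.

```python
import math

def combine(alphas, betas):
    """Collapse sum(alpha_k cos(wt)) + sum(beta_k sin(wt)) into gamma*cos(wt + phi).

    Matching coefficients of gamma*cos(wt + phi)
        = gamma*cos(phi)*cos(wt) - gamma*sin(phi)*sin(wt)
    gives alpha = gamma*cos(phi) and beta = -gamma*sin(phi).
    """
    alpha = sum(alphas)
    beta = sum(betas)
    gamma = math.hypot(alpha, beta)
    phi = math.atan2(-beta, alpha)
    return gamma, phi

alphas = [1.0, 0.5]      # arbitrary cosine amplitudes, all at frequency w
betas = [2.0, -1.0]      # arbitrary sine amplitudes, same frequency w
gamma, phi = combine(alphas, betas)

w = 2 * math.pi
for t in [0.0, 0.2, 0.55]:
    direct = (sum(a * math.cos(w * t) for a in alphas)
              + sum(b * math.sin(w * t) for b in betas))
    print(direct, gamma * math.cos(w * t + phi))   # the two columns match
```

For example, `combine([1.0], [1.0])` returns approximately $\sqrt{2}$ and $-\frac{\pi}{4}$, i.e. $\cos(\omega t) + \sin(\omega t) = \sqrt{2} \cos(\omega t - \frac{\pi}{4})$.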
If we combine the ideas from Level 2 (Multiple Linear Equations) with Level 5 (A Single Differential Equation), we can take the general matrix equation form $\mathbf{G} \mathbf{w} = \mathbf{H}$ and extend it to allow each equation to have a term for the derivative of $\mathbf{x}$:
$$\mathbf{A} \mathbf{x} + \mathbf{B} \mathbf{\dot{x}} = \mathbf{H}$$
The vector of derivatives $\mathbf{\dot{x}}$ simply includes the time derivatives of each of the individual components of $\mathbf{x}$:
$$\mathbf{\dot{x}} = \frac {d} {dt} \mathbf{x}(t) = \begin{bmatrix} \frac {d} {dt} x_1(t) \\ \frac {d} {dt} x_2(t) \\ \vdots \\ \frac {d} {dt} x_n(t) \\ \end{bmatrix}$$
This form allows a first-derivative term for every unknown in every equation. If we don't actually need the derivatives of all of the unknowns, the corresponding entries in the $\mathbf{B}$ matrix will just be 0.
This general form, like the one discussed in Level 2, includes both dependent and independent variables in the $\mathbf{x}$ vector.
As in Level 2, we can choose which variables are dependent and which are independent, and algebraically rearrange the equations so that each dependent variable is equal to some linear combination of the independent variables, their derivatives, and the constant terms. We won't get into that here. But to see that the system keeps the same overall form, note that we can rewrite:
$$\mathbf{A} \mathbf{x} + \mathbf{B} \mathbf{\dot{x}} = \mathbf{H}$$
as a new matrix equation,
$$\mathbf{G} \mathbf{w} = \mathbf{H}$$
where we define $\mathbf{G} =\begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix}$, $\mathbf{w}= \begin{bmatrix} \mathbf{x} \\ \mathbf{\dot{x}} \end{bmatrix}$. We've just placed the matrices $\mathbf{A}$ and $\mathbf{B}$ side by side, and stacked $\mathbf{x}$ and $\mathbf{\dot{x}}$ into one longer vector. Expanding the matrix multiplication reproduces exactly the same equations.
As we'll talk about more in the Systems of Equations section, we need a square $N \times N$ matrix $\mathbf{G}$ in order to have a unique solution for $\mathbf{G} \mathbf{w} = \mathbf{H}$: we need as many linearly independent equations as we have unknowns. However, with differential equations in the mix, we have a problem: $x_i$ is an unknown, and so is $\dot{x}_i$. If our original $\mathbf{A}, \mathbf{B}$ were each of size $N \times N$, then our combined $\mathbf{G} =\begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix}$ will have size $N \ \text {rows} \ \times 2 N \ \text{columns}$: more unknowns than equations, so it is no longer square. This reflects the fact that with a differential equation we need to be given the value of $x$ in order to compute $\dot{x}$, or vice versa; if given neither, we have infinitely many possible solutions related by the differential equation between them.
The additional required equations reflect the fact that in order to solve a Kth order linear differential equation (with one equation and one unknown), we need K extra constraints, such as initial values or boundary values, which select a single curve from all the possible ones specified by the same differential equation.
The matrix formulation of a system of simultaneous linear equations:
$$\mathbf{A} \mathbf{x} = \mathbf{H}$$
is a linear system because $\mathbf{A}$ is a matrix of constant values.
Similarly,
$$\mathbf{A} \mathbf{x} + \mathbf{B} \mathbf{\dot{x}} = \mathbf{H}$$
is a linear system because $\mathbf{B}$ is also a matrix of constant values, and $\mathbf{\dot{x}}$ is effectively another set of unknowns.
Overall, since $\mathbf{w}$ is just a combination of $\mathbf{x}$ and $\mathbf{\dot{x}}$, the overall system of differential equations is still just a linear system.
So, even when equations have time derivative terms, and even if it isn't easy to see or write the exact relationship between input and output, there's still a linearity to the system that is very useful for analysis.
Also, note that we can always unwrap multiple derivatives into a chain of single derivatives. If we want to write a second-order differential equation:
$$\frac{d^2 x_1(t)} {d t^2} = C$$
we can define a new variable $x_2(t) = \frac {d x_1(t)} {dt}$ and write two connected first-order equations:
$$ \begin{align} x_2(t) - \frac {d x_1(t)} {dt} & = 0\\ \frac{d x_2(t)} {d t} & = C \end{align} $$
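To make the folding $\mathbf{G} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix}$, $\mathbf{w} = \begin{bmatrix} \mathbf{x} \\ \mathbf{\dot{x}} \end{bmatrix}$ concrete, this sketch builds $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{H}$ for the two first-order equations just above (with $\mathbf{x} = [x_1, x_2]$) and checks that $\mathbf{G}\mathbf{w} = \mathbf{H}$ along the trial solution $x_1(t) = \frac{C}{2} t^2 + v_0 t + p_0$. The values of $C$, $v_0$, and $p_0$ are arbitrary; $v_0$ and $p_0$ play the role of the two extra constraints a second-order equation needs.

```python
C = 3.0                  # arbitrary constant in d^2 x1 / dt^2 = C
v0, p0 = -1.0, 2.0       # arbitrary initial slope and value (the extra constraints)

# Unknowns x = [x1, x2]; each row is one first-order equation A x + B xdot = H.
A = [[0.0, 1.0],         # x2 - dx1/dt = 0
     [0.0, 0.0]]         # dx2/dt      = C
B = [[-1.0, 0.0],
     [0.0, 1.0]]
H = [0.0, C]

# Fold side by side: G is N rows x 2N columns (2 x 4 here), so it is not square.
G = [a_row + b_row for a_row, b_row in zip(A, B)]

def w_at(t):
    """Stack x(t) and its derivative into w = [x1, x2, dx1/dt, dx2/dt]."""
    x1 = 0.5 * C * t**2 + v0 * t + p0
    x2 = C * t + v0          # x2 = dx1/dt
    return [x1, x2, x2, C]   # dx1/dt = x2 and dx2/dt = C

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

for t in [0.0, 0.5, 2.0]:
    print(t, matvec(G, w_at(t)), H)   # G w equals H at every t
```

Changing `v0` or `p0` still satisfies every row, which is the infinitely-many-solutions situation described above: the $2 \times 4$ matrix $\mathbf{G}$ alone cannot pick out a single curve.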
The big idea here is that we can easily modify our approach to systems of linear equations to incorporate linear differential equations, which occur in electronics every time there's a capacitor or inductor.
Let's look at:
$$y(x) = \frac {1} {5} x^2$$
Clearly this is a nonlinear equation because there's a squared term. But we'd still like to think about and be able to make a statement about it, such as, "if we increase $x$ by a small amount $\Delta x$, what happens to $y$?" This question is just asking for a derivative, and calculus gives us the answer:
$$\frac {dy} {dx} = \frac {2} {5} x$$
This answer itself is not a constant! (If it were, then $y$ would have been linear.) However, if we know what $x$ is, we can construct a tangent line at that point $x=x_0$. It's simply:
$$\hat{y}(x_0 + \Delta x) = y(x_0) + \Big( \frac {dy} {dx} \Bigg\rvert_{x=x_0} \Big) \Delta x$$
If you're more comfortable with numbers, let's say we're working around $x_0=5$. Then:
$$ \begin{align} y(5) & = \frac {1} {5} \cdot 5^2 = 5 \\ \frac {dy} {dx} \Bigg\rvert_{x=5} & = \frac {2} {5} \cdot 5 = 2 \end{align} $$
We can combine these two pieces of information to form the local tangent line:
$$\hat{y}_0(5 + \Delta x) = 5 + 2 \Delta x$$
We might also write it more simply by substituting in $\Delta x = x - x_0$:
$$\hat{y}_0(x) = 5 + 2 (x-5) = -5 + 2 x$$
If you've studied calculus, you'll see that this is a first order Taylor series expansion of $y(x)$ around $x=x_0$. This is useful because it's very easy to have intuition about linear functions, and as long as we stay close to $x=x_0$ our approximation will be pretty good.
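The tangent-line construction fits in a tiny function. This sketch assumes we can evaluate both $y(x)$ and $\frac{dy}{dx}$ directly; `tangent_line` is a hypothetical name, and it returns the slope and intercept of $\hat{y}(x)$.

```python
def tangent_line(f, dfdx, x0):
    """First-order Taylor expansion of f around x0, returned as (slope, intercept)."""
    slope = dfdx(x0)
    intercept = f(x0) - slope * x0   # y_hat(x) = f(x0) + slope*(x - x0)
    return slope, intercept

y = lambda x: x**2 / 5
dydx = lambda x: 2 * x / 5

slope, intercept = tangent_line(y, dydx, 5.0)
print(slope, intercept)   # 2.0 -5.0

# Close to x0 the approximation is good; farther away the error grows (as dx**2 / 5 here).
for dx in [0.01, 0.1, 1.0]:
    x = 5.0 + dx
    print(dx, y(x), slope * x + intercept)
```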
Here's where things get interesting. Let's say you were asked to invert or solve the function: for example to find the value of $x$ for which $y(x) = 6$.
In this particular case, we have the closed form function algebraically, and it's algebraically invertible, so you'd use your algebra skills to find that $x = \pm \sqrt {30}$. But in many cases in engineering, we run into a few problems:
In those cases, we can use the first-order approximation tangent line $\hat{y}$ to make our guess:
$$6 = \hat{y}_0(x_1) = -5 + 2 x_1$$
From this we can solve for our unknown to find $x_1=5.5$. This $x_1$ is an approximation (not an exact answer! $5.5 \ne \sqrt{30}$) because it was based on our tangent line at $x_0$.
And here's the cool part: we can improve our approximation! How? By re-linearizing $y(x)$ to find a new tangent line around this new point $x_1$:
$$\hat{y}_1(x) = y(x_1) + \Big( \frac {dy} {dx} \Bigg\rvert_{x=x_1} \Big) \cdot (x-x_1)$$
Numerically, we get a new tangent line approximation:
$$\hat{y}_1(x) = 6.05 + 2.2 \cdot (x - 5.5) = -6.05 + 2.2 x$$
Since we're still trying to solve the problem $y(x) = 6$, we set $6$ equal to our new approximation line as before:
$$6 = \hat{y}_1(x_2) = -6.05 + 2.2 x_2$$
From which we solve numerically for our unknown $x_2 = 5.4772727273$. This is a new approximate solution: we went from $x_0 = 5$ to $x_1 = 5.5$ to $x_2 = 5.4772727273$.
We can repeat once more, creating a new tangent line at this point $x_2$:
$$ \begin{align} \hat{y}_2(x) & = y(x_2) + \Big( \frac {dy} {dx} \Bigg\rvert_{x=x_2} \Big) \cdot (x-x_2) \\ \hat{y}_2(x) & = 6.0001033058 + 2.1909090909 \cdot (x - 5.4772727273) \\ \hat{y}_2(x) & = -6.0001033058 + 2.1909090909 x \end{align} $$
Again, we can solve this tangent line for our desired $y=6$ condition:
$$6 = \hat{y}_2(x_3) = -6.0001033058 + 2.1909090909 x_3$$
From which we find $x_3=5.4772255753$.
We can do this a few times and find that the $x_n$ sequence quickly converges numerically toward the real answer of $\sqrt {30}$.
This is called the Newton-Raphson Method:
Repeat as many times as desired to improve accuracy. After just three iterations, we've improved our guess to a substantial level of precision:
$$ \begin{array}{c|cccc} n & x_n & y_n & \text{Error in}\ x & \text{Error in}\ y \\ \hline 0 & 5 & 5 & -0.4772255751 & -1 \\ 1 & 5.5 & 6.05 & 0.02277442495 & 0.05 \\ 2 & 5.4772727273 & 6.0001033058 & 0.00004715224834 & 0.0001033058 \\ 3 & 5.4772255753 & 6.0000000004 & 0.000000000203 & 0.000000000445 \end{array} $$
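The iteration above can be sketched in a few lines of Python. This assumes `y` and its derivative `dydx` are available as functions and that the slope never becomes zero; `newton_raphson` is a hypothetical helper name. Each step builds the tangent line at the current guess and solves it for the target value, reproducing the table (up to rounding in the last digits).

```python
import math

def newton_raphson(f, dfdx, target, x0, iterations):
    """Solve f(x) = target by repeatedly re-linearizing f and solving the tangent line."""
    xs = [x0]
    x = x0
    for _ in range(iterations):
        slope = dfdx(x)
        # Tangent line: f(x) + slope*(x_new - x) = target, solved for x_new.
        x = x + (target - f(x)) / slope
        xs.append(x)
    return xs

y = lambda x: x**2 / 5
dydx = lambda x: 2 * x / 5

for n, x in enumerate(newton_raphson(y, dydx, 6.0, 5.0, 3)):
    # Columns: n, x_n, y_n, error in x, error in y
    print(n, x, y(x), x - math.sqrt(30), y(x) - 6)
```

Starting from `x0 = -5.0` instead converges to $-\sqrt{30}$. At $x = 0$ the tangent line is flat, so the division by `slope` fails there.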
This process is actually hard to see on a graph because it converges so quickly, but if you wish, you can plot these successive tangent lines:
Interactive Exercise Click to plot the tangent lines.
We can repeat the technique as many times as we'd like to get more precision, but note that we're extremely close after just 3 iterations. As $n \rightarrow \infty$, the sequence converges:
$$ \begin{align} \lim_{n \to \infty} x_n & = \sqrt{30} \\ \lim_{n \to \infty} y_n & = 6 \\ \lim_{n \to \infty} \text{Error in}\ x & = 0 \\ \lim_{n \to \infty} \text{Error in}\ y & = 0 \\ \end{align} $$
Notice that it matters where we started! $x = -\sqrt{30}$ was also a possible answer to our question, and it's the answer we would have found if our initial guess had been any negative number.
The big idea here is that even without an invertible closed-form equation for $y(x)$, we can very quickly solve problems of the form $y(x)=C$, as long as we can compute the forward value $y(x)$ at any $x$ and the local first derivative there. This remarkable technique is at the root of much of scientific computing: linearization lets us solve even nonlinear problems.
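This point can be illustrated with a sketch that treats $y(x)$ as a black box: it only evaluates $y$ forwards and estimates the local derivative with a central difference (a swapped-in approximation; an analytic derivative works equally well). The function name `solve_newton_black_box` and the example $y(x) = x + e^x$, whose inverse cannot be written with elementary functions, are hypothetical choices for illustration.

```python
import math


def solve_newton_black_box(f, target, x0, h=1e-6, tol=1e-12, max_iter=50):
    """Solve f(x) = target using only forward evaluations of f.

    The local first derivative is estimated with a central difference
    (an assumption of this sketch, not a requirement of Newton-Raphson).
    """
    x = x0
    for _ in range(max_iter):
        slope = (f(x + h) - f(x - h)) / (2 * h)  # local linearization of f at x
        step = (target - f(x)) / slope           # where the tangent line reaches target
        x += step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def y(x):
    # Hypothetical example with no elementary closed-form inverse.
    return x + math.exp(x)


print(solve_newton_black_box(y, target=1.0, x0=3.0))           # exact solution: x = 0
print(solve_newton_black_box(y, target=1.0 + math.e, x0=-2.0))  # exact solution: x = 1
```

The step-size stopping test and the iteration cap are arbitrary choices for this sketch; with a poor starting point or a near-zero slope, the iteration can still fail to converge.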
The Newton-Raphson method presented above may have seemed trivial, because we already knew the algebraic form of $y(x)$ and could have solved directly for $x$ if we wanted to. The terminology of dependent and independent variables usually implies that $x$ is an independent variable and $y$ is a dependent variable. However, as we build systems of equations as we did in Level 2 above, it becomes clear that many or most equations are not written in a form that cleanly separates input from output. Even with strictly linear systems of equations, it takes work to rearrange them into an input-output relationship. Once we allow those systems of equations to have any nonlinear terms, expressing the relationship in input-output form can take a lot of work, and in some cases no closed-form inverse function exists at all.
However, the Newton-Raphson method works just as well in multiple dimensions as it works in a single dimension!
Instead of evaluating a single derivative $\frac {dy} {dx}$, we evaluate the partial derivatives $\frac {\partial F_i} {\partial x_j}$ of each equation $F_i$ with respect to each unknown $x_j$. These form a Jacobian matrix $\mathbf{G}$, and we can put each linearized step into the form $\mathbf{G} \mathbf{x} = \mathbf{H}$.
For example, if we have any 3 nonlinear equations and 3 unknowns of the form:
$$ \begin{align} F_1(x_1,x_2,x_3) & = 0 \\ F_2(x_1,x_2,x_3) & = 0 \\ F_3(x_1,x_2,x_3) & = 0 \\ \end{align} $$
(Note that we can always set the right-hand side to zero without any loss of generality because we can fold any constant into the nonlinear function on the left.)
In order to create our linearization, we can work with one equation at a time. For example, for nonlinear function $F_1$, we can create a simple linearization about our point of interest $(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3)$:
$$\hat{F}_1 (\tilde{x}_1 + \Delta x_1, \tilde{x}_2 + \Delta x_2, \tilde{x}_3 + \Delta x_3) = F_1(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3) + \left( \frac {\partial F_1} {\partial x_1} \right) \Delta x_1 + \left( \frac {\partial F_1} {\partial x_2} \right) \Delta x_2 + \left( \frac {\partial F_1} {\partial x_3} \right) \Delta x_3$$
As we are solving for $F_1 = 0$, we'll also set $\hat{F}_1 = 0$:
$$ \begin{align} F_1(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3) + \left( \frac {\partial F_1} {\partial x_1} \right) \Delta x_1 + \left( \frac {\partial F_1} {\partial x_2} \right) \Delta x_2 + \left( \frac {\partial F_1} {\partial x_3} \right) \Delta x_3 & = 0 \\ \left( \frac {\partial F_1} {\partial x_1} \right) \Delta x_1 + \left( \frac {\partial F_1} {\partial x_2} \right) \Delta x_2 + \left( \frac {\partial F_1} {\partial x_3} \right) \Delta x_3 & = - F_1(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3) \\ \begin{bmatrix} \frac {\partial F_1} {\partial x_1} & \frac {\partial F_1} {\partial x_2} & \frac {\partial F_1} {\partial x_3} \end{bmatrix} \begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \end{bmatrix} & = - F_1(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3) \end{align} $$
If we extend this to also cover our other equations $F_2, F_3$ we'll find:
$$ \begin{bmatrix} \frac {\partial F_1} {\partial x_1} & \frac {\partial F_1} {\partial x_2} & \frac {\partial F_1} {\partial x_3} \\ \frac {\partial F_2} {\partial x_1} & \frac {\partial F_2} {\partial x_2} & \frac {\partial F_2} {\partial x_3} \\ \frac {\partial F_3} {\partial x_1} & \frac {\partial F_3} {\partial x_2} & \frac {\partial F_3} {\partial x_3} \\ \end{bmatrix} \begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \end{bmatrix} = - \begin{bmatrix} F_1(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3) \\ F_2(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3) \\ F_3(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3) \end{bmatrix} $$
The matrix of partial derivatives for every equation (row) and every unknown (column) as seen on the left is called the Jacobian matrix.
$$ \begin{equation} \mathbf{G} = \begin{bmatrix} \frac {\partial {F_1}} {\partial {x_1}} & \frac {\partial {F_1}} {\partial {x_2}} & \frac {\partial {F_1}} {\partial {x_3}} \\ \frac {\partial {F_2}} {\partial {x_1}} & \frac {\partial {F_2}} {\partial {x_2}} & \frac {\partial {F_2}} {\partial {x_3}} \\ \frac {\partial {F_3}} {\partial {x_1}} & \frac {\partial {F_3}} {\partial {x_2}} & \frac {\partial {F_3}} {\partial {x_3}} \end{bmatrix} \end{equation} $$
Because each function $F_i$ is nonlinear, the cells of this matrix are not constant; they depend on the point $(\tilde{x}_1,\tilde{x}_2,\tilde{x}_3)$ at which we evaluate all the partial derivatives.
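As a concrete illustration, consider a hypothetical system chosen only as an example, with solution $(x_1, x_2, x_3) = (1, 1, 1)$:

$$ \begin{align} F_1(x_1,x_2,x_3) & = x_1^2 + x_2^2 + x_3^2 - 3 \\ F_2(x_1,x_2,x_3) & = x_1 - x_2 \\ F_3(x_1,x_2,x_3) & = x_2 - x_3 \end{align} $$
$$ \mathbf{G} = \begin{bmatrix} 2\tilde{x}_1 & 2\tilde{x}_2 & 2\tilde{x}_3 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \quad \text{e.g. at} \ (\tilde{x}_1,\tilde{x}_2,\tilde{x}_3) = (2, 1, 0.5): \quad \mathbf{G} = \begin{bmatrix} 4 & 2 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} $$

Only the row for the nonlinear equation $F_1$ changes with the linearization point; the rows for the linear equations $F_2$ and $F_3$ are constant.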
We now have a matrix equation of the form $\mathbf{G} (\mathbf{\Delta x}) = -\mathbf{F}$. We could solve this to find the $\mathbf{\Delta x}$ values to update. However, since $\mathbf{\Delta x} = \mathbf{x} - \mathbf{\tilde{x}}$, we can expand our equation:
$$ \begin{align} \mathbf{G} (\mathbf{\Delta x}) & = -\mathbf{F} \\ \mathbf{G} (\mathbf{x} - \mathbf{\tilde{x}}) & = -\mathbf{F} \\ \mathbf{G} \mathbf{x} - \mathbf{G} \mathbf{\tilde{x}} & = -\mathbf{F} \\ \mathbf{G} \mathbf{x} & = \mathbf{G} \mathbf{\tilde{x}} - \mathbf{F} \\ \mathbf{G} \mathbf{x} & = \mathbf{H} \\ \text{where} \quad \mathbf{H} = \mathbf{G} \mathbf{\tilde{x}} - \mathbf{F} \end{align} $$
After this manipulation, we have a standard matrix problem of the form $\mathbf{G} \mathbf{x} = \mathbf{H}$, where $\mathbf{G}$ and $\mathbf{H}$ are a known constant matrix and vector (at a particular linearization point), and our new approximate solution $\mathbf{x}$ can be found by solving this linear system.
Again, as with the single-equation Newton-Raphson method, we start from a "guessed" point $\mathbf{x_0}$, evaluate all the equations and their derivatives at that point to generate $\mathbf{G_0}$ and $\mathbf{H_0}$, and then solve $\mathbf{G_0} \mathbf{x_1} = \mathbf{H_0}$.
This gives us a new "best guess" $\mathbf{x_1}$. From $\mathbf{x_1}$ we regenerate a new $\mathbf{G_1}$ and $\mathbf{H_1}$ from the derivatives of the original nonlinear system, solve $\mathbf{G_1} \mathbf{x_2} = \mathbf{H_1}$, and get an even better guess $\mathbf{x_2}$. We repeat this until $\mathbf{x_n}$ converges.
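This iteration can be sketched with NumPy, mirroring the $\mathbf{G} \mathbf{x} = \mathbf{H}$ form. This is a minimal illustration under assumptions: the 3-equation system $F_1 = x_1^2 + x_2^2 + x_3^2 - 3$, $F_2 = x_1 - x_2$, $F_3 = x_2 - x_3$ (solution $(1, 1, 1)$), its hand-written Jacobian, and the names `F`, `jacobian`, and `newton_system` are hypothetical; `numpy.linalg.solve` performs the linear solve at each step.

```python
import numpy as np


def F(x):
    """Hypothetical example system F(x) = 0 with 3 equations and 3 unknowns."""
    x1, x2, x3 = x
    return np.array([
        x1**2 + x2**2 + x3**2 - 3.0,  # nonlinear equation
        x1 - x2,                      # linear equation
        x2 - x3,                      # linear equation
    ])


def jacobian(x):
    """G: partial derivative of each equation (row) w.r.t. each unknown (column)."""
    x1, x2, x3 = x
    return np.array([
        [2 * x1, 2 * x2, 2 * x3],
        [1.0, -1.0, 0.0],
        [0.0, 1.0, -1.0],
    ])


def newton_system(F, jacobian, x0, tol=1e-12, max_iter=50):
    """Iterate G_n x_{n+1} = H_n, with H_n = G_n x_n - F(x_n), until x stops changing."""
    x = np.asarray(x0, dtype=float)
    for n in range(max_iter):
        G = jacobian(x)                 # linearize around the current guess x_n
        H = G @ x - F(x)                # H = G x~ - F
        x_next = np.linalg.solve(G, H)  # solve G x_{n+1} = H
        if np.max(np.abs(x_next - x)) < tol:
            return x_next, n + 1
        x = x_next
    raise RuntimeError("Newton-Raphson did not converge")


x, iterations = newton_system(F, jacobian, x0=[2.0, 1.0, 0.5])
print(x, iterations)
```

Solving $\mathbf{G} \, \mathbf{\Delta x} = -\mathbf{F}$ for the update and adding it to $\mathbf{\tilde{x}}$ is algebraically equivalent; this sketch forms $\mathbf{H}$ explicitly only to match the derivation.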
As we discussed in Level 6 above, it's possible for our $\mathbf{x}$ vector to include time derivative terms, so now we can work with nonlinear differential equations!
This is what a circuit simulator like CircuitLab does: it takes the schematic provided by the user, it writes down tens or hundreds of simultaneous nonlinear differential equations, and it solves them, again and again.
This Jacobian matrix is also the root of what's called incremental analysis or small-signal analysis, a powerful circuit analysis tool that relies on a linearized model of a nonlinear circuit and looks only at small deviations around an operating point. The Jacobian matrix contains all of those small-signal relationships. (As shown briefly in Level 2 above, these values are effectively exposed within CircuitLab's Frequency Domain Simulation tools.)
At some level of physics intuition, you can think of this as being what the circuit itself does: when you read the later section about Thermodynamics, Energy, & Equilibrium, you can think about the mathematical system exploring nearby states with little wiggles in each variable (derivatives), and converging toward a low energy state (one at which all the equations are satisfied)!
From a mechanics perspective, a book resting on a table doesn't "know" its equations of force balance between gravity and the normal force provided by the table. Instead, the atoms and electrons in each material are always vibrating at random. If a few electrons on the bottom of the book happen to randomly wiggle a nanometer closer to the electrons on the top of the table, they'll feel a repulsive force pushing them away. This first-order exploration is an automatic part of the universe until equilibrium is established. As we'll discuss in Steady-State and Transient, when we look closely enough at physical systems, equilibrium is in most cases actively maintained through these tiny random interactions rather than being purely passively stable.
In general, this microscopic process happens so fast that as engineers we can usually ignore it! But inspected over the right time or distance scales, it matters. See the Lumped Element Model for more about the assumption of instantaneous equilibrium.
The big idea here is that the Newton-Raphson method can be extended to solve systems of multiple nonlinear differential equations. Even with hundreds or thousands of simultaneous unknowns and equations, this numerical method starts from an initial guess and, given a reasonable starting point, quickly converges toward the true solution by building a linearized version of the nonlinear system at each successive approximation point.
We'll stop looking at Systems of Equations until the next section.
Instead, we want to consider systems at an even higher level of abstraction.
Here's a pulse-width modulation circuit that might not make much sense now, but will in a few chapters:
Interactive Exercise Click the circuit above, then click "Simulate," and finally click "Run Time-Domain Simulation" to see what this circuit does.
This circuit turns an input voltage $V_{\text{in}}$ into a series of pulses of different lengths. Then, it smooths them back out into an output voltage $V_{\text{out}}$.
Overall, over a reasonably wide range of possible inputs, this does something roughly linear: the output voltage and the input voltage are quite close to each other as long as we don't zoom in too closely.
You can see the output $V_{\text{out}}$ and input $V_{\text{in}}$ traces aren't exactly the same: there's some jaggedness in the output, while the input is smooth; there's a delay between input and output; and the scale isn't exactly the same either! The jaggedness is an effect of digital on-off switching in the intermediate circuit. The phase and scale differences are effects of filters that we'll talk about in later chapters.
Yet, despite these mismatches, to a rough approximation, when the input goes up, so does the output.
Why go to all this trouble? Why turn a nice, smooth analog sine wave into a series of digital pulses, and then back into an analog signal that "looks worse" by some metrics? This particular circuit is an example of pulse-width modulation (PWM). It turns out that this is often the most efficient way to drive a motor or an LED: pulse it on and off very quickly, and let the "inertia" of either the motor load or our perception of light smooth the pulses into a continuously variable average, just as this circuit does. This exact idea is probably built into billions of devices every year: every class-D audio amplifier (including every smartphone), many RF amplifiers, many LED controls, many motor controllers, etc.
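The averaging idea behind PWM can be illustrated with an idealized sketch. It assumes (hypothetically, not as a model of the circuit shown) an input between 0 and 1, a comparator that outputs `v_high` whenever the input exceeds a sawtooth carrier, and an ideal filter that simply averages over one carrier period; `pwm_average` is a hypothetical name. The averaged output tracks the input linearly, even though each individual pulse is only on or off.

```python
def pwm_average(v_in, samples_per_period=1000, v_high=1.0):
    """Average output of an idealized PWM stage over one carrier period.

    Assumptions: v_in lies in [0, 1]; a comparator outputs v_high whenever
    v_in exceeds a sawtooth carrier ramping from 0 to 1; the output filter
    is an ideal average over one period.
    """
    high_samples = 0
    for k in range(samples_per_period):
        carrier = k / samples_per_period  # sawtooth ramp from 0 up to just below 1
        if v_in > carrier:
            high_samples += 1             # pulse is "on" at this instant
    duty_cycle = high_samples / samples_per_period
    return duty_cycle * v_high            # smoothed (averaged) output


for v in [0.1, 0.25, 0.5, 0.75, 0.9]:
    print(v, pwm_average(v))
```

A real PWM circuit adds ripple, delay, and gain error from its filter, which is why the simulated output above is only roughly linear.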
The big idea here is that linearity lets us bundle up all that internal complexity, and for the user or for the engineer integrating this into a larger system, it's easy to say, "This black box labeled 'amplifier' will make the output roughly proportional to the input." The fact that it does it in a particularly energy-efficient way is a great bonus.
In Level 9 we showed an example where an input and output voltage were made (approximately) linear to each other, even though the process in the middle was fairly complicated and nonlinear.
However, it doesn't have to be linear from voltage to voltage. In fact, we can cross domains entirely and think more abstractly.
In this example, we'll convert voltage to frequency. An adjustment in input voltage will adjust the frequency of the output:
Interactive Exercise Click the circuit, click "Simulate," and finally "Run Time-Domain Simulation."
For higher input voltages $V_{\text{in}}$, the output frequency is lower. For lower input voltages $V_{\text{in}}$, the output frequency is higher.
If you were to plot the input voltage versus the output frequency, you'd see that this behavior is approximately linear. It's not perfect, but over a fairly wide range, it's reasonably linear.
The general concept of voltage-to-frequency conversion also finds many practical uses: for example, as a voltage-controlled oscillator (VCO) in radio-frequency systems.
In FM radio, and many digital radio systems as well, a signal to transmit is converted from a voltage to a frequency. This frequency is then transmitted wirelessly over an antenna, received, and the detected frequency is converted back into a voltage.
Frequency is not a state variable. Only currents and voltages (and possibly their derivatives) are state variables in the time-domain simulation of this circuit.
Nonetheless, at a higher level of abstraction, this circuit can be examined as a black box where you put a voltage in and get a linearly-related frequency out.
The big idea is that this type of cross-domain thinking is extremely valuable in an engineering mindset.
Obviously, linearity is useful in actual numerical problem solving as shown above. This is not just useful in electronics, but in any engineering field; in mechanical or civil engineering, for example, we can think about the loads on a structure and the displacements that result as a linear system. The linear behavior will only be an approximation to the real nonlinear behavior, but still a very useful approximation, or even a tool for solving the nonlinear behavior as shown above.
However, the concept of linearity is even more broadly useful to engineers because of abstractions.
As shown in Levels 9 and 10, we can hide a lot of complexity underneath the idea that if we "zoom out" a bit, or make certain assumptions or constraints, the overall behavior is linear, or close enough to linear that we can model it as such. Even if the underlying mechanism is quite complicated, we still find it easiest to think and talk about a system when its behavior is approximately linear.
For small enough deviations around an operating point, the first-order (linear) term of a smooth function's behavior is the most significant term, as long as it is nonzero, compared to second-order and third-order effects.
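How much the first-order term dominates depends on the size of the deviation. For example, expanding $y(x) = x^2/5$ around $x_0 = 5$ gives an exact second-order expansion:

$$ y(5 + \Delta) = \frac{(5 + \Delta)^2}{5} = 5 + 2\Delta + \frac{\Delta^2}{5} $$

For $\Delta = 0.1$, the first-order term is $0.2$ while the second-order term is only $0.002$; for $\Delta = 20$, the first-order term is $40$ but the second-order term is $80$, so the linear approximation is no longer adequate.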
The big idea of linearity is as a tool in managing what would otherwise be a quickly unmanageable tangle of nonlinear complexity, so that we can actually wrap our heads around how to analyze and design useful systems.
In the next section, Systems of Equations, we'll talk about how to solve math problems with multiple equations and multiple variables.