In this section, we'll discuss how to solve systems of linear equations: perhaps 2, 3, 100, or even 1000 simultaneous equations, and correspondingly many unknowns.
Simple circuit problems can have tens to hundreds of equations and unknowns. It is not uncommon for complex circuits to be in the thousands of equations or beyond.
We're not going to discuss nonlinear equations because their complexity is hard to reason about. For example, consider the equation:
$$\sin(x)=0$$
This nonlinear equation has just 1 equation and 1 unknown, but has infinitely many solutions: $x = n\pi$ for all integers $n$. Additionally, as discussed in Linear & Nonlinear, we can find a solution for a nonlinear system by making a series of locally linear approximations using the Newton-Raphson Method.
To start, we need to define what we mean by a linear equation. A linear equation is one that can be written as:
$$r_1 x_1 + r_2 x_2 + \dots + r_n x_n = c$$
This equation has $n$ unknown variables $x_i$. Each term has some known constant coefficient $r_i$, a number which may be zero, in which case we don't usually write the $x_i$ term at all. And this equation has a single known constant term $c$ which the equation sums up to, which might be 0, or some other number.
These are a few unrelated linear equations:
$$ \begin{align} 5 x_1 - 3 x_2 + 6 x_4 & = 10 \\ y - 3 x & = 0 \\ 10 k & = 30 \end{align} $$
They are unrelated because they don't fit into the same space of variables. They do not form a system.
Sometimes, an equation needs to be algebraically manipulated to fit our desired standard form, but is still a linear equation. For example, from the slope-intercept form of a line:
$$ \begin{align} y & = 5(x - 2) \\ y & = 5 x - 10 \\ y - 5 x & = -10 \end{align} $$
If an equation can't be manipulated to fit this form, it is nonlinear. (It may be possible to construct a linear approximation, however: see the Algebraic Approximations and Linear & Nonlinear sections.)
Here are some examples of nonlinear equations:
$$ \begin{align} y & = \sin(x) & \text{has non-linear function inside} \\ x^2 & = 1 & \text{has polynomial term of order 2} \\ x (y + 1) & = 3 & \text{has product of unknowns} \end{align} $$
These sorts of equations will not be addressed here, but are still solvable with multiple numerical iterations using the same techniques shown here as a foundation.
A system of equations simply means that we have multiple equations, all of which must be satisfied at the same time, and multiple unknowns, which are shared between the equations.
An equation with unknowns is a search problem: we are searching for the value of the unknowns that will make the equation be true. The equation is true when the left side equals the right side.
For most values of the unknowns, the equation will be false: $y + 1 = 3$ is a false statement for infinitely many possible choices of $y$. It is only a true statement at one particular choice. In this case, this equation is a true statement only when $y=2$.
When we jump to having multiple equations and multiple unknowns, we have to think about not just whether our one equation is a true statement, but whether all of the equations in our system are true at the same time and for the same values of the unknowns.
Consider this system of two linear equations in two unknowns:
$$ \begin{align} x + 2 y & = 4 \\ x - y & = 1 \end{align} $$
Let's search for a solution by trying different values for the unknowns. We have to choose specific values for all of the unknowns in order to evaluate the left-hand sides of each equation, so we are searching over all possible values of all unknowns.
If we choose $x=0, y=0$, this will make both equations false. Therefore, $x=0, y=0$ is not a solution to this system.
If we choose $x=0, y=2$, this will make the first equation true. But it will make the second equation false. All of the equations must be true for the solution to be valid. Therefore, $x=0, y=2$ is not a solution to this system.
If we choose $x=2, y=1$, this will make both equations true. Therefore, $x=2, y=1$ is a solution to this system.
In fact, it is a unique solution point: it is the only solution to this system. Why? Because if, for example, we were to increase $x$ by a little bit, then one equation would require $y$ to increase a little bit to remain valid, and the other equation would require it to decrease. The variable $y$ can't both increase and decrease at the same time.
If we only have two unknowns, it's easy to map these to a two-dimensional x-y plane.
Continuing with the same two-equation example above, we could convert both equations to slope-intercept form:
$$ \begin{align} y & = - \frac {1} {2} x + 2\\ y & = x - 1 \end{align} $$
And now we can use the CircuitLab software to plot these two lines:
Click the "circuit," click "Simulate," then "Run DC Sweep." You'll see the two lines plotted, with an intersection at $x=2, y=1$. This shows how we can quickly use the DC Sweep mode of a circuit simulator as a simple but flexible and powerful graphing tool.
A solution is a specified configuration of values for all of the unknowns such that all of the equations are simultaneously satisfied.
There are only three possible outcomes for any system of equations. Either:
Using our two-equation, two-unknown example from earlier, we can consider these three cases. (If you haven't plotted it, do so now!)
If our two equations produce two perfectly parallel lines (but not the same line), there is no intersection. This means that there is no solution. No point in the x-y plane lies on both lines, so no x-y pair satisfies both equations at once.
If our two equations produce the exact same line overlapping with itself, then all points are intersecting. There are infinitely many solutions. All points on that line are possible solutions of the system of equations.
If the lines are not parallel, then they are guaranteed to have a single intersection point. This point is our one unique solution to the system. This is the normal case, and usually the one we're interested in, but it's important to see what can go wrong and what's necessary for our solution to exist at all.
You can quickly modify the plotted equations above to illustrate the two other cases for yourself. For example, see this one for two equations with no solution:
In two dimensions, solving a system of linear equations is equivalent to finding an intersection of the lines described by each equation. Each equation $A x + B y = C$ describes some line in 2D space.
In three dimensions, solving a linear system is equivalent to finding an intersection of the planes described by each equation. Each equation $A x + B y + C z = D$ describes some plane in 3D space.
In 3D, note that the intersection of two planes could be either:
That's why we can't just have an intersection between two planes (two equations) to uniquely solve a 3D linear system problem.
If we have a third equation (i.e. a third plane), we can take the intersection of that plane with the line that resulted from the intersection of the first two planes. In 3D, the intersection of a plane and a line can either be:
When we successfully solve a system of 3 equations and 3 unknowns, we're in the case where the first two equations intersected to produce a line, and that line then intersected with the third equation's plane to produce a single solution point.
In higher dimensions, solving a linear system is equivalent to finding an intersection of the hyperplanes described by each equation.
Every equation in our system is like a hyperplane in N-1 dimensions where N is the number of unknowns in the system. Why N-1? Because, in general, if you specify N-1 of the unknowns, then there's only 1 specific numerical value the remaining unknown can take for that equation to be satisfied:
Every time we include another equation, we're applying a new constraint, and, so long as a unique solution exists, we're reducing the dimensionality of the solution space by 1. Ideally, after we've applied all our equations one at a time, we're left with just a single unique solution point.
While most people do not have good intuition for geometry above 3D (or good drawing skills above 2D!), the geometric interpretation allows us to quickly see the possible cases of solving any linear system.
Consider an equation with just one unknown and one equation:
$$2 x = 10$$
This has one solution: $x=5$. But can we modify this equation to have no solution? Actually, yes:
$$0 x = 10$$
There is no value of $x$ for which this equation is true. Therefore, this system (of 1 equation and 1 unknown) has no solution.
But what if we change the right hand side, too?
$$0 x = 0$$
In this case, $x$ can take on any value, whether $x=5$ or $x=-\pi$ or $x=6 \times 10^{23}$, and still the equation remains true. There are infinitely many possible solutions.
It may appear that we've oversimplified, but in fact, there's nothing special about this example. These three outcomes are the only three possibilities with systems of any size, but they are illustrated here with the system of minimum size $N=1$ equations and unknowns.
When we have fewer equations than unknowns, a system definitely does not have a single solution, but it might have [i]either[/] zero or infinitely many solutions.
We're familiar with an equation like:
$$-x + y = 3$$
This is the equation of a line, with 1 equation and 2 unknowns. It has infinitely many solution points: all points on that line are solutions to the equation. That is the definition of a line.
With 2 unknowns and 1 equation constraining them, this infinite solution space happens to have $2 - 1 = 1$ degrees of freedom. For example, we could write the line as a vector equation:
$$ \begin{equation} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} m + \begin{bmatrix} 0 \\ 3 \end{bmatrix} \end{equation} $$
where the real number $m$ projects the slope vector along the one degree of freedom allowed by the equation. (In a problem with 3 unknowns and 1 equation, we would have a vector equation with two different vectors and two independent degrees of freedom.)
How can a 1 equation, 2 unknown system have zero solutions? Well, a simple way is:
$$0 x + 0 y = 3$$
There are no values of $(x,y)$ that would satisfy this equation.
We don't normally think about it, but when we evaluate a line such as $-x + y = 3$ at a single point, such as $x=10$, we're really adding a second equation to the system:
$$ \begin{align} -x + y & = 3 \\ x & = 10 \end{align} $$
To evaluate at $x=10$, we are actually first adding that second equation to our system, and then solving a system with 2 equations and 2 unknowns. This happens so fast that we don't consider it consciously or write it down, but that simple second equation is vital to producing a single solution point. Geometrically, we are intersecting the original line with a new vertical line defined as $x=10$, and then finding the unique intersection point.
When a system has more equations than unknowns, the system is usually overdetermined and inconsistent, meaning no solution exists. For example:
$$ \begin{align} 2 x & = 4 \\ 4 x & = 6 \end{align} $$
There is no value of $x$ that satisfies both of these equations, so no solution exists.
However, what if our two equations were:
$$ \begin{align} 2 x & = 4 \\ 4 x & = 8 \end{align} $$
In this special case, the value $x=2$ happens to satisfy both equations. What's happened here? We've found two equations that are not linearly independent. The second equation doesn't actually give us any new information compared to the first. Any value of $x$ that satisfies the first equation will also satisfy the second.
We're not just interested in having N equations for N unknowns. We need all of those N equations to provide new information -- new constraints -- or else we would have the same solution space with just N-1 equations.
Our simple example of not-independent equations was very simple: $$ \begin{align} 2 x & = 4 \\ 4 x & = 8 \end{align} $$
It's clear that one equation is just double the other.
But linear independence is not a pairwise property. That means we can't figure it out by only looking at one pair of equations at a time. We have to look at the whole collection of equations.
Here's an example of three equations:
$$ \begin{align} x + 2 y + 3 z & = 1 \\ x & = -5 \\ 2 y + 3 z & = 6 \end{align} $$
Any two of these equations taken alone are linearly independent, but all three together are not. We can add the second and third equations together and get the first equation.
If a set of equations are not linearly independent, we can remove one (or possibly more) of them and still have the same space of solutions.
There is no simple rule for inspecting a set of equations and quickly determining if they are or are not linearly independent. If you see quickly see any obvious issues, such as where one equation is a just a nonzero multiple of another, or where one equation is a linear combination of two other equations, you can quickly determine that the set is not linearly independent. However, in general, you may have to combine all previous N-1 equations to determine that the Nth equation is or is not linearly independent. In practice, determining linear independence is done by solving the system using Gaussian eliminiation or LU decomposition, as will be described below.
Note that the linear independence of a set of equations only has to do with the coefficients of the unknowns on the left-hand-side of the equation. It does not depend on the values of the constants on the right-hand-side.
In the case where the system is not linearly independent, then the right-hand-side constants may help determine whether there are zero solutions (if the duplicated equations are inconsistent with each other) or infinitely many solutions (if the duplicated equations are consistent with each other).
For example, the non-independent system:
$$
\begin{align}
x + 2 y + 3 z & = 1 \\
x & = -5 \\
2 y + 3 z & = 6
\end{align}
$$
happens to be consistent with respect to the right-hand-side constants, so it could be reduced to a system of 2 equations in 3 unknowns which represent a line (in 3D space) passing through the $(x,y,z)$ points $(-5,3,0) \ \text{and}\ (-5,0,2)$, and infinitely many other points on the same line.
However, if we change just a single right-hand-side constant from 6 to 0:
$$ \begin{align} x + 2 y + 3 z & = 1 \\ x & = -5 \\ 2 y + 3 z & = 0 \end{align} $$
Now, while the left-hand-sides of the second and third equations sum to match the first, the right-hand-side constants do not. This guarantees that this system has no solutions.
While the outcome of no solutions versus infinitely many solutions may seem quite different, this is analogous to our $0 x = 10$ versus $0 x = 0$ in the 1 equation and 1 unknown base case described earlier. In either case, it is the structure of the left-hand-side alone that guarantees that we've lost our ability to produce a single solution point. That's why, at least from the perspective of solving linear systems of equations, linear independence is a property of the left-hand-side coefficients only.
A system having N linearly independent equations and N unknowns, where N is the same for both quantities, is a prerequisite for having a solvable linear system of equations. Why?
Only N linearly independent equations in N unknowns can produce a single solution point, which is what we're usually looking for.
Let's consider a system with 3 equations and 3 unknowns:
$$
\begin{align}
f + 2 g + 3 h & = 1 \\
f & = -5 \\
g + h & = 1
\end{align}
$$
This system has 3 linearly independent equations, and has a single valid solution: $f=-5, g=-3, h=4$. You should quickly substitute these values in to each equation and confirm that it is valid.
We could also write this directly as a matrix equation $\mathbf{A} \mathbf{x} = \mathbf{B}$, where $\mathbf{A}$ is a 3x3 matrix of left-hand-side coefficients, $\mathbf{x}$ is a 3x1 vector of unknowns, and $\mathbf{B}$ is a 3x1 vector of right-hand-side constants:
$$ \begin{equation} \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} f \\ g \\ h \end{bmatrix} = \begin{bmatrix} 1 \\ -5 \\ 1 \end{bmatrix} \end{equation} $$
By performing the matrix multiplication on the left, we recover one equation from each row of $\mathbf{A}$.
You should become comfortable going back and forth between the matrix representation and the equation representation. They are truly identical. However, we often work in the matrix representation because it's more compact and it's also the way we solve these systems with computational tools. Learning to approach and solve problems both in the equation representation and the matrix representation is valuable, and we will flip back and forth in the rest of this book.
This particular problem is a simple matrix equation, and can easily be solved by any matrix solver. For example, click here to solve this matrix equation on Wolfram Alpha.
Solving a linear system like this is so common that in many mathematical software packages (including MATLAB, Octave, Sage) one simply uses the backslash ("\") operator to designate solving an equation: $\mathbf{x} = \mathbf{A} \backslash \mathbf{B}$.
To make it even more compact to represent systems of equations, it's common to omit the vector of unknowns, and to simply join the $\mathbf{B}$ vector on to the right-hand-side of $\mathbf{A}$, with a vertical bar separating them:
$$ \begin{equation} \left[ \begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 1 & 0 & 0 & -5 \\ 0 & 1 & 1 & 1 \end{array} \right] \ \text{for} \ (f,g,h) \end{equation} $$
This is called an augmented matrix, and it's just another identical way of writing the system of equations from above. There's nothing special about it except as a convenient way of writing these equations, and it happens to keep the left-hand-side coefficients and right-hand-side constants of each row together nicely.
When we consider writing a system of equations as a matrix, we have two considerations:
The ordering of the columns maps directly to the ordering of the unknowns $(f,g,h)$ in the example above. However, we could have chosen a different column ordering, so long as we're careful to keep track as we work.
The ordering of the rows maps directly to the ordering of the equations. But a system of equations has the same meaning regardless of what order we write the equations. If we reorder the rows of $\mathbf{A}$, we must also reorder the corresponding rows of $\mathbf{B}$. However, the augmented matrix form $[\mathbf{A}|\mathbf{B}]$ does this for us "automatically" by keeping the entire row together. We can write this augmented matrix in any row order and the system will have the same solution.
Are there any especially convenient orderings that make a system of equations easier to solve? Yes. Let's look at a few.
Here's a system of 5 equations in 5 unknowns which takes practically zero effort to solve:
$$ \begin{align} - v & = 3 \\ 9 w & = 6 \\ 5 x & = 10 \\ 8 y & = 24 \\ z & = 99 \end{align} $$
It's obvious that the equations don't interact with each other at all, and we can solve each by simply dividing the right-hand-side constant by the left-hand-side coefficient: $$ \begin{align} v = \frac {3} {-1} \\ w = \frac {6} {9} \\ x = \frac {10} {5} \\ y = \frac {24} {8} \\ z = \frac {99} {1} \end{align} $$
As an augmented matrix, this system of equations looks like:
$$ \begin{equation} \left[ \begin{array}{ccccc|c} -1 & \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & 3 \\ \color{silver}{0} & 9 & \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & 6 \\ \color{silver}{0} & \color{silver}{0} & 5 & \color{silver}{0} & \color{silver}{0} & 10 \\ \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & 8 & \color{silver}{0} & 24 \\ \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & 1 & 99 \end{array} \right] \ \text{for} \ (v,w,x,y,z) \end{equation} $$
The 5x5 coefficient matrix has a special shape: it's a diagonal matrix, meaning it only has nonzero entries on the main diagonal, and is strictly zero everywhere else.
Remember that for any system of equations we are free to rearrange the ordering of the rows freely, and additionally, we are free to rerrange the ordering of the columns so long as we keep track of the corresponding ordering of the unknown variables. That's why the easy-to-solve diagonal structure has more to do with the structure of the equations themselves, rather than the order in which we may first see them written.
Here's a system of 4 equations in 4 unknowns which was selected to be easily solvable by hand:
$$ \begin{align} 2 w & = 16 \\ w - 3 x & = 14 \\ -w + x + 10 y & = -10 \\ w + x + 6 y + 2 z & = 0 \end{align} $$
Make sure you can solve this system quickly by hand before you continue. You should find $w=8, x=-2, y=0, z=3$.
This system is easy to solve because we can work one row at a time. Looking only at the first equation, we have one equation and one unknown, so we'll immediately find a value for $w = \frac {16} {2} = 8$.
Next, looking at the second row, we can substitute in the first value we found $w=8$, and then the second row then becomes $8 - 3 x = 14$, or $-3 x = 6$, so $x = -2$.
This process continues on and on for each subsequent equation because each new equation contains only variables we've already solved, plus one new unknown which is now easily solved by doing a bit of subtraction and division.
Here's the same system written as an augmented matrix:
$$ \begin{equation} \left[ \begin{array}{cccc|c} 2 & \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & 16 \\ 1 & -3 & \color{silver}{0} & \color{silver}{0} & 14 \\ -1 & 1 & 10 & \color{silver}{0} & -10 \\ 1 & 1 & 6 & 2 & 0 \end{array} \right] \ \text{for} \ (w,x,y,z) \end{equation} $$
This shape of coefficient matrix is called a lower triangular matrix, meaning that nonzero values are present only on the main diagonal and any cells below the main diagonal. The area above the main diagonal is filled with strictly zero values.
A lower triangular matrix problem is easy to solve through forward substitution:
Similarly, we can mirror the situation. For example, consider this 3 equation and 3 unknown system:
$$ \begin{align} x - y + z & = 2\\ 3 y - z & = 3 \\ 2 z & = 6 \end{align} $$
We can again easily solve by hand to find $x=1, y=2, z=3$. But in this case, our intuition is to solve from the bottom up: first find $z = \frac {6} {2}$, then substitute that into the equation above that to find $y$, and so on.
This system written as an augmented matrix:
$$ \begin{equation} \left[ \begin{array}{ccc|c} 1 & -1 & 1 & 2 \\ \color{silver}{0} & 3 & -1 & 3 \\ \color{silver}{0} & \color{silver}{0} & 2 & 6 \end{array} \right] \ \text{for} \ (x,y,z) \end{equation} $$
The shape of this coefficient matrix is upper triangular, meaning that there are nonzero values present only on the main diagonal and any cells above the main diagonal. All the cells below the main diagonal are zero.
An upper triangular matrix problem is easy to solve through back substitution:
In general, most systems of equations we're interested in solving will not happen to be one of the three easy to solve cases (diagonal, lower triangular, upper triangular). However, we'll now show that it's possible to solve any matrix problem by doing a series of simple operations that convert it into a triangular matrix problem without changing the solution of the system of equations.
Equations have some nice properties that we take for granted:
$$ \begin{align} a & = b \\ a + m & = b + m \\ k a & = k b \end{align} $$
These basic rules say that if we have an equation, we can always add a constant to both sides and it's still true, or we can multiply by a constant and it's still true.
If we have two equations $a=b \ \text{and} \ c=d$ then we can combine them:
$$a + c = b + d$$
Suppose we have a linearly independent system of 2 equations with 2 unknowns $x, y$:
$$ \begin{align} a_{11} x + a_{12} y & = b_1 \\ a_{21} x + a_{22} y & = b_2 \end{align} $$
We can, for example, add the two equations together to create a new equation:
$$(a_{11} x + a_{12} y) + (a_{21} x + a_{22} y) = b_1 + b_2 $$
Or, after factoring the terms:
$$(a_{11} + a_{21}) x + (a_{12} + a_{22}) y = b_1 + b_2$$
This last line is a new equation that is a linear combination of the original two equations. It is not linearly independent of the original two. However, importantly, if we pick either one (and only one) of the original equations, alongside our new equation, those two form a new system of two equations that is a linearly independent (assuming the original equations were linearly independent).
That is to say that we can modify our original system of equations in this way and get a new system of equations which is equivalent to (i.e. has the same solution to) the original system.
Because this is usually done with the matrix representation as discussed above, and each equation is represented by a row in our matrix, these modifications are called "row operations." For example, to illustrate the same addition and overwrite to form a new system:
$$ \begin{equation} \left[ \begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \end{array} \right] \ \text{for} \ (x,y) \\ \xrightarrow{\text{Add row 1 + row 2; store result in row 2}} \\ \left[ \begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ (a_{11} + a_{21}) & (a_{12} + a_{22}) & (b_1 + b_2) \end{array} \right] \ \text{for} \ (x,y) \end{equation} $$
Both the original & modified augmented matrices have the same solution. We can apply further row operations to continue to manipulate the system. The goal of these row operations will be to turn our original system into a triangular system which we know is easy to solve.
Here are the three types of allowed row operations:
An equation can always be scaled by multiplying both sides by a constant factor (as long as that factor is not zero). This does not change the solution space of that equation.
We have to store the new, scaled equation in place of the original. We might write:
$$ 2 R_3 \rightarrow R_3 $$
to indicate that we're going to double row #3 and then store that new equation back in the same row.
A system of equations has the same meaning regardless of what order we list the equations in. Therefore, we can always swap two rows of our augmented matrix and have a new system with the same meaning. We might write:
$$ R_2 \leftrightarrow R_3 $$
to indicate that we're going to swap row #2 with row #3.
(All reorderings of rows are possible through a series of pairwise row swaps.)
We can always add any nonzero multiple of one row to another row. This produces a linear combination of the two parent rows.
In order to maintain all the information that is previously present in the original system, we have to replace one of the two parent rows with our new linear combination.
For example, we might write:
$$ R_4 - 3 R_1 \rightarrow R_4 $$
to indicate that we're going to take the fourth row, subtract three times the first row, and then store the result again in the fourth row.
All three of these row operations produce a new system of equations which has the same solution as the original system. Now, we'll see how to use these operations to solve a system of equations.
Solving a linear system using Gaussian elimination is a two-step process:
Let's consider the following system of equations:
$$ \begin{align} w + x + y & = 11 \\ 2 w - y & = 0 \\ 8 x + y + z & = 12 \\ w + y & = 9 \end{align} \\ \text{or, as an augmented matrix:} \\ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ 2 & 0 & -1 & 0 & 0 \\ 0 & 8 & 1 & 1 & 12 \\ 1 & 0 & 1 & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
For each row of the matrix, we'll look at a pivot value -- the value that happens to be on the main diagonal -- and use it to eliminate any nonzero matrix coefficients in the equations below it. For the first row, the pivot is marked here in red, and the two nonzero values to be eliminated are marked in green:
$$ \left[ \begin{array}{cccc|c} \color{red}{1} & 1 & 1 & 0 & 11 \\ \color{green}{2} & 0 & -1 & 0 & 0 \\ \color{silver}{0} & 8 & 1 & 1 & 12 \\ \color{green}{1} & 0 & 1 & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
To remove the green values we'll subtract the correct multiple (the multiple which cancels out the corresponding term) of the pivot row from each of the rows below it:
$$ \xrightarrow{\begin{align} R_2 - \frac{2} {1} R_1 & \rightarrow R_2 \\ R_4 - \frac{1} {1} R_1 & \rightarrow R_4 \end{align}} \\ \left[ \begin{array}{cccc|c} \color{red}{1} & 1 & 1 & 0 & 11 \\ \color{silver}{0} & -2 & -3 & 0 & -22 \\ \color{silver}{0} & 8 & 1 & 1 & 12 \\ \color{silver}{0} & -1 & 0 & 0 & -2 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
(No operation was necessary on the 3rd row, because its corresponding cell in the pivot column was already 0.)
Note that after these two row operations, we've made it so that the entire column below the pivot cell is all zeros.
Now, we move to the 2nd row. The pivot is the red cell on the main diagonal, and the green cells below it are ones we'll work on eliminating: $$ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ \color{silver}{0} & \color{red}{-2} & -3 & 0 & -22 \\ \color{silver}{0} & \color{green}{8} & 1 & 1 & 12 \\ \color{silver}{0} & \color{green}{-1} & 0 & 0 & -2 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
Again, we need two row operations to eliminate the 8 and the -1 values:
$$ \xrightarrow{\begin{align} R_3 - \frac{8} {-2} R_2 & \rightarrow R_3 \\ R_4 - \frac{-1} {-2} R_2 & \rightarrow R_4 \end{align}} \\ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ \color{silver}{0} & \color{red}{-2} & -3 & 0 & -22 \\ \color{silver}{0} & \color{silver}{0} & -11 & 1 & -76 \\ \color{silver}{0} & \color{silver}{0} & \frac{3}{2} & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
There are lots of negative signs flying around, so be careful!
Finally, we'll operate on the 3rd row:
$$ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ \color{silver}{0} & -2 & -3 & 0 & -22 \\ \color{silver}{0} & \color{silver}{0} & \color{red}{-11} & 1 & -76 \\ \color{silver}{0} & \color{silver}{0} & \color{green}{\frac{3}{2}} & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
With just one remaining row, only one row operation is necessary to clear that cell.
$$ \xrightarrow{\begin{align} R_4 - \frac{3/2} {-11} R_3 \rightarrow R_4 \end{align}} \\ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ \color{silver}{0} & -2 & -3 & 0 & -22 \\ \color{silver}{0} & \color{silver}{0} & \color{red}{-11} & 1 & -76 \\ \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & \frac {3} {22} & -\frac{30} {22} \end{array} \right] \ \text{for} \ (w,x,y,z) $$
There are some unpleasant fractions flying around, but the result of this operation is that the coefficient matrix is now upper triangular!
$$ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ \color{silver}{0} & -2 & -3 & 0 & -22 \\ \color{silver}{0} & \color{silver}{0} & -11 & 1 & -76 \\ \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & \frac {3} {22} & -\frac{30} {22} \end{array} \right] \ \text{for} \ (w,x,y,z) $$
Now that it's upper triangular, we just solve with back-substitution: start from the last row, $\frac {3} {22} z = -\frac{30} {22}$ and solve to find $\boxed{z=-10}$.
Next, we move up a row and look at the 3rd row, while substituting in the value of the one unknown $z$ we've already solved:
$$ \begin{align} -11 y + z & = -76 \\ -11 y - 10 & = -76 \\ -11 y & = -66 \\ \boxed{y = 6} \end{align} $$
And now the 2nd row, substituting in the two unknowns $y, z$ we now know:
$$ \begin{align} -2 x -3 y & = -22 \\ -2 x - 18 & = -22 \\ -2 x & = -4 \\ \boxed{x = 2} \end{align} $$
And finally the 1st row, substituting in all the other terms:
$$ \begin{align} w + x + y & = 11 \\ w + 2 + 6 & = 11 \\ \boxed{w = 3} \end{align} $$
And that's it! We've solved for all our unknowns. Most of the work was involved in using each pivot to eliminate all the nonzero values in the column below it. The resulting upper triangular matrix was easy to solve bottom-up.
Gaussian elimination, as shown here, is a very mechanical way to use row operations to transform an arbitrary matrix problem into a triangular one by eliminating nonzero terms below the main diagonal, one at a time.
At each step in the Gaussian elimination process, we looked at row $p$ and our pivot value $A_{pp}$, and then used it to cancel out any nonzero terms $A_{np} \ \text{for} \ n>p$. The cancellation worked because we multiplied row $p$ by a ratio of the terms, $\frac{A_{np}}{A_{pp}}$ so that:
$$A_{np} - \big( \frac{A_{np}}{A_{pp}} \big) A_{pp} = 0$$
in order to zero out that cell. However, this process breaks if we have to divide by zero, i.e. if $A_{pp} = 0$.
As an example, let's consider the same four equations, but let's swap row 1 and row 3 before we begin:
$$ \begin{align} 8 x + y + z & = 12 \\ 2 w - y & = 0 \\ w + x + y & = 11 \\ w + y & = 9 \end{align} \\ \text{or, as an augmented matrix:} \\ \left[ \begin{array}{cccc|c} \color{red}{0} & 8 & 1 & 1 & 12 \\ \color{green}{2} & 0 & -1 & 0 & 0 \\ \color{green}{1} & 1 & 1 & 0 & 11 \\ \color{green}{1} & 0 & 1 & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
There is no way we can use the red 0 pivot cell to eliminate the green nonzero values in the column below it. There is no multiple of 0 that will additively cancel out nonzero values.
A pivot value must be nonzero.
If you encounter a case where the pivot value is zero, you must swap rows so that the pivot is not zero before you proceed. If this happens at an intermediate step, that's OK: you can pause and swap row $p$ with some later row $n>p$, and then resume. (Don't swap with an earlier row $n \lt p$ because that will have nonzero values to the left of the pivot cell!)
Circuit simulators like CircuitLab (and other numerical software) usually approach Gaussian elimination with a small twist. Instead of only working on an NxN original coefficient matrix $\mathbf{A}$ to create the NxN upper triangular matrix (which we'll call $\mathbf{U}$ for upper), we'll simultaneously create a NxN lower triangular matrix ($\mathbf{L}$ for lower), such that $\mathbf{A} = \mathbf{L} \mathbf{U}$, the matrix product of the lower and upper triangular matrices.
The process of solving $\mathbf{A} = \mathbf{L} \mathbf{U}$ is called LU Decomposition or LU Factorization.
Notice that the Gaussian elimination process to make the matrix be upper triangular did not depend on the right-hand-side constant values; while they were modified by the row operations, they did not determine the modifications of the coefficient rows, and were referenced only in the backsubstitution step. If we wanted to use the same left-hand-side $\mathbf{A}$ but re-solve our equations for a different set of constants $\mathbf{B}$ on the right, we would have to redo much of our work.
However, if we already have a decomposition $\mathbf{A} = \mathbf{L} \mathbf{U}$, then we can solve the system $\mathbf{A} \mathbf{x} = \mathbf{B}$ for a new right-hand-side $\mathbf{B}$ using this two-step process:
$$ \begin{align} \mathbf{A} \mathbf{x} & = \mathbf{B} \\ \mathbf{L} \mathbf{U} \mathbf{x} & = \mathbf{B} \\ \mathbf{L} (\mathbf{U} \mathbf{x}) & = \mathbf{B} \\ \mathbf{L} \mathbf{y} & = \mathbf{B} \end{align} $$
These two solves are very, very fast because they're each working on a triangular matrix.
The process of creating the $\mathbf{L}, \mathbf{U}$ matrices is very similar to Gaussian elimination, but uses $\mathbf{L}$ to keep track of the multiplications used for each pivot elimination. We won't get into more detail here, but the LU decomposition for our 4x4 example matrix above is:
$$ \begin{equation} \mathbf{A} = \mathbf{L} \mathbf{U} \\ \begin{bmatrix} 1 & 1 & 1 & 0 \\ 2 & 0 & -1 & 0 \\ 0 & 8 & 1 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & \color{silver}{0} & \color{silver}{0} & \color{silver}{0} \\ 2 & 1 & \color{silver}{0} & \color{silver}{0} \\ 0 & -4 & 1 & \color{silver}{0} \\ 1 & -\frac{1}{2} & -\frac{3}{22} & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 0 \\ \color{silver}{0} & -2 & -3 & 0 \\ \color{silver}{0} & \color{silver}{0} & -11 & 1 \\ \color{silver}{0} & \color{silver}{0} & \color{silver}{0} & \frac {3} {22} \end{bmatrix} \end{equation} $$
Where do these $\mathbf{L}, \mathbf{U}$ matrices come from?
Solving $\mathbf{L} \mathbf{y} = \mathbf{B}$ finds the same transformed right-hand side constants that Gaussian elimination produces on the augmented matrix. Then, solving $\mathbf{U} \mathbf{x} = \mathbf{y}$ just solves the same upper triangular system resulting from Gaussian elimination. The first step may look unfamiliar, but it's really just doing the same work that Gaussian elimination would do on the right-hand side.
In practice, many problems (including many circuit problems) involve re-solving the same coefficient matrix $\mathbf{A}$ with different constant values $\mathbf{B}$ repeatedly, so performing the relatively slow LU decomposition just once and then performing the fast triangular solves as many times as needed can be a tremendous speedup.
Just as Gaussian elimination requires swapping rows to ensure nonzero pivot values, LU decomposition actually may require reordering the rows too. In practical implementations, LU decomposition actually transforms $\mathbf{A}$ by solving $\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}$, where $\mathbf{P}$ is a permutation matrix (exactly one 1 in every row and every column; zero otherwise) which simply reorders the rows of A so that a decomposition exists.
From the LU decomposition with row permutation, we can solve:
$$ \begin{align} \mathbf{A} \mathbf{x} = \mathbf{B} \\ \mathbf{P} \mathbf{A} \mathbf{x} = \mathbf{P} \mathbf{B} \\ \mathbf{L} \mathbf{U} \mathbf{x} = \mathbf{P} \mathbf{B} \\ \end{align} $$
Notice that we apply the row permutation to both the coefficients $\mathbf{P} \mathbf{A}$ and to the right-hand-side constants $\mathbf{P} \mathbf{B}$; this preserves their integrity as equations, just like the augmented matrix form does.
As $\mathbf{P} \mathbf{B}$ is just a reordering of the vector $\mathbf{B}$, we can then continue with the forward and back substitution solves as before:
The algorithms for actually finding $\mathbf{P}, \mathbf{L}, \mathbf{U}$ from a given $\mathbf{A}$ are somewhat advanced and we won't elaborate in this book, but the implementation details are important to producing a good result, especially on matrices of large size, high sparsity, and wide dynamic range, as are often present in electronics problems. High quality implementations of LU decomposition like the one built into the CircuitLab software provide high performance and excellent numerical stability over a wide space of possible input matrices.
When we're asked to solve a system of equations by hand, we'll still use the same tools, but we can make our task slightly easier by taking a few shortcuts as we see them.
For systems of N linear equations with N unknowns, write the equations with all unknown terms on the left-hand side and all constants on the right-hand side, and then follow these three rules:
Repeat, doing one operation from this list until the system is completely solved. Some tips:
This strategy is essentially Gaussian elimination, but we've intermixed the reduction and the substitution steps to do whichever is more convenient first, because if we're showing our work by writing each step on paper it's often convenient to reduce the dimensionality of our problem as early as possible.
Here's an example of how we might solve a system by hand. Let's start with the same system we solved using Gaussian elimination earlier:
$$ \begin{align} w + x + y & = 11 \\ 2 w - y & = 0 \\ 8 x + y + z & = 12 \\ w + y & = 9 \end{align} \\ \text{or, as an augmented matrix:} \\ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ 2 & 0 & -1 & 0 & 0 \\ 0 & 8 & 1 & 1 & 12 \\ 1 & 0 & 1 & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
First, we notice that we can use the 4th row to cancel out 3rd-column $y$ terms in both the 1st and 2nd equations without introducing any new nonzero terms:
$$ \xrightarrow{\begin{align}R_1 - R_4 & \rightarrow R_1 \\ R_2 + R_4 & \rightarrow R_2 \end{align}} \\ \left[ \begin{array}{cccc|c} 0 & 1 & 0 & 0 & 2 \\ 3 & 0 & 0 & 0 & 9 \\ 0 & 8 & 1 & 1 & 12 \\ 1 & 0 & 1 & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
The first and second row now have just one nonzero term each, so we immediately have two of our results:
$$ \boxed{x = 2} \ \text{and} \ \boxed{w = 3} $$
We can substitute these values into the remaining two equations to produce our remaining system:
$$ \left[ \begin{array}{cc|c} 1 & 1 & -4 \\ 1 & 0 & 6 \end{array} \right] \ \text{for} \ (y,z) $$
We now have a simple second equation with $\boxed{y=6}$, which we can then plug into the first equation and find $\boxed{z=-10}$. We now have solved all four unknowns. You can and should quickly check that these values satisfy the original four equations, and they match the Gaussian elimination solution we found previously.
Let's try another approach. Here's the original system of equations again:
$$ \left[ \begin{array}{cccc|c} 1 & 1 & 1 & 0 & 11 \\ 2 & 0 & -1 & 0 & 0 \\ 0 & 8 & 1 & 1 & 12 \\ 1 & 0 & 1 & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,y,z) $$
Now we'll notice that the 2nd row has two terms and zero constant value: it says $2 w - y = 0$, or $y = 2 w$. We're going to choose to keep $w$ and remove $y$ from our set of equations by substituting $2 w$ every time we see a $y$, and then we can remove the 2nd equation entirely from our system:
$$ \xrightarrow{\begin{align} \text{substitute}\ 2w \ \text{for} \ y \\ \text{remove} \ R_2 \end{align}} \\ \left[ \begin{array}{ccc|c} 3 & 1 & 0 & 11 \\ 2 & 8 & 1 & 12 \\ 3 & 0 & 0 & 9 \end{array} \right] \ \text{for} \ (w,x,z) $$
Notice that the right-hand side values are unchanged. While we haven't actually determined the numerical value for $y$ yet, we've now quickly reduced the dimensionality of our problem from 4 to 3. And in doing so, the new third equation is trivially solvable for $\boxed{w=3}$.
Now that we know $w$, we also can remember to solve our simpliciation, $y = 2w \ \text{for} \ \boxed{y = 6}$.
We substitute $w=3$ into the remaining two equations and remove the third equation to find:
$$ \left[ \begin{array}{cc|c} 1 & 0 & 2 \\ 8 & 1 & 6 \\ \end{array} \right] \ \text{for} \ (x,z) $$
From here, it's clear from the 1st row that $\boxed{x=2}$, and then when substituting that value into the second row, we find $\boxed{z=10}$.
This is the same example system of equations we solved mechanically in the "Gaussian Elimination" section above, but by applying these solve-by-hand strategies, we have to do a lot less work (significantly fewer total operations), avoid ugly fractions, and get the answer faster!
When solving by hand, we have lots of opportunities to make arithmetic mistakes (for example, adding rather than subtracting when computing the new right-hand side after substitution), as well as record-keeping mistakes (for example, not keeping track of which column maps to which variable as we go). Fortunately, it's relatively easy to check your solution after you compute it: just plug it into each of the equations and make sure the left side equals the right. Get in the habit of quickly checking your work after you solve systems of equations.
In the next section, Steady State & Transient, we'll consider how to separate the analysis of a system's normal operating conditions from events that may change those conditions temporarily or permanently.
How to cite this source: