# Elementary matrices and the general linear group

** Published:**

*THIS POST IS CURRENTLY UNDER CONSTRUCTION*

## Introduction

In a previous blog post, we showed how systems of linear equations can be represented as a matrix equation. For example, the system of linear equations,

\[\begin{align*}a_{1,1}x_1 + a_{1,2}x_2 + a_{1,3}x_3 &= b_1 \\ a_{2,1}x_1 + a_{2,2}x_2 + a_{2,3}x_3 &= b_2 \\ a_{3,1}x_1 + a_{3,2}x_2 + a_{3,3}x_3 &= b_3 \end{align*}\]can be represented succinctly as

\[\boldsymbol{Ax} = \boldsymbol{b}\]where $\boldsymbol{A}$ is the matrix of coefficients $a_{1,1}, a_{1,2}, \dots, a_{3,3}$ and $\boldsymbol{b}$ is the matrix of coefficients of $b_1, b_2,$ and $b_3$. Furthermore, we noted that this system will have exactly one solution if $\boldsymbol{A}$ is an invertible matrix.

In this post, we will discuss how one can solve for this exact solution using a series of operations of the system that involve a particular class of invertible matrices called the **elementary matrices**. We will show that any invertible matrix can be converted to another invertible matrix by performing some sequence of matrix multiplications with elementary matrices. In fact, this reveals an elegant mathematical structure regarding invertible matrices: they form a **group**. That is, you can always go from one invertible matrix to another via a series of multipliciations by elementary row matrices!

## Elementary row operations

Before digging into matrices, let’s first discuss how one can go about solving a system of linear equations. Say we have a system with three equations and three variables:

\[\begin{align*}a_{1,1}x_1 + a_{1,2}x_2 + a_{1,3}x_3 &= b_1 \\ a_{2,1}x_1 + a_{2,2}x_2 + a_{2,3}x_3 &= b_2 \\ a_{3,1}x_1 + a_{3,2}x_2 + a_{3,3}x_3 &= b_3 \end{align*}\]To solve such a system, our goal is perform simple algebraic operations on these equations until we convert the system to one with the following form:

\[\begin{align*}x_1 &= c_1 \\ x_2 &= c_2 \\ x_3 &= c_3 \end{align*}\]where $c_1, c_2$, and $c_3$ are the solutions to the system – that is, they are the values we can assign to $x_1, x_2$, and $x_3$ so that all of the equations in the system are valid.

Now, what kinds of algebraic operations can we perform on the equations of the system to solve it? There are three main categories, called **elementary row operations** (we’ll see soon, why they have this name):

**Scalar multiplication**: Simply multiply both sides of one of the equations by a scalar. For example,**Row swap**: You can move one equation above or below another. Note, the order the equations are written is irrevalent to the solution, so swapping rows is really just changing how we organize the formulas. Nonetheless, this organization will be important as we demonstrate how elementary matrices can be used to solve the system.**Row sum**: Add a multiple of one equation to another. Note, this is a perfectly valid operation because, if the equality truly holds, then this equations to simply adding the same quantity to both sides of a given equation.

Let’s use these operations to solve the following system:

\[\begin{align*}-x_1 - 2 x_2 + x_3 &= -3 \\ 3 x_2 &= 3 \\ 2 x_1 + 4 x_2 &= 10\end{align*}\]- First, we
*row swap*the first and third equations:

- Next, let’s perform
*scalar multiplication*and multiply the first equation by 1/2:

- Next, let’s perform a
*row sum*and add the first row to the third:

- Next, let’s perform
*scalar multiplication*and multiply the second equation by 1/3:

- Finally, let’s perform a
*row sum*and add -2 multiplied by the second row to the first:

And there we go, we’ve solved the system using these elementary row operations.

## Elementary row operations in matrix notation

Recall, we can represent a system of linear equations as a matrix equation, we showed how systems of linear equations can be represented as a matrix equation.

The linear system that we just solved can be written as:

\[\begin{bmatrix}-1 & -2 & 1 \\ 0 & 3 & 0 \\ 2 & 4 & 0 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = \begin{bmatrix}-3 \\ 3 \\ 10\end{bmatrix}\]When solving the system using the elementary row operations, we needn’t write out all of the equations. Really, all we need to do is keep track of how $\boldsymbol{A}$ and \(\boldsymbol{b}\) are being transformed upon each iteration. For ease of notation, we can join $\boldsymbol{A}$ and $\boldsymbol{b}$ into a single matrix, called an **augmented matrix**. In our example, this augmented matrix would look like:

In the augmented matrix, the final column stores $\boldsymbol{b}$ and all of the previous columns store the columns of $\boldsymbol{A}$. Our execution of the row operations can now operate only on this augmented matrix as follows:

*Row swap*: swap the first and third equations:

*Scalar multiplication*: Multiply the first equation by 1/2:

*Row sum*: add the first row to the third:

*Scalar multiplication*: Multiply the second equation by 1/3:

*Row sum*and add -2 multiplied by the second row to the first:

Now, let’s re-write the augmented matrix as a matrix equation:

\[\begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = \begin{bmatrix}3 \\ 1 \\ 2\end{bmatrix}\]Note that \(\boldsymbol{A}\) has been *transformed* to the identity matrix \(\boldsymbol{I}\). This will be a key observation as we move into the next section.

## Elementary matrices

Notice on each elementary row operation, we transformed the matrix $\boldsymbol{A}$ using a series of steps until it became the identity matrix $\boldsymbol{I}$. In fact, each of these elementary row operations can be represented as a matrix that is operating on $\boldsymbol{I}$. Such a matrix that represents an elementary row operation is called an **elementary matrix**.

To demonstrate, that our elementary row operations can be performed using matrix multiplication, let’s look back at our example. We start with the matrix

\[\boldsymbol{A} := \begin{bmatrix}-1 & -2 & 1 \\ 0 & 3 & 0 \\ 2 & 4 & 0 \end{bmatrix}\]Then, first we *row swap* the first and third equations:

Then perform *scalar multiplication* and multiply the first equation by 1/2:

Then perform a *row sum* and add the first row to the third:

Then perform *scalar multiplication* and multiply the second equation by 1/3:

Then perform a *row sum* and add -2 multiplied by the second row to the first:

Notice, we’ve derived a series of matrices that when multiplied by $\boldsymbol{A}$ produces the identity matrix:

\[\boldsymbol{E}_5\boldsymbol{E}_4\boldsymbol{E}_3\boldsymbol{E}_2\boldsymbol{E}_1\boldsymbol{A} = \boldsymbol{I}\]By the definition of an inverse matrix, we see that the matrix formed by $\boldsymbol{E}_5\boldsymbol{E}_4\boldsymbol{E}_3\boldsymbol{E}_2\boldsymbol{E}_1$ is the inverse of $\boldsymbol{A}$! That is,

\[\boldsymbol{A}^{-1} = \boldsymbol{E}_5\boldsymbol{E}_4\boldsymbol{E}_3\boldsymbol{E}_2\boldsymbol{E}_1\]Thus, we have found a way to decompose the inverse of $\boldsymbol{A}$ into a set of matrices that when multiplied together yield its inverse. Each of these matrices represents a transformation on $\boldsymbol{A}$ equivalent to an elementary row operation that one would use to solve an equation of the form $\boldsymbol{Ax} = \boldsymbol{b}$!

## The general linear group

Now, we will note a few important observations about elementary row matrices and invertible matrices in general:

First, notice that each elementary matrix is invertible. The inverse of an elementary matrix that row-scales by a constant $c$ is simply the elementary matrix that scales the row by $\frac{1}{c}$. The inverse of an elementary matrix that row swaps is simply the elementary matrix that swaps the rows back to their original configuration. The inverse of an elementary matrix that performs a row sum is simply the elementary matrix that performs the subtraction. Thus we see that not only are elementary matrices invertible, but their inverses are also elementary matrices!

Because these elementary row matrices are invertible, instead of starting with some invertible matrix $\boldsymbol{A}$ and producing the identity matrix $\boldsymbol{I}$ via some sequence of multiplications,

\[\boldsymbol{I} = \boldsymbol{E}_n \dots, \boldsymbol{E}_1\boldsymbol{A}\]we can instead start with the identity matrix and produce $\boldsymbol{A}$:

\[\boldsymbol{A} = \boldsymbol{E}^{-1}_1 \dots \boldsymbol{E}^{-1}_n\boldsymbol{I}\]From this fact, we can see that we can go from any invertible matrix to another by multiplying the matrix with some series of elementary matrices. For example, say we have two different invertible matrices $\boldsymbol{A}$ and $\boldsymbol{B}$. Then we can transform $\boldsymbol{A}$ into the identity matrix via some some sequence of multiplications by elementary matrices:

\[\boldsymbol{I} = \boldsymbol{E}_n \dots, \boldsymbol{E}_1\boldsymbol{A}\]We can also transform $\boldsymbol{B}$ into the identity matrix via some some sequence of multiplications by elementary matrices:

\[\boldsymbol{I} = \boldsymbol{E}_{n+m} \dots, \boldsymbol{E}_{n+1}\boldsymbol{B}\]We can convert $\boldsymbol{A}$ to $\boldsymbol{B}$ by converting $\boldsymbol{A}$ to $\boldsymbol{I}$ and then convert $\boldsymbol{I}$ to $\boldsymbol{B}$ via the inverses of the elementary matrices used to convert $\boldsymbol{B}$ to $\boldsymbol{I}$:

$$\boldsymbol{B} = \boldsymbol{E}^{-1}*{n+1} \dots \boldsymbol{E}^{-1}*{n+m}\boldsymbol{E}_n \dots, \boldsymbol{E}_1\boldsymbol{A}

Let’s let the matrix $\boldsymbol{C}$ be defined as:

\[\boldsymbol{C} := \boldsymbol{E}^{-1}_{n+1} \dots \boldsymbol{E}^{-1}_{n+m}\boldsymbol{E}_n \dots, \boldsymbol{E}_1\]Then, we see that

\[\boldsymbol{B} = \boldsymbol{CA}\]Notably, $\boldsymbol{C}$ is also an invertible matrix because all of the elementary matrices we multiplied together to produce $\boldsymbol{C}$ are all invertible!