This post is about abstract algebra. Specifically, I’m going to take a look at the reasoning behind associativity, the property of binary operations which requires that, for any triplet of values:

(A∘B)∘C = A∘(B∘C)

where ∘ is the operation in question. For example, addition is associative because for any three numbers, A, B, and C, the values (A+B)+C and A+(B+C) are equal.

This property is required by groups, fields, lattices, and many other algebraic structures. I’ll start off with a review of groups, which are probably the most widely studied of the lot, and then I’ll dive right into my analysis. The review only serves as background context, so you can skip it if you want to get straight to the point (or if you’re already familiar with groups).

## Groups

A group is a set G paired with a binary operation satisfying four properties. For now, I will write the operation as multiplication, i.e. if *a* and *b* are two elements of a group then I will use *ab* to refer to the result of applying the operation to the pair (*a, b*). The required properties are:

**Closure:**If*a*and*b*are members of G, then*ab*is a member of G.**Existence of an identity:**G contains an element*e*such that combining*e*with any other element leaves it the same (i.e.*ea = ae = a*for all*a*in G). You can think of this as the “do nothing” element.**Existence of inverses:**For any element*a*in G, there is an element you can combine it with to produce the identity (i.e.*ab = ba = e*for some*b*in G). You can think of this as saying that, whatever you can do with an element of G, you can also undo with some other element (or possibly the same one — sometimes an element is its own inverse).**Associativity:**For any three elements*a*,*b*, and*c*, the equation*(ab)c = a(bc)*must hold.

For example, G could be the set of all integers (positive and negative) and the operation could be normal addition. Let’s check that all the axioms are satisfied:

**Closure:**The sum of two integers is always another integer. ✓**Existence of an identity:**Zero is an identity, since adding zero to any integer leaves it the same. ✓**Existence of inverses:**For any integer X, the number “negative X” is also an integer, and their sum is (X) + (-X) = 0, which is the identity element. ✓**Associativity:**This one isn’t so easy to prove. Most of us learned that addition is associative in elementary school and never questioned it. It isn’t even obvious what you’re allowed to assume along the way to proving this fact — if the associative property of addition can’t be taken for granted, what can? See the Peano Axioms for more about this, and see this Wikipedia page for the proof. For our purposes, we can just say “✓” and be done with it.

There are countless other examples, too:

- G is the set of positive real numbers and the operation is normal multiplication.
- G is the set of 2×2 invertible matrices and the operation is matrix multiplication.
- G is the set {0, 1, 2, 3, 4, 5, 6, 7} and the operation is addition modulo 8.
*Yes, groups can be finite in size.* - G is the set {A, B, C, D, E, F} and the operation is defined by this look-up table:

*Yes, look-up tables are allowed.* Nowhere in the group axioms does it state that the operation has to be something familiar. You can verify that the axioms still hold: the identity is *A*, inverses can be found manually, and associativity can be proven laboriously for each triplet of letters. For example *(DC)F = EF = C*, while *D(CF) = DE = C*.

Note that this operation isn’t commutative; some pairs of elements have a different product depending on the order you multiply them in. *CE = D* but *EC = F*. This operation manages to be associative despite not being commutative. *Yes, that’s allowed*. Anything not directly or indirectly ruled out by the group axioms is allowed.

Groups where the order of multiplication makes a difference are called *non-abelian*. For an infinite number of examples of finite-sized, non-abelian groups, see the dihedral groups.

## Why associative operations?

Anyway, what I’m here to talk about today is that the associative property never used to make sense to me — who cares about associative operations? Why that restriction and not something else?

It turns out that the key to understanding it is a very simple dictum: **parentheses don’t matter.** Since *A(BC) *and *(AB)C* are guaranteed to be equal anyway, you might as well toss out the parentheses and just write *ABC*. It doesn’t matter which of the two “explicit” versions you’re referring to, because they both have the same value. And, very importantly, *the same goes for longer sequences*. *A((BC)(D(EF)))* is equal to *((AB)(CD))(EF) *by the following chain of manipulations, applying the associative property over and over again:

*A ((BC) (D(EF)))**(A (BC)) (D (EF))**((AB) C) (D (EF))**(((AB)C) D) (EF)**((AB) (CD)) (EF)*

And in fact, *any* two parenthesis-filled versions of *ABCDEF *can be shown to be equal by similar chains of manipulations. Citation needed. Don’t worry, I’ll prove this in a minute. Right now, just think about the implication: since parentheses never matter, you can think about groups not as having *binary* operations — maps from *pairs* of elements to single elements — with the associative property, but rather as having *sequential* operations — maps from *sequences* of elements to single elements — perhaps with some additional properties.

I say “perhaps with some additional properties” because we haven’t yet determined whether every sequential operation comes from an associative binary operation, only that every associative binary operation can be transmogrified into a sequential one. There might yet be some “orphan” sequential operations that you’d never create, no matter how many associative binary operations you transmogrified. We will investigate this later on in the post.

## Proof of my claim

**Lemma:** The value that an associative binary operation assigns to a parenthesized sequence of elements doesn’t depend on the positions of the parentheses.

**Proof: **Let S be such a sequence. For example, S could be *((A(BC)) (DE)) (F(GH))*. We’re going to show that any such sequence such sequence can be re-arranged into a fully right-associative sequence, in this case *A(B(C(D(E(F(GH))))))*, without changing its value. It will follow that all starting sequences have the same value.

The first step is to use the associative property to re-arrange S so that the left side of the top-most application of the operation is a single element. If this is already so, we can move on. Otherwise, we can make it so by the following process. Since the left side is not a single element, it must be a product of two elements. In this case, the left side is the product of *(A(BC)) *and *(DE)*. In other words, the overall structure of S is *(x y) z*.

Now, by the associative property, we can re-arrange this into *x (y z)* without changing its value:

Now our sequence is *(A(BC)) ((DE) (F(GH)))*. Its left side is smaller than it was before: three elements instead of five elements. But it still has the form *(x y) z*, so we can apply the same transformation again:

This transformation can be repeated as many times as necessary, until you get a sequence whose left side is a single element, a sequence that begins *A (…)*. In this case, it took two applications. Once you’ve pulled the first element out to the front, you recurse. Apply the same process to the right side of the sequence, until *its* left side is a single element, so that the sequence begins with *A (B (…))*. Then recurse again, pulling out yet another element to achieve *A (B (C (…)))*. Recurse over and over until you achieve the fully right-associative sequence. Since we only transformed the sequence by applying the associative property, its value has not changed. This completes the proof. □

## The tower-of-powers counterexample

I said earlier in this post that we don’t yet know whether every sequential operation has an underlying associative binary operation. Well, I’ll tell you one that doesn’t: the operation that assigns to a sequence *(A, B, C, …)* of integers the value of:

A^{BC…}

…a tower-of-powers with one entry for each element.

The proof is easy, but let me start with some notation. I will denote applications of sequential operations with [square brackets]. So, for the sequential operation defined above, I could write:

- [3 3] = 3
^{3}= 27 - [2 2 2] = 2
^{22}= 16 - [1 6 7 8] = 1
^{678}= 1 - [5 4 3] = 5
^{43}≈ 5.4 * 10^{44}

Also, since there are now two different operations under discussion, I will use ∘ to denote a binary operation — i.e. I will now write *a∘b* instead of just *ab*.

Now, the proof. Suppose the sequential operation [] defined above really does derive from some binary operation ∘. In other words, suppose that [2 3 4 5] is really just a shorthand for “apply ∘ to the operands 2, 3, 4, and 5, with parentheses anywhere you want”. We can figure out what value that operation must spit out for any pair of just two inputs *a* and *b*, since by definition [a b] must equal a∘b. So my proof is simply to note that:

(2∘3)∘4 = [[2 3] 4] = (2^{3})^{4} = 4096

but:

2∘(3∘4) = [2 [3 4]] = 2^{34} = 2417851639229258349412352.

Since these two values aren’t equal, ∘ isn’t associative. QED!

Now, here’s a (tangential) harder question. Suppose the operation [] is defined as above, but *only for sequences of three or more elements*. In other words, suppose that the 2-ary version of the operation is surgically removed, so that it’s no longer so easy to determine the exact values of a hypothetical underlying binary operation. Could the 2-ary version of the operation then be replaced in some fashion such that one would exist? I will include the answer to this question in a spoiler box at the end of this post. The material in the next section will render the puzzle easier, so consider pausing here and coming back when you’ve solved it.

## And vice versa

So parentheses don’t matter for an associative binary operation. Is it obvious to you that parentheses also don’t matter for a sequential operation that derives from an associative binary operation? If so, then you’re either slightly quicker than me or slightly overconfident. But in any case, it’s true.

**Lemma: **If ∘ is an associative binary operation and [] is the corresponding sequential operation, then the value that [] assigns to a bracketed sequence of elements is independent of the positions of any extra internal brackets.

**Proof:** Let S be such a sequence. For example, S could be the sequence *[A B C [D E] F [G H [I J] K L] ]*. (Note that “bracketed sequence of elements” means something slightly different than “parenthesized sequence of elements”, since the fact that we’re working with a sequential operation instead of a binary operation means that more than two elements can appear between a given pair of brackets.) Now, replace each application of [] with repeated applications of ∘, parenthesized however you like. For our sequence, this might yield:

* (A ∘ (B ∘ (C ∘ ( (D ∘ E) ∘ (F ∘ (G ∘ ( H ∘ ( (I ∘ J) ∘ (K ∘ L) ) ) ) ) ) ) ) )*

But this too is a parenthesization of the given sequence of elements, and as such it is equal in value to any other parenthesization, and to the plain application of [], in our case [A B C D E F G H I J K L]. Hence the value that [] assigned to S did not depend on the positions of the extra brackets. □

—

But now look:

**Lemma:** If [] is a sequential operation whose value doesn’t depend on the positions of the brackets, then it corresponds to an underlying associative binary operation ∘.

**Proof:** Take the binary operation to be the 2-ary version of the sequential operation. In other words, define A∘B = [A B]. Then:

(A∘B)∘C = [[A B] C] = [A [B C]] = A∘(B∘C)

and we see that ∘ is associative. And it is easy to see that [] is the sequential operation that corresponds to ∘, because any parenthesized sequence of applications of ∘ can be transformed into a corresponding bracketed sequence of applications of []. Then the extra brackets can simply be removed, since the value of [] doesn’t depend on them. For example:

(A∘B)∘(C∘(D∘E)) = [[A B] [C [D E]]] = [A B C D E]

So we see that repeated applications of ∘ always yield the same value as a single application of the sequential operation. ◻

—

So there it is — every binary associative operation corresponds to a sequential operation whose value doesn’t depend on internal brackets, and vice versa. The two types of operation are the same, and can be used interchangeably.

## The ghosts of departed sequences

How can we leverage this discovery to improve our intuitions about groups and other associative algebraic structures?

Take a moment to consider the Rubik’s Cube group, which is a group of all possible permutations of a Rubik’s cube.. For example, the element **R** is the permutation you get if you turn the right side clockwise one quarter-turn, and the element **LLL** is the permutation you get after turning the left side clockwise three quarter-turns (or counterclockwise one quarter-turn). What about **LLLL**? That’s the same as the solved permutation, since turning the left side four times in a row returns you to the starting state. The other four sides are called **U** for “up”, **D** for “down”, **F** for “front”, and **B** for “back”.

It’s easy to see what the sequential operation of the Rubik’s Cube group is — given a sequence of permutations, you simply apply one permutation after the other. [**L** **L** **B** **RF**] = **LLBRF**. It’s also easy to see that extra brackets don’t matter. Consider these two applications of the operation:

- [L B U D]
- [L [B U] D]

The two sequences are the same, except that the second one has an extra set of brackets. Now, what would it mean for them to have different values? “Turning the left, then the back, then the top, then the bottom is different from turning the left, then the back and top in succession, then the bottom.” If that sounds strange to you, then you have all the intuitions you need already in place. “Apply each of these individual operations in sequence” is an inherently structureless notion. There’s no possible concept that would be analogous to adding parentheses to the written-out sequence of moves.

Or consider the group of 2×2 linear transforms, like “compress by a factor of 2 in the x direction” or “rotate 45 degrees”. Suppose you’ve got four transforms that you’d like to compose with one another — A, B, C, and D. Would it make any sense for [A B C D] to be different from [A B [C D]]? “First apply A, then apply B, then apply C and D really fast without stopping for breath. Make sure those last two really *count* as being just one big move, since they’re supposed to be grouped together.”

Or for that matter, take the group of integers under addition. “Start with four, then add the sum of three and six, then add nine”. Think back to first grade, when adding two numbers together meant combining two piles of mangos and counting how many you ended up with. What am I going to do to denote that the three mangos are being added to the six mangos *before* I add them all to the pile? Put them in a ziploc bag together?

In fact, forget the other properties of groups. There’s another algebraic structure called a semigroup that only requires closure and associativity, and not identity or inverses. The set of all real functions is a semigroup under the operation of *function composition*. For example, consider the elements F(x) = x^{2}, G(x) = sin(x), and H(x) = 3x. You can see the operation in action:

- (F∘G)(x) = (sin(x))
^{2} - (H∘F∘H)(x) = 27x
^{2} - (G∘G∘H∘H)(x) = sin(sin(9x))

Or, if you prefer:

- [F G](x) = (sin(x))
^{2} - [H F H](x) = 27x
^{2} - [G G H H](x) = sin(sin(9x))

And again it’s the same question. Could [G [G H] H] possibly differ from [G G H H]? Not really. “First put x through H, then quickly put it through H and then G, then put it through G.” There just isn’t room for there to be a difference.

But above all, consider the tower-of-powers sequential operation. It *does not* correspond to any associative binary operation at all, because in the context of exponentiation, the elements of the set of positive integers *cannot* be thought of as a set of actions that you take one after another. At least, not if you want to add new elements on the right.

What do I mean? Well, suppose you’ve got some sequence that ends in 3, like this: [… 3]. Since brackets aren’t supposed to matter, you decide to add an extra pair of brackets to make [[…] 3], so that you can evaluate all that stuff on the left first. Say the stuff on the left evaluates to 16. What should the final value be?

Well, if the subsequence you just collapsed was [2 2 2] then the original sequence was [2 2 2 3] and you should get an answer of 2^{256}. But if the subsequence was [4 2] then your answer should be [4 2 3] = 2^{16}, and if it was just a plain [16] then you should get [16 3] = 2^{12}.

If the tower-of-powers operation were associative — that is, if extra parentheses didn’t affect the answer you get — then any time you apply 3 to something, that thing should behave just like any of its expansions would.

Just like how **BU** from the Rubik’s Cube group behaves exactly like the sequence [B U] does. Or how the linear transform [C D] behaves just like the linear transform D followed by the linear transform C. Or how adding nine mangos to a pile is the same as adding six and then adding three more. Or how composing sin(3x) onto another function is the same as composing 3x and then composing sin(x).

In each of these cases, the individual element resulting from a collapse of a sequence [S_{1} S_{2} S_{3} …] has managed to retain all of the qualities of the original sequence that were relevant for performing further applications of the operation. It’s as though the individual element were possessed by the ghost of the sequence it came from.

But this isn’t the case for the tower-of-powers operation. When [2 2 2] is collapsed into 16, you lose the information about where in the stack any further elements are supposed to go. Before collapse, you know it’s supposed to go at the top of a stack of three 2’s. But after the collapse, that 16 could just as well have come from [4 2] or [16 1 1] or whatever, and those guys have different ideas of where further elements are supposed to go. This kind of thing doesn’t happen in the other cases. That’s what it really means for an operation to be associative.

Now, use this newfound understanding to explain to me in the comments why it’s obvious that gcd, the greatest-common-divisor function, is associative.

## Post-script 1: solution to the puzzle

This is the solution to the puzzle presented in the section about the tower-of-powers sequential operation. I won’t repeat the puzzle here. If you want to see the answer, click the button below.

## Post-script 2: infinite sequences

“Aha!” says the careful reader who isn’t very familiar with abstract algebra, after reading the first proof in this post. “But what about *infinite *sequences of elements? Your proof implicitly uses induction to show that parenthesized sequences of arbitrary length can be re-arranged into fully right-associative sequences, but induction only works for the *finite* case.”

My answer: the the normal definition of associative operations does not guarantee the existence of infinite products. Consider the group where G is the set {1, 3, 5, 7} and the operation is multiplication modulo 8, and try to find the value of the expression 3∘3∘3∘3∘3∘… with a countably infinite number of threes. You could try taking a limit of the product of N threes as N goes to infinity, but…

- 3∘3 = 9 mod 8 =
**1** - 3∘3∘3 = 27 mod 8 =
**3** - 3∘3∘3∘3 = 81 mod 8 =
**1** - 3∘3∘3∘3∘3 = 243 mod 8 =
**3** - 3∘3∘3∘3∘3∘3 = 729 mod 8 =
**1** - …

…you won’t find one. Because who said there would be one, anyway? One’s mind might turn to infinite sequences once operations are framed as acting on sequences of elements, but the original definition only calls for a binary version, and if you can’t find any sensible way to define the value of an infinite product like this one, that’s just too bad.

## Post-script 3: the look-up table group

But what about the six-element group with a lookup table for an operation that I defined at the start of this post? Do the intuitions built herein not apply to that group?

Sure they do. You just need the right interpretation of what the elements of that group actually are. Consider this equilateral triangle:

Each element of the group corresponds to an action you can take on the triangle that doesn’t change its appearance. There are six such actions:

- Do nothing.
- Rotate left 120 degrees.
- Rotate left 240 degrees.
- Flip over the vertical axis.
- Flip over the vertical axis, then rotate left 120 degrees.
- Flip over the vertical axis, then rotate left 240 degrees.

Applying any number of these actions in sequence is always equivalent to performing just a single action. For example, 2∘4 = 6 and 3∘3∘3 = 1. If you’ve been following along, it should be obvious that the operation is associative. The identity element is action 1 and inverses are easy to find experimentally. Try cutting out a paper triangle and marking the front and back with distinct, asymmetrical symbols, such as the letters F and B. Play around with it to get a feel for how the actions interrelate. I will leave it as an exercise to you to discover which elements of the group correspond to which actions. Hint: there may be more than one solution.