Multi-Armed Bandit – Explore and Exploit

TL;DR: An algorithm that balances exploiting the best existing options against exploring potentially even better ones is relevant to navigating the wild world of thousands upon thousands of BJJ/grappling techniques and concepts.

—long form —

An algorithm is a set of rules for problem-solving.

Problem-solving under physical stress is a lot of what submission grappling is about.

In computer science, algorithms are finite sequences of well-defined instructions; take this, add that, move it here, rinse and repeat.

Sorting algorithms are among the most famous examples, such as Quicksort, which takes a list of numbers and rearranges them from smallest to largest.

1. Pick an element in the list (a pivot in the array)

2. Compare all the other elements to it one by one, moving the smaller values before it and the larger values after it.

3. Repeat. (Recursively re-apply steps one and two to each side.)
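
The three steps above can be sketched in a few lines of Python. This is a simplified version that builds new lists rather than reordering in place as the original Quicksort does, and the function name and the middle-element pivot are my own choices for illustration.

```python
def quicksort(values):
    """Return a new list with the items of `values` in ascending order."""
    if len(values) <= 1:
        return list(values)                  # nothing left to sort
    pivot = values[len(values) // 2]         # step 1: pick a pivot
    # Step 2: compare every element to the pivot and split accordingly.
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    # Step 3: recursively re-apply steps 1 and 2 to each side.
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([5, 2, 9, 1, 5, 6]))  # prints [1, 2, 5, 5, 6, 9]
```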

Quicksort was 2x to 3x faster than other methods when it was published in 1961.

Algorithms make systems more efficient.

There is a great visualization on Wikipedia.

What is your algorithm to sort through the thousands of techniques, concepts and variations that are in Jiu Jitsu?

– The Lapel Guard Encyclopedia by Keenan had 134 individual lessons when I went through it, and they just added two sections for #InternationalLapelDay (7.29), bringing it to 155 individual sections, each one with something to be drilled or some concept to be integrated (or even 180 if you include the Andris Brunovskis extra lessons on the Grappler’s Guide... it never ends...).

– High-Percentage No-gi Chokes by Lachlan: 86 sections

– Danaher’s Enter the System – Leglocks: 121 sections

Just with those three systems, that’s arguably well over 300 techniques, methods, concepts and variations to help us master the art and control our opponents.

BJJ Fanatics has 45 pages of 15 videos each, or 675 series, each a few hours long, and they might be releasing new ones faster than anyone could realistically keep up with.

So, how do we sort through what works and what doesn’t?

How do we avoid dismissing something of high value that doesn’t work for us immediately?

How do we avoid false positives?

Listening to the pros who have battle-tested their games is a good start: our own coaches and seniors, or the world champions sharing their knowledge online. Looking at competition footage to see what actually works with high rates of success is also highly beneficial.

Yet we are all individuals with varying body types, abilities and affinities, so at some point we start to develop our own games.

What do we include and exclude from our A Game?

Enter another cool algorithm: the multi-armed bandit!

I first learned of this one when working at Adobe in Digital Marketing, as it is commonly used for recommendation engines or A/B testing layouts and offers. The name comes from an analogy to the slot machine, a.k.a. the one-armed bandit.

Essentially all it does is use the option that works best most of the time, while making room to try new stuff sometimes.

And I believe that this is a good model for our journeys in Jiu Jitsu.

Exploit – Explore.

Say you Exploit 80% of the time, relying on your go-to options, and Explore 20% of the time, trying new stuff.

Those parameters can and should change “as the color of our belt darkens”.

At white and blue belt it should be mostly about exploration, to map out the terrain and the realm of possibilities. Later on it is normal to exploit and refine what you know works, while keeping an open mind that unexplored areas might hold improvements.

The multi-armed bandit problem actually has dozens of strategies with cool names, such as :

Epsilon-greedy strategy : Select the best lever (option) most of the time, with probability 1-ε, and a random lever the rest of the time, with probability ε, where ε = 10% or so.

BJJ : 90% kick their butt with what you know best. 10% play around
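
Here is a minimal sketch of epsilon-greedy in Python, to make the 90/10 split concrete. The technique names and their "true" success rates are invented for illustration, and the helper names (`epsilon_greedy`, `update`) are my own, not from any library.

```python
import random

def epsilon_greedy(estimates, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(estimates))      # explore: any option at random
    return max(estimates, key=estimates.get)    # exploit: best option so far

def update(estimates, counts, option, reward):
    """Fold one result (1 = it worked, 0 = it failed) into the running average."""
    counts[option] += 1
    estimates[option] += (reward - estimates[option]) / counts[option]

# Hypothetical techniques with made-up success rates (assumption, not data).
true_rates = {"armbar": 0.30, "triangle": 0.45, "back take": 0.60}
estimates = {name: 0.0 for name in true_rates}
counts = {name: 0 for name in true_rates}

rng = random.Random(0)
for _ in range(2000):
    choice = epsilon_greedy(estimates, epsilon=0.1, rng=rng)
    reward = 1 if rng.random() < true_rates[choice] else 0
    update(estimates, counts, choice, reward)

print(counts)     # how often each option was tried
print(estimates)  # estimated success rate of each option
```

The running average means nothing has to be remembered except one number and one count per option, which is roughly how most of us keep score in our heads anyway.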

Epsilon First Strategy : a pure exploration phase, followed by a pure exploitation phase.

BJJ : it sounds interesting, but anyone who stuck with that in the 90s would have missed a hell of a lot of developments in the Leg Lock, Lapel, and various rolling back-take areas.
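
A rough sketch of epsilon-first in Python, assuming the simplest reading: try options at random for the first ε share of the rounds, then lock in whichever looked best and never look back. The function name, the technique names and the success rates are all made up for illustration.

```python
import random

def epsilon_first(true_rates, rounds=1000, epsilon=0.2, seed=0):
    """Explore at random for the first epsilon * rounds pulls, then commit."""
    rng = random.Random(seed)
    names = list(true_rates)
    wins = {name: 0 for name in names}
    pulls = {name: 0 for name in names}
    explore_rounds = int(epsilon * rounds)
    history = []
    best = None
    for t in range(rounds):
        if t < explore_rounds:
            choice = rng.choice(names)    # pure exploration phase
        else:
            if best is None:              # decide once, at the switch
                best = max(names, key=lambda n: wins[n] / pulls[n] if pulls[n] else 0.0)
            choice = best                 # pure exploitation phase
        reward = 1 if rng.random() < true_rates[choice] else 0
        wins[choice] += reward
        pulls[choice] += 1
        history.append(choice)
    return history, best

# Hypothetical techniques with invented success rates.
history, best = epsilon_first({"armbar": 0.30, "triangle": 0.45, "back take": 0.60})
print(best)
```

Note how nothing learned after the switch can change the choice, which is exactly the weakness described above.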

Epsilon Decreasing Strategy : Starts out highly exploratory, with a large ε, then moves little by little to a more exploitative mode as ε shrinks.

BJJ : as time goes on, been there done that, I don’t want to do any of that fancy spinning shit that doesn’t work anyway.
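
One way to sketch the epsilon-decreasing idea in Python is a schedule where ε shrinks with the number of rounds played. The decay formula, the floor and all parameter values here are assumptions picked for illustration; there are many other possible schedules.

```python
import random

def decaying_epsilon(step, start=1.0, floor=0.05, decay=0.01):
    """Exploration rate after `step` rounds: starts high, shrinks toward `floor`."""
    return max(floor, start / (1.0 + decay * step))

def choose(estimates, step, rng=random):
    """Epsilon-greedy pick, using the exploration rate for the current step."""
    if rng.random() < decaying_epsilon(step):
        return rng.choice(list(estimates))      # explore
    return max(estimates, key=estimates.get)    # exploit

# The exploration rate at a few points along the journey.
for step in (0, 100, 1000, 10000):
    print(step, round(decaying_epsilon(step), 3))
```

Keeping a floor above zero is a design choice: it leaves a little room for the fancy spinning stuff even late in the journey.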

Contextual-Epsilon-Greedy-Strategy : highly explorative behavior when the situation is not critical and highly exploitative behavior during critical situations.

This is the most relatable for me.

There is a time and a place for exploration; with good partners who can give you “the look” to drill and stress-test new moves, and help you figure out how they could complement your existing game. There is also a time and a place to fully exploit what you know works; competitions.
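
The contextual version can be sketched as an epsilon-greedy pick where ε depends on the situation. The contexts, their exploration rates and the technique estimates below are all invented for illustration; the only idea taken from the strategy itself is "explore more when it is not critical, exploit when it is".

```python
import random

# Hypothetical exploration rate per situation (assumed values).
EPSILON_BY_CONTEXT = {
    "drilling": 0.5,       # good partners, low stakes: try new stuff often
    "hard sparring": 0.1,  # mostly the A game, a little play
    "competition": 0.0,    # critical: only what you know works
}

def choose(estimates, context, rng=random):
    """Epsilon-greedy pick with an exploration rate set by the context."""
    epsilon = EPSILON_BY_CONTEXT[context]
    if rng.random() < epsilon:
        return rng.choice(list(estimates))      # explore
    return max(estimates, key=estimates.get)    # exploit

# Made-up success estimates for a few techniques.
estimates = {"armbar": 0.30, "triangle": 0.45, "back take": 0.60}
print(choose(estimates, "drilling"))
print(choose(estimates, "competition"))  # prints back take
```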

This is not a judgment that old or new is better.

Spending 100 hours working on the bread and butter of jiu-jitsu, such as taking and controlling the back and working the various finishes and transitions, is a perfectly valid choice too.

On one extreme you’d have a Jack of all trades, constantly trying the new stuff, but never really developing anything deeply.

On the other extreme you’d have the image we have of Roger Gracie, only finishing top level opponents with BJJ 101.

Personally I love both exploring and deep-diving, which makes this hobby very time-consuming, but I wouldn’t have it any other way!

Ars Longa, Vita Brevis
Art is long, life is short.

Happy rolling,

jelaludo