Regret Minimization

Regret minimization is the process of selecting actions to execute on a system that minimize the cumulative regret throughout an experiment. Instantaneous regret of action $a_k$ is defined as the difference between the utility of the executed action $r(a_k)$ and the utility of the optimal action $r(a^*)$:

\[\text{Instantenous Regret} := r(a_k) - r(a^*).\]

The cumulative regret (often just termed regret) is then the sum of the instantaneous regret of all N actions up to the current action:

\[\text{Regret} := \sum_{k = 1}^N r(a_k) - r(a^*)\]

Thus in words, regret minimization is the process of identifying and sampling the optimal action as quickly as possible.

Thompson Sampling

There are many posterior sampling techniques designed for regret minimization. In POLAR, we utilize Thompson Sampling. The general idea of Thompson Sampling is the following:

Draw $n$ samples $f_i$ for $i = 1, \dots, n$ from the posterior distribution $P(\bf{r} | \bf{D})$ where $\bf{r} := [r(a_1), \dots, r(a_N)]^T$ is the vectorized utilities and $\bf{D}$ is the collected feedback.
Select $n$ actions $a_i$ for $i = 1, \dots, n$ to execute such that: $a_i = \underset{a \in \mathcal{A}}{\text{argmax}}~ r_i,$ where $\mathcal{A}$ is the set of all actions.

POLAR Settings

To use Thompson Sampling in POLAR, use the following setting:

obj.settings.sampling.type = 1;

Thompson Sampling Example

To visualize Thompson Sampling, first change the settings of setupLearning.m in the 1D Function example to the following (n and b are optional but better for visualization):

settings.sampling.type = 1;
settings.n = 2;
settings.b = 0;

Next, to run a simulation with visualization run:

toolbox_addpath
cd('examples/ThompsonSamplingVisualization')
run_simulation

Thompson Sampling Simulation