Non-Stationary Latent Auto-Regressive Bandits
We consider the stochastic multi-armed bandit problem with non-stationary rewards. We present a novel formulation of non-stationarity in which changes in the arms' mean rewards over time are driven by an unknown, latent, auto-regressive (AR) state of order $k$. We call this new environment the latent …
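To make the idea of a latent AR state concrete, here is a minimal simulation sketch. It is not the model from this work. It assumes a one-dimensional latent state following $z_t = \sum_{i=1}^{k} \gamma_i z_{t-i} + \epsilon_t$ with Gaussian noise, and that each arm's mean reward is a hypothetical linear function of the current state, $\mu_a(t) = \beta_{a,0} + \beta_{a,1} z_t$. The names `simulate_latent_ar_bandit`, `arm_offsets`, and `arm_slopes` are illustrative only.

```python
import random
from typing import List, Sequence


def simulate_latent_ar_bandit(
    gammas: Sequence[float],       # AR coefficients gamma_1..gamma_k (assumed)
    arm_offsets: Sequence[float],  # hypothetical per-arm intercepts beta_{a,0}
    arm_slopes: Sequence[float],   # hypothetical per-arm loadings beta_{a,1} on z_t
    horizon: int,
    noise_sd: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Return each arm's mean reward at every step up to `horizon`.

    Assumption: z_t = sum_i gammas[i] * z_{t-1-i} + eps_t, eps_t ~ N(0, noise_sd^2),
    and mu_a(t) = arm_offsets[a] + arm_slopes[a] * z_t (a linear mapping we chose).
    """
    if len(arm_offsets) != len(arm_slopes):
        raise ValueError("arm_offsets and arm_slopes must have equal length")
    rng = random.Random(seed)
    # Past latent states, most recent first; starting from zeros is an assumption.
    history = [0.0] * len(gammas)
    means: List[List[float]] = []
    for _ in range(horizon):
        # Latent AR(k) update: weighted sum of the k previous states plus noise.
        z = sum(g * h for g, h in zip(gammas, history)) + rng.gauss(0.0, noise_sd)
        # Shift the window: newest state in front, oldest state dropped.
        history = [z] + history[:-1]
        # The learner never observes z directly; it only sees rewards around these means.
        means.append([b0 + b1 * z for b0, b1 in zip(arm_offsets, arm_slopes)])
    return means


if __name__ == "__main__":
    # Arm 0 ignores the latent state; arm 1 tracks it with slope 2.
    for row in simulate_latent_ar_bandit([0.5, 0.3], [1.0, 0.0], [0.0, 2.0], horizon=5):
        print(row)
```

In this sketch arm 0 has a constant mean of 1.0, while arm 1's mean drifts with the hidden AR(2) process, so which arm is best can change over time even though nothing about the arms themselves changes.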