In 1933, William R. Thompson published an article on a Bayesian model-based algorithm that would ultimately become known as Thompson sampling. This heuristic was largely ignored by the academic community until recently, when it became the subject of intense study, thanks in part to internet companies that successfully implemented it for online ad display.
Thompson sampling addresses the exploration-exploitation trade-off in the multiarmed bandit problem: it chooses actions that maximize immediate performance while continually acquiring new information to improve future performance.
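To make the idea concrete, here is a minimal, illustrative sketch of Thompson sampling for a Bernoulli bandit (a standard textbook setting, not tied to the study discussed here): each action's unknown success rate gets a Beta posterior, and each round the algorithm samples from those posteriors and plays the action whose sample is largest, so exploration arises naturally from posterior uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rates = np.array([0.3, 0.5, 0.7])  # hidden success probabilities (for simulation only)
alpha = np.ones(3)                      # Beta(1, 1) prior per arm: successes + 1
beta = np.ones(3)                       # failures + 1

for _ in range(2000):
    # Sample one plausible success rate per arm, then act greedily on the samples.
    arm = int(np.argmax(rng.beta(alpha, beta)))
    reward = rng.random() < true_rates[arm]  # Bernoulli feedback
    alpha[arm] += reward
    beta[arm] += 1 - reward

# Pull counts per arm; over time they concentrate on the best arm.
pulls = alpha + beta - 2
```

As the posteriors sharpen, arms that look clearly worse are sampled less and less often, which is exactly the "learn while earning" behavior described above.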
In a new study, “Online Network Revenue Management Using Thompson Sampling,” MIT Professor David Simchi-Levi and his team have now demonstrated that Thompson sampling can be used for a revenue management problem in which the demand function is unknown.
Incorporating inventory constraints
A key challenge in adopting Thompson sampling for revenue management is that the original method does not incorporate inventory constraints. However, the authors show that Thompson sampling can be naturally combined with a classical linear program formulation to include them.
The result is a dynamic pricing algorithm that incorporates domain knowledge and has strong theoretical performance guarantees as well as promising numerical performance results.
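The following sketch illustrates the general flavor of such a combination under simplifying assumptions of my own (a single product, a finite set of candidate prices, Bernoulli demand per period); it is not the paper's exact algorithm. Each period, demand rates are sampled from the posterior, a small linear program allocates probability across prices subject to the remaining per-period inventory budget, and the realized sale updates the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: candidate prices with unknown purchase probabilities.
prices = np.array([10.0, 15.0, 20.0])
true_demand = np.array([0.8, 0.5, 0.2])   # true P(sale | price), hidden from the seller
T, inventory = 500, 150                   # selling horizon and starting stock

alpha = np.ones(len(prices))              # Beta(1, 1) posterior per price
beta = np.ones(len(prices))

def solve_pricing_lp(p, d, c):
    """max sum_k p_k d_k x_k  s.t.  sum_k x_k <= 1,  sum_k d_k x_k <= c,  x >= 0.
    With only two constraints an optimal vertex uses at most two prices,
    so we enumerate singletons and price pairs instead of calling an LP solver."""
    K = len(p)
    best_rev, best_x = 0.0, np.zeros(K)
    for i in range(K):                     # one active price
        x = np.zeros(K)
        x[i] = min(1.0, c / d[i])
        rev = p[i] * d[i] * x[i]
        if rev > best_rev:
            best_rev, best_x = rev, x
    for i in range(K):                     # two active prices, both constraints tight
        for j in range(i + 1, K):
            if d[i] == d[j]:
                continue
            xi = (c - d[j]) / (d[i] - d[j])
            xj = 1.0 - xi
            if 0.0 <= xi <= 1.0 and 0.0 <= xj <= 1.0:
                x = np.zeros(K)
                x[i], x[j] = xi, xj
                rev = p[i] * d[i] * xi + p[j] * d[j] * xj
                if rev > best_rev:
                    best_rev, best_x = rev, x
    return best_x

revenue = 0.0
for t in range(T):
    if inventory == 0:
        break
    d_sample = rng.beta(alpha, beta)              # 1. sample demand rates from posterior
    c = inventory / (T - t)                       # per-period inventory budget
    x = solve_pricing_lp(prices, d_sample, c)     # 2. solve the constrained LP
    probs = np.append(x, max(0.0, 1.0 - x.sum())) # leftover mass: offer nothing
    k = rng.choice(len(prices) + 1, p=probs / probs.sum())
    if k == len(prices):
        continue                                  # product withheld this period
    sale = rng.random() < true_demand[k]          # 3. observe demand, update posterior
    alpha[k] += sale
    beta[k] += 1 - sale
    if sale:
        inventory -= 1
        revenue += prices[k]
```

The "offer nothing" option plays the role of a shut-off price: when sampled demand would deplete inventory too fast, the LP leaves probability mass unassigned rather than overselling.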
Interestingly, the authors demonstrate that Thompson sampling performs poorly when it fails to take such domain knowledge into account.