This book discusses the Partially Observable Markov Decision Process
(POMDP) framework as applied to dialogue systems. It presents the POMDP
as a formal framework that represents uncertainty explicitly while
supporting automated policy optimization. The authors propose and
implement an end-to-end learning approach for the components of a
dialogue POMDP model.
Starting from scratch, they show how to learn the states, the
transition model, the observation model, and finally the reward model
from unannotated, noisy dialogues. Together, these form a significant
set of contributions that can inspire substantial further work.
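As general background for readers new to the formalism (this is the
standard definition from the POMDP literature, not necessarily the
book's own notation), a POMDP is specified by the tuple

\[
\langle S, A, T, R, \Omega, O \rangle,
\]

where \(S\) is the set of hidden states, \(A\) the set of actions,
\(T(s' \mid s, a)\) the transition model, \(R(s, a)\) the reward model,
\(\Omega\) the set of observations, and \(O(o \mid s', a)\) the
observation model. After taking action \(a\) and receiving observation
\(o\), the belief over states is updated as

\[
b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}
{\Pr(o \mid b, a)},
\]

which is the quantity a dialogue POMDP tracks in place of a single
known state.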
This concise manuscript is written in simple language and is rich in
illustrative examples, figures, and tables.