Eugene A. Feinberg
Adam Shwartz

This volume deals with the theory of
Markov Decision Processes (MDPs) and their applications. Each chapter
was written by a leading expert in the respective area. The papers
cover major research areas and methodologies, and discuss open questions
and future research directions. The papers can be read independently,
with the basic notation and concepts of Section 1.2. Most chapters
should be accessible to graduate or advanced undergraduate students in
the fields of operations research, electrical engineering, and computer
science.

1.1 AN OVERVIEW OF MARKOV DECISION PROCESSES

The theory of
Markov Decision Processes (also known under several other names, including
sequential stochastic optimization, discrete-time stochastic control,
and stochastic dynamic programming) studies sequential optimization
of discrete-time stochastic systems. The basic object is a discrete-time
stochastic system whose transition mechanism can be controlled over
time. Each control policy defines a stochastic process and the values of
objective functions associated with this process. The goal is to select
a "good" control policy. In real life, decisions that humans and
computers make at all levels usually have two types of impact: (i) they
cost or save time, money, or other resources, or they bring revenues, and
(ii) they affect the future by influencing the
dynamics. In many situations, decisions with the largest immediate
profit may not be good in view of future events. MDPs model this paradigm
and provide results on the structure and existence of good policies and
on methods for their calculation.
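
To make the last point concrete, below is a minimal sketch of a finite MDP
solved by value iteration, one of the standard methods for computing good
policies. The two-state model, the action names, and all numerical values
are our own illustrative assumptions, not taken from the text; the point of
the sketch is that the action with the largest immediate reward ("cash_in")
is suboptimal once discounted future rewards are taken into account.

    # A two-state MDP solved by value iteration.
    # All states, actions, and numbers below are illustrative assumptions.

    GAMMA = 0.9    # discount factor weighting future rewards
    SWEEPS = 500   # enough iterations for convergence in this tiny example

    # mdp[state][action] = (immediate_reward, next_state); transitions are
    # deterministic here only to keep the sketch short.
    mdp = {
        "low":  {"cash_in": (2.0, "low"),    # largest immediate profit
                 "invest":  (0.0, "high")},  # nothing now, better dynamics later
        "high": {"collect": (5.0, "high")},
    }

    # Value iteration: V(s) <- max over a of [ r(s, a) + GAMMA * V(s') ].
    V = {s: 0.0 for s in mdp}
    for _ in range(SWEEPS):
        V = {s: max(r + GAMMA * V[nxt] for r, nxt in acts.values())
             for s, acts in mdp.items()}

    # Compare the myopic choice in state "low" with the optimal one.
    myopic = max(mdp["low"], key=lambda a: mdp["low"][a][0])
    optimal = max(mdp["low"],
                  key=lambda a: mdp["low"][a][0] + GAMMA * V[mdp["low"][a][1]])
    print(myopic, optimal)  # prints: cash_in invest

Running this prints "cash_in invest": the myopic policy in state "low" takes
the immediate reward of 2, while the discounted-optimal policy forgoes it in
order to reach the more rewarding state "high".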