A study of methods for the solution of Markovian decision processes



Journal Title

Journal ISSN

Volume Title



Problems of sequential decisions are marked by the fact that the consequences of a decision made at a certain time cannot be evaluated without considering the future evolution of the process. The general solution for problems of this type was presented by Richard Bellman, based on his "Principle of Optimality". This solution consists of assuming a "truncation" of the process at a chosen time, which allows the computation of the optimal decisions for the preceding times by a recurrence relation. A number of the difficulties encountered in the application of the General Dynamic Programming Method can be circumvented when some information about the behavior of the process is known. Particularly, there are some problems of this type which possess certain linear features. The mathematical model known as a Markov Process is one such case. When the process follows a Markov chain and is assumed to have Infinite duration, a steady state optimal solution can be reached in finite number of iterations, following a criterion of "partial maximization". This method was presented by Ronald Howard. Howard's approach will always give an optimal solution when the process is far from its ending, but will not solve the problem when the process is in its "transient" period (near the end of the process). The steady state case can also be solved by means of Linear Programming, as presented by Guy de Ghellinck. Like Howard's method, Linear Programming will yield only steady state solutions. If it is necessary to know the optimal decisions in the transient period, the Dynamic Programming solution must be employed. Five examples were solved using the three different methods. By varying the number of states of the processes and the number of possible alternatives in each state, some conparlsons were made concerning the effectiveness of the solutions. When the number of alternatives is high in conparlson with the number of states, there was no significant differences with respect to the time of computation. When the number of states is high in comparison with the number of alternatives, the linear programming method achieved the optimal solution in shorter time.