Next: 6. Risk-Sensitive Hierarchical Controls
Up: 5. Hierarchical Controls under the
Previous: 5.4 Hierarchical controls of dynamic

  
5.5 Markov decision processes with weak and strong
interaction

Markov decision processes (MDPs) have received much attention in recent years because of their ability to deal with a large class of practical problems under uncertainty. The formulation of many practical problems, such as stock-option, allocation, queueing, and machine replacement problems, fits well in the framework of Markov decision processes; see Derman (1970). In this section we present results that justify hierarchical controls for a class of Markov decision problems. We focus on a finite-state continuous-time Markov decision process that has both weak and strong interactions. More specifically, consider a model in which the states of the process can be divided into several groups such that transitions among the states within each group occur much more frequently than transitions among states belonging to different groups. By replacing the states in each group by the corresponding average distribution, we can derive a limiting problem which is simpler to solve. Given an optimal solution to the limiting problem, we can construct a solution for the original problem which is asymptotically optimal.

Let us consider a Markov decision process $x(\cdot)=\{x(t): \ t\geq0\}$ and a control process $u(\cdot)=\{u(t)=u(x(t)): \ t\geq0\}$ such that $u(t)\in U$, $t\geq 0$, where $U$ is a finite set.

Let $Q^\varepsilon(u(t))=(q^\varepsilon_{ij}(u(t)))$, $t\geq 0$, denote the generator of $x(\cdot)$ such that

$\displaystyle Q^\varepsilon(u)=\displaystyle \frac{1}{\varepsilon}A(u)+B(u)$
    (5.51)
where $A(u)=\mbox{diag}\{A^1(u),\ldots,A^r(u)\}$ with $A^k(u)=(a^k_{ij}(u))_{m_k\times m_k}$ such that $a^k_{ij}(u)\geq0$ for $j \neq i$ and $\sum_{j}a^k_{ij}(u)=0$; $B(u)=(b_{ij}(u))_{m\times m}$ with $m=m_1+\cdots+m_r$, $b_{ij}(u)\geq0$ for $j \neq i$, and $\sum_{j}b_{ij}(u)=0$; and $\varepsilon$ is a small parameter. For each $k=1,\ldots,r$, let ${\cal M}_k=\{s_{k1},\ldots ,s_{km_k}\}$, where $m_k$ is the dimension of $A^k(u)$. The state space of $x(\cdot)$ is given by ${\cal M}={\cal M}_1\cup \ldots \cup {\cal M}_r$, i.e.,
\begin{eqnarray*}{\cal M}=\{s_{11},\ldots,s_{1m_1},s_{21},\ldots,s_{2m_2},\ldots,s_{r1},\ldots,s_{rm_r}\}. \end{eqnarray*}
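As a concrete numerical illustration of the two-time-scale structure (5.51), the following sketch assembles $Q^\varepsilon(u)=\varepsilon^{-1}A(u)+B(u)$ for a fixed control; the block sizes, rates, and value of $\varepsilon$ are all hypothetical, and NumPy is assumed.

```python
import numpy as np

def block_diag(blocks):
    """Assemble diag{A^1, ..., A^r} from a list of square generator blocks."""
    m = sum(b.shape[0] for b in blocks)
    A = np.zeros((m, m))
    i = 0
    for b in blocks:
        k = b.shape[0]
        A[i:i+k, i:i+k] = b
        i += k
    return A

# Hypothetical instance with r = 2 groups, m1 = m2 = 2.
A1 = np.array([[-3.0, 3.0], [2.0, -2.0]])   # fast transitions within group 1
A2 = np.array([[-4.0, 4.0], [1.0, -1.0]])   # fast transitions within group 2
A = block_diag([A1, A2])

# Slow inter-group generator B: nonnegative off-diagonals, rows sum to zero.
B = np.array([[-0.5,  0.0,  0.5,  0.0],
              [ 0.0, -0.5,  0.0,  0.5],
              [ 0.2,  0.0, -0.2,  0.0],
              [ 0.0,  0.2,  0.0, -0.2]])

eps = 0.01
Q_eps = A / eps + B   # Q^eps(u) = (1/eps) A(u) + B(u) at a fixed control u

# Every row of a generator sums to zero.
assert np.allclose(Q_eps.sum(axis=1), 0.0)
```

For small $\varepsilon$ the within-group entries of `Q_eps` dominate the between-group entries by a factor of order $1/\varepsilon$, which is the weak/strong interaction structure described above.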

Example 5.2. Let us consider a failure-prone manufacturing system consisting of a two-machine flowshop. Each of the two machines has two states: up, denoted by 1, and down, denoted by 0. Then the system has four states, represented by $\{(1,1),(0,1), (1,0), (0,0)\}$. Let $s_{11}=(1,1)$, $s_{12}=(0,1)$, $s_{21}=(1,0)$, and $s_{22}=(0,0)$. We take $u$ to be the rate of preventive maintenance, which is used to reduce the failure rate of the machines and to improve the productivity of the system. The objective of the problem is to choose $u$ to keep the average machine capacity at a reasonable level while avoiding excessively costly preventive maintenance.

We consider the system in which the state of the first machine changes more rapidly than that of the second one. A typical way of modeling the machine state process (see Jiang and Sethi (1991), Sethi and Zhang (1994a), and Yin and Zhang (1997)) is to formulate it as a Markov chain with the following generator:

\begin{eqnarray*}Q^\varepsilon(u)&=&\displaystyle \frac{1}{\varepsilon}\left(\begin{array}{cccc}-\lambda_1(u)&\lambda_1(u)&0&0\\ \mu_1(u)&-\mu_1(u)&0&0\\ 0&0&-\lambda_1(u)&\lambda_1(u)\\ 0&0&\mu_1(u)&-\mu_1(u)\\\end{array}\right)\\ &&+\left(\begin{array}{cccc}-\lambda_2(u)&0&\lambda_2(u)&0\\ 0&-\lambda_2(u)&0&\lambda_2(u)\\ \mu_2(u)&0&-\mu_2(u)&0\\ 0&\mu_2(u)&0&-\mu_2(u)\\\end{array}\right).\end{eqnarray*}
The breakdown and repair rates for the first machine are $\varepsilon^{-1}\lambda_1(u)$ and $\varepsilon^{-1}\mu_1(u)$, and those for the second machine are $\lambda_2(u)$ and $\mu_2(u)$. The magnitude of $\varepsilon$ determines the frequency of transitions of $x(\cdot)$ between $s_{11}$ and $s_{12}$ or between $s_{21}$ and $s_{22}$. For small $\varepsilon$, the transitions of $x(\cdot)$ within either $\{s_{11},s_{12}\}$ or $\{s_{21},s_{22}\}$ are much more frequent than those between the two groups $\{s_{11},s_{12}\}$ and $\{s_{21},s_{22}\}$.
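The flowshop generator can be written out numerically. The sketch below (NumPy; the rates are illustrative constants at a fixed control $u$) builds $Q^\varepsilon(u)$ with the states ordered as $(s_{11},s_{12},s_{21},s_{22})$ and checks the generator property.

```python
import numpy as np

def flowshop_generator(eps, lam1, mu1, lam2, mu2):
    """Q^eps for the two-machine flowshop, states ordered
    (s11, s12, s21, s22) = ((1,1), (0,1), (1,0), (0,0)).
    The first machine changes state at O(1/eps) rates."""
    A = np.array([[-lam1,  lam1,   0.0,   0.0],    # s11 -> s12: machine 1 fails
                  [  mu1,  -mu1,   0.0,   0.0],    # s12 -> s11: machine 1 repaired
                  [  0.0,   0.0, -lam1,  lam1],
                  [  0.0,   0.0,   mu1,  -mu1]])
    B = np.array([[-lam2,   0.0,  lam2,   0.0],    # s11 -> s21: machine 2 fails
                  [  0.0, -lam2,   0.0,  lam2],
                  [  mu2,   0.0,  -mu2,   0.0],    # s21 -> s11: machine 2 repaired
                  [  0.0,   mu2,   0.0,  -mu2]])
    return A / eps + B

# Illustrative rates at some fixed maintenance rate u.
Q = flowshop_generator(eps=0.01, lam1=1.0, mu1=2.0, lam2=0.5, mu2=1.5)
assert np.allclose(Q.sum(axis=1), 0.0)   # generator rows sum to zero
```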

The matrix $A(u)$ dictates the fast transitions of the process $x(\cdot)$ within each group ${\cal M}_k$, $k=1,\ldots,r$, and the matrix $B(u)$, together with the average distribution of the Markov chain generated by $A^k(u)$, dictates the slow transitions of $x(\cdot)$ between different groups. When $\varepsilon$ is small, the process $x(\cdot)$ has a strong interaction within each group ${\cal M}_k$, $k=1,\ldots,r$, and a weak interaction among these groups.

Let $u=u(i)$ denote a function such that $u(i)\in U$ for all $i\in {\cal M}$. We call $u$ an admissible control and use ${\cal A}^{\varepsilon}$ to denote the set of all such functions. We make the following assumptions:

Assumption 5.6   For each $U\in \Gamma$, $\bar{A}^k(U^k)$ is irreducible, $k=1,\ldots,r$, and for each $\varepsilon >0$, $\bar{Q}^\varepsilon(U)$ is irreducible, where $\bar{Q}^\varepsilon(U)$ is defined in (5.54).

Assumption 5.7  $U$ is a finite set.
 

We consider the following problem:

$\displaystyle {\cal P}^{\varepsilon}_{\mbox{av}}:\left\{\begin{array}{ll}\mbox{minimize}&J^\varepsilon(u)=\displaystyle \limsup_{T\rightarrow\infty}\frac{1}{T}E\int_0^T G(x(t),u(x(t)))\,dt,\\ \mbox{value function}&\lambda^\varepsilon=\displaystyle \inf_{u\in {\cal A}^{\varepsilon}} J^\varepsilon(u).\end{array}\right.$
    (5.52)
Note that the average cost function is independent of the initial value $x(0)=i$, and so is the value function. The DP equation for ${\cal P}^{\varepsilon}_{\mbox{av}}$ is
$\displaystyle \lambda^\varepsilon=\min_{u\in U}\left\{G(i,u)+Q^\varepsilon(u)h^\varepsilon(\cdot)(i)\right\},$
    (5.53)
where $h^\varepsilon(i)$ is a function defined on ${\cal M}$.
Theorem 5.11 (i)   For each fixed $\varepsilon >0$, there exists a pair $(\lambda^\varepsilon,h^\varepsilon(i))$ that satisfies the DP equation (5.53).
(ii) The DP equation (5.53) has a unique solution in the sense that for any other solution $(\tilde\lambda^\varepsilon,\tilde h^\varepsilon(i))$ to (5.53), $\tilde\lambda^\varepsilon=\lambda^\varepsilon$ and $\tilde h^\varepsilon(i)=h^\varepsilon(i)+K$ for some constant $K$.
(iii) Let $u_{*\varepsilon}=u_{*\varepsilon}(i)\in U$ denote a minimizer of the right-hand side of (5.53). Then $u_{*\varepsilon}\in{\cal A}^{\varepsilon}$ is optimal, i.e., $J^\varepsilon(u_{*\varepsilon})=\lambda^\varepsilon$.
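For fixed $\varepsilon$, the DP equation (5.53) can be solved numerically by policy iteration: evaluate the current policy by solving a linear system for $(\lambda^\varepsilon,h^\varepsilon)$ with the normalization $h^\varepsilon$ fixed at zero in one state, then improve the policy state by state. The sketch below (Python/NumPy, on a hypothetical two-state instance) illustrates this standard scheme; it is not presented in the text, and the rates and costs are invented for illustration.

```python
import numpy as np

def policy_iteration(Q_row, G, m, controls, max_iter=100):
    """Average-cost policy iteration for lam = min_u {G(i,u) + (Q(u)h)(i)}.

    Q_row(i, u): row i of the generator when control u is used in state i.
    G(i, u): running cost.  Returns (lam, h, policy) with h[0] = 0.
    """
    policy = [controls[0]] * m
    for _ in range(max_iter):
        # Policy evaluation: solve Q h - lam * 1 = -g, fixing h[0] = 0.
        Q = np.array([Q_row(i, policy[i]) for i in range(m)])
        g = np.array([G(i, policy[i]) for i in range(m)], dtype=float)
        M = np.hstack([Q[:, 1:], -np.ones((m, 1))])   # unknowns: h[1:], lam
        x = np.linalg.solve(M, -g)
        h = np.concatenate(([0.0], x[:-1]))
        lam = x[-1]
        # Policy improvement, one state at a time.
        new = [min(controls, key=lambda u: G(i, u) + Q_row(i, u) @ h)
               for i in range(m)]
        if new == policy:
            break
        policy = new
    return lam, h, policy

# Toy two-state instance with hypothetical rates and costs.
def Q_row(i, u):
    return np.array([-(1.0 + u), 1.0 + u]) if i == 0 else np.array([2.0, -2.0])

def G(i, u):
    return (i + 1.0) + 0.5 * u   # running cost grows with control effort

lam, h, policy = policy_iteration(Q_row, G, 2, [0.0, 1.0])
# The computed pair satisfies the DP equation in every state.
for i in range(2):
    assert abs(min(G(i, u) + Q_row(i, u) @ h for u in [0.0, 1.0]) - lam) < 1e-8
```

The evaluation step uses the uniqueness statement of Theorem 5.11(ii): since $h^\varepsilon$ is only determined up to an additive constant, pinning $h[0]=0$ makes the linear system square and nonsingular under irreducibility.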
 

To proceed, we define the control set for the limiting problem. Motivated by Sethi and Zhang (1994a), we consider a control set in which each control variable corresponds to a state in ${\cal M}$. To be more precise, for each $k=1,\ldots,r$, let

\begin{eqnarray*}\Gamma_k=\{U^k:=(u^{k1},\ldots,u^{km_k}): u^{kj}\in U, \; j=1,\ldots,m_k\}.\end{eqnarray*}
The control set for the limiting problem is defined as $\Gamma=\Gamma_1\times\cdots\times \Gamma_r$, i.e.,
\begin{eqnarray*}\Gamma=\{U=(U^1,\ldots,U^r)=(u^{11},\ldots,u^{1m_1},\ldots,u^{r1},\ldots,u^{rm_r}): U^k\in \Gamma_k,\; k=1,\ldots,r\}.\end{eqnarray*}
We define matrices $\bar{A}^k$, $\bar{B}$, and $\bar{Q}^\varepsilon$ as follows. For each $U=(U^1,\ldots,U^r)\in\Gamma$, let $\bar{A}^k(U^k)=(a^k_{ij}(u^{ki}))$, $k=1,\ldots,r$, and $\bar{B}(U)=(\bar b_{ij}(U))$, where
\begin{eqnarray*}\bar b_{ij}(U)=\left\{\begin{array}{ll} b_{ij}(u^{1i})&\mbox{if } 1\leq i\leq m_1,\\ b_{ij}(u^{2(i-m_1)})&\mbox{if } m_1+1\leq i\leq m_1+m_2,\\ \qquad\vdots&\\ b_{ij}(u^{r(i-m+m_r)})&\mbox{if } m-m_r+1\leq i\leq m,\\\end{array}\right.\end{eqnarray*}
with $m=m_1+\cdots+m_r$, and
$\displaystyle \bar{Q}^\varepsilon(U)=\frac{1}{\varepsilon}\mbox{diag}\left\{\bar{A}^1(U^1),\ldots,\bar{A}^r(U^r)\right\} +\bar{B}(U).$
    (5.54)
For each $U^k\in\Gamma_k$, let $\nu^k(U^k)$ denote the average distribution of the Markov chain generated by $\bar{A}^k(U^k)$, $k=1,\ldots,r$. Define $\nu(U)=\mbox{diag}\{\nu^1(U^1),\ldots,\nu^r(U^r)\}$ and $I_e=\mbox{diag}\{e_{m_1},\ldots,e_{m_r}\}$ with $e_{m_k}=(1,\ldots,1)'$ being the $m_k$-dimensional column vector of ones. Using $\nu(U)$ and $I_e$, we define another matrix $\bar{Q}(U)$ as a function of $U\in \Gamma$:
$\displaystyle \bar{Q}(U)=\nu(U)\bar{B}(U)I_e.$
    (5.55)
Note that the $k$th row of $\bar{Q}(U)$ depends only on $U^k$. Thus, $\bar{Q}(U)=(\bar q_{kk'}(U^k))_{r\times r}$. With a slight abuse of notation, for a function $f$ defined on $\{1,\ldots,r\}$ we write $\bar{Q}(U^k)f(\cdot)(k)=\sum_{k'\neq k}\bar q_{kk'}(U^k)(f(k')-f(k))$ instead of $\bar{Q}(U)f(\cdot)(k)$.
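Computing $\bar{Q}(U)$ from (5.55) is mechanical once the average distributions $\nu^k(U^k)$ are available. A minimal sketch (NumPy; the flowshop rates are illustrative and the control is held fixed, so each $\bar{A}^k$ and $\bar{B}$ is a constant matrix):

```python
import numpy as np

def stationary(A):
    """Average distribution nu of an irreducible generator A:
    nu A = 0 and nu 1 = 1, solved as an augmented least-squares system."""
    m = A.shape[0]
    M = np.vstack([A.T, np.ones(m)])
    b = np.zeros(m + 1); b[-1] = 1.0
    nu, *_ = np.linalg.lstsq(M, b, rcond=None)
    return nu

def aggregate(A_blocks, Bbar):
    """Qbar = nu(U) Bbar(U) I_e of (5.55) for fixed controls (r x r)."""
    r = len(A_blocks)
    sizes = [A.shape[0] for A in A_blocks]
    m = sum(sizes)
    nu = np.zeros((r, m))   # nu(U) = diag{nu^1, ..., nu^r}, an r x m matrix
    Ie = np.zeros((m, r))   # I_e = diag{e_{m_1}, ..., e_{m_r}}, m x r
    i = 0
    for k, A in enumerate(A_blocks):
        nu[k, i:i+sizes[k]] = stationary(A)
        Ie[i:i+sizes[k], k] = 1.0
        i += sizes[k]
    return nu @ Bbar @ Ie

# Flowshop numbers at a fixed control: fast machine-1 block, slow machine-2 part.
lam1, mu1, lam2, mu2 = 1.0, 2.0, 0.5, 1.5
A1 = np.array([[-lam1, lam1], [mu1, -mu1]])
B = np.array([[-lam2, 0.0,  lam2, 0.0],
              [ 0.0, -lam2, 0.0,  lam2],
              [ mu2,  0.0, -mu2,  0.0],
              [ 0.0,  mu2,  0.0, -mu2]])
Qbar = aggregate([A1, A1], B)
# Qbar is itself a 2-state generator: rows sum to zero.
assert np.allclose(Qbar.sum(axis=1), 0.0)
```

In this example the aggregation reproduces what one expects: the aggregated chain jumps between the groups $\{s_{11},s_{12}\}$ and $\{s_{21},s_{22}\}$ at the second machine's rates $\lambda_2$ and $\mu_2$, since those rates do not depend on the first machine's state.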

We next define the limiting problem. For $U^k\in\Gamma_k$, define

\begin{eqnarray*}\bar{G}(k,U^k)=\sum_{j=1}^{m_k}\nu^k_j(U^k)G(s_{kj},u^{kj}), \;k=1,\ldots,r,\end{eqnarray*}
where $\nu^k(U^k)=(\nu^k_1(U^k),\ldots,\nu^k_{m_k}(U^k))$ is the average distribution of the Markov chain generated by $\bar{A}^k(U^k)$. Let ${\cal A}^0$ denote the class of functions $U=U(k)$ with $U(k)\in\Gamma_k$, $k=1,\ldots,r$. For convenience, we call $U=(U(1),\ldots,U(r))\in{\cal A}^0$ an admissible control for the limiting problem, termed ${\cal P}^0$.

We define the limiting problem with the long-run average cost.

$\displaystyle {\cal P}^0_{\mbox{av}}:\left\{\begin{array}{ll}\mbox{minimize}&J^0(U)=\displaystyle \limsup_{T\rightarrow\infty}\frac{1}{T}E\int_0^T \bar{G}(\bar{x}(t),U(\bar{x}(t)))\,dt,\\ \mbox{value function}&\lambda^0=\displaystyle \inf_{U\in{\cal A}^0} J^0(U),\end{array}\right.$
where $\bar{x}(\cdot)$ denotes the Markov chain on $\{1,\ldots,r\}$ generated by $\bar{Q}(U)$.
    (5.56)
It can be shown that $\bar{Q}(U)$ is irreducible for each $U\in \Gamma$.

The DP equation for ${\cal P}^0_{\mbox{av}}$ is

$\displaystyle \lambda^0=\min_{U^k\in\Gamma_k}\left\{\bar{G}(k,U^k)+\bar{Q}(U^k)h^0(\cdot)(k)\right\},$
    (5.57)
where $h^0(k)$ is a function defined on $\{1,\ldots,r\}$.

Theorem 5.12 (i)   There exists a pair $(\lambda^0,h^0(k))$ that satisfies the DP equation (5.57).
(ii) The DP equation (5.57) has a unique solution in the sense that for any other solution $(\tilde\lambda^0,\tilde h^0(k))$ to (5.57), $\tilde\lambda^0=\lambda^0$ and $\tilde h^0(k)=h^0(k)+K$ for some constant $K$.
(iii) Let $U_*\in\Gamma$ denote a minimizer of the right-hand side of (5.57). Then $U_*\in {\cal A}^0$ is optimal, i.e., $J^0(U_*)=\lambda^0$.
Remark 5.5   Note that the number of DP equations for ${\cal P}^{\varepsilon}_{\mbox{av}}$ equals $m=m_1+\cdots+m_r$, while the number for ${\cal P}^0_{\mbox{av}}$ is only $r$. Since $m_k\geq 2$ for each $k=1,\ldots,r$, it follows that $m-r\geq r$. The difference between $m$ and $r$ can be very large when $r$ is large or when $m_k$ is large for some $k$. Since the computational effort involved in solving the DP equations depends largely on the number of equations to be solved (cf. Hillier and Lieberman (1989)), the effort required to solve the DP equations for ${\cal P}^0_{\mbox{av}}$ is substantially less than that for ${\cal P}^{\varepsilon}_{\mbox{av}}$ when $m-r$ is large.

We construct a control $u_\varepsilon$ as follows:

$\displaystyle u_\varepsilon=u_\varepsilon(x)=\sum_{k=1}^r\sum_{j=1}^{m_k}I_{\{x=s_{kj}\}}u^{kj}_*,$
    (5.58)
where $I_A$ denotes the indicator function of a set $A$.
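The construction (5.58) is a simple table lookup: each original state $s_{kj}$ receives the component $u^{kj}_*$ of the limiting-problem optimizer. A sketch, with hypothetical maintenance rates and string state labels standing in for the $s_{kj}$:

```python
def lift_control(U_star, groups):
    """Build u_eps(x) of (5.58) from a limiting-problem optimizer.

    U_star[k][j] holds u*^{kj}; groups[k] lists the states s_{k1},...,s_{km_k}.
    Returns a feedback law on the original state space M."""
    table = {s: U_star[k][j]
             for k, states in enumerate(groups)
             for j, s in enumerate(states)}
    return lambda x: table[x]

# Flowshop illustration: hypothetical optimal maintenance rates per state.
U_star = [(0.4, 0.0), (0.7, 0.0)]            # (u*^{k1}, u*^{k2}) for k = 1, 2
groups = [("s11", "s12"), ("s21", "s22")]
u_eps = lift_control(U_star, groups)
assert u_eps("s21") == 0.7
```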

Theorem 5.13   Let $U_*\in {\cal A}^0$ denote an optimal control for ${\cal P}^0_{\mbox{av}}$, and let $u_\varepsilon\in{\cal A}^{\varepsilon}$ be the corresponding control constructed as in (5.58). Then $u_\varepsilon$ is asymptotically optimal with error bound $\varepsilon$, i.e.,

\begin{eqnarray*}J^\varepsilon(u_\varepsilon)-\lambda^\varepsilon=O(\varepsilon).\end{eqnarray*}

Remark 5.6   Filar, Haurie, Moresino, and Vial (1999) introduce diffusions in these models; see Remark 4.5. In their model, all jumps are associated with a slow process, while the continuous part, including the diffusions, is associated with a small parameter $\varepsilon$ representing a fast process. As $\varepsilon$ tends to zero, the problem is shown to reduce to a structured linear program, whose optimal solution can then be used as an approximation to the optimal solution of the original system with a small $\varepsilon$.

