Next: 4.2 Optimal control of dynamic  Up: 4. Optimal Control with the  Previous: 4. Optimal Control with the

  
4.1 Optimal control of single or parallel machine systems

We consider an n-product manufacturing system given in Section 2.1. For any ${\mbox{\boldmath$u$}}(\cdot)\in {\cal A}(m)$, define
$\displaystyle \bar J({\mbox{\boldmath$x$}}, m, {\mbox{\boldmath$u$}}(\cdot))=\limsup_{T\rightarrow\infty}\frac{1}{T}\, E\int_0^T \big(h({\mbox{\boldmath$x$}}(t))+c({\mbox{\boldmath$u$}}(t))\big)\, dt,$
    (4.1)
where ${\mbox{\boldmath$x$}}(\cdot)$ is the surplus process corresponding to the production process ${\mbox{\boldmath$u$}}(\cdot)$ with ${\mbox{\boldmath$x$}}(0)={\mbox{\boldmath$x$}}$, and $h(\cdot)$ and $c(\cdot)$ are given in Section 2.1. Our goal is to choose ${\mbox{\boldmath$u$}}(\cdot)\in {\cal A}(m)$ so as to minimize the cost function $\bar J({\mbox{\boldmath$x$}},m,{\mbox{\boldmath$u$}}(\cdot))$.
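For intuition, the average cost (4.1) for a fixed feedback policy can be estimated by simulating the capacity chain and the surplus dynamics. The sketch below uses a hypothetical single-product instance; the demand rate, jump rates, cost functions, and the zero-inventory feedback policy are all illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-product instance: demand rate z, capacity states {0, 1},
# failure rate q1 (up -> down) and repair rate q0 (down -> up).
z, q0, q1 = 0.5, 1.0, 0.5
h = lambda x: abs(x)        # surplus cost
c = lambda u: 0.25 * u      # production cost

def policy(x, m):
    """A simple (not necessarily optimal) zero-inventory feedback policy."""
    if x < 0.0:
        return m            # produce at full available capacity
    if x == 0.0:
        return min(z, m)    # hold the surplus at zero when possible
    return 0.0

def average_cost(T=2000.0, dt=0.01, x=0.0, m=1):
    """Euler simulation of (x(t), m(t)); returns the time-averaged cost
    (1/T) int_0^T [h(x(t)) + c(u(t))] dt along one sample path."""
    cost = 0.0
    for _ in range(int(T / dt)):
        u = policy(x, m)
        cost += (h(x) + c(u)) * dt
        x += (u - z) * dt                  # surplus dynamics dx = (u - z) dt
        rate = q1 if m == 1 else q0        # capacity chain jump over [t, t+dt)
        if rng.random() < rate * dt:
            m = 1 - m
    return cost / T

print(average_cost())
```

With these illustrative rates, the machine availability $q_0/(q_0+q_1)=2/3$ exceeds the demand rate $0.5$, so the simulated policy is stable and the time average settles to a finite value.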

In addition to Assumption 2.1 in Section 2.1 on the production cost function $c(\cdot)$, we assume that the production capacity process $m(\cdot)$ and the surplus cost function $h(\cdot)$ satisfy the following:

Assumption 4.1   $h(\cdot)$ is a nonnegative convex function with $h(0)=0$. There are positive constants $C_{41}$, $C_{42}$, and $\kappa_{41}$ such that

\begin{displaymath}h({\mbox{\boldmath$x$ }})\geq C_{41}\vert{\mbox{\boldmath$x$ }}\vert^{\kappa_{41}}-C_{42}.\end{displaymath}

Moreover, there are constants $C_{43}$ and $\kappa_{42}$ such that

\begin{displaymath}\vert h({\mbox{\boldmath$x$}})-h({\mbox{\boldmath$x$}}')\vert \leq C_{43}(1+\vert{\mbox{\boldmath$x$}}\vert^{\kappa_{42}-1}+\vert{\mbox{\boldmath$x$}}'\vert^{\kappa_{42}-1})\vert{\mbox{\boldmath$x$}}-{\mbox{\boldmath$x$}}'\vert.\end{displaymath}

Assumption 4.2   $m(t)$ is a finite state Markov chain with generator Q, where $Q=(q_{ij})$, $i,j\in{\cal M}$, is a $(p+1)\times(p+1)$ matrix such that $q_{ij}\geq 0$ for $i\neq j$ and $q_{ii}=-\sum_{j \neq i}q_{ij}$. We assume that Q is weakly irreducible. Let $\nu=(\nu_0,\nu_1,\ldots,\nu_p)$ be the equilibrium distribution vector of $m(t)$.
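To make Assumption 4.2 concrete, the equilibrium distribution $\nu$ can be obtained by solving $\nu Q=0$ together with the normalization $\sum_j\nu_j=1$. A minimal sketch with an illustrative three-state generator (the rates are made up for the example):

```python
import numpy as np

# Illustrative generator for capacity states M = {0, 1, 2}: off-diagonal
# entries q_ij >= 0, and each diagonal entry makes its row sum to zero.
Q = np.array([[-2.0,  2.0,  0.0],
              [ 1.0, -3.0,  2.0],
              [ 0.0,  1.0, -1.0]])

# nu solves nu @ Q = 0 with nu summing to one; append the normalization
# row and solve the (over-determined) linear system by least squares.
A = np.vstack([Q.T, np.ones(3)])
b = np.concatenate([np.zeros(3), [1.0]])
nu, *_ = np.linalg.lstsq(A, b, rcond=None)

bar_m = nu @ np.arange(3)   # average capacity in Assumption 4.3
print(nu, bar_m)            # nu = (1/7, 2/7, 4/7), bar_m = 10/7
```

Assumption 4.3 then amounts to checking that `bar_m` exceeds the total demand rate $\sum_i z_i$ of the instance at hand.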

Assumption 4.3   The average capacity $\bar m=\sum_{j=0}^p j\nu_j>\sum_{i=1}^n z_i$.

A control ${\mbox{\boldmath$u$}}(\cdot)\in {\cal A}(m)$ is called stable if the condition

$\displaystyle \lim_{T\rightarrow \infty}\frac{E\vert{\mbox{\boldmath$x$ }}(T)\vert^{\kappa_{42}+1}}{T}=0$ (4.2)
holds, where ${\mbox{\boldmath$x$ }}(\cdot)$ is the surplus process corresponding to the control ${\mbox{\boldmath$u$ }}(\cdot)$ with $({\mbox{\boldmath$x$ }}(0),m(0))=({\mbox{\boldmath$x$ }},m)$ and $\kappa_{42}$ is defined in Assumption 4.1. Let ${\cal B}(m)\subset {\cal A}(m)$ denote the class of stable controls.

We will show that there exists a constant $\lambda^*$, independent of the initial condition $({\mbox{\boldmath$x$ }}(0),m(0))=({\mbox{\boldmath$x$ }},m)$, and a stable Markov control policy ${\mbox{\boldmath$u$ }}^*(\cdot)\in{\cal A}(m)$ such that ${\mbox{\boldmath$u$ }}^*(\cdot)$ is optimal, i.e., it minimizes the cost defined by (4.1) over all ${\mbox{\boldmath$u$ }}(\cdot)\in {\cal A}(m)$, and furthermore,

$\displaystyle \lim_{T\rightarrow \infty}\frac{1}{T}E\int_0^T(h({\mbox{\boldmath$x$ }}^*(t))+c({\mbox{\boldmath$u$ }}^*(t)))dt =\lambda^*,$
    (4.3)
where ${\mbox{\boldmath$x$ }}^*(\cdot)$ is the surplus process corresponding to ${\mbox{\boldmath$u$ }}^*(\cdot)$ with $({\mbox{\boldmath$x$ }}(0),m(0))=({\mbox{\boldmath$x$ }},m)$. Moreover, for any other (stable) control ${\mbox{\boldmath$u$ }}(\cdot) \in{\cal B}(m)$,
$\displaystyle \liminf_{T\rightarrow \infty}\frac{1}{T}E\int_0^T(h({\mbox{\boldmath$x$ }}(t))+c({\mbox{\boldmath$u$ }}(t)))dt\geq\lambda^*.$
    (4.4)
Since we use the vanishing discount approach to treat our problem, we provide the required results for the discounted problem. First we introduce a corresponding control problem with the cost discounted at a rate $\rho>0$. For ${\mbox{\boldmath$u$ }}(\cdot)\in {\cal A}(m)$, we define the expected discounted cost as
\begin{displaymath}J^{(\rho)} ({\mbox{\boldmath$x$}}, m, {\mbox{\boldmath$u$}}(\cdot))=E\int_0^\infty e^{-\rho t}\,(h({\mbox{\boldmath$x$}}(t))+c({\mbox{\boldmath$u$}}(t)))\, dt.\end{displaymath}

The value function of the discounted problem is defined as

$\displaystyle V^{\rho} ({\mbox{\boldmath$x$}}, m)=\inf_{{\mbox{\boldmath$u$}}(\cdot)\in {\cal A}(m)} J^{(\rho)}({\mbox{\boldmath$x$}},m, {\mbox{\boldmath$u$}}(\cdot)).$
    (4.5)
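The vanishing-discount idea can be visualized numerically: discretize a single-product instance, solve the discounted problem (4.5) by value iteration, and inspect $\rho V^\rho({\mbox{\boldmath$x$}},m)$ for small $\rho$. The sketch below takes $c(u)\equiv 0$ and an illustrative two-state machine; the grid, step sizes, and rates are all assumptions of the example, not parameters from the text:

```python
import numpy as np

# Toy discretization of the discounted problem (4.5), single product, with
# c(u) = 0, capacity m in {0, 1}, and controls u in {0, z, m}.
z, q0, q1 = 0.5, 1.0, 0.5      # demand rate, repair rate, failure rate
rho, dt = 0.05, 0.01           # discount rate, time step
xs = np.linspace(-5.0, 5.0, 201)
h = lambda x: np.abs(x)

def step_value(V):
    """One semi-Lagrangian value-iteration sweep for V[m, i] ~ V^rho(x_i, m)."""
    Vnew = np.empty_like(V)
    for m in (0, 1):
        rate = q0 if m == 0 else q1           # jump rate out of state m
        candidates = []
        for u in (0.0, z, float(m)):
            if u > m:                          # control must respect capacity
                continue
            xn = np.clip(xs + (u - z) * dt, xs[0], xs[-1])
            Vm = np.interp(xn, xs, V[m])       # machine state unchanged
            Vo = np.interp(xn, xs, V[1 - m])   # machine state jumps
            cont = (1 - rate * dt) * Vm + rate * dt * Vo
            candidates.append(h(xs) * dt + np.exp(-rho * dt) * cont)
        Vnew[m] = np.min(candidates, axis=0)
    return Vnew

V = np.zeros((2, len(xs)))
for _ in range(20000):
    V = step_value(V)

# Vanishing-discount heuristic: for small rho, rho * V^rho(x, m) is nearly
# flat in x near the origin and approximates the average cost lambda.
print(rho * V[1, len(xs) // 2])
```

The design choice here is the semi-Lagrangian update (interpolate the value at the deterministically advected point, then account for a machine jump over $[t,t+dt)$), which keeps the sweep a contraction with factor $e^{-\rho\,dt}$.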
In order to study the long-run average cost control problem using the vanishing discount approach, we must first obtain some estimates for the value function $V^{\rho} ({\mbox{\boldmath$x$ }},m)$. To do this, we give the following auxiliary lemma. Its proof is given in Sethi, Suo, Taksar and Zhang (1997).

Lemma 4.1   For any $({\mbox{\boldmath$x$}},m) \in R^n\times {\cal M}$ and ${\mbox{\boldmath$y$}}\in R^n$, there exist a constant $C_{43}$ and a control policy ${\mbox{\boldmath$u$}}(t)$, $t\geq 0$, such that for $r\geq 1$,

\begin{displaymath}E\tau_0^r \leq C_{43}(1+\sum_{i=1}^n \vert y_i-x_i\vert^r),\end{displaymath}
where
\begin{displaymath}\tau_0 = \inf\{ t\geq 0: {\mbox{\boldmath$x$ }}(t)={\mbox{\boldmath$y$ }}\},\end{displaymath}
and ${\mbox{\boldmath$x$ }}(t)$, $t\geq 0$, is the surplus process corresponding to the control policy ${\mbox{\boldmath$u$ }}(t)$ and initial condition $({\mbox{\boldmath$x$ }}(0),m(0))=({\mbox{\boldmath$x$ }},m)$.
 

This lemma leads to the following result proved in Sethi, Suo, Taksar and Zhang (1997).

Theorem 4.1   For any $({\mbox{\boldmath$x$}},m),({\mbox{\boldmath$y$}}, m') \in R^n\times {\cal M}$, there exist a constant $C_{44}$ and a control policy ${\mbox{\boldmath$u$}}(\cdot)$ such that for $r\geq 1$,

\begin{displaymath}E\tau^r \leq C_{44}(1+\sum_{i=1}^n \vert y_i-x_i\vert^r),\end{displaymath}
where
\begin{displaymath}\tau= \inf\{ t\geq 0: ({\mbox{\boldmath$x$ }}(t), m(t))=({\mbox{\boldmath$y$ }}, m')\},\end{displaymath}
and ${\mbox{\boldmath$x$ }}(\cdot)$ is the surplus process corresponding to the control policy ${\mbox{\boldmath$u$ }}(\cdot)$ and initial condition $({\mbox{\boldmath$x$ }}(0),m(0))=({\mbox{\boldmath$x$ }},m)$.
 

With Theorem 4.1 in hand, Sethi, Suo, Taksar and Zhang (1997) prove the following theorem.

Theorem 4.2 (i)   There exists a constant $\rho_0>0$ such that $\{ \rho V^{\rho} (0,0): \,0<\rho\leq\rho_0\}$ is bounded.
(ii) The function

$\displaystyle W^\rho({\mbox{\boldmath$x$ }},m)= V^\rho({\mbox{\boldmath$x$ }},m)-V^\rho(0,0)$     (4.6)
is convex in ${\mbox{\boldmath$x$ }}$. It is locally uniformly bounded, i.e., there exists a constant $C_{45}>0$ such that
\begin{displaymath}\vert V^\rho({\mbox{\boldmath$x$ }},m)-V^\rho(0,0)\vert \leq C_{45} (1+\vert{\mbox{\boldmath$x$ }}\vert^{\kappa_{42}+1})\end{displaymath}

for all $({\mbox{\boldmath$x$ }},m)\in R^n\times{\cal M}$ and $\rho>0$.
(iii) $W^\rho({\mbox{\boldmath$x$ }},m)$ is locally uniformly Lipschitz continuous in ${\mbox{\boldmath$x$ }}$, uniformly with respect to $\rho>0$, i.e., for any $X>0$, there exists a constant $C_{46}>0$, independent of $\rho$, such that

\begin{displaymath}\vert W^\rho ({\mbox{\boldmath$x$}},m)-W^\rho ({\mbox{\boldmath$x$}}',m)\vert \leq C_{46}\vert{\mbox{\boldmath$x$}}-{\mbox{\boldmath$x$}}'\vert\end{displaymath}



for all $m\in {\cal M}$ and all $\vert{\mbox{\boldmath$x$ }}\vert, \vert{\mbox{\boldmath$x$ }}'\vert\leq X$.
 

The HJB equation associated with the long-run average cost optimal control problem as formulated above takes the following form:

$\displaystyle \lambda =\inf_{{\mbox{\boldmath$u$}}\in\Omega(m)}\{ \langle W_{{\mbox{\boldmath$x$}}}({\mbox{\boldmath$x$}},m), {\mbox{\boldmath$u$}}-{\mbox{\boldmath$z$}}\rangle +c({\mbox{\boldmath$u$}})\}+h({\mbox{\boldmath$x$}})+QW({\mbox{\boldmath$x$}},\cdot)(m),$
    (4.7)
where $\lambda$ is a constant, $W(\cdot,\cdot)$ is a real-valued function, known as the potential function or the relative value function, defined on $R^n\times {\cal M}$, $W_{{\mbox{\boldmath$x$ }}}(\cdot,m)$ is the partial derivative of the relative value function $W(\cdot,m)$ with respect to the state variable ${\mbox{\boldmath$x$ }}$, and $\langle\cdot, \cdot\rangle$ denotes the inner product. Without requiring that $W(\cdot,m)$ be $C^1$, it is convenient to write the HJB equation in terms of directional derivatives (HJBDD) for our problem. The corresponding HJBDD equation can be written as
$\displaystyle \lambda =\inf_{{\mbox{\boldmath$u$}}\in\Omega(m)} \left\{ \frac{\partial W({\mbox{\boldmath$x$}},m)}{\partial ({\mbox{\boldmath$u$}}-{\mbox{\boldmath$z$}})}+c({\mbox{\boldmath$u$}})\right\}+h({\mbox{\boldmath$x$}})+QW({\mbox{\boldmath$x$}},\cdot)(m).$
    (4.8)
Let ${\cal G}$ denote the family of real-valued functions $W(\cdot,\cdot)$ defined on $R^n\times{\cal M}$ such that
(i)
$W(\cdot,m)$ is convex;
(ii)
$W(\cdot,m)$ has polynomial growth, i.e., there are constants $\kappa_{43}$ and $C_{47}>0$ such that
\begin{displaymath}\vert W({\mbox{\boldmath$x$}},m)\vert\leq C_{47}(1+\vert{\mbox{\boldmath$x$}}\vert^{\kappa_{43}+1}) \qquad \forall {\mbox{\boldmath$x$}}\in R^n.\end{displaymath}
A solution to the HJB or HJBDD equation is a pair $(\lambda, W(\cdot,\cdot))$ with $\lambda$ a constant and $W(\cdot,\cdot)\in{\cal G}$. The function $W(\cdot,\cdot)$ is called the potential function for the control problem if $\lambda$ is the minimum long-run average cost. The next result follows from Theorem 4.2.

Theorem 4.3   For $({\mbox{\boldmath$x$ }},m) \in R^n\times {\cal M}$, the following limits exist:

$\displaystyle \lambda=\lim_{\rho\rightarrow 0} \rho V^\rho({\mbox{\boldmath$x$}},m) \qquad\mbox{and}\qquad V({\mbox{\boldmath$x$}},m)=\lim_{\rho\rightarrow 0}W^\rho({\mbox{\boldmath$x$}},m).$
    (4.9)
Furthermore, $(\lambda, V(\cdot,\cdot))$ is a viscosity solution to the HJB equation (4.7).
 

Using results from convex analysis, Sethi, Suo, Taksar and Zhang (1997) prove the following theorem.

Theorem 4.4   $(\lambda, V(\cdot,\cdot))$ defined in Theorem 4.3 is a solution to the HJBDD equation (4.8).
Remark 4.1   When there is no cost of production, i.e., $c({\mbox{\boldmath$u$ }})\equiv 0$, Veatch and Caramanis (1997) introduce the following differential cost function

\begin{displaymath}\hat W({\mbox{\boldmath$x$}},m)=\lim_{T \rightarrow \infty} E\left[\int^T_0 h({\mbox{\boldmath$x$}}^*(t))\,dt - T\lambda ^* \right],\end{displaymath}
where $m=m(0)$, $\lambda^*$ is the optimal value, and ${\mbox{\boldmath$x$ }}^*(t)$ is the surplus process corresponding to the optimal production process ${\mbox{\boldmath$u$ }}^*(\cdot)$ with ${\mbox{\boldmath$x$ }}={\mbox{\boldmath$x$ }}^*(0)$. The differential cost function is used in algorithms that compute a reasonable control policy using infinitesimal perturbation analysis or direct computation of the average cost; see Caramanis and Liberopoulos (1992), and Liberopoulos and Caramanis (1995). Veatch and Caramanis (1997) prove that the differential cost function $\hat W({\mbox{\boldmath$x$ }},m)$ is convex and differentiable in ${\mbox{\boldmath$x$ }}$. If $n=1$, $h(x_1)=\vert x_1\vert$ and ${\cal M}=\{0,1\}$, we know from Bielecki and Kumar (1988) that
$\displaystyle \hat W(x,m)=V(x,m).$
    (4.10)
This means that the differential cost function is the same as the potential function given by (4.9). So far, however, (4.10) has not been established in general. We now state the following verification theorem proved by Sethi, Suo, Taksar and Yan (1998).

Theorem 4.5   Let $(\lambda, W(\cdot,\cdot))$ be a solution to the HJBDD equation (4.8). Then the following holds.

(i)
If there is a control ${\mbox{\boldmath$u$ }}^*(\cdot)\in{\cal A}(m)$ such that
   
$\displaystyle \inf_{{\mbox{\boldmath$u$}}\in\Omega( m(t))} \left\{ \frac{\partial W({\mbox{\boldmath$x$}}^*(t),m(t))}{\partial ({\mbox{\boldmath$u$}}-{\mbox{\boldmath$z$}})}+c({\mbox{\boldmath$u$}})\right\} = \frac{\partial W({\mbox{\boldmath$x$}}^*(t),m(t))}{\partial ({\mbox{\boldmath$u$}}^*(t)-{\mbox{\boldmath$z$}})}+c({\mbox{\boldmath$u$}}^*(t))$
    (4.11)
for a.e. $t\geq 0$ with probability one, where ${\mbox{\boldmath$x$ }}^*(\cdot)$ is the surplus process corresponding to the control ${\mbox{\boldmath$u$ }}^*(\cdot)$, and
$\displaystyle \lim_{T\rightarrow\infty} \frac{W({\mbox{\boldmath$x$ }}^*(T),m(T))}{T}=0,$
    (4.12)
then
\begin{displaymath}\lambda = \bar J({\mbox{\boldmath$x$}},m,{\mbox{\boldmath$u$}}^*(\cdot)). \end{displaymath}
(ii)
For any ${\mbox{\boldmath$u$ }}(\cdot)\in {\cal A}(m)$, we have $\lambda\leq \bar J({\mbox{\boldmath$x$ }},m,{\mbox{\boldmath$u$ }}(\cdot))$, i.e.,
\begin{displaymath}\limsup_{T\rightarrow \infty}\frac{1}{T}E\int_0^T(h({\mbox{\boldmath$x$}}(t))+c({\mbox{\boldmath$u$}}(t)))\,dt \geq \lambda. \end{displaymath}
(iii)
Furthermore, for any (stable) control policy ${\mbox{\boldmath$u$ }}(\cdot) \in{\cal B}(m)$, we have
$\displaystyle \liminf_{T\rightarrow \infty}\frac{1}{T}E\int_0^T(h({\mbox{\boldmath$x$}}(t))+c({\mbox{\boldmath$u$}}(t)))\,dt \geq \lambda.$
    (4.13)
In the remainder of this section, let us consider the single product case, i.e., n=1. For this case, Sethi, Suo, Taksar and Zhang (1997) prove the following result.

Theorem 4.6   For $\lambda$ and $V(x,m)$ given in (4.9), we have

(i)
V(x,m) is continuously differentiable in x.
(ii)
$(\lambda, V(\cdot,\cdot))$ is a classical solution to the HJB equation (4.7).
Let us define a control policy $\hat u(\cdot, \cdot)$ via the potential function $V(\cdot,\cdot)$ as follows:
\begin{displaymath}\hat u(x,m)=\left \{\begin{array}{ll}0 &\mbox{if} \ V_x(x,m)>- c_u(0),\\ (c_u)^{-1}(-V_x(x,m)) & \mbox{if} \ -c_u(m)\leq V_x(x,m)\leq - c_u(0),\\ m & \mbox{if} \ V_x(x,m)<- c_u(m),\end{array}\right.\end{displaymath} (4.14)

if the function $c(\cdot)$ is strictly convex, or

$\displaystyle \hat u(x,m)=\left \{\begin{array}{ll}0 &\mbox{if} \ V_x(x,m)>-c,\\ z & \mbox{if} \ V_x(x,m)=-c,\\ m & \mbox{if} \ V_x(x,m)<-c,\end{array}\right.$
    (4.15)
if c(u)=cu. Therefore, the control policy $\hat u(\cdot, \cdot)$ satisfies condition (4.11) in part (i) of Theorem 4.5.

From the convexity of the potential function $V(\cdot, m)$, there are $x_m, \ y_m, \ -\infty <y_m <x_m < \infty$ such that

\begin{displaymath}(x_m, \infty) =\{x \ : \ V_x(x,m)>-c_u(0)\} \end{displaymath}

and

\begin{displaymath}(-\infty, y_m)=\{x \ : \ V_x(x,m) < -c_u(m)\}.\end{displaymath}

The control policy $\hat u(\cdot, \cdot)$ can be written as

\begin{eqnarray*}\hat u(x,m)=\left \{\begin{array}{ll}0 &\mbox{if} \ x>x_m,\\ (c_u)^{-1}(-V_x(x,m)) & \mbox{if} \ y_m\leq x \leq x_m,\\ m & \mbox{if} \ x<y_m.\end{array}\right.\end{eqnarray*}
Then we have the following result.

Theorem 4.7   The control policy $\hat u(\cdot, \cdot)$ defined in (4.14) or (4.15), as the case may be, is optimal.

Proof. By Theorem 4.5, we need only show that

\begin{displaymath}\lim_{t \rightarrow \infty} \frac {V(\hat x(t), m (t))}{t}=0.\end{displaymath}

But this is implied by Theorem 4.6 and the fact that $\hat u(\cdot, \cdot)$ is a stable control. $\Box$

Remark 4.2   When $c(u)=0$, i.e., there is no production cost in the model, the optimal control policy can be chosen to be the so-called hedging point policy, which has the following form: there are real numbers $x_k$, $k=1,\ldots,m$, such that

\begin{eqnarray*}\hat u(x,k)=\left \{\begin{array}{ll}0 &\mbox{if} \ x>x_k,\\ z &\mbox{if} \ x=x_k,\\ k & \mbox{if} \ x<x_k.\end{array}\right.\end{eqnarray*}
In particular, if $h(x) =c_1x^++c_2x^-$ with $x^+=\max\{0,x\}$ and $x^-=\max\{0,-x\}$, we obtain the special case of Bielecki and Kumar (1988). This will be reviewed next.
 

The Bielecki-Kumar Case: Bielecki and Kumar (1988) treated the special case in which $h(x)=c_1x^++c_2x^-$, $c(u)=0$, and the production capacity $m(\cdot)$ is a two-state birth-death Markov process. Thus, the binary variable $m(\cdot)$ takes the value one when the machine is up and zero when the machine is down. Let $1/q_1$ and $1/q_0$ represent the mean time between failures and the mean repair time, respectively. Bielecki and Kumar obtain the following explicit solution:

\begin{eqnarray*}\hat u(x,k)=\left \{\begin{array}{ll}0 &\mbox{if} \ x>x^*,\\ z &\mbox{if} \ x=x^*,\\ k & \mbox{if} \ x<x^*,\end{array}\right.\end{eqnarray*}
where
\begin{eqnarray*}x^*=\left \{\begin{array}{ll}0 &\mbox{if} \ \frac{q_1}{(q_0+q_1)(1-z)}\leq \frac{c_1}{c_1+c_2},\\[2mm] \frac{z(1-z)}{q_0(1-z)-q_1z}\,\ln\left[\frac{(c_1+c_2)\,q_1}{c_1(q_0+q_1)(1-z)}\right]& \mbox{otherwise}.\\\end{array}\right.\end{eqnarray*}
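The threshold structure above is easy to probe numerically: simulate the two-state machine and compare the long-run average costs of hedging policies with different hedging levels. Everything numeric below (demand rate, jump and cost rates, grid of candidate levels) is an illustrative assumption, and the search is a brute-force check of the threshold structure rather than an implementation of the closed-form solution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Bielecki-Kumar instance: h(x) = c1*x+ + c2*x-, c(u) = 0,
# demand rate z, failure rate q1 (up -> down), repair rate q0 (down -> up).
z, q0, q1, c1, c2 = 0.5, 1.0, 0.5, 1.0, 3.0

def average_costs(thetas, T=2000.0, dt=0.01):
    """Simulated long-run average cost of the hedging policy with level
    theta, evaluated for all candidate levels along one machine path."""
    x = np.array(thetas, dtype=float)      # start each path at its level
    cost = np.zeros_like(x)
    m = 1
    for _ in range(int(T / dt)):
        # Hedging policy: full capacity below theta, demand rate at theta,
        # and zero production above theta.
        u = np.where(x < thetas, m, np.where(x > thetas, 0.0, min(z, m)))
        cost += (c1 * np.maximum(x, 0.0) + c2 * np.maximum(-x, 0.0)) * dt
        x += (u - z) * dt
        rate = q1 if m == 1 else q0
        if rng.random() < rate * dt:
            m = 1 - m
    return cost / T

thetas = np.linspace(0.0, 2.0, 9)
J = average_costs(thetas)
print(thetas[np.argmin(J)])   # grid point with the lowest simulated cost
```

With these rates the availability $q_0/(q_0+q_1)=2/3$ exceeds $z=0.5$, so every candidate policy is stable; in this instance the simulated cost curve has an interior minimum, consistent with a positive hedging level when backlog is penalized more heavily than inventory ($c_2>c_1$).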

Remark 4.3   When the system equation is governed by the stochastic differential equation

$\displaystyle dx(t)=b(x(t), \alpha(t), u(t))dt+g(x(t), \alpha(t))d\xi(t),$
    (4.16)
where $b(\cdot, \cdot, \cdot)$ and $g(\cdot, \cdot)$ are suitable functions and $\xi(t)$ is a standard Brownian motion, Ghosh, Arapostathis, and Marcus (1993), Ghosh, Arapostathis, and Marcus (1997), and Basak, Bisi and Ghosh (1997) have studied the corresponding HJB equation and established the existence of its solutions and of an optimal control under certain conditions. In particular, Basak, Bisi and Ghosh (1997) allow the matrix $g(\cdot, \cdot)$ to be of any rank between 1 and n.

Remark 4.4   For $n=2$ and $c({\mbox{\boldmath$u$ }})=0$, Srivatsan and Dallery (1998) limit their focus to the class of hedging point policies and attempt to partially characterize an optimal solution within this class.

Remark 4.5   Abbad, Bielecki, and Filar (1992) and Filar, Haurie, Moresino, and Vial (1999) consider the perturbed stochastic hybrid system whose continuous part is described by the following stochastic differential equation

\begin{eqnarray*}dx(t)=\varepsilon^{-1}f(x(t),u(t))dt+\varepsilon^{-1/2}A d\xi(t),\end{eqnarray*}
where $f(\cdot,\cdot)$ is continuous in both arguments, A is an $n\times n$ matrix, and $\xi(t)$ is a Brownian motion. The perturbation parameter $\varepsilon$ is assumed to be small. They prove that as $\varepsilon$ tends to zero, the optimal solution of the perturbed hybrid system can be approximated by a structured linear program.