连续时域上的DP(Dynamic Programming)

首先考虑如下形式的优化问题:

\[\begin{aligned} &&\min J = h(x(t_f),t_f) + \int_{t_0}^{t_f}g(x(t), u(t), t)dt \\ &\text{subject to} \\ &&\dot{x} &= a(x, u, t) \\ &&x(t_0) &= x_0 \\ &&m(x(t_f), t_f) &= 0 \\ &&u(t) &\in \mathscr{U} \end{aligned} \tag{1}\]

其中$t_f$是终止时间,$t_0$是起始时间,$m(x(t_f),t_f)=0$是终止条件(可能不唯一,因为$m$的值域是一个向量),$\mathscr{U}$表示对于$u(t)$的约束。

这个问题的解决的最终形式是一个非线性偏微分方程(Nonlinear Partial Differential Equation),被称作Hamilton-Jacobi-Bellman方程(HJB),下面进行推导。

现在我们设$[t_0, t_f]$区间内的任意一个时间点$t$,我们考虑$[t,t_f]$这个区间内的代价函数,其中$\tau \in [t, t_f]$,那么有如下关系:

\[J(x(t), t, u(\tau)) = h(x(t_f), t_f) + \int_{t}^{t_f}g(x(\tau), u(\tau), \tau)d\tau \tag{2}\]

显然我们把区间$[t,t_f]$分成两个区间来考虑:$[t,t+\Delta t]$和$[t+\Delta t,t_f]$。如下:

\[\begin{aligned} \hat{J}(x(t), t) &= \underset{u(\tau)\in\mathscr{U},\tau\in[t, t_f]}{\min}J(x(t),t,u(\tau)) \\ &=\underset{u(\tau)\in\mathscr{U},\tau\in[t, t_f]}{\min}\left\{h(x(t_f), t_f)+\int_{t}^{t_f}g(x(\tau),u(\tau), \tau)d\tau\right\}\\ &=\underset{u(\tau)\in\mathscr{U},\tau\in[t, t_f]}{\min}\left\{h(x(t_f),t_f)+\int_{t}^{t+\Delta{t}}g(x(\tau), u(\tau),\tau)d\tau+\int_{t+\Delta{t}}^{t_f}g(x(\tau), u(\tau),\tau)d\tau\right\} \end{aligned} \tag{4}\]

我们定义$[t+\Delta t,t_f]$范围内的最优代价函数:

\[\begin{aligned} \hat{J}(x(t+\Delta{t}), t+\Delta{t}) &=\underset{u(\tau)\in\mathscr{U},\tau\in[t+\Delta{t}, t_f]}{\min}\left\{h(x(t_f),t_f)+\int_{t+\Delta{t}}^{t_f}g(x(\tau), u(\tau),\tau)d\tau\right\} \end{aligned} \tag{5}\]

于是有:

\[\begin{aligned} \hat{J}(x(t), t) &=\underset{u(\tau)\in\mathscr{U},\tau\in[t, t+\Delta{t}]}{\min}\left\{\int_{t}^{t+\Delta{t}}g(x(\tau), u(\tau),\tau)d\tau+\hat{J}(x(t+\Delta{t}), t+\Delta{t}) \right\} \end{aligned} \tag{6}\]

我们假设$\Delta t$很小,那么可以对上式进行泰勒展开,有:

\[\hat{J}(x(t+\Delta{t}), t+\Delta{t}) \approx\hat{J}(x(t), t)+\left[\frac{\partial\hat{J}}{\partial{t}}(x(t),t)\right]\Delta{t}+\left[\frac{\partial\hat{J}}{\partial{x}}(x(t),t)\right][x(t+\Delta{t})-x(t)] \tag{7}\]

下面定义其中两个偏微分的别名:

\[\begin{aligned} \hat{J}_t(x(t),t)&=\frac{\partial\hat{J}}{\partial{t}}(x(t),t)\\ \hat{J}_x(x(t),t)&=\frac{\partial\hat{J}}{\partial{x}}(x(t),t) \end{aligned} \tag{8}\]

代入(7)中可以得到:

\[\begin{aligned} \hat{J}(x(t+\Delta{t}), t+\Delta{t})&\approx\hat{J}(x(t), t)+\hat{J}_t(x(t),t)\Delta{t}+\hat{J}_x(x(t),t)[x(t+\Delta{t})-x(t)]\\ &\approx\hat{J}(x(t), t)+\hat{J}_t(x(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\\ \end{aligned} \tag{9}\]

将公式(10)和公式(9)代入到公式(6),得到:

\[\begin{aligned} \hat{J}(x(t), t)&=\underset{u(t)\in\mathscr{U}}{\min}\{g(x(t),u(t),t)\Delta{t}+\hat{J}(x(t),t)+\hat{J}_t(x(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\}\\ &=\hat{J}(x(t),t)+\hat{J}_t(x(t),t)\Delta{t}+\underset{u(t)\in\mathscr{U}}{\min}\{g(x(t),u(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\}\\ 0&=\hat{J}_t(x(t),t)\Delta{t}+\underset{u(t)\in\mathscr{U}}{\min}\{g(x(t),u(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\} \end{aligned} \tag{11}\]

上式中提取出和$u(t)$无关的项目,得到最终的结果。这是一个$\hat{J}(x(t),t)$的偏微分方程,根据最终状态$\hat{J}(x(t_f),t_f)$反向推导前边的状态,其中有:

\[\hat{J}(x(t_f),t_f)=h(x(t_f)) \tag{12}\]

HJB(Hamiltonian-Jacobi-Bellman)等式

下面定义哈密顿量(Hamiltonian):

\[\mathscr{H}(x(t),u(t),\hat{J}_x(x(t),t),t)=g(x(t),u(t),t)+\hat{J}_x(x(t),t)a(x(t),u(t),t) \tag{13}\]

根据公式(11),代入公式(13),并且约掉$\Delta t$,得到:

\[\begin{aligned} -\hat{J}_t(x(t),t)&=\underset{u(t)\in\mathscr{U}}{\min}\{\mathscr{H}(x(t),u(t),\hat{J}_x(x(t),t),t)\} \end{aligned} \tag{14}\]

这即是HJB等式

连续时域LQR(Continuous LQR)

下面考虑如下线性系统模型(Linear System Model)和它的二次代价函数(Quadratic Cost Function):

\[\dot{x}(t)=A(t)x(t)+B(t)u(t) \tag{15}\] \[J=\frac{1}{2}x(t_f)^THx(t_f)+\frac{1}{2}\int_{t_0}^{t_f}\{x(t)^TR_{xx}(t)x(t)+u(t)^TR_{uu}(t)u(t)\}dt \tag{16}\]

假设$t_f$是固定的,$u(t)$没有约束;假设$H,R_{xx}\geq 0$(半正定),$R_{uu}(t)>0$(正定),根据公式(16)和公式(13)的定义,可以得到哈密顿量:

\[\mathscr{H}(x(t),u(t),\hat{J}_x(x(t),t),t)=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+u(t)^TR_{uu}(t)u(t)]+\hat{J}_x(x(t),t)[A(t)x(t)+B(t)u(t)] \tag{17}\]

根据公式(14),需要找到一个最优的$u(t)$,即$\hat u(t)$ ,使得$\mathscr{H}$最小,那么在没有约束的情况下,需要满足以下必要条件:

\[\frac{\partial\mathscr{H}}{\partial{u}}=u(t)R_{uu}(t)+\hat{J}_x(x(t),t)B(t)=0 \tag{18}\]

于是,根据上式可以得到一个最优控制率(Optimal Control Law):

\[\hat{u}(t)=-R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T \tag{19}\]

如果需要使上述最优条件充分且必要,还需要满足:

\[\frac{\partial^2\mathscr{H}}{\partial{u}^2}=R_{uu}(t)>0 \tag{20}\]

显然是成立的,因此该最优值是全局最小值。

下面把(19)的最优控制率代入到(17)中的哈密顿量中,得到:

\[\begin{aligned} \mathscr{H}(x(t),\hat{u}(t),\hat{J}_x(x(t),t),t)&=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+\hat{u}(t)^TR_{uu}(t)\hat{u}(t)]+\hat{J}_x(x(t),t)[A(t)x(t)+B(t)\hat{u}(t)]\\ &=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+[-R_{uu}(t)^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]^TR_{uu}(t)[-R_{uu}(t)^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]]+\hat{J}_x(x(t),t)[A(t)x(t)+B(t)[-R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]]\\ &=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)R_{uu}(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]+\hat{J}_x(x(t),t)A(t)x(t)-\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T\\ &=\frac{1}{2}x(t)^TR_{xx}(t)x(t)+\hat{J}_x(x(t),t)A(t)x(t)-\frac{1}{2}\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T \end{aligned} \tag{21}\]

于是根据(14)式,我们可以得到如下关系:

\[\begin{aligned} -\hat{J}_t(x(t),t)=\frac{1}{2}x(t)^TR_{xx}(t)x(t)+\hat{J}_x(x(t),t)A(t)x(t)-\frac{1}{2}\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T \end{aligned} \tag{22}\]

这是一个关于$\hat J$的偏微分方程,根据(16)可以很容易的知道$\hat J$的边界条件:

\[\hat{J}(x(t_f),t_f)=\frac{1}{2}x^T(t_f)Hx(t_f) \tag{23}\]

根据上述公式,我们可以大胆假设$\hat J$在所有时间$t$上均是$x(t)$的二次型(Quadratic Form),因此我们假设:

\[\hat{J}(x(t),t)=\frac{1}{2}x^T(t)P(t)x(t), \quad P(t)=P^T(t) \tag{24}\]

根据假设很容易得到:

\[\begin{aligned} &\hat{J}_x(x(t),t)=\frac{\partial\hat{J}}{\partial{x}}=x^T(t)P(t)\\ &\hat{J}_t(x(t),t)=\frac{\partial\hat{J}}{\partial{t}}=\frac{1}{2}x^T(t)\dot{P}(t)x(t)\\ \end{aligned} \tag{25}\]

将其代入到(22)中可以得到如下关系:

\[\begin{aligned} -\frac{1}{2}x^T(t)\dot{P}(t)x(t)&=\frac{1}{2} x^T(t)R_{xx}(t)x(t)+x^T(t)P(t)A(t)x(t)-\frac{1}{2}x^T(t)P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)x(t)\\ &=\frac{1}{2} x^T(t)R_{xx}(t)x(t)+\frac{1}{2}x^T(t)\{P(t)A(t)+A^T(t)P(t)\}x(t)-\frac{1}{2}x^T(t)P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)x(t)\\ &=\frac{1}{2}x^T(t)\{R_{xx}(t)+P(t)A(t)+A^T(t)P(t)-P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)\}x(t) \end{aligned} \tag{26}\]

综合(23)和(26)中的结果,$P(t)$满足如下关系:

\[\begin{aligned} -\dot{P}(t)&=R_{xx}(t)+P(t)A(t)+A^T(t)P(t)-P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)\\ P(t_f)&=H \end{aligned} \tag{27}\]

上式被称为黎卡提微分方程(Differential Riccati Equation)。通过求解这个方程,可以得到$P(t)$,进而可以根据(25)得到$\hat{J}_x(x(t),t)$,最后根据(19)得到最优控制输入$\hat u(t)$,如下:

\[\begin{aligned} \hat{u}(t)&=-R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T\\ &=-R_{uu}^{-1}(t)B(t)^TP(t)x(t) \end{aligned} \tag{28}\]

综上,可以得到最优反馈控制增益$F(t)$:

\[F(t)=R_{uu}^{-1}(t)B^T(t)P(t)\Rightarrow u(t)=-F(t)x(t) \tag{29}\]

线性时不变(LTI)系统

如果系统是线性时不变系统,则(27)中的系统参数矩阵均不再是时间的函数,因此有:

\[\begin{aligned} -\dot{P}(t)&=R_{xx}+P(t)A+A^TP(t)-P(t)BR_{uu}^{-1}B^TP(t)\end{aligned} \tag{30}\]

如果考虑当$t_f\rightarrow\infty$时,我们期望$P(t)$趋近一个定值,因此有:

\[\dot{P}(t) = 0 \tag{31}\] \[\begin{aligned} R_{xx}+PA+A^TP-PBR_{uu}^{-1}B^TP=0\end{aligned} \tag{32}\]

其中$P=\lim_{t \rightarrow \infty}{P(t)}$,上式被称为连续时间代数黎卡提方程(Continuous time Algebraic Riccati Equation - CARE)。对于线性时不变系统的LQR控制器,最优反馈控制增益是:

\[F=R_{uu}^{-1}B^TP\Rightarrow u(t)=-Fx(t) \tag{33}\]

推导完毕。

未经授权,禁止转载


<
Previous Post
从线性连续时间状态转移方程推导离散状态转移方程
>
Next Post
离散系统的LQR推导