Asymptotic behavior of Newton-like inertial dynamics involving the sum of potential and nonpotential terms

Abstract

In a Hilbert space \(\mathcal{H}\), we study a dynamic inertial Newton method which aims to solve additively structured monotone equations involving the sum of potential and nonpotential terms. Precisely, we are looking for the zeros of an operator \(A= \nabla f +B \), where \(\nabla f\) is the gradient of a continuously differentiable convex function f and B is a nonpotential monotone and cocoercive operator. Besides a viscous friction term, the dynamic involves geometric damping terms which are controlled respectively by the Hessian of the potential f and by a Newton-type correction term attached to B. Based on a fixed point argument, we show the well-posedness of the Cauchy problem. Then we show the weak convergence as \(t\to +\infty \) of the generated trajectories towards the zeros of \(\nabla f +B\). The convergence analysis is based on the appropriate setting of the viscous and geometric damping parameters. The introduction of these geometric dampings makes it possible to control and attenuate the known oscillations for the viscous damping of inertial methods. Rewriting the second-order evolution equation as a first-order dynamical system enables us to extend the convergence analysis to nonsmooth convex potentials. These results open the door to the design of new first-order accelerated algorithms in optimization taking into account the specific properties of potential and nonpotential terms. The proofs and techniques are original and differ from the classical ones due to the presence of the nonpotential term.

1 Introduction and preliminary results

Let \(\mathcal{H}\) be a real Hilbert space endowed with the scalar product \(\langle \cdot ,\cdot \rangle \) and the associated norm \(\|\cdot \|\). Many situations coming from physics, biology, human sciences involve equations containing both potential and nonpotential terms. In human sciences, this comes from the presence of both cooperative and noncooperative aspects. In physics, this comes from the joint presence of terms of diffusion and convection. To describe such situations we will focus on solving additively structured monotone equations of the type

$$ \text{Find }x\in \mathcal{H} : \nabla f (x) + B(x) =0. $$
(1.1)

In the above equation, \(\nabla f\) is the gradient of a convex continuously differentiable function \(f: \mathcal{H}\to \mathbb{R}\) (the potential part), and \(B: \mathcal{H}\to \mathcal{H}\) is a nonpotential operator which is supposed to be monotone and cocoercive. To solve (1.1), we will consider continuous inertial dynamics whose solution trajectories converge as \(t \to +\infty \) to solutions of (1.1). Our study is part of the active research stream that studies the close relationship between continuous dissipative dynamical systems and optimization algorithms which are obtained by their temporal discretization. To avoid lengthening the paper, we limit our study to the analysis of the continuous dynamic. The analysis of the algorithmic part and its link with first-order numerical optimization will be carried out in a second companion paper. From this perspective, damped inertial dynamics offer a natural way to accelerate these systems. As the main feature of our study, we will introduce dynamic geometric dampings which are respectively driven by the Hessian for the potential part and by the corresponding Newton term for the nonpotential part. In addition to improving the convergence rate, this will considerably reduce the oscillatory behavior of the trajectories. We will pay particular attention to the minimal assumptions which guarantee convergence of the trajectories, and which highlight the asymmetric role played by the two operators involved in the dynamic. We will see that many results can be extended to the case where \(f: \mathcal{H}\to \mathbb{R}\cup \{+\infty \}\) is a convex lower semicontinuous proper function, which makes it possible to broaden the field of applications.

1.1 Dynamical inertial Newton method for additively structured monotone problems

For \(t\geq t_{0}\), let us introduce the following second-order differential equation which will form the basis of our analysis:

$$\begin{aligned}& \ddot{x}(t)+\gamma \dot{x}(t)+ \nabla f\bigl(x(t)\bigr)+B \bigl(x(t)\bigr)+ \beta _{f} \nabla ^{2} f\bigl(x(t) \bigr) \dot{x}(t) + \beta _{b} B'\bigl(x(t)\bigr) \dot{x}(t) = 0.\quad \end{aligned}$$
(DINAM)

We use (DINAM) as an abbreviation for dynamical inertial Newton method for additively structured monotone problems. We call \(t_{0}\in \mathbb{R}\) the origin of time. Since we are considering autonomous systems, we can take any arbitrary real number for \(t_{0}\). For simplicity, we set \(t_{0}=0\). When considering the corresponding Cauchy problem, we add the initial conditions: \(x(0)=x_{0}\in \mathcal{H}\) and \(\dot{x}(0)=x_{1}\in \mathcal{H}\). The term \(B'(x(t))\dot{x}(t)\) is interpreted as \(\frac{d}{dt} (B(x(t)) )\) taken in the distribution sense. Likewise the term \(\nabla ^{2} f(x(t)) \dot{x}(t)\) is interpreted as \(\frac{d}{dt} ( \nabla f(x(t)) )\) taken also in the distribution sense. Because of the assumptions made below, these terms are indeed measurable functions which are bounded on the bounded time intervals. So, we will consider strong solutions of the above equation (DINAM).

Throughout the paper we make the following standing assumptions:

$$ \textstyle\begin{cases} {\mathrm{(A1)}}\quad f: \mathcal{H}\to \mathbb{R} \text{ is convex, of class } \mathcal{C}^{1}, \\ \hphantom{{\mathrm{(A1)}}\quad}\nabla f \text{ is {Lipschitz} continuous on the bounded sets} ; \\ {\mathrm{(A2)}} \quad B: \mathcal{H}\to \mathcal{H} \text{ is a } \lambda \text{-cocoercive operator for some } \lambda > 0; \\ {\mathrm{(A3)}} \quad \gamma >0, \beta _{f} > 0, \beta _{b}\geq 0 \text{ are given real damping parameters}. \end{cases} $$

We emphasize that we do not assume the gradient of f to be globally Lipschitz continuous. Developing our analysis without using any global bound on the gradient of f is key to further extending the theory to the nonsmooth case. As a specific property, the inertial system (DINAM) combines two different types of driving forces associated respectively with the potential operator \(\nabla f\) and the nonpotential operator B. It also involves three different types of friction:

  1. (a)

    The term \(\gamma \dot{x}(t)\) models viscous damping with coefficient \(\gamma >0\).

  2. (b)

    The term \(\beta _{f} \nabla ^{2} f(x(t)) \dot{x}(t)\) is the so-called Hessian-driven damping, which attenuates the oscillations that naturally occur with inertial gradient dynamics.

  3. (c)

    The term \(\beta _{b} B'(x(t))\dot{x}(t) \) is the nonpotential version of the Hessian driven damping. It can be interpreted as a Newton-type correction term.

Note that each driving force term enters (DINAM) with its temporal derivative. In fact, we have

$$ \nabla ^{2} f\bigl(x(t)\bigr) \dot{x}(t) =\frac{d}{dt} \bigl(\nabla f\bigl(x(t)\bigr) \bigr) \quad \text{and}\quad B' \bigl(x(t)\bigr)\dot{x}(t) =\frac{d}{dt} \bigl(B\bigl(x(t)\bigr) \bigr) . $$

This is a crucial observation which makes (DINAM) equivalent to a first-order system in time and space, and makes the corresponding Cauchy problem well posed. This will be proved later (see Sect. 2.1 for more details). The cocoercivity assumption on the operator B plays an important role in the analysis of (DINAM), not only to ensure the existence of solutions, but also to analyze their asymptotic behavior as time \(t\to +\infty \).

Recall that the operator \(B: \mathcal{H}\to \mathcal{H}\) is said to be λ-cocoercive for some \(\lambda > 0\) if

$$ \langle By-Bx,y-x\rangle \ge \lambda \Vert By-Bx \Vert ^{2},\quad \forall x,y \in \mathcal{H}. $$

Note that B being λ-cocoercive is equivalent to \(B^{-1}\) being λ-strongly monotone, i.e., cocoercivity is the dual notion of strong monotonicity. It is easy to check that λ-cocoercivity of B implies that B is \(1/\lambda \)-Lipschitz continuous. The reverse implication holds true in the case where the operator is the gradient of a convex and differentiable function. Indeed, according to the Baillon–Haddad theorem [17], if \(\nabla f\) is L-Lipschitz continuous, then \(\nabla f\) is a \(1/L\)-cocoercive operator (we refer to [18, Corollary 18.16] for more details).
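For the first implication, the Lipschitz bound follows in one line from the cocoercivity inequality combined with Cauchy–Schwarz:

$$ \lambda \Vert By-Bx \Vert ^{2}\le \langle By-Bx,y-x\rangle \le \Vert By-Bx \Vert \, \Vert y-x \Vert ,\quad \forall x,y\in \mathcal{H}, $$

so that \(\Vert By-Bx \Vert \le \frac{1}{\lambda } \Vert y-x \Vert \) for all \(x,y\in \mathcal{H}\).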

1.2 Historical aspects of the inertial systems with Hessian-driven damping

The following inertial system with Hessian-driven damping

$$ \ddot{x}(t) + \gamma \dot{x}(t) + \beta \nabla ^{2} f \bigl(x(t) \bigr) \dot{x} (t) + \nabla f \bigl(x(t)\bigr) = 0 $$

was first considered by Alvarez, Attouch, Bolte, and Redont in [6]. Then, following the continuous-time interpretation by Su, Boyd, and Candès [28] of the accelerated gradient method of Nesterov, Attouch, Peypouquet, and Redont [14] replaced the fixed viscous damping coefficient γ with an asymptotically vanishing damping coefficient \(\frac{\alpha }{t}\), with \(\alpha >0\). At first glance, the presence of the Hessian may seem to entail numerical difficulties. However, this is not the case as the Hessian intervenes in the above ODE in the form \(\nabla ^{2} f (x(t)) \dot{x} (t)\), which is nothing but the derivative with respect to time of \(\nabla f (x(t))\). So, the temporal discretization of these dynamics provides first-order algorithms of the form

$$ \textstyle\begin{cases} y_{k} = x_{k} + \alpha _{k} ( x_{k} - x_{k-1}) - \beta _{k} ( \nabla f (x_{k}) - \nabla f (x_{k-1}) ), \\ x_{k+1} = y_{k} -s \nabla f (y_{k}) . \end{cases} $$

As a specific feature, and by comparison with the classical accelerated gradient methods, these algorithms contain a correction term which is equal to the difference of the gradients at two consecutive steps. While preserving the convergence properties of the accelerated gradient method, they provide fast convergence to zero of the gradients and reduce the oscillatory aspects. Several recent studies have been devoted to this subject, see Attouch, Chbani, Fadili, and Riahi [7], Boţ, Csetnek, and László [20], Kim [24], Lin and Jordan [25], Shi, Du, Jordan, and Su [27], and Alecsa, László, and Pinţa [4] for an implicit version of the Hessian-driven damping. Applications to deep learning have recently been developed by Castera, Bolte, Févotte, and Pauwels [23]. In [3], Adly and Attouch studied the finite convergence of proximal-gradient inertial algorithms combining dry friction with Hessian-driven damping.
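The two-step scheme above can be sketched numerically. The following is a minimal illustration on an ill-conditioned quadratic; the function name and the fixed coefficients s, α, β are our choices for the sketch (in the cited works \(\alpha _{k}\), \(\beta _{k}\) typically vary along the iterations):

```python
import numpy as np

def inertial_gradient_with_correction(grad, x0, s=0.01, alpha=0.9, beta=0.01, iters=200):
    """Inertial gradient scheme with a gradient-correction term,
    i.e., the discrete counterpart of Hessian-driven damping:

        y_k     = x_k + alpha*(x_k - x_{k-1}) - beta*(grad(x_k) - grad(x_{k-1}))
        x_{k+1} = y_k - s*grad(y_k)
    """
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + alpha * (x - x_prev) - beta * (grad(x) - grad(x_prev))
        x_prev, x = x, y - s * grad(y)
    return x

# Illustrative test problem: f(x) = 0.5 * x^T Q x, hence grad f(x) = Q x.
Q = np.diag([1.0, 50.0])
x_min = inertial_gradient_with_correction(lambda x: Q @ x, np.array([1.0, 1.0]))
# x_min approaches the unique minimizer, the origin.
```

The correction term \(\beta (\nabla f(x_{k}) - \nabla f(x_{k-1}))\) tempers the momentum along directions where the gradient changes quickly, which is what damps the oscillations transverse to the level sets.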

1.3 Inertial dynamics involving cocoercive operators

Let us come to the transposition of these techniques to the case of maximally monotone operators. Álvarez and Attouch [5] and Attouch and Maingé [10] studied the equation

$$ \ddot{x}(t) + \gamma \dot{x}(t) + A\bigl(x(t)\bigr) = 0, $$
(1.2)

when \(A:\mathcal{H}\to \mathcal{H}\) is a cocoercive (and hence maximally monotone) operator (see also [19]). The cocoercivity assumption plays an important role in the study of (1.2), not only to ensure the existence of solutions, but also to analyze their long-term behavior. Assuming that the cocoercivity parameter λ and the damping coefficient γ satisfy the inequality \(\lambda \gamma ^{2} >1\), Attouch and Maingé [10] showed that each trajectory of (1.2) converges weakly to a zero of A, i.e., \(x(t)\rightharpoonup x_{\infty }\in A^{-1}(0)\) as \(t\to +\infty \). Moreover, the condition \(\lambda \gamma ^{2} >1\) is sharp.

For general maximally monotone operators, this property has been further exploited by Attouch and Peypouquet [13] and by Attouch and László [8, 9]. The key property is that, for \(\lambda >0\), the Yosida approximation \(A_{\lambda }\) of A is λ-cocoercive and \(A_{\lambda }^{-1} (0) = A^{-1}(0)\). So the idea is to replace the operator A with its Yosida approximation and adjust the Yosida regularization parameter. Another related work is due to Attouch and Maingé [10], who first considered the asymptotic behavior of the second-order dissipative evolution equation with \(f:\mathcal{H}\to \mathbb{R}\) convex and \(B:\mathcal{H}\to \mathcal{H}\) cocoercive

$$ \ddot{x}(t) + \gamma \dot{x}(t) + \nabla f\bigl(x(t)\bigr) + B \bigl(x(t)\bigr)= 0, $$
(1.3)

combining potential with nonpotential effects. Our study will therefore consist initially in introducing the Hessian term and the Newton-type correcting term into this dynamic.

1.4 Link with Newton-like methods for solving monotone inclusions

Let us specify the link between our study and Newton’s method for solving (1.1). To overcome the ill-posed character of the continuous Newton method for a general maximally monotone operator A, the following first-order evolution system was studied by Attouch and Svaiter [16]:

$$ \textstyle\begin{cases} v(t) \in A(x(t)), \\ \gamma (t) \dot{x}(t) + \beta \dot{v}(t) + v(t) =0. \end{cases} $$

This system can be considered as a continuous version of the Levenberg–Marquardt method, which acts as a regularization of the Newton method. Remarkably, under a fairly general assumption on the regularization parameter \(\gamma (t)\), this system is well posed and generates trajectories that converge weakly to equilibria (zeroes of A). Parallel results have been obtained for the associated proximal algorithms obtained by implicit temporal discretization, see [2, 12, 15]. Formally, this system is written as

$$ \gamma (t) \dot{x}(t) + \beta \frac{d}{dt} \bigl( A\bigl(x(t)\bigr) \bigr) + A\bigl(x(t)\bigr) = 0. $$

Thus (DINAM) can be considered as an inertial version of this dynamical system for the structured monotone operator \(A= \nabla f + B\). Our study is also linked to the recent works by Attouch and László [8, 9], who considered the general case of monotone equations. By contrast with [8, 9], owing to the cocoercivity of B, we do not use the Yosida regularization, and we exhibit minimal assumptions involving only the nonpotential component.

1.5 Contents

The paper is organized as follows. Section 1 introduces (DINAM) with some historical perspective. In Sect. 2, based on the first-order equivalent formulation of (DINAM), we show that the Cauchy problem is well-posed (in the sense of existence and uniqueness of solutions). In Sect. 3, we analyze the asymptotic convergence properties of the trajectories generated by (DINAM). Using appropriate Lyapunov functions, we show that any trajectory of (DINAM) converges weakly as \(t\to +\infty \), and that its limit belongs to \(S=(\nabla f +B)^{-1}(0)\). The interplay between the damping parameters \(\beta _{f}\), \(\beta _{b}\), γ and the cocoercivity parameter λ will play an important role in our Lyapunov analysis. In Sect. 4, we perform numerical experiments showing that the well-known oscillations in the case of the heavy ball with friction are damped with the introduction of the geometric (Hessian-like) damping terms. An application to the LASSO problem with a nonpotential operator as well as a coupled system in dynamical games are considered. Section 5 deals with the extension of the study to the nonsmooth and convex case. Section 6 contains some concluding remarks and perspectives.

2 Well-posedness of the Cauchy–Lipschitz problem

We first show the existence and the uniqueness of the solution trajectory for the Cauchy problem associated with (DINAM) for any given initial condition data \((x_{0},x_{1})\in \mathcal{H}\times \mathcal{H}\).

2.1 First-order in time and space equivalent formulation

The following first-order equivalent formulation of (DINAM) was first considered by Alvarez, Attouch, Bolte, and Redont [6] and Attouch, Peypouquet, and Redont [14] in the framework of convex minimization. Specifically, in our context, we have the following equivalence, which follows from a simple differential and algebraic calculation.

Proposition 2.1

Suppose that \(\beta _{f} >0\). Then the following problems are equivalent: \((\mathrm{i}) \Longleftrightarrow (\mathrm{ii})\)

$$\begin{aligned} (\mathrm{i})&\quad \ddot{x}(t)+\gamma \dot{x}(t)+ \nabla f\bigl(x(t)\bigr)+B\bigl(x(t)\bigr)+ \beta _{f} \nabla ^{2} f\bigl(x(t)\bigr) \dot{x}(t) + \beta _{b} B'\bigl(x(t)\bigr) \dot{x}(t) = 0. \\ (\mathrm{ii}) &\quad \textstyle\begin{cases} \dot{x}(t) + \beta _{f} \nabla f(x(t)) + \beta _{b} B(x(t)) + ( \gamma - \frac{1}{\beta _{f}} ) x(t) + y(t) = 0; \\ \dot{y}(t) - (1- \frac{\beta _{b}}{\beta _{f}} ) B(x(t)) + \frac{1}{\beta _{f}} ( \gamma - \frac{1}{\beta _{f}} ) x(t) +\frac{1}{\beta _{f}} y(t) = 0. \end{cases}\displaystyle \end{aligned}$$

Proof

\((\mathrm{i})\Longrightarrow (\mathrm{ii})\). For \(t\ge 0\), set

$$ y(t):=-\dot{x}(t) - \beta _{f} \nabla f\bigl(x(t) \bigr) -\beta _{b} B\bigl(x(t)\bigr) - \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x(t), $$
(2.1)

which gives the first equation of (ii). By differentiating \(y(\cdot )\) and using (i), we get

$$\begin{aligned} \dot{y}(t)&=-\ddot{x}(t) - \beta _{f} \nabla ^{2} f\bigl(x(t)\bigr)\dot{x}(t) - \beta _{b} B'\bigl(x(t)\bigr)\dot{x}(t) - \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) \dot{x}(t) \\ &=\gamma \dot{x}(t)+\nabla f\bigl(x(t)\bigr)+B\bigl(x(t)\bigr)- \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) \dot{x}(t) \\ &=\nabla f\bigl(x(t)\bigr)+B\bigl(x(t)\bigr)+ \frac{1}{\beta _{f}} \dot{x}(t). \end{aligned}$$
(2.2)

By combining (2.1) and (2.2), we obtain

$$ \dot{y}(t)+\frac{1}{\beta _{f}} y(t) = \biggl(1- \frac{\beta _{b}}{\beta _{f}} \biggr) B\bigl(x(t)\bigr) - \frac{1}{\beta _{f}} \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x(t). $$
(2.3)

This gives the second equation of (ii).

\((\mathrm{ii})\Longrightarrow (\mathrm{i})\). By differentiating the first equation of (ii), we obtain

$$ \ddot{x}(t) + \beta _{f} \nabla ^{2} f \bigl(x(t)\bigr)\dot{x}(t) + \beta _{b} B'\bigl(x(t) \bigr) \dot{x}(t) + \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) \dot{x}(t) + \dot{y}(t) = 0. $$
(2.4)

Let us eliminate y from this equation to obtain an equation involving only x. For this, we successively use the second equation in (ii), then the first equation in (ii) to obtain

$$\begin{aligned} \dot{y}(t) &= \biggl(1- \frac{\beta _{b}}{\beta _{f}} \biggr) B\bigl(x(t)\bigr) - \frac{1}{\beta _{f}} \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x(t) - \frac{1}{\beta _{f}} y(t) \\ &= \biggl(1- \frac{\beta _{b}}{\beta _{f}} \biggr) B\bigl(x(t)\bigr) - \frac{1}{\beta _{f}} \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x(t)+ \frac{1}{\beta _{f}}\dot{x}(t) \\ &\quad {}+ \nabla f\bigl(x(t)\bigr) + \frac{\beta _{b}}{\beta _{f}} B\bigl(x(t)\bigr) + \frac{1}{\beta _{f}} \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x(t). \end{aligned}$$

Therefore,

$$ \dot{y}(t) =\nabla f\bigl(x(t)\bigr)+B\bigl(x(t)\bigr)+ \frac{1}{\beta _{f}}\dot{x}(t) . $$
(2.5)

From (2.4) and (2.5), we obtain (i). □
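Proposition 2.1 is also useful computationally: integrating the first-order system (ii) gives trajectories of (DINAM) without evaluating \(\nabla ^{2}f\) or \(B'\). The following sketch applies explicit Euler to (ii) on a toy problem of our choosing (f quadratic and B a rotation-plus-shrinkage linear map, which is monotone and \(1/2\)-cocoercive); the data, step size, and horizon are illustrative assumptions:

```python
import numpy as np

# Illustrative data: f(x) = 0.5*||x||^2, so grad f(x) = x, and B(x) = R x with
# R = [[1, -1], [1, 1]]; B is monotone and 1/2-cocoercive.
R = np.array([[1.0, -1.0], [1.0, 1.0]])
grad_f = lambda x: x
B = lambda x: R @ x

gamma, beta_f, beta_b = 3.0, 1.0, 1.0  # with lambda = 1/2, these satisfy (3.1)

def dinam_first_order(x0, x1, dt=0.01, T=30.0):
    """Explicit Euler integration of system (ii) of Proposition 2.1."""
    x = x0.copy()
    # y(0) from (2.1): y0 = -x1 - beta_f*grad_f(x0) - beta_b*B(x0) - (gamma - 1/beta_f)*x0
    y = -x1 - beta_f * grad_f(x0) - beta_b * B(x0) - (gamma - 1.0 / beta_f) * x0
    for _ in range(int(T / dt)):
        dx = -(beta_f * grad_f(x) + beta_b * B(x) + (gamma - 1.0 / beta_f) * x + y)
        dy = (1.0 - beta_b / beta_f) * B(x) \
            - (1.0 / beta_f) * (gamma - 1.0 / beta_f) * x - (1.0 / beta_f) * y
        x, y = x + dt * dx, y + dt * dy
    return x

x_inf = dinam_first_order(np.array([1.0, 1.0]), np.zeros(2))
# x_inf approaches the unique zero of grad_f + B, the origin in this example.
```

Note that only \(\nabla f\) and B themselves appear in the update, mirroring the fact that the discretizations of (DINAM) are first-order methods.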

2.2 Well-posedness of the evolution equation (DINAM)

In the following theorem, we show the well-posedness of the Cauchy problem for the evolution equation (DINAM).

Theorem 2.1

Suppose that \(\beta _{f}>0 \) and \(\beta _{b}\geq 0\). Then, for any \((x_{0}, x_{1}) \in \mathcal{H}\times \mathcal{H}\), there exists a unique strong global solution \(x:[0, +\infty [ \, \to \mathcal{H}\) of the continuous dynamic (DINAM) which satisfies the Cauchy data \(x(0) =x_{0}\), \(\dot{x}(0) =x_{1}\).

Proof

System (ii) in Proposition 2.1 can be written equivalently as

$$ \dot{Z}(t) + F\bigl( Z(t)\bigr)=0, \qquad Z(0) = (x_{0},y_{0}), $$

where \(Z(t) = (x(t), y(t)) \in \mathcal{H}\times \mathcal{H}\) and

$$\begin{aligned}& F(x,y) = \beta _{f} \bigl(\nabla f(x),0\bigr) \\& \hphantom{F(x,y) ={}}{}+ \biggl(\beta _{b} B(x)+ \biggl( \gamma -\frac{1}{\beta _{f}} \biggr) x + y, - \biggl(1- \frac{\beta _{b}}{\beta _{f}} \biggr) B(x)+ \frac{1}{\beta _{f}} \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x + \frac{1}{\beta _{f}} y \biggr), \\& y_{0} = - x_{1} - \beta _{f} \nabla f(x_{0}) - \beta _{b} B(x_{0}) - \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x_{0}. \end{aligned}$$

Therefore, \(F = \nabla \Phi + G\), where \(\Phi : \mathcal{H}\times \mathcal{H}\to \mathbb{R}\) is the convex differentiable function

$$ \Phi (x,y) := \beta _{f} f(x) $$

and \(G: \mathcal{H}\times \mathcal{H}\to \mathcal{H}\times \mathcal{H}\)

$$ G (x,y) := \biggl(\beta _{b} B(x)+ \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x + y, - \biggl(1- \frac{\beta _{b}}{\beta _{f}} \biggr) B(x)+ \frac{1}{\beta _{f}} \biggl( \gamma - \frac{1}{\beta _{f}} \biggr) x + \frac{1}{\beta _{f}} y \biggr) $$

is a Lipschitz continuous map. Indeed, the Lipschitz continuity of G is a direct consequence of the Lipschitz continuity of B. The existence of a classical solution to

$$ \dot{Z}(t) + \nabla \Phi \bigl( Z(t)\bigr) + G\bigl(Z(t)\bigr)=0,\qquad Z(0) = (x_{0},y_{0}) $$

follows from Brézis [21, Proposition 3.12]. In fact, the proof of this result relies on a fixed point argument. It consists in finding a fixed point of the mapping \(u \in \mathcal{C} ([0,T], \mathcal{H}) \mapsto K(u) \in \mathcal{C} ([0,T], \mathcal{H})\), where \(K(u)=w\) is the solution of

$$ \dot{w}(t) + \nabla \Phi \bigl(w(t)\bigr)= - G\bigl(u(t)\bigr),\qquad w(0) = (x_{0},y_{0}). $$

It is proved that the sequence of iterates \((w_{n})\) generated by the corresponding Picard iteration

$$ \dot{w}_{n+1}(t) + \nabla \Phi \bigl(w_{n+1}(t)\bigr)= - G \bigl(w_{n}(t)\bigr),\qquad w_{n+1}(0) = (x_{0},y_{0}), $$

converges uniformly on \([0,T]\) to a fixed point of K. When returning to (DINAM), that is, equation (i) of Proposition 2.1, we recover a strong solution. Precisely, \(\dot{x}(\cdot )\) is Lipschitz continuous on the bounded time intervals, and \(\ddot{x}(\cdot )\), taken in the distribution sense, is locally essentially bounded. □

Remark 2.1

Note that when \(\nabla f\) is supposed to be globally Lipschitz continuous, the above proof can be notably simplified by just applying the classical Cauchy–Lipschitz theorem.

3 Asymptotic convergence properties of (DINAM)

In this section, we study the asymptotic behavior of the solution trajectories of (DINAM). For each solution trajectory \(t\mapsto x(t)\) of (DINAM), we show that the weak limit w-\(\lim_{t\to +\infty }x(t)=x_{ \infty }\) exists and satisfies \(x_{\infty }\in S\), where

$$ S:=\bigl\{ p\in \mathcal{H}: \nabla f(p)+B(p)= 0\bigr\} . $$

Before stating our main result, notice that \(B(p)\) is uniquely defined for \(p\in S\).

Lemma 3.1

\(B(p)\) is uniquely defined for \(p\in S\), i.e.,

$$ p_{1}\in S,\quad p_{2} \in S \quad \Longrightarrow\quad B(p_{1})= B(p_{2}). $$

Proof

Since \(p_{1}\in S\), \(p_{2} \in S \), we have

$$ \nabla f (p_{1}) +B(p_{1})= \nabla f (p_{2}) +B(p_{2})=0. $$

By the monotonicity of \(\nabla f\), we have

$$ \bigl\langle \nabla f (p_{2}) - \nabla f (p_{1}), p_{2}-p_{1} \bigr\rangle \geq 0. $$

Replacing \(\nabla f (p_{1})\) with \(-B(p_{1})\) and \(\nabla f (p_{2})\) with \(-B(p_{2})\), we get

$$ \bigl\langle B (p_{2}) - B (p_{1}), p_{2}-p_{1} \bigr\rangle \leq 0, $$

which by cocoercivity of B gives \(\lambda \|B(p_{2})-B(p_{1})\|^{2} \leq 0\), and hence \(B(p_{2})=B(p_{1})\). □

3.1 General case

The general line of the proof is close to that given by Attouch and László in [8, 9]. The first major difference with the approach developed in [8, 9] is that in our context, thanks to the hypothesis of cocoercivity on the nonpotential part, we do not need to go through the Yosida regularization of the operators. The second difference is that we treat the potential and nonpotential operators in a differentiated way. These points are crucial for applications to numerical algorithms, because the computation of the Yosida regularization of the sum of the two operators is often numerically out of reach.

The following theorem states the asymptotic convergence properties of (DINAM).

Theorem 3.1

Let \(B: \mathcal{H} \to \mathcal{H}\) be a λ-cocoercive operator and \(f: \mathcal{H}\to \mathbb{R}\) be a \(\mathcal{C}^{1}\) convex function whose gradient is Lipschitz continuous on the bounded sets. Suppose that \(S= (\nabla f +B)^{-1} (0)\neq \emptyset \), and that the parameters involved in the evolution equation (DINAM) satisfy the following conditions: \(\beta _{f} >0\) and

$$ 4 \lambda \gamma > \frac{(\beta _{b}-\beta _{f})^{2}}{\beta _{f}} + 2 \biggl(\beta _{b}+\frac{1}{\gamma } \biggr)+ 2 \sqrt{ \biggl( \beta _{b}+ \frac{1}{ \gamma } \biggr)^{2} + \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}}}. $$
(3.1)

Then, for any solution trajectory \(x:[0,+\infty [\, \to \mathcal{H}\) of (DINAM), the following properties are satisfied:

  1. (i)

    (convergence) \(x(t)\) converges weakly, as \(t\to +\infty \), to an element of S.

  2. (ii)

    (integral estimates) Set \(A:=B+\nabla f\) and \(p\in S\). Then

    $$\begin{aligned}& \int _{0}^{+\infty } \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}\,dt< +\infty ,\qquad \int _{0}^{+ \infty } \bigl\Vert \ddot{x}(t) \bigr\Vert ^{2}\,dt< +\infty , \\& \int _{0}^{+\infty } \bigl\Vert B\bigl(x(t) \bigr)-B(p) \bigr\Vert ^{2}\,dt< +\infty ,\qquad \int _{0}^{+ \infty } \biggl\Vert \frac{d}{dt}B\bigl(x(t)\bigr) \biggr\Vert ^{2}\,dt< +\infty , \\& \int _{0}^{+\infty } \bigl\Vert A \bigl(x(t)\bigr) \bigr\Vert ^{2}\,dt< +\infty , \quad \textit{and}\quad \int _{0}^{+ \infty } \biggl\Vert \frac{d}{dt}A \bigl(x(t)\bigr) \biggr\Vert ^{2}\,dt< +\infty . \end{aligned}$$
  3. (iii)

    (pointwise estimates)

    $$ \lim_{t\to +\infty } \bigl\Vert \dot{x}(t) \bigr\Vert =0,\qquad \lim _{t\to +\infty } \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert =0,\qquad \lim_{t\to +\infty } \bigl\Vert A\bigl(x(t)\bigr) \bigr\Vert =0, $$

    where \(B(p)\) is uniquely defined for \(p\in S\).

Proof

Lyapunov analysis. Set \(A:=B+\nabla f\) and \(A_{\beta }:=\beta _{b}B+\beta _{f}\nabla f\). Take \(p\in S\). Consider the function \(t\in [0, +\infty [\, \mapsto \mathcal{V}_{p}(t) \in \mathbb{R}_{+}\) defined by

$$ \mathcal{V}_{p}(t) :=\frac{1}{2} \bigl\Vert x(t)-p+ c \bigl(\dot{x}(t)+A_{\beta }\bigl(x(t)\bigr)-A_{\beta }(p) \bigr) \bigr\Vert ^{2} +\frac{\delta }{2} \bigl\Vert x(t)-p \bigr\Vert ^{2}, $$
(3.2)

where c and δ are coefficients to adjust. Using the differentiation chain rule for absolutely continuous functions (see [22, Corollary VIII.10]) and (DINAM), we get

$$ \begin{aligned}[b] \dot{\mathcal{V}}_{p}(t)= {}&\bigl\langle \dot{x}(t)-c \bigl(\gamma \dot{x}(t)+A\bigl(x(t)\bigr) \bigr), x(t)-p+ c \bigl( \dot{x}(t)+A_{\beta }\bigl(x(t)\bigr)-A_{\beta }(p) \bigr) \bigr\rangle \\&{}+ \delta \bigl\langle \dot{x}(t), x(t)-p \bigr\rangle .\end{aligned} $$
(3.3)

Setting \(\delta :=c\gamma -1>0\), from (3.3) we obtain

$$ \dot{\mathcal{V}}_{p}(t) = \bigl\langle -cA \bigl(x(t)\bigr), x(t)-p\bigr\rangle +c \bigl\langle (1-c\gamma )\dot{x}(t)-cA \bigl(x(t)\bigr) ,\dot{x}(t)+ A_{\beta }\bigl(x(t)\bigr)-A_{\beta }(p) \bigr\rangle . $$
(3.4)

We have

$$\begin{aligned}& c\bigl\langle (1-c\gamma )\dot{x}(t)-cA\bigl(x(t)\bigr) ,\dot{x}(t)+A_{\beta } \bigl(x(t)\bigr)-A_{\beta }(p)\bigr\rangle \\& \quad = c(1-c\gamma ) \bigl\Vert \dot{x}(t) \bigr\Vert ^{2} +c(1-c \gamma )\bigl\langle \dot{x}(t),A_{\beta }\bigl(x(t)\bigr)-A_{\beta }(p) \bigr\rangle \\& \qquad {} -c^{2}\bigl\langle A\bigl(x(t)\bigr),\dot{x}(t)\bigr\rangle -c^{2}\bigl\langle A\bigl(x(t)\bigr),A_{\beta } \bigl(x(t)\bigr) -A_{\beta }(p)\bigr\rangle , \\& \quad =c(1-c\gamma ) \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}-c^{2} \beta _{b} \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2}-c^{2} \beta _{f} \bigl\Vert \nabla f \bigl(x(t)\bigr)-\nabla f(p) \bigr\Vert ^{2} \\& \qquad {} +\bigl[c(1-c\gamma )\beta _{b}-c^{2} \bigr]\bigl\langle \dot{x}(t),B\bigl(x(t)\bigr)-B(p) \bigr\rangle \\& \qquad {}+\bigl[c(1-c \gamma )\beta _{f}-c^{2}\bigr]\bigl\langle \dot{x}(t), \nabla f\bigl(x(t)\bigr)- \nabla f(p)\bigr\rangle \\& \qquad {} -c^{2}(\beta _{b}+\beta _{f}) \bigl\langle B\bigl(x(t)\bigr)-B(p),\nabla f\bigl(x(t)\bigr)- \nabla f(p)\bigr\rangle . \end{aligned}$$
(3.5)

Using the fact that \(p\in S \), \(\nabla f\) is monotone, and B is λ-cocoercive, we have

$$\begin{aligned} -c\bigl\langle A\bigl(x(t)\bigr), x(t)-p\bigr\rangle =& -c\bigl\langle A \bigl(x(t)\bigr)-A(p) , x(t)-p \bigr\rangle \\ =& -c\bigl\langle \nabla f\bigl(x(t)\bigr)-\nabla f(p), x(t)-p\bigr\rangle -c \bigl\langle B\bigl(x(t)\bigr)-B(p), x(t)-p\bigr\rangle \\ \le & -c\lambda \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2}. \end{aligned}$$
(3.6)

From (3.4)–(3.6), we deduce that

$$\begin{aligned} \dot{\mathcal{V}}_{p}(t) \le& - c\delta \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}-\bigl[c^{2} \beta _{b}+c \lambda \bigr] \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2}-c^{2}\beta _{f} \bigl\Vert \nabla f \bigl(x(t)\bigr)- \nabla f(p) \bigr\Vert ^{2} \\ &{}-\bigl[c\delta \beta _{b}+c^{2}\bigr]\bigl\langle \dot{x}(t),B\bigl(x(t)\bigr)-B(p)\bigr\rangle -\bigl[c \delta \beta _{f}+c^{2}\bigr]\bigl\langle \dot{x}(t),\nabla f \bigl(x(t)\bigr)-\nabla f(p) \bigr\rangle \\ &{}-c^{2}(\beta _{b}+\beta _{f}) \bigl\langle B\bigl(x(t)\bigr)-B(p),\nabla f\bigl(x(t)\bigr)- \nabla f(p) \bigr\rangle . \end{aligned}$$
(3.7)

Let \(\Gamma : [0,+\infty [ \, \to \mathbb{R}\) be the function defined by

$$ \Gamma (t):=f\bigl(x(t)\bigr)-f(p) - \bigl\langle \nabla f(p), x(t)-p \bigr\rangle , $$

and \(\mathcal{{E}}_{p}: [0,+\infty [ \, \to \mathbb{R}\) be the energy function given by

$$ \mathcal{{E}}_{p}(t):=\mathcal{{V}}_{p}(t)+\bigl[c \delta \beta _{f}+c^{2}\bigr] \Gamma (t). $$

Since f is convex, we have \(\Gamma (t) \ge 0\) for all \(t\ge 0\). This implies \(\mathcal{{E}}_{p}(t)\ge 0\) for all \(t\ge 0\) as well.

We have

$$\begin{aligned}& \dot{\Gamma }(t) =\bigl\langle \dot{x}(t),\nabla f\bigl(x(t) \bigr)-\nabla f(p) \bigr\rangle , \end{aligned}$$
(3.8)
$$\begin{aligned}& \dot{\mathcal{E}}_{p}(t) =\dot{\mathcal{V}}_{p}(t)+ \bigl[c\delta \beta _{f}+c^{2}\bigr] \dot{\Gamma }(t). \end{aligned}$$
(3.9)

By using (3.8) and (3.9), equation (3.7) can be rewritten as

$$\begin{aligned}& \dot{\mathcal{E}}_{p}(t)+ c\delta \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}+\bigl[c^{2}\beta _{b}+c \lambda \bigr] \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2}+c^{2} \beta _{f} \bigl\Vert \nabla f\bigl(x(t)\bigr)- \nabla f(p) \bigr\Vert ^{2} \\& \qquad {} +\bigl[c\delta \beta _{b}+c^{2}\bigr]\bigl\langle \dot{x}(t),B\bigl(x(t)\bigr)-B(p)\bigr\rangle +c^{2}( \beta _{b}+\beta _{f}) \bigl\langle B\bigl(x(t)\bigr)-B(p), \nabla f\bigl(x(t)\bigr)- \nabla f(p) \bigr\rangle \\& \quad \leq 0. \end{aligned}$$
(3.10)

Let us eliminate the term \(\nabla f(x(t))-\nabla f(p)\) from this relation by using the elementary algebraic inequality

$$\begin{aligned}& c^{2}\beta _{f} \bigl\Vert \nabla f\bigl(x(t) \bigr)-\nabla f(p) \bigr\Vert ^{2} + c^{2}(\beta _{b}+ \beta _{f}) \bigl\langle B\bigl(x(t)\bigr)-B(p), \nabla f\bigl(x(t)\bigr)-\nabla f(p) \bigr\rangle \\& \quad \geq -\frac{ c^{2}(\beta _{b}+\beta _{f})^{2}}{4\beta _{f}} \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2} . \end{aligned}$$

We obtain

$$\begin{aligned}& \dot{\mathcal{E}}_{p}(t)+ c\delta \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}+\biggl[c^{2}\beta _{b}+c \lambda -\frac{ c^{2}(\beta _{b}+\beta _{f})^{2}}{4\beta _{f}}\biggr] \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2} \\& \quad {}+\bigl[c\delta \beta _{b}+c^{2}\bigr]\bigl\langle \dot{x}(t),B\bigl(x(t)\bigr)-B(p)\bigr\rangle \leq 0. \end{aligned}$$

Equivalently

$$ \dot{\mathcal{E}}_{p}(t) +c\mathcal{S}(t) \leq 0, $$
(3.11)

where

$$\begin{aligned} \mathcal{S}(t) :=& \delta \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}+[ \delta \beta _{b}+c] \bigl\langle \dot{x}(t),B\bigl(x(t)\bigr)-B(p) \bigr\rangle \\ &{}+\biggl[c\beta _{b}+\lambda - \frac{ c(\beta _{b}+\beta _{f})^{2}}{4\beta _{f}} \biggr] \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2}. \end{aligned}$$

Set \(X(t)= \dot{x}(t)\) and \(Y(t) = B(x(t))-B(p)\). We have \(\mathcal{S}(t)= q(X(t),Y(t)) \), where \(q: \mathcal{H}\times \mathcal{H}\to \mathbb{R}\) is the quadratic form

$$ q(X,Y):= a \Vert X \Vert ^{2}+ b\langle X,Y\rangle +g \Vert Y \Vert ^{2} $$

with \(a= \delta \), \(b= \delta \beta _{b}+c\), and \(g= c\beta _{b}+\lambda - \frac{ c(\beta _{b}+\beta _{f})^{2}}{4\beta _{f}} = \lambda - \frac{ c(\beta _{b}-\beta _{f})^{2}}{4\beta _{f}}\).
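The role of the sign condition invoked next can be seen by completing the square: for \(a>0\),

$$ q(X,Y)= a \biggl\Vert X+\frac{b}{2a}Y \biggr\Vert ^{2}+ \frac{4ag-b^{2}}{4a} \Vert Y \Vert ^{2}, $$

so q is positive definite precisely when \(a>0\) and \(4ag-b^{2}>0\).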

According to Lemma A.3, and since \(a=\delta = c\gamma -1 >0\), we have that q is positive definite if and only if \(4ag -b^{2} > 0\). Equivalently

$$ 4\delta \biggl( \lambda - \frac{ c(\beta _{b}-\beta _{f})^{2}}{4\beta _{f}} \biggr) - [ \delta \beta _{b}+c]^{2} >0. $$
(3.12)

Our aim is to find c such that \(c\gamma -1 >0\) and such that (3.12) is satisfied. Take \(\delta :=c\gamma -1>0\) as a new variable. Equivalently, we must find \(\delta >0\) such that

$$ 4 \delta \biggl( \lambda -\frac{\delta +1}{\gamma }\cdot \frac{(\beta _{b}-\beta _{f})^{2}}{4\beta _{f}} \biggr)- \biggl( \delta \beta _{b}+\frac{\delta +1}{\gamma } \biggr)^{2}>0. $$

Expanding and simplifying, we obtain

$$ 4 \lambda > \biggl[ \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}} + \frac{2}{ \gamma } \biggl(\beta _{b}+\frac{1}{\gamma } \biggr) \biggr] + \frac{1}{\gamma ^{2} \delta }+ \biggl[ \biggl( \beta _{b}+ \frac{1}{ \gamma } \biggr)^{2} + \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}} \biggr]\delta . $$
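Since this expansion is easy to get wrong by hand, it can be verified symbolically. The following `sympy` sketch is only a check and is not part of the proof:

```python
import sympy as sp

d, lam, g, bb, bf = sp.symbols("delta lambda gamma beta_b beta_f", positive=True)

K = (bb - bf) ** 2 / (g * bf)   # shorthand for (beta_b - beta_f)^2 / (gamma * beta_f)

# condition (3.12) written in the variable delta = c*gamma - 1
lhs = 4 * d * (lam - (d + 1) / g * (bb - bf) ** 2 / (4 * bf)) - (d * bb + (d + 1) / g) ** 2

# the displayed inequality, multiplied back by delta > 0
rhs = d * (4 * lam - (K + 2 / g * (bb + 1 / g)) - 1 / (g**2 * d) - ((bb + 1 / g) ** 2 + K) * d)

assert sp.simplify(lhs - rhs) == 0
print("expansion verified")
```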

Therefore, we just need to assume that

$$ 4 \lambda > \biggl[ \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}} + \frac{2}{ \gamma } \biggl(\beta _{b}+\frac{1}{\gamma } \biggr) \biggr] + \inf _{\delta >0} \biggl( \frac{1}{\gamma ^{2} \delta }+ \biggl[ \biggl( \beta _{b}+ \frac{1}{ \gamma } \biggr)^{2} + \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}} \biggr]\delta \biggr). $$

An elementary optimization argument gives

$$ \inf_{\delta >0} \biggl(\frac{C}{ \delta }+ D \delta \biggr)= 2 \sqrt{CD}. $$

Therefore we end up with the condition

$$ 4 \lambda > \biggl[ \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}} + \frac{2}{ \gamma } \biggl(\beta _{b}+\frac{1}{\gamma } \biggr) \biggr] + \frac{2}{ \gamma } \sqrt{ \biggl( \beta _{b}+ \frac{1}{ \gamma } \biggr)^{2} + \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}}}. $$

Equivalently

$$ 4\lambda \gamma > \biggl[ \frac{(\beta _{b}-\beta _{f})^{2}}{\beta _{f}} + 2 \biggl(\beta _{b}+ \frac{1}{\gamma } \biggr) \biggr] + 2 \sqrt{ \biggl( \beta _{b}+ \frac{1}{ \gamma } \biggr)^{2} + \frac{(\beta _{b}-\beta _{f})^{2}}{\gamma \beta _{f}}}. $$
(3.13)

When \(\beta _{b}=\beta _{f}=\beta \), we recover the condition

$$ \lambda \gamma > \beta + \frac{1}{\gamma }. $$
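This reduction can be checked mechanically; a small `sympy` sketch (again only a verification):

```python
import sympy as sp

g, b = sp.symbols("gamma beta", positive=True)

# right-hand side of (3.13) with beta_b = beta_f = beta
rhs = (b - b) ** 2 / b + 2 * (b + 1 / g) + 2 * sp.sqrt((b + 1 / g) ** 2 + (b - b) ** 2 / (g * b))

# so (3.13) becomes 4*lambda*gamma > 4*(beta + 1/gamma), i.e. lambda*gamma > beta + 1/gamma
assert sp.simplify(rhs - 4 * (b + 1 / g)) == 0
print("reduction verified")
```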

Note that \(c\gamma =1+\delta \) with \(\delta >0\) implies \(c>0\). Since \(\mathcal{S}(t)= q(X(t),Y(t)) \) and q is positive definite, (3.11) yields the existence of positive real numbers c, μ such that

$$ \dot{\mathcal{E}}_{p}(t)+c\mu \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}+c\mu \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2} \le 0. $$
(3.14)

Estimates. Let us start from (3.14), which we integrate over \([0,t]\), \(t\ge 0\). We obtain

$$ \mathcal{E}_{p}(t)+c\mu \int _{0}^{t} \bigl\Vert \dot{x}(s) \bigr\Vert ^{2}\,ds+ c\mu \int _{0}^{t} \bigl\Vert B\bigl(x(s) \bigr)-B(p) \bigr\Vert ^{2}\,ds\le \mathcal{E}_{p}(0). $$
(3.15)

From (3.15) and the definition of \(\mathcal{E}_{p}\), we immediately deduce

$$\begin{aligned} &\sup_{t\ge 0} \bigl\Vert x(t)-p \bigr\Vert < +\infty , \end{aligned}$$
(3.16)
$$\begin{aligned} &\sup_{t\ge 0} \bigl\Vert x(t)-p+ c\bigl( \dot{x}(t)+A_{\beta }\bigl(x(t)\bigr)-A_{\beta }(p)\bigr) \bigr\Vert < + \infty , \end{aligned}$$
(3.17)
$$\begin{aligned} & \int _{0}^{+\infty } \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}\,dt< +\infty , \end{aligned}$$
(3.18)
$$\begin{aligned} & \int _{0}^{+\infty } \bigl\Vert B\bigl(x(t) \bigr)-B(p) \bigr\Vert ^{2} \,dt< +\infty . \end{aligned}$$
(3.19)

Let us return to (3.10). We recall that

$$\begin{aligned}& \dot{\mathcal{E}}_{p}(t)+ c\delta \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}+\bigl[c^{2}\beta _{b}+c \lambda \bigr] \bigl\Vert B\bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2}+c^{2} \beta _{f} \bigl\Vert \nabla f\bigl(x(t)\bigr)- \nabla f(p) \bigr\Vert ^{2} \\& \qquad {} +\bigl[c\delta \beta _{b}+c^{2}\bigr]\bigl\langle \dot{x}(t),B\bigl(x(t)\bigr)-B(p)\bigr\rangle +c^{2}( \beta _{b}+\beta _{f}) \bigl\langle B\bigl(x(t)\bigr)-B(p), \nabla f\bigl(x(t)\bigr)- \nabla f(p) \bigr\rangle \\& \quad \leq 0. \end{aligned}$$
(3.20)

After integration on \([0,t]\), and by using the integral estimates \(\int _{0}^{+\infty }\|\dot{x}(t)\|^{2}\,dt<+\infty \) and \(\int _{0}^{+\infty } \|B(x(t))-B(p)\|^{2} \,dt<+\infty \) obtained in (3.18) and (3.19), we get the existence of a constant \(C>0\) such that

$$\begin{aligned}& c^{2}\beta _{f} \int _{0}^{t} \bigl\Vert \nabla f\bigl(x(s) \bigr)-\nabla f(p) \bigr\Vert ^{2} \,ds \\& \quad \leq C+ c^{2}( \beta _{b}+\beta _{f}) \int _{0}^{t} \bigl\Vert B\bigl(x(s) \bigr)-B(p) \bigr\Vert \bigl\Vert \nabla f\bigl(x(s)\bigr)-\nabla f(p) \bigr\Vert \,ds. \end{aligned}$$

Therefore, for any \(\epsilon >0\), we have

$$\begin{aligned}& c^{2}\beta _{f} \int _{0}^{t} \bigl\Vert \nabla f\bigl(x(s) \bigr)-\nabla f(p) \bigr\Vert ^{2} \,ds \\& \quad \leq C+ c^{2}(\beta _{b}+\beta _{f}) \int _{0}^{t} \biggl( \frac{1}{4\epsilon } \bigl\Vert B\bigl(x(s)\bigr)-B(p) \bigr\Vert ^{2} + \epsilon \bigl\Vert \nabla f\bigl(x(s)\bigr)- \nabla f(p) \bigr\Vert ^{2} \biggr) \,ds. \end{aligned}$$

By taking \(\epsilon >0\) such that \(\beta _{f}> \epsilon (\beta _{b}+\beta _{f}) \), which is always possible since \(\beta _{f} >0\), we conclude

$$ \int _{0}^{+\infty } \bigl\Vert \nabla f\bigl(x(t) \bigr)- \nabla f (p) \bigr\Vert ^{2} \,dt< +\infty . $$

Combining this with \(\int _{0}^{+\infty } \|B(x(t))-B(p)\|^{2} \,dt<+\infty \), we immediately obtain

$$ \int _{0}^{+\infty } \bigl\Vert A\bigl(x(t)\bigr)- A (p) \bigr\Vert ^{2}\,dt < +\infty . $$
(3.21)

Moreover, we also have

$$\begin{aligned}& \int _{0}^{+\infty } \bigl\Vert A_{\beta } \bigl(x(t)\bigr)-A_{\beta }(p) \bigr\Vert ^{2}\,dt \\& \quad = \int _{0}^{+ \infty } \bigl\Vert \beta _{f}\bigl(\nabla f\bigl(x(t)\bigr)-\nabla f(p)\bigr)+\beta _{b}\bigl(B\bigl(x(t)\bigr)-B(p)\bigr) \bigr\Vert ^{2}\,dt \\& \quad \le \bigl(\beta _{f}^{2}+\beta _{b}^{2} \bigr) \int _{0}^{+\infty } \bigl\Vert \nabla f\bigl(x(t) \bigr)-\nabla f(p) \bigr\Vert ^{2} + \bigl\Vert B\bigl(x(t) \bigr)-B(p) \bigr\Vert ^{2} \,dt< +\infty . \end{aligned}$$
(3.22)

According to (3.16), the trajectory \(x(\cdot )\) is bounded. Set \(R:= \sup_{t\geq 0} \|x(t)\|\). By assumption, \(\nabla f\) is Lipschitz continuous on bounded sets; let \(L_{R} <+\infty \) be the Lipschitz constant of \(\nabla f\) on \(B(0,R)\). Since B is λ-cocoercive, it is \(\frac{1}{\lambda }\)-Lipschitz continuous. Therefore \(A=\nabla f+B\) is L-Lipschitz continuous along the trajectory with \(L:=L_{R} + \frac{1}{\lambda } \). Hence

$$ \frac{d}{dt} \bigl\Vert A\bigl(x(t)\bigr) \bigr\Vert \leq \biggl\Vert \frac{d}{dt}A\bigl(x(t)\bigr) \biggr\Vert \le L \bigl\Vert \dot{x}(t) \bigr\Vert \quad \text{for all } t\ge 0. $$
(3.23)

Using (3.21) and (3.23), we deduce that \(u(t):= \|A(x(t))\|\) satisfies the condition of Lemma A.2 (with \(p=2\) and \(r=2\)). Therefore,

$$ \lim_{t\to +\infty } \bigl\Vert A\bigl(x(t)\bigr) \bigr\Vert =0. $$
(3.24)

Likewise, according to (3.22), we have

$$ \lim_{t\to +\infty } \bigl\Vert A_{\beta } \bigl(x(t)\bigr)-A_{\beta }(p) \bigr\Vert =0. $$
(3.25)

By using the same argument as in (3.23), we obtain that \(\frac{d}{dt}A_{\beta }(x(t))\) is bounded. From (3.23) we also get that

$$ \int _{0}^{+\infty } \biggl\Vert \frac{d}{dt}A\bigl(x(t)\bigr) \biggr\Vert ^{2}\,dt< + \infty . $$

Similarly, we also have

$$ \int _{0}^{+\infty } \biggl\Vert \frac{d}{dt}B\bigl(x(t)\bigr) \biggr\Vert ^{2}\,dt< + \infty . $$

By using (DINAM), we have

$$\begin{aligned} \ddot{x}(t)&= -\gamma \dot{x}(t)-A\bigl(x(t)\bigr) -\frac{d}{dt}A_{\beta } \bigl(x(t)\bigr) \\ & = -\gamma \dot{x}(t)-A\bigl(x(t)\bigr) -\beta _{f} \frac{d}{dt}A\bigl(x(t)\bigr)-(\beta _{b}- \beta _{f})\frac{d}{dt}B\bigl(x(t)\bigr). \end{aligned}$$

Since the right-hand side of the above equality belongs to \(L^{2} (0, +\infty ; \mathcal{H})\), we finally get

$$ \int _{0}^{+\infty } \bigl\Vert \ddot{x}(t) \bigr\Vert ^{2}\,dt< +\infty . $$

Combining this property with (3.18) and using Lemma A.2, we deduce that

$$ \lim_{t\to +\infty } \bigl\Vert \dot{x}(t) \bigr\Vert =0. $$
(3.26)

The limit. To prove the existence of the weak limit of \(x(t)\), we use Opial’s lemma (see [26] for more details). Given \(p\in S \), let us consider the anchor function defined, for every \(t \in [0,+\infty [\), by

$$ q_{p}(t):=\frac{1 }{2} \bigl\Vert x(t)-p \bigr\Vert ^{2}. $$

From \(\dot{q}_{p}(t)=\langle \dot{x}(t), x(t)-p\rangle \) and \(\ddot{q}_{p}(t)=\|\dot{x}(t)\|^{2}+ \langle \ddot{x}(t), x(t)-p \rangle \), we obtain

$$\begin{aligned} \ddot{q}_{p}(t) +\gamma \dot{q}_{p}(t)&= \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}+ \bigl\langle \ddot{x}(t)+\gamma \dot{x}(t),x(t)-p \bigr\rangle \\ &= \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}- \biggl\langle A \bigl(x(t)\bigr)+\frac{d}{dt}A_{\beta }\bigl(x(t)\bigr),x(t)-p \biggr\rangle \\ & \le \bigl\Vert \dot{x}(t) \bigr\Vert ^{2} - \biggl\langle \frac{d}{dt}A_{\beta }\bigl(x(t)\bigr),x(t)-p \biggr\rangle . \end{aligned}$$
where the last inequality uses \(A(p)=0\) together with the monotonicity of A, which give \(\langle A(x(t)),x(t)-p\rangle =\langle A(x(t))-A(p),x(t)-p\rangle \geq 0\).

Equivalently,

$$ \ddot{q}_{p}(t) + \gamma \dot{q}_{p}(t) + \biggl\langle \frac{d}{dt} A_{\beta }\bigl(x(t)\bigr), x(t)-p \biggr\rangle \leq \bigl\Vert \dot{x}(t) \bigr\Vert ^{2} . $$
(3.27)

According to the differentiation formula for a product, we can rewrite (3.27) as follows:

$$ \ddot{q}_{p}(t) + \gamma \dot{q}_{p}(t) + \frac{d}{dt} \bigl\langle A_{\beta }\bigl(x(t) \bigr)-A_{\beta }(p), x(t)-p \bigr\rangle \leq \bigl\Vert \dot{x}(t) \bigr\Vert ^{2} + \bigl\langle A_{\beta }\bigl(x(t) \bigr)-A_{\beta }(p), \dot{x}(t) \bigr\rangle . $$

By the Cauchy–Schwarz inequality, we get

$$\begin{aligned}& \ddot{q}_{p}(t) + \gamma \dot{q}_{p} (t) +\frac{d}{dt} \bigl\langle A_{\beta }\bigl(x(t) \bigr)-A_{\beta }(p), x(t)-p \bigr\rangle \\& \quad \leq \bigl\Vert \dot{x}(t) \bigr\Vert ^{2} + \bigl\Vert A_{\beta }\bigl(x(t) \bigr)-A_{\beta }(p) \bigr\Vert \bigl\Vert \dot{x}(t) \bigr\Vert . \end{aligned}$$
(3.28)

Then note that the right-hand side of (3.28)

$$ g(t):= \bigl\Vert \dot{x}(t) \bigr\Vert ^{2} + \bigl\Vert A_{\beta }\bigl(x(t)\bigr)-A_{\beta }(p) \bigr\Vert \bigl\Vert \dot{x}(t) \bigr\Vert $$

is nonnegative and belongs to \(L^{1} (0,+\infty )\). Indeed, we have

$$ \int _{0}^{+\infty } \bigl\Vert A_{\beta } \bigl(x(t)\bigr)-A_{\beta }(p) \bigr\Vert \bigl\Vert \dot{x}(t) \bigr\Vert \,dt\leq \frac{1}{2} \int _{0}^{+\infty } \bigl\Vert A_{\beta } \bigl(x(t)\bigr)-A_{\beta }(p) \bigr\Vert ^{2} \,dt+ \frac{1}{2} \int _{0}^{+\infty } \bigl\Vert \dot{x}(t) \bigr\Vert ^{2} \,dt. $$

Using (3.18) and (3.22), we deduce that

$$ \int _{0}^{+\infty } g(t)\,dt< +\infty . $$

Note that the left-hand side of (3.28) can be rewritten as the derivative of a function; precisely,

$$ \ddot{q}_{p}(t) + \gamma \dot{q}_{p}(t) + \frac{d}{dt} \bigl\langle A_{\beta }\bigl(x(t) \bigr)-A_{\beta }(p), x(t)-p \bigr\rangle = \dot{h}(t) $$

with

$$ h(t)= \dot{q}_{p}(t) + \gamma q_{p}(t) + \bigl\langle A_{\beta }\bigl(x(t)\bigr)-A_{\beta }(p), x(t)-p \bigr\rangle . $$
(3.29)

So we have

$$ \dot{h}(t) \leq g(t)\quad \text{for every } t\geq 0. $$

Let us prove that the function h given in (3.29) is bounded from below. Indeed, since the terms \(q_{p}(t)\) and \(\langle A_{\beta }(x(t))-A_{\beta }(p), x(t)-p \rangle \) are nonnegative (the latter by monotonicity of \(A_{\beta }\)), we have

$$ h(t)\geq \dot{q}_{p}(t) \geq - \bigl\Vert \dot{x}(t) \bigr\Vert \bigl\Vert x(t)-p \bigr\Vert . $$

According to the boundedness of \(x(\cdot )\) and \(\dot{x}(\cdot )\) (see (3.16) and (3.26)), we deduce that there exists \(m\in \mathbb{R}\) such that

$$ h(t)\geq m \quad \text{for every } t\geq 0. $$

Let us introduce the function \(\varphi : \mathbb{R}_{+}\to \mathbb{R}\) defined by

$$ \varphi (t)=h(t)- \int _{0}^{t} g(s)\,ds. $$

We have \(\varphi '(t)=\dot{h}(t)-g(t)\leq 0\), so the function φ is nonincreasing on \([0,+\infty [\). Moreover, φ is bounded from below: since \(h(t)\geq m\) and \(g\in L^{1}(0,+\infty )\), we have \(\varphi (t)\geq m-\int _{0}^{+\infty } g(s)\,ds\). Therefore the limit of φ exists as \(t\to +\infty \). Since \(g\in L^{1}(0,+\infty )\), we deduce that \(\lim_{t\to +\infty } h(t)\) exists.

Using the fact that \(\langle A_{\beta }(x(t))-A_{\beta }(p), x(t)-p \rangle \) tends to zero as \(t\to +\infty \) (a consequence of (3.25) and the boundedness of \(x(\cdot )\)), we obtain

$$ \dot{q}_{p}(t) + \gamma q_{p}(t) = \theta (t) $$

where \(\theta (t):=h(t)-\langle A_{\beta }(x(t))-A_{\beta }(p), x(t)-p \rangle \) has a limit as \(t\to +\infty \). The existence of the limit of \(q_{p}\) then follows from a classical general result concerning the convergence of evolution equations governed by strongly monotone operators (here γId, see Theorem 3.9, p. 88 in [21]). This means that, for all \(p\in S\),

$$ \lim_{t\to +\infty } \bigl\Vert x(t)-p \bigr\Vert \quad \text{exists}. $$

To complete the proof via Opial’s lemma, we need to show that every weak sequential cluster point of \(x(t)\) belongs to S. Let \(t_{n} \to +\infty \) be such that \(x(t_{n}) \rightharpoonup x^{*}\) as \(n\to +\infty \). We have

$$ A\bigl( x(t_{n})\bigr) \to 0 \quad \text{strongly in } \mathcal{H}\quad \text{and}\quad x(t_{n}) \rightharpoonup x^{*}\quad \text{weakly in }\mathcal{H}. $$

From the closedness property of the graph of the maximally monotone operator A in \(w-\mathcal{H}\times s-\mathcal{H}\), we deduce that \(A(x^{*})=0\), that is, \(x^{*} \in S\).

Consequently, \(x(t)\) converges weakly to an element of S as t goes to +∞. The proof of Theorem 3.1 is thereby completed. □

Remark 3.1

In the statement of Theorem 3.1, the parameters have to satisfy a certain condition. If the other parameters are fixed, the set of values of λ that fulfill the inequality is easily found. Likewise, the feasible set of values of γ, the other parameters being fixed, can be determined explicitly.

In fact, let us rewrite condition (3.1) as follows:

$$ 4\lambda > \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}} + \frac{2}{\gamma ^{2}}+ \frac{2}{\gamma } \sqrt{ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}}. $$

Equivalently,

$$ 4\lambda +\beta _{b}^{2}>\beta _{b}^{2}+\frac{1}{\gamma ^{2}}+ \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}} + \frac{2}{\gamma } \sqrt{ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}}+ \frac{1}{\gamma ^{2}}. $$
(3.30)

Thanks to

$$ \beta _{b}^{2}+\frac{1}{\gamma ^{2}}+ \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}} + \frac{2}{\gamma } \sqrt{ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}}+ \frac{1}{\gamma ^{2}}= \biggl( \sqrt{ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}}+ \frac{1}{\gamma } \biggr)^{2}, $$
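This identity is easily verified symbolically, for instance:

```python
import sympy as sp

g, bb, bf = sp.symbols("gamma beta_b beta_f", positive=True)

r = bb**2 + 1 / g**2 + (bb**2 + bf**2) / (g * bf)   # expression under the square root
lhs = r + (2 / g) * sp.sqrt(r) + 1 / g**2            # left-hand side of the identity
rhs = (sp.sqrt(r) + 1 / g) ** 2

assert sp.simplify(lhs - rhs) == 0
print("identity verified")
```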

we immediately deduce that

$$ 4\lambda +\beta _{b}^{2}> \biggl( \sqrt{ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}}+ \frac{1}{\gamma } \biggr)^{2}. $$

Therefore (3.30) is equivalent to

$$ \sqrt{ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}}+ \frac{1}{\gamma }< \sqrt{4\lambda +\beta _{b}^{2}}. $$

This in turn is equivalent to

$$ \textstyle\begin{cases} \frac{1}{\gamma } < \sqrt{4\lambda +\beta _{b}^{2}}, \\ (\sqrt{ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}} )^{2} < (\sqrt{4\lambda +\beta _{b}^{2}}-\frac{1}{\gamma } )^{2}. \end{cases} $$
(3.31)

From the first inequality of (3.31), we deduce that

$$ \gamma > \frac{1}{\sqrt{4\lambda +\beta _{b}^{2}}}. $$
(3.32)

From the second inequality of (3.31), we deduce that

$$ \beta _{b}^{2} + \frac{1}{ \gamma ^{2}} + \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\gamma \beta _{f}}< 4\lambda + \beta _{b}^{2}+ \frac{1}{\gamma ^{2}}-\frac{2}{\gamma }\sqrt{4\lambda + \beta _{b}^{2}}. $$

Therefore,

$$ \gamma >\frac{1}{4\lambda } \biggl( \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\beta _{f}} +2 \sqrt{4\lambda + \beta _{b}^{2}} \biggr). $$
(3.33)

Since (3.33) implies (3.32), we obtain that the feasible set of γs is defined by

$$ \gamma >\frac{1}{4\lambda } \biggl( \frac{\beta _{b}^{2}+\beta _{f}^{2}}{\beta _{f}} +2\sqrt{4\lambda + \beta _{b}^{2}} \biggr). $$
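As a quick numerical sanity check (with arbitrarily chosen parameter values, not taken from the text), one can verify that this threshold for γ is sharp for condition (3.13):

```python
import math

lam, bb, bf = 1.0, 1.0, 0.5   # hypothetical parameter values for illustration

def cond_313(gam):
    """Strict inequality (3.13) on the parameters."""
    rhs = ((bb - bf) ** 2 / bf + 2 * (bb + 1 / gam)
           + 2 * math.sqrt((bb + 1 / gam) ** 2 + (bb - bf) ** 2 / (gam * bf)))
    return 4 * lam * gam > rhs

# the feasibility threshold for gamma derived in the remark
gam_star = ((bb**2 + bf**2) / bf + 2 * math.sqrt(4 * lam + bb**2)) / (4 * lam)
assert cond_313(1.01 * gam_star) and not cond_313(0.99 * gam_star)
print(f"threshold gamma* = {gam_star:.4f}")
```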

3.2 Case \(\beta _{b} =\beta _{f}\)

Let us specialize the previous results in the case \(\beta _{b}=\beta _{f}\). We set \(\beta _{b}= \beta _{f}:=\beta > 0\) and \(A:= \nabla f + B\). We thus consider the evolution system

$$ \text{(DINAM)}\quad \ddot{x}(t) + \gamma \dot{x}(t) + A\bigl(x(t)\bigr) + \beta \frac{d}{dt} \bigl( A\bigl(x(t)\bigr) \bigr)= 0,\quad t\geq 0. $$

The existence of strong global solutions to this system is guaranteed by Theorem 2.1. The convergence properties as \(t \to +\infty \) of the solution trajectories generated by this system are a consequence of Theorem 3.1 and are stated below.

Corollary 3.1

Let \(B: \mathcal{H} \to \mathcal{H}\) be a λ-cocoercive operator and \(f: \mathcal{H}\to \mathbb{R}\) be a \(\mathcal{C}^{1}\) convex function whose gradient is Lipschitz continuous on the bounded sets. Suppose that the solution set \(S= (\nabla f+B)^{-1} (0)\neq \emptyset \). Consider the evolution equation (DINAM), where \(A= \nabla f +B\), \(\beta _{b}=\beta _{f}:= \beta > 0\) and where the involved parameters satisfy the following conditions:

$$ \gamma >0,\qquad \beta > 0,\quad \textit{and}\quad \lambda \gamma >\beta + \frac{1}{\gamma }. $$
(3.34)

Then, for any solution trajectory \(x:[0,+\infty [\,\to \mathcal{H}\) of (DINAM), the following properties are satisfied:

  1. (i)

    (convergence) The trajectory \(x(\cdot )\) is bounded and \(x(t)\) converges weakly, as \(t\to +\infty \), to an element \(x^{*}\in S\).

  2. (ii)

    (integral estimate)

    $$\begin{aligned}& \int _{0}^{+\infty } \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}\,dt< +\infty ,\qquad \int _{0}^{+ \infty } \bigl\Vert \ddot{x}(t) \bigr\Vert ^{2}\,dt< +\infty , \\& \int _{0}^{+\infty } \bigl\Vert A\bigl(x(t)\bigr) \bigr\Vert ^{2}\,dt< +\infty ,\quad \textit{and}\quad \int _{0}^{+ \infty } \biggl\Vert \frac{d}{dt}A\bigl(x(t)\bigr) \biggr\Vert ^{2}\,dt< +\infty . \end{aligned}$$
  3. (iii)

    (pointwise estimate)

    $$ \lim_{t\to +\infty } \bigl\Vert \dot{x}(t) \bigr\Vert =0, \quad \textit{and}\quad \lim_{t\to + \infty } \bigl\Vert A\bigl(x(t)\bigr) \bigr\Vert =0. $$

Remark 3.2

It is worth stating Corollary 3.1 separately because this is an important case. It also makes it possible to compare this result with the existing literature on second-order dissipative evolution systems involving cocoercive operators. Indeed, letting β go to zero in (3.34) gives the condition

$$ \lambda \gamma ^{2} >1 $$
(3.35)

introduced by Attouch and Maingé in [10] to study the second-order dynamic (1.3) without geometric damping. With respect to [10], the introduction of the geometric damping, i.e., taking \(\beta >0\), provides some useful additional estimates.

4 Numerical illustrations

In this section, we give some numerical illustrations of (DINAM).

4.1 From continuous dynamic to algorithms

Let us first give some indications concerning the algorithms obtained by temporal discretization of the continuous dynamic (DINAM). Their convergence analysis is postponed to future work. Let us recall the condensed formulation of (DINAM)

$$ \ddot{x}(t)+\gamma \dot{x}(t)+ A\bigl(x(t)\bigr)+ \frac{d}{dt} \bigl( A_{\beta }\bigl(x(t)\bigr) \bigr)= 0, $$
(DINAM)

where \(A:=\nabla f+B\) and \(A_{\beta }:=\beta _{b}B+\beta _{f}\nabla f\). Take a fixed time step \(h>0\), and consider the following finite-difference scheme for (DINAM):

$$\begin{aligned} & \frac{1}{h^{2}}(x_{k+1}-2x_{k}+x_{k-1})+ \frac{\gamma }{h}(x_{k+1}-x_{k})+ \frac{\beta _{b}}{h}\bigl( B(x_{k+1})-B(x_{k})\bigr) \\ &\quad {}+\frac{\beta _{f}}{h}\bigl( \nabla f(x_{k})-\nabla f(x_{k-1})\bigr)+B(x_{k+1})+ \nabla f(x_{k})=0. \end{aligned}$$
(4.1)

This scheme is implicit with respect to the nonpotential operator B and explicit with respect to the potential term \(\nabla f\). The temporal discretization of the Hessian-driven damping \(\beta _{f} \nabla ^{2} f(x(t)) \dot{x}(t)\) is taken equal to \(\frac{\beta _{f}}{h}(\nabla f(x_{k})-\nabla f(x_{k-1}))\). After expanding (4.1), we obtain

$$\begin{aligned}& x_{k+1}+\frac{h^{2}}{1+\gamma h}B(x_{k+1})+ \frac{h\beta _{b}}{1+\gamma h}B(x_{k+1}) \\& \quad = x_{k}+ \frac{1}{1+\gamma h}(x_{k}-x_{k-1})+ \frac{h\beta _{b}}{1+\gamma h}B(x_{k}) \\& \qquad {}-\frac{h\beta _{f}}{1+h\gamma }\bigl( \nabla f(x_{k})-\nabla f(x_{k-1})\bigr)- \frac{h^{2}}{1+h\gamma }\nabla f(x_{k}). \end{aligned}$$
(4.2)
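Since the algebra is easy to slip on, the passage from (4.1) to (4.2) can be verified symbolically by treating the values of B and ∇f as free symbols; the following `sympy` sketch is only a check, not part of the derivation:

```python
import sympy as sp

h, g, bb, bf = sp.symbols("h gamma beta_b beta_f", positive=True)
xp, xk, xm = sp.symbols("x_next x_k x_prev")   # x_{k+1}, x_k, x_{k-1}
Bp, Bk = sp.symbols("B_next B_k")              # B(x_{k+1}), B(x_k)
Gk, Gm = sp.symbols("G_k G_prev")              # grad f(x_k), grad f(x_{k-1})

# scheme (4.1), with operator values replaced by free symbols
eq41 = ((xp - 2 * xk + xm) / h**2 + g * (xp - xk) / h
        + bb * (Bp - Bk) / h + bf * (Gk - Gm) / h + Bp + Gk)

# rearranged form (4.2), written as lhs42 = rhs42
lhs42 = xp + h**2 / (1 + g * h) * Bp + h * bb / (1 + g * h) * Bp
rhs42 = (xk + (xk - xm) / (1 + g * h) + h * bb / (1 + g * h) * Bk
         - h * bf / (1 + g * h) * (Gk - Gm) - h**2 / (1 + g * h) * Gk)

# (4.2) is exactly (4.1) multiplied by h^2 / (1 + gamma*h)
assert sp.simplify(eq41 * h**2 / (1 + g * h) - (lhs42 - rhs42)) == 0
print("(4.2) matches (4.1)")
```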

Set \(s:=\frac{h}{1+\gamma h}\) and \(\alpha :=\frac{1}{1+\gamma h}\). So we have

$$ x_{k+1}+s\mathcal{B}_{h}(x_{k+1})=y_{k}, $$
(4.3)

where \(\mathcal{B}_{h}=(h+\beta _{b})B\), and

$$ y_{k}= x_{k}+\alpha (x_{k}-x_{k-1})+ s\beta _{b}B(x_{k})-s(h+\beta _{f}) \nabla f(x_{k})+s\beta _{f}\nabla f(x_{k-1}). $$
(4.4)

From (4.3) we get

$$ x_{k+1}=(\operatorname{Id}+s\mathcal{B}_{h})^{-1}(y_{k}). $$
(4.5)

By combining (4.4) and (4.5), we obtain the following algorithm, called (DINAAM). It is a splitting algorithm which involves the operators \(\nabla f\) and B separately.

$$ \textstyle\begin{array}{|l|} \hline \mbox{ (DINAAM):} \\ \hline \mbox{ Initialize: $x_{0}\in \mathcal{H}$, $x_{1}\in \mathcal{H}$} \\ \mbox{ $h>0$,} \\ \mbox{ $\alpha =\dfrac{1}{1+\gamma h}$, } \\ \mbox{ $s=\dfrac{h}{1+\gamma h}$, } \\ \mbox{ $y_{k}= x_{k}+\alpha (x_{k}-x_{k-1})+ s\beta _{b}B(x_{k})-s(h+\beta _{f})\nabla f(x_{k})+s\beta _{f}\nabla f(x_{k-1})$, } \\ \mbox{ $x_{k+1}=(\operatorname{Id}+s\mathcal{B}_{h})^{-1}(y_{k})$. } \\ \hline \end{array}$$
(4.6)
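For illustration, here is a minimal Python sketch of (DINAAM) for a linear nonpotential operator B, in which case the resolvent \((\operatorname{Id}+s\mathcal{B}_{h})^{-1}\) reduces to a linear solve. The step size, parameter values, and initial points in the usage lines are hypothetical choices, not taken from the text.

```python
import numpy as np

def dinaam(grad_f, B_mat, x0, x1, h, gamma, beta_b, beta_f, n_iter):
    """(DINAAM) iteration (4.6) for a linear operator B given by the matrix B_mat."""
    n = x0.shape[0]
    alpha = 1.0 / (1.0 + gamma * h)
    s = h / (1.0 + gamma * h)
    # Id + s*B_h with B_h = (h + beta_b) * B; inverting it is the resolvent step (4.5)
    M = np.eye(n) + s * (h + beta_b) * B_mat
    x_prev, x = x0, x1
    for _ in range(n_iter):
        y = (x + alpha * (x - x_prev) + s * beta_b * B_mat @ x
             - s * (h + beta_f) * grad_f(x) + s * beta_f * grad_f(x_prev))
        x_prev, x = x, np.linalg.solve(M, y)   # resolvent step (4.5)
    return x

# hypothetical usage on data similar to Example 4.1 below: B = A_lambda for lambda = 5,
# f(x1, x2) = 50*x2**2, so the unique zero of B + grad f is the origin
B5 = np.array([[5.0, -1.0], [1.0, 5.0]]) / 26.0
grad = lambda x: np.array([0.0, 100.0 * x[1]])
x_final = dinaam(grad, B5, np.array([1.0, 1.0]), np.array([1.0, 1.0]),
                 h=0.01, gamma=0.9, beta_b=1.0, beta_f=0.5, n_iter=5000)
print(np.linalg.norm(x_final))   # small: the iterates approach the zero of B + grad f
```

For a nonlinear B, the linear solve would be replaced by an evaluation of the resolvent \((\operatorname{Id}+s\mathcal{B}_{h})^{-1}\).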

4.2 Numerical experiments for the continuous dynamics (DINAM)

A general method to generate monotone cocoercive operators which are not gradients of convex functions is to start from a linear skew-symmetric operator A and then take its Yosida approximation \(A_{\lambda }\). As a model situation, take \(\mathcal{H}= \mathbb{R}^{2}\) and start from A equal to the rotation of angle \(\frac{\pi }{2}\), that is,

$$ A= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} . $$

An elementary computation gives that, for any \(\lambda >0\),

$$ A_{\lambda }= \frac{1}{1+\lambda ^{2}} \begin{pmatrix} \lambda & -1 \\ 1 & \lambda \end{pmatrix} , $$

which is therefore λ-cocoercive. As a consequence, for \(\lambda =1\), we obtain that the matrix

$$ B= \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} $$

is \(\frac{1}{2}\)-cocoercive. With these basic blocks, one can easily construct many other cocoercive operators which are not potential operators. For that, use Lemma A.1, which gives that the sum of two cocoercive operators is still cocoercive; therefore the set of cocoercive operators is a convex cone.
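This construction can be reproduced numerically. The sketch below computes the Yosida approximation \(A_{\lambda }=\frac{1}{\lambda }(\operatorname{Id}-(\operatorname{Id}+\lambda A)^{-1})\) for \(\lambda =1\) (note \(A_{1}=\frac{1}{2}B\)) and spot-checks the \(\frac{1}{2}\)-cocoercivity of B on random vectors:

```python
import numpy as np

lam = 1.0
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotation of angle pi/2 (skew-symmetric)

# Yosida approximation A_lam = (1/lam) * (Id - (Id + lam*A)^{-1})
A_lam = (np.eye(2) - np.linalg.inv(np.eye(2) + lam * A)) / lam

B = np.array([[1.0, -1.0], [1.0, 1.0]])
assert np.allclose(A_lam, 0.5 * B)        # A_1 = (1/2) * B

# check 1/2-cocoercivity of B: <Bz, z> >= (1/2) * ||Bz||^2 on random vectors
rng = np.random.default_rng(0)
for _ in range(1000):
    z = rng.standard_normal(2)
    assert (B @ z) @ z >= 0.5 * np.linalg.norm(B @ z) ** 2 - 1e-12
print("B is 1/2-cocoercive on the sampled vectors")
```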

Example 4.1

Let us start this section with a simple illustrative example in \(\mathbb{R}^{2}\). We take \(\mathcal{H}= \mathbb{R}^{2}\) equipped with the usual Euclidean structure. Let us consider B as a linear operator whose matrix in the canonical basis of \(\mathbb{R}^{2}\) is defined by \(B=A_{\lambda }\) for \(\lambda =5\). According to the above remark, we can check that B is λ-cocoercive with \(\lambda =5\) and that B is a nonpotential operator. To observe the oscillations which are classical for the heavy ball with friction, we take \(f: \mathbb{R}^{2} \to \mathbb{R}\) defined by

$$ f(x_{1},x_{2})=50x_{2}^{2}. $$

We set \(\gamma =0.9\). It is clear that f is convex but not strongly convex. We study three cases: (1) \(\beta _{b}=1\), \(\beta _{f}=0.5\), (2) \(\beta _{b}=0.5\), \(\beta _{f}=1\), and (3) \(\beta _{b}=\beta _{f}=0.5\). As a direct application of Theorem 3.1, we obtain that the trajectory \(x(t)\) generated by (DINAM) converges to \(x_{\infty }\), where \(x_{\infty }\in S=(B+\nabla f)^{-1}(0)=\{ 0\}\). The trajectory obtained by using Matlab is depicted in Fig. 1, where the components \(x_{1}(t)\) and \(x_{2}(t)\) are represented in red and blue, respectively.

Figure 1

Trajectories of (DINAM) for different values of the parameters \(\beta _{b}\), \(\beta _{f}\)

Now we study the behavior of the trajectories for further values of \(\beta _{b}\) and \(\beta _{f}\). We study four cases in Fig. 2. The plots of the second component of the solutions are depicted in Fig. 2(a), while Fig. 2(b) plots the number of iterations k versus \(\| B(x_{k})+\nabla f(x_{k})\|\). Through Figs. 1 and 2, we can conclude that introducing the Hessian damping (\(\beta _{f}>0\)) attenuates the oscillations of the trajectories in Fig. 2. The oscillations of the solutions reappear as \(\beta _{f}\) goes to 0.

Figure 2

Oscillation of the trajectories of (DINAM) for different values of \(\beta _{b}\), \(\beta _{f}\)
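The continuous dynamic of Example 4.1 can also be integrated directly: here \(\nabla ^{2} f\) and B are constant matrices, so \(\frac{d}{dt}A_{\beta }(x(t))=(\beta _{f}\nabla ^{2}f+\beta _{b}B)\dot{x}(t)\) and (DINAM) is a linear second-order ODE. The sketch below uses `scipy` with case (1) \(\beta _{b}=1\), \(\beta _{f}=0.5\); the initial conditions are hypothetical, since the text does not specify them.

```python
import numpy as np
from scipy.integrate import solve_ivp

gamma, beta_b, beta_f, lam = 0.9, 1.0, 0.5, 5.0
B = np.array([[lam, -1.0], [1.0, lam]]) / (1.0 + lam ** 2)  # A_lambda, lambda-cocoercive
H = np.diag([0.0, 100.0])                                   # Hessian of f(x1, x2) = 50*x2**2

def rhs(t, z):
    x, v = z[:2], z[2:]
    # (DINAM): x'' = -gamma*x' - (grad f + B)(x) - (beta_f*H + beta_b*B) x'
    return np.concatenate([v, -gamma * v - (H + B) @ x - (beta_f * H + beta_b * B) @ v])

z0 = np.array([1.0, 1.0, 0.0, 0.0])   # hypothetical initial position and velocity
sol = solve_ivp(rhs, (0.0, 50.0), z0, rtol=1e-8, atol=1e-10)
x_end = sol.y[:2, -1]
print(np.linalg.norm(x_end))          # the trajectory approaches x_infty = 0
```

Setting `beta_f = 0.0` in this sketch should exhibit the oscillations of the second component discussed above.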

Example 4.2

We now look at a higher-dimensional example. Let us consider \(f: \mathbb{R}^{n}\to \mathbb{R}\) given by \(f(x)=\frac{1}{2}\| Mx-b\|^{2}\), where \(M\in \mathbb{R}^{m\times n}\) and \(b\in \mathbb{R}^{m}\). We have

$$ \nabla f(x)=M^{\top }(Mx-b),\qquad \nabla ^{2} f(x)= M^{\top }M. $$

Since \(M^{\top }M\) is positive semidefinite for any matrix M, the quadratic function f is convex. Furthermore, if M has full column rank, i.e., \(\operatorname{rank} (M)=n\), then \(M^{\top }M\) is positive definite. Therefore f is strongly convex. Take

$$ B= \begin{pmatrix} 1 &-1& 0&\cdots &0 \\ 1 &1 &0 &\cdots &\vdots \\ 0&0& 1 & \cdots & \vdots \\ \vdots &\vdots & &\ddots &\vdots \\ 0 & \ldots & \ldots & \ldots &1 \end{pmatrix} \in \mathbb{R}^{n\times n}. $$

Then B is cocoercive. Indeed, for any \(x,y\in \mathbb{R}^{n}\),

$$\begin{aligned} \langle Bx-By,x-y\rangle &= \Vert x_{1}-y_{1} \Vert ^{2}+ \Vert x_{2}-y_{2} \Vert ^{2}+ \cdots + \Vert x_{n}-y_{n} \Vert ^{2} \\ &\ge \frac{1}{2} \bigl[ 2\bigl( \Vert x_{1}-y_{1} \Vert ^{2}+ \Vert x_{2}-y_{2} \Vert ^{2}\bigr)+ \Vert x_{3}-y_{3} \Vert ^{2}+\cdots + \Vert x_{n}-y_{n} \Vert ^{2} \bigr] \\ &=\frac{1}{2} \Vert Bx-By \Vert ^{2}. \end{aligned}$$
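This computation can be spot-checked numerically on random vectors (a sketch, with the dimension n = 6 chosen arbitrarily):

```python
import numpy as np

n = 6                                    # hypothetical dimension for the check
B = np.eye(n)
B[:2, :2] = [[1.0, -1.0], [1.0, 1.0]]    # the 2x2 block above, identity elsewhere

rng = np.random.default_rng(1)
for _ in range(1000):
    z = rng.standard_normal(n)           # z plays the role of x - y (B is linear)
    assert (B @ z) @ z >= 0.5 * np.linalg.norm(B @ z) ** 2 - 1e-12
print("B is 1/2-cocoercive on the sampled vectors")
```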

If the matrix M does not have full column rank but \(M^{\top }M+B\) is nonsingular, then

$$ B(x)+\nabla f(x)=0\quad \text{if and only if}\quad x=\bigl(M^{\top }M+B \bigr)^{-1}M^{\top }b. $$

In our experiment, we pick M a random \(10\times 100\) matrix, which therefore does not have full column rank. Set \(\gamma =3\), \(\beta _{b}=1\), \(\beta _{f}=1\), and take the operator B as presented above. Thanks to Corollary 3.1, we conclude that the trajectory \(x(t)\) generated by the system (DINAM) converges to \(x_{\infty }=(M^{\top }M+B)^{-1}M^{\top }b\). Implementing the algorithm (DINAAM) in Matlab, we obtain the plot of k versus the norm of \(B(x_{k})+\nabla f(x_{k})\). Similarly, we study several cases by changing the parameters \(\beta _{b}\), \(\beta _{f}\). This is depicted in Fig. 3.

Figure 3

The behavior of (DINAAM) for a high-dimensional problem
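The closed-form limit can be checked numerically (a sketch with hypothetical random data; a \(10\times 100\) Gaussian matrix has rank at most 10, hence not full column rank):

```python
import numpy as np

m, n = 10, 100
rng = np.random.default_rng(2)
M = rng.standard_normal((m, n))          # 10 x 100: rank <= 10 < n, not full column rank
b = rng.standard_normal(m)

B = np.eye(n)
B[:2, :2] = [[1.0, -1.0], [1.0, 1.0]]    # the cocoercive operator of this example

# M^T M + B has positive definite symmetric part, hence it is nonsingular
x_star = np.linalg.solve(M.T @ M + B, M.T @ b)
residual = B @ x_star + M.T @ (M @ x_star - b)   # B(x) + grad f(x) at x_star
print(np.linalg.norm(residual))                   # numerically zero
```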

Before ending this part, we discuss an application of our model to dynamical games.

The following example is taken from Attouch and Maingé [10] and adapted to our context.

Example 4.3

We make the following standing assumptions:

  1. (i)

\(\mathcal{H}=\mathcal{X}_{1}\times \mathcal{X}_{2}\) is the Cartesian product of two Hilbert spaces equipped with the norms \(\| \cdot \|_{\mathcal{X}_{1}}\) and \(\| \cdot \|_{\mathcal{X}_{2}}\), respectively. An element of \(\mathcal{H}\) is written \(x=(x_{1},x_{2})\), with \(x_{1}\in \mathcal{X}_{1}\) and \(x_{2}\in \mathcal{X}_{2}\);

  2. (ii)

    \(f: \mathcal{X}_{1}\times \mathcal{X}_{2} \to \mathbb{R}\) is a convex function whose gradient is Lipschitz continuous on bounded sets;

  3. (iii)

    \(B=(\nabla _{x_{1}}\mathcal{L},-\nabla _{x_{2}}\mathcal{L})\) is the maximally monotone operator which is attached to a smooth convex-concave function \(\mathcal{L}: \mathcal{X}_{1}\times \mathcal{X}_{2}\to \mathbb{R}\). The operator B is assumed to be λ-cocoercive with \(\lambda >0\).

In our setting, with \(x(t)=(x_{1}(t),x_{2}(t))\), the system (DINAM) is written as

$$ \textstyle\begin{cases} \ddot{x}_{1}(t) + \gamma \dot{x}_{1}(t)+\nabla _{x_{1}} f(x_{1}(t),x_{2}(t))+\nabla _{x_{1}} \mathcal{L}(x_{1}(t),x_{2}(t)) \\ \quad {} + \beta _{f}\frac{d}{dt} ( \nabla _{x_{1}} f(x_{1}(t),x_{2}(t)) ) +\beta _{b}\frac{d}{dt} ( \nabla _{x_{1}}\mathcal{L}(x_{1}(t),x_{2}(t)) )=0, \\ \ddot{x}_{2}(t) + \gamma \dot{x}_{2}(t)+\nabla _{x_{2}} f(x_{1}(t),x_{2}(t))-\nabla _{x_{2}} \mathcal{L}(x_{1}(t),x_{2}(t)) \\ \quad + \beta _{f}\frac{d}{dt} ( \nabla _{x_{2}} f(x_{1}(t),x_{2}(t)) ) -\beta _{b}\frac{d}{dt} ( \nabla _{x_{2}}\mathcal{L}(x_{1}(t),x_{2}(t)) )=0. \end{cases} $$
(4.7)

According to Theorem 3.1, \(x(t) \rightharpoonup x_{\infty }=(x_{1,\infty },x_{2,\infty })\) weakly in \(\mathcal{H}\), where \((x_{1,\infty },x_{2,\infty })\) is a solution of

$$ \textstyle\begin{cases} \nabla _{x_{1}} f(x_{1},x_{2})+\nabla _{x_{1}}\mathcal{L}(x_{1},x_{2})=0, \\ \nabla _{x_{2}} f(x_{1},x_{2})-\nabla _{x_{2}}\mathcal{L}(x_{1},x_{2})=0. \end{cases} $$
(4.8)

Structured systems such as (4.8) contain both potential and nonpotential terms which are often present in decision sciences and physics. In game theory, (4.8) describes Nash equilibria of the normal form game with two players 1, 2 whose static loss functions are respectively given by

$$ \textstyle\begin{cases} F_{1}: (x_{1},x_{2})\in \mathcal{X}_{1}\times \mathcal{X}_{2} \to F_{1}(x_{1},x_{2})=f(x_{1},x_{2})+ \mathcal{L}(x_{1},x_{2}), \\ F_{2}: (x_{1},x_{2})\in \mathcal{X}_{1}\times \mathcal{X}_{2} \to F_{2}(x_{1},x_{2})=f(x_{1},x_{2})- \mathcal{L}(x_{1},x_{2}). \end{cases} $$
(4.9)

\(f(\cdot ,\cdot )\) is their joint convex payoff, and \(\mathcal{L}\) is a convex-concave payoff with zero-sum rule. For more details, we refer the reader to [10]. As an example, take \(\mathcal{X}_{1}=\mathcal{X}_{2}=\mathbb{R}\) and \(\mathcal{L}: \mathbb{R}^{2}\to \mathbb{R}\) given by \(\mathcal{L}(x)=\frac{1}{2}(x_{1}^{2}-2x_{1}x_{2}-x_{2}^{2})\). Then

$$ B=(\nabla _{x_{1}}\mathcal{L},-\nabla _{x_{2}}\mathcal{L})= \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} . $$

Pick \(f(x)=\frac{1}{2}(3x_{1}^{2}-2x_{1}x_{2}+x_{2}^{2})-x_{1}-2x_{2}\). The Nash equilibria described in (4.8) can then be computed by using (DINAM). Take \(\gamma =3\), \(\beta _{b}=0.5\), \(\beta _{f}=0.5\) and \(x_{0}=(1,-1)\), \(\dot{x}_{0}=(-10,10)\) as initial conditions; the numerical solution of (DINAM) then converges to \(x_{\infty }=(\frac{3}{4},1)\), which is the solution of (4.8). The numerical trajectories and phase portrait of our model applied to dynamical games are depicted in Fig. 4.

Figure 4

An application of (DINAM) to dynamical games: trajectories (a) and phase portrait (b)
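The stationarity system (4.8) for this example can be solved symbolically, confirming the equilibrium \((\frac{3}{4},1)\); the following `sympy` sketch is only a verification:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = sp.Rational(1, 2) * (3 * x1**2 - 2 * x1 * x2 + x2**2) - x1 - 2 * x2
L = sp.Rational(1, 2) * (x1**2 - 2 * x1 * x2 - x2**2)

# stationarity system (4.8): (grad f)(x) + (dL/dx1, -dL/dx2)(x) = 0
eqs = [sp.diff(f, x1) + sp.diff(L, x1), sp.diff(f, x2) - sp.diff(L, x2)]
sol = sp.solve(eqs, [x1, x2])
print(sol)   # {x1: 3/4, x2: 1}
```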

5 The nonsmooth case

The equivalence obtained in Proposition 2.1 between (DINAM) and a first-order evolution system in time and space allows a natural extension of both our theoretical and numerical results to the case of a convex, lower semicontinuous and proper function \(f:\mathcal{H}\to \mathbb{R}\cup \{+\infty \}\). It suffices to replace the gradient of f with the convex subdifferential ∂f. We recall that the subdifferential of f at \(x\in \mathcal{H}\) is defined by

$$ \partial f(x)=\bigl\{ z\in \mathcal{H}: \langle z,\xi -x\rangle \le f( \xi )-f(x) \text{ for every } \xi \in \mathcal{H}\bigr\} , $$

and the domain of f is \(\operatorname{dom}f= \{ x\in \mathcal{H}: f(x) < +\infty \}\). This leads us to consider the system

$$ \textstyle\begin{cases} \dot{x}(t) + \beta _{f} \partial f(x(t)) + \beta _{b} B(x(t)) + ( \gamma - \frac{1}{\beta _{f}} ) x(t) + y(t) \ni 0; \\ \dot{y}(t) - (1- \frac{\beta _{b}}{\beta _{f}} ) B(x(t)) + \frac{1}{\beta _{f}} ( \gamma - \frac{1}{\beta _{f}} ) x(t) +\frac{1}{\beta _{f}} y(t) = 0. \end{cases} $$
(g-DINAM)

The prefix g in front of (DINAM) stands for generalized. Note that the first equation of (g-DINAM) is now a differential inclusion, because of the possibility for \(\partial f(x(t))\) to be multivalued. By taking \(f= f_{0} + \delta _{C}\), where \(\delta _{C}\) is the indicator function of a constraint set C, the system (g-DINAM) makes it possible to model damped inelastic shocks in mechanics and decision sciences, see [11]. The original aspect comes from the fact that (g-DINAM) now involves both potential driving forces (attached to \(f_{0}\)) and nonpotential driving forces (attached to B). As we will see, taking into account shocks created by nonpotential driving forces is a source of difficulties.

Let us first establish the existence and uniqueness of the solution trajectory of the Cauchy problem.

Theorem 5.1

Let \(f:\mathcal{H}\to \mathbb{R}\cup \{+\infty \}\) be a convex, lower semicontinuous, and proper function. Suppose that \(\beta _{f}>0 \) and \(\beta _{b}\geq 0\). Then, for any \((x_{0}, y_{0}) \in \operatorname{dom}f \times \mathcal{H}\), there exists a unique strong global solution \((x,y):[0, +\infty [ \, \to \mathcal{H}\times \mathcal{H}\) of (g-DINAM) which satisfies the Cauchy data \(x(0) =x_{0}\), \(y(0) =y_{0}\).

Proof

The proof is parallel to that of Theorem 2.1. The system (g-DINAM) can be equivalently written as

$$ \dot{Z}(t) + \partial \Phi \bigl( Z(t)\bigr) + G\bigl(Z(t) \bigr)\ni 0, \qquad Z(0) = (x_{0},y_{0}), $$
(5.1)

where \(Z:= (x,y)\), and the function \(\Phi (Z)= \Phi (x,y) := \beta _{f} f(x) \) is now convex, lower semicontinuous, and proper on \(\mathcal{H}\times \mathcal{H}\). The operator G is unchanged and is globally Lipschitz continuous. The above equation falls within the setting of a Lipschitz perturbation of an evolution system governed by the subdifferential of a convex, lower semicontinuous, and proper function. The existence and uniqueness of the strong solution to (5.1) follow from Brézis [21, Proposition 3.12] and the fact that \((x_{0}, y_{0})\in \operatorname{dom}\Phi \). Recall that a strong solution means that \(x(\cdot )\) and \(y(\cdot )\) are locally absolutely continuous functions whose distributional derivatives \(\dot{x}\) and \(\dot{y}\) belong to \(L^{2} (0,T, \mathcal{H})\) for any \(T>0\). □

Remark 5.1

As a consequence of the general theory developed above, the system (g-DINAM) has a regularizing effect on the initial condition. Precisely, given \((x_{0}, y_{0}) \in \overline{\operatorname{dom}f} \times \mathcal{H}\), there still exists a unique strong solution to the corresponding Cauchy problem, but now with \(\sqrt{t}\dot{x}(t) \in L^{2} (0,T, \mathcal{H})\) and \(\sqrt{t}\dot{y}(t) \in L^{2} (0,T, \mathcal{H})\) for any \(T>0\).

The solution set S is now defined by

$$ S:=\bigl\{ p\in \mathcal{H}: \partial f(p)+B(p)\ni 0\bigr\} . $$

Before stating our main result, notice that \(B(p)\) is uniquely defined for \(p\in S\).

Lemma 5.1

\(B(p)\) is uniquely defined for \(p\in S\), i.e.,

$$ p_{1}\in S,\quad p_{2} \in S \quad \Longrightarrow\quad B(p_{1})= B(p_{2}). $$

Proof

The proof is similar to that of Lemma 3.1. It is based on the monotonicity of the subdifferential of f and the cocoercivity of the operator B. □
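Lemma 5.1 can be illustrated on a hypothetical one-dimensional instance (our choice, not from the paper) in which the solution set S is a whole half-line, yet B is constant on it:

```python
# Illustrative 1-D instance: f ≡ 0, so ∂f(x) = {0}, and B(x) = max(x, 0),
# which is monotone and 1-Lipschitz on R, hence 1-cocoercive.
B = lambda x: max(x, 0.0)

# The zeros of ∂f + B are exactly the p with B(p) = 0, i.e. S = (-inf, 0]:
# infinitely many solutions, but B takes a single value on all of S.
S_samples = [-5.0, -1.0, -0.3, 0.0]
values = [B(p) for p in S_samples]
assert all(v == values[0] for v in values)   # B(p1) = B(p2) for p1, p2 in S
```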

For the sake of simplicity, we give a detailed proof of the convergence analysis in the case \(\beta _{f}=\beta _{b}=\beta >0\). The system (g-DINAM) takes the simpler form:

$$ \textstyle\begin{cases} \dot{x}(t) + \beta \partial f(x(t)) + \beta B(x(t)) + ( \gamma - \frac{1}{\beta } ) x(t) + y(t) \ni 0; \\ \dot{y}(t) + \frac{1}{\beta } ( \gamma - \frac{1}{\beta } ) x(t) +\frac{1}{\beta } y(t) = 0. \end{cases} $$
(g-DINAM)

To formulate the convergence results and the corresponding estimates, we write the first equation of (g-DINAM) as follows:

$$ \dot{x}(t) + \beta \xi (t) + \beta B\bigl(x(t)\bigr) + \biggl( \gamma - \frac{1}{\beta } \biggr) x(t) + y(t) = 0, $$
(5.2)

where \(\xi (t) \in \partial f(x(t))\), and we set \(A(x(t))= \xi (t) + B(x(t))\).

Theorem 5.2

Let \(B: \mathcal{H} \to \mathcal{H}\) be a λ-cocoercive operator. Let \(f:\mathcal{H}\to \mathbb{R}\cup \{+\infty \}\) be a convex, lower semicontinuous, proper function. Suppose that \(S= \{p\in \mathcal{H}: 0\in \partial f(p)+B(p) \}\neq \emptyset \). Consider the evolution equation (g-DINAM) where the parameters satisfy the conditions: \(\beta _{f}=\beta _{b}=\beta >0\) and

$$ \gamma >0,\qquad \beta >0\quad \textit{and}\quad \lambda \gamma >\beta + \frac{1}{\gamma }. $$
(5.3)

Then, for any solution trajectory \(x:[0,+\infty [\,\to \mathcal{H}\) of (g-DINAM), the following properties are satisfied:

  1. (i)

    (integral estimates) Set \(A (x(t)):=\xi (t) +B(x(t))\) with \(\xi (t) \in \partial f(x(t))\) as defined in (5.2) and \(p\in S\). Then

    $$\begin{aligned}& \int _{0}^{+\infty } \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}\,dt< +\infty ,\qquad \int _{0}^{+ \infty } \bigl\Vert B\bigl(x(t) \bigr)-B(p) \bigr\Vert ^{2}\,dt< +\infty , \\& \int _{0}^{+\infty } \bigl\Vert A \bigl(x(t)\bigr) \bigr\Vert ^{2}\,dt< +\infty ,\qquad \int _{0}^{\infty } \bigl\langle A\bigl(x(t)\bigr), x(t)-p \bigr\rangle \,dt < +\infty . \end{aligned}$$
  2. (ii)

    (convergence) For any \(p\in S\),

    1. 1.

      \(\lim_{t\to +\infty }\|x(t)-p\|\) exists.

    2. 2.

      \(\lim_{t\to +\infty }\|B(x(t))-B(p)\|=0\), where \(B(p)\) is uniquely defined for \(p\in S\).

Proof

Let us adapt the Lyapunov analysis developed in the previous sections to the case where f is nonsmooth. We have to pay attention to the following points. First, we must invoke the (generalized) chain rule for derivatives over curves (see [21, Lemma 3.3]): for a.e. \(t\geq 0\),

$$ \frac{d}{dt} f\bigl(x(t)\bigr)= \bigl\langle \xi (t), \dot{x} (t) \bigr\rangle . $$

The second ingredient is the validity of the subdifferential inequality for convex functions.

As a Lyapunov function, let us consider the function \(t\in [0, +\infty [\, \mapsto \mathcal{E}_{p}(t) \in \mathbb{R}_{+}\) defined by

$$ \mathcal{E}_{p}(t) :=\frac{1}{2} \bigl\Vert x(t)-p+ c \bigl(\dot{x}(t)+ \beta A\bigl(x(t)\bigr) \bigr) \bigr\Vert ^{2} + \frac{\delta }{2} \bigl\Vert x(t)-p \bigr\Vert ^{2} +\bigl[c \delta \beta +c^{2}\bigr]\Gamma (t), $$
(5.4)

where we recall that \(A (x(t)):=\xi (t) +B(x(t))\) with \(\xi (t) \in \partial f(x(t))\) as defined in (5.2) and \(p\in S\). To differentiate \(\mathcal{E}_{p}(t)\), we use the formulation (g-DINAM)

$$ \dot{x}(t)+ \beta A\bigl(x (t)\bigr) = - \biggl( \gamma - \frac{1}{\beta } \biggr) x(t) - y(t). $$

Since x and y are locally absolutely continuous functions, this allows us to differentiate \(\dot{x}(t)+ \beta A(x (t)) \) and obtain formulas similar to those of the smooth case. Then a close examination of the Lyapunov analysis shows that we can obtain the additional estimate

$$ \int _{0}^{\infty } \bigl\langle A\bigl(x(t)\bigr) , x(t)-p \bigr\rangle \,dt < + \infty . $$
(5.5)

Let \(p\in S\), that is, \(0\in \partial f(p) + B(p)\). To obtain (5.5), we return to (3.6) and consider the following minorization, which we split into a sum with coefficients \(\epsilon '\) and \(1-\epsilon '\) (where \(\epsilon ' >0\) will be taken small enough). According to the monotonicity of ∂f and the definition of \(A (x(t))\), we have

$$\begin{aligned} c\bigl\langle A\bigl(x(t)\bigr), x(t)-p\bigr\rangle =& c\epsilon '\bigl\langle A\bigl(x(t)\bigr), x(t)-p \bigr\rangle + c\bigl(1- \epsilon '\bigr) \bigl\langle A\bigl(x(t)\bigr) -Ap, x(t)-p\bigr\rangle \\ \geq & c\epsilon '\bigl\langle A\bigl(x(t)\bigr), x(t)-p\bigr\rangle + c\bigl(1-\epsilon '\bigr) \bigl\langle B\bigl(x(t) \bigr)-B(p), x(t)-p\bigr\rangle \\ \geq & c\epsilon '\bigl\langle A\bigl(x(t)\bigr), x(t)-p\bigr\rangle + c\bigl(1-\epsilon '\bigr) \lambda \bigl\Vert B \bigl(x(t)\bigr)-B(p) \bigr\Vert ^{2}. \end{aligned}$$
(5.6)

So the proof continues with λ replaced by \((1-\epsilon ')\lambda \). This does not change the conditions on the parameters: since the inequality \(\lambda \gamma >\beta +\frac{1}{\gamma }\) in our assumptions is strict, it is still satisfied by \((1-\epsilon ') \lambda \) when \(\epsilon '\) is taken small enough. So, after integrating the resulting strict Lyapunov inequality, we obtain the supplementary property (5.5). Up to (3.22), the proof is essentially the same as in the case of a smooth function f. We obtain the integral estimates

$$\begin{aligned}& \int _{0}^{+\infty } \bigl\Vert \dot{x}(t) \bigr\Vert ^{2}\,dt< +\infty ,\qquad \int _{0}^{+\infty } \bigl\Vert B\bigl(x(t) \bigr)-B(p) \bigr\Vert ^{2}\,dt< +\infty , \\& \int _{0}^{+\infty } \bigl\Vert A \bigl(x(t)\bigr) \bigr\Vert ^{2} \,dt< + \infty . \end{aligned}$$

But then, we can no longer invoke the Lipschitz continuity of f on bounded sets. To overcome this difficulty, we modify the end of the proof as follows. Recall that, given \(p\in S \), the anchor function is defined, for every \(t \in [0,+\infty [\), by

$$ q_{p}(t):=\frac{1 }{2} \bigl\Vert x(t)-p \bigr\Vert ^{2}, $$

and that we need to prove that the limit of the anchor function exists as \(t \to +\infty \). The idea is to exploit the fact that we have at hand a whole collection of Lyapunov functions, parametrized by the coefficient c. Recall that we have obtained that the limit of \(\mathcal{E}_{p}(t)\) exists as \(t\to +\infty \), and that this holds for a whole interval of values of c. So, for such c, the limit of \(W_{c} (t):=\frac{1}{c\delta \beta +c^{2}} \mathcal{E}_{p}(t)\) as \(t\to +\infty \) exists, where

$$ W_{c} (t)= \frac{1}{2(c\delta \beta +c^{2})}\bigl\| x(t)-p+ c (\dot{x}(t)+ \beta A(x(t)) )\bigr\| ^{2} +\frac{\delta }{2(c\delta \beta +c^{2})}\bigl\| x(t)-p \bigr\| ^{2}+ \Gamma (t) . $$

Take two such values of c, say \(c_{1}\) and \(c_{2}\), and form the difference (recall that \(\delta = c\gamma -1\)). We obtain

$$ W_{c_{1}} (t) -W_{c_{2}} (t)= \biggl[ \frac{1}{(c_{1}\gamma -1)\beta + c_{1}} - \frac{1}{(c_{2}\gamma -1)\beta + c_{2}} \biggr] W(t), $$

where

$$ W(t):= \frac{\gamma }{2} \bigl\Vert x(t)-p \bigr\Vert ^{2} + \frac{\beta }{2(\gamma \beta +1)} \bigl\Vert \dot{x}(t)+ \beta A\bigl(x(t)\bigr) \bigr\Vert ^{2} + \bigl\langle \dot{x}(t)+ \beta A\bigl(x(t)\bigr) , x(t)-p \bigr\rangle . $$

So, we obtain the existence of the limit as \(t\to +\infty \) of \(W(t)\). Then note that \(W(t)= \gamma q_{p}(t) + \frac{d}{dt}w(t) \) where

$$ w(t) := q_{p}(t) + \beta \int _{0}^{t} \bigl\langle A\big(x(s)\big) , x(s)-p \bigr\rangle \,ds + \frac{\beta }{2(\gamma \beta +1)} \int _{0}^{t} \bigl\Vert \dot{x}(s)+ \beta A \bigl(x(s)\bigr) \bigr\Vert ^{2}\,ds . $$

Reformulate \(W(t)\) in terms of \(w(t)\) as follows:

$$\begin{aligned} W(t) =& \gamma w(t) + \frac{d}{dt}w(t) \\ &{}- \biggl( \gamma \beta \int _{0}^{t} \bigl\langle A\bigl(x(s)\bigr) , x(s)-p \bigr\rangle \,ds + \frac{\gamma \beta }{2(\gamma \beta +1)} \int _{0}^{t} \bigl\Vert \dot{x}(s)+ \beta A \bigl(x(s)\bigr) \bigr\Vert ^{2}\,ds \biggr). \end{aligned}$$

As a consequence of (5.5) and of the previous estimates, we have that the limit of the two above integrals exists as \(t \to +\infty \). Therefore, according to the convergence of \(W(t)\), we obtain that

$$ \lim_{t \to +\infty } \biggl(\gamma w(t) + \frac{d}{dt}w(t) \biggr) \quad \text{exists}. $$

The existence of the limit of w follows from a classical general result concerning the convergence of evolution equations governed by strongly monotone operators (here γId, see Theorem 3.9, p. 88 in [21]). In turn, using the same argument as above, we obtain that, for all \(p\in S\),

$$ \lim_{t\to +\infty } \bigl\Vert x(t)-p \bigr\Vert \quad \text{exists}. $$

As in the smooth case, the strong convergence of \(B(x(t))\) to \(B(p)\) is a direct consequence of the integral estimates \(\int _{0}^{+\infty }\|B(x(t))-B(p)\|^{2}\,dt<+\infty \), \(\int _{0}^{+\infty }\|\dot{x}(t)\|^{2}\,dt<+\infty \) and of the fact that B is Lipschitz continuous. The proof of Theorem 5.2 is thereby completed. □

Remark 5.2

  1. (i)

A natural question is whether the weak limit of the trajectory exists. We are not far from this result, since \(\int _{0}^{+\infty }\|A(x(t))\|^{2}\,dt<+\infty \), which implies that \(A(x(t))\) converges strongly to zero in an “essential” way. If this could be upgraded to strong convergence of \(A(x(t))\) to zero, Opial’s lemma would allow us to complete the convergence proof as in the smooth case. This is a seemingly difficult question to examine in the future.

  2. (ii)

A particular situation is the case \(\gamma =\frac{1}{\beta }\), in which the system (g-DINAM) can be written equivalently as

    $$ \dot{u}(t) + \gamma u(t)=0, $$

    where

    $$ \dot{x}(t) + \beta A\bigl(x(t)\bigr)\ni u(t). $$

The convergence of the trajectory \(t\mapsto x(t)\) is then a consequence of the convergence of the semigroup generated by the sum of a cocoercive operator and the subdifferential of a convex, lower semicontinuous, and proper function, see Abbas and Attouch [1]. Note that in this case the condition for the convergence of the trajectories generated by (g-DINAM) no longer depends on the cocoercivity parameter λ.
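To make the reduction explicit, note that the linear equation \(\dot{u}(t) + \gamma u(t)=0\) integrates in closed form:

$$ u(t) = e^{-\gamma t} u(0) \longrightarrow 0 \quad \text{as } t \to +\infty . $$

The trajectory x thus solves the inclusion \(\dot{x}(t) + \beta A\bigl(x(t)\bigr) \ni e^{-\gamma t}u(0)\), that is, a semigroup dynamic governed by \(\beta A = \beta (\partial f + B)\) subject to an exponentially vanishing perturbation; this is precisely the setting in which the convergence result of [1] applies.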

6 Conclusion, perspectives

In this paper, in a general real Hilbert space setting, we investigated a dynamic inertial Newton method for solving additively structured monotone problems. The dynamic is driven by the sum of two monotone operators with distinct properties: the potential component is the gradient of a continuously differentiable convex function f, and the nonpotential one is a monotone and cocoercive operator B. The geometric damping is controlled by the Hessian of the potential f and by a Newton-type correction term attached to B. The well-posedness of the Cauchy problem is shown, as well as the asymptotic convergence properties of the trajectories generated by the continuous dynamic. The convergence analysis is carried out through the parameters \(\beta _{f}\) and \(\beta _{b}\) attached to the geometric dampings as well as the parameters γ and λ (the viscous damping and the coefficient of cocoercivity, respectively). The introduction of geometric damping makes it possible to control and attenuate the oscillations known for viscous damping of inertial systems, giving rise to faster numerical methods. It would be interesting to extend the analysis, for both the continuous dynamic and its discretization, to the case of an asymptotically vanishing damping \(\gamma (t)=\frac{\alpha }{t}\), with \(\alpha >0\), as in [28]. This is a decisive step towards proposing faster algorithms for solving structured monotone inclusions, which are connected to the accelerated gradient method of Nesterov. The study of the corresponding splitting methods is also an important topic which needs further investigation. In fact, by replacing ∂f with a general maximally monotone operator A, the resolvent of which can be easily computed, it would be interesting to study a forward-backward inertial algorithm with Hessian-driven damping for solving structured monotone inclusions of the form \(Ax+Bx\ni 0\). These are interesting topics for future research.

Notes

  1. i.e., B is not assumed to be the gradient of a given function.

References

  1. Abbas, B., Attouch, H.: Dynamical systems and forward–backward algorithms associated with the sum of a convex subdifferential and a monotone cocoercive operator. Optimization 64(10), 2223–2252 (2015)

  2. Abbas, B., Attouch, H., Svaiter, B.F.: Newton-like dynamics and forward–backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 161(2), 331–360 (2014)

  3. Adly, S., Attouch, H.: Finite convergence of proximal-gradient inertial algorithms combining dry friction with Hessian-driven damping. SIAM J. Optim. 30(3), 2134–2162 (2020)

  4. Alecsa, C.D., László, S., Pinta, T.: An extension of the second order dynamical system that models Nesterov’s convex gradient method. Appl. Math. Optim. 84, 1687–1716 (2021)

  5. Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1–2), 3–11 (2001)

  6. Alvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)

  7. Attouch, H., Chbani, Z., Fadili, J., Riahi, H.: First-order algorithms via inertial systems with Hessian driven damping. Math. Program. (2020). https://doi.org/10.1007/s10107-020-01591-1

  8. Attouch, H., László, S.C.: Continuous Newton-like inertial dynamics for monotone inclusions. Set-Valued Var. Anal. (2020). https://doi.org/10.1007/s11228-020-00564-y

  9. Attouch, H., László, S.C.: Newton-like inertial dynamics and proximal algorithms governed by maximally monotone operators. SIAM J. Optim. 30(4), 3252–3283 (2020)

  10. Attouch, H., Maingé, P.E.: Asymptotic behavior of second order dissipative evolution equations combining potential with nonpotential effects. ESAIM Control Optim. Calc. Var. 17(3), 836–857 (2011)

  11. Attouch, H., Maingé, P.E., Redont, P.: A second-order differential system with Hessian-driven damping; application to nonelastic shock laws. Differ. Equ. Appl. 4(1), 27–65 (2012)

  12. Attouch, H., Marques Alves, M., Svaiter, B.F.: A dynamic approach to a proximal-Newton method for monotone inclusions in Hilbert spaces, with complexity \(\mathcal{O}(1/n^{2})\). J. Convex Anal. 23(1), 139–180 (2016)

  13. Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Program. 174(1–2), 391–432 (2019)

  14. Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)

  15. Attouch, H., Redont, P., Svaiter, B.F.: Global convergence of a closed-loop regularized Newton method for solving monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 157(3), 624–650 (2013)

  16. Attouch, H., Svaiter, B.F.: A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011)

  17. Baillon, J.-B., Haddad, G.: Quelques propriétés des opérateurs angles-bornés et n-cycliquement monotones. Isr. J. Math. 26, 137–150 (1977)

  18. Bauschke, H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2011)

  19. Boţ, R.I., Csetnek, E.R.: Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54, 1423–1443 (2016)

  20. Boţ, R.I., Csetnek, E.R., László, S.C.: Tikhonov regularization of a second order dynamical system with Hessian damping. Math. Program. (2020). https://doi.org/10.1007/s10107-020-01528-8

  21. Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. Lecture Notes, vol. 5. North-Holland, Amsterdam (1972)

  22. Brézis, H.: Analyse fonctionnelle. Collection Mathématiques Appliquées pour la Maîtrise. Masson, Paris (1983)

  23. Castera, C., Bolte, J., Févotte, C., Pauwels, E.: An inertial Newton algorithm for deep learning (2019). HAL-02140748

  24. Kim, D.: Accelerated proximal point method for maximally monotone operators. Preprint (2020). arXiv:1905.05149v3

  25. Lin, T., Jordan, M.I.: A control-theoretic perspective on optimal high-order optimization. Preprint (2019). arXiv:1912.07168v1

  26. Peypouquet, J., Sorin, S.: Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17(3–4), 1113–1163 (2010)

  27. Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations. Preprint (2018). arXiv:1810.08907 [math.OC]

  28. Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method. J. Mach. Learn. Res. 17, 1–43 (2016)
Author information

Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Samir Adly.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Appendix

A.1 Technical lemmas

Let us show that the sum of two cocoercive operators is still cocoercive. For further properties concerning cocoercive operators see [18].

Lemma A.1

Let \(T_{1},T_{2}: \mathcal{H} \to \mathcal{H}\) be two cocoercive operators with respective cocoercivity coefficients \(\lambda _{1},\lambda _{2}>0\). Then \(T:=T_{1}+T_{2}: \mathcal{H} \to \mathcal{H}\) is λ-cocoercive with \(\lambda = \frac{\lambda _{1}\lambda _{2}}{\lambda _{1}+\lambda _{2}}\).

Proof

According to the cocoercivity assumptions of \(T_{1}\) and \(T_{2}\), we have

$$\begin{aligned}& \langle T_{1}y-T_{1}x,y-x\rangle \ge \lambda _{1} \Vert T_{1}y-T_{1}x \Vert ^{2},\quad \forall x,y \in \mathcal{H}, \\& \langle T_{2}y-T_{2}x,y-x\rangle \ge \lambda _{2} \Vert T_{2}y-T_{2}x \Vert ^{2},\quad \forall x,y \in \mathcal{H}. \end{aligned}$$

Let us show that the sum \(T=T_{1}+T_{2}\) is still cocoercive. Using the elementary inequality \(2\langle u,v\rangle \le \frac{\lambda _{1}}{\lambda _{2}} \Vert u \Vert ^{2}+ \frac{\lambda _{2}}{\lambda _{1}} \Vert v \Vert ^{2}\), valid for all \(u,v\in \mathcal{H}\), we have, for all \(x,y\in \mathcal{H}\),

$$\begin{aligned} \Vert Ty-Tx \Vert ^{2}&= \Vert T_{1}y-T_{1}x+T_{2}y-T_{2}x \Vert ^{2} \\ &= \Vert T_{1}y-T_{1}x \Vert ^{2}+ \Vert T_{2}y-T_{2}x \Vert ^{2}+2\langle T_{1}y-T_{1}x, T_{2}y-T_{2}x \rangle \\ &\le \Vert T_{1}y-T_{1}x \Vert ^{2}+ \Vert T_{2}y-T_{2}x \Vert ^{2} + \frac{\lambda _{1}}{\lambda _{2}} \Vert T_{1}y-T_{1}x \Vert ^{2}+ \frac{\lambda _{2}}{\lambda _{1}} \Vert T_{2}y-T_{2}x \Vert ^{2} \\ &= \bigl( \lambda _{1}^{-1}+\lambda _{2}^{-1} \bigr) \bigl( \lambda _{1} \Vert T_{1}y-T_{1}x \Vert ^{2} +\lambda _{2} \Vert T_{2}y-T_{2}x \Vert ^{2} \bigr). \end{aligned}$$

Since \(T_{1}\), \(T_{2}\) are cocoercive, we deduce that

$$\begin{aligned} \Vert Ty-Tx \Vert ^{2}&\le \bigl( \lambda _{1}^{-1}+ \lambda _{2}^{-1} \bigr) \bigl( \langle T_{1}y-T_{1}x,y-x\rangle +\langle T_{2}y-T_{2}x,y-x \rangle \bigr) \\ &= \bigl( \lambda _{1}^{-1}+\lambda _{2}^{-1} \bigr) \langle Ty-Tx,y-x \rangle . \end{aligned}$$

Equivalently,

$$ \langle Ty-Tx,y-x\rangle \ge \frac{\lambda _{1}\lambda _{2}}{\lambda _{1} +\lambda _{2}} \Vert Ty-Tx \Vert ^{2}, \quad \forall x,y \in \mathcal{H}. $$

So, T is still λ-cocoercive with \(\lambda = \frac{\lambda _{1}\lambda _{2}}{\lambda _{1}+\lambda _{2}} >0\).

Let us show that this estimate is sharp. Take \(T_{1}: \mathcal{H}\to \mathcal{H}\), \(x\mapsto \lambda _{1}^{-1}x\) and \(T_{2}: \mathcal{H}\to \mathcal{H}\), \(x\mapsto \lambda _{2}^{-1}x\). It is easy to check that \(T_{1}\), \(T_{2}\) are cocoercive with cocoercivity coefficients \(\lambda _{1}\), \(\lambda _{2}\), respectively. Their sum is \(Tx= ( \lambda _{1}^{-1} + \lambda _{2}^{-1} ) x = \lambda ^{-1} x \) with \(\lambda = \frac{\lambda _{1}\lambda _{2}}{\lambda _{1}+\lambda _{2}} \), and hence T is exactly λ-cocoercive. This shows that we cannot obtain a better estimate. □
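Lemma A.1 is easy to spot-check numerically. In the sketch below (our illustrative choices, not from the paper), \(T_{1}\) is the componentwise hyperbolic tangent, which is monotone and 1-Lipschitz in each coordinate and hence 1-cocoercive, and \(T_{2}=x/\lambda _{2}\) is \(\lambda _{2}\)-cocoercive; the sum is tested against the predicted constant \(\lambda =\frac{\lambda _{1}\lambda _{2}}{\lambda _{1}+\lambda _{2}}\) on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative operators (our choice): componentwise tanh is monotone and
# 1-Lipschitz, hence 1-cocoercive; x / lam2 is lam2-cocoercive.
lam1, lam2 = 1.0, 2.0
T1 = np.tanh
T2 = lambda x: x / lam2
T = lambda x: T1(x) + T2(x)

lam = lam1 * lam2 / (lam1 + lam2)   # constant predicted by Lemma A.1

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = np.dot(T(y) - T(x), y - x)
    rhs = lam * np.linalg.norm(T(y) - T(x)) ** 2
    assert lhs >= rhs - 1e-12       # <Ty - Tx, y - x> >= lam ||Ty - Tx||^2
```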

The next lemma is a classical result in integration theory.

Lemma A.2

Let \(1\leq p<\infty \) and \(1\leq r\leq \infty \). Suppose that \(u\in L^{p}([0,\infty [; \mathbb{R})\) is a locally absolutely continuous nonnegative function, \(g\in L^{r}([0,\infty [; \mathbb{R})\) and

$$ \dot{u}(t)\leq g(t) $$

for almost every \(t>0\). Then \(\lim_{t\to \infty }u(t)=0\).

In the proof of Theorem 3.1, we use the following elementary result concerning positive quadratic forms.

Lemma A.3

Let a, b, c be three real numbers. The quadratic form \(q: \mathcal{H}\times \mathcal{H}\to \mathbb{R}\)

$$ q(X,Y):= a \Vert X \Vert ^{2}+ 2b\langle X,Y\rangle +c \Vert Y \Vert ^{2} $$

is positive definite if and only if \(ac -b^{2} > 0\) and \(a >0 \). Moreover,

$$ q(X,Y) \geq \mu \bigl( \Vert X \Vert ^{2} + \Vert Y \Vert ^{2} \bigr) \quad \textit{for all } X,Y \in \mathcal{H}, $$

where the positive real number \(\mu := \frac{1}{2} ( a+c -\sqrt{(a-c)^{2} +4b^{2}} ) \) is the smallest eigenvalue of the positive symmetric matrix associated with q.
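A quick numerical check of Lemma A.3 (with sample coefficients of our choosing): the closed-form μ coincides with the smallest eigenvalue of the matrix \(\bigl[\begin{smallmatrix} a & b \\ b & c\end{smallmatrix}\bigr]\), and the lower bound on q holds for random vectors X, Y.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample coefficients (our choice) satisfying a > 0 and a*c - b^2 > 0.
a, b, c = 3.0, 1.0, 2.0
mu = 0.5 * (a + c - np.sqrt((a - c) ** 2 + 4 * b ** 2))

# mu is the smallest eigenvalue of the symmetric matrix [[a, b], [b, c]].
assert np.isclose(mu, np.linalg.eigvalsh([[a, b], [b, c]]).min())

def q(X, Y):
    """The quadratic form a||X||^2 + 2b<X, Y> + c||Y||^2 on H x H."""
    return a * X @ X + 2 * b * X @ Y + c * Y @ Y

for _ in range(1000):
    X, Y = rng.standard_normal(4), rng.standard_normal(4)
    assert q(X, Y) >= mu * (X @ X + Y @ Y) - 1e-9
```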

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Adly, S., Attouch, H. & Vo, V.N. Asymptotic behavior of Newton-like inertial dynamics involving the sum of potential and nonpotential terms. Fixed Point Theory Algorithms Sci Eng 2021, 17 (2021). https://doi.org/10.1186/s13663-021-00702-7
