
An adaptive splitting algorithm for the sum of two generalized monotone operators and one cocoercive operator

Abstract

Splitting algorithms for finding a zero of the sum of operators often involve multiple steps, which are referred to as forward or backward steps. Forward steps use the operators explicitly, whereas backward steps involve the operators implicitly via their resolvents. In this paper, we study an adaptive splitting algorithm for finding a zero of the sum of three operators. We assume that two of the operators are generalized monotone and that their resolvents are computable, while the third operator is cocoercive but its resolvent is missing or costly to compute. Our splitting algorithm adapts its parameters to the generalized monotonicity of the operators and, at the same time, combines appropriate forward and backward steps to guarantee convergence to a solution of the problem.

1 Introduction

Operator splitting algorithms are developed for structured optimization problems based on the idea of performing the computation separately on individual operators. At each iteration, they require multiple steps, which are known as either forward or backward steps. The forward steps are almost always easy, as they use the operators directly. The backward steps, on the other hand, are often more complicated as they use the resolvents of the operators. While there are many operators whose resolvents are readily computable, there exist operators whose resolvents may not be computable in closed form; thus, it is necessary to use forward steps in certain situations. Notable examples of splitting algorithms include the forward-backward algorithm [15], the Douglas–Rachford algorithm [14, 15], and many others.

In this paper, we study an adaptive splitting algorithm for the inclusion problem

$$ \text{find $x\in X$ such that } 0\in Ax+Bx+Cx, $$
(1)

where X is a real Hilbert space, \(A,B\colon X\rightrightarrows X\) are generalized monotone operators, and \(C\colon X\to X\) is a cocoercive operator. It is worth mentioning that the problem of finding a zero of the sum of finitely many maximally monotone operators and a cocoercive operator can be written as a special instance of (1), where A is a maximally monotone operator and B is the normal cone of a closed subspace [16, 17]. This was in turn solved by the so-called forward-Douglas–Rachford splitting algorithm [8, 16]. In [13], a three-operator splitting algorithm was proposed to address (1) assuming A and B to be only maximally monotone. In [11], an adaptive Douglas–Rachford splitting algorithm was introduced for the case when A and B are strongly and weakly monotone operators, and \(C=0\). Therein, adaptive parameters were used to accommodate the corresponding monotonicity properties. This approach was later studied in [4] by means of conical averagedness.

Motivated by the two approaches in [11] and [13], this paper is devoted to developing an adaptive splitting algorithm for solving (1) when A and B are strongly and weakly monotone operators and C is a cocoercive operator. We utilize new parameters so that the generated sequence converges weakly to a fixed point, while the corresponding image sequence via the resolvent (the shadow sequence) converges weakly to a solution of the original problem. If the strong monotonicity outweighs the weak monotonicity, the convergence of the shadow sequence is strong. In addition, we recover some contemporary results for the Douglas–Rachford algorithm, the forward-backward algorithm, and also the backward-forward algorithm, which has recently been studied in [1]. On the one hand, our new algorithm enhances the framework of [13] to allow for handling generalized monotone operators. On the other hand, it extends the adaptive approach in [11] to incorporate a third operator that is cocoercive and whose resolvent might not be explicitly computable. An application to minimizing the sum of three functions is also included.

On another note, it is well known that the alternating direction method of multipliers (ADMM) can be written as the Douglas–Rachford algorithm in dual settings. Recently, this important relation has been extended in [3] for the adaptive framework, namely, a new adaptive ADMM can be written as the adaptive Douglas–Rachford algorithm in dual settings. We refer interested readers to [3] for a rather comprehensive discussion on the ADMM.

The remainder of the paper is organized as follows. In Sect. 2, we present our adaptive splitting algorithm and recall some preliminary materials. Section 3 provides an abstract convergence result, which will be used to derive the main results in Sect. 4. In Sect. 5, we revisit some convergence results for the case of two operators based on the newly developed framework. Finally, Sect. 6 presents an immediate application of the main results to minimizing the sum of three functions.

2 The algorithm

Throughout, X is a real Hilbert space with inner product \(\langle {\cdot },{\cdot } \rangle \) and induced norm \(\|\cdot \|\). The set of nonnegative integers is denoted by \(\mathbb{N}\) and the set of real numbers is denoted by \(\mathbb{R}\). We denote the set of nonnegative real numbers by \(\mathbb{R}_{+}:= \{{x \in \mathbb{R}} \mid {x \geq 0}\}\) and the set of positive real numbers by \(\mathbb{R}_{++}:= \{{x \in \mathbb{R}}\mid {x >0}\}\). The notation \(A\colon X\rightrightarrows X\) indicates that A is a set-valued operator on X and the notation \(A\colon X\to X\) indicates that A is a single-valued operator on X.

Let \(A\colon X\rightrightarrows X\) be an operator on X. Then its domain is \(\operatorname{dom}A :=\{{x\in X}\mid {Ax\neq \varnothing }\}\), its set of zeros is \(\operatorname{zer}A :=\{{x\in X}\mid {0\in Ax}\}\), and its set of fixed points is \(\operatorname{Fix}A :=\{{x\in X}\mid {x\in Ax}\}\). The graph of A is the set \(\operatorname{gra}A :=\{{(x,u)\in X\times X}\mid {u\in Ax}\}\) and the inverse of A, denoted by \(A^{-1}\), is the operator with graph \(\operatorname{gra}A^{-1} :=\{{(u,x)\in X\times X}\mid {u\in Ax} \}\). The resolvent of A is defined by

$$ J_{A} :=(\mathrm{Id}+A)^{-1}, $$
(2)

where Id is the identity operator.
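For example, for \(A =\alpha \,\mathrm{Id}\) with \(1+\gamma \alpha >0\), a direct computation gives \(J_{\gamma A} =\frac{1}{1+\gamma \alpha }\mathrm{Id}\); for the subdifferential of a proper lower semicontinuous convex function f, the resolvent is the proximity operator \(\mathrm{Prox}_{\gamma f}\) of Sect. 6; and for the normal cone of a nonempty closed convex set, the resolvent is the projection onto that set.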

Now, let \(\eta ,\gamma ,\delta \in \mathbb{R}_{++}\) and set \(\lambda :=1+\frac{\delta }{\gamma }\). In order to address problem (1), we employ the operator

$$ T_{A,B,C} :=\mathrm{Id}-\eta J_{\gamma A} +\eta J_{\delta B} \bigl((1-\lambda )\mathrm{Id}+\lambda J_{\gamma A} - \delta CJ_{ \gamma A} \bigr). $$
(3)

We will also refer to γ and δ as the resolvent parameters as they are used to scale the operators A, B in their respective resolvents. In fact, we adapt γ and δ to the generalized monotonicity of A and B in order to guarantee the convergence of \(T_{A,B,C}\). Intuitively, when A and B are maximally monotone, one would expect to use equal resolvent parameters \(\gamma =\delta \); in other cases, γ and δ are no longer the same. This phenomenon was initially observed in [11, 12]. Although the imbalance of monotonicity can be resolved by shifting the identity between the operators as in [11, Remark 4.15], our plan is to conduct the convergence analysis of the algorithm applied to the original operators.
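To see the iteration in action, here is a minimal Python sketch (all names, constants, and parameter values are our own illustrative choices) that runs \(T_{A,B,C}\) on a one-dimensional toy instance in which both resolvents are available in closed form; the parameter choices can be checked against the conditions established in Theorem 4.5 below.

    # Toy instance on X = R:  Ax = a*(x - p),  Bx = b*(x - q),  Cx = c*(x - r).
    # A is maximally a-monotone, B is maximally b-monotone (b < 0: weakly
    # monotone), and C is (1/c)-cocoercive.
    a, b, c = 2.0, -0.5, 1.0
    p, q, r = 1.0, -2.0, 3.0

    gamma = 0.4                             # resolvent parameter for A
    delta = gamma / (1 + 2 * gamma * a)     # one admissible choice for B; cf. (57)
    lam = 1 + delta / gamma
    eta = 1.0                               # relaxation; here eta < eta* of (63)

    def J_gamma_A(x):                       # solves u + gamma*a*(u - p) = x
        return (x + gamma * a * p) / (1 + gamma * a)

    def J_delta_B(x):
        return (x + delta * b * q) / (1 + delta * b)

    def T(x):                               # one evaluation of T_{A,B,C} in (3)
        u = J_gamma_A(x)
        return x - eta * u + eta * J_delta_B((1 - lam) * x + lam * u - delta * c * (u - r))

    x = 0.0
    for _ in range(200):
        x = T(x)

    shadow = J_gamma_A(x)                   # shadow point: approximate zero of A + B + C
    print(shadow, a * (shadow - p) + b * (shadow - q) + c * (shadow - r))

Here the unique zero of \(A+B+C\) is \(\frac{ap+bq+cr}{a+b+c} =2.4\), which the shadow sequence \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) approaches.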

To motivate the use of (3), the following result shows the relationship between the fixed point set of \(T_{A,B,C}\) and the solution set of (1).

Proposition 2.1

(fixed points of \(T_{A,B,C}\))

Let \(T_{A,B,C}\) be defined by (3). Then \(\operatorname{Fix}T_{A,B,C}\neq \varnothing \) if and only if \(\operatorname{zer}(A+B+C)\neq \varnothing \). Moreover, if \(J_{\gamma A}\) is single-valued, then

$$ J_{\gamma A}(\operatorname{Fix}T_{A,B,C}) =\operatorname{zer}(A+B+C). $$
(4)

Proof

Let \(x\in \operatorname{dom}T_{A,B,C}\). We have

$$ T_{A,B,C} x =\bigl\{ {x -\eta a +\eta J_{(\lambda -1)\gamma B} \bigl((1- \lambda )x +\lambda a -(\lambda -1)\gamma Ca \bigr)}\mid {a\in J_{ \gamma A}x}\bigr\} . $$
(5)

Therefore,

$$\begin{aligned} x\in \operatorname{Fix}T_{A,B,C} &\quad\iff\quad \exists a\in J_{\gamma A}x, \quad a\in J_{(\lambda -1)\gamma B} \bigl((1-\lambda )x +\lambda a -( \lambda -1)\gamma Ca \bigr) \end{aligned}$$
(6a)
$$\begin{aligned} &\quad\iff\quad \exists a\in J_{\gamma A}x,\quad (1-\lambda )x +\lambda a -( \lambda -1)\gamma Ca -a\in (\lambda -1)\gamma Ba \end{aligned}$$
(6b)
$$\begin{aligned} &\quad\iff\quad \exists a\in X,\quad x-a\in \gamma Aa \quad \text{and}\quad a-x\in \gamma Ba+\gamma Ca \end{aligned}$$
(6c)
$$\begin{aligned} &\quad\iff\quad \exists a\in J_{\gamma A}x\cap \operatorname{zer}(A+B+C), \end{aligned}$$
(6d)

which completes the proof. □

In the rest of this section, we recall some preliminary concepts and results. Let \(T\colon X\to X\) be a single-valued operator on X. Then T is nonexpansive if it is Lipschitz continuous with constant 1 on its domain, i.e.,

$$ \forall x, y\in \operatorname{dom}T,\quad \Vert Tx-Ty \Vert \leq \Vert x-y \Vert . $$
(7)

The operator T is said to be conically averaged with constant \(\theta \in \mathbb{R}_{++}\) (see [4, 7]) if there exists a nonexpansive operator \(N\colon X\to X\) such that

$$ T =(1-\theta )\mathrm{Id}+\theta N. $$
(8)

A conically θ-averaged operator is θ-averaged when \(\theta \in {]0,1 [}\) and nonexpansive when \(\theta =1\). Further properties are discussed in the following result from [4, Proposition 2.2].

Proposition 2.2

Let \(T\colon X\to X\), \(\theta \in \mathbb{R}_{++}\), and \(\lambda \in \mathbb{R}_{++}\). Then the following are equivalent:

  1. (i)

    T is conically θ-averaged.

  2. (ii)

    \((1-\lambda )\mathrm{Id}+\lambda T\) is conically λθ-averaged.

  3. (iii)

    For all \(x,y\in \operatorname{dom}T\),

    $$ \Vert Tx-Ty \Vert ^{2}\leq \Vert x-y \Vert ^{2} - \biggl(\frac{1}{\theta }-1 \biggr) \bigl\Vert ( \mathrm{Id}-T)x -( \mathrm{Id}-T)y \bigr\Vert ^{2}. $$
    (9)

Recall from [11] that an operator \(A\colon X\rightrightarrows X\) is α-monotone for some \(\alpha \in \mathbb{R}\) if

$$ \forall (x,u), (y,v)\in \operatorname{gra}A,\quad \langle {x-y},{u-v} \rangle \geq \alpha \Vert x-y \Vert ^{2}. $$
(10)

We say that A is monotone if \(\alpha =0\), strongly monotone if \(\alpha >0\), and weakly monotone if \(\alpha <0\). The operator A is said to be maximally α-monotone if it is α-monotone and there is no α-monotone operator \(B\colon X\rightrightarrows X\) such that \(\operatorname{gra}B\) properly contains \(\operatorname{gra}A\).

We say that A is σ-cocoercive if \(\sigma \in \mathbb{R}_{++}\) and

$$ \forall (x,u), (y,v)\in \operatorname{gra}A,\quad \langle {x-y},{u-v} \rangle \geq \sigma \Vert u-v \Vert ^{2}. $$
(11)
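For example, the identity operator is 1-cocoercive and, by the Baillon–Haddad theorem (see [6]), the gradient of a differentiable convex function whose gradient is Lipschitz continuous with constant \(1/\sigma \) is σ-cocoercive; operators of the latter type are precisely how C arises in Sect. 6.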

Clearly, if A is σ-cocoercive, then A is single-valued and monotone. In fact, σ-cocoercivity was extended to σ-comonotonicity to allow for a negative parameter σ; see [4, 7] for more details. Next, we recall a result from [11, Lemma 3.3 and Proposition 3.4].

Proposition 2.3

(single-valued and full domain)

Let \(A\colon X\rightrightarrows X\) be α-monotone and let \(\gamma \in \mathbb{R}_{++}\) such that \(1+\gamma \alpha >0\). Then the following hold:

  1. (i)

    \(J_{\gamma A}\) is single-valued and \((1+\gamma \alpha )\)-cocoercive.

  2. (ii)

    \(\operatorname{dom}J_{\gamma A}=X\) if and only if A is maximally α-monotone.
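As a quick illustration of (i), take \(A =\alpha \,\mathrm{Id}\) with \(1+\gamma \alpha >0\): then \(J_{\gamma A} =\frac{1}{1+\gamma \alpha }\mathrm{Id}\), and a direct computation gives \(\langle {x-y},{J_{\gamma A}x-J_{\gamma A}y} \rangle =(1+\gamma \alpha ) \Vert J_{\gamma A}x-J_{\gamma A}y \Vert ^{2}\), so \(J_{\gamma A}\) is indeed \((1+\gamma \alpha )\)-cocoercive.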

Finally, we recall the demiclosedness principle for cocoercive operators developed in [2]. A fundamental result in the theory of nonexpansive mappings is Browder’s celebrated demiclosedness principle [9]. It was extended to finitely many firmly nonexpansive mappings in [5] and was later generalized in [2] to a finite family of conically averaged mappings or a finite family of cocoercive mappings. An immediate application of the demiclosedness principle is to provide a simple proof of the weak convergence of the shadow sequence of the Douglas–Rachford algorithm [5] and of the adaptive Douglas–Rachford algorithm [2]. For our analysis, we recall only the result for two operators.

Proposition 2.4

(demiclosedness principle for balanced cocoercive operators)

Let \(T_{1}: X\to X\) and \(T_{2}: X\to X\) be respectively \(\sigma _{1}\)- and \(\sigma _{2}\)-cocoercive, let \((x_{n})_{n\in \mathbb{N}}\) and \((z_{n})_{n\in \mathbb{N}}\) be sequences in X, and let \(\rho _{1},\rho _{2}\in \mathbb{R}_{++}\) be such that

$$ \frac{\rho _{1}\sigma _{1}+\rho _{2}\sigma _{2}}{\rho _{1}+\rho _{2}} \geq 1. $$
(12)

Suppose that as \(n\to +\infty \),

$$\begin{aligned}& x_{n}\rightharpoonup x^{*},\qquad z_{n} \rightharpoonup z^{*}, \end{aligned}$$
(13a)
$$\begin{aligned}& T_{1}x_{n}\rightharpoonup y^{*},\qquad T_{2}z_{n}\rightharpoonup y^{*}, \end{aligned}$$
(13b)
$$\begin{aligned}& \rho _{1}(x_{n}-T_{1}x_{n}) + \rho _{2}(z_{n}-T_{2}z_{n})\to \rho _{1}\bigl(x^{*}-y^{*}\bigr)+ \rho _{2} \bigl(z^{*}-y^{*}\bigr), \end{aligned}$$
(13c)
$$\begin{aligned}& T_{1}x_{n}-T_{2}z_{n}\to 0. \end{aligned}$$
(13d)

Then \(y^{*} =T_{1}x^{*} =T_{2}z^{*}\).

Proof

Apply [2, Theorem 3.2] for two operators. □

3 An abstract convergence result

In order to study \(T_{A,B,C}\), it is reasonable to consider the general operator

$$ T :=\mathrm{Id}-\eta T_{1} +\eta T_{2}(-\nu \mathrm{Id}+ \lambda T_{1} -\delta T_{3}T_{1}), $$
(14)

where \(T_{1},T_{2},T_{3}\colon X\to X\) and \(\eta ,\nu ,\lambda ,\delta \in \mathbb{R}_{++}\). In this section, we establish a convergence result for the operator T under the cocoercivity of \(T_{1}\), \(T_{2}\), \(T_{3}\). We begin with a useful technical lemma.

Lemma 3.1

Let a, b, c, d be in X and let η, ν, λ, δ be in \(\mathbb{R}_{++}\). Set \(e :=-\nu a +\lambda b -\delta c\) and \(f :=a -\eta b +\eta d\). Then, for all \(\sigma \in \mathbb{R}_{++}\),

$$\begin{aligned} \Vert f \Vert ^{2} &= \Vert a \Vert ^{2} - \biggl( \frac{\lambda }{\eta \nu } - \frac{\delta }{2\eta \nu \sigma } -1 \biggr) \Vert a-f \Vert ^{2} - \frac{\delta }{2\eta \nu \sigma } \Vert a-f -2\eta \sigma c \Vert ^{2} \\ &\quad{} +\frac{\lambda \eta }{\nu } \Vert b \Vert ^{2} +\frac{\lambda \eta }{\nu } \Vert d \Vert ^{2} -2\eta \langle {a},{b} \rangle -2 \frac{\eta }{\nu } \langle {e},{d} \rangle - \frac{2\delta \eta }{\nu }\bigl( \langle {c},{b} \rangle - \sigma \Vert c \Vert ^{2}\bigr). \end{aligned}$$
(15)

Proof

By assumption, \(a-f =\eta (b-d)\) and \(\lambda b =\nu a +\delta c +e\), which imply that

$$\begin{aligned} \frac{\lambda }{\eta ^{2}} \Vert a-f \Vert ^{2} &=\lambda \Vert b-d \Vert ^{2} \end{aligned}$$
(16a)
$$\begin{aligned} &=\lambda \Vert b \Vert ^{2} +\lambda \Vert d \Vert ^{2} -2\lambda \langle {b},{d} \rangle \end{aligned}$$
(16b)
$$\begin{aligned} &=\lambda \Vert b \Vert ^{2} +\lambda \Vert d \Vert ^{2} -2 \langle {\nu a + \delta c +e},{d} \rangle \end{aligned}$$
(16c)
$$\begin{aligned} &=\lambda \Vert b \Vert ^{2} +\lambda \Vert d \Vert ^{2} -2\nu \langle {a},{d} \rangle -2\delta \langle {c},{d} \rangle -2 \langle {e},{d} \rangle . \end{aligned}$$
(16d)

Writing \(d =b -\frac{1}{\eta }(a-f)\), we have that

$$ -2\nu \langle {a},{d} \rangle =-2\nu \langle {a},{b} \rangle + \frac{2\nu }{\eta } \langle {a},{a-f} \rangle =-2\nu \langle {a},{b} \rangle + \frac{\nu }{\eta }\bigl( \Vert a \Vert ^{2} + \Vert a-f \Vert ^{2} - \Vert f \Vert ^{2}\bigr) $$
(17)

and that

$$\begin{aligned} -2\delta \langle {c},{d} \rangle &= -2\delta \langle {c},{b} \rangle + \frac{2\delta }{\eta } \langle {c},{a-f} \rangle \end{aligned}$$
(18a)
$$\begin{aligned} &=-2\delta \bigl( \langle {c},{b} \rangle -\sigma \Vert c \Vert ^{2}\bigr) - 2 \delta \sigma \Vert c \Vert ^{2} + \frac{2\delta }{\eta } \langle {c},{a-f} \rangle \end{aligned}$$
(18b)
$$\begin{aligned} &= -2\delta \bigl( \langle {c},{b} \rangle -\sigma \Vert c \Vert ^{2}\bigr) - \frac{\delta }{2\eta ^{2}\sigma } \Vert a-f -2\eta \sigma c \Vert ^{2} + \frac{\delta }{2\eta ^{2}\sigma } \Vert a-f \Vert ^{2}. \end{aligned}$$
(18c)

Substituting (17) and (18a)–(18c) into (16d) yields

$$\begin{aligned} \frac{\nu }{\eta } \Vert f \Vert ^{2} &=\frac{\nu }{\eta } \Vert a \Vert ^{2} - \biggl( \frac{\lambda }{\eta ^{2}} -\frac{\delta }{2\eta ^{2}\sigma } - \frac{\nu }{\eta } \biggr) \Vert a-f \Vert ^{2} -\frac{\delta }{2\eta ^{2}\sigma } \Vert a-f -2\eta \sigma c \Vert ^{2} \\ &\quad{} +\lambda \Vert b \Vert ^{2} +\lambda \Vert d \Vert ^{2} -2\nu \langle {a},{b} \rangle -2 \langle {e},{d} \rangle -2\delta \bigl( \langle {c},{b} \rangle -\sigma \Vert c \Vert ^{2}\bigr), \end{aligned}$$
(19)

which implies the conclusion. □

The following proposition is inspired by [13, Proposition 2.1].

Proposition 3.2

Let \(T_{1}\), \(T_{2}\), and \(T_{3}\) be respectively \(\sigma _{1}\)-, \(\sigma _{2}\)-, and \(\sigma _{3}\)-cocoercive. Let \(\eta , \nu ,\lambda ,\delta \in \mathbb{R}_{++}\) and define

$$ T :=\mathrm{Id}-\eta T_{1} +\eta T_{2}(-\nu \mathrm{Id}+ \lambda T_{1} -\delta T_{3}T_{1}). $$
(20)

Then the following hold:

  1. (i)

    If \(\lambda =2\nu \sigma _{1} =2\sigma _{2}\) and

    $$ \eta ^{*} :=\frac{1}{\nu } \biggl(\lambda -\frac{\delta }{2\sigma _{3}} \biggr)>0, $$
    (21)

    then, for all \(x,y\in \operatorname{dom}T\),

    $$\begin{aligned} \Vert Tx-Ty \Vert ^{2}&\leq \Vert x-y \Vert ^{2} - \biggl(\frac{\eta ^{*}}{\eta }-1 \biggr) \bigl\Vert ( \mathrm{Id}-T)x-(\mathrm{Id}-T)y \bigr\Vert ^{2} \\ &\quad{} -\frac{\delta }{2\eta \nu \sigma _{3}} \bigl\Vert (\mathrm{Id}-T)x-( \mathrm{Id}-T)y-2\eta \sigma _{3}(T_{3}T_{1}x-T_{3}T_{1}y) \bigr\Vert ^{2}. \end{aligned}$$
    (22)
  2. (ii)

    If \(\lambda <\nu \sigma _{1}+\sigma _{2}\) and

    $$ \eta ^{*} :=\frac{1}{\nu } \biggl( \frac{(2\nu \sigma _{1}-\lambda )(2\sigma _{2}-\lambda )}{2(\nu \sigma _{1}+\sigma _{2}-\lambda )}+ \lambda - \frac{\delta }{2\sigma _{3}} \biggr)>0, $$
    (23)

    then, for all \(x,y\in \operatorname{dom}T\),

    $$\begin{aligned} \Vert Tx-Ty \Vert ^{2}&\leq \Vert x-y \Vert ^{2} - \biggl(\frac{\eta ^{*}}{\eta }-1 \biggr) \bigl\Vert ( \mathrm{Id}-T)x-(\mathrm{Id}-T)y \bigr\Vert ^{2} \\ &\quad{}-\frac{\delta }{2\eta \nu \sigma _{3}} \bigl\Vert (\mathrm{Id}-T)x-( \mathrm{Id}-T)y-2\eta \sigma _{3}(T_{3}T_{1}x-T_{3}T_{1}y) \bigr\Vert ^{2} \\ &\quad{}-\frac{\eta }{2\nu (\nu \sigma _{1}+\sigma _{2}-\lambda )} \\ &\quad {}\times \bigl\Vert (2\nu \sigma _{1}-\lambda ) (T_{1}x-T_{1}y) +(2\sigma _{2}-\lambda ) (T_{2}Sx-T_{2}Sy) \bigr\Vert ^{2}, \end{aligned}$$
    (24)

    where \(S:=-\nu \mathrm{Id}+\lambda T_{1}-\delta T_{3}T_{1}\).

In both cases, T is conically \(\frac{\eta }{\eta ^{*}}\)-averaged.

Proof

Let \(x,y\in \operatorname{dom}T\) be arbitrary and set \(S :=-\nu \mathrm{Id}+\lambda T_{1} -\delta T_{3}T_{1}\). Then \(T =\mathrm{Id}-\eta T_{1} +\eta T_{2}S\). Define

$$\begin{aligned}& a :=x-y, \qquad b :=T_{1}x-T_{1}y, \end{aligned}$$
(25a)
$$\begin{aligned}& c :=T_{3}T_{1}x-T_{3}T_{1}y, \qquad d :=T_{2}Sx-T_{2}Sy, \end{aligned}$$
(25b)
$$\begin{aligned}& e :=Sx-Sy, \qquad f :=Tx-Ty. \end{aligned}$$
(25c)

Then \(e =-\nu a +\lambda b -\delta c\) and \(f =a -\eta b +\eta d\). By applying Lemma 3.1 (with \(\sigma =\sigma _{3}\)),

$$\begin{aligned} \Vert f \Vert ^{2} &= \Vert a \Vert ^{2} - \biggl(\frac{\lambda }{\eta \nu } - \frac{\delta }{2\eta \nu \sigma _{3}} -1 \biggr) \Vert a-f \Vert ^{2} - \frac{\delta }{2\eta \nu \sigma _{3}} \Vert a-f -2\eta \sigma _{3} c \Vert ^{2} \\ &\quad{} +\frac{\lambda \eta }{\nu } \Vert b \Vert ^{2} +\frac{\lambda \eta }{\nu } \Vert d \Vert ^{2} -2\eta \langle {a},{b} \rangle -2 \frac{\eta }{\nu } \langle {e},{d} \rangle - \frac{2\delta \eta }{\nu }\bigl( \langle {c},{b} \rangle - \sigma _{3} \Vert c \Vert ^{2}\bigr). \end{aligned}$$
(26)

On the other hand, the cocoercivity of \(T_{1}\), \(T_{2}\), and \(T_{3}\) yields

$$ \langle {a},{b} \rangle \geq \sigma _{1} \Vert b \Vert ^{2},\qquad \langle {e},{d} \rangle \geq \sigma _{2} \Vert d \Vert ^{2},\qquad \langle {b},{c} \rangle \geq \sigma _{3} \Vert c \Vert ^{2}. $$
(27)

Combining this with (26), we obtain that

$$\begin{aligned} \Vert f \Vert ^{2} &\leq \Vert a \Vert ^{2} - \biggl(\frac{\lambda }{\eta \nu } - \frac{\delta }{2\eta \nu \sigma _{3}} -1 \biggr) \Vert a-f \Vert ^{2} - \frac{\delta }{2\eta \nu \sigma _{3}} \Vert a-f -2\eta \sigma _{3} c \Vert ^{2} \\ &\quad {}-\frac{\eta }{\nu }(2\nu \sigma _{1}-\lambda ) \Vert b \Vert ^{2} - \frac{\eta }{\nu }(2\sigma _{2}-\lambda ) \Vert d \Vert ^{2}. \end{aligned}$$
(28)

(i): Since \(\lambda =2\nu \sigma _{1}=2\sigma _{2}\), (28) reduces to

$$\begin{aligned} \Vert f \Vert ^{2} &\leq \Vert a \Vert ^{2} - \biggl(\frac{\eta ^{*}}{\eta } -1 \biggr) \Vert a-f \Vert ^{2} - \frac{\delta }{2\eta \nu \sigma _{3}} \Vert a-f -2\eta \sigma _{3} c \Vert ^{2}, \end{aligned}$$
(29)

which gives (22).

(ii): Set \(\kappa :=2\nu \sigma _{1}-\lambda \) and \(\mu :=2\sigma _{2}-\lambda \). Then \(\kappa +\mu =2(\nu \sigma _{1}+\sigma _{2}-\lambda ) >0\) and

$$\begin{aligned} &(2\nu \sigma _{1}-\lambda ) \Vert b \Vert ^{2} +(2 \sigma _{2}-\lambda ) \Vert d \Vert ^{2} \\ &\quad =\kappa \Vert b \Vert ^{2} +\mu \Vert d \Vert ^{2} \end{aligned}$$
(30a)
$$\begin{aligned} &\quad =\frac{1}{\kappa +\mu } \Vert \kappa b+\mu d \Vert ^{2} + \frac{\kappa \mu }{\kappa +\mu } \Vert b-d \Vert ^{2} \end{aligned}$$
(30b)
$$\begin{aligned} &\quad =\frac{1}{2(\nu \sigma _{1}+\sigma _{2}-\lambda )} \bigl\Vert (2\nu \sigma _{1}- \lambda )b +(2 \sigma _{2}-\lambda )d \bigr\Vert ^{2} \\ &\qquad {}+ \frac{(2\nu \sigma _{1}-\lambda )(2\sigma _{2}-\lambda )}{2\eta ^{2}(\nu \sigma _{1}+\sigma _{2}-\lambda )} \Vert a-f \Vert ^{2}, \end{aligned}$$
(30c)

where the last equality is due to the fact that \(b-d= \frac{1}{\eta }(a-f)\). Substituting into (28), we get

$$\begin{aligned} \Vert f \Vert ^{2} &\leq \Vert a \Vert ^{2} - \biggl( \frac{(2\nu \sigma _{1}-\lambda )(2\sigma _{2}-\lambda )}{2\eta \nu (\nu \sigma _{1}+\sigma _{2}-\lambda )}+ \frac{\lambda }{\eta \nu }-\frac{\delta }{2\eta \nu \sigma _{3}} -1 \biggr) \Vert a-f \Vert ^{2} \\ &\quad{} -\frac{\delta }{2\eta \nu \sigma _{3}} \Vert a-f -2\eta \sigma _{3} c \Vert ^{2} \\ &\quad {}-\frac{\eta }{2\nu (\nu \sigma _{1}+\sigma _{2}-\lambda )} \bigl\Vert (2 \nu \sigma _{1}- \lambda )b +(2\sigma _{2}-\lambda )d \bigr\Vert ^{2}, \end{aligned}$$
(31)

which proves (24).

Finally, in both cases (i) and (ii), we have that

$$ \Vert Tx-Ty \Vert ^{2}\leq \Vert x-y \Vert ^{2} - \biggl(\frac{\eta ^{*}}{\eta }-1 \biggr) \bigl\Vert (\mathrm{Id}-T)x-( \mathrm{Id}-T)y \bigr\Vert ^{2}, $$
(32)

which implies that T is conically \(\frac{\eta }{\eta ^{*}}\)-averaged due to Proposition 2.2(i) and (iii). □

In what follows, we say that \((x_{n})_{n\in \mathbb{N}}\) is a sequence generated by T if, for all \(n\in \mathbb{N}\), \(x_{n+1}\in Tx_{n}\).

Theorem 3.3

(abstract convergence)

Let \(T_{1}\), \(T_{2}\), and \(T_{3}\) be respectively \(\sigma _{1}\)-, \(\sigma _{2}\)-, and \(\sigma _{3}\)-cocoercive. Let \(\eta , \nu ,\lambda ,\delta \in \mathbb{R}_{++}\) and define

$$ T :=\mathrm{Id}-\eta T_{1} +\eta T_{2}(-\nu \mathrm{Id}+ \lambda T_{1} -\delta T_{3}T_{1}). $$
(33)

Suppose that \(\operatorname{Fix}T\neq \varnothing \) and that either

  1. (a)

    \(\lambda =2\nu \sigma _{1} =2\sigma _{2}\) and \(\eta <\eta ^{*} :=\frac{1}{\nu } (\lambda - \frac{\delta }{2\sigma _{3}} )\); or

  2. (b)

    \(\lambda <\nu \sigma _{1}+\sigma _{2}\) and \(\eta <\eta ^{*} :=\frac{1}{\nu } ( \frac{(2\nu \sigma _{1}-\lambda )(2\sigma _{2}-\lambda )}{2(\nu \sigma _{1}+\sigma _{2}-\lambda )}+ \lambda -\frac{\delta }{2\sigma _{3}} )\).

Let \((x_{n})_{n\in \mathbb{N}}\subset \operatorname{dom}T\) be a sequence generated by T and set \(S :=-\nu \mathrm{Id}+\lambda T_{1} -\delta T_{3}T_{1}\). Then the following hold:

  1. (i)

    T is \(\frac{\eta }{\eta ^{*}}\)-averaged. Consequently, \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T\) and the rate of asymptotic regularity of T is \(o(1/\sqrt{n})\), i.e., \(\|x_{n}-Tx_{n}\| =o(1/\sqrt{n})\).

  2. (ii)

    \((T_{3}T_{1}x_{n})_{n\in \mathbb{N}}\) converges strongly to \(T_{3}T_{1}x^{*}\) and \(T_{3}T_{1}(\operatorname{Fix}T) =\{T_{3}T_{1}x^{*}\}\).

  3. (iii)

    If (a) holds and \(\nu =\lambda -1\), then \((T_{1}x_{n})_{n\in \mathbb{N}}\) and \((T_{2}Sx_{n})_{n\in \mathbb{N}}\) converge weakly to \(T_{1}x^{*} =T_{2}Sx^{*}\).

  4. (iv)

    If (b) holds, then \((T_{1}x_{n})_{n\in \mathbb{N}}\) and \((T_{2}Sx_{n})_{n\in \mathbb{N}}\) converge strongly to \(T_{1}x^{*} =T_{2}Sx^{*}\) and \(T_{1}(\operatorname{Fix}T) = T_{2}S(\operatorname{Fix}T) =\{T_{1}x^{*} \}\).

Proof

Set \(\omega _{1} :=\frac{\eta ^{*}}{\eta }-1\), \(\omega _{2} :=\frac{\delta }{2\eta \nu \sigma _{3}}\), and

$$ \omega _{3} := \textstyle\begin{cases} 0 &\text{if } \lambda =2\nu \sigma _{1} =2\sigma _{2}, \\ \frac{\eta }{2\nu (\nu \sigma _{1}+\sigma _{2}-\lambda )} &\text{if } \lambda < \nu \sigma _{1}+\sigma _{2}. \end{cases} $$
(34)

Then \(\omega _{1} >0\), \(\omega _{2} >0\), and \(\omega _{3} \geq 0\). We derive from Proposition 3.2 that, for all \(x,y\in \operatorname{dom}T\),

$$\begin{aligned} \Vert Tx-Ty \Vert ^{2}&\leq \Vert x-y \Vert ^{2} -\omega _{1} \bigl\Vert (\mathrm{Id}-T)x-( \mathrm{Id}-T)y \bigr\Vert ^{2} \\ &\quad {}-\omega _{2} \bigl\Vert (\mathrm{Id}-T)x-( \mathrm{Id}-T)y-2 \eta \sigma _{3}(T_{3}T_{1}x-T_{3}T_{1}y) \bigr\Vert ^{2} \\ &\quad {}-\omega _{3} \bigl\Vert (2\nu \sigma _{1}-\lambda ) (T_{1}x-T_{1}y) +(2 \sigma _{2}-\lambda ) (T_{2}Sx-T_{2}Sy) \bigr\Vert ^{2} \end{aligned}$$
(35)

and T is conically \(\frac{\eta }{\eta ^{*}}\)-averaged.

(i): Since \(\eta <\eta ^{*}\), T is \(\frac{\eta }{\eta ^{*}}\)-averaged. By [4, Corollary 2.10], \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T\) and the rate of asymptotic regularity of T is \(o(1/\sqrt{n})\).

(ii): Let \(y\in \operatorname{Fix}T\). It follows from (35) that, for all \(n \in \mathbb{N}\),

$$\begin{aligned} \Vert x_{n+1}-y \Vert ^{2}&\leq \Vert x_{n}-y \Vert ^{2} -\omega _{1} \bigl\Vert ( \mathrm{Id}-T)x_{n} \bigr\Vert ^{2} \\ &\quad {}-\omega _{2} \bigl\Vert (\mathrm{Id}-T)x_{n}-2 \eta \sigma _{3}(T_{3}T_{1}x_{n}-T_{3}T_{1}y) \bigr\Vert ^{2} \\ &\quad {}-\omega _{3} \bigl\Vert (2\nu \sigma _{1}-\lambda ) (T_{1}x_{n}-T_{1}y) +(2 \sigma _{2}- \lambda ) (T_{2}Sx_{n}-T_{2}Sy) \bigr\Vert ^{2}. \end{aligned}$$
(36)

Telescoping this inequality yields

$$\begin{aligned} &\omega _{1}\sum_{n=0}^{\infty } \bigl\Vert (\mathrm{Id}-T)x_{n} \bigr\Vert ^{2} + \omega _{2}\sum_{n=0}^{\infty } \bigl\Vert (\mathrm{Id}-T)x_{n}-2\eta \sigma _{3}(T_{3}T_{1}x_{n}-T_{3}T_{1}y) \bigr\Vert ^{2} \\ &\qquad{}+\omega _{3}\sum_{n=0}^{\infty } \bigl\Vert (2\nu \sigma _{1}-\lambda ) (T_{1}x_{n}-T_{1}y) +(2\sigma _{2}-\lambda ) (T_{2}Sx_{n}-T_{2}Sy) \bigr\Vert ^{2} \\ &\quad \leq \Vert x_{0}-y \Vert ^{2} < +\infty . \end{aligned}$$
(37)

Since \(\omega _{1},\omega _{2} >0\) and \(\omega _{3} \geq 0\), we deduce that, as \(n\to +\infty \),

$$ (\mathrm{Id}-T)x_{n}\to 0 \quad \text{and}\quad ( \mathrm{Id}-T)x_{n}-2 \eta \sigma _{3}(T_{3}T_{1}x_{n}-T_{3}T_{1}y) \to 0, $$
(38)

which imply that

$$ T_{3}T_{1}x_{n}\to T_{3}T_{1}y. $$
(39)

As y is arbitrary in \(\operatorname{Fix}T\) and the limit of \(T_{3}T_{1}x_{n}\) is unique, \(T_{3}T_{1}\) must be constant on \(\operatorname{Fix}T\). It follows that \(T_{3}T_{1}(\operatorname{Fix}T) =\{T_{3}T_{1}x^{*}\}\).

(iii): We will apply the demiclosedness principle in Proposition 2.4 to prove that \((T_{1}x_{n})_{n\in \mathbb{N}}\) converges weakly to \(T_{1}x^{*}\). First, recall from (i) that

$$ x_{n}\rightharpoonup x^{*}\in \operatorname{Fix}T. $$
(40)

As a result, \((x_{n})_{n\in \mathbb{N}}\) is bounded, and so is \((T_{1}x_{n})_{n\in \mathbb{N}}\). Let \(y^{*}\) be a weak cluster point of \((T_{1}x_{n})_{n\in \mathbb{N}}\). Then there exists a subsequence \((x_{k_{n}})_{n\in \mathbb{N}}\) such that

$$ T_{1}x_{k_{n}}\rightharpoonup y^{*}. $$
(41)

Define \(z_{n} :=Sx_{n} =(1-\lambda )x_{n} +\lambda T_{1}x_{n} -\delta T_{3}T_{1}x_{n}\). Since \(T_{3}T_{1}x_{n}\to T_{3}T_{1}x^{*}\) by (ii), it follows that

$$ z_{k_{n}}\rightharpoonup (1-\lambda ) x^{*} + \lambda y^{*} -\delta T_{3}T_{1}x^{*} =:z^{*}. $$
(42)

Next, we have from (i) that

$$ T_{1}x_{k_{n}} -T_{2}z_{k_{n}} =(T_{1}-T_{2}S)x_{k_{n}} = \frac{1}{\eta }( \mathrm{Id}-T)x_{k_{n}}\to 0, $$
(43)

which, due to (41), implies that

$$ T_{2}z_{k_{n}}\rightharpoonup y^{*}. $$
(44)

Set \(\rho _{1} :=\lambda -1 =\nu >0\) and \(\rho _{2} :=1\). Then

$$ \frac{\rho _{1}\sigma _{1}+\rho _{2}\sigma _{2}}{\rho _{1}+\rho _{2}} = \frac{(\lambda -1)\frac{\lambda }{2\nu }+1\cdot \frac{\lambda }{2}}{\lambda }=1 $$
(45)

and it follows from (42) that

$$ \rho _{1}\bigl(x^{*}-y^{*}\bigr)+\rho _{2}\bigl(z^{*}-y^{*}\bigr) =-\delta T_{3}T_{1}x^{*}. $$
(46)

Using the definition of \(z_{n}\) and then (43), we obtain

$$\begin{aligned} &\rho _{1}(x_{k_{n}}-T_{1}x_{k_{n}}) +\rho _{2}(z_{k_{n}}-T_{2}z_{k_{n}}) \\ &\quad =(\lambda -1) (x_{k_{n}}-T_{1}x_{k_{n}}) +(1-\lambda )x_{k_{n}} + \lambda T_{1}x_{k_{n}} -\delta T_{3}T_{1}x_{k_{n}} -T_{2}z_{k_{n}} \\ &\quad =T_{1}x_{k_{n}} -T_{2}z_{k_{n}} -\delta T_{3}T_{1}x_{k_{n}} \\ &\quad \to -\delta T_{3}T_{1}x^{*}=\rho _{1}\bigl(x^{*}-y^{*}\bigr) + \rho _{2} \bigl(z^{*}-y^{*}\bigr). \end{aligned}$$
(47)

Now, in view of (40), (41), (42), (43), (44), (45), and (47), we apply Proposition 2.4 to derive that

$$ y^{*} =T_{1}x^{*} =T_{2}z^{*}, $$
(48)

which is the unique weak cluster point of \((T_{1}x_{n})_{n\in \mathbb{N}}\). Thus, \(T_{1}x_{n}\rightharpoonup T_{1}x^{*}\). Since \(T_{1}x_{n}-T_{2}Sx_{n} =\frac{1}{\eta }(\mathrm{Id}-T)x_{n} \to 0\) and \(x^{*}\in \operatorname{Fix}T\), we derive that \(T_{2}Sx_{n}\rightharpoonup T_{1}x^{*} =T_{2}Sx^{*}\).

(iv): In this case, \(\omega _{3}>0\). So (37) implies that, as \(n\to +\infty \),

$$ (2\nu \sigma _{1}-\lambda ) (T_{1}x_{n}-T_{1}y) +(2\sigma _{2}- \lambda ) (T_{2}Sx_{n}-T_{2}Sy) \to 0. $$
(49)

On the other hand,

$$ (T_{1}x_{n}-T_{1}y) -(T_{2}Sx_{n}-T_{2}Sy) =\frac{1}{\eta }( \mathrm{Id}-T)x_{n} -\frac{1}{\eta }( \mathrm{Id}-T)y = \frac{1}{\eta }(\mathrm{Id}-T)x_{n}\to 0, $$
(50)

which together with (49) yields

$$ T_{1}x_{n}\to T_{1}y \quad \text{and}\quad T_{2}Sx_{n}\to T_{2}Sy. $$
(51)

Since y is arbitrary in \(\operatorname{Fix}T\) and \(x^{*}\in \operatorname{Fix}T\), it also follows that \(T_{1}y =T_{1}x^{*}\) and \(T_{2}Sy =T_{2}Sx^{*} =T_{1}x^{*}\). Hence, \(T_{1}(\operatorname{Fix}T) =T_{2}S(\operatorname{Fix}T) =\{T_{1}x^{*} \}\). The proof is complete. □

4 Zeros of the sum of three operators

In this section, we apply the abstract result of Sect. 3 to the problem of finding a zero of the sum of three operators. We assume that the operator A is maximally α-monotone, the operator B is maximally β-monotone, and the operator C is σ-cocoercive. We will consider two cases: \(\alpha +\beta =0\) and \(\alpha +\beta >0\).

Theorem 4.1

(convergence in the case \(\alpha +\beta =0\))

Suppose that A and B are respectively maximally α- and β-monotone with \(\alpha +\beta =0\), and C is σ-cocoercive. Let \(\eta \in \mathbb{R}_{++}\) and let \(\gamma \in \mathbb{R}_{++}\) be such that

$$ 1+2\gamma \alpha >0 \quad \textit{and}\quad \eta ^{*} :=2+2\gamma \alpha - \frac{\gamma }{2\sigma } >0. $$
(52)

Set \(\delta =\frac{\gamma }{1+2\gamma \alpha }\), \(\lambda =1+\frac{\delta }{\gamma }\), and let \((x_{n})_{n\in \mathbb{N}}\) be a sequence generated by \(T_{A,B,C}\) in (3). Then the following hold:

  1. (i)

    \(T_{A,B,C}\) is single-valued and has full domain.

  2. (ii)

    For all \(x,y\in X\),

    $$\begin{aligned} &\Vert T_{A,B,C} x-T_{A,B,C} y \Vert ^{2} \\ &\quad \leq \Vert x-y \Vert ^{2} - \biggl( \frac{\eta ^{*}}{\eta }-1 \biggr) \bigl\Vert (\mathrm{Id}-T_{A,B,C})x-( \mathrm{Id}-T_{A,B,C})y \bigr\Vert ^{2} \\ &\qquad{}-\frac{\gamma }{2\eta \sigma } \bigl\Vert (\mathrm{Id}-T_{A,B,C})x -(\mathrm{Id}-T_{A,B,C})y-2\eta \sigma (CJ_{\gamma A}x-CJ_{ \gamma A}y) \bigr\Vert ^{2}. \end{aligned}$$
    (53)

    In particular, \(T_{A,B,C}\) is conically \(\frac{\eta }{\eta ^{*}}\)-averaged.

  3. (iii)

    If \(\operatorname{zer}(A+B+C)\neq \varnothing \) and \(\eta <\eta ^{*}\), then the rate of asymptotic regularity of \(T_{A,B,C}\) is \(o(1/\sqrt{n})\) and \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T_{A,B,C}\). Moreover, setting \(S :=(1-\lambda )\mathrm{Id}+\lambda J_{\gamma A} -\delta CJ_{\gamma A}\), the sequences \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) and \((J_{\delta B}Sx_{n})_{n\in \mathbb{N}}\) converge weakly to \(J_{\gamma A}x^{*} =J_{\delta B}Sx^{*}\in \operatorname{zer}(A+B+C)\), \((CJ_{\gamma A}x_{n})_{n\in \mathbb{N}}\) converges strongly to \(CJ_{\gamma A}x^{*}\), and \(C(\operatorname{zer}(A+B+C)) =\{CJ_{\gamma A}x^{*}\}\).

Proof

First, we can check that there always exists \(\gamma \in \mathbb{R}_{++}\) such that (52) holds (indeed, by choosing \(\gamma >0\) satisfying \(1/\gamma >\max \{-2\alpha , -\alpha +1/(4\sigma )\}\)). Next, we have that \(1+\gamma \alpha =1/2 +(1+2\gamma \alpha )/2 >0\). Since \(\alpha +\beta =0\), we also have

$$ 1+\delta \beta =1-\delta \alpha =1- \frac{\gamma \alpha }{1+2\gamma \alpha } = \frac{1+\gamma \alpha }{1+2\gamma \alpha } >0. $$
(54)

By Proposition 2.3, \(J_{\gamma A}\), \(J_{\delta B}\), and hence \(T_{A,B,C}\) are single-valued and have full domain. This proves (i).

Next, Proposition 2.3 also implies that \(J_{\gamma A}\) and \(J_{\delta B}\) are respectively \((1+\gamma \alpha )\)- and \((1+\delta \beta )\)-cocoercive. Set \(\sigma _{1} :=1+\gamma \alpha >0\), \(\sigma _{2} :=1+\delta \beta >0\), \(\sigma _{3} :=\sigma >0\), and \(\nu :=\lambda -1 >0\). Then \(2\nu \sigma _{1} =2(\lambda -1)(1+\gamma \alpha ) =2(1+\gamma \alpha )/(1+2\gamma \alpha ) =\lambda \) and, by (54), \(2\sigma _{2} =2(1+\delta \beta ) =2(1+\gamma \alpha )/(1+2\gamma \alpha ) =\lambda \). Also,

$$ \frac{1}{\nu } \biggl(\lambda -\frac{\delta }{2\sigma _{3}} \biggr) = \frac{1}{\lambda -1} \biggl(\lambda - \frac{(\lambda -1)\gamma }{2\sigma } \biggr) =1+ \frac{\gamma }{\delta }- \frac{\gamma }{2\sigma } =\eta ^{*} >0. $$
(55)

Applying Proposition 3.2(i), we get (ii).

Now, by Proposition 2.1, \(J_{\gamma A}(\operatorname{Fix}T_{A,B,C}) =\operatorname{zer}(A+B+C)\). We then apply Theorem 3.3(i)–(iii) to complete the proof. □
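To illustrate the parameter conditions in (52), take \(\alpha =\frac{1}{2}\), \(\beta =-\frac{1}{2}\), and \(\sigma =1\). Then \(1+2\gamma \alpha =1+\gamma >0\) and \(\eta ^{*} =2+\gamma -\frac{\gamma }{2} =2+\frac{\gamma }{2} >0\) for every \(\gamma \in \mathbb{R}_{++}\), so any \(\gamma \) is admissible, with \(\delta =\frac{\gamma }{1+\gamma }\) and \(\lambda =1+\frac{1}{1+\gamma }\). If instead \(\alpha =-\frac{1}{2}\) and \(\beta =\frac{1}{2}\), then (52) requires \(1-\gamma >0\) and \(2-\gamma -\frac{\gamma }{2} >0\), that is, \(\gamma \in {]0,1 [}\).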

Using Theorem 4.1, we recover the results in [13, Theorem 2.1(1)], which partly spurred our interest in the topic.

Corollary 4.2

Suppose that A and B are maximally monotone and that C is σ-cocoercive. Let \(\gamma \in {]0, 4\sigma [}\), \(\eta \in {]0, 2-\frac{\gamma }{2\sigma } [}\), and define

$$ T_{A,B,C} :=\mathrm{Id}-\eta J_{\gamma A} +\eta J_{\gamma B}(2J_{ \gamma A} -\mathrm{Id}-\gamma CJ_{\gamma A}). $$
(56)

Let \((x_{n})_{n\in \mathbb{N}}\) be a sequence generated by \(T_{A,B,C}\). Then the following hold:

  1. (i)

    \(T_{A,B,C}\) is \(\frac{2\eta \sigma }{4\sigma -\gamma }\)-averaged.

  2. (ii)

    If \(\operatorname{zer}(A+B+C)\neq \varnothing \), then the rate of asymptotic regularity of \(T_{A,B,C}\) is \(o(1/\sqrt{n})\) and \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T_{A,B,C}\), while \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) and \((J_{\gamma B}(2J_{\gamma A} -\mathrm{Id}-\gamma CJ_{\gamma A})x_{n})_{n \in \mathbb{N}}\) converge weakly to \(J_{\gamma A}x^{*} =J_{\gamma B}(2J_{\gamma A} -\mathrm{Id}- \gamma CJ_{\gamma A})x^{*}\in \operatorname{zer}(A+B+C)\), \((CJ_{\gamma A}x_{n})_{n\in \mathbb{N}}\) converges strongly to \(CJ_{\gamma A}x^{*}\), and \(C(\operatorname{zer}(A+B+C)) =\{CJ_{\gamma A}x^{*}\}\).

Proof

Apply Theorem 4.1 with \(\alpha =\beta =0\), \(\delta =\gamma \), and \(\eta ^{*} =2-\frac{\gamma }{2\sigma }\). □

Remark 4.3

(range of parameter γ)

We note that while Corollary 4.2(i) is straightforward from [13, Proposition 2.1], Corollary 4.2(ii) improves upon [13, Theorem 2.1(1)] by only requiring the parameter \(\gamma \in {]0, 4\sigma [}\) instead of \(\gamma \in {]0, 2\sigma \varepsilon [}\) with \(\varepsilon \in {]0, 1 [}\).

Next, we consider the case \(\alpha +\beta >0\). This case allows for some flexibility in choosing the resolvent parameters γ, δ. In particular, recall that in the case \(\alpha +\beta =0\) of Theorem 4.1, the resolvent parameters γ and δ must be directly related by

$$ \delta =\frac{\gamma }{1+2\gamma \alpha }, \quad \text{or equivalently,}\quad \frac{1}{\delta }=\frac{1}{\gamma }+2 \alpha . $$
(57)

In the case \(\alpha +\beta >0\), the above exact relation is no longer necessary; instead, for given γ, one can choose δ within a range such that

$$ \max \biggl\{ 0,\frac{1}{\gamma }+2\alpha -2\sqrt{\Delta } \biggr\} < \frac{1}{\delta } < \frac{1}{\gamma }+2\alpha +2\sqrt{\Delta } $$
(58)

for some positive Δ that depends on α, β, and γ. In the next results, we will show that such choices for \((\gamma ,\delta )\) always exist and will guarantee convergence of the algorithm.

Lemma 4.4

(existence of resolvent parameters)

Let \(\alpha ,\beta \in \mathbb{R}\) be such that \(\alpha +\beta >0\), let \(\sigma \in \mathbb{R}_{++}\), and let \(\gamma ,\delta \in \mathbb{R}_{++}\). Set

$$ \gamma _{0} := \textstyle\begin{cases} 0 &\textit{if } \alpha \geq \frac{1}{4\sigma }, \\ -\alpha +\frac{1}{4\sigma } &\textit{if } -\frac{1}{4\sigma } \leq \alpha < \frac{1}{4\sigma }, \\ 2\beta -2\sqrt{(\alpha +\beta )(\beta -\frac{1}{4\sigma })} &\textit{if } \alpha < -\frac{1}{4\sigma }. \end{cases} $$
(59)

Then \(\gamma _{0}\geq \max \{0, -\alpha +\frac{1}{4\sigma }\}\) and the following statements are equivalent:

  1. (i)

    \(\frac{4\gamma \delta (1+\gamma \alpha )(1+\delta \beta )-(\gamma +\delta )^{2}}{2\gamma \delta ^{2}(\alpha +\beta )}- \frac{\gamma }{2\sigma } >0\).

  2. (ii)

    \(\frac{1}{\gamma } >\gamma _{0}\) and \(\max \{0, \frac{1}{\gamma }+2\alpha -2\sqrt{\Delta }\} < \frac{1}{\delta } <\frac{1}{\gamma }+2\alpha +2\sqrt{\Delta }\), where \(\Delta :=(\alpha +\beta )(\frac{1}{\gamma }+\alpha -\frac{1}{4\sigma })\).

Consequently, there always exist \(\gamma ,\delta \in \mathbb{R}_{++}\) that satisfy both (i) and (ii).

Proof

If \(\alpha \geq -\frac{1}{4\sigma }\), then \(\gamma _{0} =\max \{0, -\alpha +\frac{1}{4\sigma }\}\) by definition. If \(\alpha < -\frac{1}{4\sigma } <0\), then \(\beta -\frac{1}{4\sigma } >\beta +\alpha >0\) and

$$\begin{aligned} \gamma _{0} &=2\beta -2\sqrt{(\alpha +\beta ) \biggl(\beta - \frac{1}{4\sigma }\biggr)} = \biggl(\sqrt{\beta -\frac{1}{4\sigma }} -\sqrt{ \alpha +\beta } \biggr)^{2} -\alpha +\frac{1}{4\sigma } \end{aligned}$$
(60a)
$$\begin{aligned} &\geq -\alpha +\frac{1}{4\sigma } =\max \biggl\{ 0, -\alpha + \frac{1}{4\sigma } \biggr\} . \end{aligned}$$
(60b)

Next, we have that

$$\begin{aligned} & \frac{4\gamma \delta (1+\gamma \alpha )(1+\delta \beta )-(\gamma +\delta )^{2}}{2\gamma \delta ^{2}(\alpha +\beta )}- \frac{\gamma }{2\sigma } >0 \end{aligned}$$
(61a)
$$\begin{aligned} &\quad\iff\quad (\gamma +\delta )^{2} < 4\gamma \delta (1+\gamma \alpha ) (1+ \delta \beta ) -\frac{\gamma ^{2}\delta ^{2}(\alpha +\beta )}{\sigma } \end{aligned}$$
(61b)
$$\begin{aligned} &\quad\iff\quad \biggl(1-4\gamma \beta -4\gamma ^{2}\alpha \beta + \frac{\gamma ^{2}(\alpha +\beta )}{\sigma } \biggr)\delta ^{2} -2 \gamma (1+2\gamma \alpha ) \delta +\gamma ^{2} < 0 \end{aligned}$$
(61c)
$$\begin{aligned} &\quad\iff\quad \biggl(\frac{1}{\gamma ^{2}}-4\beta \frac{1}{\gamma }-4 \alpha \beta + \frac{\alpha +\beta }{\sigma } \biggr) -2 \biggl( \frac{1}{\gamma }+2\alpha \biggr) \frac{1}{\delta } + \frac{1}{\delta ^{2}} < 0 \end{aligned}$$
(61d)
$$\begin{aligned} &\quad\iff\quad \Delta = (\alpha +\beta ) \biggl(\frac{1}{\gamma } +\alpha - \frac{1}{4\sigma }\biggr)>0 \quad \text{and} \\ &\hphantom{\quad\iff\quad\ } \frac{1}{\gamma }+2\alpha -2 \sqrt{ \Delta } < \frac{1}{\delta } < \frac{1}{\gamma }+2\alpha +2\sqrt{\Delta } \end{aligned}$$
(61e)
$$\begin{aligned} &\quad\iff\quad \frac{1}{\gamma } > -\alpha +\frac{1}{4\sigma } \quad \text{and}\quad \frac{1}{\gamma }+2\alpha -2\sqrt{\Delta } < \frac{1}{\delta } < \frac{1}{\gamma }+2\alpha +2\sqrt{\Delta }. \end{aligned}$$
(61f)

Suppose that (ii) holds. Then \(\frac{1}{\gamma }>\gamma _{0}\geq \max \{0, -\alpha + \frac{1}{4\sigma }\}\), so (61f) holds. It follows that (61a) holds, which is (i).

Now, suppose that (i) holds. Then (61f) holds, and so \(\frac{1}{\gamma } >\max \{0, -\alpha +\frac{1}{4\sigma }\}\) and \(\frac{1}{\gamma }+2\alpha +2\sqrt{\Delta } >0\). To obtain (ii), it suffices to show that \(\frac{1}{\gamma } >\gamma _{0}\). If \(\alpha \geq -\frac{1}{4\sigma }\), then \(\gamma _{0} =\max \{0, -\alpha +\frac{1}{4\sigma }\}\), and we readily have \(\frac{1}{\gamma } >\gamma _{0}\). Let us consider the case when \(\alpha < -\frac{1}{4\sigma }\). Then \(\beta -\frac{1}{4\sigma } >\beta +\alpha >0\) and

$$\begin{aligned} \frac{1}{\gamma }+2\alpha +2\sqrt{\Delta } >0 &\quad\iff\quad \biggl(\sqrt{ \alpha +\beta } +\sqrt{\frac{1}{\gamma }+\alpha -\frac{1}{4\sigma }} \biggr)^{2} > \beta -\frac{1}{4\sigma } \end{aligned}$$
(62a)
$$\begin{aligned} &\quad\iff\quad \sqrt{\frac{1}{\gamma }+\alpha -\frac{1}{4\sigma }} >\sqrt{ \beta - \frac{1}{4\sigma }} -\sqrt{\alpha +\beta } \end{aligned}$$
(62b)
$$\begin{aligned} &\quad\iff\quad \frac{1}{\gamma }+\alpha -\frac{1}{4\sigma } > \biggl(\sqrt{ \beta - \frac{1}{4\sigma }} -\sqrt{\alpha +\beta } \biggr)^{2} \end{aligned}$$
(62c)
$$\begin{aligned} &\quad\iff\quad \frac{1}{\gamma } > 2\beta -2\sqrt{(\alpha +\beta ) \biggl(\beta - \frac{1}{4\sigma }\biggr)}=\gamma _{0}, \end{aligned}$$
(62d)

which proves our claim.

To see the existence of γ and δ, we choose \(\gamma >0\) such that \(\frac{1}{\gamma } >\gamma _{0}\) and then choose \(\delta >0\) that satisfies the second condition in (ii). □
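In computations, Lemma 4.4 yields a direct recipe for selecting the pair \((\gamma ,\delta )\). The following short Python sketch (the helper names are our own) evaluates \(\gamma _{0}\) from (59) and the admissible interval (58) for \(1/\delta \):

    import math

    def gamma_0(alpha, beta, sigma):
        # threshold from (59); assumes alpha + beta > 0 and sigma > 0
        if alpha >= 1 / (4 * sigma):
            return 0.0
        if alpha >= -1 / (4 * sigma):
            return -alpha + 1 / (4 * sigma)
        return 2 * beta - 2 * math.sqrt((alpha + beta) * (beta - 1 / (4 * sigma)))

    def inv_delta_range(alpha, beta, sigma, gamma):
        # open interval (58) for 1/delta, valid whenever 1/gamma > gamma_0
        assert 1 / gamma > gamma_0(alpha, beta, sigma)
        root = 2 * math.sqrt((alpha + beta) * (1 / gamma + alpha - 1 / (4 * sigma)))
        return max(0.0, 1 / gamma + 2 * alpha - root), 1 / gamma + 2 * alpha + root

    lo, hi = inv_delta_range(-0.5, 1.0, 1.0, 0.5)   # e.g., alpha = -0.5, beta = 1, sigma = 1
    delta = 2.0 / (lo + hi)                         # midpoint of the interval for 1/delta

Any δ with \(1/\delta \) strictly inside the returned interval makes \(\eta ^{*}\) in (63) positive, by the equivalence (i)\(\iff \)(ii).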

We are now ready to prove the convergence of the algorithm for the case \(\alpha +\beta >0\).

Theorem 4.5

(convergence in the case \(\alpha +\beta >0\))

Suppose that A and B are respectively maximally α- and β-monotone with \(\alpha +\beta >0\), that C is σ-cocoercive, and that \(\gamma ,\delta \in \mathbb{R}_{++}\) satisfy

$$ \eta ^{*} := \frac{4\gamma \delta (1+\gamma \alpha )(1+\delta \beta )-(\gamma +\delta )^{2}}{2\gamma \delta ^{2}(\alpha +\beta )}- \frac{\gamma }{2\sigma } >0. $$
(63)

Set \(\lambda =1+\frac{\delta }{\gamma }\) and let \(\eta \in \mathbb{R}_{++}\). Let \((x_{n})_{n\in \mathbb{N}}\) be a sequence generated by \(T_{A,B,C}\) in (3) and set \(S :=(1-\lambda )\mathrm{Id}+\lambda J_{\gamma A} -\delta CJ_{ \gamma A}\). Then the following hold:

  1. (i)

    \(T_{A,B,C}\) is single-valued and has full domain.

  2. (ii)

    For all \(x,y\in X\),

    $$\begin{aligned} &\Vert T_{A,B,C} x-T_{A,B,C} y \Vert ^{2} \\ &\quad \leq \Vert x-y \Vert ^{2} - \biggl( \frac{\eta ^{*}}{\eta }-1 \biggr) \bigl\Vert (\mathrm{Id}-T_{A,B,C})x-( \mathrm{Id}-T_{A,B,C})y \bigr\Vert ^{2} \\ &\qquad{}-\frac{\gamma }{2\eta \sigma } \bigl\Vert (\mathrm{Id}-T_{A,B,C})x -(\mathrm{Id}-T_{A,B,C})y-2\eta \sigma (CJ_{\gamma A}x-CJ_{ \gamma A}y) \bigr\Vert ^{2} \\ &\qquad{}-\frac{\gamma \eta }{2\delta ^{2}(\alpha +\beta )} \\ &\qquad {}\times \bigl\Vert (\lambda -2+2 \delta \alpha ) (J_{\gamma A}x-J_{\gamma A}y) +(2-\lambda +2\delta \beta ) (J_{\delta B}Sx-J_{\delta B}Sy) \bigr\Vert ^{2}. \end{aligned}$$
    (64)

    In particular, \(T_{A,B,C}\) is conically \(\frac{\eta }{\eta ^{*}}\)-averaged.

  3. (iii)

    If \(\operatorname{zer}(A+B+C)\neq \varnothing \) and \(\eta <\eta ^{*}\), then the rate of asymptotic regularity of \(T_{A,B,C}\) is \(o(1/\sqrt{n})\) and \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T_{A,B,C}\), while \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) and \((J_{\delta B}Sx_{n})_{n\in \mathbb{N}}\) converge strongly to \(J_{\gamma A}x^{*} =J_{\delta B}Sx^{*}\in \operatorname{zer}(A+B+C)\), \((CJ_{\gamma A}x_{n})_{n\in \mathbb{N}}\) converges strongly to \(CJ_{\gamma A}x^{*}\), and \(\operatorname{zer}(A+B+C) =\{J_{\gamma A}x^{*}\}\).

Proof

First, Lemma 4.4 ensures the existence of \(\gamma ,\delta \in \mathbb{R}_{++}\) satisfying (63). In view of (63), it also follows from Lemma 4.4 that \(1/\gamma >-\alpha +\frac{1}{4\sigma }\), and so \(1+\gamma \alpha >\gamma /(4\sigma ) >0\), which together with (63) implies that \(1+\delta \beta >0\). In turn, Proposition 2.3 implies that \(J_{\gamma A}\), \(J_{\delta B}\), and hence \(T_{A,B,C}\) are single-valued and have full domain, and we get (i).

We also derive from Proposition 2.3 that \(J_{\gamma A}\) and \(J_{\delta B}\) are \((1+\gamma \alpha )\)- and \((1+\delta \beta )\)-cocoercive, respectively. Now, set \(\sigma _{1} :=1+\gamma \alpha >0\), \(\sigma _{2} :=1+\delta \beta >0\), \(\sigma _{3} :=\sigma >0\), and \(\nu :=\lambda -1 >0\). On the one hand, since \(\alpha +\beta >0\),

$$ \nu \sigma _{1}+\sigma _{2} =(\lambda -1) (1+\gamma \alpha ) +(1+ \delta \beta ) =\lambda +\delta (\alpha +\beta ) >\lambda . $$
(65)

On the other hand,

$$\begin{aligned} &\frac{1}{\nu } \biggl( \frac{(2\nu \sigma _{1}-\lambda )(2\sigma _{2}-\lambda )}{2(\nu \sigma _{1}+\sigma _{2}-\lambda )}+ \lambda -\frac{\delta }{2\sigma _{3}} \biggr) \\ &\quad =\frac{1}{\nu } \biggl( \frac{4\nu \sigma _{1}\sigma _{2}-\lambda ^{2}}{2(\nu \sigma _{1}+\sigma _{2}-\lambda )}- \frac{\delta }{2\sigma _{3}} \biggr) \end{aligned}$$
(66a)
$$\begin{aligned} &\quad =\frac{\gamma }{\delta } \biggl( \frac{4\gamma \delta (1+\gamma \alpha )(1+\delta \beta )-(\gamma +\delta )^{2}}{2\gamma ^{2}\delta (\alpha +\beta )}- \frac{\delta }{2\sigma } \biggr) \end{aligned}$$
(66b)
$$\begin{aligned} &\quad = \frac{4\gamma \delta (1+\gamma \alpha )(1+\delta \beta )-(\gamma +\delta )^{2}}{2\gamma \delta ^{2}(\alpha +\beta )}- \frac{\gamma }{2\sigma } =\eta ^{*} >0. \end{aligned}$$
(66c)

Therefore, we obtain (ii) due to Proposition 3.2(ii).

Finally, applying Theorem 3.3(i), (ii), and (iv) and noting that \(J_{\gamma A}(\operatorname{Fix}T_{A,B,C}) =\operatorname{zer}(A+B+C)\) due to Proposition 2.1, we complete the proof. □

5 Zeros of the sum of two operators

The new results in Theorems 4.1 and 4.5 allow us to revisit the relaxed forward-backward, relaxed backward-forward, and adaptive Douglas–Rachford algorithms for finding a zero of the sum of two operators.

Theorem 5.1

(relaxed forward-backward)

Suppose that B is maximally β-monotone with \(\beta \in \mathbb{R}_{+}\) and that C is σ-cocoercive. Let \(\gamma \in {]0, 4\sigma [}\), \(\eta \in {]0, 2-\frac{\gamma }{2\sigma } [}\), and let \((x_{n})_{n\in \mathbb{N}}\) be a sequence generated by

$$ T_{\mathrm{FB}} :=(1-\eta )\mathrm{Id}+\eta J_{\gamma B}( \mathrm{Id}-\gamma C). $$
(67)

Then the following hold:

  1. (i)

    For all \(x,y\in X\),

    $$\begin{aligned} \Vert T_{\mathrm{FB}} x-T_{\mathrm{FB}} y \Vert ^{2} &\leq \Vert x-y \Vert ^{2} - \biggl( \frac{4\sigma -\gamma }{2\eta \sigma }-1 \biggr) \bigl\Vert (\mathrm{Id}-T_{ \mathrm{FB}})x-(\mathrm{Id}-T_{\mathrm{FB}})y \bigr\Vert ^{2} \\ &\quad {}-\frac{\gamma }{2\eta \sigma } \bigl\Vert (\mathrm{Id}-T_{ \mathrm{FB}})x -( \mathrm{Id}-T_{\mathrm{FB}})y-2\eta \sigma (Cx-Cy) \bigr\Vert ^{2}. \end{aligned}$$
    (68)

    In particular, \(T_{\mathrm{FB}}\) is \(\frac{2\eta \sigma }{4\sigma -\gamma }\)-averaged.

  2. (ii)

    If \(\operatorname{zer}(B+C)\neq \varnothing \), then the rate of asymptotic regularity of \(T_{\mathrm{FB}}\) is \(o(1/\sqrt{n})\) and \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{zer}(B+C)\), while \((Cx_{n})_{n\in \mathbb{N}}\) converges strongly to \(Cx^{*}\), and \(C(\operatorname{zer}(B+C)) =\{Cx^{*}\}\). Moreover, if additionally \(\beta >0\), then \((x_{n})_{n\in \mathbb{N}}\) converges strongly to \(x^{*}\) and \(\operatorname{zer}(B+C) =\{x^{*}\}\).

Proof

Apply Theorems 4.1 and 4.5 with \(A =0\), \(\alpha =0\), \(\lambda =2\), and \(\delta =\gamma \). □
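To see the reduction explicitly, note that \(A =0\) gives \(J_{\gamma A} =\mathrm{Id}\), so with \(\delta =\gamma \) and \(\lambda =2\) the operator (3) becomes

$$ T_{A,B,C} =\mathrm{Id}-\eta \mathrm{Id}+\eta J_{\gamma B}(- \mathrm{Id}+2\mathrm{Id}-\gamma C) =(1-\eta )\mathrm{Id}+\eta J_{ \gamma B}(\mathrm{Id}-\gamma C) =T_{\mathrm{FB}}. $$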

Theorem 5.2

(relaxed backward-forward)

Suppose that A is maximally α-monotone with \(\alpha \in \mathbb{R}_{+}\) and that C is σ-cocoercive. Let \(\gamma \in {]0, 4\sigma [}\), \(\eta \in {]0, 2-\frac{\gamma }{2\sigma } [}\), and let \((x_{n})_{n\in \mathbb{N}}\) be a sequence generated by

$$ T_{\mathrm{BF}} :=(1-\eta )\mathrm{Id}+\eta (\mathrm{Id}- \gamma C)J_{\gamma A}. $$
(69)

Then the following hold:

  1. (i)

    For all \(x,y\in X\),

    $$\begin{aligned} \Vert T_{\mathrm{BF}} x-T_{\mathrm{BF}} y \Vert ^{2} &\leq \Vert x-y \Vert ^{2} - \biggl( \frac{4\sigma -\gamma }{2\eta \sigma }-1 \biggr) \bigl\Vert (\mathrm{Id}-T_{ \mathrm{BF}})x-(\mathrm{Id}-T_{\mathrm{BF}})y \bigr\Vert ^{2} \\ &\quad {}-\frac{\gamma }{2\eta \sigma } \bigl\Vert (\mathrm{Id}-T_{ \mathrm{BF}})x -( \mathrm{Id}-T_{\mathrm{BF}})y-2\eta \sigma (Cx-Cy) \bigr\Vert ^{2}. \end{aligned}$$
    (70)

    In particular, \(T_{\mathrm{BF}}\) is \(\frac{2\eta \sigma }{4\sigma -\gamma }\)-averaged.

  2. (ii)

    If \(\operatorname{zer}(A+C)\neq \varnothing \), then the rate of asymptotic regularity of \(T_{\mathrm{BF}}\) is \(o(1/\sqrt{n})\) and \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T_{\mathrm{BF}}\), while \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) converges weakly to \(J_{\gamma A}x^{*}\in \operatorname{zer}(A+C)\), \((Cx_{n})_{n\in \mathbb{N}}\) converges strongly to \(Cx^{*}\), and \(C(\operatorname{zer}(A+C)) =\{Cx^{*}\}\). Moreover, if additionally \(\alpha >0\), then \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) converges strongly to \(J_{\gamma A}x^{*}\in \operatorname{zer}(A+C)\) and \(\operatorname{zer}(A+C) =\{J_{\gamma A}x^{*}\}\).

Proof

Apply Theorems 4.1 and 4.5 with \(B =0\), \(\beta =0\), \(\lambda =2\), and \(\delta =\gamma \). □
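Analogously, \(B =0\) gives \(J_{\delta B} =\mathrm{Id}\), so with \(\delta =\gamma \) and \(\lambda =2\) the operator (3) becomes

$$ T_{A,B,C} =\mathrm{Id}-\eta J_{\gamma A}+\eta (2J_{\gamma A}- \mathrm{Id}-\gamma CJ_{\gamma A}) =(1-\eta )\mathrm{Id}+\eta ( \mathrm{Id}-\gamma C)J_{\gamma A} =T_{\mathrm{BF}}. $$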

Theorem 5.3

(adaptive DR)

Suppose that A and B are respectively maximally α- and β-monotone and that either

  1. (a)

    \(\alpha +\beta =0\), \(1+2\gamma \alpha >0\), \(\delta =\frac{\gamma }{1+2\gamma \alpha }\), \(\eta ^{*} :=2+2\gamma \alpha \); or

  2. (b)

    \(\alpha +\beta >0\), \(\eta ^{*} := \frac{4\gamma \delta (1+\gamma \alpha )(1+\delta \beta )-(\gamma +\delta )^{2}}{2\gamma \delta ^{2}(\alpha +\beta )} >0\).

Let \(\lambda =1+\frac{\delta }{\gamma }\), \(\eta \in {]0, \eta ^{*} [}\), and let \((x_{n})_{n\in \mathbb{N}}\) be a sequence generated by

$$ T_{\mathrm{DR}} :=\mathrm{Id}-\eta J_{\gamma A} +\eta J_{\delta B}\bigl((1- \lambda )\mathrm{Id}+\lambda J_{\gamma A} \bigr). $$
(71)

Set \(S :=(1-\lambda )\mathrm{Id}+\lambda J_{\gamma A}\). Then the following hold:

  1. (i)

    \(T_{\mathrm{DR}}\) is \(\frac{\eta }{\eta ^{*}}\)-averaged and has full domain.

  2. (ii)

    If \(\operatorname{zer}(A+B)\neq \varnothing \), then the rate of asymptotic regularity of \(T_{\mathrm{DR}}\) is \(o(1/\sqrt{n})\) and \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T_{\mathrm{DR}}\) with \(J_{\gamma A}x^{*}\in \operatorname{zer}(A+B)\). Moreover, when (a) holds, \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) and \((J_{\delta B}Sx_{n})_{n\in \mathbb{N}}\) converge weakly to \(J_{\gamma A}x^{*} =J_{\delta B}Sx^{*}\); when (b) holds, \((J_{\gamma A}x_{n})_{n\in \mathbb{N}}\) and \((J_{\delta B}Sx_{n})_{n\in \mathbb{N}}\) converge strongly to \(J_{\gamma A}x^{*} =J_{\delta B}Sx^{*}\) and \(\operatorname{zer}(A+B) =\{J_{\gamma A}x^{*}\}\).

Proof

Apply Theorems 4.1 and 4.5 with \(C =0\), noting that the zero operator is σ-cocoercive for every \(\sigma \in \mathbb{R}_{++}\); letting \(\sigma \to +\infty \) in (52) and (63) yields the stated values of \(\eta ^{*}\). □

Remark 5.4

In terms of the range of the parameter γ, Theorems 5.1 and 5.2 only require \(\gamma \in { ]0,4\sigma [}\), improving the classical convergence results for the forward-backward and backward-forward algorithms, which require \(\gamma \in {]0,2\sigma [}\); see, e.g., [1, Corollaries 3.4 and 3.6]. On the other hand, as \(T_{\mathrm{DR}}\) in (71) is actually the adaptive DR operator (see [11, Lemma 4.1(ii)]), Theorem 5.3 unifies [11, Theorem 4.5], [4, Theorem 5.7], and [2, Theorem 4.1].

6 Minimizing the sum of three functions

In this section, we consider the problem of minimizing the sum of three functions. Let \(f\colon X\to {]-\infty ,+\infty ]}\). Then f is proper if \(\operatorname{dom}f :=\{{x\in X}\mid {f(x) <+\infty }\}\neq \varnothing \), and lower semicontinuous if \(\forall x\in X\), \(f(x)\leq \liminf_{z\to x} f(z)\). Given \(\alpha \in \mathbb{R}\), the function f is α-convex if \(\forall x,y\in \operatorname{dom}f\), \(\forall \kappa \in {]0,1 [}\),

$$ f\bigl((1-\kappa ) x+\kappa y\bigr) +\frac{\alpha }{2}\kappa (1- \kappa ) \Vert x-y \Vert ^{2} \leq (1-\kappa )f(x)+\kappa f(y). $$
(72)

We simply say f is convex if \(\alpha =0\). We also say that f is strongly convex if \(\alpha >0\) and weakly convex if \(\alpha <0\).
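For example, given \(\mu \in \mathbb{R}_{++}\), the function \(\frac{\mu }{2}\|\cdot \|^{2}\) is μ-convex, while \(\|\cdot \| -\frac{\mu }{2}\|\cdot \|^{2}\) is \((-\mu )\)-convex since adding \(\frac{\mu }{2}\|\cdot \|^{2}\) to it recovers the convex function \(\|\cdot \|\). In general, f is α-convex if and only if \(f -\frac{\alpha }{2}\|\cdot \|^{2}\) is convex.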

Next, let \(f:X\to {]-\infty ,+\infty ]}\) be proper. The Fréchet subdifferential of f at x is defined by

$$ \widehat{\partial }f(x) := \biggl\{ {u\in X}\Bigm| {\liminf _{z\to x} \frac{f(z)-f(x)- \langle {u},{z-x} \rangle }{ \Vert z-x \Vert } \geq 0} \biggr\} . $$
(73)

The proximity operator of f with parameter \(\gamma \in \mathbb{R}_{++}\) is the mapping \(\mathrm{Prox}_{\gamma f}\colon X\rightrightarrows X\) defined by

$$ \forall x\in X, \quad \mathrm{Prox}_{\gamma f}(x) := \operatorname*{argmin}_{z\in X} \biggl(f(z)+\frac{1}{2\gamma } \Vert z-x \Vert ^{2} \biggr). $$
(74)

We refer to [10] for a list of proximity operators of common convex functions. For an α-convex function, the relationship between its Fréchet subdifferential and its proximity operator is described in the following lemma.

Lemma 6.1

(proximity operators of α-convex functions)

Let \(f\colon X\to {]-\infty ,+\infty ]}\) be a proper, lower semicontinuous and α-convex function. Let \(\gamma \in \mathbb{R}_{++}\) be such that \(1+\gamma \alpha >0\). Then

  1. (i)

    ∂̂f is maximally α-monotone.

  2. (ii)

    \(\mathrm{Prox}_{\gamma f} =J_{\gamma \widehat{\partial }f}\) is single-valued and has full domain.

Proof

See [11, Lemma 5.2]. □

Now, we assume that \(f,g\colon X\to {]-\infty ,+\infty ]}\) are proper, lower semicontinuous, and respectively α- and β-convex functions, and that \(h\colon X\to \mathbb{R}\) is a differentiable convex function with Lipschitz continuous gradient. We will solve the minimization problem

$$ \min_{x\in X} f(x)+g(x)+h(x) $$
(75)

by employing the operator

$$ T_{f,g,h} :=\mathrm{Id}-\eta \mathrm{Prox}_{\gamma f} + \eta \mathrm{Prox}_{\delta g}\bigl((1-\lambda )\mathrm{Id}+ \lambda \mathrm{Prox}_{\gamma f}-\delta \nabla h \mathrm{Prox}_{\gamma f} \bigr) $$
(76)

with appropriately chosen parameters \(\gamma ,\delta ,\lambda ,\eta \in \mathbb{R}_{++}\).
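To make (76) concrete, here is a minimal numerical sketch in Python (the test problem and all names are our own illustrative assumptions): f is a strongly convex quadratic, \(g =\mu \|\cdot \|_{1}\) on \(X =\mathbb{R}^{n}\) (so \(\beta =0\)), and h is a least-squares term, so that all three building blocks of (76) have closed forms.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 20, 30
    M = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    p = rng.standard_normal(n)

    alpha, mu = 1.0, 0.1                       # f is alpha-strongly convex; g is convex (beta = 0)
    sigma = 1.0 / np.linalg.norm(M, 2) ** 2    # grad h is (1/sigma)-Lipschitz, so sigma-cocoercive

    def prox_f(x, t):                          # prox of t*f with f = (alpha/2)*||. - p||^2
        return (x + t * alpha * p) / (1 + t * alpha)

    def prox_g(x, t):                          # prox of t*g: componentwise soft-thresholding
        return np.sign(x) * np.maximum(np.abs(x) - t * mu, 0.0)

    def grad_h(x):                             # h(x) = 0.5*||Mx - b||^2
        return M.T @ (M @ x - b)

    gamma = delta = 0.9 * sigma                # with beta = 0 and delta = gamma, case (b) below
    lam = 1 + delta / gamma                    # gives lam = 2 and eta* = 2 - gamma/(2*sigma)
    eta = 0.9 * (2 - gamma / (2 * sigma))      # any eta in ]0, eta*[

    x = np.zeros(n)
    for _ in range(2000):
        u = prox_f(x, gamma)
        x = x - eta * u + eta * prox_g((1 - lam) * x + lam * u - delta * grad_h(u), delta)

    minimizer = prox_f(x, gamma)               # shadow sequence: approximate minimizer of f+g+h

Since \(\alpha +\beta >0\) here, Theorem 6.2(ii)(b) below guarantees that the shadow sequence converges strongly to the unique minimizer of \(f+g+h\).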

Theorem 6.2

(minimizing the sum of three functions)

Let \(f,g\colon X\to {]-\infty ,+\infty ]}\) be proper lower semicontinuous functions, and let \(h\colon X\to \mathbb{R}\) be a differentiable convex function whose gradient is Lipschitz continuous with constant \(1/\sigma \). Suppose that f and g are α-convex and β-convex, respectively, and that either

  1. (a)

    \(\alpha +\beta =0\), \(1+2\gamma \alpha >0\), \(\delta =\frac{\gamma }{1+2\gamma \alpha }\), \(\eta ^{*} :=2+2\gamma \alpha -\frac{\gamma }{2\sigma }\); or

  2. (b)

    \(\alpha +\beta >0\), \(\eta ^{*} := \frac{4\gamma \delta (1+\gamma \alpha )(1+\delta \beta )-(\gamma +\delta )^{2}}{2\gamma \delta ^{2}(\alpha +\beta )}- \frac{\gamma }{2\sigma } >0\).

Set \(\lambda =1 +\frac{\delta }{\gamma }\) and \(S :=(1-\lambda )\mathrm{Id}+\lambda \mathrm{Prox}_{ \gamma f}-\delta \nabla h \mathrm{Prox}_{\gamma f}\). Let \((x_{n})_{n\in {\mathbb{N}}}\) be a sequence generated by \(T_{f,g,h}\) in (76). Then the following hold:

  1. (i)

    \(T_{f,g,h}\) is conically \(\frac{\eta }{\eta ^{*}}\)-averaged and has full domain.

  2. (ii)

    If \(\operatorname{zer}(\widehat{\partial } f+\widehat{\partial } g+ \nabla h)\neq \varnothing \) and \(\eta <\eta ^{*}\), then the rate of asymptotic regularity of \(T_{f,g,h}\) is \(o(1/\sqrt{n})\) and \((x_{n})_{n\in \mathbb{N}}\) converges weakly to a point \(x^{*}\in \operatorname{Fix}T_{f,g,h}\) with

    $$ \mathrm{Prox}_{\gamma f}x^{*} \in \operatorname{zer}( \widehat{\partial } f+\widehat{\partial } g+\nabla h)\subseteq \operatorname*{argmin}(f+g+h), $$
    (77)

    while \((\nabla h \mathrm{Prox}_{\gamma f}x_{n})_{n\in \mathbb{N}}\) converges strongly to \(\nabla h \mathrm{Prox}_{\gamma f}x^{*}\) and \(\nabla h(\operatorname{zer}(\widehat{\partial } f+\widehat{\partial } g+ \nabla h)) =\{\nabla h \mathrm{Prox}_{\gamma f}x^{*}\}\). Moreover, when (a) holds, \((\mathrm{Prox}_{\gamma f}x_{n})_{n\in \mathbb{N}}\) and \((\mathrm{Prox}_{\delta g}Sx_{n})_{n\in \mathbb{N}}\) converge weakly to \(\mathrm{Prox}_{\gamma f}x^{*} =\mathrm{Prox}_{\delta g}Sx^{*}\); when (b) holds, \((\mathrm{Prox}_{\gamma f}x_{n})_{n\in \mathbb{N}}\) and \((\mathrm{Prox}_{\delta g}Sx_{n})_{n\in \mathbb{N}}\) converge strongly to \(\mathrm{Prox}_{\gamma f}x^{*} =\mathrm{Prox}_{\delta g}Sx^{*}\) and \(\operatorname{zer}(\widehat{\partial } f+\widehat{\partial } g+ \nabla h) =\{\mathrm{Prox}_{\gamma f}x^{*}\}\).

Proof

As in the proofs of Theorems 4.1 and 4.5, we have that \(1+\gamma \alpha >0\) and \(1+\delta \beta >0\). By Lemma 6.1, ∂̂f and ∂̂g are maximally α-monotone and β-monotone, respectively, and \(\mathrm{Prox}_{\gamma f} =J_{\gamma \widehat{\partial }f}\) and \(\mathrm{Prox}_{\delta g} =J_{\delta \widehat{\partial }g}\). By [6, Theorem 18.15(i)&(v)], ∇h is σ-cocoercive. In addition, from Proposition 2.1 and [11, Lemma 5.3], we obtain the following relationship between the fixed points of \(T_{f,g,h}\) and the minimizers of (75):

$$ \mathrm{Prox}_{\gamma f}(\operatorname{Fix}T_{f,g,h})= \operatorname{zer}(\widehat{\partial } f+\widehat{\partial } g+\nabla h) \subseteq \operatorname*{argmin}(f+g+h). $$
(78)

The conclusion then follows by applying Theorems 4.1 and 4.5 to \(A =\widehat{\partial }f\), \(B =\widehat{\partial }g\), and \(C =\nabla h\). □

Remark 6.3

(minimizing the sum of two functions)

Analogous to Sect. 5, one can apply Theorem 6.2 with \(f=0\), \(g=0\), or \(h=0\) to obtain corresponding algorithms for minimizing the sum of two functions.

Availability of data and materials

Not applicable.

References

  1. Attouch, H., Peypouquet, J., Redont, P.: Backward-forward algorithms for structured monotone inclusions in Hilbert spaces. J. Math. Anal. Appl. 457, 1095–1117 (2018)

  2. Bartz, S., Campoy, R., Phan, H.M.: Demiclosedness principles for generalized nonexpansive mappings. J. Optim. Theory Appl. 186(3), 759–778 (2020)

  3. Bartz, S., Campoy, R., Phan, H.M.: An adaptive alternating directions method of multipliers. Preprint (2021). arXiv:2103.07159

  4. Bartz, S., Dao, M.N., Phan, H.M.: Conical averagedness and convergence analysis of fixed point algorithms. J. Glob. Optim. (2021). https://doi.org/10.1007/s10898-021-01057-4

  5. Bauschke, H.H.: New demiclosedness principles for (firmly) nonexpansive operators. In: Bailey, D.H., Bauschke, H.H., Borwein, P., Garvan, F., Théra, M., Vanderwerff, J.D., Wolkowicz, H. (eds.) Computational and Analytical Mathematics, pp. 19–28. Springer, New York (2013)

  6. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Cham (2017)

  7. Bauschke, H.H., Moursi, W.M., Wang, X.: Generalized monotone operators and their averaged resolvents. Math. Program., Ser. B 189, 55–74 (2021)

  8. Briceño-Arias, L.M.: Forward-Douglas–Rachford splitting and forward-partial inverse method for solving monotone inclusions. Optimization 64(5), 1239–1261 (2015)

  9. Browder, F.E.: Semicontractive and semiaccretive nonlinear mappings in Banach spaces. Bull. Am. Math. Soc. 74, 660–665 (1968)

  10. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R.S., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer, New York (2011)

  11. Dao, M.N., Phan, H.M.: Adaptive Douglas–Rachford splitting algorithm for the sum of two operators. SIAM J. Optim. 29(4), 2697–2724 (2019)

  12. Dao, M.N., Phan, H.M.: Computing the resolvent of the sum of operators with application to best approximation problems. Optim. Lett. 14(5), 1193–1205 (2020)

  13. Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)

  14. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)

  15. Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  16. Raguet, H.: A note on the forward-Douglas–Rachford splitting for monotone inclusion and convex optimization. Optim. Lett. 13, 717–740 (2019)

  17. Raguet, H., Fadili, J., Peyré, G.: A generalized forward-backward splitting. SIAM J. Imaging Sci. 6(3), 1199–1226 (2013)


Acknowledgements

The authors would like to thank two anonymous referees for their constructive comments.

Funding

MND was partially supported by the Federation University Australia under Grant RGS21-8. HMP was partially supported by Autodesk, Inc. via a gift made to the Department of Mathematical Sciences, UMass Lowell.

Author information

Contributions

All authors contributed equally in this work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hung M. Phan.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Dao, M.N., Phan, H.M. An adaptive splitting algorithm for the sum of two generalized monotone operators and one cocoercive operator. Fixed Point Theory Algorithms Sci Eng 2021, 16 (2021). https://doi.org/10.1186/s13663-021-00701-8

