Let us begin by explicitly stating our algorithm for solving Problem (1.1) discussed in Section 1.
Algorithm 2.1
 Step 0.:

Take \(\delta, \sigma\in(0,1)\) with \(\delta\leq\sigma\). Choose \(x_{0} \in H\) arbitrarily and set \(d_{0} := -(x_{0} - T(x_{0}))\) and \(n:= 0\).
 Step 1.:

Compute \(\alpha_{n} \in(0,1]\) satisfying
$$\begin{aligned}& \bigl\Vert x_{n} ( \alpha_{n} ) - T \bigl( x_{n} ( \alpha _{n} ) \bigr) \bigr\Vert ^{2} - \bigl\Vert x_{n} - T ( x_{n} ) \bigr\Vert ^{2} \leq\delta \alpha_{n} \bigl\langle x_{n} - T (x_{n} ), d_{n} \bigr\rangle , \end{aligned}$$
(2.1)
$$\begin{aligned}& \bigl\langle x_{n} ( \alpha_{n} ) - T \bigl(x_{n} ( \alpha_{n} ) \bigr), d_{n} \bigr\rangle \geq\sigma \bigl\langle x_{n} - T (x_{n} ), d_{n} \bigr\rangle , \end{aligned}$$
(2.2)
where \(x_{n}(\alpha_{n}) := x_{n} + \alpha_{n} d_{n}\). Compute \(x_{n+1} \in H\) by
$$ x_{n+1} := x_{n} + \alpha_{n} d_{n}. $$
(2.3)
 Step 2.:

If \(\| x_{n+1} - T(x_{n+1}) \| = 0\), stop. Otherwise, go to Step 3.
 Step 3.:

Compute \(\beta_{n} \in\mathbb{R}\) by using one of the following formulas:
$$\begin{aligned}& \beta_{n}^{\mathrm{SD}} := 0, \\& \beta_{n}^{\mathrm{HS}+} := \max \biggl\{ \frac{ \langle x_{n+1} - T (x_{n+1} ), y_{n} \rangle}{ \langle d_{n}, y_{n} \rangle}, 0 \biggr\} , \qquad \beta_{n}^{\mathrm{FR}} := \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{\Vert x_{n} - T (x_{n} ) \Vert ^{2}}, \\& \beta_{n}^{\mathrm{PRP}+} := \max \biggl\{ \frac{ \langle x_{n+1} - T (x_{n+1} ), y_{n} \rangle}{ \Vert x_{n} - T (x_{n} ) \Vert ^{2}}, 0 \biggr\} ,\qquad \beta_{n}^{\mathrm{DY}} := \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{ \langle d_{n}, y_{n} \rangle}, \end{aligned}$$
(2.4)
where \(y_{n} := (x_{n+1} - T(x_{n+1})) - (x_{n} - T(x_{n}))\). Generate \(d_{n+1} \in H\) by
$$ d_{n+1} := - \bigl( x_{n+1} - T (x_{n+1} ) \bigr) + \beta_{n} d_{n}. $$
 Step 4.:

Put \(n := n+1\) and go to Step 1.
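The building blocks of Algorithm 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: H is taken to be \(\mathbb{R}^{m}\) with the Euclidean inner product, vectors are plain lists, and the helper names (`dot`, `residual`, `wolfe_type_ok`, `beta`, `next_direction`) are hypothetical and do not come from the source.

```python
def dot(u, v):
    """Euclidean inner product (assumed stand-in for the inner product of H)."""
    return sum(a * b for a, b in zip(u, v))

def residual(T, x):
    """Return x - T(x); x is a fixed point of T exactly when this vanishes."""
    return [a - b for a, b in zip(x, T(x))]

def wolfe_type_ok(T, x, d, alpha, delta, sigma):
    """Check the Wolfe-type conditions (2.1) and (2.2) for a candidate alpha."""
    g = residual(T, x)
    x_trial = [a + alpha * b for a, b in zip(x, d)]   # x_n(alpha) = x_n + alpha d_n
    g_trial = residual(T, x_trial)
    slope = dot(g, d)                                  # <x_n - T(x_n), d_n>
    cond_21 = dot(g_trial, g_trial) - dot(g, g) <= delta * alpha * slope
    cond_22 = dot(g_trial, d) >= sigma * slope
    return cond_21 and cond_22

def beta(kind, g_new, g_old, d):
    """Evaluate one of the formulas in (2.4); g_new and g_old are residuals."""
    y = [a - b for a, b in zip(g_new, g_old)]          # y_n
    if kind == "SD":
        return 0.0
    if kind == "FR":
        return dot(g_new, g_new) / dot(g_old, g_old)
    if kind == "PRP+":
        return max(dot(g_new, y) / dot(g_old, g_old), 0.0)
    if kind == "HS+":
        return max(dot(g_new, y) / dot(d, y), 0.0)
    if kind == "DY":
        return dot(g_new, g_new) / dot(d, y)
    raise ValueError(f"unknown rule: {kind}")

def next_direction(g_new, beta_n, d):
    """Step 3: d_{n+1} = -(x_{n+1} - T(x_{n+1})) + beta_n d_n."""
    return [-a + beta_n * b for a, b in zip(g_new, d)]
```

For a quick experiment, one might take the (hypothetical) nonexpansive mapping `T = lambda x: [0.5 * a for a in x]`, whose only fixed point is the origin, and call `wolfe_type_ok` with several values of `alpha` to see which step sizes are admissible.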
We need an appropriate line search algorithm to compute \(\alpha _{n}\) (\(n\in\mathbb{N}\)) satisfying (2.1) and (2.2). In Section 3, we use the line search of [21, Algorithm 4.6] (Algorithm 3.1), which yields step sizes satisfying (2.1) and (2.2) whenever it terminates [21, Theorem 4.7]. Although the efficiency of this line search depends on the parameters δ and σ, the guidance in [21, Section 6.1] allows us to set appropriate δ and σ before executing [21, Algorithm 4.6] and Algorithm 2.1. See Section 3 for the numerical performance of [21, Algorithm 4.6] and Algorithm 2.1.
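To make the role of the line search concrete, the following sketch runs the full loop of Algorithm 2.1 with a deliberately naive step-size search. It is not the line search of [21, Algorithm 4.6]: the function `naive_line_search` simply scans a grid of candidates in \((0,1]\) and is a hypothetical stand-in, the tolerance `tol` relaxes the exact test in Step 2, and H is assumed to be \(\mathbb{R}^{m}\).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(T, x):
    return [a - b for a, b in zip(x, T(x))]

def naive_line_search(T, x, d, delta, sigma, grid=64):
    """Scan alpha = 1, 1 - 1/grid, ..., 1/grid and return the first candidate
    satisfying (2.1) and (2.2), or None if no grid point qualifies."""
    g = residual(T, x)
    slope = dot(g, d)
    for i in range(grid):
        alpha = (grid - i) / grid
        x_trial = [a + alpha * b for a, b in zip(x, d)]
        g_trial = residual(T, x_trial)
        if (dot(g_trial, g_trial) - dot(g, g) <= delta * alpha * slope
                and dot(g_trial, d) >= sigma * slope):
            return alpha
    return None

def beta_sd(g_new, g_old, d):
    return 0.0

def beta_dy(g_new, g_old, d):
    y = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, g_new) / dot(d, y)

def algorithm_2_1(T, x0, beta_rule, delta=0.25, sigma=0.75, tol=1e-10, max_iter=200):
    """Return (x, number of iterations performed)."""
    x = list(x0)
    g = residual(T, x)
    d = [-a for a in g]                                  # Step 0
    for n in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:
            return x, n
        alpha = naive_line_search(T, x, d, delta, sigma)  # Step 1
        if alpha is None:
            raise RuntimeError("no grid point satisfies (2.1) and (2.2)")
        x = [a + alpha * b for a, b in zip(x, d)]         # (2.3)
        g_new = residual(T, x)
        if math.sqrt(dot(g_new, g_new)) <= tol:           # Step 2 (relaxed by tol)
            return x, n + 1
        b = beta_rule(g_new, g, d)                        # Step 3
        d = [-a + b * c for a, c in zip(g_new, d)]
        g = g_new                                         # Step 4
    return x, max_iter

# Hypothetical test mapping: T(x) = x / 2 is nonexpansive with Fix(T) = {0}.
T = lambda x: [0.5 * a for a in x]
x_dy, steps_dy = algorithm_2_1(T, [4.0, -2.0], beta_dy)
x_sd, steps_sd = algorithm_2_1(T, [4.0, -2.0], beta_sd)
```

A grid scan keeps the sketch short, but it may fail to find an admissible step even when one exists, which is why the function can return `None`; a practical implementation would use a bracketing line search instead.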
It can be seen that Algorithm 2.1 is well defined when \(\beta _{n}\) is defined by \(\beta_{n}^{\mathrm{SD}}\), \(\beta_{n}^{\mathrm{FR}}\), or \(\beta_{n}^{\mathrm{PRP}+}\). The discussion in Section 2.2 shows that Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{DY}}\) is well defined (Lemma 2.3(i)). Moreover, it is guaranteed that under certain assumptions, Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm {HS}+}\) is well defined (Theorem 2.5).
Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{SD}}\)
This subsection considers Algorithm 2.1 with \(\beta _{n}^{\mathrm{SD}}\) (\(n\in\mathbb{N}\)), which is based on the steepest descent (SD) direction (see (1.17)), i.e.,
$$ x_{n+1} := x_{n} + \alpha_{n} \bigl( T (x_{n} ) - x_{n} \bigr) \quad ( n\in\mathbb{N} ). $$
(2.5)
Theorems 4 and 8 in [17] indicate that, if \((\alpha _{n})_{n\in\mathbb{N}}\) satisfies the Armijo-type condition (1.5), Algorithm (2.5) converges to a fixed point of T. The following theorem says that Algorithm (2.5), with \((\alpha _{n})_{n\in\mathbb{N}}\) satisfying the Wolfe-type conditions (2.1) and (2.2), converges to a fixed point of T.
Theorem 2.1
Suppose that \((x_{n})_{n\in\mathbb{N}}\) is the sequence generated by Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{SD}}\) (\(n\in\mathbb{N}\)). Then \((x_{n})_{n\in\mathbb{N}}\) either terminates at a fixed point of T or
$$ \lim_{n\to\infty} \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert = 0. $$
In the latter situation, \((x_{n})_{n\in\mathbb{N}}\) weakly converges to a fixed point of T.
Proof of Theorem 2.1
If \(m \in\mathbb{N}\) exists such that \(\| x_{m} - T(x_{m}) \| = 0\), Theorem 2.1 holds. Accordingly, it can be assumed that, for all \(n\in\mathbb{N}\), \(\| x_{n} - T (x_{n}) \| \neq0\) holds.
First, the following lemma can be proven by referring to [18, 19, 32].
Lemma 2.1
Let \((x_{n})_{n\in\mathbb{N}}\) and \((d_{n})_{n\in\mathbb{N}}\) be the sequences generated by Algorithm 2.1. Assume that \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) for all \(n\in \mathbb{N}\). Then
$$ \sum_{n=0}^{\infty} \biggl( \frac{ \langle x_{n} - T(x_{n}), d_{n} \rangle}{ \Vert d_{n} \Vert } \biggr)^{2} < \infty. $$
Proof
The Cauchy-Schwarz inequality and the triangle inequality ensure that, for all \(n\in\mathbb{N}\), \(\langle d_{n}, ( x_{n+1} - T ( x_{n+1}) ) - (x_{n} - T (x_{n} ) ) \rangle \leq \| d_{n} \| \| ( x_{n+1} - T ( x_{n+1}) ) - (x_{n} - T (x_{n} ) ) \| \leq \| d_{n} \| ( \| T ( x_{n} ) - T (x_{n+1} ) \| + \| x_{n+1} - x_{n} \| )\), which, together with the nonexpansivity of T and (2.3), implies that, for all \(n\in\mathbb{N}\),
$$ \bigl\langle d_{n}, \bigl( x_{n+1} - T ( x_{n+1} ) \bigr) - \bigl(x_{n} - T (x_{n} ) \bigr) \bigr\rangle \leq2 \alpha_{n} \Vert d_{n} \Vert ^{2}. $$
Moreover, (2.2) means that, for all \(n\in\mathbb{N}\),
$$ \bigl\langle d_{n}, \bigl( x_{n+1} - T ( x_{n+1} ) \bigr) - \bigl(x_{n} - T (x_{n} ) \bigr) \bigr\rangle \geq ( \sigma - 1 ) \bigl\langle d_{n}, x_{n} - T ( x_{n} ) \bigr\rangle . $$
Accordingly, for all \(n\in\mathbb{N}\),
$$ (\sigma - 1 ) \bigl\langle d_{n}, x_{n} - T ( x_{n} ) \bigr\rangle \leq2 \alpha_{n} \Vert d_{n} \Vert ^{2}. $$
Since \(\|d_{n}\| \neq0\) (\(n\in\mathbb{N}\)) holds from \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) (\(n\in\mathbb{N}\)), we find that, for all \(n\in\mathbb{N}\),
$$ \frac{ (\sigma - 1 ) \langle d_{n}, x_{n} - T ( x_{n} ) \rangle}{2 \Vert d_{n} \Vert ^{2}} \leq\alpha_{n}. $$
(2.6)
Condition (2.1) means that, for all \(n\in\mathbb{N}\), \(\| x_{n+1} - T(x_{n+1} )\|^{2} - \|x_{n} - T (x_{n}) \|^{2} \leq\delta\alpha_{n} \langle x_{n} - T (x_{n} ), d_{n} \rangle\), which, together with \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) (\(n\in \mathbb{N}\)), implies that, for all \(n\in\mathbb{N}\),
$$ \alpha_{n} \leq\frac{\Vert x_{n} - T (x_{n} ) \Vert ^{2} - \Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{- \delta \langle x_{n} - T (x_{n} ), d_{n} \rangle}. $$
(2.7)
From (2.6) and (2.7), for all \(n\in\mathbb{N}\),
$$ \frac{ (\sigma - 1 ) \langle d_{n}, x_{n} - T ( x_{n} ) \rangle}{2 \Vert d_{n} \Vert ^{2}} \leq \frac{\Vert x_{n} - T (x_{n} ) \Vert ^{2} - \Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{- \delta \langle x_{n} - T (x_{n} ), d_{n} \rangle}, $$
which implies that, for all \(n\in\mathbb{N}\),
$$ \frac{\delta (1 - \sigma ) \langle d_{n}, x_{n} - T ( x_{n} ) \rangle^{2}}{2 \Vert d_{n} \Vert ^{2}} \leq \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} - \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2}. $$
Summing up this inequality from \(n=0\) to \(n=N \in\mathbb{N}\) guarantees that, for all \(N\in\mathbb{N}\),
$$\begin{aligned} \frac{\delta (1 - \sigma )}{2} \sum_{n=0}^{N} \frac{ \langle d_{n}, x_{n} - T ( x_{n} ) \rangle^{2}}{ \Vert d_{n} \Vert ^{2}} &\leq\bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2} - \bigl\Vert x_{N+1} - T (x_{N+1} ) \bigr\Vert ^{2} \\ &\leq\bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2} < \infty. \end{aligned}$$
Therefore, the conclusion in Lemma 2.1 is satisfied. □
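As a worked instance of the per-iteration inequality used in this proof, consider the steepest descent direction \(d_{n} = -(x_{n} - T(x_{n}))\) with \(x_{n} \neq T(x_{n})\), for which \(\langle d_{n}, x_{n} - T(x_{n}) \rangle^{2} / \Vert d_{n} \Vert ^{2} = \Vert x_{n} - T(x_{n}) \Vert ^{2}\). Assuming that step sizes satisfying (2.1) and (2.2) are available at every iteration, the inequality obtained from (2.6) and (2.7) specializes to
$$ \frac{\delta (1 - \sigma )}{2} \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} \leq \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} - \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2}, $$
so each admissible step decreases the squared residual by at least the fixed fraction \(\delta(1-\sigma)/2\) of its current value.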
Lemma 2.1 leads to the following.
Lemma 2.2
Suppose that the assumptions in Theorem 2.1 are satisfied. Then:

(i) \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\).

(ii) \((\| x_{n} - x \|)_{n\in\mathbb{N}}\) is monotone decreasing for all \(x\in\operatorname{Fix}(T)\).

(iii) \((x_{n})_{n\in\mathbb{N}}\) weakly converges to a point in \(\operatorname{Fix}(T)\).
Items (i) and (iii) in Lemma 2.2 indicate that Theorem 2.1 holds under the assumption that \(\| x_{n} - T (x_{n}) \| \neq0\) (\(n\in\mathbb{N}\)).
Proof
(i) In the case where \(\beta_{n} := \beta_{n}^{\mathrm{SD}} = 0\) (\(n\in \mathbb{N}\)), \(d_{n} = - (x_{n} - T(x_{n}))\) holds for all \(n\in\mathbb{N}\). Hence, \(\langle x_{n} - T(x_{n}), d_{n} \rangle= - \|x_{n} - T(x_{n})\|^{2} < 0\) (\(n\in\mathbb{N}\)). Lemma 2.1 thus guarantees that \(\sum_{n=0}^{\infty}\| x_{n} - T ( x_{n} ) \|^{2} < \infty\), which implies \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\).
(ii) The triangle inequality and the nonexpansivity of T ensure that, for all \(n\in\mathbb{N}\) and for all \(x\in\operatorname{Fix}(T)\), \(\| x_{n+1} - x \| = \| x_{n} + \alpha_{n} ( T (x_{n}) - x_{n} ) - x \| \leq(1 - \alpha_{n} ) \| x_{n} - x \| + \alpha_{n} \|T (x_{n}) - T (x)\| \leq\| x_{n} - x \|\).
(iii) Lemma 2.2(ii) means that \(\lim_{n\to\infty} \|x_{n} - x\|\) exists for all \(x\in\operatorname{Fix}(T)\). Accordingly, \((x_{n})_{n\in \mathbb{N}}\) is bounded. Hence, there is a subsequence \((x_{n_{k}})_{k\in\mathbb{N}}\) of \((x_{n})_{n\in\mathbb{N}}\) such that \((x_{n_{k}})_{k\in\mathbb{N}}\) weakly converges to a point \(x^{*} \in H\). Here, let us assume that \(x^{*} \notin\operatorname{Fix}(T)\). Then Opial’s condition [34, Lemma 1], Lemma 2.2(i), and the nonexpansivity of T guarantee that
$$\begin{aligned} \liminf_{k\to\infty} \bigl\Vert x_{n_{k}} - x^{*} \bigr\Vert &< \liminf_{k\to\infty} \bigl\Vert x_{n_{k}} - T \bigl(x^{*} \bigr) \bigr\Vert \\ &= \liminf_{k\to\infty} \bigl\Vert x_{n_{k}} - T ( x_{n_{k}} ) + T ( x_{n_{k}} ) - T \bigl(x^{*} \bigr) \bigr\Vert \\ &= \liminf_{k\to\infty} \bigl\Vert T ( x_{n_{k}} ) - T \bigl(x^{*} \bigr) \bigr\Vert \\ &\leq\liminf_{k\to\infty} \bigl\Vert x_{n_{k}} - x^{*} \bigr\Vert , \end{aligned}$$
which is a contradiction. Hence, \(x^{*} \in\operatorname{Fix}(T)\). Let us take another subsequence \((x_{n_{i}})_{i\in\mathbb{N}}\) (\(\subset(x_{n})_{n\in\mathbb{N}}\)) which weakly converges to \(x_{*} \in H\). A similar discussion to the one for obtaining \(x^{*} \in\operatorname{Fix}(T)\) ensures that \(x_{*} \in\operatorname{Fix}(T)\). Assume that \(x^{*} \neq x_{*}\). The existence of \(\lim_{n\to\infty} \| x_{n} - x \|\) (\(x\in\operatorname{Fix}(T)\)) and Opial’s condition [34, Lemma 1] imply that
$$\begin{aligned} \begin{aligned} \lim_{n\to\infty} \bigl\Vert x_{n} - x^{*} \bigr\Vert &= \lim_{k\to\infty} \bigl\Vert x_{n_{k}} - x^{*} \bigr\Vert < \lim_{k\to\infty} \Vert x_{n_{k}} - x_{*} \Vert \\ &= \lim_{n\to\infty} \Vert x_{n} - x_{*} \Vert = \lim_{i\to\infty} \Vert x_{n_{i}} - x_{*} \Vert \\ &< \lim_{i\to\infty} \bigl\Vert x_{n_{i}} - x^{*} \bigr\Vert = \lim_{n\to\infty} \bigl\Vert x_{n} - x^{*} \bigr\Vert , \end{aligned} \end{aligned}$$
which is a contradiction. Therefore, \(x^{*} = x_{*}\). Since \((x_{n})_{n\in\mathbb{N}}\) is bounded and every weakly convergent subsequence of \((x_{n})_{n\in\mathbb{N}}\) has the same weak limit in \(\operatorname{Fix}(T)\), the whole sequence \((x_{n})_{n\in\mathbb{N}}\) weakly converges to a fixed point of T. This completes the proof. □
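The monotonicity in Lemma 2.2(ii) can be observed numerically. The sketch below is an illustration only: it uses a hypothetical mapping (rotation of the plane by 90 degrees, which is an isometry and hence nonexpansive, with the origin as its only fixed point) and arbitrary step sizes in \((0,1]\). The step sizes are not claimed to satisfy (2.1) and (2.2); the argument in (ii) uses only \(\alpha_{n}\in(0,1]\) and the nonexpansivity of T.

```python
import math

def T(x):
    """Hypothetical example mapping: rotation of R^2 by 90 degrees."""
    return [-x[1], x[0]]

def sd_iterates(x0, alphas):
    """Apply x_{n+1} = x_n + alpha_n (T(x_n) - x_n), i.e. (2.5), and keep every iterate."""
    xs = [list(x0)]
    for alpha in alphas:
        x = xs[-1]
        Tx = T(x)
        xs.append([a + alpha * (b - a) for a, b in zip(x, Tx)])
    return xs

fixed_point = [0.0, 0.0]                  # Fix(T) = {0} for this rotation
alphas = [0.5, 1.0, 0.25, 0.75] * 5       # arbitrary step sizes in (0, 1]
xs = sd_iterates([1.0, 0.0], alphas)
dists = [math.dist(x, fixed_point) for x in xs]

for n, dist in enumerate(dists):
    print(n, dist)                        # distances to the fixed point never increase
```

With \(\alpha_{n} = 1\) the distance stays the same for this isometry, which shows that the monotonicity in (ii) is not strict in general.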
Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{DY}}\)
The following is a convergence analysis of Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{DY}}\).
Theorem 2.2
Suppose that \((x_{n})_{n\in\mathbb{N}}\) is the sequence generated by Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{DY}}\) (\(n\in\mathbb{N}\)). Then \((x_{n})_{n\in\mathbb{N}}\) either terminates at a fixed point of T or
$$ \lim_{n\to\infty} \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert = 0. $$
Proof of Theorem 2.2
Since the existence of \(m\in\mathbb{N}\) such that \(\| x_{m} - T(x_{m}) \| = 0\) implies that Theorem 2.2 holds, it can be assumed that, for all \(n\in\mathbb{N}\), \(\| x_{n} - T (x_{n}) \| \neq0\) holds. Theorem 2.2 can be proven by using the ideas presented in the proof of [28, Theorem 3.3]. The proof of Theorem 2.2 is divided into three steps.
Lemma 2.3
Suppose that the assumptions in Theorem 2.2 are satisfied. Then:

(i) \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) (\(n\in\mathbb{N}\)).

(ii) \(\liminf_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\).

(iii) \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\).
Proof
(i) From \(d_{0} := - (x_{0} - T(x_{0}))\), \(\langle x_{0} - T(x_{0}), d_{0} \rangle= - \|x_{0} - T(x_{0}) \|^{2} < 0\). Suppose that \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) holds for some \(n\in\mathbb{N}\). Accordingly, the definition of \(y_{n}:= (x_{n+1} - T(x_{n+1})) - (x_{n} - T(x_{n}))\) and (2.2) ensure that
$$\begin{aligned} \langle d_{n}, y_{n} \rangle &= \bigl\langle d_{n}, x_{n+1} - T (x_{n+1} ) \bigr\rangle - \bigl\langle d_{n}, x_{n} - T (x_{n} ) \bigr\rangle \\ &\geq ( \sigma - 1 ) \bigl\langle d_{n}, x_{n} - T (x_{n} ) \bigr\rangle > 0, \end{aligned}$$
which implies that
$$ \beta_{n}^{\mathrm{DY}} := \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{ \langle d_{n}, y_{n} \rangle} > 0. $$
From the definition of \(d_{n+1} := - (x_{n+1} - T(x_{n+1})) + \beta _{n}^{\mathrm{DY}} d_{n}\), we have
$$\begin{aligned} \begin{aligned} \bigl\langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \bigr\rangle &= - \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2} + \beta _{n}^{\mathrm{DY}} \bigl\langle d_{n}, x_{n+1} - T (x_{n+1} ) \bigr\rangle \\ &= \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2} \biggl\{ - 1 + \frac{ \langle d_{n}, x_{n+1} - T (x_{n+1} ) \rangle}{ \langle d_{n}, y_{n} \rangle } \biggr\} \\ &= \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2} \frac{ \langle d_{n}, ( x_{n+1} - T (x_{n+1} ) ) - y_{n} \rangle}{ \langle d_{n}, y_{n} \rangle}, \end{aligned} \end{aligned}$$
which, together with the definitions of \(y_{n}\) and \(\beta_{n}^{\mathrm {DY}}\) (>0), implies that
$$\begin{aligned} \bigl\langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \bigr\rangle &= \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2} \frac{ \langle d_{n}, x_{n} - T (x_{n} ) \rangle }{ \langle d_{n}, y_{n} \rangle} \\ &= \beta_{n}^{\mathrm{DY}} \bigl\langle d_{n}, x_{n} - T (x_{n} ) \bigr\rangle < 0. \end{aligned}$$
(2.8)
Induction shows that \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) for all \(n\in\mathbb{N}\). This implies \(\beta_{n}^{\mathrm{DY}} > 0\) (\(n\in\mathbb{N}\)); i.e., Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{DY}}\) is well defined.
(ii) Assume that \(\liminf_{n\to\infty} \| x_{n} - T(x_{n}) \| > 0\). Then there exist \(n_{0} \in\mathbb{N}\) and \(\varepsilon> 0\) such that \(\|x_{n} - T(x_{n})\| \geq\varepsilon\) for all \(n \geq n_{0}\). Since we have assumed that \(\|x_{n} - T(x_{n})\| \neq0\) (\(n\in\mathbb {N}\)), we may further assume that \(\|x_{n} - T(x_{n})\| \geq\varepsilon\) for all \(n \in\mathbb{N}\). From the definition of \(d_{n+1} := - (x_{n+1} - T(x_{n+1})) + \beta _{n}^{\mathrm{DY}} d_{n}\) (\(n\in\mathbb{N}\)), we have, for all \(n\in\mathbb{N}\),
$$\begin{aligned} \bigl( \beta_{n}^{\mathrm{DY}} \bigr)^{2} \Vert d_{n} \Vert ^{2} &= \bigl\Vert d_{n+1} + \bigl(x_{n+1} - T (x_{n+1} ) \bigr) \bigr\Vert ^{2} \\ &= \Vert d_{n+1} \Vert ^{2} + 2 \bigl\langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \bigr\rangle + \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2}. \end{aligned}$$
Lemma 2.3(i) and (2.8) mean that, for all \(n\in \mathbb{N}\),
$$ \beta_{n}^{\mathrm{DY}} = \frac{ \langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \rangle}{ \langle d_{n}, x_{n} - T (x_{n} ) \rangle}. $$
Hence, for all \(n\in\mathbb{N}\),
$$\begin{aligned}& \frac{\Vert d_{n+1} \Vert ^{2}}{ \langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \rangle^{2}} \\& \quad = - \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}{ \langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \rangle^{2}} - \frac{2}{ \langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \rangle} + \frac{\Vert d_{n} \Vert ^{2}}{ \langle d_{n}, x_{n} - T (x_{n} ) \rangle^{2}} \\& \quad = \frac{\Vert d_{n} \Vert ^{2}}{ \langle d_{n}, x_{n} - T (x_{n} ) \rangle^{2}} + \frac{1}{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}} \\& \qquad {} - \biggl\{ \frac{1}{\Vert x_{n+1} - T (x_{n+1} ) \Vert } + \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert }{ \langle d_{n+1}, x_{n+1} - T (x_{n+1} ) \rangle} \biggr\} ^{2} \\& \quad \leq\frac{\Vert d_{n} \Vert ^{2}}{ \langle d_{n}, x_{n} - T (x_{n} ) \rangle^{2}} + \frac{1}{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}. \end{aligned}$$
Summing up this inequality from \(n=0\) to \(n=N\in\mathbb{N}\) yields, for all \(N\in\mathbb{N}\),
$$ \frac{\Vert d_{N+1} \Vert ^{2}}{ \langle d_{N+1}, x_{N+1} - T (x_{N+1} ) \rangle^{2}} \leq\frac{\Vert d_{0} \Vert ^{2}}{ \langle d_{0}, x_{0} - T (x_{0} ) \rangle^{2}} + \sum_{k=1}^{N+1} \frac{1}{\Vert x_{k} - T (x_{k} ) \Vert ^{2}}, $$
which, together with \(\|x_{n} - T(x_{n})\| \geq\varepsilon\) (\(n \in \mathbb{N}\)) and \(d_{0} := - (x_{0} - T(x_{0}))\), implies that, for all \(N\in\mathbb{N}\),
$$ \frac{\Vert d_{N+1} \Vert ^{2}}{ \langle d_{N+1}, x_{N+1} - T (x_{N+1} ) \rangle^{2}} \leq\sum_{k=0}^{N+1} \frac{1}{\Vert x_{k} - T (x_{k} ) \Vert ^{2}} \leq\frac{N+2}{\varepsilon^{2}}. $$
Since Lemma 2.3(i) implies \(\| d_{n} \| \neq0\) (\(n\in\mathbb{N}\)), we have, for all \(N \in\mathbb{N}\),
$$ \frac{ \langle d_{N+1}, x_{N+1} - T (x_{N+1} ) \rangle^{2}}{\Vert d_{N+1} \Vert ^{2}} \geq\frac{\varepsilon^{2}}{N+2}. $$
Therefore, Lemma 2.1 guarantees that
$$ \infty> \sum_{k=1}^{\infty}\biggl( \frac{ \langle d_{k}, x_{k} - T (x_{k} ) \rangle}{ \Vert d_{k} \Vert } \biggr)^{2} \geq\sum_{k=1}^{\infty}\frac{\varepsilon^{2}}{k+1} = \infty. $$
This is a contradiction. Hence, \(\liminf_{n\to\infty} \|x_{n} - T(x_{n})\| =0\).
(iii) Condition (2.1) and Lemma 2.3(i) imply that, for all \(n\in\mathbb{N}\),
$$ \bigl\Vert x_{n+1} - T ( x_{n+1} ) \bigr\Vert ^{2} - \bigl\Vert x_{n} - T ( x_{n} ) \bigr\Vert ^{2} \leq\delta\alpha_{n} \bigl\langle x_{n} - T (x_{n} ), d_{n} \bigr\rangle < 0. $$
Accordingly, \((\| x_{n} - T(x_{n}) \|)_{n\in\mathbb{N}}\) is monotone decreasing and bounded below by zero; hence, \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \|\) exists. Lemma 2.3(ii) thus ensures that \(\lim_{n\to\infty} \|x_{n} - T(x_{n})\| = 0\). This completes the proof. □
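The key identity (2.8) behind Lemma 2.3(i) is purely algebraic, so it can be checked on small vectors. The following sketch assumes \(H = \mathbb{R}^{2}\); the vectors `g_old`, `g_new` (standing for \(x_{n} - T(x_{n})\) and \(x_{n+1} - T(x_{n+1})\)) and the helper `dy_step` are hypothetical values chosen for illustration, with `g_new` picked so that (2.2) holds for \(\sigma = 0.75\).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dy_step(g_old, g_new, d):
    """Return beta_n^DY from (2.4) and the resulting direction d_{n+1}."""
    y = [a - b for a, b in zip(g_new, g_old)]           # y_n
    beta_dy = dot(g_new, g_new) / dot(d, y)
    d_new = [-a + beta_dy * b for a, b in zip(g_new, d)]
    return beta_dy, d_new

g_old = [2.0, 1.0]        # residual at x_n
d = [-2.0, -1.0]          # descent direction: <g_old, d> = -5 < 0
g_new = [1.0, -1.0]       # residual at x_{n+1}; <g_new, d> = -1 >= 0.75 * (-5)

beta_dy, d_new = dy_step(g_old, g_new, d)
lhs = dot(d_new, g_new)           # <d_{n+1}, x_{n+1} - T(x_{n+1})>
rhs = beta_dy * dot(d, g_old)     # beta_n^DY <d_n, x_n - T(x_n)>
print(beta_dy, lhs, rhs)
```

Both sides of (2.8) agree and are negative here, so the new direction inherits the descent property from the previous one, as in the induction step of the proof.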
Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{FR}}\)
To establish the convergence of Algorithm 2.1 when \(\beta_{n} = \beta_{n}^{\mathrm{FR}}\), we assume that the step sizes \(\alpha_{n}\) satisfy the strong Wolfe-type conditions, which are (2.1) and the following strengthened version of (2.2): for \(\sigma\leq1/2\),
$$ \bigl\vert \bigl\langle x_{n} (\alpha_{n} ) - T \bigl(x_{n} (\alpha_{n} ) \bigr), d_{n} \bigr\rangle \bigr\vert \leq - \sigma \bigl\langle x_{n} - T (x_{n} ), d_{n} \bigr\rangle . $$
(2.9)
See [30] on the global convergence of the FR method for unconstrained optimization under the strong Wolfe conditions.
The following is a convergence analysis of Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{FR}}\).
Theorem 2.3
Suppose that \((x_{n})_{n\in\mathbb{N}}\) is the sequence generated by Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{FR}}\) (\(n\in\mathbb {N}\)), where \((\alpha_{n})_{n\in\mathbb{N}}\) satisfies (2.1) and (2.9). Then \((x_{n})_{n\in\mathbb{N}}\) either terminates at a fixed point of T or
$$ \lim_{n\to\infty} \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert = 0. $$
Proof of Theorem 2.3
It can be assumed that, for all \(n\in\mathbb{N}\), \(\| x_{n} - T (x_{n}) \| \neq0\) holds. Theorem 2.3 can be proven by using the ideas in the proof of [30, Theorem 2].
Lemma 2.4
Suppose that the assumptions in Theorem 2.3 are satisfied. Then:

(i) \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) (\(n\in\mathbb{N}\)).

(ii) \(\liminf_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\).

(iii) \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\).
Proof
(i) Let us show that, for all \(n\in\mathbb{N}\),
$$ - \sum_{j=0}^{n} \sigma^{j} \leq\frac{ \langle x_{n} - T (x_{n} ), d_{n} \rangle}{ \Vert x_{n} - T (x_{n} ) \Vert ^{2}} \leq - 2 + \sum _{j=0}^{n} \sigma^{j}. $$
(2.10)
From \(d_{0} := - (x_{0} - T(x_{0}))\), (2.10) holds for \(n:= 0\) and \(\langle x_{0} - T(x_{0}), d_{0} \rangle< 0\). Suppose that (2.10) holds for some \(n\in\mathbb{N}\). Accordingly, from \(\sum_{j=0}^{n} \sigma^{j} < \sum_{j=0}^{\infty}\sigma ^{j} = 1/(1 - \sigma)\) and \(\sigma\in(0,1/2]\), we have
$$ \frac{ \langle x_{n} - T (x_{n} ), d_{n} \rangle }{\Vert x_{n} - T (x_{n} ) \Vert ^{2}} < - 2 + \sum_{j=0}^{\infty}\sigma^{j} = - \frac{ ( 1 - 2 \sigma )}{1 - \sigma} \leq0, $$
which implies that \(\langle x_{n} - T (x_{n} ), d_{n} \rangle< 0\). The definitions of \(d_{n+1}\) and \(\beta_{n}^{\mathrm{FR}}\) enable us to deduce that
$$\begin{aligned} \frac{ \langle x_{n+1} - T (x_{n+1} ), d_{n+1} \rangle}{ \Vert x_{n+1} - T ( x_{n+1} ) \Vert ^{2}} &= \frac{ \langle x_{n+1} - T (x_{n+1} ), - ( x_{n+1} - T (x_{n+1} ) ) + \beta_{n}^{\mathrm{FR}} d_{n} \rangle}{ \Vert x_{n+1} - T ( x_{n+1} ) \Vert ^{2}} \\ &= - 1 + \frac{\Vert x_{n+1} - T ( x_{n+1} ) \Vert ^{2}}{\Vert x_{n} - T ( x_{n} ) \Vert ^{2}} \frac{ \langle x_{n+1} - T (x_{n+1} ), d_{n} \rangle}{ \Vert x_{n+1} - T ( x_{n+1} ) \Vert ^{2}} \\ &= - 1 + \frac{ \langle x_{n+1} - T (x_{n+1} ), d_{n} \rangle}{ \Vert x_{n} - T ( x_{n} ) \Vert ^{2}}. \end{aligned}$$
Since (2.9) implies \(\sigma\langle x_{n} - T(x_{n}),d_{n} \rangle\leq\langle x_{n+1} - T(x_{n+1}),d_{n} \rangle\leq - \sigma\langle x_{n} - T(x_{n}),d_{n} \rangle\) and (2.10) holds for some n, it is found that
$$\begin{aligned} - 1 + \frac{ \langle x_{n+1} - T (x_{n+1} ), d_{n} \rangle}{ \Vert x_{n} - T ( x_{n} ) \Vert ^{2}} &\geq - 1 + \sigma\frac{ \langle x_{n} - T (x_{n} ), d_{n} \rangle}{ \Vert x_{n} - T ( x_{n} ) \Vert ^{2}} \\ &\geq - 1 - \sigma\sum_{j=0}^{n} \sigma^{j} = - \sum_{j=0}^{n+1} \sigma^{j} \end{aligned}$$
and
$$\begin{aligned} - 1 + \frac{ \langle x_{n+1} - T (x_{n+1} ), d_{n} \rangle}{ \Vert x_{n} - T ( x_{n} ) \Vert ^{2}} &\leq - 1 - \sigma\frac{ \langle x_{n} - T (x_{n} ), d_{n} \rangle}{ \Vert x_{n} - T ( x_{n} ) \Vert ^{2}} \\ &\leq - 1 + \sigma\sum_{j=0}^{n} \sigma^{j} = - 2 + \sum_{j=0}^{n+1} \sigma^{j}. \end{aligned}$$
Hence,
$$ - \sum_{j=0}^{n+1} \sigma^{j} \leq \frac{ \langle x_{n+1} - T (x_{n+1} ), d_{n+1} \rangle}{ \Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}} \leq - 2 + \sum_{j=0}^{n+1} \sigma^{j}. $$
A discussion similar to the one for obtaining \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) guarantees that \(\langle x_{n+1} - T(x_{n+1}), d_{n+1} \rangle< 0\) holds. Induction thus shows that (2.10) and \(\langle x_{n} - T(x_{n}), d_{n} \rangle< 0\) hold for all \(n\in\mathbb{N}\).
(ii) Assume that \(\liminf_{n\to\infty} \| x_{n} - T(x_{n}) \| > 0\). A discussion similar to the one in the proof of Lemma 2.3(ii) ensures the existence of \(\varepsilon> 0\) such that \(\|x_{n} - T(x_{n})\| \geq\varepsilon\) for all \(n \in\mathbb{N}\). From (2.9) and (2.10), we have, for all \(n\in \mathbb{N}\),
$$ \bigl\vert \bigl\langle x_{n+1} - T ( x_{n+1} ), d_{n} \bigr\rangle \bigr\vert \leq - \sigma \bigl\langle x_{n} - T ( x_{n} ), d_{n} \bigr\rangle \leq\sum _{j=1}^{n+1} \sigma^{j} \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2}, $$
which, together with \(\sum_{j=1}^{n+1} \sigma^{j} < \sum_{j=1}^{\infty} \sigma^{j} = \sigma/(1 - \sigma)\) and \(\beta_{n}^{\mathrm{FR}} := \| x_{n+1} - T ( x_{n+1} ) \|^{2}/\| x_{n} - T(x_{n}) \|^{2}\) (\(n\in\mathbb{N}\)), implies that, for all \(n\in\mathbb{N}\),
$$ \beta_{n}^{\mathrm{FR}} \bigl\vert \bigl\langle x_{n+1} - T ( x_{n+1} ), d_{n} \bigr\rangle \bigr\vert < \frac{\sigma}{1 - \sigma} \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2}. $$
Accordingly, from the definition of \(d_{n+1} := - (x_{n+1} - T(x_{n+1})) + \beta_{n}^{\mathrm{FR}} d_{n}\), we find that, for all \(n\in\mathbb{N}\),
$$\begin{aligned} \Vert d_{n+1} \Vert ^{2} &= \bigl\Vert \beta_{n}^{\mathrm{FR}} d_{n} - \bigl(x_{n+1} - T (x_{n+1} ) \bigr) \bigr\Vert ^{2} \\ &= \bigl( \beta_{n}^{\mathrm{FR}} \bigr)^{2} \Vert d_{n} \Vert ^{2} - 2 \beta_{n}^{\mathrm{FR}} \bigl\langle d_{n}, x_{n+1} - T (x_{n+1} ) \bigr\rangle + \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2} \\ &\leq\frac{\Vert x_{n+1} - T (x_{n+1} )\Vert ^{4}}{\Vert x_{n} - T (x_{n} )\Vert ^{4}} \Vert d_{n} \Vert ^{2} + \biggl( \frac{2\sigma}{1 - \sigma} + 1 \biggr) \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2}, \end{aligned}$$
which means that, for all \(n\in\mathbb{N}\),
$$ \frac{\Vert d_{n+1} \Vert ^{2}}{\Vert x_{n+1} - T (x_{n+1} )\Vert ^{4}} \leq\frac{\Vert d_{n} \Vert ^{2}}{\Vert x_{n} - T (x_{n} )\Vert ^{4}} + \frac{1+\sigma}{1 - \sigma} \frac{1}{\Vert x_{n+1} - T (x_{n+1} ) \Vert ^{2}}. $$
The sum of this inequality from \(n=0\) to \(n=N \in\mathbb{N}\) and \(d_{0} := - (x_{0} - T(x_{0}))\) ensure that, for all \(N\in\mathbb{N}\),
$$ \frac{\Vert d_{N+1} \Vert ^{2}}{\Vert x_{N+1} - T (x_{N+1} )\Vert ^{4}} \leq\frac{1}{\Vert x_{0} - T (x_{0} ) \Vert ^{2}} + \frac{1+\sigma}{1 - \sigma} \sum _{k=1}^{N+1} \frac{1}{\Vert x_{k} - T (x_{k} ) \Vert ^{2}}. $$
From \(\|x_{n} - T(x_{n})\| \geq\varepsilon\) (\(n \in\mathbb{N}\)), for all \(N\in\mathbb{N}\),
$$ \frac{\Vert d_{N+1} \Vert ^{2}}{\Vert x_{N+1} - T (x_{N+1} )\Vert ^{4}} \leq\frac{1}{\varepsilon^{2}} + \frac{1+\sigma}{1 - \sigma}\frac {N+1}{\varepsilon^{2}} = \frac{ ( 1 + \sigma ) N + 2}{\varepsilon^{2} ( 1 - \sigma )}. $$
Therefore, from Lemma 2.4(i) guaranteeing that \(\|d_{n}\|\neq 0\) (\(n\in\mathbb{N}\)) and \(\sum_{k=1}^{\infty}\varepsilon^{2} ( 1 - \sigma)/( ( 1 + \sigma) (k - 1) + 2) = \infty\), it is found that
$$ \sum_{k=1}^{\infty}\frac{\Vert x_{k} - T (x_{k} ) \Vert ^{4}}{\Vert d_{k} \Vert ^{2}} = \infty. $$
Meanwhile, since (2.10) guarantees that \(\langle x_{n} - T(x_{n}), d_{n} \rangle \leq( - 2 + \sum_{j=0}^{n} \sigma^{j} ) \| x_{n} - T(x_{n}) \|^{2} < - ((1 - 2 \sigma)/(1 - \sigma)) \| x_{n} - T(x_{n}) \|^{2}\) (\(n\in\mathbb{N}\)), Lemma 2.1 and Lemma 2.4(i) lead to the deduction that
$$ \infty> \sum_{k=0}^{\infty}\biggl( \frac{ \langle x_{k} - T (x_{k} ), d_{k} \rangle}{ \Vert d_{k} \Vert } \biggr)^{2} \geq \biggl( \frac{1 - 2 \sigma}{1 - \sigma} \biggr)^{2} \sum_{k=0}^{\infty}\frac{\Vert x_{k} - T (x_{k} )\Vert ^{4}}{\Vert d_{k} \Vert ^{2}} = \infty, $$
which is a contradiction. Therefore, \(\liminf_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\).
(iii) A discussion similar to the one in the proof of Lemma 2.3(iii) leads to Lemma 2.4(iii). This completes the proof. □
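The induction behind (2.10) reduces to a recursion on geometric sums, which can be checked numerically. In the sketch below, `fr_bounds` and `check_induction_step` are hypothetical helper names; the check assumes that the ratio \(\langle x_{n} - T(x_{n}), d_{n} \rangle / \Vert x_{n} - T(x_{n}) \Vert^{2}\) sits at its lower bound, which is the extreme case allowed by (2.9) in the proof of Lemma 2.4(i).

```python
import math

def fr_bounds(sigma, n):
    """Lower and upper bounds in (2.10) for index n."""
    s = sum(sigma ** j for j in range(n + 1))
    return -s, -2.0 + s

def check_induction_step(sigma, n):
    """Propagate the bounds at index n through -1 + sigma * t and -1 - sigma * t,
    evaluated at the lower bound t = lo_n, and compare with the bounds at n + 1."""
    lo_n, _ = fr_bounds(sigma, n)
    lo_next, hi_next = fr_bounds(sigma, n + 1)
    return (math.isclose(-1.0 + sigma * lo_n, lo_next),
            math.isclose(-1.0 - sigma * lo_n, hi_next))

sigma = 0.25                                    # any value in (0, 1/2] is admissible
limit = -(1 - 2 * sigma) / (1 - sigma)          # strict upper bound used in the proof
for n in range(10):
    lo, hi = fr_bounds(sigma, n)
    print(n, lo, hi, check_induction_step(sigma, n), hi < limit <= 0)
```

For every index the upper bound stays strictly below \(-(1-2\sigma)/(1-\sigma) \leq 0\), which is what makes \(d_{n}\) a descent direction in Lemma 2.4(i).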
Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{PRP}+}\)
It is well known that the convergence of the nonlinear conjugate gradient method with \(\beta_{n}^{\mathrm{PRP}}\) defined as in (1.19) for a general nonlinear function is uncertain [23, Section 5]. To guarantee the convergence of the PRP method for unconstrained optimization, the following modification of \(\beta _{n}^{\mathrm{PRP}}\) was presented in [35]: for \(\beta _{n}^{\mathrm{PRP}}\) defined as in (1.19), \(\beta_{n}^{\mathrm {PRP}+} := \max\{ \beta_{n}^{\mathrm{PRP}}, 0 \}\). On the basis of the idea behind this modification, this subsection considers Algorithm 2.1 with \(\beta_{n}^{\mathrm{PRP}+}\) defined as in (2.4).
Theorem 2.4
Suppose that \((x_{n})_{n\in\mathbb{N}}\) and \((d_{n})_{n\in\mathbb{N}}\) are the sequences generated by Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm{PRP}+}\) (\(n\in\mathbb{N}\)) and there exists \(c > 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq - c \| x_{n} - T(x_{n}) \|^{2}\) for all \(n\in\mathbb{N}\). If \((x_{n})_{n\in\mathbb{N}}\) is bounded, then \((x_{n})_{n\in\mathbb{N}}\) either terminates at a fixed point of T or
$$ \lim_{n\to\infty} \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert = 0. $$
Proof of Theorem 2.4
It can be assumed that \(\| x_{n} - T (x_{n}) \| \neq0\) holds for all \(n\in \mathbb{N}\). Let us first show the following lemma by referring to the proof of [31, Lemma 4.1].
Lemma 2.5
Let \((x_{n})_{n\in\mathbb{N}}\) and \((d_{n})_{n\in\mathbb{N}}\) be the sequences generated by Algorithm 2.1 with \(\beta_{n} \geq0\) (\(n\in\mathbb{N}\)) and assume that there exists \(c > 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq - c \|x_{n} - T(x_{n})\|^{2}\) for all \(n\in\mathbb{N}\). If there exists \(\varepsilon> 0\) such that \(\|x_{n} - T(x_{n})\| \geq \varepsilon\) for all \(n\in\mathbb{N}\), then \(\sum_{n=0}^{\infty}\| u_{n+1} - u_{n} \|^{2}< \infty\), where \(u_{n} := d_{n}/\|d_{n}\|\) (\(n\in\mathbb{N}\)).
Proof
Since \(\| x_{n} - T (x_{n}) \| \geq\varepsilon\) and \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq - c \|x_{n} - T(x_{n})\|^{2}\) hold for all \(n\in\mathbb{N}\), \(\| d_{n} \|\neq0\) holds for all \(n\in\mathbb{N}\). Define \(r_{n} := - (x_{n} - T(x_{n}))/\|d_{n}\|\) and \(\delta_{n} := \beta_{n} \| d_{n}\|/\| d_{n+1} \|\) (\(n\in\mathbb{N}\)). From \(\delta_{n} u_{n} = \beta_{n} d_{n} /\| d_{n+1}\|\) and \(d_{n+1} = - (x_{n+1} - T(x_{n+1})) + \beta_{n} d_{n}\) (\(n\in\mathbb{N}\)), we have, for all \(n\in\mathbb{N}\),
$$ u_{n+1} =  r_{n+1} + \delta_{n} u_{n}, $$
which, together with \(\| u_{n+1} - \delta_{n} u_{n} \|^{2} = \|u_{n+1}\|^{2} - 2 \delta_{n} \langle u_{n+1}, u_{n} \rangle+ \delta_{n}^{2} \|u_{n}\|^{2} = \|u_{n}\|^{2} - 2 \delta_{n} \langle u_{n}, u_{n+1} \rangle+ \delta_{n}^{2} \|u_{n+1} \|^{2} = \| u_{n} - \delta_{n} u_{n+1} \|^{2}\) (\(n\in\mathbb{N}\)), implies that, for all \(n\in\mathbb{N}\),
$$ \Vert r_{n+1} \Vert = \Vert u_{n+1} - \delta_{n} u_{n} \Vert = \Vert u_{n} - \delta_{n} u_{n+1} \Vert . $$
Accordingly, the condition \(\beta_{n} \geq0\) (\(n\in\mathbb{N}\)) and the triangle inequality mean that, for all \(n\in\mathbb{N}\),
$$\begin{aligned} \Vert u_{n+1} - u_{n} \Vert &\leq (1+ \delta_{n} ) \Vert u_{n+1} - u_{n} \Vert \\ &\leq \Vert u_{n+1} - \delta_{n} u_{n} \Vert + \Vert u_{n} - \delta_{n} u_{n+1} \Vert \\ &= 2 \Vert r_{n+1} \Vert . \end{aligned}$$
(2.11)
From Lemma 2.1, \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq - c \| x_{n} - T(x_{n})\|^{2}\) (\(n\in\mathbb{N}\)), the definition of \(r_{n}\), and \(\| x_{n} - T(x_{n}) \| \geq\varepsilon\) (\(n\in\mathbb{N}\)), we have
$$ \infty> \sum_{n=0}^{\infty}\biggl( \frac{ \langle x_{n} - T (x_{n} ), d_{n} \rangle}{ \Vert d_{n}\Vert } \biggr)^{2} \geq c^{2} \sum _{n=0}^{\infty}\frac{\Vert x_{n} - T (x_{n} ) \Vert ^{4}}{\Vert d_{n}\Vert ^{2}} \geq c^{2} \varepsilon^{2} \sum_{n=0}^{\infty} \Vert r_{n} \Vert ^{2}, $$
which, together with (2.11), completes the proof. □
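The symmetry \(\Vert u_{n+1} - \delta_{n} u_{n} \Vert = \Vert u_{n} - \delta_{n} u_{n+1} \Vert\) used for (2.11) holds for any pair of unit vectors, and a short numerical check makes this visible. The vectors and the value of `delta_n` below are hypothetical choices for illustration, assuming \(H = \mathbb{R}^{2}\).

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def minus_scaled(u, v, t):
    """Return u - t * v."""
    return [a - t * b for a, b in zip(u, v)]

u_n = [1.0, 0.0]          # unit vector d_n / ||d_n||
u_next = [0.6, 0.8]       # unit vector d_{n+1} / ||d_{n+1}||
delta_n = 0.5             # any nonnegative value, matching beta_n >= 0

left = norm(minus_scaled(u_next, u_n, delta_n))    # ||u_{n+1} - delta_n u_n|| = ||r_{n+1}||
right = norm(minus_scaled(u_n, u_next, delta_n))   # ||u_n - delta_n u_{n+1}||
gap = norm(minus_scaled(u_next, u_n, 1.0))         # ||u_{n+1} - u_n||

print(left, right, gap, 2 * left)
```

The two norms coincide up to rounding, and the gap between consecutive normalized directions is bounded by twice \(\Vert r_{n+1} \Vert\), which is the content of (2.11).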
The following property, referred to as Property (⋆), is a result of modifying [31, Property (∗)] to conform to Problem (1.1).
 Property (⋆).:

Suppose that there exist positive constants γ and γ̄ such that \(\gamma\leq\| x_{n} - T(x_{n}) \| \leq\bar{\gamma}\) for all \(n\in\mathbb{N}\). Then Property (⋆) holds if \(b > 1\) and \(\lambda> 0\) exist such that, for all \(n\in\mathbb{N}\),
$$ \vert \beta_{n} \vert \leq b \quad \text{and}\quad \Vert x_{n+1} - x_{n} \Vert \leq\lambda\quad \text{implies} \quad \vert \beta_{n} \vert \leq\frac{1}{2b}. $$
The proof of the following lemma can be omitted since it is similar to the proof of [31, Lemma 4.2].
Lemma 2.6
Let \((x_{n})_{n\in\mathbb{N}}\) and \((d_{n})_{n\in\mathbb{N}}\) be the sequences generated by Algorithm 2.1 and assume that there exist \(c > 0\) and \(\gamma> 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq - c \|x_{n} - T(x_{n})\|^{2}\) and \(\|x_{n} - T(x_{n})\| \geq\gamma\) for all \(n\in\mathbb{N}\). Suppose also that Property (⋆) holds. Then there exists \(\lambda> 0\) such that, for all \(\Delta\in\mathbb {N} \backslash\{0\}\) and for any index \(k_{0}\), there is \(k \geq k_{0}\) such that \(| \mathcal {K}_{k,\Delta}^{\lambda} | > \Delta/2\), where \(\mathcal{K}_{k,\Delta}^{\lambda}:= \{ i\in\mathbb{N} \backslash\{ 0\} \colon k \leq i \leq k + \Delta - 1, \| x_{i} - x_{i-1} \| > \lambda\}\) (\(k\in\mathbb{N}\), \(\Delta\in\mathbb{N} \backslash\{0\}\), \(\lambda> 0\)) and \(| \mathcal{K}_{k,\Delta}^{\lambda} |\) stands for the number of elements of \(\mathcal{K}_{k,\Delta}^{\lambda}\).
The following can be proven by referring to the proof of [31, Theorem 4.3].
Lemma 2.7
Let \((x_{n})_{n\in\mathbb{N}}\) be the sequence generated by Algorithm 2.1 with \(\beta_{n} \geq0\) (\(n\in\mathbb{N}\)) and assume that there exists \(c > 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq - c \|x_{n} - T(x_{n})\|^{2}\) for all \(n\in\mathbb{N}\) and Property (⋆) holds. If \((x_{n})_{n\in\mathbb{N}}\) is bounded, then \(\liminf_{n\to\infty} \|x_{n} - T (x_{n} ) \| = 0\).
Proof
Assuming that \(\liminf_{n\to\infty} \|x_{n} - T (x_{n} ) \| > 0\), there exists \(\gamma> 0\) such that \(\| x_{n} - T(x_{n}) \| \geq\gamma\) for all \(n\in\mathbb{N}\). Since \(c> 0\) exists such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq - c \|x_{n} - T(x_{n})\|^{2}\) (\(n\in\mathbb{N}\)), \(\| d_{n} \| \neq0\) (\(n\in\mathbb{N}\)) holds. Moreover, the nonexpansivity of T ensures that, for all \(x\in\operatorname{Fix}(T)\), \(\| T (x_{n} ) - x \| \leq\| x_{n} - x \|\), and this, together with the boundedness of \((x_{n})_{n\in\mathbb{N}}\), implies the boundedness of \((T(x_{n}))_{n\in\mathbb{N}}\). Accordingly, \(\bar{\gamma} > 0\) exists such that \(\| x_{n} - T(x_{n}) \| \leq\bar{\gamma}\) (\(n\in\mathbb {N}\)). The definition of \(x_{n}\) implies that, for all \(n\geq1\),
$$ x_{n} - x_{n-1} = \alpha_{n-1} d_{n-1} = \alpha_{n-1} \Vert d_{n-1} \Vert u_{n-1} = \Vert x_{n} - x_{n-1}\Vert u_{n-1}, $$
where \(u_{n} := d_{n}/\|d_{n}\|\) (\(n\in\mathbb{N}\)). Hence, for all \(l, k \in\mathbb{N}\) with \(l \geq k > 0\),
$$ x_{l} - x_{k-1} = \sum_{i=k}^{l} ( x_{i} - x_{i-1} ) = \sum_{i=k}^{l} \Vert x_{i} - x_{i-1}\Vert u_{i-1}, $$
which implies that
$$ \sum_{i=k}^{l} \Vert x_{i} - x_{i-1}\Vert u_{k-1} = x_{l} - x_{k-1} - \sum_{i=k}^{l} \Vert x_{i} - x_{i-1}\Vert (u_{i-1} - u_{k-1} ). $$
From \(\| u_{n} \| = 1\) (\(n\in\mathbb{N}\)) and the triangle inequality, for all \(l, k \in\mathbb{N}\) with \(l \geq k > 0\), \(\sum_{i=k}^{l} \|x_{i} - x_{i-1} \| \leq\| x_{l} - x_{k-1} \| + \sum_{i=k}^{l} \|x_{i} - x_{i-1} \| \| u_{i-1} - u_{k-1} \|\). Since the boundedness of \((x_{n})_{n\in\mathbb{N}}\) means there is \(M > 0\) satisfying \(\| x_{m} - x_{n} \| \leq M\) (\(m, n\in\mathbb{N}\)), we find that, for all \(l, k \in\mathbb{N}\) with \(l \geq k > 0\),
$$ \sum_{i=k}^{l} \Vert x_{i} - x_{i-1} \Vert \leq M + \sum _{i=k}^{l} \Vert x_{i} - x_{i-1} \Vert \Vert u_{i-1} - u_{k-1} \Vert . $$
(2.12)
Let \(\lambda> 0\) be as given by Lemma 2.6 and define \(\Delta := \lceil4M/\lambda\rceil\), where \(\lceil\cdot\rceil\) denotes the ceiling operator. From Lemma 2.5, an index \(k_{0}\) can be chosen such that \(\sum_{i=k_{0}}^{\infty}\| u_{i} - u_{i-1} \|^{2} \leq1/(4 \Delta)\). Accordingly, Lemma 2.6 guarantees the existence of \(k \geq k_{0}\) such that \(|\mathcal{K}_{k,\Delta}^{\lambda}| > \Delta/2\). Since the Cauchy-Schwarz inequality implies that \((\sum_{i=1}^{m} a_{i})^{2} \leq m \sum_{i=1}^{m} a_{i}^{2}\) (\(m \geq1\), \(a_{i} \in\mathbb{R}\), \(i=1,2,\ldots,m\)), we have, for all \(i\in[k,k+\Delta-1]\),
$$ \Vert u_{i-1} - u_{k-1} \Vert ^{2} \leq \Biggl( \sum_{j=k}^{i-1} \Vert u_{j} - u_{j-1} \Vert \Biggr)^{2} \leq ( i - k ) \sum _{j=k}^{i-1} \Vert u_{j} - u_{j-1} \Vert ^{2} \leq \frac{1}{4}. $$
Putting \(l:= k+\Delta-1\), (2.12) ensures that
$$ M \geq\frac{1}{2} \sum_{i=k}^{k+\Delta-1} \Vert x_{i} - x_{i-1} \Vert > \frac{\lambda}{2} \bigl\vert \mathcal{K}_{k,\Delta}^{\lambda}\bigr\vert > \frac{\lambda\Delta}{4}, $$
which implies that \(\Delta< 4M/\lambda\). This contradicts \(\Delta:= \lceil4M/\lambda\rceil\). Therefore, \(\liminf_{n\to\infty} \|x_{n} - T (x_{n} ) \| = 0\). □
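The closing estimate of the preceding proof compresses two steps. As a sketch that uses only quantities already introduced there, the bound \(\|u_{i-1} - u_{k-1}\| \leq 1/2\) for \(k \leq i \leq k+\Delta-1\), inserted into (2.12) with \(l = k+\Delta-1\), and the definition of \(\mathcal{K}_{k,\Delta}^{\lambda}\) give

$$\begin{aligned} \sum_{i=k}^{k+\Delta-1} \Vert x_{i} - x_{i-1} \Vert &\leq M + \frac{1}{2} \sum_{i=k}^{k+\Delta-1} \Vert x_{i} - x_{i-1} \Vert , \quad\text{that is,}\quad M \geq \frac{1}{2} \sum_{i=k}^{k+\Delta-1} \Vert x_{i} - x_{i-1} \Vert , \\ \sum_{i=k}^{k+\Delta-1} \Vert x_{i} - x_{i-1} \Vert &\geq \sum_{i \in\mathcal{K}_{k,\Delta}^{\lambda}} \Vert x_{i} - x_{i-1} \Vert > \lambda \bigl\vert \mathcal{K}_{k,\Delta}^{\lambda}\bigr\vert > \frac{\lambda\Delta}{2}. \end{aligned}$$

Combining the two lines yields \(M > \lambda\Delta/4\), whereas \(\Delta= \lceil4M/\lambda\rceil\geq4M/\lambda\) forces \(M \leq\lambda\Delta/4\); this is the contradiction used above.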
Now we are in a position to prove Theorem 2.4.
Proof
The condition \(\beta_{n}^{\mathrm{PRP}+} \geq0\) holds for all \(n\in \mathbb{N}\). Suppose that positive constants γ and γ̄ exist such that \(\gamma\leq\|x_{n} - T(x_{n})\| \leq\bar{\gamma}\) (\(n\in\mathbb{N}\)) and define \(b:= 2\bar{\gamma}^{2}/\gamma^{2}\) and \(\lambda:= \gamma^{2}/(4\bar{\gamma} b)\). The definition of \(\beta_{n}^{\mathrm{PRP}+}\) and the Cauchy-Schwarz inequality mean that, for all \(n\in\mathbb{N}\),
$$ \bigl\vert \beta_{n}^{\mathrm{PRP}+} \bigr\vert \leq \frac{\vert \langle x_{n+1} - T (x_{n+1} ), y_{n} \rangle \vert }{\Vert x_{n} - T (x_{n} ) \Vert ^{2}} \leq \frac{\Vert x_{n+1} - T (x_{n+1} ) \Vert \Vert y_{n} \Vert }{\Vert x_{n} - T (x_{n} ) \Vert ^{2}} \leq\frac{2 \bar{\gamma}^{2}}{\gamma^{2}} = b, $$
where the third inequality comes from \(\|y_{n}\| \leq\|x_{n+1} - T(x_{n+1})\| + \| x_{n} - T(x_{n})\| \leq2 \bar {\gamma}\) and \(\gamma\leq\|x_{n} - T(x_{n})\| \leq\bar{\gamma}\) (\(n\in\mathbb{N}\)). When \(\| x_{n+1} - x_{n} \| \leq\lambda\) (\(n\in\mathbb{N}\)), the triangle inequality and the nonexpansivity of T imply that \(\|y_{n}\| \leq\|x_{n+1} - x_{n}\| + \| T(x_{n}) - T(x_{n+1})\| \leq2 \| x_{n+1} - x_{n} \| \leq2 \lambda\) (\(n\in\mathbb{N}\)). Therefore, for all \(n\in\mathbb{N}\),
$$ \bigl\vert \beta_{n}^{\mathrm{PRP}+} \bigr\vert \leq \frac{\bar{\gamma} \Vert y_{n} \Vert }{\Vert x_{n} - T (x_{n} ) \Vert ^{2}} \leq\frac{2 \lambda\bar{\gamma}}{\gamma^{2}} = \frac{1}{2b}, $$
which implies that Property (⋆) holds. Lemma 2.7 thus guarantees that \(\liminf_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\) holds. A discussion in the same manner as in the proof of Lemma 2.3(iii) leads to \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\). This completes the proof. □
Algorithm 2.1 with \(\beta_{n} = \beta _{n}^{\mathrm{HS}+}\)
The convergence properties of the nonlinear conjugate gradient method with \(\beta_{n}^{\mathrm{HS}}\) defined as in (1.19) are similar to those with \(\beta_{n}^{\mathrm{PRP}}\) defined as in (1.19) [23], Section 5. On the basis of this fact and the modification of \(\beta_{n}^{\mathrm {PRP}}\) in Section 2.4, this subsection considers Algorithm 2.1 with \(\beta _{n}^{\mathrm{HS}+}\) defined by (2.4).
Lemma 2.7 leads to the following.
Theorem 2.5
Suppose that
\((x_{n})_{n\in\mathbb{N}}\)
and
\((d_{n})_{n\in\mathbb{N}}\)
are the sequences generated by Algorithm
2.1
with
\(\beta_{n} = \beta_{n}^{\mathrm{HS}+}\) (\(n\in\mathbb{N}\)) and there exists
\(c > 0\)
such that
\(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq -c \| x_{n} - T(x_{n}) \|^{2}\)
for all
\(n\in\mathbb{N}\). If
\((x_{n})_{n\in\mathbb{N}}\)
is bounded, then
\((x_{n})_{n\in\mathbb{N}}\)
either terminates at a fixed point of
T
or
$$ \lim_{n\to\infty} \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert = 0. $$
Proof
When \(m\in\mathbb{N}\) exists such that \(\|x_{m} - T(x_{m}) \| =0\), Theorem 2.5 holds. Let us consider the case where \(\| x_{n} - T(x_{n}) \| \neq0\) for all \(n\in\mathbb{N}\). Suppose that \(\gamma, \bar{\gamma} > 0\) exist such that \(\gamma\leq \| x_{n} - T(x_{n}) \| \leq\bar{\gamma}\) (\(n\in\mathbb{N}\)) and define \(b:= 2\bar{\gamma}^{2}/((1-\sigma)c\gamma^{2})\) and \(\lambda := (1-\sigma)c \gamma^{2}/(4\bar{\gamma}b)\). Then (2.2) implies that, for all \(n\in\mathbb{N}\),
$$\begin{aligned} \langle d_{n}, y_{n} \rangle &= \bigl\langle d_{n}, x_{n+1} - T ( x_{n+1} ) \bigr\rangle - \bigl\langle d_{n}, x_{n} - T ( x_{n} ) \bigr\rangle \\ &\geq - ( 1 - \sigma ) \bigl\langle d_{n}, x_{n} - T ( x_{n} ) \bigr\rangle , \end{aligned}$$
which, together with the existence of \(c, \gamma> 0\) such that \(\langle x_{n} - T(x_{n}), d_{n} \rangle\leq -c \| x_{n} - T(x_{n}) \|^{2}\), and \(\gamma\leq\| x_{n} - T(x_{n}) \|\) (\(n\in\mathbb{N}\)), implies that, for all \(n\in\mathbb{N}\),
$$ \langle d_{n}, y_{n} \rangle\geq ( 1 - \sigma ) c \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} \geq ( 1 - \sigma ) c \gamma^{2} > 0. $$
This means Algorithm 2.1 with \(\beta_{n} = \beta_{n}^{\mathrm {HS}+}\) is well defined. From \(\|x_{n} - T(x_{n})\| \leq\bar{\gamma}\) (\(n\in\mathbb{N}\)) and the definition of \(y_{n}\), we have, for all \(n\in\mathbb{N}\),
$$ \bigl\vert \beta_{n}^{\mathrm{HS}+} \bigr\vert \leq \frac{\vert \langle x_{n+1} - T (x_{n+1} ), y_{n} \rangle \vert }{ \vert \langle d_{n}, y_{n} \rangle \vert } \leq\frac{2 \bar{\gamma}^{2}}{ ( 1 - \sigma ) c \gamma ^{2}} = b. $$
When \(\| x_{n+1} - x_{n} \| \leq\lambda\) (\(n\in\mathbb{N}\)), the triangle inequality and the nonexpansivity of T imply that \(\|y_{n}\| \leq\|x_{n+1} - x_{n}\| + \| T(x_{n}) - T(x_{n+1})\| \leq2 \| x_{n+1} - x_{n} \| \leq2 \lambda\) (\(n\in\mathbb{N}\)). Therefore, from \(\| x_{n} - T(x_{n}) \| \leq\bar{\gamma}\) (\(n\in\mathbb {N}\)), for all \(n\in\mathbb{N}\),
$$ \bigl\vert \beta_{n}^{\mathrm{HS}+} \bigr\vert \leq \frac{\bar{\gamma} \Vert y_{n} \Vert }{ \langle d_{n}, y_{n} \rangle} \leq\frac{2 \lambda\bar{\gamma}}{ ( 1 - \sigma ) c \gamma^{2}} = \frac{1}{2b}, $$
which in turn implies that Property (⋆) holds. Lemma 2.7 thus ensures that \(\liminf_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\) holds. A discussion similar to the one in the proof of Lemma 2.3(iii) leads to \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\). This completes the proof. □
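The two truncated parameters analysed in Theorems 2.4 and 2.5 can be sketched in a few lines of Python. The sketch below assumes finite-dimensional vectors stored as lists of floats and takes the residuals \(x_{n} - T(x_{n})\) and \(x_{n+1} - T(x_{n+1})\) as inputs; the function names are hypothetical, and the guard on \(\langle d_{n}, y_{n} \rangle\) is an added safety check, not something Algorithm 2.1 prescribes.

```python
def inner(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def beta_prp_plus(r_new, r_old):
    """max{ <r_new, y> / ||r_old||^2, 0 } with y = r_new - r_old."""
    y = [a - b for a, b in zip(r_new, r_old)]
    return max(inner(r_new, y) / inner(r_old, r_old), 0.0)


def beta_hs_plus(r_new, r_old, d):
    """max{ <r_new, y> / <d, y>, 0 } with y = r_new - r_old."""
    y = [a - b for a, b in zip(r_new, r_old)]
    denom = inner(d, y)
    if denom <= 0.0:
        # Assumption of this sketch: refuse to divide when <d, y> is not positive.
        raise ValueError("<d_n, y_n> must be positive for HS+")
    return max(inner(r_new, y) / denom, 0.0)


def next_direction(r_new, d, beta):
    """d_{n+1} = -(x_{n+1} - T(x_{n+1})) + beta_n * d_n."""
    return [-a + beta * b for a, b in zip(r_new, d)]


# Hypothetical data: r_old = x_n - T(x_n), d = -r_old, r_new = x_{n+1} - T(x_{n+1}).
r_old, d, r_new = [2.0, 0.0], [-2.0, 0.0], [0.5, 1.0]
prp = beta_prp_plus(r_new, r_old)
hs = beta_hs_plus(r_new, r_old, d)
print(prp, hs, next_direction(r_new, d, prp))
```

In the proof above, positivity of \(\langle d_{n}, y_{n} \rangle\) follows from (2.2) and the sufficient descent assumption, so the guard should never trigger when those conditions hold.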
Convergence rate analyses of Algorithm 2.1
Sections 2.1-2.5 show that Algorithm 2.1 with equations (2.4) satisfies \(\lim_{n\to\infty} \| x_{n} - T(x_{n}) \| = 0\) under certain assumptions. The next theorem establishes rates of convergence for Algorithm 2.1 with equations (2.4).
Theorem 2.6

(i)
Under the Wolfe-type conditions (2.1) and (2.2), Algorithm
2.1
with
\(\beta_{n} = \beta_{n}^{\mathrm{SD}}\)
satisfies, for all
\(n\in\mathbb{N}\),
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt {\delta\sum_{k=0}^{n} \alpha_{k}}}. $$

(ii)
Under the strong Wolfe-type conditions (2.1) and (2.9), Algorithm
2.1
with
\(\beta_{n} = \beta_{n}^{\mathrm{DY}}\)
satisfies, for all
\(n\in\mathbb{N}\),
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt {\frac{1}{1+\sigma} \delta\sum_{k=0}^{n} \alpha_{k}}}. $$

(iii)
Under the strong Wolfe-type conditions (2.1) and (2.9), Algorithm
2.1
with
\(\beta_{n} = \beta_{n}^{\mathrm{FR}}\)
satisfies, for all
\(n\in\mathbb{N}\),
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt {\frac{1}{1-\sigma} \delta\sum_{k=0}^{n} ( 1 - 2\sigma+ \sigma ^{k} ) \alpha_{k}}}. $$

(iv)
Under the assumptions in Theorem
2.4, Algorithm
2.1
with
\(\beta_{n} = \beta_{n}^{\mathrm{PRP}+}\)
satisfies, for all
\(n\in\mathbb{N}\),
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt{c \delta\sum_{k=0}^{n} \alpha_{k}}}. $$

(v)
Under the assumptions in Theorem
2.5, Algorithm
2.1
with
\(\beta_{n} = \beta_{n}^{\mathrm{HS}+}\)
satisfies, for all
\(n\in\mathbb{N}\),
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt{c \delta\sum_{k=0}^{n} \alpha_{k}}}. $$
Proof
(i) From \(d_{k} = - (x_{k} - T(x_{k}))\) (\(k\in\mathbb{N}\)) and (2.1), we have \(0 \leq\delta\alpha_{k} \|x_{k} - T(x_{k})\|^{2} \leq\| x_{k} - T(x_{k}) \|^{2} - \|x_{k+1} - T(x_{k+1})\|^{2}\) (\(k\in\mathbb{N}\)). Summing up this inequality from \(k=0\) to \(k=n\) guarantees that, for all \(n\in\mathbb{N}\),
$$ \delta\sum_{k=0}^{n} \alpha_{k} \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2} \leq\bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2} - \bigl\Vert x_{n+1} - T (x_{n+1} ) \bigr\Vert ^{2} \leq\bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2}, $$
which, together with the monotone decreasing property of \((\| x_{n} - T(x_{n}) \|^{2})_{n\in\mathbb{N}}\), implies that, for all \(n\in\mathbb{N}\),
$$ \delta\bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} \sum_{k=0}^{n} \alpha_{k} \leq\bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2}. $$
This completes the proof.
(ii) Condition (2.9) and Lemma 2.3(i) ensure that \(-\sigma\leq\langle x_{k+1} - T(x_{k+1}), d_{k} \rangle/\langle x_{k} - T(x_{k}), d_{k} \rangle\leq\sigma\) (\(k\in\mathbb{N}\)). Accordingly, (2.8) means that, for all \(k\in\mathbb{N}\),
$$\begin{aligned} \bigl\langle x_{k+1} - T (x_{k+1} ), d_{k+1} \bigr\rangle &= \frac{ \langle x_{k} - T (x_{k} ), d_{k} \rangle}{ \langle d_{k}, (x_{k+1} - T (x_{k+1} ) ) - ( x_{k} - T (x_{k} ) ) \rangle} \bigl\Vert x_{k+1} - T (x_{k+1} ) \bigr\Vert ^{2} \\ &= \biggl(\frac{\langle x_{k+1} - T(x_{k+1}), d_{k} \rangle}{\langle x_{k} - T(x_{k}), d_{k} \rangle} - 1 \biggr)^{-1} \bigl\Vert x_{k+1} - T (x_{k+1} ) \bigr\Vert ^{2} \\ &\leq - \frac{1}{1+\sigma} \bigl\Vert x_{k+1} - T (x_{k+1} ) \bigr\Vert ^{2}. \end{aligned}$$
Hence, (2.1) implies that, for all \(k\in\mathbb{N}\),
$$ \bigl\Vert x_{k+1} - T (x_{k+1} ) \bigr\Vert ^{2} - \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2} \leq - \frac{1}{1+\sigma} \delta\alpha_{k} \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2}. $$
Summing up this inequality from \(k=0\) to \(k=n\) and the monotone decreasing property of \((\| x_{n} - T(x_{n}) \|^{2})_{n\in\mathbb{N}}\) ensure that, for all \(n\in\mathbb{N}\),
$$ \frac{1}{1+\sigma} \delta\bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} \sum_{k=0}^{n} \alpha_{k} \leq\bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2}, $$
which completes the proof.
(iii) Inequality (2.10) guarantees that, for all \(k\in\mathbb{N}\),
$$\begin{aligned} \bigl\langle x_{k} - T (x_{k} ), d_{k} \bigr\rangle &\leq \Biggl(-2 + \sum_{j=0}^{k} \sigma^{j} \Biggr) \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2} \\ &= - \frac{1 - 2\sigma+ \sigma^{k}}{1 - \sigma} \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2}, \end{aligned}$$
which, together with (2.1), implies that, for all \(k\in \mathbb{N}\),
$$ \bigl\Vert x_{k+1} - T (x_{k+1} ) \bigr\Vert ^{2} - \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2} \leq - \frac{1 - 2\sigma+ \sigma^{k}}{1-\sigma} \delta\alpha_{k} \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2}. $$
Summing up this inequality from \(k=0\) to \(k=n\) and the monotone decreasing property of \((\| x_{n} - T(x_{n}) \|^{2})_{n\in\mathbb{N}}\) ensure that, for all \(n\in\mathbb{N}\),
$$ \frac{1}{1-\sigma} \delta\bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} \sum_{k=0}^{n} \bigl(1 - 2\sigma+ \sigma^{k} \bigr) \alpha_{k} \leq\bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2}, $$
which completes the proof.
(iv), (v) Since there exists \(c > 0\) such that \(\langle x_{k} - T(x_{k}), d_{k} \rangle \leq -c \| x_{k} - T(x_{k})\|^{2}\) for all \(k\in\mathbb{N}\), we have from (2.1) and the monotone decreasing property of \((\| x_{n} - T(x_{n}) \|^{2})_{n\in\mathbb{N}}\), for all \(n\in\mathbb{N}\),
$$ c \delta\bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert ^{2} \sum_{k=0}^{n} \alpha_{k} \leq c \delta\sum_{k=0}^{n} \alpha_{k} \bigl\Vert x_{k} - T (x_{k} ) \bigr\Vert ^{2} \leq \bigl\Vert x_{0} - T (x_{0} ) \bigr\Vert ^{2}. $$
This concludes the proof. □
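The bounds in Theorem 2.6 are easy to evaluate once the step sizes \(\alpha_{0}, \ldots, \alpha_{n}\) are recorded. The following Python sketch is one way to do so; the function names and the `factor` argument are hypothetical conveniences (`factor` is 1 for SD, \(1/(1+\sigma)\) for DY, and \(c\) for PRP+ and HS+), and the sample step sizes are made up for illustration.

```python
import math


def residual_bound(r0_norm, alphas, delta, factor=1.0):
    """Bound ||x_0 - T(x_0)|| / sqrt(factor * delta * sum(alpha_k)), cases (i), (ii), (iv), (v)."""
    total = factor * delta * sum(alphas)
    if total <= 0.0:
        raise ValueError("factor * delta * sum(alphas) must be positive")
    return r0_norm / math.sqrt(total)


def residual_bound_fr(r0_norm, alphas, delta, sigma):
    """Bound of case (iii), with weights (1 - 2*sigma + sigma**k) on alpha_k."""
    weighted = sum((1.0 - 2.0 * sigma + sigma ** k) * a for k, a in enumerate(alphas))
    total = delta * weighted / (1.0 - sigma)
    if total <= 0.0:
        raise ValueError("the weighted sum must be positive")
    return r0_norm / math.sqrt(total)


# Hypothetical run: ||x_0 - T(x_0)|| = 1, eight steps of size 0.5, delta = sigma = 0.25.
alphas = [0.5] * 8
sd = residual_bound(1.0, alphas, delta=0.25)
dy = residual_bound(1.0, alphas, delta=0.25, factor=1.0 / (1.0 + 0.25))
fr = residual_bound_fr(1.0, alphas, delta=0.25, sigma=0.25)
print(sd, dy, fr)
```

Because the factor \(1/(1+\sigma)\) is smaller than 1, the DY bound is never smaller than the SD bound for the same step sizes; this compares the bounds only and says nothing about which method actually produces larger steps.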
The conventional Krasnosel’skiĭ-Mann algorithm (1.2) with a step size sequence \((\alpha_{n})_{n\in\mathbb{N}}\) obeying (1.3) satisfies the following inequality [8], Propositions 10 and 11:
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq \frac{\mathrm{d} (x_{0}, \operatorname{Fix} (T ) )}{\sqrt{\sum_{k=0}^{n} \alpha_{k} (1-\alpha_{k} )}} \quad (n\in\mathbb{N} ), $$
where \(\mathrm{d}(x_{0}, \operatorname{Fix} (T)) := \min_{x\in\operatorname{Fix}(T)} \| x_{0} - x \|\). When \(\alpha_{n}\) (\(n\in\mathbb{N}\)) is a constant in the range of \((0,1)\), which is the most tractable choice of step size satisfying (1.3), the Krasnosel’skiĭ-Mann algorithm (1.2) has the rate of convergence,
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert = O \biggl( \frac {1}{\sqrt{n+1}} \biggr). $$
(2.13)
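The rate (2.13) can be checked numerically on a toy problem. The sketch below assumes the usual form of the Krasnosel’skiĭ-Mann update, \(x_{n+1} = x_{n} + \alpha(T(x_{n}) - x_{n})\), with a constant step size, and uses a 90-degree rotation of the plane as a hypothetical nonexpansive mapping whose only fixed point is the origin; neither the mapping nor the helper names come from the paper.

```python
import math


def rotate90(x):
    """Rotation by 90 degrees in R^2: an isometry, hence nonexpansive, with Fix(T) = {0}."""
    return (-x[1], x[0])


def km_residuals(T, x0, alpha, n_max):
    """Return [||x_n - T(x_n)|| for n = 0..n_max] for x_{n+1} = x_n + alpha*(T(x_n) - x_n)."""
    x = tuple(x0)
    residuals = []
    for _ in range(n_max + 1):
        tx = T(x)
        residuals.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, tx))))
        x = tuple(a + alpha * (b - a) for a, b in zip(x, tx))
    return residuals


x0, alpha, n_max = (3.0, 4.0), 0.5, 20
res = km_residuals(rotate90, x0, alpha, n_max)
dist_to_fix = math.sqrt(sum(a * a for a in x0))  # distance from x_0 to Fix(T) = {0}
bounds = [dist_to_fix / math.sqrt((n + 1) * alpha * (1.0 - alpha)) for n in range(n_max + 1)]
print(all(r <= b for r, b in zip(res, bounds)))
```

For this particular mapping the residual shrinks geometrically, so the \(O(1/\sqrt{n+1})\) bound is far from tight here; the example only confirms that the bound is respected.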
Meanwhile, according to Theorem 5 in [17], Algorithm (1.2) with \((\alpha_{n})_{n\in\mathbb{N}}\) satisfying the Armijo-type condition (1.5) satisfies, for all \(n\in\mathbb{N}\),
$$ \bigl\Vert x_{n} - T (x_{n} ) \bigr\Vert \leq\frac{\Vert x_{0} - T (x_{0} ) \Vert }{\sqrt {\beta\sum_{k=0}^{n} ( \alpha_{k} - \frac{1}{2} )^{2}}}. $$
(2.14)
In general, the step sizes satisfying (1.3) do not coincide with those satisfying the Armijo-type condition (1.5) or the Wolfe-type conditions (2.1) and (2.2). This is because the line search methods based on the Armijo-type conditions (1.5) and (2.1) determine step sizes at each iteration n so as to satisfy \(\| x_{n+1} - T(x_{n+1}) \| < \|x_{n} - T(x_{n})\|\), while the constant step sizes satisfying (1.3) do not change at each iteration. Accordingly, it would be difficult to evaluate the efficiency of these algorithms by using only the theoretical convergence rates in (2.13), (2.14), and Theorem 2.6. To verify whether Algorithm 2.1 with the convergence rates in Theorem 2.6 converges faster than the previous algorithms [8], Propositions 10 and 11, [17], Theorem 5, with convergence rates (2.13) and (2.14), the next section numerically compares their abilities to solve concrete constrained smooth convex optimization problems.