
Convergence of proximal splitting algorithms in \(\operatorname{CAT}(\kappa)\) spaces and beyond

Abstract

In the setting of \(\operatorname{CAT}(\kappa)\) spaces, common fixed point iterations built from prox mappings (e.g. prox-prox, Krasnoselsky–Mann relaxations, nonlinear projected-gradients) converge locally linearly under the assumption of linear metric subregularity. Linear metric subregularity is in any case necessary for linearly convergent fixed point sequences, so the result is tight. To show this, we develop a theory of fixed point mappings that violate the usual assumptions of nonexpansiveness and firm nonexpansiveness in p-uniformly convex spaces.

1 Fundamentals of nonlinear spaces

Following [1] we focus on p-uniformly convex spaces with parameter c [12]: for \(p\in (1,\infty )\), a metric space \((G, d)\) is p-uniformly convex with constant \(c>0\) whenever it is a geodesic space, and

$$\begin{aligned} & \bigl(\forall t\in [0,1]\bigr) (\forall x,y,z\in G) \\ &\quad d\bigl(z, (1-t)x\oplus ty\bigr)^{p} \leq (1-t)d(z,x)^{p}+td(z,y)^{p} - \frac{c}{2}t(1-t)d(x,y)^{p}. \end{aligned}$$
(1)

Examples of p-uniformly convex spaces are \(L^{p}\) spaces, \(\operatorname{CAT}(0)\) spaces (\(p=c=2\)), Hadamard spaces (complete \(\operatorname{CAT}(0)\) spaces), and Hilbert spaces (linear Hadamard spaces). Of particular interest are \(\operatorname{CAT}(\kappa)\) spaces, since these serve as the model spaces for applications on manifolds with curvature bounded above.

Lemma 1

([13], Proposition 3.1)

A \(\operatorname{CAT}(\kappa )\) space is locally 2-uniformly convex with parameter \(c\nearrow 2\) as the diameter of the local neighborhood vanishes. In particular, for any \(\operatorname{CAT}(\kappa )\) space \((G, d)\) and any point \(\overline{x}\in G\), for all \(\delta \in (0, \pi /(4\sqrt{\kappa }))\), the subspace \((\mathbb{B}_{\delta }(\overline{x}), d \vert _{\mathbb{B}_{\delta }(\overline{x})})\) is a 2-uniformly convex space with constant \(c_{\delta }= 4\delta \sqrt{\kappa }\tan (\pi /2 - 2\delta \sqrt{ \kappa } )\).

Note the asymptotic behavior of the constants: as \(\delta \searrow 0\), the constant \(c\nearrow 2\).
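The behavior of the constant in Lemma 1 can be checked numerically. The following is a minimal sketch that only evaluates the stated formula for \(c_{\delta }\); the function name, the value \(\kappa =1\), and the sample radii are illustrative choices, not part of the lemma.

```python
import math


def uniform_convexity_constant(delta: float, kappa: float) -> float:
    """Evaluate c_delta = 4*delta*sqrt(kappa)*tan(pi/2 - 2*delta*sqrt(kappa))."""
    root = math.sqrt(kappa)
    if not (0.0 < delta < math.pi / (4.0 * root)):
        raise ValueError("delta must lie in (0, pi / (4 sqrt(kappa)))")
    u = 2.0 * delta * root  # so that c_delta = 2 * u * tan(pi/2 - u)
    return 2.0 * u * math.tan(math.pi / 2.0 - u)


kappa = 1.0  # illustrative curvature bound (assumption)
radii = [0.7, 0.3, 0.1, 0.01, 0.001]  # shrinking neighborhoods (assumption)
constants = [uniform_convexity_constant(d, kappa) for d in radii]
for d, c in zip(radii, constants):
    print(f"delta = {d:<6} c_delta = {c:.6f}")
```

Since \(\tan (\pi /2-u)=1/\tan u\), the constant equals \(2u/\tan u\) with \(u=2\delta \sqrt{\kappa }\), which stays below 2 and increases to 2 as the radius shrinks.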

Definition 2

Let \((G,d)\) be a geodesic space, and let γ and η be two geodesics through a point p. Then γ is said to be perpendicular to η at p, denoted by \(\gamma \perp _{p} \eta \), if

$$\begin{aligned} d(x,p)\leq d(x,y) \quad \forall x\in \gamma, y \in \eta. \end{aligned}$$

A space is said to be symmetric perpendicular if for all geodesics γ and η with common point p we have

$$\begin{aligned} \gamma \perp _{p} \eta\quad \Leftrightarrow\quad \eta \perp _{p} \gamma. \end{aligned}$$

Remark 3

Any \(\operatorname{CAT}(\kappa )\) space \((G,d)\) with \(\operatorname{diam}(G) < \frac{\pi }{2\sqrt{\kappa }} \) is symmetric perpendicular [9, Theorem 2.11].

In a complete p-uniformly convex space the p-proximal mapping of a proper and lower semicontinuous function f is defined by

$$\begin{aligned} \operatorname {prox}^{p}_{f, \lambda } (x):= \mathop {\operatorname {argmin}}_{y \in G} f(y)+ \frac{1}{p \lambda ^{p-1}} d(y,x)^{p}\quad (\lambda >0). \end{aligned}$$
(2)
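To make definition (2) concrete, the following sketch evaluates the p-proximal mapping by brute force on the real line, which is a Hilbert space and hence a very special case of the setting above. The grid search, the interval \([-5,5]\), and the test function \(f=\vert \cdot \vert \) are assumptions made for illustration; for \(p=2\) the result is compared with the well-known soft-thresholding formula.

```python
import math


def prox_grid(f, x, lam, p=2.0, lo=-5.0, hi=5.0, n=20001):
    """Approximate argmin_y f(y) + |y - x|^p / (p * lam^(p-1)) on a uniform grid."""
    best_y, best_val = lo, float("inf")
    step = (hi - lo) / (n - 1)
    for k in range(n):
        y = lo + k * step
        val = f(y) + abs(y - x) ** p / (p * lam ** (p - 1))
        if val < best_val:
            best_y, best_val = y, val
    return best_y


def soft_threshold(x, lam):
    """Closed-form minimizer of |y| + (y - x)^2 / (2 * lam)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


x, lam = 1.7, 0.5  # illustrative point and step size (assumption)
print(prox_grid(abs, x, lam), soft_threshold(x, lam))
```

The grid search is only meant to mirror the definition; it is not an efficient way to compute proximal points.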

The main dividend of this work is the following.

Theorem 4

(Convergence of proximal algorithms in \(\operatorname{CAT}(\kappa)\) spaces)

Let \((G, d)\) be a complete \(\operatorname{CAT}(\kappa )\) space with \(\kappa >0\), and for \(j=1,2,\dots,N\), let \(f_{j}\) and \(g\colon G \rightarrow \mathbb{R}\cup \{+\infty \}\) be proper, convex, and lower semicontinuous with \(\mathop {\operatorname {argmin}}g\cap (\bigcap_{j=1}^{N} \mathop {\operatorname {argmin}}f_{j} )\neq \emptyset \). Let \(T: D \to D \) where \(D\subset G\) denotes one of the following:

  1. (i)

    \(T:=\operatorname {prox}_{f_{N},\lambda _{N}}\circ \operatorname {prox}_{f_{N-1},\lambda _{N-1}} \circ \cdots \circ \operatorname {prox}_{f_{1},\lambda _{1}}\);

  2. (ii)

    \(T:=\beta \operatorname {prox}_{g,\lambda }\oplus (1-\beta )\operatorname {Id}\);

  3. (iii)

    \(T:=\operatorname {prox}_{f_{1},\lambda _{1}}\circ (\beta \operatorname {prox}_{g,\lambda _{2}} \oplus (1-\beta )\operatorname {Id})\);

  4. (iv)

    \(T:=P_{C}\circ (\beta \operatorname {prox}_{g,\lambda _{1}}\oplus (1-\beta ) \operatorname {Id})\),

where in the last case \(P_{C}\) is the metric projector onto the closed convex set \(C\subset G\) (in other words, this is the specialization of part (iii) to the case where \(f_{1}\) is the indicator function of a closed convex subset of G). If T satisfies \(\operatorname {Fix}T\neq \emptyset \) and

$$\begin{aligned} d(x,\operatorname {Fix}T\cap D)\leq \mu d(x,Tx), \quad\forall x\in D\subset G, \end{aligned}$$

with constant μ, then the fixed point sequence initialized from any starting point close enough to \(\operatorname {Fix}T\) is at least linearly convergent to a point in \(\operatorname {Fix}T\).

A more precise statement of this theorem, with proof, is Theorem 25. The intervening sections prove the fundamental building blocks.

2 Almost α-firmly nonexpansive mappings

The regularity of a mapping \(T: G \to G \) is characterized by the behavior of the images of pairs of points under T. A key tool is what has been called the transport discrepancy in [4]:

$$\begin{aligned} \psi ^{(p,c)}_{T}(x,y):= \frac{c}{2} \bigl(d(Tx, x)^{p}+d(Ty, y)^{p} + d(Tx, Ty)^{p} + d(x, y)^{p} - d(Tx, y)^{p} - d(x,Ty)^{p} \bigr). \end{aligned}$$
(3)
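The transport discrepancy (3) is easy to evaluate once a metric and a mapping are fixed. The sketch below does this in the Euclidean plane with \(p=c=2\), using the projection onto the unit ball as an illustrative choice of T (an assumption, not a mapping from the text). In this Hilbert-space setting, expanding the squared norms shows that \(\psi ^{(2,2)}_{T}(x,y)=\Vert (x-Tx)-(y-Ty)\Vert ^{2}\), and when y is a fixed point of T the expression collapses to \(\frac{c}{2}d(Tx,x)^{p}\); the code checks both.

```python
import math


def dist(u, v):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def transport_discrepancy(T, x, y, p=2.0, c=2.0, d=dist):
    """Evaluate psi_T^{(p,c)}(x, y) as defined in (3)."""
    Tx, Ty = T(x), T(y)
    return 0.5 * c * (
        d(Tx, x) ** p + d(Ty, y) ** p + d(Tx, Ty) ** p + d(x, y) ** p
        - d(Tx, y) ** p - d(x, Ty) ** p
    )


def project_unit_ball(x):
    """Illustrative mapping T: projection onto the closed Euclidean unit ball."""
    norm = math.sqrt(sum(a * a for a in x))
    return tuple(x) if norm <= 1.0 else tuple(a / norm for a in x)


x, y = (2.0, 1.0), (-0.5, 3.0)
psi = transport_discrepancy(project_unit_ball, x, y)

# Hilbert-space identity for p = c = 2: psi = ||(x - Tx) - (y - Ty)||^2.
Tx, Ty = project_unit_ball(x), project_unit_ball(y)
residual = sum(((a - ta) - (b - tb)) ** 2 for a, ta, b, tb in zip(x, Tx, y, Ty))

# If y is a fixed point of T, psi reduces to (c/2) * d(Tx, x)^p.
y_fixed = (0.3, -0.4)
psi_fixed = transport_discrepancy(project_unit_ball, x, y_fixed)
print(psi, residual, psi_fixed, dist(Tx, x) ** 2)
```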

Definition 5

Let \((G, d)\) be a p-uniformly convex metric space with constant c.

  1. (i)

    The mapping \(T:G\to G\) is pointwise almost nonexpansive at \(y\in D\subset G\) on D with violation \(\epsilon \geq 0\) whenever

    $$\begin{aligned} \exists \epsilon \geq 0:\quad d(Tx,Ty)^{p}\leq (1+\epsilon )d(x,y)^{p}\quad \forall x\in D. \end{aligned}$$
    (4)

    The smallest ϵ for which (4) holds is called the violation. If (4) holds with \(\epsilon =0\), then T is pointwise nonexpansive at \(y\in D\subset G\) on D. If (4) holds at all \(y\in D\), then T is said to be (almost) nonexpansive on D. If \(D=G\), then the mapping T is simply said to be (almost) nonexpansive. If \(D\supset \operatorname {Fix}T\neq \emptyset \) and (4) holds at all \(y\in \operatorname {Fix}T\) with the same violation, then T is said to be almost quasi nonexpansive.

  2. (ii)

    The mapping T is said to be pointwise asymptotically nonexpansive at y whenever

    $$\begin{aligned} \forall \epsilon >0, \exists D_{\epsilon }(y)\subset G:\quad d(Tx,Ty)^{p} \leq (1+\epsilon )d(x,y)^{p}\quad \forall x\in D_{\epsilon }(y), \end{aligned}$$
    (5)

    where \(D_{\epsilon }(y)\) is a neighborhood of y in D.

  3. (iii)

    \(T: G \to G \) is said to be quasi strictly nonexpansive whenever

    $$\begin{aligned} d(Tx,\overline{x})< d(x,\bar{x})\quad \forall x \in G\setminus \operatorname {Fix}T, \forall \overline{x}\in \operatorname {Fix}T. \end{aligned}$$
    (6)
  4. (iv)

    The operator \(T:G\to G\) is pointwise almost α-firmly nonexpansive at \(y\in D\subset G\) on D with violation at most \(\epsilon >0\) whenever

    $$\begin{aligned} \exists \alpha \in (0,1), \epsilon \geq 0:\quad d(Tx,Ty)^{p} \leq (1+ \epsilon )d(x,y)^{p}-\frac{1-\alpha }{\alpha }\psi ^{(p,c)}_{T}(x,y)\quad \forall x\in D. \end{aligned}$$
    (7)

    If (7) holds with \(\epsilon =0\), then T is pointwise α-firmly nonexpansive at \(y\in D\subset G\) on D. If (7) holds at all \(y\in D\) with the same constant α, then T is said to be (almost) α-firmly nonexpansive on D. If \(D=G\), then the mapping T is simply said to be (almost) α-firmly nonexpansive. If \(D\supset \operatorname {Fix}T\neq \emptyset \) and (7) holds at all \(y\in \operatorname {Fix}T\) with the same constant α, then T is said to be almost quasi α-firmly nonexpansive.

  5. (v)

    The mapping T is said to be pointwise asymptotically α-firmly nonexpansive at y with constant \(\alpha <1\) whenever

    $$\begin{aligned} &\forall \epsilon >0, \exists D_{\epsilon }(y)\subset G: \\ &\quad d(Tx,Ty)^{p} \leq (1+\epsilon )d(x,y)^{p}-\frac{1-\alpha }{\alpha } \psi ^{(p,c)}_{T}(x,y)\quad \forall x\in D_{\epsilon }(y), \end{aligned}$$
    (8)

    where \(D_{\epsilon }(y)\) is a neighborhood of y in D.

Proposition 6

(Characterizations)

Let \((G, d)\) be a p-uniformly convex space with constant \(c>0\), and let \(T:D\to G\) for \(D\subset G\).

  1. (i)
    $$\begin{aligned} \psi _{T}^{(p,c)}(x, y)=\frac{c}{2}d(Tx, x)^{p} \quad\textit{whenever } y\in \operatorname {Fix}T. \end{aligned}$$
    (9)

    For fixed \(y\in \operatorname {Fix}T\), the function \(\psi _{T}^{(p,c)}(x,y)\geq 0\) for all \(x \in D\) and \(\psi _{T}^{(p,c)}(x,y)= 0\) only when \(x\in \operatorname {Fix}T\).

  2. (ii)

    Let \(y\in \operatorname {Fix}T\). T is pointwise almost α-firmly nonexpansive at y on D with violation at most \(\epsilon >0\) if and only if

    $$\begin{aligned} \exists \alpha \in [0,1): \quad d(Tx,y)^{p}\leq (1+\epsilon )d(x,y)^{p} - \frac{1-\alpha }{\alpha }\frac{c}{2}d(Tx,x)^{p}\quad \forall x\in D. \end{aligned}$$
    (10)

    In particular, T is almost quasi α-firmly nonexpansive on D whenever T possesses fixed points and (10) holds at all \(y\in \operatorname {Fix}T\) with the same constant \(\alpha \in [0,1)\) and violation at most ϵ.

  3. (iii)

    If T is pointwise almost α-firmly nonexpansive at \(y\in \operatorname {Fix}T\) on D with constant \(\underline{\alpha }\in [0,1)\) and violation at most ϵ, then it is pointwise almost α-firmly nonexpansive at y on D, with the same upper bound on the violation, for all \(\alpha \in [\underline{\alpha },1]\). In particular, if T is pointwise almost α-firmly nonexpansive at \(y\in \operatorname {Fix}T\) on D, then it is pointwise almost nonexpansive at y on D.

Proof

This is a slight extension of [4, Proposition 4], which was for pointwise α-firmly nonexpansive mappings. The proof for pointwise almost α-firmly nonexpansive mappings is the same. □

2.1 Composition of operators

Before continuing with pointwise almost α-firmly nonexpansive mappings, we make a brief but important observation about fixed points of compositions of quasi strictly nonexpansive mappings (6).

Lemma 7

Let \(T_{1}\) and \(T_{2}\) be quasi strictly nonexpansive on \((G,d)\) with \(\operatorname {Fix}T_{1} \cap \operatorname {Fix}T_{2} \neq \emptyset \). Then \(\operatorname {Fix}T_{1} \circ T_{2} = \operatorname {Fix}T_{1} \cap \operatorname {Fix}T_{2}\).

Proof

The inclusion \(\operatorname {Fix}T_{1} \cap \operatorname {Fix}T_{2} \subset \operatorname {Fix}(T_{1} \circ T_{2} )\) is clear. Assume that there exists \(y \in \operatorname {Fix}( T_{1} \circ T_{2}) \setminus (\operatorname {Fix}T_{1} \cap \operatorname {Fix}T_{2})\) and choose \(x \in \operatorname {Fix}T_{1} \cap \operatorname {Fix}T_{2}\). Then

$$\begin{aligned} d(y,x)\leq d(T_{1} \circ T_{2} y,x) \leq d(T_{2}y,x) \leq d(y,x) \end{aligned}$$

with either \(d(T_{2}y,x) < d(y,x)\) or \(d(T_{1} \circ T_{2} y,x) < d(T_{2} y,x)\) as \(y \notin \operatorname {Fix}T_{1} \cap \operatorname {Fix}T_{2}\). This is a contradiction, so \(\operatorname {Fix}T_{1} \circ T_{2} = \operatorname {Fix}T_{1} \cap \operatorname {Fix}T_{2}\). □

Remark 8

The sufficiency of strict quasi nonexpansivity for the analogous identity for convex combinations of mappings in a Hadamard space was recognized in [3, Remark 7.11].

Lemma 9

Let \((G, d)\) be a p-uniformly convex space with constant \(c>0\), and let \(D\subset G\). Let \({T_{0}}:D\to G\) be pointwise almost α-firmly nonexpansive at y on D with constant \(\alpha _{0}\) and violation \(\epsilon _{0}\), and let \({T_{1}}:{T_{0}}(D)\to G\) be pointwise almost α-firmly nonexpansive at \({T_{0}} y\) on \({T_{0}}(D)\) with constant \(\alpha _{1}\) and violation \(\epsilon _{1}\). Then the composition \(\overline{T}:={T_{1}}\circ {T_{0}}\) is pointwise almost α-firmly nonexpansive at y with constant \(\overline{\alpha }\in (0,1)\) and violation at most \({\overline{\epsilon }}=\epsilon _{0}+\epsilon _{1} + \epsilon _{0} \epsilon _{1}\) on D whenever

$$\begin{aligned} \frac{1-\overline{\alpha }}{\overline{\alpha }}\psi ^{(p,c)}_{ \overline{T}}(x,y) \leq (1+ \epsilon _{1}) \frac{1-\alpha _{0}}{\alpha _{0}}\psi ^{(p,c)}_{{T_{0}}}(x,y) + \frac{1-\alpha _{1}}{\alpha _{1}}\psi ^{(p,c)}_{{T_{1}}}({T_{0}}x,{T_{0}}y)\quad \forall x\in D. \end{aligned}$$
(11)

Proof

The proof is a slight extension of the same result for compositions of α-firmly nonexpansive mappings in [4, Lemma 10]. Since \({T_{1}}\) is pointwise almost α-firmly nonexpansive at \({T_{0}} y\) with violation \(\epsilon _{1}\) and constant \(\alpha _{1}\) on \({T_{0}}(D)\), we have

$$\begin{aligned} d(\overline{T}x,\overline{T}y)^{p}\leq (1+\epsilon _{1}) d({T_{0}} x,{T_{0}} y)^{p}-\frac{1-\alpha _{1}}{\alpha _{1}}\psi ^{(p,c)}_{{T_{1}}}({T_{0}}x,{T_{0}}y) \quad\forall x \in D, \end{aligned}$$

where \(\psi ^{(p,c)}_{{T_{1}}}\) is defined by (3). Since \({T_{0}}\) is pointwise almost α-firmly nonexpansive at y with constant \(\alpha _{0}\) and violation \(\epsilon _{0}\) on D, we have

$$\begin{aligned} &d(\overline{T}x,\overline{T}y)^{p}\\ &\quad\leq (1+\epsilon _{0}) (1+ \epsilon _{1}) d(x,y)^{p} -(1+\epsilon _{1}) \frac{1-\alpha _{0}}{\alpha _{0}}\psi ^{(p,c)}_{{T_{0}}}(x,y) - \frac{1-\alpha _{1}}{\alpha _{1}}\psi ^{(p,c)}_{{T_{1}}}({T_{0}}x,{T_{0}}y) \end{aligned}$$

for all \(x \in D\). Whenever (11) holds, we conclude that

$$\begin{aligned} \exists \overline{\alpha }\in (0,1):\quad d(\overline{T}x, \overline{T}y)^{p} \leq (1+{\overline{\epsilon }})d(x,y)^{p}- \frac{1-\overline{\alpha }}{\overline{\alpha }}\psi ^{(p,c)}_{ \overline{T}}(x,y) \quad\forall x\in D, \end{aligned}$$

where \({\overline{\epsilon }}= \epsilon _{0}+\epsilon _{1}+\epsilon _{0} \epsilon _{1}\). □

Proposition 10

(Compositions of pointwise almost α-firmly nonexpansive mappings)

Let \((G, d)\) be a p-uniformly convex space with constant \(c>0\), and let \(D\subset G\). Let \(T_{0}:D\to G\) be pointwise almost α-firmly nonexpansive at y on D with constant \(\alpha _{0}\) and violation \(\epsilon _{0}\), and let \(T_{1}:T_{0}(D)\to G\) be pointwise almost α-firmly nonexpansive at y on \(T_{0}(D)\) with constant \(\alpha _{1}\) and violation \(\epsilon _{1}\). Let \(y \in \operatorname {Fix}T_{0} \cap \operatorname {Fix}{T_{1}}\). Then the composite operator \(\overline{T}={T_{1}} \circ T_{0}\) is pointwise almost α-firmly nonexpansive at y on D with violation at most \({\overline{\epsilon }}=\epsilon _{0}+\epsilon _{1} + \epsilon _{0} \epsilon _{1}\) and constant

$$\begin{aligned} \overline{\alpha }= \frac{\alpha _{0}+\alpha _{1}-2\alpha _{0}\alpha _{1}}{\frac{c}{2} (1-\alpha _{0}-\alpha _{1}+\alpha _{0}\alpha _{1} )+ \alpha _{0}+\alpha _{1}-2\alpha _{0}\alpha _{1}}. \end{aligned}$$
(12)

Proof

This is a minor extension of [4, Theorem 11]. By Lemma 9, it suffices to show (11) at all points \(y\in \operatorname {Fix}{T_{1}}\cap \operatorname {Fix}{T_{0}}\). First, note that \(\operatorname {Fix}\overline{T}\supset \operatorname {Fix}{T_{1}}\cap \operatorname {Fix}{T_{0}}\), so by (9) we have \(\psi ^{(p,c)}_{T_{0}}(x,y)=\frac{c}{2}d(x,{T_{0}}x)^{p}\), \(\psi ^{(p,c)}_{T_{1}}({T_{0}}x,{T_{0}}y)= \frac{c}{2}d({T_{0}}x, \overline{T}x)^{p}\), and \(\psi ^{(p,c)}_{\overline{T}}(x,y)=\frac{c}{2}d(x,\overline{T}x)^{p}\) whenever \(y\in \operatorname {Fix}{T_{1}}\cap \operatorname {Fix}{T_{0}}\). Inequality (11) in this case simplifies to

$$\begin{aligned} \exists \overline{\kappa }>0:\quad \kappa _{0} (1+\epsilon _{1})d(x, {T_{0}}x)^{p}+ \kappa _{1} d({T_{0}}x, \overline{T}x)^{p} \geq \overline{\kappa }d(x, \overline{T}x)^{p}\quad \forall x\in D, \end{aligned}$$
(13)

where \(\kappa _{0}:=\frac{1-\alpha _{0}}{\alpha _{0}}\), \(\kappa _{1}:=\frac{1-\alpha _{1}}{\alpha _{1}}\), and \(\overline{\kappa }:= \frac{1-\overline{\alpha }}{\overline{\alpha }}\) with \(\overline{\alpha }\in (0,1)\). By (1), we have

$$\begin{aligned} \frac{c}{2}t(1-t)d(x,\overline{T}x)^{p}&\leq \frac{c}{2}t(1-t)d(x, \overline{T}x)^{p}+ d\bigl({T_{0}}x, (1-t)x\oplus t\overline{T}x\bigr)^{p} \\ &\leq (1-t)d({T_{0}}x,x)^{p}+ td({T_{0}}x, \overline{T}x)^{p}\quad \forall x\in G, \forall t\in (0,1). \end{aligned}$$
(14)

Letting \(t=\frac{\kappa _{1}}{\kappa _{0}+\kappa _{1}}\) yields \((1-t)=\frac{\kappa _{0}}{\kappa _{0}+\kappa _{1}}\), so that (14) becomes

$$\begin{aligned} \frac{c}{2}\frac{\kappa _{0}\kappa _{1}}{\kappa _{0}+\kappa _{1}}d(x, \overline{T}x)^{p} \leq (1+ \epsilon _{1}) \kappa _{0} d({T_{0}}x,x)^{p}+ \kappa _{1} d({T_{0}}x,\overline{T}x)^{p}\quad \forall x\in G. \end{aligned}$$
(15)

It follows that (13) holds for any \(\overline{\kappa }\in (0, \frac{c\kappa _{0}\kappa _{1}}{2(\kappa _{0}+\kappa _{1})} ]\). We conclude that the composition is pointwise almost α-firmly nonexpansive at y with constant

$$\begin{aligned} \overline{\alpha }= \frac{\kappa _{0}+\kappa _{1}}{\frac{c}{2}\kappa _{0}\kappa _{1}+\kappa _{0}+\kappa _{1}}. \end{aligned}$$

A short calculation shows that this is the same as (12), which completes the proof. □
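The "short calculation" can also be checked numerically. The sketch below evaluates the constant (12) directly and via the \(\kappa \)-form derived in the proof, together with the composed violation \(\epsilon _{0}+\epsilon _{1}+\epsilon _{0}\epsilon _{1}\). The function names and sample values are illustrative assumptions.

```python
def composition_constant(alpha0, alpha1, c):
    """Constant (12) for the composition T1 o T0."""
    num = alpha0 + alpha1 - 2.0 * alpha0 * alpha1
    return num / (0.5 * c * (1.0 - alpha0 - alpha1 + alpha0 * alpha1) + num)


def composition_constant_kappa(alpha0, alpha1, c):
    """Same constant written with kappa_i = (1 - alpha_i) / alpha_i, as in the proof."""
    k0 = (1.0 - alpha0) / alpha0
    k1 = (1.0 - alpha1) / alpha1
    return (k0 + k1) / (0.5 * c * k0 * k1 + k0 + k1)


def composed_violation(eps0, eps1):
    """Upper bound on the violation of the composition."""
    return eps0 + eps1 + eps0 * eps1


# Illustrative values (assumptions): two maps with alpha = 1/2 and c = 2.
alpha_bar = composition_constant(0.5, 0.5, 2.0)
print(alpha_bar, composition_constant_kappa(0.5, 0.5, 2.0))
print(composed_violation(0.01, 0.02))
```

For \(c=2\) the denominator of (12) simplifies to \(1-\alpha _{0}\alpha _{1}\), so two mappings with \(\alpha _{0}=\alpha _{1}=1/2\) compose to a mapping with \(\overline{\alpha }=2/3\).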

2.2 Averages of operators

Let \(\mathcal{B}(G)\) be the Borel algebra on \((G,d)\), \(\mathcal{P}(G)\) be the family of probability measures on \((G,\mathcal{B}(G))\), and \(\mathcal{P}^{p}(G)\) be the family of probability measures on G such that the pth moment exists, i.e.

$$\begin{aligned} \mathcal{P}^{p}(G):= \biggl\{ \nu \in \mathcal{P}(G) \biggm| \int _{G}\, d(x,y)^{p} \nu (dy) < \infty\ \forall x \in G\biggr\} . \end{aligned}$$

For \(\nu \in \mathcal{P}^{p}(G)\), the minimizer of

$$\begin{aligned} x\mapsto \int _{G} \,d(x,y)^{p} \nu (dy) \end{aligned}$$

is called the p-barycenter of ν and is denoted by \(b_{p}(\nu )\) if it exists. The p-barycenter of ν always exists if \((G,d)\) is a proper geodesic space and \(\nu \in \mathcal{P}^{p}(G)\) [9, Proposition 3.3].

Let \(T_{i} \colon G \rightarrow G, i\in I\) be a collection of mappings where I is an index space. Assume that \((I,\mathcal{I})\) is a measurable space and \((x,i) \mapsto T_{i}x\) is measurable. Let η be a probability measure on I and define \(b_{p}(T_{i},\eta )\colon G \rightarrow G\) by

$$\begin{aligned} b_{p}(T_{i},\eta ) (x) = \mathop {\operatorname {argmin}}_{y \in G} y \mapsto \int _{I} \,d(T_{i} x,y)^{p} \eta (di). \end{aligned}$$

We use the notation \(T_{i}x_{*}\eta \) for the push forward of η with respect to the mapping \(i \mapsto T_{i} x\) for fixed x i.e.

$$\begin{aligned} T_{i}x_{*}\eta (A) = \eta \bigl(\{i \mid T_{i}x \in A\}\bigr) \end{aligned}$$
(16)

for \(A \in \mathcal{B}(G)\). Then, by definition, \(b_{p}(T_{i},\eta )(x) = b_{p}(T_{i} x _{*}\eta )\).

Theorem 11

Let G be a proper, symmetric perpendicular, p-uniformly convex space with constant \(c>0\), and let \(T_{i}\), \(i \in I\) be a family of almost quasi α-firmly nonexpansive operators with violation \(\epsilon _{i}\) and constant \(\alpha _{i}\) respectively on G. Let η be a probability measure on I such that \(T_{i}x_{*}\eta \in \mathcal{P}^{p}(G) \) for all \(x\in G\). Then \(\mathscr{T}=b_{p}(T_{i},\eta )\) is a pointwise almost α-firm operator at any \(y \in \bigcap_{i\in I} \operatorname {Fix}T_{i}\) on G with constant \(\overline{\alpha }=\sup_{i\in I} \alpha _{i}\) and violation at most \({\overline{\epsilon }}= \sup_{i\in I} \epsilon _{i}\).

Proof

Let \(x \in G\) be arbitrary, \(y \in \bigcap_{i\in I} \operatorname {Fix}T_{i} \subset \operatorname {Fix}\mathscr{T} \), \(\nu = T_{i} x _{*}\eta \) as defined in (16) and \((d(\cdot,y)^{p}) _{*} \nu \) the push forward of ν. Then

$$\begin{aligned} d(\mathscr{T} x,\mathscr{T} y)^{p} = d\bigl(b_{p}(T_{i}x_{*} \eta ),y\bigr)^{p} = d\bigl(b_{p}(\nu ),y\bigr)^{p} \leq b_{p}\bigl(\bigl(d(\cdot,y)^{p}\bigr) _{*} \nu \bigr) \end{aligned}$$

by Jensen’s inequality [9, Theorem 4.1] since \(d(\cdot,y)^{p}\) is a convex function and the fact that \(\mathbb{R}\) is a p-uniformly convex space with constant c. Now

$$\begin{aligned} b_{p}\bigl(\bigl(d(\cdot,y)^{p}\bigr) _{*} \nu \bigr) &= \mathop {\operatorname {argmin}}_{t \in \mathbb{R}} \int _{z} \bigl\vert t- d(z,y)^{p} \bigr\vert ^{p} \nu (dz) \\ &= \mathop {\operatorname {argmin}}_{t \in \mathbb{R}} \int _{i} \bigl\vert t- d(T_{i}x,y)^{p} \bigr\vert ^{p} \eta (di) \\ &\leq \mathop {\operatorname {argmin}}_{t \in \mathbb{R}} \int _{i} \biggl\vert t- \biggl[(1+\epsilon _{i}) d(x,y)^{p}- \frac{1-\alpha _{i}}{\alpha _{i}} \frac{c}{2} d(T_{i} x,x)^{p}\biggr] \biggr\vert ^{p} \eta (di) \\ &\leq (1+{\overline{\epsilon }}) d(x,y)^{p} - \mathop {\operatorname {argmin}}_{t \in \mathbb{R}} \int _{i} \biggl\vert t - \frac{1-\overline{\alpha }}{\overline{\alpha }} \frac{c}{2} d(T_{i}x,x)^{p} \biggr\vert ^{p} \eta (di) \\ &= (1+{\overline{\epsilon }}) d(x,y)^{p} - \frac{1-\overline{\alpha }}{\overline{\alpha }} \frac{c}{2} \mathop {\operatorname {argmin}}_{t \in \mathbb{R}} \int _{z} \bigl\vert t - d(z,x)^{p} \bigr\vert ^{p} \nu (dz) \\ &=(1+{\overline{\epsilon }}) d(x,y)^{p} - \frac{1-\overline{\alpha }}{\overline{\alpha }} \frac{c}{2} b_{p} \bigl(\bigl(d( \cdot,x)^{p}\bigr) _{*} \nu \bigr). \end{aligned}$$

Applying Jensen’s inequality once more completes the proof:

$$\begin{aligned} &(1+{\overline{\epsilon }}) d(x,y)^{p} - b_{p} \biggl( \frac{1-\overline{\alpha }}{\overline{\alpha }} \frac{c}{2} d(x,T_{i}x)_{*} \eta \biggr) \\ &\quad\leq (1+{\overline{\epsilon }}) d(x,y)^{p} - \frac{1-\overline{\alpha }}{\overline{\alpha }} \frac{c}{2} d\bigl(x, b_{p} (T_{i}x_{*} \eta )\bigr)^{p}. \end{aligned}$$

 □

For a finite index set, without loss of generality \(I=\{1,\ldots,n\}\), and a probability measure η on I, define \(\omega _{i}:= \eta (\{i\})\) for all \(i\in I\). Then

$$\begin{aligned} b_{p}(T_{i},\eta ) (x)=\mathop {\operatorname {argmin}}_{z\in G} \sum _{i=1}^{n} \omega _{i} d(z,T_{i}x)^{p}. \end{aligned}$$

In case of a Hilbert space G and \(p=c=2\) this reduces further to \(b_{p}(T_{i},\eta )(x) = \sum_{i=1}^{n} \omega _{i} T_{i} x\).

If the support of the measure ν consists of two discrete points \(x_{1}\) and \(x_{2}\), i.e. for \(\omega \in [0,1]\), \(\nu = \omega \delta _{x_{1}}+(1-\omega )\delta _{x_{2}}\), then \(b_{p}(\nu )\) can be calculated explicitly. It is obvious that \(b_{p}(\nu )\) has to lie on the geodesic connecting \(x_{1}\) and \(x_{2}\). Hence \(b_{p}(\nu ) = \overline{t}x_{1} \oplus (1-\overline{t}) x_{2}\) for some \(\overline{t}\in [0,1]\). Minimizing the function \(t\mapsto \omega d( t x_{1} \oplus (1-t) x_{2}, x_{1})^{p}+ (1- \omega ) d( t x_{1} \oplus (1-t) x_{2}, x_{2})^{p}\) leads to \(\overline{t}= \frac{1}{\sqrt[p-1]{\frac{1-\omega }{\omega }}+1}\). If \(I=\{1,2\}\), \(T_{1}=T\) and \(T_{2}=\operatorname{Id}\), \(\eta =\omega \delta _{1}(\cdot )+ (1-\omega ) \delta _{2}(\cdot )\) for \(\omega \in [0,1]\), then \(T_{\beta }=b_{p}(T_{i},\eta )\) is the Krasnoselsky–Mann relaxation of T

$$\begin{aligned} T_{\beta }=\beta T \oplus (1-\beta ) \operatorname{Id} \quad\text{with } \beta = \frac{1}{\sqrt[p-1]{\frac{1-\omega }{\omega }}+1}. \end{aligned}$$
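The closed form for β can be compared with a direct minimization along the geodesic. The sketch below assumes that the point \(t x_{1} \oplus (1-t) x_{2}\) lies at distance \((1-t)d(x_{1},x_{2})\) from \(x_{1}\) and \(t\,d(x_{1},x_{2})\) from \(x_{2}\), so that, after dividing by \(d(x_{1},x_{2})^{p}\), the objective becomes \(\omega (1-t)^{p}+(1-\omega )t^{p}\). The function names and the grid resolution are illustrative assumptions.

```python
def km_weight(omega, p):
    """Closed form beta = 1 / (((1 - omega) / omega)^(1 / (p - 1)) + 1)."""
    if not (0.0 < omega <= 1.0) or p <= 1.0:
        raise ValueError("need 0 < omega <= 1 and p > 1")
    return 1.0 / (((1.0 - omega) / omega) ** (1.0 / (p - 1.0)) + 1.0)


def grid_weight(omega, p, n=20001):
    """Minimize omega * (1 - t)^p + (1 - omega) * t^p over a uniform grid on [0, 1]."""
    best_t, best_val = 0.0, float("inf")
    for k in range(n):
        t = k / (n - 1)
        val = omega * (1.0 - t) ** p + (1.0 - omega) * t ** p
        if val < best_val:
            best_t, best_val = t, val
    return best_t


for omega, p in [(0.3, 2.0), (0.2, 3.0), (0.6, 1.5)]:  # sample values (assumption)
    print(omega, p, km_weight(omega, p), grid_weight(omega, p))
```

For \(p=2\) the formula gives \(\beta =\omega \), so the measure weight and the relaxation parameter coincide; for other p they differ.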

The next result shows that the convex combination of an almost nonexpansive mapping with the identity mapping can be made arbitrarily close to α-firmly nonexpansive (no violation) by choosing the averaging constant small enough—this can be interpreted as choosing an appropriately small step size.

Proposition 12

(Krasnoselsky–Mann relaxations)

Let \((G,d)\) be a p-uniformly convex space and \(T\colon G \rightarrow G\) be pointwise almost nonexpansive at all \(y \in \operatorname {Fix}T\) with violation ϵ. Then \(T_{\beta }:= \beta T \oplus (1-\beta ) \operatorname{Id}\) is pointwise almost α-firmly nonexpansive at all \(y \in \operatorname {Fix}T\) with constant

$$\begin{aligned} \alpha _{\beta }= \frac{\alpha \beta ^{p-1}}{\alpha \beta ^{p-1} -\alpha \beta + 1} \end{aligned}$$

and violation at most \(\epsilon _{\beta }:=\epsilon \beta \).

Proof

Clearly \(\operatorname {Fix}T = \operatorname {Fix}T_{\beta }\) and \(d(x,T_{\beta }x)^{p}=\beta ^{p} d(x,Tx)^{p}\). Let \(y \in \operatorname {Fix}T_{\beta }\), then

$$\begin{aligned} d(y,T_{\beta }x)^{p} &=d\bigl(y, \beta Tx \oplus (1-\beta )x \bigr)^{p} \\ &\leq \beta d(y,Tx)^{p}+(1-\beta ) d(y,x)^{p}- \frac{c}{2} \beta (1- \beta ) d(x,Tx)^{p} \\ & \leq (1+\epsilon \beta ) d(x,y)^{p} - \frac{c}{2}\beta ^{1-p} \biggl( \frac{1-\alpha }{\alpha }+1-\beta \biggr) d(x,T_{\beta }x)^{p}. \end{aligned}$$

Setting

$$\begin{aligned} \frac{1-\alpha _{\beta }}{\alpha _{\beta }} = \beta ^{1-p} \biggl( \frac{1-\alpha }{\alpha }+1-\beta \biggr) \end{aligned}$$

and solving for \(\alpha _{\beta }\) yields the result. □
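The last step of the proof can be verified numerically. This sketch evaluates the constant \(\alpha _{\beta }\) from Proposition 12 and checks that it satisfies the defining relation \(\frac{1-\alpha _{\beta }}{\alpha _{\beta }} = \beta ^{1-p} (\frac{1-\alpha }{\alpha }+1-\beta )\). Here α is the constant appearing in the proof, and all names and sample values are illustrative assumptions.

```python
def relaxed_constant(alpha, beta, p):
    """alpha_beta = alpha*beta^(p-1) / (alpha*beta^(p-1) - alpha*beta + 1)."""
    power = alpha * beta ** (p - 1.0)
    return power / (power - alpha * beta + 1.0)


def relaxed_violation(eps, beta):
    """Upper bound eps_beta = eps * beta on the violation of T_beta."""
    return eps * beta


def defining_relation_gap(alpha, beta, p):
    """Difference between both sides of the relation solved for alpha_beta."""
    a_beta = relaxed_constant(alpha, beta, p)
    lhs = (1.0 - a_beta) / a_beta
    rhs = beta ** (1.0 - p) * ((1.0 - alpha) / alpha + 1.0 - beta)
    return lhs - rhs


for alpha, beta, p in [(0.5, 0.5, 2.0), (0.9, 0.25, 3.0), (1.0, 0.7, 2.0)]:
    print(alpha, beta, p, relaxed_constant(alpha, beta, p),
          defining_relation_gap(alpha, beta, p))
```

For \(p=2\) the formula reduces to \(\alpha _{\beta }=\alpha \beta \), so a smaller averaging constant β gives a smaller \(\alpha _{\beta }\) and a smaller violation \(\epsilon \beta \).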

2.3 Metric subregularity

Recall that \(\mu:[0,\infty ) \to [0,\infty )\) is a gauge function if μ is continuous, strictly increasing with \(\mu (0)=0\), and \(\lim_{t\to \infty }\mu (t)=\infty \).

Definition 13

(Metric regularity on a set)

Let \((G_{1}, d_{1})\) and \((G_{2}, d_{2})\) be metric spaces, and let \(\Phi: G_{1}\rightrightarrows G_{2} \), \(U\subset G_{1}\), \(V\subset G_{2}\). The mapping Φ is called metrically regular with gauge μ on \(U\times V\) relative to \(\Lambda \subset G_{1}\) if

$$\begin{aligned} \forall y\in V, \forall x\in U\cap \Lambda,\quad \operatorname {dist}\bigl(x, \Phi ^{-1}(y)\cap \Lambda \bigr)\leq \mu \bigl(\operatorname {dist}\bigl(y, \Phi (x) \bigr) \bigr). \end{aligned}$$
(17)

When the set V consists of a single point, \(V=\{{\overline{y}}\}\), then Φ is said to be metrically subregular for \({\overline{y}}\) on U with gauge μ relative to \(\Lambda \subset G_{1}\).

When μ is a linear function (that is, \(\mu (t)=\kappa t, \forall t\in [0,\infty )\)), this special case is distinguished as linear metric (sub)regularity with constant κ. When \(\Lambda =G_{1}\), the qualifier “relative to” is dropped. When μ is linear, the infimum of all constants κ for which (17) holds is called the modulus of metric regularity.

The next statement is obvious from the definition.

Proposition 14

Let \((G_{1}, d_{1})\) and \((G_{2}, d_{2})\) be metric spaces, and let \(\Phi: G_{1}\rightrightarrows G_{2} \), \(U\subset G_{1}\), \(V\subset G_{2}\). If Φ is metrically subregular with gauge μ at y on U relative to \(\Lambda \subset G_{1}\), then Φ is metrically subregular with the same gauge μ at y on all subsets \(U'\subset U\) relative to \(\Lambda \subset G_{1}\).

3 Quantitative convergence

To obtain convergence of fixed point iterations under the assumption of metric subregularity, the gauge of metric subregularity μ is constructed implicitly from another nonnegative function \(\theta: [0,\infty ) \to [0,\infty ) \) satisfying

$$\begin{aligned} \mathrm{(i)}\quad \theta (0)=0;\qquad \mathrm{(ii)}\quad 0< \theta (t)< t \quad \forall t>0;\qquad \mathrm{(iii)}\quad \sum _{j=1}^{\infty }\theta ^{(j)}(t)< \infty\quad \forall t\geq 0. \end{aligned}$$
(18)

For a p-uniformly convex space the operative gauge satisfies

$$\begin{aligned} \mu \biggl( \biggl( \frac{(1+\epsilon )t^{p}- (\theta (t) )^{p}}{\tau } \biggr)^{1/p} \biggr)= t \quad\iff\quad \theta (t) = \bigl((1+ \epsilon )t^{p} - \tau \bigl(\mu ^{-1}(t) \bigr)^{p} \bigr)^{1/p} \end{aligned}$$
(19)

for \(\tau >0\) fixed and θ satisfying (18).

In the case of linear metric subregularity on a \(\operatorname{CAT}(\kappa)\) space this becomes

$$\begin{aligned} t\mapsto \mu t \quad\iff\quad \theta (t)= \biggl((1+\epsilon )- \frac{\tau }{\mu ^{2}} \biggr)^{1/2}t \quad \biggl(\mu \geq \sqrt{ \frac{\tau }{(1+\epsilon )}}\biggr). \end{aligned}$$

If (17) is satisfied for some \(\mu '>0\), then the condition \(\mu \geq \sqrt{\frac{\tau }{(1+\epsilon )}}\) is satisfied for all \(\mu \geq \mu '\) large enough. The conditions in (18) in this case simplify to \(\theta (t)=\gamma t\), where

$$\begin{aligned} 0< \gamma:=\sqrt{1+\epsilon -\frac{\tau }{\mu ^{2}}}< 1 \quad\iff\quad \sqrt{ \frac{\tau }{(1+\epsilon )}} < \mu < \sqrt{ \frac{\tau }{\epsilon }}. \end{aligned}$$
(20)
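The relation (20) between the constant of linear metric subregularity and the contraction factor can be tabulated directly. The following sketch computes γ from τ, ϵ, and μ and rejects values of μ outside the admissible interval; the function name and the sample parameters are assumptions made for illustration.

```python
import math


def linear_rate(tau, eps, mu):
    """gamma = sqrt(1 + eps - tau / mu^2), defined when 0 < gamma < 1 as in (20)."""
    lower = math.sqrt(tau / (1.0 + eps))
    upper = math.sqrt(tau / eps) if eps > 0.0 else math.inf
    if not (lower < mu < upper):
        raise ValueError("mu must lie strictly between sqrt(tau/(1+eps)) and sqrt(tau/eps)")
    return math.sqrt(1.0 + eps - tau / mu ** 2)


alpha_bar = 2.0 / 3.0                # illustrative constant (assumption)
tau = (1.0 - alpha_bar) / alpha_bar  # tau = (1 - alpha_bar) / alpha_bar = 1/2
eps = 0.01                           # illustrative violation (assumption)
for mu in [1.0, 2.0, 5.0]:
    print(mu, linear_rate(tau, eps, mu))
```

Larger values of μ, that is, weaker subregularity, push γ toward 1, and once \(\mu \geq \sqrt{\tau /\epsilon }\) the factor no longer lies below 1.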

The next definition characterizes the quantitative convergence of sequences in terms of gauge functions.

Definition 15

(Gauge monotonicity [10])

Let \((G,d)\) be a metric space, let \((x_{k})_{k\in \mathbb{N}}\) be a sequence on G, let \(D\subset G\) be nonempty, and let the continuous mapping \(\mu: \mathbb{R}_{+} \to \mathbb{R}_{+} \) satisfy \(\mu (0)=0\) and

$$\begin{aligned} &\mu (t_{1})< \mu (t_{2})\leq t_{2}\quad \text{whenever } 0\le t_{1}< t_{2}. \end{aligned}$$
  1. (i)

    \((x_{k})_{k\in \mathbb{N}}\) is said to be gauge monotone with respect to D with rate μ whenever

    $$\begin{aligned} d(x_{k+1}, D)\leq \mu \bigl(d(x_{k}, D) \bigr)\quad \forall k\in \mathbb{N}. \end{aligned}$$
    (21)
  2. (ii)

    \((x_{k})_{k\in \mathbb{N}}\) is said to be linearly monotone with respect to D with rate c if (21) is satisfied for \(\mu (t)=c\cdot t\) for all \(t\in \mathbb{R}_{+}\) and some constant \(c\in [0,1]\).

A sequence \((x_{k})_{k\in \mathbb{N}}\) is said to converge gauge monotonically to some element \(x^{*}\in G\) with rate \(s_{k}(t):=\sum_{j=k}^{\infty }\mu ^{(j)}(t)\) whenever it is gauge monotone with gauge μ satisfying \(\sum_{j=1}^{\infty }\mu ^{(j)}(t)<\infty \ \forall t\geq 0\), and there exists a constant \(a>0\) such that \(d(x_{k},x^{*})\leq a s_{k}(t)\) for all \(k\in \mathbb{N}\).

All Fejér monotone sequences are linearly monotone (with constant \(c=1\)) but the converse does not hold (see Proposition 1 and Example 1 of [10]). Gauge-monotonic convergence for a linear gauge in the definition above is just R-linear convergence.

Metric subregularity and pointwise (almost) nonexpansiveness are fundamentally connected through the surrogate mapping \(\mathcal{T}_{S}: G \to \mathbb{R}_{+} \cup \{+\infty \}\) defined by

$$\begin{aligned} \mathcal{T}_{S}(x):= \Bigl\vert \Bigl(\inf _{y\in S}\psi ^{(p,c)}_{T}(x,y) \Bigr)^{1/p} \Bigr\vert , \end{aligned}$$
(22)

where \(\psi ^{(p,c)}_{T}\) is defined by (3) and \(S\subset G\). If \(S=\emptyset \) then, by definition, \(\mathcal{T}_{S}(x):=+\infty \) for all x. Hence, \(\mathcal{T}_{S}\) is proper when S is nonempty. For our purposes, \(S\subseteq \operatorname {Fix}T\), in which case by Proposition 6(i) we have \(\psi ^{(p,c)}_{T}(x,y)\geq 0\) for all \(x\in D\) and all \(y\in S\) and \(\psi ^{(p,c)}_{T}(x,y)= 0\) only when both \(x,y\in \operatorname {Fix}T\). Hence \(\mathcal{T}_{S}\) is nonnegative, takes the value 0 only on \(\operatorname {Fix}T\), and has the simple representation

$$\begin{aligned} \mathcal{T}_{S}(x)= \sqrt[p]{\frac{c}{2}}d(Tx,x)\geq 0\quad (S\neq \emptyset ). \end{aligned}$$
(23)

Theorem 16

(Necessary and sufficient conditions for convergence rates)

Let \((G,d)\) be a complete p-uniformly convex space with constant c; let \(D\subset G\) with \((D, d)\), let \(T: D \to D \) with \((T(D), d)\) compact on bounded subsets, and let \(S:=\operatorname {Fix}T\cap D\) be nonempty. Assume further that T is pointwise almost α-firmly nonexpansive at all points \(y\in S\) with the same constant α̅ and violation at most ϵ on D.

  1. (a)

    (necessity) Suppose that all sequences \((x^{k})_{k\in \mathbb{N}}\) defined by \(x^{k+1}=Tx^{k}\) and initialized in D are gauge monotone relative to S with rate θ satisfying (18), and \((\operatorname {Id}- \theta )^{-1}(\cdot )\) is continuous on \(\mathbb{R}_{+}\), strictly increasing, and \((\operatorname {Id}- \theta )^{-1}(0)=0\). Then all sequences initialized on D converge gauge monotonically to some \(\overline{x}\in S\) with rate \(O(s_{k}(t_{0}))\) where \(s_{k}(t):=\sum_{j=k}^{\infty }\theta ^{(j)}(t)\) and \(t_{0}:=d(x^{0},\operatorname {Fix}T\cap D)\). Moreover, \(\mathcal{T}_{S}\) defined by (22) is metrically subregular for 0 relative to D on D with gauge \(\mu (\cdot )=(\operatorname {Id}-\theta )^{-1}(\cdot )\).

  2. (b)

    (sufficiency) Let T satisfy

    $$\begin{aligned} d(x,\operatorname {Fix}T\cap D)\leq \mu \bigl(d(x,Tx)\bigr), \quad\forall x\in D, \end{aligned}$$
    (24)

    with gauge μ given implicitly by (19) with θ satisfying (18) for \(\tau =(1-\overline{\alpha })/\overline{\alpha }\) and \(\epsilon \geq 0\) an upper bound on the violation of pointwise α-firmness of T on D. Then, for any \(x^{0}\in D\), the sequence \((x^{k})_{k\in \mathbb{N}}\) defined by \(x^{k+1}= T x^{k}\) satisfies

    $$\begin{aligned} d \bigl(x^{k+1},\operatorname {Fix}T\cap D \bigr) \leq \theta \bigl(d \bigl(x^{k}, \operatorname {Fix}T\cap D \bigr) \bigr) \quad\forall k \in \mathbb{N}. \end{aligned}$$
    (25)

    Moreover, the sequence \((x^{k})_{k\in \mathbb{N}}\) converges gauge monotonically to some \(x^{*}\in \operatorname {Fix}T\cap D\) with rate \(O(s_{k}(t_{0}))\) where \(s_{k}(t):=\sum_{j=k}^{\infty }\theta ^{(j)}(t)\) and \(t_{0}:=d(x^{0},\operatorname {Fix}T\cap D)\).

Before proving this theorem, we collect some intermediate results.

Lemma 17

(Gauge monotonicity and almost quasi α-firmness imply convergence to fixed points)

Let \((G, d)\) be a complete p-uniformly convex space with constant c. Let \(T: G \to G \) with \(T(D)\subseteq D\subseteq G\) boundedly compact. Suppose that \(\operatorname {Fix}T\cap D\) is nonempty and that T is pointwise almost α-firmly nonexpansive at all \(y\in \operatorname {Fix}T\cap D\) with the same constant α̅ and violation at most ϵ on D. If the sequence \((x^{k})_{k\in \mathbb{N}}\) defined by \(x^{k+1}= Tx^{k}\) and initialized in D is gauge monotone relative to \(\operatorname {Fix}T\cap D\) with rate θ satisfying (18), then \((x^{k})_{k\in \mathbb{N}}\) converges gauge monotonically to some \(x^{*}\in \operatorname {Fix}T\cap D\) with rate \(O(s_{k}(t_{0}))\) where \(s_{k}(t):=\sum_{j=k}^{\infty }\theta ^{(j)}(t)\) and \(t_{0}:=d(x^{0},\operatorname {Fix}T\cap D)\).

Proof

The assumption that T is pointwise almost α-firmly nonexpansive at all \(y\in \operatorname {Fix}T\cap D\) with constant α̅ and violation at most ϵ on D yields

$$\begin{aligned} d(Tx, y)^{p}\leq (1+\epsilon )d(x, y)^{p}- \frac{c(1-\overline{\alpha })}{2\overline{\alpha }}d(Tx,x)^{p}, \quad\forall x\in D. \end{aligned}$$

Let \(x^{0}\in D\) and define the sequence \(x^{k+1}= Tx^{k}\) for all \(k\in \mathbb{N}\). Since T is pointwise almost α-firmly nonexpansive at all points in \(\operatorname {Fix}T\cap D\) on D, \(\operatorname {Fix}T\cap D\) is closed and \(P_{\operatorname {Fix}T\cap D}x^{k}\) is nonempty (though possibly set-valued) for all k. Denote any selection by \(\bar{x}^{k}\in P_{\operatorname {Fix}T\cap D}x^{k}\) for each \(k\in \mathbb{N}\). Then

$$\begin{aligned} d\bigl(x^{k+1}, \bar{x}^{k}\bigr)^{p}\leq (1+ \epsilon )d\bigl(x^{k}, \bar{x}^{k}\bigr)^{p}- \frac{c(1-\overline{\alpha })}{2\overline{\alpha }} d\bigl(x^{k}, x^{k+1}\bigr)^{p},\quad \forall k\in \mathbb{N}, \end{aligned}$$

which implies that

$$\begin{aligned} d\bigl(x^{k}, x^{k+1}\bigr)\leq \biggl( \frac{2\overline{\alpha }(1+\epsilon )}{c(1-\overline{\alpha })} \biggr)^{1/p}d\bigl(x^{k}, \bar{x}^{k}\bigr),\quad \forall k\in \mathbb{N}. \end{aligned}$$

On the other hand \(d(x^{k}, \bar{x}^{k})= d(x^{k}, \operatorname {Fix}T\cap D) \leq \theta (d(x^{k-1}, \operatorname {Fix}T\cap D) )\) since \((x^{k})_{k\in \mathbb{N}}\) is gauge monotone relative to \(\operatorname {Fix}T\cap D\) with rate θ. Therefore an iterative application of gauge monotonicity yields

$$\begin{aligned} d\bigl(x^{k}, x^{k+1}\bigr) \leq \biggl( \frac{2\overline{\alpha }(1+\epsilon )}{c(1-\overline{\alpha })} \biggr)^{1/p}\theta ^{(k)} \bigl(d\bigl(x^{0}, \operatorname {Fix}T \cap D\bigr) \bigr),\quad \forall k\in \mathbb{N}. \end{aligned}$$

Let \(t_{0}=d(x^{0}, \operatorname {Fix}T\cap D)\). For any given natural numbers \(k,l\) with \(k< l\), an iterative application of the triangle inequality yields the upper estimate

$$\begin{aligned} d\bigl(x^{k}, x^{l}\bigr)&\leq d\bigl(x^{k}, x^{k+1}\bigr)+d\bigl(x^{k+1}, x^{k+2}\bigr)+\cdots+d \bigl(x^{l-1}, x^{l}\bigr) \\ &\leq \biggl( \frac{2\overline{\alpha }(1+\epsilon )}{c(1-\overline{\alpha })} \biggr)^{1/p} \bigl( \theta ^{(k)}(t_{0})+\theta ^{(k+1)}(t_{0})+ \cdots +\theta ^{(l-1)}(t_{0}) \bigr) \\ &< \biggl( \frac{2\overline{\alpha }(1+\epsilon )}{c(1-\overline{\alpha })} \biggr)^{1/p} s_{k}(t_{0}), \end{aligned}$$

where \(s_{k}(t_{0}):=\sum_{j=k}^{\infty }\theta ^{(j)}(t_{0})<\infty \) for θ satisfying (18). Since \((\theta ^{(k)}(t_{0}))_{k\in \mathbb{N}}\) is a summable sequence of nonnegative numbers, the sequence of tail sums \(s_{k}(t_{0})\) converges to zero monotonically as \(k\to \infty \), and hence \((x^{k})_{k\in \mathbb{N}}\) is a Cauchy sequence. Since \((G,d)\) is complete, \(x^{k}\to x^{*}\) for some \(x^{*}\in G\). Letting \(l\to +\infty \) yields

$$\begin{aligned} \lim_{l\to +\infty }d\bigl(x^{k}, x^{l}\bigr)=d \bigl(x^{k}, x^{*}\bigr)\leq a s_{k}(t_{0}),\qquad a:= \biggl( \frac{2\overline{\alpha }(1+\epsilon )}{c(1-\overline{\alpha })} \biggr)^{1/p}. \end{aligned}$$

Therefore \((x^{k})_{k\in \mathbb{N}}\) converges gauge monotonically to \(x^{*}\) with rate \(O(s_{k}(t_{0}))\).

It remains to show that \(x^{*}\in \operatorname {Fix}T\cap D\). Note that for each \(k\in \mathbb{N}\)

$$\begin{aligned} d\bigl(x^{k}, \bar{x}^{k}\bigr)=d\bigl(x^{k}, \operatorname {Fix}T\cap D\bigr)\leq \theta ^{(k)}(t_{0}), \end{aligned}$$

which yields \(\lim_{k} d(x^{k}, \bar{x}^{k})=0\). But by the triangle inequality

$$\begin{aligned} d\bigl(\bar{x}^{k}, x^{*}\bigr)\leq d\bigl(x^{k}, \bar{x}^{k}\bigr)+d\bigl(x^{k}, x^{*}\bigr), \end{aligned}$$

so \(\lim_{k}d(\bar{x}^{k}, x^{*})=0\). By construction \((\bar{x}^{k})_{k\in \mathbb{N}}\subseteq \operatorname {Fix}T\cap D\) and \(\operatorname {Fix}T\cap D\) is closed, hence \(x^{*}\in \operatorname {Fix}T\cap D\). □
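
As a sanity check on the rate in Lemma 17, consider the linear gauge \(\theta (t)=\gamma t\) with \(\gamma \in (0,1)\), which is the case treated in Corollary 19 (we assume here that this θ satisfies (18)). Then \(\theta ^{(j)}(t_{0})=\gamma ^{j}t_{0}\), the tail sums form a geometric series, and the estimate for \(d(x^{k},x^{*})\) in the proof specializes to

$$\begin{aligned} s_{k}(t_{0})=\sum_{j=k}^{\infty }\gamma ^{j}t_{0}= \frac{\gamma ^{k}}{1-\gamma }t_{0},\qquad d\bigl(x^{k}, x^{*}\bigr)\leq a \frac{\gamma ^{k}}{1-\gamma }t_{0}, \end{aligned}$$

that is, R-linear convergence with rate \(O(\gamma ^{k})\).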

Proposition 18

([4], Theorem 32)

Let \((G, d)\) be a p-uniformly convex metric space with constant c. Let \(T:D\to D\) with \(D\subseteq G\). Suppose that \(S:=\operatorname {Fix}T\cap D\) is nonempty. Suppose that all sequences \((x^{k})_{k\in \mathbb{N}}\) defined by \(x^{k+1}=Tx^{k}\) and initialized in D are gauge monotone relative to S with rate θ satisfying (18). Suppose, in addition, that \((\operatorname {Id}- \theta )^{-1}(\cdot )\) is continuous on \(\mathbb{R}_{+}\), strictly increasing, and \((\operatorname {Id}- \theta )^{-1}(0)=0\). Then \(\mathcal{T}_{S}\) defined by (22) is metrically subregular for 0 relative to D on D with gauge \(\mu (\cdot )=(\operatorname {Id}-\theta )^{-1}(\cdot )\).

Proof of Theorem 16

Part (a). This follows immediately from Lemma 17 and Proposition 18.

Part (b). Our pattern of proof follows the same logic as the analogous result for set-valued mappings in a Euclidean space setting [11, Theorem 2.2]. Since \(S = \operatorname {Fix}T\cap D\), Proposition 6(i) establishes that \(\psi (x,y)=\frac{c}{2}d(x, Tx)^{p}\) for all \(y\in \operatorname {Fix}T\), so \(\mathcal{T}_{S}(x)=\frac{c}{2}d(x, Tx)\). Also by Proposition 6(i) \(\mathcal{T}_{S}\) takes the value 0 only on \(\operatorname {Fix}T\), that is, \(\mathcal{T}_{S}^{-1}(0)=\operatorname {Fix}{T}\). So, by the assumption that \(\mathcal{T}_{S}\) satisfies (24) with gauge μ given by (19) for \(\tau =(1-\overline{\alpha })/\overline{\alpha }\), together with the definition of metric subregularity (Definition 13), this yields

$$\begin{aligned} d(x, \operatorname {Fix}T\cap D) &= d\bigl(x, \mathcal{T}_{S}^{-1}(0) \cap D\bigr) \\ &\le \mu \bigl(\mathcal{T}_{S}(x) \bigr) = \mu \bigl(d(x, Tx)\bigr)\quad \forall x\in D. \end{aligned}$$

In other words,

$$\begin{aligned} \frac{1-\overline{\alpha }}{\overline{\alpha }} \bigl(\mu ^{-1} \bigl(d(x,\operatorname {Fix}T\cap D) \bigr) \bigr)^{p} \leq \frac{1-\overline{\alpha }}{\overline{\alpha }}d(x, Tx)^{p}\quad \forall x\in D. \end{aligned}$$
(26)

On the other hand, by the assumption that T is pointwise almost α-firmly nonexpansive at all points \(y\in S\) with the same constant α̅ and violation at most ϵ on D we have

$$\begin{aligned} 0 &\leq \frac{1-\overline{\alpha }}{\overline{\alpha }}d(x, Tx)^{p} \\ &\leq (1+\epsilon ) d(x, y)^{p}-d(Tx, y)^{p}\quad \forall y\in \operatorname {Fix}T\cap D, \forall x\in D. \end{aligned}$$
(27)

Incorporating (26) into (27) and rearranging the inequality yields

$$\begin{aligned} &d(Tx, y)^{p} \leq (1+\epsilon )d(x, y)^{p} - \frac{1-\overline{\alpha }}{\overline{\alpha }} \bigl(\mu ^{-1} \bigl(d(x, \operatorname {Fix}T\cap D) \bigr) \bigr)^{p} \\ & \quad \forall y\in \operatorname {Fix}T\cap D , \forall x\in D. \end{aligned}$$

Since this holds at any \(x\in D\), it certainly holds at the iterates \(x^{k}\) with initial point \(x^{0}\in D\) since T is a self-mapping on D. Therefore for all \(k\in \mathbb{N}\)

$$\begin{aligned} & d\bigl(x^{k+1}, y\bigr) \leq \sqrt[p]{ (1+\epsilon )d \bigl(x^{k}, y\bigr)^{p} - \frac{1-\overline{\alpha }}{\overline{\alpha }} \bigl(\mu ^{-1} \bigl(d \bigl(x^{k}, \operatorname {Fix}T\cap D \bigr) \bigr) \bigr)^{p}} \\ & \quad\forall y\in \operatorname {Fix}T\cap D. \end{aligned}$$
(28)

Inequality (28) simplifies as follows. Since the space \((T(D), d)\) is boundedly compact and \(\operatorname {Fix}T\) is closed by continuity, for every \(k\in \mathbb{N}\), the distance \(d(x^{k}, \operatorname {Fix}T\cap D)\) is attained at some \(y^{k}\in \operatorname {Fix}T\cap D\). This yields

$$\begin{aligned} &d\bigl(x^{k+1}, y^{k+1}\bigr)^{p} \leq d \bigl(x^{k+1}, y^{k}\bigr)^{p} \leq (1+\epsilon )d \bigl(x^{k}, y^{k}\bigr)^{p} - \frac{1-\overline{\alpha }}{\overline{\alpha }} \bigl( \mu ^{-1} \bigl(d\bigl(x^{k}, y^{k}\bigr) \bigr) \bigr)^{p} \\ &\quad\quad\forall k \in \mathbb{N}. \end{aligned}$$
(29)

Taking the pth root and recalling (19) yields (25).

This establishes also that the sequence \((x^{k})_{k\in \mathbb{N}}\) is gauge monotone relative to \(\operatorname {Fix}T\cap D\) with rate θ satisfying Eq. (18). By Lemma 17 it follows that the sequence \((x^{k})_{k\in \mathbb{N}}\) converges gauge monotonically to \(x^{*}\in \operatorname {Fix}T\cap D\) with the rate \(O(s_{k}(d(x^{0},\operatorname {Fix}T\cap D)))\) where \(s_{k}(t):=\sum_{j=k}^{\infty }\theta ^{(j)}(t)\). □

Corollary 19

(Linear convergence)

  1. (a)

    (necessity) In the setting of Theorem 16(a), if all sequences \((x^{k})_{k\in \mathbb{N}}\) defined by \(x^{k+1}=Tx^{k}\) and initialized in D are linearly monotone relative to S with rate \(\gamma <1\), then all sequences initialized on D converge R-linearly to some \(\overline{x}\in S\) with rate \(O(\gamma ^{k})\). Moreover, \(\mathcal{T}_{S}\) defined by (22) is linearly metrically subregular for 0 relative to D on D with gauge \(\mu (t)=(1-\gamma )^{-1}t\).

  2. (b)

    (sufficiency) In the setting of Theorem 16(b) suppose that T satisfies

    $$\begin{aligned} d(x,\operatorname {Fix}T\cap D)\leq \mu d(x,Tx),\quad \forall x\in D, \end{aligned}$$
    (30)

    with the scalar μ satisfying

    $$\begin{aligned} \sqrt[p]{\frac{1-\overline{\alpha }}{\overline{\alpha }(1+\epsilon )}}< \mu < \sqrt[p]{\frac{1-\overline{\alpha }}{\overline{\alpha }\epsilon }}. \end{aligned}$$

    Then, for any \(x^{0}\in D\), the sequence \((x^{k})_{k\in \mathbb{N}}\) defined by \(x^{k+1}= T x^{k}\) is R-linearly convergent to a point in \(\operatorname {Fix}T\cap D\) with rate \(\gamma = \sqrt[p]{1+\epsilon -\frac{1-\overline{\alpha }}{\overline{\alpha }\mu ^{p}}}\).
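
The rate in Corollary 19(b) is easy to evaluate numerically. The following is a minimal sketch, not part of the original analysis: the function name `linear_rate` is hypothetical, and for \(\epsilon =0\) the upper bound on μ is read as \(+\infty \), which is our interpretation of the stated interval.

```python
import math


def linear_rate(alpha_bar, eps, mu, p=2.0):
    """Evaluate gamma = (1 + eps - tau / mu**p)**(1/p) with tau = (1 - alpha_bar) / alpha_bar.

    Raises ValueError when mu lies outside the interval stated in Corollary 19(b).
    """
    tau = (1.0 - alpha_bar) / alpha_bar
    lower = (tau / (1.0 + eps)) ** (1.0 / p)
    # Assumption: with zero violation the upper bound is taken to be +infinity.
    upper = math.inf if eps == 0 else (tau / eps) ** (1.0 / p)
    if not lower < mu < upper:
        raise ValueError("mu must satisfy lower < mu < upper")
    return (1.0 + eps - tau / mu ** p) ** (1.0 / p)
```

The lower bound on μ keeps the radicand positive and the upper bound is what forces \(\gamma <1\), so a call that returns without raising always yields a value in \((0,1)\).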

In the statements above, the upper bound on the violation of α-firm nonexpansiveness ϵ has to be compensated for by an equally strong gauge of metric subregularity with this value of ϵ explicitly accounted for in the gauge. The next result shows that these can be decoupled if T is pointwise asymptotically α-firmly nonexpansive at fixed points. In particular, if T is pointwise almost α-firmly nonexpansive at \(y\in \operatorname {Fix}T\) with arbitrarily small violation ϵ, then whenever T is (gauge) metrically subregular at y, there is a neighborhood of y on which convergence of the fixed point iteration can be quantified by the said gauge. In this situation it suffices to qualitatively determine metric subregularity—the exact value of the constants in relation to the violation of α-firmness is not needed in order to determine local convergence on the order of the gauge.

Proposition 20

Let \((G,d)\) be a complete p-uniformly convex space with constant c; let \(D\subset G\), let \(T: D \to D \) with \(T(D)\) boundedly compact, and let \(S:=\operatorname {Fix}T\cap D\) be nonempty. Assume that T is a self-mapping on sufficiently small balls around points in S restricted to D, and that T is pointwise asymptotically α-firmly nonexpansive at all points \(y\in S\) with constant \(\overline{\alpha }\in (0,1)\). Suppose further that T satisfies

$$\begin{aligned} d(x,\operatorname {Fix}T\cap D)\leq \mu \bigl(d(x,Tx)\bigr), \quad\forall x\in D, \end{aligned}$$
(31)

with gauge μ given by (19) and \(\tau =(1-\overline{\alpha })/\overline{\alpha }\). Then, for any \(x^{0}\) close enough to S, the sequence \((x^{k})_{k\in \mathbb{N}}\) defined by \(x^{k+1}= T x^{k}\) converges gauge monotonically to some \(x^{*}\in \operatorname {Fix}T\cap D\) with rate \(O(s_{k}(t_{0}))\) where \(s_{k}(t):=\sum_{j=k}^{\infty }\theta ^{(j)}(t)\) and \(t_{0}:=d(x^{0},\operatorname {Fix}T)\) for θ given implicitly by (19) satisfying (18).

Proof

Since T is a self-mapping on \(\mathbb{B}_{\delta }(S)\cap D\) for δ small enough, and T is pointwise asymptotically α-firmly nonexpansive with constant \(\overline{\alpha }\in (0,1)\), the result follows immediately from Proposition 14 and Theorem 16 when the domain D is restricted to \(\mathbb{B}_{\delta }(S)\) for δ sufficiently small. □

4 Proximal mappings

We return now to the prox mapping (2). It was shown in [8, Proposition 2.7] that the argmin in (2) exists and is unique if f is proper, lsc, and convex. In general the prox mapping of a convex function is not α-firmly nonexpansive. However, it was shown in [4, Corollary 23] that it is almost α-firmly nonexpansive. This and other properties of prox mappings are collected in the following result.

Theorem 21

Let \((G,d)\) be a p-uniformly convex metric space with constant \(c\in (0,2]\), and let \(f\colon G \rightarrow \mathbb{R}\) be proper, convex, and lower semicontinuous.

  1. (i)

    If \((G,d)\) is symmetric perpendicular, then \(\operatorname {prox}^{p}_{f,\lambda }\) is quasi strictly nonexpansive, that is,

    $$\begin{aligned} d\bigl(\operatorname {prox}^{p}_{f,\lambda }(x),\overline{x}\bigr)< d(x,\bar{x})\quad \forall x \in G, \forall \overline{x}\in \mathop {\operatorname {argmin}}f = \operatorname {Fix}\operatorname {prox}^{p}_{f, \lambda }. \end{aligned}$$
  2. (ii)

    The prox mapping \(\operatorname {prox}_{f,\lambda }^{p}\) is pointwise almost α-firmly nonexpansive at all \(y\in \mathop {\operatorname {argmin}}f\) on G with constant

    $$\begin{aligned} \alpha _{c} = \frac{c(c-1)}{c(c-1)+2} \quad\textit{and violation bounded above by}\quad \epsilon _{c}= \frac{2-c}{c-1}. \end{aligned}$$
    (32)
  3. (iii)

    If \((G,d)\) is a \(\operatorname{CAT}(\kappa )\) space, then \(\operatorname {prox}_{f,\lambda }^{p}\) is pointwise asymptotically α-firmly nonexpansive at all \(y\in \mathop {\operatorname {argmin}}f\) with constant \(\overline{\alpha }= 1/2\).

Proof

(i). Let \(x \in G\) be arbitrary, let \(\overline{x}\in \mathop {\operatorname {argmin}}f\), and let \(y:= \operatorname {prox}^{p}_{f,\lambda }(x)\). We prove by contradiction that the projection of x onto the geodesic \([\overline{x},y]\) connecting \(\overline{x}\) and y is y, i.e. \(P_{[\overline{x},y]}(x)= y\). Therefore assume that \(P_{[\overline{x},y]}(x)\neq y\), i.e. \(P_{[\overline{x},y]}(x)= (1-t)y \oplus t \overline{x}\) for some \(t \in (0,1]\). Then, by convexity of f and minimality of \(\overline{x}\), \(f((1-t)y \oplus t \overline{x}) \leq (1-t) f(y) + t f(\overline{x}) \leq f(y)\) and \(d((1-t)y \oplus t \overline{x},x) < d(y,x)\). Now

$$\begin{aligned} f\bigl((1-t)y \oplus t \overline{x}\bigr)+\frac{1}{p \lambda ^{p-1}}d\bigl((1-t)y \oplus t \overline{x},x\bigr)^{p}< f(y) +\frac{1}{p \lambda ^{p-1}}d(y,x)^{p} \end{aligned}$$

contradicts \(y= \operatorname {prox}^{p}_{f,\lambda }(x)\). Hence our assumption must be discarded and \(P_{[\overline{x},y]}(x) = y\). In particular \([\overline{x},y] \perp _{y} [x,y]\), and hence by the symmetric perpendicular property \([x,y] \perp _{y} [\overline{x},y]\). Now \([x,y] \perp _{y} [\overline{x},y]\) in turn yields the claim \(d(y,\overline{x})\leq d(x,\overline{x})\).

If in addition \((G,d)\) is a p-uniformly convex space, then

$$\begin{aligned} d(\overline{x}, y)^{p}&=d\bigl(\overline{x},P_{[x,y]}( \overline{x})\bigr)^{p} \leq d\biggl(\overline{x}, \frac{1}{2} x \oplus \frac{1}{2} y\biggr)^{p} \\ &\leq \frac{1}{2} d(\overline{x},x)^{p} + \frac{1}{2} d( \overline{x}, y)^{p}- \frac{c}{2} \frac{1}{4} d(x,y)^{p} \\ &\leq d(\overline{x}, x)^{p} - \frac{c}{2} \frac{1}{4} d(x,y)^{p}. \end{aligned}$$

This is only possible if either \(x=y\) or \(d(\overline{x},y) < d(\overline{x},x)\). In both cases \(\operatorname {prox}^{p}_{f,\lambda }\) is quasi strictly nonexpansive.

(ii). This is [4, Corollary 23].

(iii). Let \(\epsilon >0\), \(y\in \mathop {\operatorname {argmin}}f\), and \(c=\frac{2+\epsilon }{1+\epsilon }\). Then \(c \in (1,2)\) and there is a unique solution \(t \in (0, \pi /2)\) to \(\frac{c}{2} = t \tan (\pi /2-t) \). Set \(\delta =t/(2 \sqrt{\kappa })\). Then \(\delta \in (0,\pi /(4 \sqrt{\kappa }))\) and \((\mathbb{B}_{\delta }(y), d \vert _{\mathbb{B}_{\delta }(y)})\) is a p-uniformly convex space with constant c by Lemma 1. Part (i) ensures that \(\operatorname {prox}^{p}_{f,\lambda }\) is a self-mapping on \(\mathbb{B}_{\delta }(y)\), and hence

$$\begin{aligned} \mathop {\operatorname {argmin}}_{z \in G} f(z)+ \frac{1}{p \lambda ^{p-1}} d(z,x)^{p} = \mathop {\operatorname {argmin}}_{z \in \mathbb{B}_{\delta }(y)} f(z)+ \frac{1}{p \lambda ^{p-1}} d(z,x)^{p}, \end{aligned}$$

and we will use the same notation \(\operatorname {prox}^{p}_{f,\lambda }\) for the operator restricted to the subspace \((\mathbb{B}_{\delta }(y), d \vert _{\mathbb{B}_{\delta }(y)})\). The operator \(\operatorname {prox}^{p}_{f,\lambda }\) is pointwise almost α-firmly nonexpansive with constant \(\alpha _{c}=\frac{c(c-1)}{c(c-1)+2}\) and violation bounded above by \(\epsilon _{c} = \frac{2-c}{c-1} =\epsilon \) on \((\mathbb{B}_{\delta }(y), d \vert _{\mathbb{B}_{\delta }(y)})\) by part (ii). Note that \(\alpha _{c}\nearrow 1/2\) as \(c\nearrow 2\) and by Proposition 6(iii) \(\operatorname {prox}^{p}_{f,\lambda }\) is pointwise almost α-firmly nonexpansive with constant \(\overline{\alpha }=1/2\) for all \(c\in (0,2)\). Since \(\epsilon _{c}\searrow 0\) as \(c\nearrow 2\), this implies that \(\operatorname {prox}^{p}_{f,\lambda }\) is pointwise asymptotically α-firmly nonexpansive at \(y\in \mathop {\operatorname {argmin}}f\) with constant \(\overline{\alpha }= 1/2\) and the neighborhood \(\mathbb{B}_{\delta }(y)\) of y. □
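
The construction of the radius δ in part (iii) can be made concrete. The sketch below is an illustration under our own choices (the function name `local_radius` and the use of bisection are not from the paper): it solves \(\frac{c}{2}=t\tan (\pi /2-t)\) for \(t\in (0,\pi /2)\), using that \(t\tan (\pi /2-t)=t\cot t\) decreases from 1 to 0 on that interval, and returns \(\delta =t/(2\sqrt{\kappa })\).

```python
import math


def local_radius(eps, kappa, iters=200):
    """Return (delta, c) with c = (2 + eps) / (1 + eps) and delta = t / (2 sqrt(kappa)),
    where t in (0, pi/2) solves c / 2 = t * tan(pi/2 - t) = t / tan(t)."""
    if eps <= 0 or kappa <= 0:
        raise ValueError("eps and kappa must be positive")
    c = (2.0 + eps) / (1.0 + eps)
    target = c / 2.0
    lo, hi = 0.0, math.pi / 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # t / tan(t) is decreasing on (0, pi/2): too large a value means t is too small.
        if mid / math.tan(mid) > target:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t / (2.0 * math.sqrt(kappa)), c
```

Smaller target violations ϵ push c toward 2 and hence shrink the ball on which the estimate holds, which matches the asymptotic nature of the statement.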

Remark 22

In part (i) of Theorem 21, quasi nonexpansiveness comes from the symmetric perpendicular property. Strict quasi nonexpansiveness comes from the property of p-uniformly convex spaces.

The next result gathers properties of mappings built from prox mappings.

Proposition 23

Let \((G,d)\) be a p-uniformly convex space with constant \(c\in (1,2]\), and let \(f,g\colon G \rightarrow \mathbb{R}\cup \{+\infty \}\) be convex, proper, and lower semicontinuous. Assume that \(\mathop {\operatorname {argmin}}f \cap \mathop {\operatorname {argmin}}g \neq \emptyset \).

  1. (i)

    The operator \(T:= \operatorname {prox}_{f,\lambda _{2}}^{p} \circ \operatorname {prox}_{g,\lambda _{1}}^{p}\) (\(\lambda _{1}, \lambda _{2}>0\)) is pointwise almost α-firmly nonexpansive at all points \(y\in \Omega:=\mathop {\operatorname {argmin}}f\cap \mathop {\operatorname {argmin}}g\) on G with constant \(\alpha _{\circ }= \frac{2 (c-1)}{2c-1}\) and violation bounded above by \(\epsilon _{\circ }=\frac{1}{(c-1)^{2}}-1\). If \((G, d)\) is a \(\operatorname{CAT}( \kappa)\) space, then T is pointwise asymptotically α-firmly nonexpansive at all \(y\in \Omega \) with constant \(\overline{\alpha }_{\circ }= 2/3\).

  2. (ii)

    Let \(T_{0}: G \to G \) be pointwise asymptotically α-firmly nonexpansive with constant \(\overline{\alpha }_{0}\) at all \(y\in \Omega:=\mathop {\operatorname {argmin}}f\cap \operatorname {Fix}T_{0}\). The operator \(T:= \operatorname {prox}_{f,\lambda }^{p} \circ T_{0}\) (\(\lambda >0\)) is pointwise almost α-firmly nonexpansive at all points \(y\in \Omega \) on G with constant and violation bounded above by

    $$\begin{aligned} \alpha = \frac{\overline{\alpha }_{0}+\alpha _{c}-2\overline{\alpha }_{0}\alpha _{c}}{\frac{c}{2} (1-\overline{\alpha }_{0}-\alpha _{c}+\overline{\alpha }_{0}\alpha _{c} )+ \overline{\alpha }_{0}+\alpha _{c}-2\overline{\alpha }_{0}\alpha _{c}}\quad \textit{and}\quad \epsilon = \epsilon _{0}+\epsilon _{c}+\epsilon _{0} \epsilon _{c}, \end{aligned}$$
    (33)

    where \(\epsilon _{0}\) is the violation of α-firm nonexpansiveness of \(T_{0}\) on some neighborhood small enough. If \((G, d)\) is a \(\operatorname{CAT}( \kappa)\) space, then T is pointwise asymptotically α-firmly nonexpansive at all \(y\in \Omega \) with constant

    $$\begin{aligned} \overline{\alpha }:=\frac{1}{2-\overline{\alpha }_{0}}. \end{aligned}$$
    (34)
  3. (iii)

    The Krasnoselsky–Mann relaxation \(T:=\beta \operatorname {prox}_{f,\lambda }^{p}\oplus (1-\beta )\operatorname {Id}\) is pointwise almost α-firmly nonexpansive at all points \(y\in \Omega:=\mathop {\operatorname {argmin}}f\) on G with constant

    $$\begin{aligned} \alpha _{\beta }= \frac{\alpha _{c}\beta ^{p-1}}{\alpha _{c}\beta ^{p-1} - \alpha _{c}\beta +1} \end{aligned}$$

    and violation bounded above by \(\epsilon _{\beta }=\beta \epsilon _{c}\), where \(\alpha _{c}\) and \(\epsilon _{c}\) are given by (32). If \((G, d)\) is a \(\operatorname{CAT}(\kappa)\) space, then T is pointwise asymptotically α-firmly nonexpansive at all \(y\in \Omega \) with constant

    $$\begin{aligned} \overline{\alpha }_{\beta }:=\frac{\beta ^{p-1}}{\beta ^{p-1}-\beta +2}. \end{aligned}$$
  4. (iv)

    The composition \(T:=\operatorname {prox}_{f,\lambda _{2}}^{p}\circ (\beta \operatorname {prox}_{g,\lambda _{1}}^{p} \oplus (1-\beta )\operatorname {Id})\) is pointwise almost α-firmly nonexpansive at all points \(y\in \Omega:=\mathop {\operatorname {argmin}}f\cap \mathop {\operatorname {argmin}}g\) on G with constant

    $$\begin{aligned} \begin{aligned}&\widehat{\alpha }= \frac{\alpha _{\beta }+\alpha _{c}-2\alpha _{\beta }\alpha _{c}}{\frac{c}{2} (1-\alpha _{\beta }-\alpha _{c}+\alpha _{\beta }\alpha _{c} )+ \alpha _{\beta }+\alpha _{c}-2\alpha _{\beta }\alpha _{c}}\\ &\quad \textit{and violation bounded above by}\\ & \widehat{\epsilon }= (1+\beta )\epsilon _{c} + \beta \epsilon _{c}^{2}, \end{aligned} \end{aligned}$$
    (35)

    where \(\alpha _{\beta }\), \(\alpha _{c}\), and \(\epsilon _{c}\) are the constants in part (iii) and Theorem 21(ii) respectively. If \((G, d)\) is a \(\operatorname{CAT}(\kappa)\) space, then T is pointwise asymptotically α-firmly nonexpansive at all \(y\in \Omega \) with constant

    $$\begin{aligned} \widehat{\alpha }= \frac{1}{2 - \overline{\alpha }_{\beta }}, \end{aligned}$$

    where \(\overline{\alpha }_{\beta }\) is the constant in part (iii).

  5. (v)

    If \((G, d)\) is symmetric perpendicular, the projected gradient operator

    $$\begin{aligned} T:=P_{C}\circ \bigl(\beta \operatorname {prox}_{g,\lambda }^{p}\oplus (1-\beta ) \operatorname {Id}\bigr) \end{aligned}$$

    is pointwise almost α-firmly nonexpansive at all points \(y\in \Omega:=C\cap \mathop {\operatorname {argmin}}g\) on G with

    $$\begin{aligned} \alpha _{PG}= \frac{1}{\frac{c}{2} (1-\alpha _{\beta } )+ 1}\quad \textit{and violation bounded above by}\quad \epsilon _{PG} = \epsilon _{c}\beta. \end{aligned}$$
    (36)

    If \((G, d)\) is a \(\operatorname{CAT}(\kappa)\) space, then T is pointwise asymptotically α-firmly nonexpansive at all \(y\in \Omega \) with constant

    $$\begin{aligned} \alpha _{PG}= \frac{1}{2 - \overline{\alpha }_{\beta }}, \end{aligned}$$

    where \(\overline{\alpha }_{\beta }\) is the constant in part (iii).

Proof

(i). By Theorem 21(ii), the operators \(\operatorname {prox}_{f,\lambda }^{p}\) and \(\operatorname {prox}_{g,\lambda }^{p}\) are almost quasi α-firmly nonexpansive with constants \(\alpha _{c}=\frac{c (c-1)}{c (c-1)+2}\) and violation bounded above by \(\epsilon _{c}=\frac{2-c}{c-1}\). The operator T is almost α-firmly nonexpansive at all points \(y\in \Omega \) on G with

$$\begin{aligned} \alpha _{\circ }= \frac{2 \alpha _{c}-2 \alpha _{c}^{2}}{\frac{c}{2} (1-2\alpha _{c}+\alpha _{c}^{2})+2\alpha _{c}-2\alpha _{c}^{2}} \end{aligned}$$

and violation satisfying \(1+\epsilon _{\circ }=(1+\epsilon _{c})^{2}\) by Proposition 10. A short calculation yields

$$\begin{aligned} \alpha _{\circ }=\frac{2(c-1)}{2c-1} \quad\text{and}\quad \epsilon _{\circ }= \frac{1}{(c-1)^{2}}-1. \end{aligned}$$

Taking the limit as \(c\to 2\) from below yields the constant \(\overline{\alpha }_{\circ }= 2/3\) with limiting violation \({\overline{\epsilon }}_{\circ }= 0\). The same argument as in the proof of Theorem 21(iii) then shows that, when \((G, d)\) is a \(\operatorname{CAT}(\kappa )\) space, the composition of two prox mappings is pointwise asymptotically α-firmly nonexpansive at points in Ω with constant \(2/3\).

(ii). Theorem 21(ii) and Proposition 10 yield pointwise almost α-firm nonexpansiveness of T at \(y\in \Omega \) with constant and violation characterized by (33), where \(\alpha _{c}\) and \(\epsilon _{c}\) are given by (32), \(\overline{\alpha }_{0}\) is the asymptotic constant of α-firm nonexpansiveness of \(T_{0}\), and \(\epsilon _{0}\) is the upper bound of the violation on some neighborhood. (By Proposition 6(iii) if \(T_{0}\) is pointwise almost α-firmly nonexpansive with constant \(\alpha _{0}<\overline{\alpha }_{0}\), then \(T_{0}\) is also pointwise almost α-firmly nonexpansive with constant \(\overline{\alpha }_{0}\).) By Theorem 21(iii) and the assumption that \(T_{0}\) is pointwise asymptotically α-firmly nonexpansive at \(\operatorname {Fix}T_{0}\) with constant \(\overline{\alpha }_{0}\), the same argument as in the proof of Theorem 21(iii) establishes that, when \((G, d)\) is a \(\operatorname{CAT}(\kappa )\) space, T is pointwise asymptotically α-firmly nonexpansive at points in Ω with constant α̅ given by (34).

(iii). The first statement is an immediate application of Proposition 12 and Theorem 21(ii). When \((G, d)\) is a \(\operatorname{CAT}(\kappa )\) space, the same argument as in the proof of Theorem 21(iii) establishes that Krasnoselsky–Mann relaxations of prox mappings are pointwise asymptotically α-firmly nonexpansive with constant \(\overline{\alpha }_{\beta }\) as claimed.

(iv). This is an application of part (ii) to part (iii).

(v). This is a specialization of part (iv) when f is the indicator function of a convex set C and follows from the fact that, on symmetric perpendicular p-uniformly convex spaces, the projector is pointwise α-firmly nonexpansive at all points in C with constant \(\alpha =1/2\) (no violation) as shown in [4, Proposition 25]. □
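
The "short calculation" in part (i) and the limiting constants in parts (ii) to (v) can be checked numerically. The sketch below simply transcribes (32), the composition formula in (33), and the relaxation formula of part (iii); the function names are hypothetical and nothing here replaces the proofs.

```python
def prox_constants(c):
    """alpha_c and eps_c from (32); meaningful for 1 < c <= 2."""
    alpha_c = c * (c - 1.0) / (c * (c - 1.0) + 2.0)
    eps_c = (2.0 - c) / (c - 1.0)
    return alpha_c, eps_c


def compose(a1, e1, a2, e2, c):
    """Constant and violation of a composition, transcribed from (33)."""
    num = a1 + a2 - 2.0 * a1 * a2
    den = 0.5 * c * (1.0 - a1 - a2 + a1 * a2) + num
    return num / den, e1 + e2 + e1 * e2


def relax(alpha_c, eps_c, beta, p=2.0):
    """Krasnoselsky-Mann constants alpha_beta and eps_beta from Proposition 23(iii)."""
    b = beta ** (p - 1.0)
    return alpha_c * b / (alpha_c * b - alpha_c * beta + 1.0), beta * eps_c
```

For example, `compose(*prox_constants(c), *prox_constants(c), c)` should reproduce \(\alpha _{\circ }=\frac{2(c-1)}{2c-1}\) and \(\epsilon _{\circ }=\frac{1}{(c-1)^{2}}-1\) for any \(c\in (1,2]\), and at \(c=2\) it returns the classical pair \((2/3, 0)\).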

Remark 24

Part (i) of Proposition 23 coincides with \(\alpha = \frac{2}{3}\) and \(\epsilon =0\) in the classic setting with \(c=2\). In particular, the composition \(P_{A} \circ P_{B}\) of two projections \(P_{A}\) and \(P_{B}\) onto convex sets A and B with \(A\cap B \neq \emptyset \) is α-firmly nonexpansive at all \(y\in A\cap B\) on G with \(\alpha =\frac{2}{3}\) and violation \(\epsilon =0\). However, this result does not apply if the problem is infeasible i.e. \(A \cap B= \emptyset \).

These properties allow us to prove the following fundamental result.

Theorem 25

(Convergence of proximal algorithms in \(\operatorname{CAT}(\kappa)\) spaces)

Let \((G, d)\) be a complete \(\operatorname{CAT}(\kappa )\) space with \(\kappa >0\) and \(f,g\colon G \rightarrow \mathbb{R}\cup \{+\infty \}\) be proper, convex, and lower semicontinuous with \(\mathop {\operatorname {argmin}}f\cap \mathop {\operatorname {argmin}}g \neq \emptyset \). Let T denote any of the mappings in Proposition 23. If T satisfies

$$\begin{aligned} d(x,\operatorname {Fix}T\cap D)\leq \mu d(x,Tx),\quad \forall x\in D, \end{aligned}$$
(37)

with constant μ, then the fixed point sequence initialized from any starting point close enough to \(\operatorname {Fix}T\cap D\) is at least linearly convergent to a point in \(\operatorname {Fix}T\cap D\) with rate \(\gamma = \sqrt{1+\epsilon - \frac{1-\alpha }{\alpha \mu ^{2}}}<1\), where α and ϵ are the respective constant and violation of pointwise α-firm nonexpansiveness of the fixed point mapping T as given in Proposition 23. The asymptotic rate of convergence is \(\overline{\gamma }= \sqrt{1- \frac{1-\overline{\alpha }}{\overline{\alpha }\mu ^{2}}}<1\), where α̅ is the respective constant of pointwise asymptotic α-firm nonexpansiveness of the fixed point mapping T.

Proof

As established in Theorem 21 and Proposition 23, all of the mappings covered in those results are pointwise asymptotically α-firmly nonexpansive at points in Ω with constants \(\overline{\alpha }<1\), where Ω is one of the following subsets corresponding to the respective mappings (i)–(v) in Proposition 23: (i) \(\Omega = \mathop {\operatorname {argmin}}f\cap \mathop {\operatorname {argmin}}g\); (ii) \(\Omega \subset \operatorname {Fix}T_{0}\cap \mathop {\operatorname {argmin}}f\); (iii) \(\Omega = \mathop {\operatorname {argmin}}f\); (iv) \(\Omega = \mathop {\operatorname {argmin}}f\cap \mathop {\operatorname {argmin}}g\); (v) \(\Omega = C\cap \mathop {\operatorname {argmin}}g\).

As noted in Remark 3, any \(\operatorname{CAT}(\kappa)\) space is symmetric perpendicular locally, so by Theorem 21(i) and Lemma 7, in every case \(\operatorname {Fix}T = \Omega \). Now, if T satisfies (37) at all points in \(\operatorname {Fix}T\) on G with constant μ, it follows immediately from Proposition 20 that the fixed point iteration converges linearly to a point in \(\operatorname {Fix}T\cap D\) with the given rate for all starting points close enough to \(\operatorname {Fix}T\cap D\), as claimed. □

5 Proximal splitting methods

The concrete examples provided here have been well studied for p-uniformly convex spaces with \(p=c=2\) i.e. \(\operatorname{CAT}(0)\) spaces. The tools established in the previous sections open the door to applying these methods in \(\operatorname{CAT}( \kappa )\) spaces, which is new. Since \(\operatorname{CAT}(\kappa )\) spaces are p-uniformly convex with \(p=2\), to avoid confusion, we revert to the usual notation for proximal operators in the setting, namely \(\operatorname {prox}_{f,\lambda }\), omitting the exponent.

Let \((G,d)\) be a \(\operatorname{CAT}(\kappa)\) space, and let \(f_{i}:G\to \mathbb{R}\cup \{+\infty \}\) be proper lsc convex functions for \(i=1,2,\dots ,N\). Consider the problem

$$\begin{aligned} \inf_{x\in G}\sum_{i=1}^{N} f_{i}(x). \end{aligned}$$
(38)

Applying backward-backward splitting to this problem yields Algorithm 1. Local linear convergence follows from Theorem 25 and the extension of Proposition 23(i) via part (ii) of the same proposition and induction, under the assumption that \(\mathcal{T}_{\Omega }\) defined by (22)—which simplifies to (23)—is linearly metrically subregular for 0 on G with constant μ and \(\Omega:=\bigcap_{j}\mathop {\operatorname {argmin}}f_{j}\neq \emptyset \). By Proposition 6(i), linear metric subregularity simplifies to

$$\begin{aligned} d(x, \Omega )\leq \mu d(Tx, x) \quad\forall x\in D. \end{aligned}$$
(39)

Algorithm 1 (figure a): Proximal splitting
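
Since the listing of Algorithm 1 is a figure, the following is only a sketch under stated assumptions. We assume the iteration is \(x^{k+1}=\operatorname {prox}_{f_{N},\lambda }\circ \cdots \circ \operatorname {prox}_{f_{1},\lambda }x^{k}\), consistent with the cyclic projections mapping (41), and we work in the Euclidean plane (a Hilbert space, hence a \(\operatorname{CAT}(0)\) space) with the illustrative choices \(f_{1}(x)=|x_{1}|\) and \(f_{2}(x)=\frac{1}{2}x_{2}^{2}\), whose minimizers intersect in the origin. All function names are hypothetical.

```python
import math


def soft_threshold(v, lam):
    """Shrink v toward zero by lam (prox of lam * |.| on the real line)."""
    return math.copysign(max(abs(v) - lam, 0.0), v)


def prox_f1(x, lam):
    """prox of f1(x) = |x[0]| with parameter lam."""
    return (soft_threshold(x[0], lam), x[1])


def prox_f2(x, lam):
    """prox of f2(x) = 0.5 * x[1]**2 with parameter lam."""
    return (x[0], x[1] / (1.0 + lam))


def backward_backward(x0, proxes, lam, iters):
    """Apply the prox mappings cyclically, first entry of `proxes` first."""
    x = tuple(x0)
    for _ in range(iters):
        for prox in proxes:
            x = prox(x, lam)
    return x
```

Calling `backward_backward((3.0, 4.0), [prox_f1, prox_f2], 0.5, 60)` drives the first coordinate to zero in finitely many steps and contracts the second by the factor \(1/(1+\lambda )\) per sweep.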

Recall, in a p-uniformly convex space with \(p=2\), the Moreau–Yosida envelope of f is defined by

$$\begin{aligned} e_{f, \lambda }(x):=\inf_{y\in G} \biggl(f(y)+ \frac{1}{2\lambda }d(x,y)^{2} \biggr). \end{aligned}$$

The analogue to the direction of steepest descent for the Moreau–Yosida envelope in p-uniformly convex settings is

$$\begin{aligned} (1-\beta )x\oplus \beta \operatorname {prox}_{f,\lambda }(x). \end{aligned}$$
(40)
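
To see why (40) plays the role of a steepest descent step, it helps to look at the special case of a Hilbert space (this is a sketch for orientation only and is not needed for the metric-space theory). There the envelope of a proper lsc convex function is differentiable with \(\nabla e_{f,\lambda }(x)=\lambda ^{-1}(x-\operatorname {prox}_{f,\lambda }(x))\), and the geodesic combination ⊕ reduces to an ordinary convex combination. A gradient step on the envelope with step length \(\beta \lambda \) is then

$$\begin{aligned} x-\beta \lambda \nabla e_{f,\lambda }(x) = x-\beta \bigl(x-\operatorname {prox}_{f,\lambda }(x)\bigr) = (1-\beta )x+\beta \operatorname {prox}_{f,\lambda }(x), \end{aligned}$$

which is exactly (40) in the linear setting.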

Specializing problem (38) to the case \(N=2\) and \(f_{2}=\iota _{C}\), the indicator function of some closed convex set \(C\subset G\) yields Algorithm 2, the analog to projected gradients in \(\operatorname{CAT}(\kappa)\) space, which is the projected resolvent/projected prox iteration. Local linear convergence follows immediately from Theorem 25 under the assumption that T satisfies (30) and \(\Omega:=\mathop {\operatorname {argmin}}f\cap C\neq \emptyset \).

Algorithm 2 (figure b): Metric projected gradients
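
As with Algorithm 1, the listing of Algorithm 2 is a figure, so the following is a sketch under assumptions rather than a transcription. We assume the iteration is \(x^{k+1}=P_{C} ((1-\beta )x^{k}\oplus \beta \operatorname {prox}_{f,\lambda }(x^{k}) )\), as in Proposition 23(v), and illustrate it in the Euclidean plane with the hypothetical choices \(f(z)=\frac{1}{2}\|z-a\|^{2}\) and C the closed unit disk, where \(a\in C\) so that \(\mathop {\operatorname {argmin}}f\cap C\neq \emptyset \).

```python
import math


def prox_quadratic(x, a, lam):
    """prox of f(z) = 0.5 * |z - a|^2 with parameter lam: (x + lam * a) / (1 + lam)."""
    return tuple((xi + lam * ai) / (1.0 + lam) for xi, ai in zip(x, a))


def project_unit_ball(x):
    """Metric projection onto the closed Euclidean unit ball."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return tuple(x) if norm <= 1.0 else tuple(xi / norm for xi in x)


def projected_prox(x0, a, lam, beta, iters):
    """Iterate x <- P_C((1 - beta) * x + beta * prox(x)) in Euclidean space."""
    x = tuple(x0)
    for _ in range(iters):
        y = prox_quadratic(x, a, lam)
        x = tuple((1.0 - beta) * xi + beta * yi for xi, yi in zip(x, y))
        x = project_unit_ball(x)
    return x
```

In this example the relaxed prox step contracts the distance to a by the factor \(1-\beta \lambda /(1+\lambda )\), and the projection does not increase it, so the iterates converge linearly to a.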

Compositions of projectors in \(\operatorname{CAT}(\kappa )\) spaces have been studied in [4] and [1]. We consider Algorithm 1 when the functions \(f_{i}:=\iota _{C_{i}}\), the indicator functions of closed convex sets \(C_{i}\subset G\), where \((G, d)\) is a complete, symmetric perpendicular p-uniformly convex space with constant c. The p-proximal mapping of the indicator function is the metric projector, and so by [4, Proposition 25] these are pointwise α-firmly nonexpansive at all points in \(\bigcap_{i} C_{i}\) (assuming, of course, that this is nonempty) and by [4, Lemma 10] the cyclic projections mapping

$$\begin{aligned} T_{CP}:=P_{C_{N}}\cdots P_{C_{2}} P_{C_{1}} \end{aligned}$$
(41)

is pointwise α-firmly nonexpansive at all points in \(\bigcap_{i} C_{i}=\operatorname {Fix}T_{CP}\), when the intersection is nonempty, with constant \(\overline{\alpha }_{N} = \frac{N-1}{N}\) on G. Δ- or weak convergence (no rate) to a point in \(\bigcap_{i} C_{i}\) follows from [4, Theorem 27], with strong convergence whenever one of the sets is compact. If in addition \(d(x,\bigcap_{i} C_{i})\leq \mu d(T_{CP}x, x)\) for all \(x\in G\), where \(\mu >0\) is the constant of linear metric subregularity, then, by Theorem 25, the sequence \((x_{k})\) initialized anywhere in G converges linearly to some \(x^{*}\in \bigcap_{i} C_{i}\).
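The linear rate can be observed in the simplest linear instance. The sketch below applies \(T_{CP}=P_{C_{2}}P_{C_{1}}\) to two lines through the origin in the Euclidean plane meeting at angle θ, so that \(C_{1}\cap C_{2}=\{0\}\); the sets, the angle \(\theta =\pi /4\), the starting point, and the function names are illustrative assumptions. In this setting the distance to the intersection is known to contract by the factor \(\cos ^{2}\theta \) per sweep.

```python
import math


def project_line(x, u):
    """Metric projection of x onto the line spanned by the unit vector u."""
    t = x[0] * u[0] + x[1] * u[1]
    return (t * u[0], t * u[1])


theta = math.pi / 4
u1 = (1.0, 0.0)                          # C1: the horizontal axis
u2 = (math.cos(theta), math.sin(theta))  # C2: the line at angle theta


def t_cp(x):
    """Cyclic projections mapping P_{C2} P_{C1} for the two lines above."""
    return project_line(project_line(x, u1), u2)


x = (1.0, 2.0)
dists = []  # distance of each iterate to the intersection {0}
for _ in range(10):
    x = t_cp(x)
    dists.append(math.hypot(x[0], x[1]))

# Ratio of successive distances: the observed linear convergence factor.
ratios = [b / a for a, b in zip(dists, dists[1:])]
print(ratios)
```

Every observed ratio equals \(\cos ^{2}(\pi /4)=1/2\) up to rounding, matching the expected contraction factor for this pair of sets.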

6 Open problems

There are two obvious next steps for this work. First and foremost is to determine the requirements for quantitative convergence of proximal splitting methods for the case when the individual prox mappings do not have common fixed points—the so-called inconsistent case—since it is too limiting to require that the fixed points of the constituent elements of splitting methods coincide. The second item to explore is qualitative settings in which metric subregularity comes “for free”. In linear settings, polyhedrality and isolated fixed points suffice to guarantee metric subregularity [7, Propositions 3I.1 and 3I.2], and this was successfully used to prove local linear convergence of the ADMM/Douglas–Rachford algorithms in a convex setting [2, Theorem 2.7]. In more general settings, the Kurdyka–Łojasiewicz (KL) property—which implies metric subregularity [6, Corollary 4 and Remark 5]—is satisfied by semi-algebraic functions [5]. Analogues to these properties for p-uniformly convex spaces would be very useful.

Availability of data and materials

Not applicable.

References

  1. Ariza-Ruiz, D., López-Acedo, G., Nicolae, A.: The asymptotic behavior of the composition of firmly nonexpansive mappings. J. Optim. Theory Appl. 167, 409–429 (2015)

  2. Aspelmeier, T., Charitha, C., Luke, D.R.: Local linear convergence of the ADMM/Douglas–Rachford algorithms without strong convexity and application to statistical imaging. SIAM J. Imaging Sci. 9(2), 842–868 (2016)

  3. Bërdëllima, A.: Investigations in Hadamard Spaces. PhD thesis, Georg-August Universität Göttingen, Göttingen (2020)

  4. Bërdëllima, A., Lauster, F., Luke, D.R.: α-firmly nonexpansive operators on metric spaces. arXiv:2104.11302 (2021)

  5. Bolte, J., Daniilidis, A., Lewis, A.: The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)

  6. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Lojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)

  7. Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings, 2nd edn. Springer, Dordrecht (2014)

  8. Izuchukwu, C., Ugwunnadi, G.C., Mewomo, O.T., Khan, A.R., Abbas, M.: Proximal-type algorithms for split minimization problem in P-uniformly convex metric spaces. Numer. Algorithms 82(3), 909–935 (2019)

  9. Kuwae, K.: Jensen’s inequality on convex spaces. Calc. Var. Partial Differ. Equ. 49(3–4), 1359–1378 (2014)

  10. Luke, D.R., Teboulle, M., Thao, N.H.: Necessary conditions for linear convergence of iterated expansive, set-valued mappings. Math. Program. 180, 1–31 (2018)

  11. Luke, D.R., Thao, N.H., Tam, M.K.: Quantitative convergence analysis of iterated expansive, set-valued mappings. Math. Oper. Res. 43(4), 1143–1176 (2018)

  12. Naor, A., Silberman, L.: Poincaré inequalities, embeddings, and wild groups. Compos. Math. 147(5), 1546–1572 (2011)

  13. Ohta, S.: Convexities of metric spaces. Geom. Dedic. 125, 225–250 (2007)

Acknowledgements

Not applicable.

Funding

FL was supported in part by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID LU 1702/1-1. DRL was supported in part by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID LU 1702/1-1 and in part by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID 432680300—SFB 1456. Open Access funding enabled and organized by Projekt DEAL.

Author information

Contributions

The authors contributed equally to all results. All authors read and approved the final manuscript.

Corresponding author

Correspondence to D. Russell Luke.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lauster, F., Luke, D.R. Convergence of proximal splitting algorithms in \(\operatorname{CAT}(\kappa)\) spaces and beyond. Fixed Point Theory Algorithms Sci Eng 2021, 13 (2021). https://doi.org/10.1186/s13663-021-00698-0

  • DOI: https://doi.org/10.1186/s13663-021-00698-0
