Circumcentering approximate reflections for solving the convex feasibility problem
Fixed Point Theory and Algorithms for Sciences and Engineering volume 2022, Article number: 1 (2022)
Abstract
The circumcentered-reflection method (CRM) has been applied for solving convex feasibility problems. CRM iterates by computing the circumcenter of points obtained through compositions of reflections with respect to convex sets. Since reflections are based on exact projections, their computation might be costly. In this regard, we introduce the circumcentered approximate-reflection method (CARM), whose reflections rely on outer-approximate projections. The appeal of CARM is that, in rather general situations, the approximate projections we employ are available at low computational cost. We prove convergence of CARM and, under an error bound condition, linear convergence. We also present successful theoretical and numerical comparisons of CARM to the original CRM, to the classical method of alternating projections (MAP), and to a corresponding outer-approximate version of MAP, referred to as MAAP. Along with our results and numerical experiments, we present a couple of illustrative examples.
1 Introduction
We consider the convex feasibility problem (CFP) consisting of finding a point in the intersection of a finite number of closed convex sets. We are going to employ Pierra’s product space reformulation [1] in order to reduce CFP to seeking a point common to a closed convex set and an affine subspace.
Projection-based methods are usually utilized for solving CFP. Widely known are the method of alternating projections (MAP) [2, 3], the Douglas–Rachford method (DRM) [4, 5], and the Cimmino method [3, 6]. Recently, the circumcentered-reflection method (CRM) has been developed as a powerful new tool for solving CFP, outperforming MAP and DRM. It was introduced in [7, 8] and further enhanced in [9–18]. In particular, CRM was shown in [15] to converge to a solution of CFP, and it was proven in [9] that linear convergence is obtained in the presence of an error bound condition.
Computing exact projections onto general convex sets can be, depending on the context, too demanding in comparison to solving the given CFP itself. Bearing this in mind, we present in this paper a version of CRM employing outer-approximate projections. These approximate projections still enjoy some of the properties of the exact ones, while having the advantage of being potentially more tractable. For instance, they cover the subgradient projections of Fukushima [19].
Consider closed convex sets \(K_{1}, \dots , K_{m}\subset \mathbb{R}^{n}\) with nonempty intersection and the CFP of finding a point \(x\in \bigcap_{i=1}^{m}K_{i}\). In the eighties, Pierra noted that this problem is directly related to the problem of finding a point \(\mathbf{x} \in \mathbf{K}\cap \mathbf{D}\), where \(\mathbf{K} := K_{1} \times \cdots \times K_{m}\subset \mathbb{R}^{nm}\) and the diagonal space \(\mathbf{D}=\{(x,\dots ,x): x\in \mathbb{R}^{n}\}\subset \mathbb{R}^{nm}\). In fact, \(x\in \bigcap_{i=1}^{m}K_{i}\) if and only if \(\mathbf{x}=(x,\ldots , x) \in \mathbf{K} \cap \mathbf{D}\). Thus, if we can solve any intersection problem featuring a closed convex set and an affine subspace, we cover the general CFP. Let us proceed in this direction by considering a closed convex set \(K\subset \mathbb{R}^{n}\) and an affine subspace \(U\subset \mathbb{R}^{n}\) with nonempty intersection. From now on, the CFP we are going to focus on is the one of finding a point in \(K\cap U\).
We consider now two operators A and \(B:\mathbb{R}^{n}\to \mathbb{R}^{n}\) and define \(T=A\circ B\). Under adequate assumptions, the sequence \(\{x^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) defined by
$$ x^{k+1}=T\bigl(x^{k}\bigr)=A\bigl(B\bigl(x^{k}\bigr)\bigr) \tag{1.1} $$
is expected to converge to a common fixed point of A and B. If the operators A and B are the projectors onto the convex sets U and K, that is, \(A=P_{U}\), \(B=P_{K}\), then (1.1) provides the iteration of the famous method of alternating projections (MAP). Moreover, the set of common fixed points of A and B in this case is precisely \(K\cap U\), and MAP converges to a point in \(K\cap U\) for any starting point in \(\mathbb{R}^{n}\); see, for instance, [3].
The circumcentered-reflection method (CRM) introduced in [7, 8] can be seen as an acceleration technique for the sequence defined by (1.1). We showed in [9] that indeed CRM achieves a better linear rate than MAP in the presence of an error bound. Moreover, in general, there is abundant numerical evidence that CRM outperforms MAP (see [7, 15, 16]).
Define the reflection operators \(A^{R}, B^{R}:\mathbb{R}^{n}\to \mathbb{R}^{n}\) as \(A^{R}=2A-\operatorname{Id}\), \(B^{R}=2B-\operatorname{Id}\), where Id stands for the identity operator in \(\mathbb{R}^{n}\). The CRM operator \(C:\mathbb{R}^{n}\to \mathbb{R}^{n}\) is defined as
$$ C(x):=\operatorname{circ}\bigl(x,B^{R}(x),A^{R}\bigl(B^{R}(x)\bigr)\bigr), \tag{1.2} $$
i.e., the circumcenter of the three points x, \(B^{R}(x)\), \(A^{R}(B^{R}(x))\). The CRM sequence \(\{x^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\), starting at some \(x^{0}\in \mathbb{R}^{n}\), is then defined as \(x^{k+1}=C(x^{k})\). For three non-collinear points \(x,y,z\in \mathbb{R}^{n}\), the circumcenter \(\operatorname{circ}(x,y,z)\) is the center of the unique two-dimensional circle passing through x, y, z (or, equivalently, the point in the affine manifold \(\operatorname{aff}\{x,y,z\}\) generated by x, y, z and equidistant to these three points). In particular, if \(A=P_{U}\) and \(B=P_{K}\), that is, \(A^{R}=2P_{U}-\operatorname{Id}\), \(B^{R}=2P_{K}-\operatorname{Id}\), the CRM sequence, starting at some \(x^{0}\in U\),
converges to a point in \(K\cap U\) as long as the initial point lies in U. If in addition a certain error bound between K and U holds, then CRM converges linearly, and with a better rate than MAP.
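To fix ideas, the following sketch (ours, not part of the original exposition) computes circumcenters and one CRM step in Python; the names `circumcenter`, `crm_step`, `proj_U`, and `proj_K` are our own, and the exact projectors onto U and K are assumed to be supplied by the user.

```python
import numpy as np

def circumcenter(x, y, z, tol=1e-12):
    """Circumcenter of x, y, z: the point of aff{x, y, z} equidistant to the
    three points; coinciding points reduce to midpoints (cf. Sect. 3)."""
    u, v = y - x, z - x
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu < tol and nv < tol:
        return x.copy()                      # circ(x, x, x) = x
    if nu < tol:
        return 0.5 * (x + z)                 # circ(x, x, z): midpoint of x, z
    if nv < tol or np.linalg.norm(u - v) < tol:
        return 0.5 * (x + y)                 # circ(x, y, y) = (x + y)/2
    # Write c = x + a*u + b*v and impose ||c-x|| = ||c-y|| = ||c-z||,
    # which amounts to the 2x2 linear system below (non-collinear points).
    G = np.array([[u @ u, u @ v], [u @ v, v @ v]])
    a, b = np.linalg.solve(G, 0.5 * np.array([u @ u, v @ v]))
    return x + a * u + b * v

def crm_step(x, proj_U, proj_K):
    """One CRM step: circumcenter of x, B^R(x) and A^R(B^R(x)) with A = P_U,
    B = P_K; proj_U and proj_K are the exact projectors."""
    y = 2.0 * proj_K(x) - x                  # B^R(x) = 2 P_K(x) - x
    z = 2.0 * proj_U(y) - y                  # A^R(B^R(x)) = 2 P_U(y) - y
    return circumcenter(x, y, z)
```

For instance, with `proj_U` the projection onto a hyperplane and `proj_K` the projection onto a ball, iterating `crm_step` from a point of U produces the CRM sequence discussed above.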
In this paper, we introduce approximate versions of MAP and CRM for solving CFP, which we call MAAP and CARM. The MAAP and CARM iterations are computed by (1.1) and (1.2) with A being the exact projector onto U and B an approximate projector onto K. The approximation consists of replacing at each iteration the set K by a larger set separating the current iterate from K. This separating scheme is rather general and, for a large family of convex sets, includes particular instances where the separating set is a half-space, or a Cartesian product of half-spaces, in which cases all the involved projections have very low computational cost. One could fear that this significant reduction in the computational cost per iteration could be nullified by a substantial slowing down of the process as a whole, through a deterioration of the convergence speed. However, we show that this is not necessarily the case. Indeed, we prove that under error bound conditions separating schemes are available so that MAAP and CARM enjoy linear convergence rates, with the linear rate of CARM being strictly better than MAAP. Our numerical experiments confirm these statements, and more than that, they show CARM outperforming MAP, CRM, and MAAP in terms of computational time.
2 Preliminaries
We recall first the definition of Q-linear and R-linear convergence.
Definition 2.1
Consider a sequence \(\{z^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) converging to \(z^{*}\in \mathbb{R}^{n}\). Assume that \(z^{k}\ne z^{*}\) for all \(k\in \mathbb{N}\). Let \(q:= \limsup_{k\to \infty } \frac{ \Vert z^{k+1}-z^{*} \Vert }{ \Vert z^{k}-z^{*} \Vert }\). Then the sequence \(\{z^{k}\}_{k\in \mathbb{N}}\) converges
-
(i)
Q-superlinearly, if \(q=0\),
-
(ii)
Q-linearly, if \(q\in (0,1)\),
-
(iii)
Q-sublinearly, if \(q\ge 1\).
Let \(r:= \limsup_{k\to \infty } \Vert z^{k}-z^{*} \Vert ^{1/k}\). Then the sequence \(\{z^{k}\}_{k\in \mathbb{N}}\) converges
-
(iv)
R-superlinearly, if \(r=0\),
-
(v)
R-linearly, if \(r\in (0,1)\),
-
(vi)
R-sublinearly, if \(r\ge 1\).
The values q and r are called asymptotic constants of \(\{z^{k}\}_{k\in \mathbb{N}}\).
It is well known that Q-linear convergence implies R-linear convergence (with the same asymptotic constant), but the converse statement does not hold true [20].
We remind now the notion of Fejér monotonicity.
Definition 2.2
A sequence \(\{z^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) is Fejér monotone with respect to a nonempty closed convex set \(M\subset \mathbb{R}^{n}\) when \(\Vert z^{k+1}-y \Vert \le \Vert z^{k}-y \Vert \) for all \(k\in \mathbb{N}\) and \(y\in M\).
Proposition 2.3
Suppose that the sequence \(\{z^{k}\}_{k\in \mathbb{N}}\) is Fejér monotone with respect to the closed convex set \(M\subset \mathbb{R}^{n}\). Then
-
(i)
\(\{z^{k}\}_{k\in \mathbb{N}}\) is bounded.
-
(ii)
if a cluster point z̄ of \(\{z^{k}\}_{k\in \mathbb{N}}\) belongs to M, we have \(\lim_{k\to \infty }z^{k}=\bar{z}\).
-
(iii)
if \(\{z^{k}\}_{k\in \mathbb{N}}\) converges to z̄, we get \(\Vert z^{k}-\bar{z} \Vert \le 2\operatorname{dist}(z^{k},M)\).
Proof
See Theorem 2.16 in [3]. □
Next we introduce the separating operator needed for the approximate versions of MAP and CRM, namely MAAP and CARM.
Definition 2.4
Given a closed and convex set \(K\subset \mathbb{R}^{n}\), a separating operator for K is a point-to-set mapping \(S:\mathbb{R}^{n}\to {\mathcal{P}}(\mathbb{R}^{n})\) satisfying:
-
(A1)
\(S(x)\) is closed and convex for all \(x\in \mathbb{R}^{n}\).
-
(A2)
\(K\subset S(x)\) for all \(x\in \mathbb{R}^{n}\).
-
(A3)
If a sequence \(\{z^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) converges to \(z^{*}\in \mathbb{R}^{n}\) and \(\lim_{k\to \infty }\operatorname{dist}(z^{k},S(z^{k}))=0\), then \(z^{*}\in K\).
We have the following immediate result regarding Definition 2.4.
Proposition 2.5
If S is a separating operator for K, then \(x\in S(x)\) if and only if \(x\in K\).
Proof
The “if” statement follows from A2. For the “only if” statement, take \(x\in S(x)\), consider the constant sequence \(z^{k}=x\) for all \(k\in \mathbb{N}\), which converges to x, and apply A3. □
Proposition 2.5 implies that if \(x\notin K\) then \(x\notin S(x)\), which, in view of A2, indicates that the set \(S(x)\) indeed separates x from K. The separating sets \(S(x)\) will provide the approximate projections that we employ throughout the paper.
Several notions of separating operators have been introduced in the literature; see, e.g., [21, Sect. 2.1.13] and the references therein. Our definition is a point-to-set version of the separating operators in [22, Definition 2.1]. It encompasses not only hyperplane-based separators as the ones in the seminal work by Fukushima [19], considered next in Example 2.6, but also more general situations. Indeed, in Example 2.7, \(S(x)\) is the Cartesian product of half-spaces, which is not a half-space.
For the family of convex sets in Examples 2.6 and 2.7, we get both explicit separating operators complying with Definition 2.4 and closed formulas for projections onto them.
Example 2.6
Assume that \(K=\{x\in \mathbb{R}^{n}: g(x)\le 0\}\), where \(g:\mathbb{R}^{n}\to \mathbb{R}\) is convex. Define
$$ S(x):= \begin{cases} K, & \text{if } g(x)\le 0, \\ \{z\in \mathbb{R}^{n}: g(x)+u^{\top }(z-x)\le 0\}, & \text{otherwise,} \end{cases} \tag{2.1} $$
where \(u\in \partial g(x)\) is an arbitrary subgradient of g at x.
We mention that any closed and convex set K can be written as the 0-sublevel set of a convex and even smooth function g, for instance, \(g(x)=\operatorname{dist}(x,K)^{2}\), but in general this is not advantageous: for this g it holds that \(\nabla g(x)=2(x-P_{K}(x))\), so that \(P_{K}(x)\), the exact projection of x onto K, is needed for computing the separating half-space, and nothing is gained. The scheme is interesting when the function g has easily computable gradients or subgradients. For instance, in the quite frequent case in which \(K=\{x\in \mathbb{R}^{n}: g_{i}(x)\le 0\ (1\le i\le \ell )\}\), where the \(g_{i}\) are convex and smooth, we can take \(g(x)=\max_{1\le i\le \ell }g_{i}(x)\), and the subgradients of g are easily obtained from the gradients of the \(g_{i}\).
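As a concrete illustration of Example 2.6 (our own sketch, with hypothetical names), the projection onto the separating set \(S(x)\) admits the following implementation when g is given as a pointwise maximum of smooth convex functions, in which case a subgradient of g at x is the gradient of any \(g_{i}\) attaining the maximum.

```python
import numpy as np

def approx_proj(x, g_list, grad_list):
    """Projection of x onto S(x) from Example 2.6, for K = {x : g(x) <= 0} with
    g = max_i g_i, each g_i convex and smooth (g_list/grad_list are callables).

    If g(x) <= 0, then S(x) = K and x already lies in S(x); otherwise S(x) is
    the half-space {z : g(x) + u^T (z - x) <= 0} with u a subgradient of g at x,
    and the projection has the closed form x - (g(x) / ||u||^2) u."""
    vals = [gi(x) for gi in g_list]
    i = int(np.argmax(vals))                 # an index attaining the maximum
    gx = vals[i]
    if gx <= 0.0:
        return x.copy()                      # x lies in K, hence in S(x)
    u = grad_list[i](x)                      # gradient of an active g_i: a subgradient of g
    return x - (gx / (u @ u)) * u
```

For example, for the unit ball one may take `g_list = [lambda x: x @ x - 1.0]` and `grad_list = [lambda x: 2.0 * x]`.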
Example 2.7
Assume that \(\mathbf{K}=K_{1}\times \cdots \times K_{m}\subset \mathbb{R}^{nm}\), where \(K_{i}\subset \mathbb{R}^{n}\) is of the form \(K_{i}= \{x\in \mathbb{R}^{n}: g_{i}(x)\le 0\}\) and \(g_{i}:\mathbb{R}^{n}\to \mathbb{R}\) is convex for \(1\le i\le m\). Write \(x\in \mathbb{R}^{nm}\) as \(x=(x^{1},\dots ,x^{m})\) with \(x^{i}\in \mathbb{R}^{n}\) \((1\le i\le m)\). We define the separating operator \(\mathbf{S}:\mathbb{R}^{nm}\to {\mathcal{P}}(\mathbb{R}^{nm})\) as \(\mathbf{S}(x)=S_{1}(x^{1})\times \cdots \times S_{m}(x^{m})\), with
$$ S_{i}\bigl(x^{i}\bigr):= \begin{cases} K_{i}, & \text{if } g_{i}(x^{i})\le 0, \\ \{z\in \mathbb{R}^{n}: g_{i}(x^{i})+(u^{i})^{\top }(z-x^{i})\le 0\}, & \text{otherwise,} \end{cases} $$
where \(u^{i}\in \partial g_{i}(x^{i})\) is an arbitrary subgradient of \(g_{i}\) at \(x^{i}\).
Example 2.7 is suited for the reduction of simultaneous projection method (SiPM) for m convex sets in \(\mathbb{R}^{n}\) to MAP regarding two convex sets in \(\mathbb{R}^{nm}\). Note that in Example 2.6, \(S(x)\) is either K or a half-space, and the same holds for the sets \(S_{i}(x^{i})\) in Example 2.7. We prove next that the separating operators S and S defined in Examples 2.6 and 2.7 satisfy assumptions A1–A3.
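For the product setting of Example 2.7, the approximate projection acts blockwise, and the projection onto the diagonal D is simply the average of the blocks (a standard fact from Pierra's reformulation). A possible sketch, with our own names and assuming smooth \(g_{i}\) for simplicity, is the following.

```python
import numpy as np

def approx_proj_product(x_blocks, g_funcs, grad_funcs):
    """Blockwise projection onto S(x) = S_1(x^1) x ... x S_m(x^m) (Example 2.7).

    x_blocks is an (m, n) array whose i-th row is x^i; g_funcs[i] and
    grad_funcs[i] evaluate g_i and its gradient."""
    out = np.empty_like(x_blocks)
    for i, (xi, gi, gradi) in enumerate(zip(x_blocks, g_funcs, grad_funcs)):
        val = gi(xi)
        if val <= 0.0:
            out[i] = xi                       # x^i in K_i, so S_i(x^i) = K_i
        else:
            u = gradi(xi)
            out[i] = xi - (val / (u @ u)) * u # projection onto the half-space S_i(x^i)
    return out

def proj_diagonal(x_blocks):
    """Projection onto the diagonal D = {(x, ..., x)}: replace every block by
    the average of the blocks (standard fact from Pierra's reformulation)."""
    mean = x_blocks.mean(axis=0)
    return np.tile(mean, (x_blocks.shape[0], 1))
```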
Proposition 2.8
The separating operators S and \(\mathbf{S}\) defined in Examples 2.6 and 2.7 satisfy assumptions A1–A3.
Proof
We start with S as in Example 2.6. First we observe that if \(x\notin K\) then all subgradients of g at x are nonzero: since K is assumed nonempty, there exist points where g is nonpositive, so that x, which satisfies \(g(x) >0\), cannot be a minimizer of g, and hence \(0\notin \partial g(x)\), i.e., \(u\neq 0\) for all \(u\in \partial g(x)\). Regarding A1, \(S(x)\) is either equal to K or to a half-space, both of which are closed and convex.
For A2, it obviously holds when \(x\in K\). If \(x\notin K\), we take \(z\in K\) and conclude, taking into account the fact that \(z\in K\) and the subgradient inequality, that \(u^{\top }(z-x)+g(x)\le g(z)\le 0\), implying that \(z\in S(x)\) in view of (2.1).
We deal now with A3. Take a sequence \(\{z^{k}\}_{k\in \mathbb{N}}\) converging to some \(z^{*}\) such that \(\lim_{k\to \infty }\operatorname{dist}(z^{k},S(z^{k})) =0\). We must prove that \(z^{*}\in K\). If some subsequence of \(\{z^{k}\}_{k\in \mathbb{N}}\) is contained in K, then \(z^{*}\in K\), because K is closed. Otherwise, for large enough k, \(S(z^{k})\) is a half-space. It is well known, and easy to check, that the projection \(P_{H}\) onto a half-space \(H=\{y\in \mathbb{R}^{n}: a^{\top }y\le \alpha \}\subset \mathbb{R}^{n}\), with \(a\in \mathbb{R}^{n}\), \(\alpha \in \mathbb{R}\), is given by
$$ P_{H}(y)=y-\frac{\max \{0,a^{\top }y-\alpha \}}{ \Vert a \Vert ^{2}}\,a. \tag{2.2} $$
Denote by \(P^{S_{k}}\) the projection onto \(S(z^{k})\). By (2.2), \(P^{S_{k}}(z^{k})=z^{k}- \Vert u^{k} \Vert ^{-2}\max \{0, g(z^{k})\}u^{k}\) with \(u^{k}\in \partial g(z^{k})\), so that
$$ \operatorname{dist}\bigl(z^{k},S\bigl(z^{k}\bigr)\bigr)= \bigl\Vert z^{k}-P^{S_{k}}\bigl(z^{k}\bigr) \bigr\Vert =\frac{\max \{0,g(z^{k})\}}{ \Vert u^{k} \Vert }. $$
Note that \(\{z^{k}\}_{k\in \mathbb{N}}\) is bounded, because it is convergent. Since the subdifferential operator ∂g is locally bounded in the interior of the domain of g, which here we take as \(\mathbb{R}^{n}\), there exists \(\mu >0\) such that \(\Vert u^{k} \Vert \le \mu \) for all k and all \(u^{k}\in \partial g(z^{k})\). Hence, \(\operatorname{dist}(z^{k},S(z^{k}))\ge \mu ^{-1}\max \{0,g(z^{k})\}\ge 0\). Since by assumption \(\lim_{k\to \infty }\operatorname{dist}(z^{k},S(z^{k}))=0\), and g, being convex, is continuous, we get that \(0=\lim_{k\to \infty }\mu ^{-1}\max \{0,g(z^{k})\}=\mu ^{-1}\max \{0,g(z^{*}) \}\), implying that \(0=\max \{0,g(z^{*})\}\), i.e., \(g(z^{*})\le 0\), so that \(z^{*}\in K\) and A3 holds.
Now we consider \(\mathbf{S}\) as in Example 2.7. As before, if \(x^{i}\notin K_{i}\), then \(S_{i}(x^{i})\) is indeed a half-space in \(\mathbb{R}^{n}\). Concerning A1–A3, A1 holds because \(\mathbf{S}(x)\) is the Cartesian product of closed and convex sets (either \(K_{i}\) or a half-space in \(\mathbb{R}^{n}\)). For A2, take \(x=(x^{1}, \dots , x^{m})\in \mathbb{R}^{nm}\). If \(x^{i}\in K_{i}\), then \(S_{i}(x^{i})=K_{i}\), so that \(K_{i}\subset S_{i}(x^{i})\) trivially. Otherwise, we take \(z^{i}\in K_{i}\), and invoking again the subgradient inequality, we get \((u^{i})^{\top }(z^{i}-x^{i})+g_{i}(x^{i})\le g_{i}(z^{i})\le 0\), implying that \(z^{i}\in S_{i}(x^{i})\), i.e., \(K_{i}\subset S_{i}(x^{i})\) for all i, and the result follows taking into account the definitions of \(\mathbf{K}\) and \(\mathbf{S}\). For A3, note that \(\lim_{k\to \infty }\operatorname{dist}(z^{k},\mathbf{S}(z^{k}))=0\) if and only if \(\lim_{k\to \infty }\operatorname{dist}(z^{k,i},S_{i}(z^{k,i}))=0\) for \(1\le i\le m\), where \(z^{k}=(z^{k,1},\dots ,z^{k,m})\) with \(z^{k,i}\in \mathbb{R}^{n}\). Then, the result follows with the same argument as in Example 2.6, with \(z^{k, i}\), \(S_{i}\), \(g_{i}\) substituting for \(z^{k}\), S, g. □
3 Convergence results for MAAP and CARM
We recall now the definitions of MAP and CRM and introduce the formal definitions of MAAP and CARM. Consider a closed convex set \(K\subset \mathbb{R}^{n}\) and an affine manifold \(U\subset \mathbb{R}^{n}\). We remind that an affine manifold is a set of the form \(\{x\in \mathbb{R}^{n}: Qx=b\}\) for some \(Q\in \mathbb{R}^{n\times n}\) and some \(b\in \mathbb{R}^{n}\).
Let \(P_{K}\), \(P_{U}\) be the projections onto K, U respectively and define \(R_{K},R_{U}, T, C:\mathbb{R}^{n}\to \mathbb{R}^{n}\) as
$$ R_{K}:=2P_{K}-\operatorname{Id},\qquad R_{U}:=2P_{U}-\operatorname{Id},\qquad T:=P_{U}\circ P_{K},\qquad C(x):=\operatorname{circ}\bigl(x,R_{K}(x),R_{U}\bigl(R_{K}(x)\bigr)\bigr), \tag{3.1} $$
where Id is the identity operator in \(\mathbb{R}^{n}\) and \(\operatorname{circ}(x,y,z)\) is the circumcenter of x, y, z, i.e., the point in the affine hull of x, y, z equidistant to them. We remark that \(\operatorname{circ}(x,y,y)= (1/2)(x+y)\) (in this case the affine hull is the line through x, y) and \(\operatorname{circ}(x,x,x)=x\) (the affine hull being the singleton \(\{x\}\)).
Then, starting from any \(x^{0}\in \mathbb{R}^{n}\), MAP generates a sequence \(\{x^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) according to
$$ x^{k+1}:=T\bigl(x^{k}\bigr)=P_{U}\bigl(P_{K}\bigl(x^{k}\bigr)\bigr), $$
and, starting with \(x^{0}\in U\), CRM generates a sequence \(\{x^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) given by
$$ x^{k+1}:=C\bigl(x^{k}\bigr)=\operatorname{circ}\bigl(x^{k},R_{K}\bigl(x^{k}\bigr),R_{U}\bigl(R_{K}\bigl(x^{k}\bigr)\bigr)\bigr). $$
For MAAP and CARM, we assume that \(S:\mathbb{R}^{n}\to {\mathcal{P}}(\mathbb{R}^{n})\) is a separating operator for K satisfying A1–A3, we take \(P_{U}\) as before and define \(P^{S}\) as the operator given by \(P^{S}(x):=P_{S(x)}(x)\), where \(P_{S(x)}\) is the projection onto \(S(x)\).
Take \(R_{U}\) as in (3.1), and define \(R^{S}, T^{S}, C^{S}:\mathbb{R}^{n}\to \mathbb{R}^{n}\) as
$$ R^{S}:=2P^{S}-\operatorname{Id},\qquad T^{S}:=P_{U}\circ P^{S},\qquad C^{S}(x):=\operatorname{circ}\bigl(x,R^{S}(x),R_{U}\bigl(R^{S}(x)\bigr)\bigr). \tag{3.2} $$
Then, starting from any \(x^{0}\in \mathbb{R}^{n}\), MAAP generates a sequence \(\{x^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) according to
$$ x^{k+1}:=T^{S}\bigl(x^{k}\bigr)=P_{U}\bigl(P^{S}\bigl(x^{k}\bigr)\bigr), $$
and, starting with \(x^{0}\in U\), CARM generates a sequence \(\{x^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\) given by
$$ x^{k+1}:=C^{S}\bigl(x^{k}\bigr)=\operatorname{circ}\bigl(x^{k},R^{S}\bigl(x^{k}\bigr),R_{U}\bigl(R^{S}\bigl(x^{k}\bigr)\bigr)\bigr). $$
We observe now that the “trivial” separating operator \(S(x)=K\) for all \(x\in \mathbb{R}^{n}\) satisfies A1–A3, and that in this case we have \(T^{S}=T\), \(C^{S}=C\), so that MAP, CRM are particular instances of MAAP, CARM respectively. Hence, the convergence analysis of the approximate algorithms encompasses the exact ones. Global convergence of MAP is well known (see, e.g., [23]) and the corresponding result for CRM has been established in [15]. The following propositions follow quite closely the corresponding results for the exact algorithms, the difference consisting in the replacement of the set K by the separating set \(S(x)\). However, some care is needed, because K is fixed, while \(S(x)\) changes along the algorithm, so that we present the complete analysis for the approximate algorithms MAAP and CARM.
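Schematically (our own sketch; `proj_U` and `approx_proj` stand for \(P_{U}\) and \(P^{S}\), and `circumcenter` is as in the sketch of Sect. 1), one MAAP and one CARM step read as follows; plugging the exact projector \(P_{K}\) in for `approx_proj` recovers MAP and CRM, in line with the remark above.

```python
import numpy as np

def circumcenter(x, y, z):
    """Circumcenter of x, y, z (see the sketch in Sect. 1); degenerate cases
    reduce to midpoints."""
    u, v = y - x, z - x
    if np.allclose(u, 0.0) and np.allclose(v, 0.0):
        return x.copy()
    if np.allclose(u, 0.0):
        return 0.5 * (x + z)
    if np.allclose(v, 0.0) or np.allclose(u, v):
        return 0.5 * (x + y)
    G = np.array([[u @ u, u @ v], [u @ v, v @ v]])
    a, b = np.linalg.solve(G, 0.5 * np.array([u @ u, v @ v]))
    return x + a * u + b * v

def maap_step(x, proj_U, approx_proj):
    """One MAAP step: x_{k+1} = T^S(x_k) = P_U(P^S(x_k))."""
    return proj_U(approx_proj(x))

def carm_step(x, proj_U, approx_proj):
    """One CARM step: circumcenter of x, R^S(x) and R_U(R^S(x)); the iterate is
    assumed to lie in U (CARM is started from a point of U)."""
    y = 2.0 * approx_proj(x) - x             # R^S(x) = 2 P^S(x) - x
    z = 2.0 * proj_U(y) - y                  # R_U(R^S(x))
    return circumcenter(x, y, z)
```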
Proposition 3.1
For all \(z\in K\cap U\) and all \(x\in \mathbb{R}^{n}\), it holds that
$$ \bigl\Vert T^{S}(x)-z \bigr\Vert ^{2}\le \Vert x-z \Vert ^{2}- \bigl\Vert P^{S}(x)-x \bigr\Vert ^{2}- \bigl\Vert T^{S}(x)-P^{S}(x) \bigr\Vert ^{2}, \tag{3.3} $$
with \(T^{S}\) as in (3.2).
Proof
The projection operator \(P_{M}\) onto any closed and convex set M is known to be firmly nonexpansive [24, Proposition 4.16], that is,
$$ \bigl\Vert P_{M}(x)-y \bigr\Vert ^{2}\le \Vert x-y \Vert ^{2}- \bigl\Vert P_{M}(x)-x \bigr\Vert ^{2} \tag{3.4} $$
for all \(x\in \mathbb{R}^{n}\) and all \(y\in M\).
Applying consecutively (3.4) with \(M=U\) and \(M=S(x)\) and noting that for \(z\in K\cap U\) we get \(z\in U\) and also \(z\in K\subset S(x)\) (due to Assumption A2), we obtain (3.3). □
A similar result for operator \(C^{S}\) is more delicate due to the presence of the reflections and the circumcenter and requires some intermediate results. We follow closely the analysis for operator C presented in [15].
The crux of the convergence analysis of CRM, performed in [15], is the remarkable observation that for \(x\in U\setminus K\), \(C(x)\) is indeed the projection of x onto a half-space \(H(x)\) separating x from \(K\cap U\). Next, we extend this result to \(C^{S}\).
Proposition 3.2
Let \(U,H\subset \mathbb{R}^{n}\) be an affine manifold and a half-space, respectively, such that \(H\cap U\ne \emptyset \). Denote as \(P_{H}, R_{H}:\mathbb{R}^{n}\to \mathbb{R}^{n}\) the projection and the reflection with respect to H, respectively. Then
-
(i)
\(P_{H\cap U}(x)=\operatorname{circ}(x,R_{H}(x), R_{U}(R_{H}(x)))\) for all \(x\in U\),
-
(ii)
\(\operatorname{circ}(x,R_{H}(x), R_{U}(R_{H}(x)))\in U\) for all \(x\in U\).
Proof
See Lemmas 2 and 3 in [15]. □
Proposition 3.2 means that when the sets in CFP are an affine manifold and a half-space, CRM converges in one step. This is a first indication of its superiority over MAP, which certainly does not enjoy this one-step convergence property. It also points to the main weakness of CRM: for its convergence, H may be replaced by a general closed and convex set, but the other set must be kept as an affine manifold.
Lemma 3.3
Define \(H(x)\subset \mathbb{R}^{n}\) as
$$ H(x):= \begin{cases} K, & \text{if } x\in K, \\ \{z\in \mathbb{R}^{n}: (x-P^{S}(x))^{\top }(z-P^{S}(x))\le 0\}, & \text{otherwise.} \end{cases} \tag{3.5} $$
Then, for all \(x\in U\), \(C^{S}(x)=P_{H(x)\cap U}(x)\).
Proof
Take \(x\in U\). If \(x\in K\), then \(x\in S(x)\) by A2, and it follows that \(R_{U}(x)=R^{S}(x)=x\), so that \(C^{S}(x)=\operatorname{circ}(x,x,x)=x\). Also, \(P_{H(x)}(x)=P_{K}(x)=x\) by (3.5), and the result holds. Assume that \(x\in U\setminus K\), so that \(H(x)\) is the half-space in (3.5).
In view of (3.5), we get, using (2.2) with \(a=x-P^{S}(x)\), \(\alpha =(x-P^{S}(x))^{\top }P^{S}(x)\), that \(P_{H(x)}(x)=P^{S}(x)\). It follows from the definition of the reflection operator \(R^{S}\) that
$$ R_{H(x)}(x)=2P_{H(x)}(x)-x=2P^{S}(x)-x=R^{S}(x). $$
Since U is an affine manifold and \(H(x)\) is a half-space, we can apply Proposition 3.2 and conclude that \(C^{S}(x)=P_{H(x)\cap U}(x)\), proving the last statement of the lemma. By assumption, \(x\in U\), so that \(P_{H(x)\cap U}(x)=P_{H(x)}(x)\), establishing the result. □
This rewriting of the operator \(C^{S}\) as a projection onto a half-space (which varies with the argument of \(C^{S}\)) allows us to obtain the result for CARM analogous to Proposition 3.1.
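As a quick numerical sanity check of Lemma 3.3 (our own illustration; the instance, the function names, and the use of the form (3.5) of \(H(x)\) are our choices), one can verify that the CARM operator coincides with the projection of x onto \(H(x)\cap U\) for, say, K a ball written as a 0-sublevel set and U a coordinate hyperplane.

```python
import numpy as np

# Instance (ours): K = {w : g(w) <= 0} with g(w) = ||w - c||^2 - 1, c = (0, 0, 0.5),
# and U = {w in R^3 : w_3 = 0}; K and U intersect in a disc.
c = np.array([0.0, 0.0, 0.5])
def g(w):       return (w - c) @ (w - c) - 1.0
def grad_g(w):  return 2.0 * (w - c)
def proj_U(w):  y = w.copy(); y[2] = 0.0; return y

def approx_proj(w):                          # P^S(w), cf. Example 2.6 and (2.2)
    gw = g(w)
    if gw <= 0.0:
        return w.copy()
    u = grad_g(w)
    return w - (gw / (u @ u)) * u

def circumcenter(x, y, z):                   # generic (non-degenerate) case only
    u, v = y - x, z - x
    G = np.array([[u @ u, u @ v], [u @ v, v @ v]])
    a, b = np.linalg.solve(G, 0.5 * np.array([u @ u, v @ v]))
    return x + a * u + b * v

x = np.array([2.0, 1.0, 0.0])                # a point of U outside K
p = approx_proj(x)                           # P^S(x)
y = 2 * p - x                                # R^S(x)
c_S = circumcenter(x, y, 2 * proj_U(y) - y)  # C^S(x)

# Projection of x onto H(x) intersected with U, where
# H(x) = {w : (x - p)^T (w - p) <= 0} as in (3.5):
a_vec = x - p
a_bar = proj_U(a_vec)                        # the linear constraint restricted to U
viol = max(0.0, a_bar @ x - a_vec @ p)
p_HU = x - (viol / (a_bar @ a_bar)) * a_bar
print(np.allclose(c_S, p_HU))                # True, as asserted by Lemma 3.3
```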
Proposition 3.4
For all \(z\in K\cap U\) and all \(x\in U\), it holds that
-
(i)
\(\Vert C^{S}(x)-z \Vert ^{2}\le \Vert z-x \Vert ^{2}- \Vert C^{S}(x)-x \Vert ^{2} \) with \(C^{S}\) as in (3.2).
-
(ii)
\(C^{S}(x)\in U\) for all \(x\in U\).
Proof
For (i), take \(z\in K\cap U\) and \(x\in U\). By Lemma 3.3, \(C^{S}(x)=P_{H(x)}(x)\) for all \(x\in U\). Since \(z\in K\subset H(x)\), we can apply (3.4) with \(M=H(x)\), obtaining \(\Vert P_{H(x)}(x)-z \Vert ^{2}\le \Vert x-z \Vert ^{2}- \Vert P_{H(x)}(x)-x \Vert ^{2}\), which gives the result, invoking again Lemma 3.3. Item (ii) follows from Proposition 3.2 and Lemma 3.3. □
Propositions 3.1 and 3.4 allow us to prove convergence of the MAAP and CARM sequences, respectively, using the well-known Fejér monotonicity argument.
Theorem 3.5
Consider a closed and convex set \(K\subset \mathbb{R}^{n}\) and an affine manifold \(U\subset \mathbb{R}^{n}\) such that \(K\cap U\neq \emptyset \). Consider also a separating operator S for K satisfying Assumptions A1–A3. Then the sequences generated by either MAAP or CARM, starting from any initial point in the MAAP case and from a point in U in the CARM case, are well defined, contained in U, Fejér monotone with respect to \(K\cap U\), convergent, and their limits belong to \(K\cap U\), i.e., they solve CFP.
Proof
Let first \(\{x^{k}\}_{k\in \mathbb{N}}\) be the sequence generated by MAAP, i.e., \(x^{k+1}=T^{S}(x^{k})\). Take any \(z\in K\cap U\). Then, by Proposition 3.1,
and so \(\{x^{k}\}_{k\in \mathbb{N}}\) is Fejér monotone with respect to \(K \cap U\). By the definition of \(T^{S}\) in (3.2), \(\{x^{k}\}_{k\in \mathbb{N}}\subset U\). By Proposition 2.3(i), \(\{x^{k}\}_{k\in \mathbb{N}}\) is bounded. Also, \(\{ \Vert x^{k}-z \Vert \}_{k\in \mathbb{N}}\) is nonincreasing and nonnegative, therefore convergent, and thus the difference between consecutive terms of \(\{ \Vert x^{k}-z \Vert ^{2}\}_{k\in \mathbb{N}}\) converges to 0. Hence, rewriting (3.7) as
we conclude that
and
Let x̄ be a cluster point of \(\{x^{k}\}_{k\in \mathbb{N}}\) and \(\{x^{j_{k}}\}_{j_{k}\in \mathbb{N}}\) be a subsequence of \(\{x^{k}\}_{k\in \mathbb{N}}\) convergent to x̄. By (3.9), \(\lim_{k\to \infty }\operatorname{dist}(x^{j_{k}},S(x^{j_{k}}))=0\). By Assumption A3 on the separating operator S, \(\bar{x}\in K\). It follows also from (3.9) that \(\lim_{k\to \infty }P^{S}(x^{j_{k}})=\bar{x}\). By (3.8) and the continuity of \(P_{U}\), \(P_{U}(\bar{x})=\bar{x}\), so that \(\bar{x}\in U\) and therefore \(\bar{x}\in K\cap U\). By Proposition 2.3(ii), \(\bar{x}=\lim_{k\to \infty }x^{k}\), completing the proof for the case of MAAP.
Let now \(\{ x^{k}\}_{k\in \mathbb{N}}\) be the sequence generated by CARM with \(x^{0}\in U\). By Lemma 3.3, whenever \(x^{k}\in U\), \(x^{k+1}\) is the projection of \(x^{k}\) onto a closed and convex set, namely \(H(x^{k})\), and hence it is well defined. Since \(x^{0}\in U\) by assumption, the whole sequence is well defined, and using recursively Proposition 3.4(ii), we have that \(\{x^{k}\}_{k\in \mathbb{N}}\subset U\). Now, we use Proposition 3.4(i), obtaining, for any \(z\in K\cap U\),
so that again \(\{ x^{k}\}_{k\in \mathbb{N}}\) is Fejér monotone with respect to \(K\cap U\), and hence bounded. Also, with the same argument as before, we get
In view of (3.10) and the definition of circumcenter, \(\Vert x^{k+1}-x^{k} \Vert = \Vert x^{k+1}-R^{S}(x^{k}) \Vert \), so that \(\lim_{k\to \infty } \Vert x^{k+1} -R^{S}(x^{k}) \Vert =0\) implying that \(\lim_{k\to \infty } \Vert x^{k+1}-P^{S}(x^{k}) \Vert =0\). Thus, since \(\Vert x^{k}-P^{S}(x^{k}) \Vert \le \Vert x^{k}-x^{k+1} \Vert + \Vert x^{k+1}-P^{S}(x^{k}) \Vert \), we get that
Let x̄ be any cluster point of \(\{x^{k}\}_{k\in \mathbb{N}}\). Looking at (3.11) along a subsequence of \(\{x^{k}\}_{k\in \mathbb{N}}\) converging to x̄ and invoking Assumption A3 of the separating operator S, we conclude that \(\bar{x}\in K\). Since \(\{x^{k}\}_{k\in \mathbb{N}}\subset U\), we get that all cluster points of \(\{x^{k}\}_{k\in \mathbb{N}}\) belong to \(K\cap U\), and then, using Proposition 2.3(ii), we get that \(\lim_{k\to \infty }x^{k}=\bar{x}\in K\cap U\), establishing the convergence result for CARM. □
4 Linear convergence rate of MAAP and CARM under a local error bound assumption
In [9], the following global error bound assumption on the sets K, U, denoted as (EB), was considered:
-
(EB)
There exists \(\bar{\omega}>0\) such that \(\operatorname{dist}(x,K)\ge \bar{\omega} \operatorname{dist}(x,K\cap U)\) for all \(x\in U\).
Let us comment on the connection between (EB) and other notions of error bounds which have been introduced in the past, all of them related to regularity assumptions imposed on the solutions of certain problems. If the problem at hand consists of solving \(H(x)=0\) with a smooth \(H:\mathbb{R}^{n}\to \mathbb{R}^{m}\), a classical regularity condition demands that \(m=n\) and the Jacobian matrix of H be nonsingular at a solution \(x^{*}\), in which case Newton’s method, for instance, is known to enjoy superlinear or quadratic convergence. This condition implies local uniqueness of the solution \(x^{*}\). For problems with nonisolated solutions, a less demanding assumption is the notion of calmness (see [25], Chap. 8, Sect. F), which requires that
$$ \frac{ \Vert H(x) \Vert }{\operatorname{dist}(x,S^{*})}\ge \omega \tag{4.1} $$
for all \(x\in \mathbb{R}^{n}\setminus S^{*}\) and some \(\omega >0\), where \(S^{*}\) is the solution set, i.e., the set of zeros of H. Calmness, also called upper-Lipschitz continuity (see [26]), is a classical example of error bound, and it holds in many situations (e.g., when H is affine by virtue of Hoffman’s lemma [27]). It implies that the solution set is locally a Riemannian manifold (see [28]), and it has been used for establishing superlinear convergence of Levenberg–Marquardt methods in [29].
When dealing with convex feasibility problems, as in this paper, it seems reasonable to replace the numerator of (4.1) by the distance from x to some of the convex sets, giving rise to (EB). Similar error bounds for feasibility problems can be found, for instance, in [30–33].
Under (EB), it was proved in [9] that MAP converges linearly with asymptotic constant bounded above by \(\sqrt{1-\bar{\omega}^{2}}\), and that CRM also converges linearly with a better upper bound for the asymptotic constant, namely \(\sqrt{(1-\bar{\omega}^{2})/(1+\bar{\omega}^{2})}\). In this section, we prove similar results for MAAP and CARM, assuming a local error bound related not just to K, U, but also to the separating operator S. The local error bound, denoted as (LEB), is defined as follows:
-
(LEB)
There exist a set \(V\subset \mathbb{R}^{n}\) and a scalar \(\omega >0\) such that
$$ \operatorname{dist}\bigl(x,S(x)\bigr)\ge \omega \operatorname{dist}(x,K\cap U)\quad \text{for all } x \in U\cap V. $$
We reckon that (LEB) becomes meaningful, and relevant for establishing convergence rate results, only when the set V contains the tail of the sequence generated by the algorithm; otherwise it might be void (e.g., it holds trivially, with any ω, when \(U\cap V=\emptyset \)). In order to facilitate the presentation, we opted for introducing additional conditions on V in our convergence results rather than in the definition of (LEB).
The use of a local error bound instead of a global one makes sense, because the definition of linear convergence rate deals only with iterates \(x^{k}\) of the generated sequence with large enough k, and the convergence of the sequences of interest has already been established in Theorem 3.5, so that only points close enough to the limit \(x^{*}\) of the sequence matter. In fact, the convergence rate analysis for MAP and CRM in [9] holds, without any substantial change, under a local rather than global error bound.
The set V could be expected to be a neighborhood of the limit \(x^{*}\) of the sequence, but we do not specify it for the time being, because for the prototypical example of separating operator, i.e., the one in Example 2.6 of Sect. 3, it will have, as we will show later, a slightly more complicated structure: a ball centered at \(x^{*}\) minus a certain “slice”.
We start with the convergence rate analysis for MAAP.
Proposition 4.1
Assume that K, U and the separating operator S satisfy (LEB). Consider \(T^{S}:\mathbb{R}^{n}\to \mathbb{R}^{n}\) as in (3.2). Then, for all \(x\in U\cap V\),
$$ \operatorname{dist}\bigl(T^{S}(x),K\cap U\bigr)^{2}\le \bigl(1-\omega ^{2}\bigr)\operatorname{dist}(x,K\cap U)^{2}, \tag{4.2} $$
with ω as in Assumption (LEB).
Proof
By Proposition 3.1, for all \(z\in K\cap U\) and all \(x\in \mathbb{R}^{n}\),
Note that \(\Vert P^{S}(x)-x \Vert =\operatorname{dist}(x,S(x))\) and that \(\Vert T^{S}(x)-P_{K\cap U}(T^{S}(x)) \Vert \le \Vert T^{S}(x)-z \Vert \) by the definition of \(P_{K\cap U}\). Take \(z= P_{K\cap U}(x)\), and get from (4.3)
Take now \(x\in U\cap V\) and invoke (LEB) to get from (4.4)
which immediately implies the result. □
Proposition 4.1 implies that if \(\{x^{k}\}_{k\in \mathbb{N}}\) is the sequence generated by MAAP and \(x^{k}\in V\) for large enough k, then the sequence \(\{\operatorname{dist}(x^{k},K\cap U)\}_{k\in \mathbb{N}}\) converges Q-linearly, with asymptotic constant bounded above by \(\sqrt{1-\omega ^{2}}\). In order to move from the distance sequence to the sequence \(\{x^{k}\}_{k\in \mathbb{N}}\) itself, we will invoke the following lemma from [9].
Lemma 4.2
Consider a nonempty closed convex set \(M\subset \mathbb{R}^{n}\) and a sequence \(\{y^{k}\}_{k\in \mathbb{N}}\subset \mathbb{R}^{n}\). Assume that \(\{y^{k}\}_{k\in \mathbb{N}}\) is Fejér monotone with respect to M, and that \(\{\operatorname{dist}(y^{k},M)\}_{k\in \mathbb{N}}\) converges R-linearly to 0. Then \(\{y^{k}\}_{k\in \mathbb{N}}\) converges R-linearly to some point \(y^{*}\in M\), with asymptotic constant bounded above by the asymptotic constant of \(\{\operatorname{dist}(y^{k},M)\}_{k\in \mathbb{N}}\).
Proof
See Lemma 3.4 in [9]. □
Next we establish the linear convergence of MAAP under (LEB).
Theorem 4.3
Consider a closed and convex set \(K\subset \mathbb{R}^{n}\) and an affine manifold \(U\subset \mathbb{R}^{n}\) such that \(K\cap U\neq \emptyset \). Moreover, assume that S is a separating operator for K satisfying Assumptions A1–A3. Suppose that K, U and the separating operator S satisfy (LEB). Let \(\{x^{k}\}_{k\in \mathbb{N}}\) be the sequence generated by MAAP from any starting point \(x^{0}\in \mathbb{R}^{n}\). If there exists \(k_{0}\) such that \(x^{k}\in V\) for all \(k\ge k_{0}\), then \(\{x^{k}\}_{k\in \mathbb{N}}\) converges R-linearly to some point \(x^{*}\in K\cap U\), and the asymptotic constant is bounded above by \(\sqrt{1-\omega ^{2}}\), with ω and V as in (LEB).
Proof
The fact that \(\{x^{k}\}_{k\in \mathbb{N}}\) converges to some \(x^{*}\in K\cap U\) has been established in Theorem 3.5. Take any \(k\ge k_{0}\). By assumption, \(x^{k}\in V\), and by definition of \(T^{S}\), \(x^{k}\in U\). So, we can take \(x=x^{k}\) in Proposition 4.1, in which case \(T^{S}(x)=x^{k+1}\), and rewrite (4.2) as \((1-\omega ^{2})\operatorname{dist}(x^{k},K\cap U)^{2}\ge \operatorname{dist}(x^{k+1},K\cap U)^{2}\) for \(k\ge k_{0}\), which implies that \(\{\operatorname{dist}(x^{k},K\cap U)\}_{k\in \mathbb{N}}\) converges Q-linearly, and hence R-linearly, with asymptotic constant bounded by \(\sqrt{1-\omega ^{2}}\). The corresponding result for the R-linear convergence of \(\{x^{k}\}_{k\in \mathbb{N}}\) with the same bound for the asymptotic constant follows then from Lemma 4.2, since \(\{x^{k}\}_{k\in \mathbb{N}}\) is Fejér monotone with respect to \(K\cap U\) by Theorem 3.5. □
Now we analyze the convergence rate of CARM under (LEB), for which a preliminary result, relating x, \(C^{S}(x)\) and \(T^{S}(x)\), is needed. The corresponding result for x, \(C(x)\), \(T(x)\) can be found in [15], where it is proved with a geometrical argument. Here we present an analytical one.
Proposition 4.4
Consider the operators \(C^{S},T^{S}:\mathbb{R}^{n}\to \mathbb{R}^{n}\) defined in (3.2). Then \(T^{S}(x)\) belongs to the segment between x and \(C^{S}(x)\) for all \(x\in U\).
Proof
Let E denote the affine manifold spanned by x, \(R^{S}(x)\) and \(R_{U}(R^{S}(x))\). By definition, the circumcenter of these three points, namely \(C^{S}(x)\), belongs to E. We claim that \(T^{S}(x)\) also belongs to E, and next we proceed to proving the claim. Since U is an affine manifold, \(P_{U}\) is an affine operator, so that \(P_{U}(\alpha x+(1-\alpha )x')=\alpha P_{U}(x)+(1-\alpha )P_{U}(x')\) for all \(\alpha \in \mathbb{R}\) and all \(x,x'\in \mathbb{R}^{n}\). By (3.1), \(R_{U}(R^{S}(x))=2P_{U}(R^{S}(x))-R^{S}(x)\), so that
On the other hand, using the affinity of \(P_{U}\), the definition of \(T^{S}\), and the assumption that \(x\in U\), we have
so that
i.e., \(T^{S}(x)\) is a convex combination of x, \(R_{U}(R^{S}(x))\) and \(R^{S}(x)\). Since these three points belong to E, the same holds for \(T^{S}(x)\), and the claim holds. We observe now that \(x\in U\) by assumption, \(T^{S}(x)\in U\) by definition, and \(C^{S}(x)\in U\) by Proposition 3.4(ii). Now we consider three cases: if \(\operatorname{dim} (E \cap U)=0\), then x, \(T^{S}(x)\) and \(C^{S}(x)\) coincide, and the result holds trivially. If \(\operatorname{dim} (E\cap U)=2\), then \(E\subset U\), so that \(R^{S}(x)\in U\) and hence \(R_{U}(R^{S}(x))=R^{S}(x)\), in which case \(C^{S}(x)\) is the midpoint between x and \(R^{S}(x)\), which is precisely \(P^{S}(x)\). Hence, \(P^{S}(x)\in U\), so that \(T^{S}(x)=P_{U}(P^{S}(x))=P^{S}(x)=C^{S}(x)\), implying that \(T^{S}(x)\) and \(C^{S}(x)\) coincide, and the result holds trivially. The interesting case is the remaining one, i.e., \(\operatorname{dim} (E\cap U)=1\). In this case x, \(T^{S}(x)\) and \(C^{S}(x)\) lie on a line, so that we can write \(C^{S}(x)=x+\eta (T^{S}(x)-x)\) with \(\eta \in \mathbb{R}\), and it suffices to prove that \(\eta \ge 1\).
By the definition of η,
In view of (3.4) with \(M=U\), \(y=C^{S}(x)\), and \(x=R^{S}(x)\),
Then
using the definition of the circumcenter in the first equality, (4.9) in the inequality, and (4.6), as well as the definition of η, in the third equality. Combining (4.8) and (4.10), we get
implying that \(\vert \eta \vert \ge \vert 2-\eta \vert \), which holds only when \(\eta \ge 1\), completing the proof. □
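Continuing the small numerical illustration used after Lemma 3.3 (again with an instance and names of our own choosing), one can also check Proposition 4.4 directly: for a point of U, the iterate \(T^{S}(x)\) indeed lies on the segment between x and \(C^{S}(x)\).

```python
import numpy as np

c = np.array([0.0, 0.0, 0.5])                # same ball/hyperplane instance as before (ours)
def g(w):       return (w - c) @ (w - c) - 1.0
def grad_g(w):  return 2.0 * (w - c)
def proj_U(w):  y = w.copy(); y[2] = 0.0; return y

def approx_proj(w):
    gw = g(w)
    return w.copy() if gw <= 0.0 else w - (gw / (grad_g(w) @ grad_g(w))) * grad_g(w)

def circumcenter(x, y, z):
    u, v = y - x, z - x
    G = np.array([[u @ u, u @ v], [u @ v, v @ v]])
    a, b = np.linalg.solve(G, 0.5 * np.array([u @ u, v @ v]))
    return x + a * u + b * v

x = np.array([2.0, 1.0, 0.0])                # a point of U outside K
t_S = proj_U(approx_proj(x))                 # T^S(x) = P_U(P^S(x))
y = 2 * approx_proj(x) - x                   # R^S(x)
c_S = circumcenter(x, y, 2 * proj_U(y) - y)  # C^S(x)

# Check T^S(x) = x + t (C^S(x) - x) for some t in [0, 1]:
d = c_S - x
t = (t_S - x) @ d / (d @ d)
print(np.allclose(t_S, x + t * d), 0.0 <= t <= 1.0)   # expected: True True
```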
We continue with another intermediate result.
Proposition 4.5
Assume that (LEB) holds for K, U, and S, and take \(x\in U\). If \(x,C^{S}(x)\in V\), then
$$ \operatorname{dist}\bigl(C^{S}(x),K\cap U\bigr)^{2}\le \frac{1-\omega ^{2}}{1+\omega ^{2}}\operatorname{dist}(x,K\cap U)^{2}, \tag{4.11} $$
with V, ω as in (LEB).
Proof
Take \(z\in K\cap U\), \(x\in V\cap U\). We use Proposition 3.1, rewriting (3.3) as
for all \(x\in \mathbb{R}^{n}\) and all \(z\in K\cap U\). Since \(x\in U\), we get from Lemma 3.3 that \(C^{S}(x)=P_{H(x)}(x)\). We apply next the well-known characterization of projections [24, Theorem 3.16] to get
In view of Proposition 4.4, \(P_{U}(P^{S}(x))\) is a convex combination of x and \(C^{S}(x)\), meaning that \(P_{U}(P^{S}(x))-C^{S}(x)\) is a nonnegative multiple of \(x-C^{S}(x)\), so that (4.13) implies
Expanding the inner product in (4.14), we obtain
Combining (4.12) and (4.15), we have
Now, since U is an affine manifold, \((y-P_{U}(y))^{\top }(w-P_{U}(y))=0\) for all \(y\in \mathbb{R}^{n}\) and all \(w\in U\), so that
Since \(C^{S}(x)\in U\) by Lemma 3.3, we use (4.17) with \(y=P^{S}(x)\), \(w=C^{S}(x)\), getting
Replacing (4.18) in (4.16), we obtain
using the facts that \(P^{S}(x)\in S(x)\) and \(z\in K\cap U\) in the last inequality. Now, we take \(z=P_{K\cap U}(x)\), recall that \(x,C^{S}(x)\in V\) by hypothesis, and invoke the (LEB) assumption, together with (4.19), in order to get
The result follows rearranging (4.20). □
Next we present our convergence rate result for CARM.
Theorem 4.6
Consider a closed and convex set \(K\subset \mathbb{R}^{n}\), an affine manifold \(U\subset \mathbb{R}^{n}\) such that \(K\cap U\neq \emptyset \), and a separating operator S for K satisfying Assumptions A1–A3. Suppose that K, U and the separating operator S satisfy (LEB). Let \(\{x^{k}\}_{k\in \mathbb{N}}\) be the sequence generated by CARM from any starting point \(x^{0}\in U\). If there exists \(k_{0}\) such that \(x^{k}\in V\) for all \(k\ge k_{0}\), then \(\{x^{k}\}_{k\in \mathbb{N}}\) converges R-linearly to some point \(x^{*}\in K\cap U\), and the asymptotic constant is bounded above by \(\sqrt{{(1-\omega ^{2})}/{(1+\omega ^{2})}}\), with ω and V as in (LEB).
Proof
The fact that \(\{x^{k}\}_{k\in \mathbb{N}}\) converges to some \(x^{*}\in K\cap U\) has been established in Theorem 3.5. Take any \(k\ge k_{0}\). By assumption, \(x^{k}\in V\), and, by Theorem 3.5, \(x^{k}\in U\). Also, \(k+1\ge k_{0}\), so that \(C^{S}(x^{k})=x^{k+1}\in V\). So, we can take \(x=x^{k}\) in Proposition 4.5 and rewrite (4.11) as \((1+\omega ^{2})\operatorname{dist}(x^{k+1},K\cap U)^{2}\le (1-\omega ^{2})\operatorname{dist}(x^{k},K \cap U)^{2}\) or equivalently as
for all \(k\ge k_{0}\), which immediately implies that \(\{\operatorname{dist}(x^{k},K\cap U)\}_{k\in \mathbb{N}}\) converges Q-linearly, and hence R-linearly, with asymptotic constant bounded by \(\sqrt{(1-\omega ^{2})/(1+\omega ^{2})}\). The corresponding result for the R-linear convergence of \(\{x^{k}\}_{k\in \mathbb{N}}\) with the same bound for the asymptotic constant follows then from Lemma 4.2, since \(\{x^{k}\}_{k\in \mathbb{N}}\) is Fejér monotone with respect to \(K\cap U\) by Theorem 3.5. □
From now on, given \(z\in \mathbb{R}^{n}\) and \(\alpha >0\), \(B(z,\alpha )\) will denote the closed ball centered at z with radius α.
The results of Theorems 4.3 and 4.6 become relevant only if we are able to find a separating operator S for K such that (LEB) holds. This is only possible if the “trivial” separating operator satisfies an error bound, i.e., if an error bound holds for the sets K, U themselves. Let \(\{ x^{k}\}_{k\in \mathbb{N}}\) be a sequence generated by CARM starting at some \(x^{0}\in U\). By Theorem 3.5, \(\{x^{k}\}_{k\in \mathbb{N}}\) converges to some \(x^{*}\in K\cap U\). Without loss of generality, we assume that \(x^{k}\notin K\) for all k, because otherwise the sequence is finite and it makes no sense to deal with convergence rates. In such a case, \(x^{*}\in \partial K\), the boundary of K. We also assume from now on that a local error bound for K, U, say (LEB1), holds at some neighborhood of \(x^{*}\), i.e.,
-
(LEB1)
There exist \(\rho ,\bar{\omega}>0\) such that \(\operatorname{dist}(x,K)\ge \bar{\omega} \operatorname{dist}(x,K\cap U)\) for all \(x\in U\cap B(x^{*},\rho )\).
Note that (LEB1) is simply a local version of (EB). Observe further that (LEB1) does not involve the separating operator S, and that it gives a specific form to the set V in (LEB), namely a ball around \(x^{*}\).
We will analyze the situation for what we call the “prototypical” separating operator, namely the operator S presented in Example 2.6, for the case in which K is the 0-sublevel set of a convex function \(g:\mathbb{R}^{n}\to \mathbb{R}\).
We will prove that under some additional mild assumptions on g, for any \(\omega <\bar{\omega}\), there exists a set V such that U, K, S satisfy a local error bound assumption, say (LEB), with the pair ω, V.
Indeed, it will not be necessary to assume (LEB) in the convergence rate result; we will prove that under (LEB1), (LEB) will be satisfied for any \(\omega <\bar{\omega}\) with an appropriate set V which does contain the tail of the sequence.
Our proof strategy will be as follows: in order to check that the error bounds for K, U and \(S(x)\), U are virtually the same for x close to the limit \(x^{*}\) of the CARM sequence, we will prove that the quotient between \(\operatorname{dist}(x,S(x))\) and \(\operatorname{dist}(x,K)\) approaches 1 when x approaches \(x^{*}\). Since both distances vanish at \(x=x^{*}\), we will take the quotient of their first order approximations, in a L’Hôspital’s rule fashion, and the result will be established as long as the numerator and denominator of the new quotient are bounded away from 0, because otherwise this quotient remains indeterminate. This “bad” situation occurs when x approaches \(x^{*}\) along a direction almost tangent to \(K\cap U\), or equivalently almost normal to \(\nabla g(x^{*})\). Fortunately, the CARM sequence, being Fejér monotone with respect to \(K\cap U\), does not approach \(x^{*}\) in such a tangential way. We will take an adequate value smaller than the angle between \(\nabla g(x^{*})\) and \(x^{k}-x^{*}\). Then, we will exclude directions whose angle with \(\nabla g(x^{*})\) is smaller than such a value and find a ball around \(x^{*}\) such that, given any \(\omega <\bar{\omega}\), (LEB) holds with parameter ω in the set V defined as the ball minus the “slice” containing the “bad” directions. Because of the Fejér monotonicity of the CARM sequence, its iterates will remain in V for large enough k, and the results of Theorem 4.6 will hold with such ω. We proceed to follow this strategy in detail.
The additional assumptions on g are continuous differentiability and a Slater condition, meaning that there exists \(\hat{x}\in \mathbb{R}^{n}\) such that \(g(\hat{x})<0\). When g is of class \(\mathcal{C}^{1}\), the separating operator of Example 2.6 becomes
$$ S(x)= \begin{cases} K, & \text{if } g(x)\le 0, \\ \{z\in \mathbb{R}^{n}: g(x)+\nabla g(x)^{\top }(z-x)\le 0\}, & \text{otherwise.} \end{cases} \tag{4.21} $$
Proposition 4.7
Let \(g:\mathbb{R}^{n}\to \mathbb{R}\) be convex, of class \(\mathcal{C}^{1}\), and such that there exists \(\hat{x}\in \mathbb{R}^{n}\) satisfying \(g(\hat{x})< 0\). Take \(K=\{x\in \mathbb{R}^{n}: g(x)\le 0\}\). Assume that K, U satisfy (LEB1). Take \(x^{*}\) as in (LEB1), fix \(0<\nu < \Vert \nabla g(x^{*}) \Vert \) (we will establish that \(0\ne \nabla g(x^{*})\) in the proof of this proposition), and define \(L_{\nu }:=\{z\in \mathbb{R}^{n}: \vert \nabla g(x^{*})^{\top }(z-x^{*}) \vert \le \nu \Vert z-x^{*} \Vert \}\). Consider the separating operator S defined in (4.21). Then, for any \(\omega <\bar{\omega}\), with ω̄ as in (LEB1), there exists \(\beta >0\) such that K, U, S satisfy (LEB) with ω and \(V:=B(x^{*},\beta )\setminus L_{\nu }\).
Proof
The fact that \(0 < \nu < \Vert \nabla g(x^{*}) \Vert \) ensures that \(V\ne \emptyset \). We will prove that, for x close to \(x^{*}\), the quotient \(\operatorname{dist}(x,S(x))/\operatorname{dist}(x,K)\) approaches 1, and first we proceed to evaluate \(\operatorname{dist}(x,S(x))\). Note that when \(x\in K\subset S(x)\), the inequality in (LEB) holds trivially, since both of its sides vanish. Thus, we assume that \(x\notin K\), so that \(x\notin S(x)\) by Proposition 2.5, and hence \(g(x)> 0\) and \(S(x)=\{z\in \mathbb{R}^{n}:\nabla g(x)^{\top }(z-x)+g(x)\leq 0\}\), implying, in view of (2.2), that
Now we look for a more manageable expression for \(\operatorname{dist}(x,K)= \Vert x-P_{K}(x) \Vert \). Let \(y=P_{K}(x)\). So, y is the unique solution of the problem \(\min \Vert z-x \Vert ^{2}\) s.t. \(g(z)\le 0\), whose first order optimality conditions, sufficient by convexity of g, are
with \(\lambda \ge 0\), so that
Now we observe that the Slater condition implies that the right-hand sides of both (4.22) and (4.24) are well defined: since \(x\notin K\), \(g(x)> 0\); since \(y=P_{K}(x)\in \partial K\), \(g(y)=0\). By the Slater condition, \(g(x)>g(\hat{x})\) and \(g(y)>g(\hat{x})\), so that neither x nor y are minimizers of g, and hence both \(\nabla g(y)\) and \(\nabla g(x)\) are nonzero. By the same token, \(\nabla g(x^{*})\ne 0\), because \(x^{*}\) is not a minimizer of g: being the limit of a sequence lying outside K, \(x^{*}\) belongs to the boundary of K, so that \(g(x^{*})=0>g(\hat{x})\).
where the notation \(y(x)\), \(\lambda (x)\) emphasizes that both \(y=P_{K}(x)\) and the multiplier λ depend on x.
We look at the right-hand side of (4.25) for x close to \(x^{*}\in K\), in which case y, by the continuity of \(P_{K}\), approaches \(P_{K}(x^{*})=x^{*}\), so that \(\nabla g(y(x))\) approaches \(\nabla g(x^{*})\ne 0\), and hence, in view of (4.22), \(\lambda (x)\) approaches 0. So, the product of the first two factors in the right-hand side of (4.25) approaches \(\Vert \nabla g(x^{*}) \Vert ^{2}\), but the quotient is indeterminate, because both the numerator and the denominator approach 0, requiring a more precise first order analysis.
Expanding \(g(x)\) around \(x^{*}\) and taking into account that \(g(x^{*})=0\), we get
Define \(t= \Vert x-x^{*} \Vert \), \(d=t^{-1}(x-x^{*})\) so that \(\Vert d \Vert =1\), and (4.26) becomes
Now we look at \(\lambda (x)\). Let \(\phi (t)=\lambda (x^{*}+td)\). Note that, for \(x\in \partial K\), we get \(y(x)=x\), so that \(0=\lambda (x)\nabla g(x)\) and hence \(\lambda (x)=0\). Thus, \(\phi (0)=0\) and
where \(\phi '_{+}(0)\) denotes the right derivative of \(\phi (t)\) at 0. Since we assume that \(x\notin K\), we have \(y(x)\in \partial K\) and hence, using (4.23),
for all \(t>0\). Let \(\sigma (t)=\phi (t)\nabla g(y(x^{*}+td))\), \(\psi (t)=g(x^{*}+td-\sigma (t))\), so that (4.28) becomes \(0=\psi (t)=g(x^{*}+td-\sigma (t))\) for all \(t>0\) and hence
Taking limits in (4.29) with \(t\to 0^{+}\) and noting that \(y(x^{*})=x^{*}\) because \(x^{*}\in K\), we get
where \(\sigma '_{+}(0)\) denotes the right derivative of \(\sigma (t)\) at 0. We compute \(\sigma '_{+}(0)\) directly from the definition, because we assume that g is of class \(\mathcal{C}^{1}\) but perhaps not of class \(\mathcal{C}^{2}\). Recalling that \(\phi (0)=0\), we have that
using the facts that g is class \(\mathcal{C} ^{1}\) and that \(y(x^{*})=x^{*}\). Replacing (4.31) in (4.30), we get that \(0=\nabla g(x^{*})^{\top }(d-\phi '_{+}(0)\nabla g(x^{*}))\), and therefore
Using (4.27) and (4.32), we obtain
Replacing (4.33) and (4.26) in (4.25), we obtain
Now we recall that we must check the inequality of (LEB) only for points in V, and that \(V\cap L_{\nu }=\emptyset \) with \(L_{\nu }=\{z\in \mathbb{R}^{n}: \vert \nabla g(x^{*})^{\top }(z-x^{*}) \vert \le \nu \Vert z-x^{*} \Vert \}\). So, for \(x\in V\), we have \(\vert \nabla g(x^{*})^{\top }(x-x^{*}) \vert \ge \nu \Vert x-x^{*} \Vert \), which implies \(\vert \nabla g(x^{*})^{\top }d \vert \ge \nu \), i.e., \(\nabla g(x^{*})^{\top }d\) is bounded away from 0, independently of the direction d. In this situation, it is clear that the rightmost expression of (4.34) tends to 1 when \(t\to 0^{+}\), and so there exists some \(\beta >0\) such that, for \(t\in (0,\beta )\), such an expression is not smaller than \(\omega /\bar{\omega}\), with ω as in (LEB) and ω̄ as in (LEB1). Without loss of generality, we assume that \(\beta \le \rho \), with ρ as in Assumption (LEB1). Since \(t= \Vert x-x^{*} \Vert \), we have proved that, for \(x\in U\cap B(x^{*},\beta )\setminus L_{\nu }=U\cap V\), it holds that
It follows from (4.35) that
for all \(x\in V\cap U\). Dividing both sides of (4.36) by \(\operatorname{dist}(x,K\cap U)\), recalling that \(\beta \le \rho \), and invoking Assumption (LEB1), we obtain
for all \(x\in U\cap V\), thus proving that (LEB) holds for any \(\omega <\bar{\omega}\), with \(V=B(x^{*},\beta )\setminus L_{\nu }\) and with ω̄ as in (LEB1). □
We have proved that for the prototypical separating operator given by (4.21), the result of Proposition 4.5 holds. In order to obtain the convergence rate result of Theorem 4.6 for this operator, we must prove that in this case the tail of the sequence \(\{x^{k}\}_{k\in \mathbb{N}}\) generated by CARM is contained in \(V=B(x^{*},\beta )\setminus L_{\nu }\). Note that β depends on ν. Next we will show that if we take ν smaller than a certain constant which depends on \(x^{*}\), the initial iterate \(x^{0}\), the Slater point x̂, and the parameter ω̄ of (LEB1), then the tail of the sequence \(\{x^{k}\}_{k\in \mathbb{N}}\) will remain outside \(L_{\nu }\). Clearly, this will suffice, because the sequence eventually remains in any ball around its limit, which is \(x^{*}\), so that its tail will surely be contained in \(B(x^{*},\beta )\). The fact that \(x^{k}\notin L_{\nu }\) for large enough k is a consequence of the Fejér monotonicity of the sequence with respect to \(K\cap U\), proved in Theorem 3.5. In the next proposition we will prove that indeed \(x^{k}\notin L_{\nu }\) for large enough k, and so the result of Theorem 4.6 holds for this separating operator.
Proposition 4.8
Let \(g:\mathbb{R}^{n}\to \mathbb{R}\) be convex, of class \(\mathcal{C}^{1}\), and such that there exists \(\hat{x}\in \mathbb{R}^{n}\) satisfying \(g(\hat{x})< 0\). Take \(K=\{x\in \mathbb{R}^{n}: g(x)\le 0\}\). Assume that K, U satisfy (LEB1). Consider the separating operator S defined in (4.21). Let \(\{x^{k}\}_{k\in \mathbb{N}}\) be a sequence generated by (CARM) with starting point \(x^{0}\in U\) and limit point \(x^{*}\in K\cap U\). Take \(\nu >0\) satisfying
with ω̄ as in (LEB1), and define
Then there exists \(k_{0}\) such that, for all \(k\ge k_{0}\), \(x^{k}\in B(x^{*},\beta )\setminus L_{\nu }\), with β as in Proposition 4.7.
Proof
Assume that \(x^{k}\in L_{\nu }\), i.e.,
Using the gradient inequality, the fact that \(g(x^{*})=0\), and (4.38), we obtain
By Theorem 3.5, \(\{x^{k}\}_{k\in \mathbb{N}}\) is Fejér monotone with respect to \(K\cap U\). Thus, we use Proposition 2.3(iii) and (LEB1) in (4.39), obtaining
Denote \(y^{k}=P_{K}(x^{k})\). Using again the gradient inequality, together with the facts that \(g(y^{k})=0\) and that \(x^{k}-y^{k}\) and \(\nabla g(y^{k})\) are collinear, which is a consequence of (4.23), and the nonnegativity of λ, we get from (4.40)
Now we use the Slater assumption on g for finding a lower bound for \(\Vert \nabla g(y^{k}) \Vert \). Take x̂ such that \(g(\hat{x}) <0\), and apply once again the gradient inequality.
Multiplying (4.42) by −1, we get
using the facts that \(y^{k}=P_{K}(x^{k})\) and that \(x^{*}\in K\) in the third inequality and the Féjer monotonicity of \(\{ x^{k}\}_{k\in \mathbb{N}}\) with respect to \(K\cap U\) in the fourth one. Now, since \(\lim_{k\to \infty } x^{k}=x^{*}\), there exists \(k_{1}\) such that \(\Vert x^{k}-x^{*} \Vert \le \rho \) for \(k\ge k_{1}\), with ρ as in (LEB1). So, in view of (4.43), with \(k\ge k_{1}\), \(\vert g(\hat{x}) \vert \le \Vert \nabla g(y^{k}) \Vert ( \Vert \hat{x}-x^{*} \Vert + \Vert x^{*}-x^{0} \Vert )\), implying that
Combining (4.40), (4.41), (4.44), and (4.37), we obtain
implying
The inequality in (4.45) has been obtained by assuming that \(x^{k}\in L_{\nu }\). Now, since \(\lim_{k\to \infty }x^{k}=x^{*}\) and g is of class \(\mathcal{C}^{1}\), there exists \(k_{0}\ge k_{1}\) such that \(\Vert \nabla g(x^{*})-\nabla g(x^{k}) \Vert \le \nu \) for \(k\ge k_{0}\), and hence (4.45) implies that, for \(k\ge k_{0}\), \(x^{k}\notin L_{\nu }\). Since \(k_{0}\ge k_{1}\), \(x^{k}\in B(x^{*},\beta )\) for \(k\ge k_{0}\), meaning that when \(k\ge k_{0}\), \(x^{k}\in B(x^{*},\beta )\setminus L_{\nu }\), establishing the result. □
Now we conclude the analysis of CARM with the prototypical separating operator, proving that under smoothness of g and a Slater condition, the CARM method achieves linear convergence with precisely the same bound for the asymptotic constant as CRM, thus showing that the approximation of \(P_{K}\) by \(P^{S}\) produces no deterioration in the convergence rate. We emphasize again that, for this operator S, \(P^{S}\) has an elementary closed formula, namely the one given by
$$ P^{S}(x)=x-\frac{\max \{0,g(x)\}}{ \Vert \nabla g(x) \Vert ^{2}}\,\nabla g(x). $$
Theorem 4.9
Let \(g:\mathbb{R}^{n}\to \mathbb{R}\) be convex, of class \(\mathcal{C}^{1}\), and such that there exists \(\hat{x}\in \mathbb{R}^{n}\) satisfying \(g(\hat{x})< 0\). Take \(K=\{x\in \mathbb{R}^{n}: g(x)\le 0\}\). Assume that K, U satisfy (LEB1). Consider the separating operator S defined in (4.21). Let \(\{x^{k}\}_{k\in \mathbb{N}}\) be a sequence generated by CARM with the starting point \(x^{0}\in U\). Then \(\{x^{k}\}_{k\in \mathbb{N}}\) converges to some \(x^{*}\in K\cap U\) with linear convergence rate and asymptotic constant bounded above by \(\sqrt{{(1-\bar{\omega}^{2})}/{(1+\bar{\omega}^{2})}}\), with ω̄ as in (LEB1).
Proof
The fact that \(\{x^{k}\}_{k\in \mathbb{N}}\) converges to some \(x^{*}\in K\cap U\) follows from Theorem 3.5. Let ω̄ be the parameter in (LEB1). By Proposition 4.7, K, U, and S satisfy (LEB) with any parameter \(\omega < \bar{\omega}\) and a suitable V. By Proposition 4.8, \(x^{k}\in V\) for large enough k, so that the assumptions of Theorem 4.6 hold, and hence
for any \(\omega < \bar{\omega}\). Taking infimum in the right-hand side of (4.46) with \(\omega <\bar{\omega}\), we conclude that the inequality holds also for ω̄, i.e.,
completing the proof. □
We mention that the results of Propositions 4.7 and 4.8 and Theorem 4.9 can be extended without any complications to the separating operator S in Example 2.7, so that they can be applied for accelerating SiPM for CFP with m convex sets, presented as 0-sublevel sets of smooth convex functions. We omit the details.
Let us continue with a comment on the additional assumptions on g used for proving Theorem 4.9, namely continuous differentiability and the Slater condition. We guess that the second one is indeed needed for the validity of the result; regarding smoothness of g, we conjecture that the CARM sequence still converges linearly under (LEB) when g is not smooth, but with an asymptotic constant possibly larger than the one for CRM. It seems clear that the proof of such a result requires techniques quite different from those used here.
Finally, we address the issue of the extension of the results in this paper to the framework of infinite dimensional Hilbert spaces. We have refrained from developing our analysis in such a framework because our main focus lies in the extension of the convergence rate results for the exact algorithms presented in [9] to the approximate methods introduced in this paper, so that in order to establish the appropriate comparisons between the exact and approximate methods one should start by rewriting the results of [9] in the context of Hilbert spaces, which would unduly extend the length of this paper. We just comment that it is possible to attain such an aim following the approach presented in [11, 12].
5 Convergence rate results for CARM and MAAP applied to specific instances of CFP
The results of Sect. 4 indicate that when K, U satisfy an error bound assumption, both CARM and MAAP enjoy linear convergence rates (with a better asymptotic constant for the former). In this section we present two families of CFP instances for which the difference between CARM and MAAP is more dramatic: using the prototypical separating operator, in the first one (for which (LEB) does not hold), MAAP converges sublinearly and CARM converges linearly; in the second one, MAAP converges linearly, as in Sect. 4, but CARM converges superlinearly. Similar results on the behavior of MAP and CRM for these two families can be found in [9].
Throughout this section, \(K\subset \mathbb{R}^{n+1}\) will be the epigraph of a convex function \(f:\mathbb{R}^{n}\to \mathbb{R}\) of class \(\mathcal{C}^{1}\) and U will be the hyperplane \(U:=\{x\in \mathbb{R}^{n+1}:x_{n+1}=0\}\). We mention that the specific form of U and the fact that K is an epigraph entail little loss of generality; but the smoothness assumption on f and the fact that U is a hyperplane (i.e. an affine manifold of codimension 1) are indeed more restrictive.
First we look at the case when the following assumptions hold:
- B1. \(f(0)=0\).
- B2. \(\nabla f(x)=0\) if and only if \(x=0\).
Note that under B1–B2, 0 is the unique minimizer of f and that \(K\cap U=\{0\}\). It follows from Theorem 3.5 that the sequences generated by MAAP and CARM, from any initial iterate in \(\mathbb{R}^{n+1}\) and in U, respectively, converge to \(x^{*}=0\). We prove next that under these assumptions MAAP converges sublinearly.
Proposition 5.1
Assume that \(K\subset \mathbb{R}^{n+1}\) is the epigraph of a convex function \(f:\mathbb{R}^{n}\to \mathbb{R}\) of class \(\mathcal{C}^{1}\) satisfying B1–B2, and \(U:=\{x\in \mathbb{R}^{n+1}:x_{n+1}=0\}\). Consider the separating operator given by (4.21) for the function \(g:\mathbb{R}^{n+1}\to \mathbb{R}\) defined as \(g(x_{1}, \dots ,x_{n+1})=f(x_{1}, \dots ,x_{n})-x_{n+1}\). Then the sequence \(\{x^{k}\}_{k\in \mathbb{N}}\) generated by MAAP starting at any \(x^{0}\in \mathbb{R}^{n+1}\) converges sublinearly to \(x^{*}=0\).
Proof
Convergence of \(\{x^{k}\}_{k\in \mathbb{N}}\) to \(x^{*}=0\) results from Theorem 3.5. We write vectors in \(\mathbb{R}^{n+1}\) as \((x,s)\) with \(x\in \mathbb{R}^{n}\), \(s\in \mathbb{R}\). We start by computing the formula for \(T^{S}(x,0)\). By definition of g, \(\nabla g(x,s)=(\nabla f(x),-1)^{\top }\). Let
By (2.2),
which implies, since \(P_{U}(x,s)=(x,0)\),
Let \(\bar{x}= \Vert x \Vert ^{-1}x\). From (5.2),
Note that \(\lim_{x\to 0}\alpha (x)=\alpha (0)=1\) and that, by B1–B2, \(\lim_{x\to 0}\nabla f(x)=\nabla f(0)=0\) and \(f(x)=o( \Vert x \Vert )\), so that \(\lim_{x\to 0}f(x)/ \Vert x \Vert =0\). We then conclude from (5.3) that
Now, since \(x^{k+1}=T^{S}(x^{k})\), \(x^{k}\in U\) for all \(k\ge 1\), and \(x^{*}=0\), we get from (5.4)
and hence \(\{x^{k}\}_{k\in \mathbb{N}}\) converges sublinearly. □
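Although the displayed formulas (5.1)–(5.4) are not reproduced above, the mechanism behind the sublinear rate can be made concrete in a model case. Assuming (this is only our reading of the separating operator (4.21), stated as an illustration rather than as the paper's formula (5.2)) that the approximate projection acts as the subgradient-projection step, for \(f(x)=\Vert x\Vert ^{2}\) one would get
\[
T^{S}(x,0)=\Bigl(x-\frac{f(x)}{1+\Vert \nabla f(x)\Vert ^{2}}\,\nabla f(x),\,0\Bigr)
          =\Bigl(\Bigl(1-\frac{2\Vert x\Vert ^{2}}{1+4\Vert x\Vert ^{2}}\Bigr)x,\,0\Bigr),
\]
so that the per-iteration contraction factor \(1-2\Vert x\Vert ^{2}/(1+4\Vert x\Vert ^{2})\) tends to 1 as \(x\to 0\), which is precisely the sublinear behavior established above.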
Next we study the CARM sequence in the same setting.
Proposition 5.2
Assume that \(K\subset \mathbb{R}^{n+1}\) is the epigraph of a convex function \(f:\mathbb{R}^{n}\to \mathbb{R}\) of class \(\mathcal{C}^{1}\) satisfying B1–B2, and \(U:=\{x\in \mathbb{R}^{n+1}:x_{n+1}=0\}\). Consider the separating operator given by (4.21) for the function \(g:\mathbb{R}^{n+1}\to \mathbb{R}\) defined as \(g(x_{1}, \dots ,x_{n+1})=f(x_{1}, \dots ,x_{n})-x_{n+1}\). For \(0\ne x\in \mathbb{R}^{n}\), define
Then
with \(C^{S}\) as in (3.2).
Proof
Define
By (3.2),
From Proposition 4.4,
for some \(\eta \ge 1\). By the definition of circumcenter, \(\Vert C^{S}(x)-x \Vert = \Vert C^{S}(x)-R^{S}(x) \Vert \). Combining this equation with (5.8) and (5.9), one obtains \(\eta =1+ \Vert \nabla f(x) \Vert ^{-1}\), which implies, in view of (5.7), that
so that
using the fact that \(f(x)\le \nabla f(x)^{\top }x\), which follows from the gradient inequality applied at the points x and 0. It follows from (5.12) and the definitions of \(\alpha (x)\), \(\beta (x)\) that
using (5.5) in the last equality. □
We prove next the linear convergence of the CARM sequence in this setting under the following additional assumption on f:
- (B3) \(\liminf_{x\to 0} \frac{f(x)}{ \Vert x \Vert \Vert \nabla f(x) \Vert }>0 \).
Corollary 5.3
Under the assumptions of Proposition 5.2, if f satisfies B3 and \(\{x^{k}\}_{k\in \mathbb{N}}\) is the sequence generated by CARM starting at any \(x^{0}\in U\), then \(\lim_{k\to \infty }x^{k}=x^{*}=0\), and
with
so that \(\{x^{k}\}_{k\in \mathbb{N}}\) converges linearly, with asymptotic constant bounded by \(\sqrt{1-\delta ^{2}}\).
Proof
Convergence of \(\{x^{k}\}_{k\in \mathbb{N}}\) follows from Theorem 3.5. Since \(x^{k+1}=C^{S}(x^{k})\), we invoke Proposition 5.2, observing that \(\liminf_{x\to 0}\theta (x)=\delta \) and taking square root and lim sup in (5.6):
using (5.5) and Assumption B3. □
In [9] it was shown that Assumption B3 holds in several cases, e.g., when f is of class \(\mathcal{C}^{2}\) and the Hessian \(\nabla ^{2} f(0)\) is positive definite, in which case
where \(\lambda _{\max }\), \(\lambda _{\min }\) are the largest and smallest eigenvalues of \(\nabla ^{2} f(0)\), or when \(f(x)=\varphi ( \Vert x \Vert )\), where \(\varphi :\mathbb{R}\to \mathbb{R}\) is a convex function of class \(\mathcal{C}^{r}\), satisfying \(\varphi (0)=\varphi '(0)=0\), in which case \(\delta \ge 1/p\), where \(p\le r\) is defined as \(p=\min \{j:\varphi ^{(j)}\ne 0\}\).
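As a quick sanity check of B3 in the radial case (an illustration of our own, not taken from [9]): for \(f(x)=\Vert x\Vert ^{2}\), i.e., \(\varphi (t)=t^{2}\) and \(p=2\), we have
\[
\frac{f(x)}{\Vert x\Vert \,\Vert \nabla f(x)\Vert }=\frac{\Vert x\Vert ^{2}}{\Vert x\Vert \cdot 2\Vert x\Vert }=\frac{1}{2},
\]
so \(\delta =1/2=1/p\), and Corollary 5.3 bounds the CARM asymptotic constant by \(\sqrt{1-1/4}=\sqrt{3}/2\).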
In all these instances, in view of Proposition 5.1 and Corollary 5.3, the CARM sequence converges linearly, while the MAAP one converges sublinearly. If we look at the formulae for \(T^{S}\) and \(C^{S}\), in (5.2) and (5.11), we note that both operators move from \((x,0)\) in the direction \((\nabla f(x),0)\) but with different step-sizes. Looking now at (5.3) and (5.5), we see that the relevant factors of these step-sizes, for x near 0, are \(f(x)/ \Vert x \Vert \) and \(f(x)/( \Vert x \Vert \Vert \nabla f(x) \Vert )\). Since we assume that \(\nabla f(0)=0\), the first one vanishes near 0, inducing the sublinear behavior of MAAP, while the second one, in rather generic situations, will stay away from 0. It is the additional presence of \(\Vert \nabla f(x) \Vert \) in the denominator of \(\theta (x)\) which makes all the difference.
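The contrast between the two factors is easy to verify numerically. The following Julia sketch (purely illustrative; the choice \(f(x)=\Vert x\Vert ^{2}\) and the sample points are ours, and the snippet is not part of the experimental code of Sect. 6) evaluates both factors along a sequence approaching the origin.

```julia
using LinearAlgebra

# Illustrative choice satisfying B1–B2: f(x) = ‖x‖², so ∇f(x) = 2x.
f(x)  = dot(x, x)
∇f(x) = 2 .* x

for j in 1:6
    x = fill(10.0^(-j), 3)                        # points approaching the origin
    maap_factor = f(x) / norm(x)                  # → 0   (drives MAAP's sublinear rate)
    carm_factor = f(x) / (norm(x) * norm(∇f(x)))  # → 1/2 (keeps CARM linear)
    println("‖x‖ = ", norm(x), ":  f/‖x‖ = ", maap_factor,
            ",  f/(‖x‖‖∇f‖) = ", carm_factor)
end
```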
Now we analyze the second family, which is similar to the first one, except that condition B1 is replaced by the following one:
- (B1’) \(f(0)<0\).
We also make a further simplifying assumption, which is not essential for the result, but keeps the calculations simpler. We take f of the form \(f(x)=\varphi ( \Vert x \Vert )\) with \(\varphi :\mathbb{R}\to \mathbb{R}\). Rewriting B1’, B2 in terms of φ, we assume that
- (i) \(\varphi :\mathbb{R}\to \mathbb{R}\) is strictly convex and of class \(\mathcal{C}^{1}\),
- (ii) \(\varphi (0)<0\),
- (iii) \(\varphi '(0)=0\).
This form of f gives a one-dimensional flavor to this family. Now, \(0\in \mathbb{R}^{n+1}\) cannot be the limit point of the MAAP or the CARM sequences: 0 is still the unique minimizer of f, but since \(f(0)<0\), \(0\notin \partial K\), while the limit points of the sequences, unless the sequences are finite (in which case convergence rates make no sense), do belong to the boundary of K. Hence the limit points differ from 0, so that ∇f does not vanish at them, implying that \(\varphi '\) is nonzero at the norms of the limit points. We have the following result for this family.
Proposition 5.4
Assume that \(U,K\subset \mathbb{R}^{n+1}\) are defined as \(U=\{(x,0): x\in \mathbb{R}^{n}\}\) and \(K= \mathrm{epi}(f)\), where \(f(x)=\varphi ( \Vert x \Vert )\) and φ satisfies (i)–(iii). Let \(C^{S}\), \(T^{S}\) be as defined in (3.2), and \((x^{*},0)\), \((z^{*},0)\) be the limits of the sequences \(\{x^{k}\}_{k\in \mathbb{N}}\), \(\{z^{k}\}_{k\in \mathbb{N}}\) generated by CARM and MAAP, starting from some \((x^{0},0)\in U\) and some \((z^{0},w)\in \mathbb{R}^{n+1}\), respectively. Then
and
Proof
We start by rewriting the formulae for \(C^{S}(x)\), \(T^{S}(x)\) in terms of φ. We also define \(t:= \Vert x \Vert \). Using (5.1), (5.2), (5.5), and (5.6), we obtain
and
Note that x, \(T^{S}(x)\), \(C^{S}(x)\) are collinear (the one-dimensional flavor!), so that the same happens with \(x^{*}\), \(z^{*}\). Let \(r:= \Vert x^{*} \Vert \), \(s:= \Vert z^{*} \Vert \), so that \(x^{*}=(r/t)x\), \(z^{*}=(s/t)x\). Then, using (5.15), (5.16), we get
and
using in the last equalities of (5.17) and (5.18) the fact that \(\varphi (r)=\varphi (s)=0\), which results from \(f(x^{*})=f(z^{*})=0\). Now we take limits with \(x\to z^{*}\), \(x\to x^{*}\) in the leftmost expressions of (5.17), (5.18), which requires taking limits with \(t\to s\), \(t\to r\) in their rightmost expressions, obtaining
and
completing the proof. □
Corollary 5.5
Under the assumptions of Proposition 5.4, the sequence generated by MAAP converges Q-linearly to a point \((z^{*},0)\in K\cap U\), with asymptotic constant equal to \(1/(1+\varphi '( \Vert z^{*} \Vert )^{2})\), and the sequence generated by CARM converges superlinearly.
Proof
Recall that if \(\{(z^{k},0)\}_{k\in \mathbb{N}}\) is the MAAP sequence, then \((z^{k+1},0)=T^{S}(z^{k},0)\), and if \(\{(x^{k},0)\}_{k\in \mathbb{N}}\) is the CARM sequence, then \((x^{k+1},0)=C^{S}(x^{k},0)\). Recall also that for both sequences the last components of the iterates vanish because \(\{x^{k}\}_{k\in \mathbb{N}}, \{z^{k}\}_{k\in \mathbb{N}}\subset U\). Then the result follows immediately from (5.13) and (5.14) in Proposition 5.4. □
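For a concrete illustration (with a specific φ of our own choosing), take \(\varphi (t)=t^{2}-1\), which satisfies (i)–(iii). Since the limit points lie on the boundary of K, where φ vanishes,
\[
\varphi \bigl(\Vert z^{*}\Vert \bigr)=\Vert z^{*}\Vert ^{2}-1=0
\;\Longrightarrow\;
\Vert z^{*}\Vert =1,
\qquad
\frac{1}{1+\varphi '(\Vert z^{*}\Vert )^{2}}=\frac{1}{1+2^{2}}=\frac{1}{5},
\]
so MAAP asymptotically reduces the distance to its limit by a factor of \(1/5\) per iteration, while CARM converges superlinearly.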
We mention that the results of Corollary 5.5 coincide with those obtained in Corollary 4.11 of [9] for the sequences generated by MAP and CRM applied to the same families of instances of CFP, showing that, also in these cases, the convergence rate results of the exact methods are preserved without any deterioration.
6 Numerical comparisons
In this section, we perform numerical comparisons between CARM, MAAP, CRM, and MAP. These methods are employed for solving the particular CFP of finding a common point in the intersection of finitely many ellipsoids, that is, finding
with each ellipsoid \(\mathcal{E}_{i}\) being given by
where \(g_{i}:\mathbb{R}^{n}\to \mathbb{R}\) is given by \(g_{i}(x) = x^{\top } A_{i} x +2 x^{\top } b^{i} - \alpha _{i}\), each \(A_{i}\) is a symmetric positive definite matrix, \(b^{i}\) is an n-vector, and \(\alpha _{i}\) is a positive scalar.
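For concreteness, the constraint functions and their gradients are straightforward to code. The following Julia sketch (our own minimal version, not the repository implementation) fixes the data structure reused in the sketches further below.

```julia
using LinearAlgebra

# One ellipsoid E = {x : g(x) ≤ 0} with g(x) = xᵀA x + 2xᵀb − α, A symmetric positive definite.
struct Ellipsoid
    A::Matrix{Float64}
    b::Vector{Float64}
    α::Float64
end

g(E::Ellipsoid, x)  = dot(x, E.A * x) + 2 * dot(x, E.b) - E.α
∇g(E::Ellipsoid, x) = 2 .* (E.A * x .+ E.b)    # gradient of the quadratic g
```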
Problem (6.1) is of interest in its own right (see [34, 35]), and both CRM and MAP are suitable for solving it. Nevertheless, the main motivation for tackling it with approximate projection methods is that computing exact projections onto ellipsoids is a considerable computational burden. Since the gradient of each \(g_{i}\) is readily available, we can consider the separating operators given in Examples 2.6 and 2.7 and use CARM and MAAP to solve problem (6.1) as well. Moreover, the experiments illustrate that, in this case, CARM handily outperforms CRM in terms of CPU time, while remaining competitive in terms of iteration count. Indeed, the exact projection onto each ellipsoid is so demanding that even MAAP requires less CPU time than CRM.
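A minimal sketch of the kind of outer-approximate projection involved (one plausible reading of the separating operator built from the gradient of \(g_{i}\), with names of our own; it reuses the Ellipsoid utilities above): if \(g(z)\le 0\) the point is returned unchanged, and otherwise z is projected onto the half-space \(\{x: g(z)+\nabla g(z)^{\top }(x-z)\le 0\}\), which contains the ellipsoid.

```julia
# Approximate (outer) projection of z onto {x : g(x) ≤ 0}: a single half-space projection
# onto the linearization of g at z, instead of an exact projection onto the ellipsoid.
function approx_proj(E::Ellipsoid, z::Vector{Float64})
    gz = g(E, z)
    gz <= 0 && return z                          # z already belongs to the ellipsoid
    grad = ∇g(E, z)
    return z .- (gz / dot(grad, grad)) .* grad   # projection onto the separating half-space
end
```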
The four methods are applied to Pierra's product space reformulation; that is, we seek a point \(\mathbf{x}^{*} \in \mathbf{K}\cap \mathbf{D}\), where \(\mathbf{K} := \mathcal{E}_{1}\times \mathcal{E}_{2}\times \cdots \times \mathcal{E}_{m}\) and D is the diagonal space. For each sequence \(\{\mathbf{x}^{k} \}_{k\in \mathbb{N}}\) that we generate, we set the tolerance \(\varepsilon := 10^{-6}\) and use as stopping criteria the gap distances
where \(P_{\mathbf{K}} (\mathbf{x}^{k})\) is utilized for CRM and MAP, and \(P^{S}_{\mathbf{K}} (\mathbf{x}^{k})\) is used for CARM and MAAP. Note that if the corresponding criterion is not met at a given iteration, the computed projection is reused to produce the next iterate. We also set the maximum number of iterations to 50,000.
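To fix ideas, here is a sketch (ours, under the assumptions just stated) of one CARM iteration in the product space, together with our reading of the gap test as \(\Vert P^{S}_{\mathbf{K}}(\mathbf{x}^{k})-\mathbf{x}^{k}\Vert \le \varepsilon \); the names `circumcenter` and `carm_step` are illustrative, and the Ellipsoid utilities come from the sketches above.

```julia
using LinearAlgebra

# Circumcenter of three points: the point of their affine hull equidistant from all three
# (assumes the points are affinely independent; degenerate cases need separate handling).
function circumcenter(p0, p1, p2)
    u, v = p1 .- p0, p2 .- p0
    G = [dot(u, u) dot(u, v); dot(u, v) dot(v, v)]   # Gram matrix of the directions
    s = G \ (0.5 .* [dot(u, u), dot(v, v)])
    return p0 .+ s[1] .* u .+ s[2] .* v
end

# Product-space ingredients: blocks X = (x_1, …, x_m), projection onto the diagonal D,
# blockwise approximate projection onto K, and reflections R = 2P − Id.
PD(X)          = (xbar = sum(X) ./ length(X); [copy(xbar) for _ in X])
PKS(Es, X)     = [approx_proj(Es[i], X[i]) for i in eachindex(Es)]
reflect(PX, X) = [2 .* PX[i] .- X[i] for i in eachindex(X)]
flat(X)        = reduce(vcat, X)           # stack the blocks into one long vector

# One CARM step from X ∈ D, preceded by our reading of the gap test ‖P_K^S(X) − X‖ ≤ ε.
function carm_step(Es, X; ε = 1e-6)
    PX = PKS(Es, X)
    norm(flat(PX) .- flat(X)) <= ε && return X, true   # stop: approximate gap small enough
    RK = reflect(PX, X)                                 # approximate reflection onto K
    RD = reflect(PD(RK), RK)                            # exact reflection onto D
    c  = circumcenter(flat(X), flat(RK), flat(RD))      # next iterate (lies back in D for X ∈ D)
    n  = length(X[1])
    return [c[(i-1)*n+1:i*n] for i in eachindex(X)], false
end
```

Iterating `carm_step` from \(\mathbf{x}^{0}=(x^{0},\dots ,x^{0})\in \mathbf{D}\) until the flag is `true` produces the CARM sequence; replacing the circumcenter step by \(P_{\mathbf{D}}(P^{S}_{\mathbf{K}}(\cdot ))\) would give the corresponding MAAP iteration.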
To execute our tests, we randomly generate 160 instances of (6.1) in the following manner. We let the dimension n range over \(\{10, 50, 100, 200\}\) and, for each n, the number m of underlying sets over \(\{5, 10, 20, 50\}\). For each of these 16 pairs \((m,n)\), we build 10 randomly generated instances of (6.1). Each matrix \(A_{i}\) is of the form \(A_{i} = \gamma \operatorname{Id}+B_{i}^{\top }B_{i}\), with \(B_{i} \in \mathbb{R}^{n\times n}\) and \(\gamma \in \mathbb{R}_{++}\). Matrix \(B_{i}\) is a sparse matrix sampled from the standard normal distribution with sparsity density \(p=2 n^{-1}\), and each vector \(b^{i}\) is sampled from the uniform distribution on \([0,1]\). We then choose each \(\alpha _{i}\) so that \(\alpha _{i} > (b^{i})^{\top }A_{i}b^{i}\), which ensures that 0 belongs to every \(\mathcal{E}_{i}\), and thus (6.1) is feasible. The initial point \(x^{0}\) is of the form \((\eta ,\eta ,\ldots , \eta )\in \mathbb{R}^{n}\), with η negative and \(\vert \eta \vert \) sufficiently large, guaranteeing that \(x^{0}\) is far from all the \(\mathcal{E}_{i}\)s.
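The instance generator is easy to reproduce. The following Julia sketch follows the recipe above (reusing the Ellipsoid type from the first sketch of this section), with illustrative values for γ and η, which are not specified in the text.

```julia
using LinearAlgebra, SparseArrays, Random

# Random instance of (6.1) in the spirit described above (illustrative parameter values).
function random_instance(n::Int, m::Int; γ = 1.0, η = -10.0, rng = MersenneTwister(1234))
    Es = Ellipsoid[]
    for _ in 1:m
        B = sprandn(rng, n, n, 2 / n)      # sparse factor with density p = 2/n
        A = Matrix(γ * I + B' * B)         # symmetric positive definite
        b = rand(rng, n)                   # entries uniform on [0, 1]
        α = dot(b, A * b) + 1.0            # α > bᵀA b, following the construction above
        push!(Es, Ellipsoid(A, b, α))
    end
    x0 = fill(η, n)                        # starting point far from the ellipsoids
    return Es, x0
end
```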
The computational experiments were performed on an Intel Xeon W-2133 3.60 GHz with 32 GB of RAM running Ubuntu 20.04, using the Julia programming language, v1.5 [36]. The code for our experiments is fully available at https://github.com/lrsantos11/CRM-CFP.
We remark that, as CRM and MAP rely on the computation of exact projections, ALGENCAN [37], an augmented Lagrangian algorithm implemented in Fortran (wrapped in Julia using NLPModels.jl [38]), was used in our code to compute projections onto the ellipsoids. Each projection was found by solving the corresponding quadratic minimization problem with a quadratic constraint (given by the respective \(g_{i}\)).
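To see why the exact projection is the bottleneck, note that projecting a point z onto a single ellipsoid already amounts to a quadratically constrained quadratic program. One standard way to solve it (a sketch of our own, not the ALGENCAN-based routine used in the experiments; it reuses Ellipsoid and g from the earlier sketches) is to solve the KKT system \(x(\lambda )=(I+2\lambda A)^{-1}(z-2\lambda b)\) and search for the multiplier \(\lambda \ge 0\) with \(g(x(\lambda ))=0\); every trial λ requires an \(n\times n\) linear solve, whereas the approximate projection above costs a single gradient step.

```julia
using LinearAlgebra

# Exact projection of z onto E = {x : xᵀA x + 2xᵀb ≤ α} via bisection on the KKT multiplier λ.
function exact_proj(E::Ellipsoid, z::Vector{Float64}; tol = 1e-10)
    g(E, z) <= 0 && return z                       # z is already feasible
    x(λ) = (I + 2λ * E.A) \ (z .- 2λ .* E.b)       # stationary point of the Lagrangian
    lo, hi = 0.0, 1.0
    while g(E, x(hi)) > 0                          # g(x(λ)) decreases in λ: bracket the root
        hi *= 2
    end
    while hi - lo > tol                            # bisection on the bracketed multiplier
        mid = (lo + hi) / 2
        g(E, x(mid)) > 0 ? (lo = mid) : (hi = mid)
    end
    return x(hi)
end
```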
The results are summarized in Fig. 1 using a performance profile [39]. Performance profiles allow one to compare different methods on a set of problems with respect to a performance measure. The vertical axis indicates the percentage of problems solved, while the horizontal axis indicates, in log scale, the factor by which a given method exceeds the performance of the best solver. In this case, when looking at CPU time (in seconds), the performance profile shows that CARM always did better than the other three methods. The picture also shows that MAAP took less time than CRM and MAP. We conclude this examination by presenting, in Table 1, the following descriptive statistics of the benchmark of CARM, MAAP, CRM, and MAP: mean, maximum (max), minimum (min), and standard deviation (std) for iteration count (it) and CPU time in seconds (CPU (s)). In particular, CARM was, on average, almost 3000 times faster than CRM.
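For completeness, the performance-profile construction itself is simple to code. The sketch below (ours, not the plotting code of the repository; `cpu_times` is a hypothetical data matrix) computes, for a given factor τ, the fraction of problems each solver finishes within τ times the best recorded time.

```julia
# T[i, j] = CPU time of solver j on problem i (set failures to Inf).
function performance_profile(T::Matrix{Float64}, τ::Float64)
    best   = minimum(T, dims = 2)                            # best time for each problem
    ratios = T ./ best                                       # performance ratios r_{ij}
    return vec(sum(ratios .<= τ, dims = 1)) ./ size(T, 1)    # ρ_j(τ), one value per solver
end

# Example (hypothetical data): fraction of instances each of the four methods
# solves within twice the best CPU time:  ρ = performance_profile(cpu_times, 2.0)
```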
7 Concluding remarks
In this paper, we have introduced a new circumcenter iteration for solving convex feasibility problems. The new method is called CARM, and it utilizes outer-approximate projections instead of the exact ones taken in the original CRM.
We have addressed the question of whether convergence results known for CRM can be generalized to CARM, and we have derived several positive theoretical statements in this regard. For instance, convergence of CARM was proven, and linear rates were achieved under error bound conditions. In addition, we presented numerical experiments in which subgradient approximate projections were employed; this choice of approximate projections is a particular case of the ones covered by CARM. The numerical results show CARM to be much faster in CPU time than the other methods we compared it with, namely the original CRM, the classical MAP, and an approximate version of MAP called MAAP.
Availability of data and materials
Not applicable.
References
Pierra, G.: Decomposition through formalization in a product space. Math. Program. 28(1), 96–115 (1984). https://doi.org/10.1007/BF02612715
Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Pol. Sci. Lett. Class. Sci. Math. Nat. A 35, 355–357 (1937)
Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38(3), 367–426 (1996). https://doi.org/10.1137/S0036144593251710
Douglas, J., Rachford, H.H. Jr.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956). https://doi.org/10.1090/S0002-9947-1956-0084194-4
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979). https://doi.org/10.1137/0716071
Cimmino, G.: Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari. Ric. Sci. 9(II), 326–333 (1938)
Behling, R., Bello-Cruz, J.-Y., Santos, L.-R.: Circumcentering the Douglas–Rachford method. Numer. Algorithms 78(3), 759–776 (2018). https://doi.org/10.1007/s11075-017-0399-5
Behling, R., Bello-Cruz, J.-Y., Santos, L.-R.: On the linear convergence of the circumcentered-reflection method. Oper. Res. Lett. 46(2), 159–162 (2018). https://doi.org/10.1016/j.orl.2017.11.018
Arefidamghani, R., Behling, R., Bello-Cruz, J.-Y., Iusem, A.N., Santos, L.-R.: The circumcentered-reflection method achieves better rates than alternating projections. Comput. Optim. Appl. 79(2), 507–530 (2021). https://doi.org/10.1007/s10589-021-00275-6
Bauschke, H.H., Ouyang, H., Wang, X.: On circumcenters of finite sets in Hilbert spaces. Linear Nonlinear Anal. 4(2), 271–295 (2018)
Bauschke, H.H., Ouyang, H., Wang, X.: Best approximation mappings in Hilbert spaces. Math. Program. (2021). https://doi.org/10.1007/s10107-021-01718-y
Bauschke, H.H., Ouyang, H., Wang, X.: Circumcentered methods induced by isometries. Vietnam J. Math. 48, 471–508 (2020). https://doi.org/10.1007/s10013-020-00417-z
Bauschke, H.H., Ouyang, H., Wang, X.: On circumcenter mappings induced by nonexpansive operators. Pure Appl. Funct. Anal. 6(2), 257–288 (2021)
Bauschke, H.H., Ouyang, H., Wang, X.: On the linear convergence of circumcentered isometry methods. Numer. Algorithms 87, 263–297 (2021). https://doi.org/10.1007/s11075-020-00966-x
Behling, R., Bello-Cruz, Y., Santos, L.-R.: On the circumcentered-reflection method for the convex feasibility problem. Numer. Algorithms 86, 1475–1494 (2021). https://doi.org/10.1007/s11075-020-00941-6
Dizon, N., Hogan, J., Lindstrom, S.B.: Circumcentering reflection methods for nonconvex feasibility problems (2019). arXiv:1910.04384
Dizon, N., Hogan, J., Lindstrom, S.B.: Centering projection methods for wavelet feasibility problems (2020). arXiv:2005.05687
Ouyang, H.: Finite convergence of locally proper circumcentered methods (2020). arXiv:2011.13512
Fukushima, M.: An outer approximation algorithm for solving general convex programs. Oper. Res. 31(1), 101–113 (1983). https://doi.org/10.1287/opre.31.1.101
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, 1st edn. Classics in Applied Mathematics. SIAM, Philadelphia (2000)
Cegielski, A.: Iterative Methods for Fixed Point Problems in Hilbert Spaces. Lecture Notes in Mathematics, vol. 2057. Springer, New York (2012)
Cegielski, A.: Generalized relaxations of nonexpansive operators and convex feasibility problems. In: Leizarowitz, A., Mordukhovich, B.S., Shafrir, I., Zaslavski, A.J. (eds.) Nonlinear Analysis and Optimization: A Conference in Celebration of Alex Ioffe’s 70th and Simeon Reich’s 60th Birthdays, Haïfa, Israël, June 18–24, 2008. Contemporary Mathematics, pp. 111–123. Am. Math. Soc., Providence (2010)
Cheney, W., Goldstein, A.A.: Proximity maps for convex sets. Proc. Am. Math. Soc. 10(3), 448–450 (1959). https://doi.org/10.2307/2032864
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. CMS Books in Mathematics. Springer, Berlin (2017). https://doi.org/10.1007/978-3-319-48311-5
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, 2nd edn. Grundlehren Der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (2004)
Robinson, S.M.: Generalized equations and their solutions, part II: applications to nonlinear programming. In: Guignard, M. (ed.) Optimality and Stability in Mathematical Programming. Mathematical Programming Studies, pp. 200–221. Springer, Berlin (1982). https://doi.org/10.1007/BFb0120989
Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49(4), 263–265 (1952)
Behling, R., Iusem, A.: The effect of calmness on the solution set of systems of nonlinear equations. Math. Program. 137(1), 155–165 (2013). https://doi.org/10.1007/s10107-011-0486-7
Kanzow, C., Yamashita, N., Fukushima, M.: Levenberg–Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints. J. Comput. Appl. Math. 172(2), 375–397 (2004). https://doi.org/10.1016/j.cam.2004.02.013
Bauschke, H.H., Borwein, J.M.: On the convergence of von Neumann’s alternating projection algorithm for two sets. Set-Valued Anal. 1(2), 185–212 (1993). https://doi.org/10.1007/BF01027691
Bauschke, H.H.: Projection Algorithms and Monotone Operators. PhD thesis, Simon Fraser University, Burnaby (August 1996)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Transversality and alternating projections for nonconvex sets. Found. Comput. Math. 15(6), 1637–1651 (2015). https://doi.org/10.1007/s10208-015-9279-3
Kruger, A.Y.: About intrinsic transversality of pairs of sets. Set-Valued Var. Anal. 26(1), 111–142 (2018). https://doi.org/10.1007/s11228-017-0446-3
Lin, A., Han, S.-P.: A class of methods for projection on the intersection of several ellipsoids. SIAM J. Optim. 15(1), 129–138 (2004). https://doi.org/10.1137/S1052623403422297
Jia, Z., Cai, X., Han, D.: Comparison of several fast algorithms for projection onto an ellipsoid. J. Comput. Appl. Math. 319, 320–337 (2017). https://doi.org/10.1016/j.cam.2017.01.008
Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: a fresh approach to numerical computing. SIAM Rev. 59(1), 65–98 (2017). https://doi.org/10.1137/141000671
Birgin, E.G., Martínez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization, 1st edn. SIAM, Philadelphia (2014). https://doi.org/10.1137/1.9781611973365
Siqueira, A.S., Orban, D.: NLPModels.jl. Zenodo (2019). https://doi.org/10.5281/ZENODO.2558627
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002). https://doi.org/10.1007/s101070100263
Acknowledgements
The authors would like to thank the two anonymous referees for their valuable suggestions which improved this manuscript.
Funding
RB was partially supported by the Brazilian Agency Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Grants 304392/2018-9 and 429915/2018-7; YBC was partially supported by the National Science Foundation (NSF), Grant DMS – 1816449.
Author information
Contributions
All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Araújo, G.H.M., Arefidamghani, R., Behling, R. et al. Circumcentering approximate reflections for solving the convex feasibility problem. Fixed Point Theory Algorithms Sci Eng 2022, 1 (2022). https://doi.org/10.1186/s13663-021-00711-6