
A primal-dual fixed point algorithm for minimization of the sum of three convex separable functions

Abstract

Many problems arising in image processing and signal recovery with multi-regularization and constraints can be formulated as minimization of a sum of three convex separable functions. Typically, the objective function involves a smooth function with Lipschitz continuous gradient, a linear composite nonsmooth function, and a nonsmooth function. In this paper, we propose a primal-dual fixed point (PDFP) scheme to solve the above class of problems. The proposed algorithm for three-block problems is a symmetric and fully splitting scheme, only involving an explicit gradient, a linear transform, and the proximity operators which may have a closed-form solution. We study the convergence of the proposed algorithm and illustrate its efficiency through examples on fused LASSO and image restoration with non-negative constraint and sparse regularization.

1 Introduction

In this paper, we aim to design a primal-dual fixed point algorithmic framework for solving the following minimization problem:

$$ \min_{x\in\mathbb{R}^{n}} {f_{1}}(x)+({f_{2}}\circ B) (x)+{f_{3}}(x), $$
(1.1)

where \({f_{1}}\), \({f_{2}}\), and \({f_{3}}\) are three proper lower semi-continuous convex functions, and \({f_{1}}\) is differentiable on \({\mathbb{R}^{n}}\) with a \(1/\beta\)-Lipschitz continuous gradient for some \(\beta\in (0,+\infty]\), while \(B:{\mathbb{R}^{n}}\rightarrow\mathbb{R}^{m}\) is a linear transformation. This formulation covers a wide range of applications in image processing and signal recovery with multi-regularization terms and constraints. For instance, in many imaging and data processing applications, the functional \(f_{1}\) corresponds to a data-fidelity term, and the last two terms are used for regularization. As a direct example of (1.1), we can consider the fused LASSO penalized problem [1, 2] defined by

$$ \min_{x\in{\mathbb{R}^{n}}} \frac{1}{2}\|Ax-a\|^{2}+ \mu_{1} \|Bx\|_{1}+\mu_{2} \|x\|_{1}. $$

On the other hand, in imaging science, total variation regularization with B being the discrete gradient operator, together with \(\ell_{1}\) regularization, has been adopted in some image restoration applications, for example in [3]. Another useful application corresponds to \(f_{3}=\chi_{C}\), where \(\chi _{C}\) is the indicator function of a nonempty closed convex set C. In this case, the problem reduces to

$$ \min_{x\in C} {f_{1}}(x)+({f_{2}}\circ B) (x). $$
(1.2)

As far as we know, Combettes and Pesquet first proposed a fully splitting algorithm in [4] to solve monotone operator inclusion problems, which include (1.1) as a special case. Condat [5] tackled the same problem and proposed a primal-dual splitting scheme. Extensions to multi-block composite functions are also discussed in detail. For the special case \(B=I\) (I denotes the usual identity operator), Davis and Yin [6] proposed a three-operator splitting scheme based on monotone operators. For the case that the problem (1.1) reduces to two-block separable functions, many splitting and proximal algorithms have been proposed and studied in the literature. Among them, extensive research has been conducted on the alternating direction method of multipliers (ADMM) [7] (also known as split Bregman [3]; see for example [8] and the references therein). The primal-dual hybrid gradient method (PDHG) [9–12], also known as the Chambolle-Pock algorithm [11], is another class of popular algorithms, largely adopted in imaging applications. In [13–16], several completely decoupled schemes, such as the inexact Uzawa solver and the primal-dual fixed point algorithm, are proposed to avoid subproblem solving for some typical \(\ell_{1}\) minimization problems. Komodakis and Pesquet [17] recently gave a nice overview of recent primal-dual approaches for solving large-scale optimization problems of the form (1.1). A general class of multi-step fixed point proximity algorithms is proposed in [18], which covers several existing algorithms [11, 12] as special cases. During the preparation of this paper, we noticed that Li and Zhang [19] also studied the problem (1.1) and introduced quasi-Newton and overrelaxation strategies for accelerating the algorithms. Both algorithms can be viewed as a generalization of Condat’s algorithm [5]. The theoretical analysis is established based on the multi-step techniques presented in [18].

In the following, we mainly review the most relevant work for a concise presentation. Problem (1.2) has been studied in [20] in the context of maximum a posteriori ECT reconstruction, and a preconditioned alternating projection algorithm (PAPA) is proposed for solving the resulting regularization problem. For \(f_{3}=0\) in (1.1), we proposed the primal-dual fixed point algorithm \(\mathrm{PDFP}^{2}\mathrm{O}\) (primal-dual fixed point algorithm based on proximity operator) in [15]. Based on fixed point theory, we have shown the convergence of the scheme \(\mathrm{PDFP}^{2}\mathrm{O}\), as well as the convergence rate of the iteration sequence under suitable conditions.

In this work, we aim to extend the ideas of the \(\mathrm{PDFP}^{2}\mathrm{O}\) in [15] and the PAPA in [20] for solving (1.1) without subproblem solving and provide a convergence analysis on the primal-dual sequences. The specific algorithm, namely the primal-dual fixed point (PDFP) algorithm, is formulated as follows:

$$ (\mathrm{PDFP})\quad \textstyle\begin{cases} y^{k+1}=\operatorname{prox}_{{\gamma}{{f_{3}}}}(x^{k}-\gamma\nabla {{f_{1}}}(x^{k})-{\lambda} B^{T} v^{k}),\\ v^{k+1}=(I-\operatorname{prox}_{\frac{\gamma}{\lambda }{{f_{2}}}})(By^{k+1}+v^{k}),\\ x^{k+1}=\operatorname{prox}_{{\gamma}{{f_{3}}}}(x^{k}-\gamma\nabla {{f_{1}}}(x^{k})-{\lambda} B^{T} v^{k+1}),\end{cases} $$
(1.3)

where \(0<\lambda< 1/\lambda_{\mathrm{max}}(BB^{T})\), \(0<\gamma<2\beta\). Here \(\operatorname{prox}_{f}\) is the proximity operator [21] of a function f; see (2.2). When \(f_{3}=\chi_{C}\), the proposed algorithm (1.3) reduces to the PAPA proposed in [20]; see (4.1). For another special case, \(f_{3}=0\) in (1.3), we obtain the \(\mathrm{PDFP}^{2}\mathrm{O}\) proposed in [15]; see (4.2). The convergence analysis of this PDFP algorithm is built upon fixed point theory on the primal and dual pairs. The overall scheme is completely explicit, which allows for an easy implementation and parallel computing for many large-scale applications. This will be further illustrated through applications to problems arising in statistical learning and image restoration. The PDFP has a symmetric form and is different from Condat’s algorithm [5]. In addition, we point out that the ranges of the parameters in PDFP are larger than those of [5, 19] and that the rules for the parameters in PDFP are well separated, which could be advantageous in practice compared to [5, 19].
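To illustrate how explicit the scheme is, the following is a minimal executable sketch of one way to implement the iteration (1.3), assuming NumPy and user-supplied callables for \(\nabla f_{1}\) and the two proximity operators; the function and argument names are illustrative and not part of the original presentation.

```python
import numpy as np

def pdfp(grad_f1, prox_gamma_f3, prox_f2_scaled, B, x0, gamma, lam, n_iter=500):
    """Sketch of the PDFP iteration (1.3).

    grad_f1         -- gradient of f1
    prox_gamma_f3   -- proximity operator of gamma*f3
    prox_f2_scaled  -- proximity operator of (gamma/lam)*f2
    B               -- (m, n) linear transform
    Convergence requires 0 < lam < 1/lambda_max(B B^T) and 0 < gamma < 2*beta.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros(B.shape[0])
    for _ in range(n_iter):
        w = x - gamma * grad_f1(x)            # x^k - gamma * grad f1(x^k), reused twice below
        y = prox_gamma_f3(w - lam * B.T @ v)  # y^{k+1}
        z = B @ y + v
        v = z - prox_f2_scaled(z)             # v^{k+1} = (I - prox_{(gamma/lam) f2})(B y^{k+1} + v^k)
        x = prox_gamma_f3(w - lam * B.T @ v)  # x^{k+1}, with the updated dual variable
    return x, v
```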

The rest of the paper is organized as follows. In Section 2, we will present some preliminaries and notations, and deduce PDFP from the first order optimality condition. In Section 3, we will provide the convergence results and the linear convergence rate results for some special cases. In Section 4, we will compare the form of the PDFP algorithm (1.3) with some existing algorithms. In Section 5, we will show the numerical performance and efficiency of PDFP through examples on fused LASSO and pMRI (parallel magnetic resonance imaging) reconstruction.

2 Primal-dual fixed point algorithm

2.1 Preliminaries and notations

For the self-completeness of this work, we list some relevant notations, definitions, assumptions, and lemmas from convex analysis. We refer the reader to [15, 22] and the references therein for more details.

For the ease of presentation, we restrict our discussion to Euclidean space \(\mathbb{R}^{n}\), equipped with the usual inner product \(\langle \cdot,\cdot\rangle\) and norm \(\|\cdot\|=\langle\cdot,\cdot\rangle^{1/2}\). We first assume that the problem (1.1) has at least one solution and \({f_{2}}\), \({f_{3}}\), B satisfy

$$ 0\in\operatorname{ri}\bigl(\operatorname{dom}_{{f_{2}}}- B( \operatorname{dom}_{{f_{3}}})\bigr), $$
(2.1)

where the symbol \(\operatorname{ri}(\cdot)\) denotes the relative interior of a convex subset, and the effective domain of f is defined as \(\operatorname{dom}_{f} = \{x\in\mathbb{R}^{n}\mid f(x) < +\infty\}\).

The \(\ell_{1}\) norm of a vector \(x\in\mathbb{R}^{n}\) is denoted by \(\| \cdot\|_{1}\) and the spectral norm of a matrix is denoted by \(\|\cdot\| _{2}\). Let \(\Gamma_{0}(\mathbb{R}^{n})\) be the collection of all proper lower semi-continuous convex functions from \(\mathbb{R}^{n}\) to \((-\infty,+\infty]\). For a function \(f\in\Gamma_{0}(\mathbb{R}^{n})\), the proximity operator \(\operatorname{prox}_{f}\) of f [21] is defined by

$$ \operatorname{prox}_{f}(x)= \mathop{\arg \min}_{y\in\mathbb{R}^{n}} f (y)+ \frac{1}{2}\|x-y\|^{2}. $$
(2.2)

For a nonempty closed convex set \(C\subset\mathbb{R}^{n}\), let \(\chi _{C}\) be the indicator function of C, defined by

$$ \chi_{C}(x)= \textstyle\begin{cases} 0, &x\in C, \\ +\infty,&x\notin C. \end{cases} $$

Let \(\operatorname{proj}_{C}\) be the projection operator onto C, i.e.

$$ \operatorname{proj}_{C}(x)= \mathop{\arg \min}_{y\in C} \|x-y \|^{2}. $$

It is easy to see that \(\operatorname{prox}_{\gamma\chi_{C}}= \operatorname{proj}_{C}\) for all \(\gamma>0\), so the proximity operator is a generalization of the projection operator. Note that many efficient splitting algorithms rely on the fact that \(\operatorname{prox}_{f}\) has a closed-form solution. For example, when \(f=\gamma\|\cdot\|_{1}\), the proximity operator is given by element-wise soft-shrinking (a short code sketch is given at the end of this list of definitions). We refer the reader to [22] for more details as regards proximity operators. Let ∂f be the subdifferential of f, i.e.

$$ \partial f(x)=\bigl\{ v\in\mathbb{R}^{n} \mid \langle y-x,v\rangle\leq f(y)-f(x) \mbox{ for all } y\in\mathbb{R}^{n}\bigr\} , $$
(2.3)

and \({f^{*}}\) be the convex conjugate function of f, defined by

$$ f^{*}(x)=\sup_{y \in\mathbb{R}^{n}}\langle x,y\rangle-f(y). $$
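To make the closed-form examples above concrete, here is a minimal sketch (assuming NumPy; the helper names are illustrative) of the element-wise soft-shrinking operator for \(f=\gamma\|\cdot\|_{1}\) and of the projection onto the non-negative orthant, which is the proximity operator of the corresponding indicator function.

```python
import numpy as np

def prox_l1(x, gamma):
    """Proximity operator of gamma*||.||_1: element-wise soft-shrinking."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def proj_nonneg(x):
    """Projection onto C = {x : x >= 0}, i.e. prox of the indicator chi_C."""
    return np.maximum(x, 0.0)
```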

An operator \(T:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}\) is nonexpansive if

$$ \|Tx-Ty\|\leq\|x-y\| \quad \mbox{for all } x,y \in\mathbb{R}^{n}, $$

and T is firmly nonexpansive if

$$ \|Tx-Ty\|^{2}\leq\langle Tx-Ty,x- {y}\rangle\quad \mbox{for all } x,y \in\mathbb{R}^{n}. $$

It is obvious that a firmly nonexpansive operator is nonexpansive. An operator T is δ-strongly monotone if there exists a positive real number δ such that

$$ \langle Tx-Ty,x- {y}\rangle\geq\delta\|x-y\|^{2} \quad \mbox{for all } x,y \in\mathbb{R}^{n}. $$
(2.4)

Lemma 2.1

For any two functions \({{f_{2}}} \in\Gamma_{0} (\mathbb{R}^{m})\) and \({{f_{3}}}\in\Gamma_{0} (\mathbb{R}^{n})\), and a linear transformation \(B:{\mathbb{R}^{n}}\rightarrow\mathbb{R}^{m}\), satisfying that \(0\in\operatorname{ri}(\operatorname{dom}_{{f_{2}}}- B(\operatorname{dom}_{{f_{3}}}))\), we have

$$\partial({f_{2}}\circ B+{f_{3}})=B^{T}\circ \partial{{f_{2}}}\circ B+\partial{f_{3}}. $$

Lemma 2.2

Let \({f}\in\Gamma_{0}(\mathbb{R}^{n})\). Then \(\operatorname{prox}_{f}\) and \(I-\operatorname{prox}_{f}\) are firmly nonexpansive. In addition,

$$\begin{aligned}& x=\operatorname{prox}_{{f}}(y) \quad \Leftrightarrow \quad y-x\in \partial {{f}}(x) \quad \textit{for a given } y\in\mathbb{R}^{n}, \end{aligned}$$
(2.5)
$$\begin{aligned}& y\in\partial{{f}}(x) \quad \Leftrightarrow \quad x=\operatorname {prox}_{{f}}(x+y) \\& \hphantom{y\in\partial{{f}}(x)}\quad \Leftrightarrow\quad y=(I-\operatorname{prox} _{{f}}) (x+y)\quad \textit{for } x,y\in\mathbb{R}^{n}, \end{aligned}$$
(2.6)
$$\begin{aligned}& x=\operatorname{prox}_{\gamma{f}}(x)+ \gamma\operatorname {prox}_{\frac{1}{\gamma }{f^{*}}}\biggl({\frac{1}{\gamma} x}\biggr) \quad \textit{for all } x\in\mathbb{R}^{n} \textit{ and }\gamma>0. \end{aligned}$$
(2.7)

If, in addition, f has a \(1/\beta\)-Lipschitz continuous gradient, then

$$ \beta\bigl\Vert \nabla f(x)-\nabla f(y)\bigr\Vert ^{2}\leq\bigl\langle \nabla f(x)-\nabla f(y), x-y\bigr\rangle \quad \textit{for all } x, y \in\mathbb{R}^{n}. $$
(2.8)
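As a sanity check (not part of the original text), Moreau’s identity (2.7) can be verified numerically for \(f=\|\cdot\|_{1}\), whose convex conjugate is the indicator of the \(\ell_{\infty}\) unit ball; a small sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.7
x = rng.standard_normal(5)

prox_gamma_f = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)  # prox_{gamma*||.||_1}(x)
prox_conj = np.clip(x / gamma, -1.0, 1.0)                       # prox_{(1/gamma) f*}(x/gamma): projection onto {||.||_inf <= 1}

assert np.allclose(x, prox_gamma_f + gamma * prox_conj)         # identity (2.7)
```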

Lemma 2.3

Let T be an operator and \(u^{*}\) be a fixed point of T. Let \(\{ u^{k+1}\}\) be the sequence generated by the fixed point iteration \(u^{k+1}=T(u^{k})\). Suppose (i) T is continuous, (ii) \(\{\|u^{k}-u^{*}\|\} \) is non-increasing, (iii) \(\lim_{k\to+\infty} \|u^{k+1}-u^{k}\|=0\). Then the sequence \(\{u^{k}\}\) is bounded and converges to a fixed point of T.

The proof of Lemma 2.3 is standard, and we refer the reader to the proof of Theorem 3.5 in [15] for more details.

Let γ and λ be two positive numbers. To simplify the presentation, we use the following notations:

$$\begin{aligned}& T_{0}(v,x)=\operatorname{prox}_{{\gamma}{{f_{3}}}}\bigl(x-\gamma\nabla {{f_{1}}}(x)-{\lambda } B^{T} v\bigr), \end{aligned}$$
(2.9)
$$\begin{aligned}& T_{1}(v,x)=(I-\operatorname{prox}_{\frac{\gamma}{\lambda }{{f_{2}}}}) \bigl(B\circ T_{0}(v,x)+v\bigr), \end{aligned}$$
(2.10)
$$\begin{aligned}& T_{2}(v,x)=\operatorname{prox}_{{\gamma}{{f_{3}}}}\bigl(x-\gamma\nabla {{f_{1}}}(x)-{\lambda } B^{T} \circ T_{1}(v,x) \bigr), \end{aligned}$$
(2.11)
$$\begin{aligned}& T(v,x)= \bigl( {T_{1}}(v,x), {T_{2}(v,x)} \bigr). \end{aligned}$$
(2.12)

Denote

$$\begin{aligned}& g(x)=x-\gamma\nabla f_{1}(x) \quad \mbox{ for all } x\in \mathbb{R}^{n}, \end{aligned}$$
(2.13)
$$\begin{aligned}& M=I-\lambda BB^{T}. \end{aligned}$$
(2.14)

Let \(\lambda_{\mathrm{max}}(A)\) denote the largest eigenvalue of a square matrix A. When \(0<\lambda< 1/\lambda_{\mathrm{max}}(BB^{T})\), M is a symmetric and positive definite matrix, so we can define a norm

$$ \|v\|_{M}=\sqrt{\langle{v},Mv\rangle} \quad \mbox{for all } v\in \mathbb{R}^{m}. $$
(2.15)

For a pair \({u}= (v,x )\in\mathbb{R}^{m}\times\mathbb {R}^{n}\), we also define a norm on the product space \(\mathbb{R}^{m}\times\mathbb{R}^{n}\) as

$$ \|u\|_{\lambda}=\sqrt{\lambda\|v\|^{2}+\|x\|^{2}}. $$
(2.16)

2.2 Derivation of PDFP

Extending the ideas of the PAPA proposed in [20] and the \(\mathrm{PDFP}^{2}\mathrm{O}\) proposed in [15], we derive the primal-dual fixed point algorithm (1.3) for solving the minimization problem (1.1).

Under the assumption (2.1), by using the first order optimality condition of (1.1) and Lemma 2.1, we have

$$ 0\in\gamma\nabla{{f_{1}}}\bigl(x^{*}\bigr)+{\lambda}B^{T} \partial\biggl(\frac{\gamma }{\lambda} {{f_{2}}}\biggr) \bigl(Bx^{*}\bigr)+ \gamma\partial{f_{3}}\bigl(x^{*}\bigr),$$

where \(x^{*}\) is an optimal solution. Let

$$ v^{*}\in\partial\biggl(\frac{\gamma}{\lambda} {{f_{2}}}\biggr) \bigl(Bx^{*} \bigr). $$
(2.17)

By applying (2.6), we have

$$\begin{aligned}& v^{*}= (I-\operatorname{prox}_{\frac{\gamma}{\lambda} {{f_{2}}}}) \bigl(Bx^{*}+v^{*}\bigr), \end{aligned}$$
(2.18)
$$\begin{aligned}& x^{*}=\operatorname{prox}_{ {\gamma} {{f_{3}}}}\bigl(x^{*}-\gamma\nabla {{f_{1}}}\bigl(x^{*}\bigr)-\lambda B^{T}v^{*}\bigr). \end{aligned}$$
(2.19)

By inserting (2.19) into (2.18), we get

$$v^{*}=(I- \operatorname{prox}_{\frac{\gamma}{\lambda}{{f_{2}}}}) \bigl(B\circ \operatorname{prox}_{{\gamma} {{f_{3}}}} \bigl(x^{*}-\gamma\nabla{{f_{1}}}\bigl(x^{*}\bigr)-\lambda B^{T}v^{*}\bigr)+v^{*}\bigr), $$

or equivalently, \(v^{*}=T_{1}(v^{*},x^{*})\). Next, replacing \(v^{*}\) in (2.19) by \(T_{1}(v^{*},x^{*})\), we can get \(x^{*}=T_{2}(v^{*},x^{*})\). In other words \(u^{*}=T(u^{*})\) for \({u^{*}}= (v^{*},x^{*} )\). Meanwhile, if \(u^{*}=T(u^{*})\), we can see that \(x^{*}\) meets the first order optimality condition of (1.1) and thus \(x^{*}\) is a minimizer of (1.1).

To sum up, we have the following theorem.

Theorem 2.1

Suppose that \(x^{*}\) is a solution of (1.1) and \(v^{*}\in\mathbb{R}^{m}\) is defined as (2.17). Then we have

$$ \textstyle\begin{cases} v^{*}=T_{1}(v^{*},x^{*}), \\ x^{*}=T_{2}(v^{*},x^{*}), \end{cases} $$

i.e. \({u^{*}}= ({v^{*}},{x^{*}} )\) is a fixed point of T. Conversely, if \({u^{*}}= ({v^{*}},{x^{*}} )\in\mathbb {R}^{m}\times\mathbb{R}^{n}\) is a fixed point of T, then \(x^{*}\) is a solution of (1.1).

It is easy to confirm that the sequence \(\{(v^{k+1}, x^{k+1})\}\) generated by the PDFP algorithm (1.3) is the Picard iteration \((v^{k+1},x^{k+1})={T}(v^{k},x^{k})\). So we will use the operator T to analyze the convergence of the PDFP in Section 3.

3 Convergence analysis

In the following, let \(\{y^{k+1}\}\) and \(\{u^{k+1}=(v^{k+1},x^{k+1})\}\) be the sequences generated by the PDFP algorithm (1.3), i.e. \(y^{k+1}={T}_{0}(v^{k},x^{k})\) and \((v^{k+1},x^{k+1})={T}(v^{k},x^{k})\). Let \({u^{*}}= ({v^{*}},{x^{*}} )\) be a fixed point of the operator T.

3.1 Convergence

Lemma 3.1

We have the following estimates:

$$\begin{aligned}& \bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2} \leq\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}-\bigl\Vert v^{k+1}-v^{k} \bigr\Vert ^{2}+2 \bigl\langle B^{T}\bigl(v^{k+1}-v^{*} \bigr),y^{k+1}-x^{*}\bigr\rangle , \end{aligned}$$
(3.1)
$$\begin{aligned}& \bigl\Vert x^{k+1}-x^{*}\bigr\Vert ^{2} \leq\bigl\Vert x^{k}-x^{*}\bigr\Vert ^{2}-\bigl\Vert x^{k+1}-y^{k+1} \bigr\Vert ^{2}-\bigl\Vert x^{k}-y^{k+1}\bigr\Vert ^{2} \\& \hphantom{\bigl\Vert x^{k+1}-x^{*}\bigr\Vert ^{2} \leq{}}{}+2\bigl\langle x^{k+1}-y^{k+1}, \gamma \nabla{f_{1}}\bigl(x^{k}\bigr)+\lambda B^{T}v^{k} \bigr\rangle \\& \hphantom{\bigl\Vert x^{k+1}-x^{*}\bigr\Vert ^{2} \leq{}}{}-2\bigl\langle x^{k+1}-x^{*}, \gamma\nabla{f_{1}} \bigl(x^{k}\bigr)+\lambda B^{T}v^{k+1} \bigr\rangle +2 \gamma\bigl({f_{3}}\bigl(x^{*}\bigr)-{f_{3}} \bigl(y^{k+1}\bigr)\bigr). \end{aligned}$$
(3.2)

Proof

We first prove (3.1). By Lemma 2.2, we know \(I-\operatorname{prox}_{\frac {\gamma }{\lambda}{{f_{2}}}}\) is firmly nonexpansive, and using (1.3)2 and (2.18) we further have

$$ \bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2}\leq\bigl\langle v^{k+1}-v^{*},\bigl(B y^{k+1}+v^{k}\bigr)-\bigl(Bx^{*}+v^{*} \bigr)\bigr\rangle , $$

which implies

$$ \bigl\langle v^{k+1}-v^{*},v^{k+1}-v^{k}\bigr\rangle \leq\bigl\langle v^{k+1}-v^{*},B \bigl(y^{k+1}-x^{*}\bigr)\bigr\rangle =\bigl\langle B^{T} \bigl(v^{k+1}-v^{*} \bigr),y^{k+1}-x^{*}\bigr\rangle .$$

Thus

$$\begin{aligned} \bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2} =&\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}-\bigl\Vert v^{k+1}-v^{k} \bigr\Vert ^{2}+2\bigl\langle v^{k+1}-v^{*}, v^{k+1}-v^{k}\bigr\rangle \\ \leq& \bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}-\bigl\Vert v^{k+1}-v^{k}\bigr\Vert ^{2}+2 \bigl\langle B^{T}\bigl(v^{k+1}-v^{*}\bigr),y^{k+1}-x^{*}\bigr\rangle . \end{aligned}$$

Next we prove (3.2). By the optimality condition of (1.3)3 (cf. (2.5)), we have

$$ \bigl(x^{k}-\gamma\nabla{f_{1}}\bigl(x^{k}\bigr)- \lambda B^{T}v^{k+1}\bigr)-x^{k+1} \in\gamma \partial{f_{3}}\bigl(x^{k+1}\bigr). $$

By the property of subdifferentials (cf. (2.3)),

$$ \bigl\langle x^{*}-x^{k+1},\bigl(x^{k}-\gamma \nabla{f_{1}}\bigl(x^{k}\bigr)-\lambda B^{T}v^{k+1} \bigr)-x^{k+1} \bigr\rangle \leq\gamma\bigl({f_{3}} \bigl(x^{*}\bigr)- {f_{3}}\bigl(x^{k+1}\bigr)\bigr), $$

i.e.

$$ \bigl\langle x^{k+1}-x^{*}, x^{k+1}-x^{k}\bigr\rangle \leq-\bigl\langle x^{k+1}-x^{*}, \gamma\nabla{f_{1}} \bigl(x^{k}\bigr)+\lambda B^{T}v^{k+1} \bigr\rangle + \gamma \bigl({f_{3}}\bigl(x^{*}\bigr)- {f_{3}} \bigl(x^{k+1}\bigr)\bigr). $$

Therefore,

$$\begin{aligned} \bigl\Vert x^{k+1}-x^{*}\bigr\Vert ^{2} =&\bigl\Vert x^{k}-x^{*}\bigr\Vert ^{2}-\bigl\Vert x^{k+1}-x^{k} \bigr\Vert ^{2}+2\bigl\langle x^{k+1}-x^{*}, x^{k+1}-x^{k}\bigr\rangle \\ \leq& \bigl\Vert x^{k}-x^{*}\bigr\Vert ^{2}-\bigl\Vert x^{k+1}-x^{k}\bigr\Vert ^{2}-2\bigl\langle x^{k+1}-x^{*}, \gamma \nabla{f_{1}}\bigl(x^{k}\bigr)+ \lambda B^{T}v^{k+1} \bigr\rangle \\ &{}+2\gamma\bigl({f_{3}}\bigl(x^{*}\bigr)- {f_{3}}\bigl(x^{k+1}\bigr)\bigr). \end{aligned}$$
(3.3)

On the other hand, by the optimality condition of (1.3)1, it follows that

$$ \bigl(x^{k}-\gamma\nabla{f_{1}}\bigl(x^{k}\bigr)- \lambda B^{T}v^{k}\bigr)-y^{k+1} \in\gamma \partial{f_{3}}\bigl(y^{k+1}\bigr). $$

Thanks to the property of subdifferentials, we have

$$ \bigl\langle x^{k+1}-y^{k+1},\bigl(x^{k}-\gamma \nabla{f_{1}}\bigl(x^{k}\bigr)-\lambda B^{T}v^{k} \bigr)-y^{k+1} \bigr\rangle \leq\gamma\bigl({f_{3}} \bigl(x^{k+1}\bigr)-{f_{3}}\bigl(y^{k+1}\bigr)\bigr). $$

So

$$ \bigl\langle x^{k+1}-y^{k+1}, x^{k}-y^{k+1} \bigr\rangle \leq\bigl\langle x^{k+1}-y^{k+1}, \gamma \nabla{f_{1}}\bigl(x^{k}\bigr)+\lambda B^{T}v^{k} \bigr\rangle +\gamma\bigl({f_{3}}\bigl(x^{k+1} \bigr)-{f_{3}}\bigl(y^{k+1}\bigr)\bigr). $$

Thus

$$\begin{aligned} -\bigl\Vert x^{k+1}-x^{k}\bigr\Vert ^{2} =&- \bigl\Vert x^{k+1}-y^{k+1}\bigr\Vert ^{2}-\bigl\Vert x^{k}-y^{k+1}\bigr\Vert ^{2}+2\bigl\langle x^{k+1}-y^{k+1}, x^{k}-y^{k+1}\bigr\rangle \\ \leq& -\bigl\Vert x^{k+1}-y^{k+1}\bigr\Vert ^{2}- \bigl\Vert x^{k}-y^{k+1}\bigr\Vert ^{2}+2\bigl\langle x^{k+1}-y^{k+1}, \gamma\nabla{f_{1}} \bigl(x^{k}\bigr)+\lambda B^{T}v^{k} \bigr\rangle \\ &{}+2\gamma\bigl({f_{3}}\bigl(x^{k+1}\bigr)-{f_{3}} \bigl(y^{k+1}\bigr)\bigr). \end{aligned}$$

Replacing the term \(-\|x^{k+1}-x^{k}\|^{2}\) in (3.3) with the right-hand side of the above inequality, we immediately obtain (3.2). □

Lemma 3.2

We have

$$\begin{aligned} \bigl\Vert u^{k+1} -u^{*}\bigr\Vert _{\lambda}^{2} \leq& \bigl\Vert u^{k}-u^{*}\bigr\Vert _{\lambda}^{2}- \lambda\bigl\Vert v^{k+1}-v^{k}\bigr\Vert ^{2}_{M}-\bigl\Vert x^{k+1}-y^{k+1}+ \lambda B^{T}\bigl(v^{k+1}-v^{k}\bigr)\bigr\Vert ^{2} \\ &{} -\bigl\Vert \bigl(x^{k}-y^{k+1}\bigr)-\bigl(\gamma \nabla{f_{1}}\bigl(x^{k}\bigr)-\gamma\nabla{f_{1}} \bigl(x^{*}\bigr)\bigr)\bigr\Vert ^{2} \\ &{}-\gamma(2\beta- {\gamma} )\bigl\Vert \nabla{f_{1}}\bigl(x^{k}\bigr)-\nabla{f_{1}} \bigl(x^{*}\bigr)\bigr\Vert ^{2}. \end{aligned}$$
(3.4)

Proof

Summing the two inequalities (3.1) and (3.2) and re-arranging the terms, we have

$$\begin{aligned}& \lambda\bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2}+\bigl\Vert x^{k+1}-x^{*}\bigr\Vert ^{2} \\& \quad \leq \lambda\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}+ \bigl\Vert x^{k}-x^{*}\bigr\Vert ^{2}-\lambda\bigl\Vert v^{k+1}-v^{k}\bigr\Vert ^{2}-\bigl\Vert x^{k+1}-y^{k+1}\bigr\Vert ^{2}-\bigl\Vert x^{k}-y^{k+1}\bigr\Vert ^{2} \\& \qquad {} +2 \bigl\langle \lambda B^{T} \bigl(v^{k+1}-v^{*} \bigr),y^{k+1}-x^{*}\bigr\rangle +2\bigl\langle x^{k+1}-y^{k+1}, \gamma\nabla{f_{1}}\bigl(x^{k}\bigr)+\lambda B^{T}v^{k} \bigr\rangle \\& \qquad {} -2\bigl\langle x^{k+1}-x^{*}, \gamma\nabla{f_{1}} \bigl(x^{k}\bigr)+\lambda B^{T}v^{k+1}\bigr\rangle +2 \gamma\bigl({f_{3}}\bigl(x^{*}\bigr)-{f_{3}} \bigl(y^{k+1}\bigr)\bigr) \\& \quad = \lambda\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}+\bigl\Vert x^{k}-x^{*}\bigr\Vert ^{2}-\lambda\bigl\Vert v^{k+1}-v^{k}\bigr\Vert ^{2}-\bigl\Vert x^{k+1}-y^{k+1}\bigr\Vert ^{2}-\bigl\Vert x^{k}-y^{k+1}\bigr\Vert ^{2} \\& \qquad {} +2 \bigl\langle \lambda B^{T} \bigl(v^{k+1}-v^{k} \bigr),y^{k+1}-x^{k+1}\bigr\rangle +2\bigl\langle x^{k} -y^{k+1},\gamma\nabla{f_{1}}\bigl(x^{k}\bigr)-\gamma \nabla{f_{1}}\bigl(x^{*}\bigr)\bigr\rangle \\& \qquad {} -2\bigl\langle x^{k}-x^{*},\gamma\nabla{f_{1}} \bigl(x^{k}\bigr)-\gamma\nabla {f_{1}}\bigl(x^{*}\bigr)\bigr\rangle \\& \qquad {} +2\bigl(\bigl\langle y^{k+1}-x^{*},-\gamma\nabla{f_{1}} \bigl(x^{*}\bigr)-\lambda B^{T} v^{*} \bigr\rangle +\gamma \bigl({f_{3}}\bigl(x^{*}\bigr)-{f_{3}}\bigl(y^{k+1} \bigr)\bigr)\bigr) \\& \quad = \lambda\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}+\bigl\Vert x^{k}-x^{*}\bigr\Vert ^{2}-\lambda\bigl\Vert v^{k+1}-v^{k}\bigr\Vert _{M}^{2}-\bigl\Vert x^{k+1}-y^{k+1}+\lambda B^{T} \bigl(v^{k+1}-v^{k}\bigr)\bigr\Vert ^{2} \\& \qquad {} -\bigl\Vert \bigl(x^{k}-y^{k+1}\bigr)-\bigl(\gamma \nabla{f_{1}}\bigl(x^{k}\bigr)-\gamma\nabla{f_{1}} \bigl(x^{*}\bigr)\bigr)\bigr\Vert ^{2}+\bigl\Vert \gamma \nabla{f_{1}}\bigl(x^{k}\bigr)-\gamma\nabla{f_{1}} \bigl(x^{*}\bigr)\bigr\Vert ^{2} \\& \qquad {} -2\bigl\langle x^{k}-x^{*},\gamma\nabla{f_{1}} \bigl(x^{k}\bigr)-\gamma\nabla {f_{1}}\bigl(x^{*}\bigr)\bigr\rangle \\& \qquad {} +2\bigl(\bigl\langle y^{k+1}-x^{*},-\gamma\nabla{f_{1}} \bigl(x^{*}\bigr)-\lambda B^{T} v^{*} \bigr\rangle +\gamma \bigl({f_{3}}\bigl(x^{*}\bigr)-{f_{3}}\bigl(y^{k+1} \bigr)\bigr)\bigr), \end{aligned}$$
(3.5)

where \(\|\cdot\|_{M}\) is given in (2.14) and (2.15). Meanwhile, by the optimality condition of (2.19), we have

$$ -\gamma\nabla{f_{1}}\bigl(x^{*}\bigr)-\lambda B^{T}v^{*}\in \gamma\partial{f_{3}}\bigl(x^{*}\bigr), $$

which implies

$$ \bigl\langle y^{k+1}-x^{*},-\gamma\nabla{f_{1}}\bigl(x^{*}\bigr)- \lambda B^{T}v^{*}\bigr\rangle +\gamma\bigl({f_{3}} \bigl(x^{*}\bigr)- {f_{3}}\bigl(y^{k+1}\bigr)\bigr) \leq0. $$
(3.6)

On the other hand, it follows from (2.8) that

$$ -\bigl\langle x^{k}- x^{*},\nabla{f_{1}}\bigl(x^{k} \bigr)-\nabla{f_{1}}\bigl(x^{*}\bigr)\bigr\rangle \leq -{\beta}\bigl\Vert \nabla{f_{1}}\bigl(x^{k}\bigr)- \nabla{f_{1}} \bigl(x^{*}\bigr)\bigr\Vert ^{2}. $$
(3.7)

Recalling (2.16), we immediately obtain (3.4) in terms of (3.5)-(3.7). □

Lemma 3.3

Let \(0<\lambda<1/{\lambda_{\mathrm{max}}(BB^{T})}\) and \(0<\gamma<2\beta\). Then the sequence \(\{\|u^{k}-u^{*}\|_{\lambda}\}\) is non-increasing and \(\lim_{k\to+\infty} \|u^{k+1}-u^{k}\|_{\lambda}=0\).

Proof

If \(0<\lambda< 1/{\lambda_{\mathrm{max}}(BB^{T})}\) and \(0<\gamma<2\beta\), it follows from (3.4) that \(\|u^{k+1}-u^{*}\|_{\lambda}\leq\| u^{k}-u^{*}\|_{\lambda}\), i.e. the sequence \(\{\|u^{k}-u^{*}\| _{\lambda}\}\) is non-increasing. Moreover, summing the inequalities (3.4) from \(k=0\) to \(k=+\infty\), we get

$$\begin{aligned}& \lim_{k\to+\infty} \bigl\| v^{k+1}-v^{k} \bigr\| _{M}=0, \end{aligned}$$
(3.8)
$$\begin{aligned}& \lim_{k\to+\infty} \bigl\| x^{k+1}-y^{k+1}+\lambda B^{T}\bigl(v^{k+1}-v^{k}\bigr)\bigr\| =0, \end{aligned}$$
(3.9)
$$\begin{aligned}& \lim_{k\to+\infty} \bigl\| \bigl(x^{k}-y^{k+1}\bigr)- \bigl(\gamma\nabla{f_{1}}\bigl(x^{k}\bigr)-\gamma \nabla{f_{1}}\bigl(x^{*}\bigr)\bigr)\bigr\| =0, \end{aligned}$$
(3.10)
$$\begin{aligned}& \lim_{k\to+\infty} \bigl\| \nabla{f_{1}}\bigl(x^{k} \bigr)-\nabla{f_{1}}\bigl(x^{*}\bigr)\bigr\| =0. \end{aligned}$$
(3.11)

The combination of (3.10) and (3.11) gives

$$ \lim_{k\to+\infty} \bigl\Vert x^{k}-y^{k+1}\bigr\Vert =0. $$
(3.12)

Noting that \(0<\lambda<1/{\lambda_{\mathrm{max}}(BB^{T})}\), we know M is symmetric and positive definite, so (3.8) is equivalent to

$$ \lim_{k\to+\infty} \bigl\Vert v^{k+1}-v^{k}\bigr\Vert =0. $$
(3.13)

Hence, we have from the above inequality and (3.9) that

$$ \lim_{k\to+\infty} \bigl\Vert x^{k+1}-y^{k+1}\bigr\Vert =0. $$
(3.14)

The combination of (3.12) and (3.14) then gives rise to

$$ \lim_{k\to+\infty} \bigl\Vert x^{k+1}-x^{k}\bigr\Vert =0. $$
(3.15)

According to (3.13), (3.15), and (2.16), we have \(\lim_{k\to+\infty} \|u^{k+1}-u^{k}\|_{\lambda}=0\). □

As a direct consequence of Lemma 3.3 and Lemma 2.3, we obtain the convergence of the PDFP as follows.

Theorem 3.1

Let \(0<\lambda<1/{\lambda_{\mathrm{max}}(BB^{T})}\) and \(0<\gamma<2\beta\). Then the sequence \(\{u^{k}\}\) is bounded and converges to a fixed point of T, and both \(\{x^{k}\}\) and \(\{y^{k}\}\) converge to a solution of (1.1).

Proof

By Lemma 2.2, both \(\operatorname{prox}_{{\gamma }{f_{3}}}\) and \(I-\operatorname{prox}_{\frac{\gamma}{\lambda}{f_{2}}}\) are firmly nonexpansive, thus the operator T defined by (2.9)-(2.12) is continuous. From Lemma 3.3, we know that the sequence \(\{\|u^{k}-u^{*}\| _{\lambda}\}\) is non-increasing and \(\lim_{k\to+\infty} \| u^{k+1}-u^{k}\|_{\lambda}=0\). By using Lemma 2.3, we know that the sequence \(\{ u^{k}\}\) is bounded and converges to a fixed point of T. By using Theorem 2.1 and (3.14), we can conclude that both \(\{x^{k}\}\) and \(\{y^{k}\} \) converge to a solution of (1.1). □

Remark 3.1

For the special case \(f_{3}=0\), the PDFP reduces naturally to the \(\mathrm{PDFP}^{2}\mathrm{O}\) (4.2) proposed in [15], where the conditions for the parameters are \(0<\lambda\leq1/\lambda _{\mathrm{max}}(BB^{T})\), \(0<\gamma<2\beta\). In the proof of Lemma 3.3, we utilize the positive definiteness of M to obtain (3.13) from (3.8). So the condition on the parameter λ is slightly more restrictive, namely \(0<\lambda< 1/\lambda_{\mathrm{max}}(BB^{T})\), in Lemma 3.3 and Theorem 3.1. When \(f_{3}=0\), the conditions in the proof of Lemma 3.3 can also be relaxed to \(0<\lambda\leq1/\lambda_{\mathrm{max}}(BB^{T})\). As a matter of fact, it is easy to check from the definition of \(y^{k+1}\) (see (1.3)1) and the optimality condition of (2.19) that

$$ \bigl\Vert \bigl(x^{k}-y^{k+1}\bigr)-\bigl( \gamma\nabla{f_{1}}\bigl(x^{k}\bigr)-\gamma \nabla{f_{1}}\bigl(x^{*}\bigr)\bigr)\bigr\Vert ^{2}=\bigl\Vert \lambda B^{T}\bigl(v^{k}-v^{*}\bigr)\bigr\Vert ^{2}. $$
(3.16)

Observing that \(\|v^{k+1}-v^{k}\|^{2}=\|v^{k+1}-v^{k}\|_{M}^{2}+\lambda\| B^{T}(v^{k+1}-v^{k})\|^{2}\), we have by (3.16), (3.10) and (3.8) that \(\lim_{k\to+\infty} \|v^{k+1}-v^{k}\|=0\). Therefore, for \(f_{3}=0\), we can derive the convergence whenever M is positive semi-definite.

Remark 3.2

For the special case \(f_{1}=0\), the problem (1.1) involves only two proper lower semi-continuous convex functions. The convergence condition \(0<\gamma<2\beta\) in the PDFP becomes \(0<\gamma<+\infty\). Although γ can be an arbitrary positive number in theory, its range affects the convergence speed, and choosing a good value in practice remains a difficult problem.

3.2 Linear convergence rate for special cases

In the following, we will show convergence rate results under some additional assumptions on the basic problem (1.1). In particular, for \(f_{3}=0\), the algorithm reduces to the \(\mathrm{PDFP}^{2}\mathrm{O}\) proposed in [15]. The condition for linear convergence given there, Condition 3.1 in [15], is as follows: for \(0<\lambda\leq1/\lambda_{\mathrm{max}}(BB^{T})\) and \(0<\gamma<{2}\beta\), there exist \(\eta_{1}, \eta_{2}\in[0,1)\) such that

$$\begin{aligned} &\bigl\Vert I-\lambda BB^{T}\bigr\Vert _{2}\leq\eta_{1}^{2}, \\ &\bigl\Vert g(x)-g(y)\bigr\Vert \le\eta_{2} \Vert x-y\Vert \quad \mbox{for all } x, y\in\mathbb {R}^{n}, \end{aligned}$$
(3.17)

where \(g(x)\) is given in (2.13). It is easy to see that a strongly convex function \(f_{1}\) satisfies the condition (3.17). For a general \(f_{3}\), we need stronger conditions on the functions.

Theorem 3.2

Suppose that (3.17) holds and \(f_{2}^{*}\) is strongly convex. Then we have

$$ \bigl\Vert u^{k+1}-u^{*}\bigr\Vert _{(1+\lambda\delta/\gamma)\lambda} \leq\eta\bigl\Vert u^{k}-u^{*}\bigr\Vert _{(1+\lambda\delta/\gamma)\lambda}, $$

where \(0<\eta<1\) is the convergence rate (indicated in the proof) and \(\delta>0\) is a parameter describing the strongly monotone property of \(\partial f_{2}^{*}\) (cf. (2.4)).

Proof

Use Moreau’s identity (cf. (2.7)) to get

$$(I-\operatorname{prox}_{\frac{\gamma}{\lambda}{f_{2}}}) \bigl(By^{k+1}+v^{k} \bigr) =\frac{\gamma}{\lambda}\operatorname{prox}_{\frac{\lambda}{\gamma }{f_{2}^{*}}} \biggl( \frac{\lambda}{\gamma}By^{k+1}+\frac{\lambda}{\gamma}v^{k}\biggr). $$

So (1.3)2 is equivalent to

$$ \frac{\lambda}{\gamma}v^{k+1} =\operatorname{prox}_{\frac{\lambda}{\gamma}{f_{2}^{*}}} \biggl(\frac{\lambda}{\gamma}By^{k+1}+\frac{\lambda}{\gamma}v^{k} \biggr). $$
(3.18)

According to the optimality condition of (3.18),

$$ \frac{\lambda}{\gamma}By^{k+1}+\frac{\lambda}{\gamma}v^{k}- \frac {\lambda}{\gamma}v^{k+1} \in\frac{\lambda}{\gamma}\partial {f_{2}^{*}}\biggl(\frac{\lambda}{\gamma}v^{k+1}\biggr). $$
(3.19)

Similarly, according to the optimality condition of (2.18),

$$ \frac{\lambda}{\gamma}Bx^{*} \in\frac{\lambda}{\gamma}\partial {f_{2}^{*}}\biggl(\frac{\lambda}{\gamma}v^{*}\biggr). $$
(3.20)

Observing that \(\partial f_{2}^{*}\) is δ-strongly monotone, we have by (3.19) and (3.20)

$$ \bigl\langle v^{k+1}-v^{*},\bigl(B y^{k+1}+v^{k}-v^{k+1} \bigr)-Bx^{*}\bigr\rangle \geq\frac {\lambda}{\gamma} \delta\bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2}, $$

i.e.

$$ \bigl\langle v^{k+1}-v^{*},v^{k+1}-v^{k}\bigr\rangle \leq\bigl\langle B^{T} \bigl(v^{k+1}-v^{*}\bigr),y^{k+1}-x^{*} \bigr\rangle -\frac{\lambda}{\gamma}\delta\bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2}. $$

Thus

$$\begin{aligned} \bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2} =& \bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}-\bigl\Vert v^{k+1}-v^{k}\bigr\Vert ^{2}+2\bigl\langle v^{k+1}-v^{*}, v^{k+1}-v^{k}\bigr\rangle \\ \leq&\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}-\bigl\Vert v^{k+1}-v^{k}\bigr\Vert ^{2}+2 \bigl\langle B^{T}\bigl(v^{k+1}-v^{*}\bigr),y^{k+1}-x^{*}\bigr\rangle \\ &{}- \frac{\lambda}{\gamma}\delta\bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2}. \end{aligned}$$
(3.21)

Summing the two inequalities (3.21) and (3.2), and then using the same argument as for deriving (3.5), we arrive at

$$\begin{aligned} \biggl(1+\frac{\lambda}{\gamma} \delta\biggr)\lambda\bigl\Vert v^{k+1}-v^{*}\bigr\Vert ^{2}+ \bigl\Vert x^{k+1}-x^{*}\bigr\Vert ^{2} \leq& \lambda\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}+ \bigl\Vert g \bigl(x^{k}\bigr)-g\bigl(x^{*}\bigr)\bigr\Vert ^{2} \\ \leq& \lambda\bigl\Vert v^{k}-v^{*}\bigr\Vert ^{2}+ \eta_{2}^{2}\bigl\Vert x^{k}-x^{*}\bigr\Vert ^{2}, \end{aligned}$$
(3.22)

where we have also used the condition (3.17) and the inequality (3.6).

Let \(\eta_{3}=1/\sqrt{1+\lambda\delta/\gamma}\) and \(\eta=\max\{ \eta_{2},\eta_{3}\}\). It is clear that \(0<\eta<1\). Hence, according to the notation (2.16), the estimate (3.22) can be rewritten as required. □

We note that a linear convergence rate for strongly convex \(f_{2}^{*}\) and \(f_{3}\) is obtained in [19]. The authors introduced two preconditioned operators for accelerating the algorithm, although a clear relation between the convergence rate and the preconditioned operators is still missing. Meanwhile, introducing preconditioned operators could be beneficial in practice, and we can also introduce a preconditioned operator to deal with \(\nabla f_{1}\) in our scheme. Since the analysis is rather similar to the current one, we omit it in this paper.

4 Connections to other algorithms

In this section, we present the connections of the PDFP algorithm to some algorithms proposed previously in the literature.

In particular, when \(f_{3}=\chi_{C}\), due to \(\operatorname{prox}_{\gamma f_{3}}= \operatorname{proj}_{C}\), the proposed algorithm (1.3) is reduced to the PAPA proposed in [20]

$$ (\mathrm{PDFP})\quad \textstyle\begin{cases} y^{k+1}=\operatorname{proj}_{C}(x^{k}-\gamma\nabla{{f_{1}}}(x^{k})-{\lambda} B^{T} v^{k}), \\ v^{k+1}=(I-\operatorname{prox}_{\frac{\gamma}{\lambda }{{f_{2}}}})(By^{k+1}+v^{k}), \\ x^{k+1}=\operatorname{proj}_{C}(x^{k}-\gamma\nabla{{f_{1}}}(x^{k})-{\lambda} B^{T} v^{k+1}), \end{cases} $$
(4.1)

where \(0<\lambda< 1/\lambda_{\mathrm{max}}(BB^{T})\), \(0<\gamma<2\beta\). We note that the parameter ranges for the convergence of the PDFP are larger than those in [20]. Here we still refer to (4.1) as the PDFP, since the PAPA originally proposed in [20] incorporates other techniques such as diagonal preconditioning.

For the special case \(f_{3}=0\), due to \(\operatorname{prox}_{\gamma f_{3}}=I\), we obtain the \(\mathrm{PDFP}^{2}\mathrm{O}\) scheme proposed in [15]

$$ \bigl(\mathrm{PDFP}^{2}\mathrm{O}\bigr)\quad \textstyle\begin{cases} y^{k+1}=x^{k}-\gamma\nabla{{f_{1}}}(x^{k})-{\lambda} B^{T} v^{k}, \\ v^{k+1}=(I-\operatorname{prox}_{\frac{\gamma}{\lambda }{{f_{2}}}})(By^{k+1}+v^{k}), \\ x^{k+1}=x^{k}-\gamma\nabla{{f_{1}}}(x^{k})-{\lambda} B^{T} v^{k+1}, \end{cases} $$
(4.2)

where \(0<\lambda\leq1/\lambda_{\mathrm{max}}(BB^{T})\), \(0<\gamma<2\beta\). We recently noticed that, for \(f_{1}(x)=\frac{1}{2}\|Ax-a\|^{2}\), the \(\mathrm{PDFP}^{2}\mathrm{O}\) reduces to the algorithm previously proposed by Loris and Verhoeven in [14]. The convergence and the convergence rate of the objective function were established in [14], but the convergence conditions are slightly more restrictive than the ones given in [15]. On the other hand, we emphasize that the \(\mathrm{PDFP}^{2}\mathrm{O}\) algorithm can also be interpreted from the point of view of forward-backward operator splitting, as shown in [15, 16]. Moreover, the multi-block formulation was devised and analyzed in [16].

Based on the \(\mathrm{PDFP}^{2}\mathrm{O}\), we also proposed the \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\) in [23] for \(f_{3}=\chi_{C}\) as

$$ \bigl(\mathrm{PDFP}^{2}\mathrm{O}_{C}\bigr) \quad \textstyle\begin{cases} y^{k+1}=x^{k}-\gamma\nabla{{f_{1}}}(x^{k})-{\lambda} B^{T} v_{1}^{k}-{\lambda }v_{2}^{k}, \\ v_{1}^{k+1}=(I-\operatorname{prox}_{\frac{\gamma}{\lambda }{{f_{2}}}})(By^{k+1}+v_{1}^{k}), \\ v_{2}^{k+1}=(I-\operatorname{proj}_{C})(y^{k+1}+v_{2}^{k}), \\ x^{k+1}=x^{k}-\gamma\nabla{{f_{1}}}(x^{k})-{\lambda} B^{T} v_{1}^{k+1}-{\lambda}v_{2}^{k+1}, \end{cases} $$
(4.3)

where \(0<\lambda\leq1/(\lambda_{\mathrm{max}}(BB^{T})+1)\), \(0<\gamma<2\beta\). A similar technique of extension to multi-composite functions has also been used in [5, 18, 24]. Compared to the PDFP (4.1), the algorithm \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\) introduces an extra variable, while the PDFP requires two projections per iteration. Most importantly, the primal variable at each iterate of the PDFP is feasible, which may not be the case for the \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\). In addition, the permitted ranges of the parameters are tighter for the \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\).

Another interesting special case is \(f_{1}=0\). The scheme (1.3) reduces to

$$ (\mathrm{PDFP})\quad \textstyle\begin{cases} y^{k+1}=\operatorname{prox}_{{\gamma}{{f_{3}}}}(x^{k}-{\lambda} B^{T} v^{k}),\\ v^{k+1}=(I-\operatorname{prox}_{\frac{\gamma}{\lambda }{{f_{2}}}})(By^{k+1}+v^{k}),\\ x^{k+1}=\operatorname{prox}_{{\gamma}{{f_{3}}}}(x^{k}-{\lambda} B^{T} v^{k+1}),\end{cases} $$
(4.4)

where \(0<\lambda< 1/\lambda_{\mathrm{max}}(BB^{T})\), \(0<\gamma<+\infty\). It is easy to see that (4.4) is different from the PDHG method in [10, 11]: (4.4) has a symmetric step (4.4)1, in contrast to the extrapolation step in the PDHG method.

Combettes and Pesquet first proposed a fully splitting algorithm in [4] to solve monotone inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators, which include (1.1) as a special case. The problem is recast as a two-block inclusion and then solved with an error-tolerant primal-dual forward-backward-forward algorithm as studied in [25]. Condat [5] tackled the same problem as given in (1.1) and proposed a primal-dual splitting scheme. For the special case \(f_{1}=0\), Condat’s algorithm reduces to the PDHG method in [11]. By grouping the multiple blocks as two blocks, the authors in [24] extended the PDHG algorithm [12] to the minimization of a sum of multi-composite functions. The authors in [18] proposed a class of multi-step fixed point proximity algorithms, which includes several existing algorithms, for example those in [11, 12], as special cases. In [6], Davis and Yin proposed a three-operator splitting method for solving three-block monotone inclusions in a clever way. When solving the problem (1.1) with \(B=I\), their scheme is different from Condat’s algorithm and the PDFP algorithm, but it requires subproblem solving if \(B\neq I\). Li and Zhang [19] studied (1.1) based on the techniques presented in [18], including Condat’s algorithm [5] as a special case, and further introduced quasi-Newton and overrelaxation strategies to accelerate the algorithms.

In the following, we mainly compare PDFP to the basic Algorithm 3.2 proposed by Condat in [5] to simplify the presentation. We first change the form of the PDFP algorithm (1.3) by using Moreau’s identity, see (2.7), i.e.

$$(I-\operatorname{prox}_{\frac{\gamma}{\lambda}{f_{2}}}) \bigl(By^{k+1}+ {v}^{k} \bigr) =\frac{\gamma}{\lambda}\operatorname{prox}_{\frac{\lambda}{\gamma }{f_{2}^{*}}} \biggl( \frac{\lambda}{\gamma}By^{k+1}+\overline{v}^{k}\biggr), $$

where \(\overline{v}^{k}= \frac{\lambda}{\gamma}v^{k}\). A direct comparison is presented in Table 1. From Table 1, we can see that the ranges of the parameters in Condat’s algorithm are smaller than those of PDFP. Also, since the convergence condition for Condat’s algorithm couples all the parameters, it is not always easy to choose them in practice; this is also pointed out in [5]. The rules for the parameters in PDFP, on the other hand, are separate, and the parameters can be chosen independently according to the Lipschitz constant of \(\nabla f_{1}\) and the operator norm of \(BB^{T}\). In this sense, our parameter rules are more practical. In the numerical experiments, we can set λ close to \(1/\lambda_{\mathrm{max} }(BB^{T})\) and γ close to 2β for most tests. Moreover, the quantities \(x^{k}-\gamma\nabla{f_{1}}(x^{k})\) and \({\lambda} B^{T} v^{k+1}\) can be stored as two intermediate variables that can be reused in (1.3)1 and (1.3)3 during the iterations. Nevertheless, PDFP has an extra step (1.3)1 compared to Condat’s algorithm, and the computation cost may increase due to the extra evaluation of \(\operatorname{prox} _{{\gamma}{f_{3}}}\). In practice, this step is often an \(\ell_{1}\) shrinkage or a projection, which is easy to implement, so the extra cost is often negligible.

Table 1 The comparison between Condat ( \(\pmb{\rho_{k}=1}\) ) and PDFP

5 Numerical experiments

In this section, we apply the PDFP algorithm to solve two problems: the fused LASSO penalized problem and parallel magnetic resonance imaging (pMRI) reconstruction. All the experiments are implemented in MATLAB 7.00 (R14) and conducted on a computer with an Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz.

5.1 The fused LASSO penalized problem

The fused LASSO (least absolute shrinkage and selection operator) penalized problem was proposed for group variable selection; we refer the reader to [1, 2, 26] for more details on the applications of this model. It can be described as

$$ \min_{x\in{\mathbb{R}^{n}}} \frac{1}{2}\|Ax-a\| ^{2}+ \mu_{1} \sum_{i=1}^{n-1}|x_{i+1}-x_{i}|+ \mu_{2} \|x\|_{1}. $$

Here \(A\in\mathbb{R}^{r\times n}\), \(a\in\mathbb{R}^{r}\). The rows of A, \(A_{i}\) for \(i=1,2,\ldots, r\), represent the observations of the independent variables, \(a_{i}\) denotes the corresponding response variable, and the vector \(x\in\mathbb {R}^{n}\) is the regression coefficient to recover. The first term corresponds to the data fidelity, and the last two terms aim to ensure sparsity of both x and the successive differences in x. Let

$$B= \begin{pmatrix} -1&1\\ &-1&1\\ &&\ddots&\ddots\\ &&&-1&1 \end{pmatrix}. $$

Then the foregoing problem can be reformulated as

$$ \min_{x\in{\mathbb{R}^{n}}} \frac{1}{2}\|Ax-a\| ^{2}+ \mu_{1} \|Bx\|_{1}+ \mu_{2} \|x \|_{1}. $$
(5.1)

For this example, we can set \({f_{1}}(x)=\frac{1}{2}\|Ax-a\|^{2}\), \({f_{2}} =\mu_{1}\|\cdot\|_{1}\), \({f_{3}} =\mu_{2}\|\cdot\|_{1}\). We want to show that the PDFP algorithm (1.3) can be applied to solve this generic class of problems (5.1) directly and easily.
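To make this mapping concrete, here is a sketch of how (5.1) plugs into the PDFP iteration, reusing the illustrative `pdfp` and `prox_l1` helpers sketched in Sections 1 and 2.1; the data sizes and parameter values below are placeholders for illustration, not those of the reported experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 50, 200
A = rng.standard_normal((r, n))
a = rng.standard_normal(r)
mu1, mu2 = 2.0, 0.2

# B: (n-1) x n finite-difference matrix, so mu1*||Bx||_1 penalizes successive differences
B = np.zeros((n - 1, n))
B[np.arange(n - 1), np.arange(n - 1)] = -1.0
B[np.arange(n - 1), np.arange(1, n)] = 1.0

grad_f1 = lambda x: A.T @ (A @ x - a)              # gradient of (1/2)||Ax - a||^2
beta = 1.0 / np.linalg.norm(A, 2) ** 2             # 1 / lambda_max(A^T A)
gamma = 1.9 * beta                                 # 0 < gamma < 2*beta
lam = 0.25                                         # lambda_max(B B^T) < 4, so 1/4 is admissible

prox_f3 = lambda x: prox_l1(x, gamma * mu2)        # prox of gamma*mu2*||.||_1
prox_f2 = lambda z: prox_l1(z, gamma * mu1 / lam)  # prox of (gamma/lam)*mu1*||.||_1

x_hat, _ = pdfp(grad_f1, prox_f3, prox_f2, B, np.zeros(n), gamma, lam, n_iter=2000)
```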

The following tests are designed for the simulation. We set \(r=500\), \(n=10\text{,}000\), and the data a is generated as \(Ax+\alpha e\), where the elements of the random matrix A and the random vector e are normally distributed with zero mean and unit variance, \(\alpha=0.01\), and x is a generated sparse vector, whose nonzero elements are shown in Figure 1 by green ‘+’. We set \(\mu_{1}=200\), \(\mu _{2}=20\), and the maximum iteration number \(\mathit{Itn}=1\text{,}500\).

Figure 1 Recovery results for fused LASSO with Condat’s algorithm and PDFP.

We compare the PDFP algorithm with Condat’s algorithm [5]. For the PDFP algorithm, the parameters λ and γ are chosen according to Theorem 3.1. In practice, we set λ close to \(1/\lambda_{\mathrm{max}}(BB^{T})\) and γ close to 2β. Here we set \(\lambda=1/4\), as the \(n-1\) eigenvalues of \(BB^{T}\) can be computed analytically as \(2-2 \cos(i\pi /n)\), \(i = 1, 2,\ldots, n-1\), and \(\gamma=1.99/\lambda_{\mathrm{max}}(A^{T}A)\). For Condat’s algorithm, we set \(\lambda= 0.19/4\), \(\gamma=1.9/\lambda _{\mathrm{max}}(A^{T}A)\), chosen for relatively better numerical performance. The computation time, the attained objective function values, and the relative errors to the true solution are close for Condat’s algorithm and PDFP. From Figure 1, we see that both Condat’s algorithm and PDFP recover the positions and the values of the non-zeros quite correctly.

5.2 Image restoration with non-negative constraint and sparse regularization

A general image restoration problem with non-negative constraint and sparse regularization can be written as

$$ \min_{x\in{C}} \frac{1}{2}\|Ax-a\|^{2} +\mu\|Bx\| _{1}, $$
(5.2)

where A is some linear operator describing the image formation process, \(\|Bx\|_{1}\) is the usual \(\ell_{1}\) based regularization in order to promote sparsity under the transform B, \(\mu>0\) is the regularization parameter. Here we use the isotropic total variation as the regularization functional, thus the matrix B represents the discrete gradient operator. For this example, we can set \(f_{1}(x)=\frac{1}{2}\|Ax-a\|^{2}\), \(f_{2}=\mu\|\cdot\|_{1}\), and \({f_{3}}=\chi_{C}\).

We consider pMRI reconstruction, where \(A=(A_{1}^{T},A_{2}^{T},\ldots, A_{N}^{T})^{T}\) and each \(A_{j}\) is composed of a diagonal downsampling operator D, the Fourier transform F, and a diagonal coil sensitivity mapping \(S_{j}\) for receiver j, i.e. \(A_{j}=DFS_{j}\); the \(S_{j}\) are often estimated in advance. It is well known in the total variation setting that \(\lambda_{\mathrm{max}}(BB^{T})=8\). The related Lipschitz constant of \(\nabla f_{1}\) can be estimated as \(\beta =1\). Therefore the two parameters in PDFP are set as \(\lambda=1/8\) and \(\gamma=2\). The same simulation setting as in [15] is used in this experiment, and we still use the artifact power (AP) and the two-region signal to noise ratio (SNR) to measure the image quality. We refer the reader to [15, 27] for more details.
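For orientation only, the following sketch shows one way the forward operator \(A_{j}=DFS_{j}\) and the gradient of \(f_{1}\) might be assembled, assuming given sensitivity maps normalized so that \(\sum_{j}|S_{j}|^{2}\leq1\) (which makes \(\beta=1\) a valid estimate); the helper names are hypothetical and do not reproduce the exact experimental code.

```python
import numpy as np

def make_pmri_ops(sens, mask):
    """sens: (N, ny, nx) complex coil sensitivities S_j; mask: (ny, nx) 0/1 k-space sampling mask D."""
    def A(x):                 # forward operator: stack of A_j x = D F (S_j x)
        return mask * np.fft.fft2(sens * x, norm="ortho")

    def AH(y):                # adjoint: sum_j S_j^* F^{-1} (D y_j)
        return np.sum(np.conj(sens) * np.fft.ifft2(mask * y, norm="ortho"), axis=0)

    def grad_f1(x, a):        # gradient of (1/2)||Ax - a||^2
        return AH(A(x) - a)

    return A, AH, grad_f1
```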

In the following, we compare the PDFP algorithm (4.1) with the previously proposed algorithms \(\mathrm{PDFP}^{2}\mathrm{O}\) (4.2) and \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\) (4.3). From Figures 2 and 3, we can first see that the introduction of the non-negative constraint in the model (5.2) is beneficial, and we can recover a better solution with a higher two-region SNR and a lower AP value. The non-negative constraint also leads to faster convergence to a stable recovery. Second, \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\) and PDFP are both efficient. For a subsampling rate \(R=2\), \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\) and PDFP both recover better solutions in terms of AP values compared to \(\mathrm{PDFP}^{2}\mathrm{O}\) for the same number of iterations. For \(R=4\), the solutions of \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\) and PDFP have better AP values than those of \(\mathrm{PDFP}^{2}\mathrm{O}\), while using only half the number of iterations of \(\mathrm{PDFP}^{2}\mathrm{O}\). The computation time for PDFP is slightly less than that of \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\). Finally, the iterative solutions of PDFP are always feasible, which could be useful in practice.

Figure 2 Recovery results from four-channel in-vivo spine data with the subsampling ratio \(R=2, 4\). For \(\mathrm{PDFP}^{2}\mathrm{O}\) and PDFP, \(\lambda =1/8\), \(\gamma=2\); for \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\), \(\lambda=1/9\), \(\gamma =2\).

Figure 3 Recovery results from eight-channel in-vivo brain data with the subsampling ratio \(R=2,4\). For \(\mathrm{PDFP}^{2}\mathrm{O}\) and PDFP, \(\lambda =1/8\), \(\gamma=2\); for \(\mathrm{PDFP}^{2}\mathrm{O}_{C}\), \(\lambda=1/9\), \(\gamma =2\).

6 Conclusion

We have extended the algorithms PAPA [20] and \(\mathrm{PDFP}^{2}\mathrm{O}\) [15] to derive a primal-dual fixed point algorithm PDFP (see (1.3)) for solving the minimization of a sum of three-block convex separable functions (1.1). The proposed PDFP algorithm is a symmetric and fully splitting scheme, involving only explicit gradient and linear operators, without any inversion or subproblem solving, provided the proximity operators of the nonsmooth functions can be handled easily. The scheme can easily be adapted to a variety of inverse problems involving the minimization of many terms, and it is suitable for large-scale parallel implementation. In addition, the parameter ranges determined by the convergence analysis are simple and clear, which could be useful for practical applications. Finally, as discussed in Section 5 of [5], we can also extend the current PDFP algorithm to solve multi-block composite (more than three) minimization problems. Preconditioning operators, as proposed in [16, 19, 24, 28], can also be introduced to accelerate the PDFP, which could be a future work for some specific applications.

References

  1. Tibshirani, R, Saunders, M, Rosset, S, Zhu, J, Knight, K: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc., Ser. B, Stat. Methodol. 67(1), 91-108 (2005)


  2. Yuan, M, Lin, Y: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc., Ser. B, Stat. Methodol. 68(1), 49-67 (2006)


  3. Goldstein, T, Osher, S: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2(2), 323-343 (2009)


  4. Combettes, PL, Pesquet, J-C: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20(2), 307-330 (2012)


  5. Condat, L: A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158(2), 460-479 (2013)


  6. Davis, D, Yin, W: A three-operator splitting scheme and its optimization applications (2015). arXiv:1504.01032

  7. Fortin, M, Glowinski, R: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)


  8. Boyd, S, Parikh, N, Chu, E, Peleato, B, Eckstein, J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1-122 (2011)


  9. Zhu, M, Chan, T: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. CAM report 08-34, UCLA (2008)

  10. Esser, E, Zhang, X, Chan, TF: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015-1046 (2010)


  11. Chambolle, A, Pock, T: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120-145 (2011)


  12. Pock, T, Chambolle, A: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In: 2011 International Conference on Computer Vision (ICCV), pp. 1762-1769. IEEE Press, New York (2011)


  13. Zhang, X, Burger, M, Bresson, X, Osher, S: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253-276 (2010)


  14. Loris, I, Verhoeven, C: On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty. Inverse Probl. 27(12), 125007 (2011)


  15. Chen, P, Huang, J, Zhang, X: A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Probl. 29(2), 025011 (2013)


  16. Combettes, PL, Condat, L, Pesquet, J-C, Vu, BC: A forward-backward view of some primal-dual optimization methods in image recovery. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 4141-4145. IEEE Press, New York (2014)


  17. Komodakis, N, Pesquet, J-C: Playing with duality: an overview of recent primal-dual approaches for solving large-scale optimization problems (2014). arXiv:1406.5429

  18. Li, Q, Shen, L, Xu, Y, Zhang, N: Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing. Adv. Comput. Math. 41(2), 387-422 (2015)


  19. Li, Q, Zhang, N: Fast proximity-gradient algorithms for structured convex optimization problems. Preprint (2015)

  20. Krol, A, Li, S, Shen, L, Xu, Y: Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction. Inverse Probl. 28(11), 115005 (2012)


  21. Moreau, J-J: Fonctions convexes duales et points proximaux dans un espace Hilbertien. C. R. Acad. Sci. Paris, Sér. A Math. 255, 2897-2899 (1962)


  22. Combettes, PL, Wajs, VR: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168-1200 (2005)


  23. Chen, P, Huang, J, Zhang, X: A primal-dual fixed point algorithm based on proximity operator for convex set constrained separable problem. J. Nanjing Norm. Univ. Nat. Sci. Ed. 36(3), 1-5 (2013) (in Chinese)


  24. Tang, Y-C, Zhu, C-X, Wen, M, Peng, J-G: A splitting primal-dual proximity algorithm for solving composite optimization problems (2015). arXiv:1507.08413

  25. Briceno-Arias, LM, Combettes, PL: A monotone+skew splitting model for composite monotone inclusions in duality. SIAM J. Control Optim. 21(4), 1230-1250 (2011)


  26. Liu, J, Yuan, L, Ye, J: An efficient algorithm for a class of fused lasso problems. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 323-332. ACM, New York (2010)


  27. Ji, JX, Son, JB, Rane, SD: PULSAR: a Matlab toolbox for parallel magnetic resonance imaging using array coils and multiple channel receivers. Concepts Magn. Reson., Part B Magn. Reson. Eng. 31(1), 24-36 (2007)


  28. Chen, P: Primal-dual fixed point algorithms for convex separable minimization and their applications. PhD thesis, Shanghai Jiao Tong University (2013) (in Chinese)


Acknowledgements

P Chen was partially supported by the PhD research startup foundation of Taiyuan University of Science and Technology (No. 20132024). J Huang was partially supported by NSFC (No. 11571237). X Zhang was partially supported by NSFC (Nos. 91330102 and GZ1025) and 973 program (No. 2015CB856004). We thank the reviewer for pointing out the references [4, 14, 16] and for the pertinent comments and suggestions, which greatly improved the early version of this paper.

Author information


Corresponding author

Correspondence to Xiaoqun Zhang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Chen, P., Huang, J. & Zhang, X. A primal-dual fixed point algorithm for minimization of the sum of three convex separable functions. Fixed Point Theory Appl 2016, 54 (2016). https://doi.org/10.1186/s13663-016-0543-2
