
Projecting onto rectangular matrices with prescribed row and column sums


In 1990, Romero presented a beautiful formula for the projection onto the set of rectangular matrices with prescribed row and column sums. Variants of Romero’s formula were rediscovered by Khoury and by Glunt, Hayden, and Reams for bistochastic (square) matrices in 1998. These results have found various generalizations and applications.

In this paper, we provide a formula for the more general problem of finding the projection onto the set of rectangular matrices with prescribed scaled row and column sums. Our approach is based on computing the Moore–Penrose inverse of a certain linear operator associated with the problem. In fact, our analysis holds even for Hilbert–Schmidt operators, and we do not have to assume consistency. We also perform numerical experiments featuring the new projection operator.

1 Motivation

A matrix in \(\mathbb{R}^{n\times n}\) is called bistochastic if all its entries are nonnegative and all its row and column sums equal 1. More generally, a matrix is generalized bistochastic if the nonnegativity requirement is dropped. The bistochastic matrices form a convex polytope B in \(\mathbb{R}^{n\times n}\), commonly called the Birkhoff polytope, whose extreme points are the permutation matrices (a seminal result due to Birkhoff and von Neumann). A lovely formula provided in 1998 by Khoury [8], and also by Glunt et al. [5], gives the projection of any matrix onto G, the affine subspace of generalized bistochastic matrices (see Example 3.8). More generally, sets of nonnegative rectangular matrices with prescribed row and column sums are called transportation polytopes. If the nonnegativity assumption is dropped, an explicit projection formula (see Remark 3.5) was already provided by Romero in 1990, which even predates the square case! On the other hand, the projection onto the set of nonnegative matrices N is simple: just replace every negative entry with 0. No explicit formula is known for projecting a matrix onto the set of bistochastic matrices; however, because \(B=G\cap N\), one may apply algorithms such as Dykstra’s algorithm to iteratively approximate the projection onto B by using the projection operators \(P_{G}\) and \(P_{N}\) (see, e.g., Takouda [12] for details). In the case of transportation polytopes, Calvillo and Romero [4] provided algorithms that even converge in finitely many steps.

The goal of this paper is to provide explicit projection operators in more general settings. Specifically, we present a formula for the projection onto the set of rectangular matrices with prescribed scaled row and column sums. Such problems arise, e.g., in discrete tomography [13] and the study of transportation polytopes [4]. Our approach uses the Moore–Penrose inverse of a certain linear operator \(\mathcal{A}\). It turns out that our analysis works even for Hilbert–Schmidt operators because the range of \(\mathcal{A}\) can be determined explicitly and is closed. Our main references are [3, 7] (for Hilbert–Schmidt operators) and [6] (for the Moore–Penrose inverse). We also note that consistency is not required.

The paper is organized as follows. After recording a useful result involving the Moore–Penrose inverse at the end of this section, we prove our main results in Sect. 2. These results are then specialized to rectangular matrices in Sect. 3. We then turn to numerical experiments in Sect. 4, where we compare the performance of three popular algorithms: Douglas–Rachford, the method of alternating projections, and Dykstra.

We conclude this introductory section with a result which we believe to be part of the folklore (although we were not able to pinpoint a crisp reference). It is formulated using the Moore–Penrose inverse of an operator—for the definition of the Moore–Penrose inverse and its basic properties, see [6] (and also [3, pages 57–59] for a crash course). The formula presented works even in the case when the problem is inconsistent and automatically provides a least squares solution.

Proposition 1.1

Let \(A\colon X\to Y\) be a continuous linear operator with closed range between two real Hilbert spaces. Let \(b\in Y\), set \(\bar{b}:= P_{\operatorname{ran}A}b\), and set \(C:= A^{-1}\bar{b}\). Then

$$\begin{aligned} (\forall x\in X)\quad P_{C}x = x - A^{\dagger }(Ax-b), \end{aligned}$$

where \(A^{\dagger }\) denotes the Moore–Penrose inverse of A.


Proof. Clearly, \(\bar{b}\in \operatorname{ran}A\); hence, \(C\neq \varnothing \). Let \(x\in X\). It is well known (see, e.g., [3, Example 29.17(ii)]) that

$$\begin{aligned} P_{C}x = x - A^{\dagger }(Ax-\bar{b}). \end{aligned}$$

On the other hand,

$$\begin{aligned} A^{\dagger }\bar{b} =A^{\dagger }P_{\operatorname{ran}A}b =A^{\dagger }AA^{\dagger }b= A^{\dagger }b \end{aligned}$$

using the fact that \(AA^{\dagger }= P_{\operatorname{ran}A}\) (see, e.g., [3, Proposition 3.30(ii)]) and \(A^{\dagger }A A^{\dagger }= A^{\dagger }\) (see, e.g., [6, Section II.2]). Altogether, \(P_{C}x=x-A^{\dagger }(Ax-b)\) as claimed. □
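In finite dimensions, Proposition 1.1 can be checked numerically. The following is a minimal sketch, assuming \(X=\mathbb{R}^{3}\), \(Y=\mathbb{R}^{4}\), and NumPy's `np.linalg.pinv` as the Moore–Penrose inverse; the function name `project_onto_preimage` and the cutoff `rcond=1e-10` are our own illustrative choices. The system is deliberately rank-deficient and inconsistent, so the projection lands on the set of least-squares solutions.

```python
import numpy as np

def project_onto_preimage(A, b, x, rcond=1e-10):
    """P_C x = x - A^+ (A x - b) with C = A^{-1}(P_{ran A} b).

    If A x = b is inconsistent, C is the set of least-squares solutions.
    """
    A_pinv = np.linalg.pinv(A, rcond=rcond)
    return x - A_pinv @ (A @ x - b)

# Rank-2 matrix: third column = first column + second column.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])   # not in ran A, so A x = b is inconsistent
x = np.array([1.0, -1.0, 2.0])

p = project_onto_preimage(A, b, x)
A_pinv = np.linalg.pinv(A, rcond=1e-10)
b_bar = A @ A_pinv @ b                  # P_{ran A} b
# p lies in C (A p = b_bar) and p - x is orthogonal to ker A.
print(np.allclose(A @ p, b_bar))
print(np.allclose((np.eye(3) - A_pinv @ A) @ (p - x), 0.0))
```

The explicit `rcond` guards against a numerically tiny, but nonzero, singular value being inverted for this exactly rank-deficient matrix.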

2 Hilbert–Schmidt operators

From now on, we assume that

$$\begin{aligned} \text{$X$ and $Y$ are two real Hilbert spaces}, \end{aligned}$$

which in turn give rise to the real Hilbert space

$$\begin{aligned} \mathcal{H}:= \{{T\colon X\to Y} | { \text{$T$ is Hilbert--Schmidt}} \}. \end{aligned}$$

Hilbert–Schmidt operators encompass rectangular matrices (even with infinitely many entries, as long as these are square summable) as well as certain integral operators. (We refer the reader to [7, Sect. 2.6] for basic results on Hilbert–Schmidt operators and also recommend [10, Section VI.6].) Moreover, \(\mathcal{H}\) contains (and is generated by) the rank-one operators of the form

$$\begin{aligned} (v\otimes u)\colon X \to Y \colon x\mapsto \langle {u},{x} \rangle v, \end{aligned}$$

where \((v,u)\in Y\times X\). Such an operator has the adjoint

$$\begin{aligned} (v\otimes u)^{*}\colon Y \to X \colon y\mapsto \langle {v},{y} \rangle u, \end{aligned}$$

i.e.,

$$\begin{aligned} (v\otimes u)^{*}=u\otimes v. \end{aligned}$$

Moreover, for \(u\in X\) and \(v\in Y\), the operators \(u\otimes u\colon X\to X\) and \(v\otimes v\colon Y\to Y\) satisfy

$$\begin{aligned} u\otimes u = \Vert u \Vert ^{2}P_{ \mathbb{R}u}\quad \text{and}\quad v \otimes v = \Vert v \Vert ^{2}P_{ \mathbb{R}v}. \end{aligned}$$
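For matrices, i.e., \(X=\mathbb{R}^{n}\) and \(Y=\mathbb{R}^{m}\), the rank-one operator \(v\otimes u\) is the outer product \(vu^{\mathsf{T}}\) and the adjoint is the transpose. A small sketch checking the identities above; the helper name `rank_one` is hypothetical.

```python
import numpy as np

def rank_one(v, u):
    """Matrix of v ⊗ u : x ↦ <u, x> v, i.e. the outer product v u^T."""
    return np.outer(v, u)

u = np.array([1.0, 2.0, 2.0])   # u in X = R^3
v = np.array([3.0, -1.0])       # v in Y = R^2
x = np.array([0.5, -1.0, 4.0])
y = np.array([2.0, 5.0])

Tx = rank_one(v, u) @ x            # should equal <u, x> v
Tstar_y = rank_one(v, u).T @ y     # adjoint = transpose; should equal <v, y> u
P_u = rank_one(u, u) / (u @ u)     # u ⊗ u / ||u||^2, the projector onto R u
```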

For the rest of the paper, we fix

$$\begin{aligned} e\in X\quad \text{and}\quad f \in Y, \end{aligned}$$

and set

$$\begin{aligned} \mathcal{A} \colon \mathcal{H}\to Y\times X \colon T \mapsto \bigl(Te,T^{*}f\bigr). \end{aligned}$$

Proposition 2.1

\(\mathcal{A}\) is a continuous linear operator and \(\|\mathcal{A}\| = \sqrt{\|e\|^{2}+\|f\|^{2}}\).


Proof. Clearly, \(\mathcal{A}\) is a linear operator. Moreover, \((\forall T\in \mathcal{H})\) \(\|\mathcal{A}(T)\|^{2} = \|Te\|^{2} + \|T^{*}f\|^{2} \leq \|T\|_{ \mathsf{op}}^{2}\|e\|^{2}+ \|T^{*}\|_{\mathsf{op}}^{2}\|f\|^{2} \leq \|T\|^{2}(\|e\|^{2}+\|f\|^{2})\) because the Hilbert–Schmidt norm dominates the operator norm. It follows that \(\mathcal{A}\) is continuous and \(\|\mathcal{A}\|\leq \sqrt{\|e\|^{2}+\|f\|^{2}}\). On the other hand, if \(T = f\otimes e\), then \(\|T\| = \|e\|\|f\|\), \(\mathcal{A}(T) = (\|e\|^{2}f,\|f\|^{2}e)\) and hence \(\|\mathcal{A}(T)\|=\|T\|\sqrt{\|e\|^{2}+\|f\|^{2}}\). Thus \(\|\mathcal{A}\| \geq \sqrt{\|e\|^{2}+\|f\|^{2}}\). Combining these observations, we obtain \(\|\mathcal{A}\| = \sqrt{\|e\|^{2}+\|f\|^{2}}\). □
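A numerical sanity check of Proposition 2.1 in the matrix setting. This sketch assumes \(X=\mathbb{R}^{n}\), \(Y=\mathbb{R}^{m}\), and \(\mathcal{H}=\mathbb{R}^{m\times n}\) with the Frobenius inner product, and represents \(\mathcal{A}\) as a matrix acting on the row-major vectorization of T; the helper `matrix_of_A` is our own construction, not taken from the paper.

```python
import numpy as np

def matrix_of_A(e, f):
    """Matrix of A: T -> (T e, T^T f) acting on T.reshape(-1) (row-major).

    Here T is m x n, e lies in X = R^n and f lies in Y = R^m; the
    Hilbert-Schmidt norm of T is its Frobenius norm.
    """
    m, n = f.size, e.size
    top = np.kron(np.eye(m), e[None, :])      # row i computes (T e)_i
    bottom = np.kron(f[None, :], np.eye(n))   # row j computes (T^T f)_j
    return np.vstack([top, bottom])

rng = np.random.default_rng(1)
m, n = 4, 3
e = rng.standard_normal(n)
f = rng.standard_normal(m)
T = rng.standard_normal((m, n))

M = matrix_of_A(e, f)
norm_A = np.linalg.norm(M, 2)          # largest singular value = ||A||
predicted = np.sqrt(e @ e + f @ f)     # value from Proposition 2.1
print(norm_A, predicted)
```

Because the vectorization is an isometry for the Frobenius norm, the spectral norm of `M` is exactly the operator norm of \(\mathcal{A}\).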

We now prove that \(\operatorname{ran}\mathcal{A}\) is always closed.

Proposition 2.2

(Range of \(\mathcal{A}\) is closed)

The following hold:

(i) If \(e=0\) and \(f=0\), then \(\operatorname{ran}\mathcal{A}= \{0\}\times \{0\}\).

(ii) If \(e=0\) and \(f\neq 0\), then \(\operatorname{ran}\mathcal{A}= \{0\}\times X\).

(iii) If \(e\neq 0\) and \(f=0\), then \(\operatorname{ran}\mathcal{A}= Y\times \{0\}\).

(iv) If \(e\neq 0\) and \(f\neq 0\), then \(\operatorname{ran}\mathcal{A}= \{(f,-e)\}^{\perp }\).

Consequently, \(\operatorname{ran}\mathcal{A}\) is always a closed linear subspace of \(Y\times X\).


Proof. (i): Clear.

(ii): Obviously, \(\operatorname{ran}\mathcal{A}\subseteq \{0\}\times X\). Conversely, let \(x\in X\) and set

$$\begin{aligned} T:= \frac{1}{ \Vert f \Vert ^{2}}f\otimes x. \end{aligned}$$

Then \(Te=T0 = 0\) and

$$\begin{aligned} T^{*}f = \frac{1}{ \Vert f \Vert ^{2}}(f\otimes x)^{*}f = \frac{1}{ \Vert f \Vert ^{2}} \langle {f},{f} \rangle x = x, \end{aligned}$$

and thus \((0,x)=(Te,T^{*}f)=\mathcal{A}(T)\in \operatorname{ran}\mathcal{A}\).

(iii): Obviously, \(\operatorname{ran}\mathcal{A}\subseteq Y\times \{0\}\). Conversely, let \(y\in Y\) and set

$$\begin{aligned} T:= \frac{1}{ \Vert e \Vert ^{2}}y\otimes e. \end{aligned}$$

Then \(T^{*}f=T^{*}0 = 0\) and

$$\begin{aligned} Te = \frac{1}{ \Vert e \Vert ^{2}}(y\otimes e)e = \frac{1}{ \Vert e \Vert ^{2}} \langle {e},{e} \rangle y = y, \end{aligned}$$

and thus \((y,0)=(Te,T^{*}f)=\mathcal{A}(T)\in \operatorname{ran}\mathcal{A}\).

(iv): If \((y,x)\in \operatorname{ran}\mathcal{A}\), say \((y,x)=\mathcal{A}(T)=(Te,T^{*}f)\) for some \(T\in \mathcal{H}\), then

$$\begin{aligned} \langle {f},{y} \rangle &= \langle {f},{Te} \rangle = \bigl\langle {T^{*}f},{e} \bigr\rangle = \langle {x},{e} \rangle, \end{aligned}$$

i.e., \((y,x)\perp (f,-e)\). It follows that \(\operatorname{ran}\mathcal{A}\subseteq \{(f,-e)\}^{\perp }\).

Conversely, let \((y,x)\in \{(f,-e)\}^{\perp }\), i.e., \(\langle {e},{x} \rangle = \langle {f},{y} \rangle \).

Case 1: \(\langle {e},{x} \rangle = \langle {f},{y} \rangle \neq 0\). Set

$$\begin{aligned} \zeta:= \frac{1}{ \langle {x},{e} \rangle }= \frac{1}{ \langle {y},{f} \rangle }\quad \text{and}\quad T := \zeta (y\otimes x)\in \mathcal{H}. \end{aligned}$$

Note that

$$\begin{aligned} Te = \zeta (y\otimes x)e = \zeta \langle {x},{e} \rangle y = y \end{aligned}$$

and

$$\begin{aligned} T^{*}f = \zeta (y\otimes x)^{*}f = \zeta \langle {y},{f} \rangle x = x; \end{aligned}$$

therefore, \((y,x)=(Te,T^{*}f)=\mathcal{A}(T)\in \operatorname{ran}\mathcal{A}\).

Case 2: \(\langle {e},{x} \rangle = \langle {f},{y} \rangle =0\).

Pick ξ and η in \(\mathbb{R}\) such that

$$\begin{aligned} \xi \Vert f \Vert ^{2}=1\quad \text{and}\quad \eta \Vert e \Vert ^{2}=1, \end{aligned}$$

and set

$$\begin{aligned} T:= \xi (f\otimes x) + \eta (y\otimes e)\in \mathcal{H}. \end{aligned}$$

Then

$$\begin{aligned} Te = \xi (f\otimes x)e + \eta (y\otimes e)e = \xi \langle {x},{e} \rangle f + \eta \langle {e},{e} \rangle y = 0f + \eta \Vert e \Vert ^{2}y = y \end{aligned}$$

and

$$\begin{aligned} T^{*}f = \xi (f\otimes x)^{*}f + \eta (y\otimes e)^{*}f = \xi \langle {f},{f} \rangle x + \eta \langle {y},{f} \rangle e = \xi \Vert f \Vert ^{2}x + 0e = x. \end{aligned}$$

Thus \((y,x)=(Te,T^{*}f)=\mathcal{A}(T)\in \operatorname{ran}\mathcal{A}\). □
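Proposition 2.2 can be illustrated in the matrix setting (\(\mathcal{H}=\mathbb{R}^{m\times n}\) with the Frobenius inner product). In case (iv) the range should be a hyperplane of \(Y\times X=\mathbb{R}^{m+n}\) with normal \((f,-e)\); in case (ii) it should be \(\{0\}\times X\). The helper `matrix_of_A` below is a hypothetical construction representing \(\mathcal{A}\) on the row-major vectorization of T.

```python
import numpy as np

def matrix_of_A(e, f):
    """Matrix of A: T -> (T e, T^T f) on row-major vec(T), T of size m x n."""
    m, n = f.size, e.size
    return np.vstack([np.kron(np.eye(m), e[None, :]),
                      np.kron(f[None, :], np.eye(n))])

rng = np.random.default_rng(2)
m, n = 4, 3
e = rng.standard_normal(n)
f = rng.standard_normal(m)

# Case (iv): ran A = {(f, -e)}^perp, a hyperplane in Y x X = R^(m+n).
M = matrix_of_A(e, f)
rank_iv = np.linalg.matrix_rank(M)
normal = np.concatenate([f, -e])
residual = M.T @ normal    # zero iff (f, -e) is orthogonal to every A(T)

# Case (ii): e = 0 and f != 0, so ran A = {0} x X has dimension n.
M_ii = matrix_of_A(np.zeros(n), f)
rank_ii = np.linalg.matrix_rank(M_ii)
```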

We now turn to the adjoint of \(\mathcal{A}\).

Proposition 2.3

(Adjoint of \(\mathcal{A}\))

We have

$$\begin{aligned} \mathcal{A}^{*}\colon {Y}\times {X} \to \mathcal{H} \colon (y,x) \mapsto y\otimes e + f \otimes x. \end{aligned}$$


Proof. Let \(T\in \mathcal{H}\) and \((y,x)\in Y\times X\). Let B be any orthonormal basis of X. Then

$$\begin{aligned} \bigl\langle {\mathcal{A}(T)},{(y,x)} \bigr\rangle &= \bigl\langle { \bigl(Te,T^{*}f\bigr)},{(y,x)} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \langle {Te},{y} \rangle + \bigl\langle {T^{*}f},{x} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \bigl\langle {e},{T^{*}y} \bigr\rangle + \bigl\langle {T^{*}f},{x} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \sum_{b\in B} \bigl( \langle {e},{b} \rangle \bigl\langle {b},{T^{*}y} \bigr\rangle + \bigl\langle {T^{*}f},{b} \bigr\rangle \langle {b},{x} \rangle \bigr) \end{aligned}$$
$$\begin{aligned} &= \sum_{b\in B} \bigl\langle {Tb},{ \langle {e},{b} \rangle y} \bigr\rangle +\sum_{b\in B} \bigl\langle {Tb},{ \langle {x},{b} \rangle f} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \sum_{b\in B} \bigl\langle {Tb},{(y\otimes e)b} \bigr\rangle + \sum_{b\in B} \bigl\langle {Tb},{(f\otimes x)b} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \langle {T},{y\otimes e} \rangle + \langle {T},{f \otimes x} \rangle \end{aligned}$$
$$\begin{aligned} &= \langle {T},{y\otimes e + f \otimes x} \rangle, \end{aligned}$$

which proves the result. □
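In the matrix setting, Proposition 2.3 says that the transpose of the matrix representing \(\mathcal{A}\) sends \((y,x)\) to \(ye^{\mathsf{T}}+fx^{\mathsf{T}}\). A short sketch under the same assumptions as before (\(\mathcal{H}=\mathbb{R}^{m\times n}\), Frobenius inner product, row-major vectorization); `matrix_of_A` and `A_adjoint` are illustrative names.

```python
import numpy as np

def matrix_of_A(e, f):
    """Matrix of A: T -> (T e, T^T f) on row-major vec(T), T of size m x n."""
    m, n = f.size, e.size
    return np.vstack([np.kron(np.eye(m), e[None, :]),
                      np.kron(f[None, :], np.eye(n))])

def A_adjoint(y, x, e, f):
    """Proposition 2.3: A^*(y, x) = y ⊗ e + f ⊗ x, as an m x n matrix."""
    return np.outer(y, e) + np.outer(f, x)

rng = np.random.default_rng(3)
m, n = 4, 3
e, x = rng.standard_normal(n), rng.standard_normal(n)
f, y = rng.standard_normal(m), rng.standard_normal(m)
T = rng.standard_normal((m, n))

M = matrix_of_A(e, f)
via_transpose = (M.T @ np.concatenate([y, x])).reshape(m, n)
lhs = np.concatenate([T @ e, T.T @ f]) @ np.concatenate([y, x])  # <A(T), (y,x)>
rhs = np.sum(T * A_adjoint(y, x, e, f))                           # <T, A^*(y,x)>
```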

We now have all the ingredients needed to compute the Moore–Penrose inverse of \(\mathcal{A}\).

Theorem 2.4

(Moore–Penrose inverse of \(\mathcal{A}\) part 1)

Suppose that \(e\neq 0\) and \(f\neq 0\). Let \((y,x)\in Y\times X\). Then

$$\begin{aligned} \mathcal{A}^{\dagger }(y,x) = \frac{1}{ \Vert e \Vert ^{2}} \biggl( y \otimes e - \frac{ \langle {f},{{y}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr) + \frac{1}{ \Vert f \Vert ^{2}} \biggl( f \otimes x - \frac{ \langle {e},{x} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr). \end{aligned}$$



Proof. Set

$$\begin{aligned} (v,u):= \biggl(\frac{1}{ \Vert e \Vert ^{2}} \biggl( y - \frac{ \langle {y},{f} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}} f \biggr), \frac{1}{ \Vert f \Vert ^{2}} \biggl( x - \frac{ \langle {x},{e} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}} e \biggr) \biggr). \end{aligned}$$


Then

$$\begin{aligned} \langle {f},{v} \rangle &= \frac{1}{ \Vert e \Vert ^{2}} \biggl( \langle {f},{y} \rangle - \frac{ \langle {y},{f} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}} \langle {f},{f} \rangle \biggr) \end{aligned}$$
$$\begin{aligned} &=\frac{ \langle {f},{y} \rangle }{ \Vert e \Vert ^{2}} \biggl( 1- \frac{ \Vert f \Vert ^{2}}{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}} \biggr) \end{aligned}$$
$$\begin{aligned} &=\frac{ \langle {f},{y} \rangle }{ \Vert e \Vert ^{2}} \cdot \frac{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2} - \Vert f \Vert ^{2}}{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}} \end{aligned}$$
$$\begin{aligned} &= \frac{ \langle {f},{y} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}}, \end{aligned}$$

and similarly

$$\begin{aligned} \langle {e},{u} \rangle = \frac{ \langle {e},{x} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}}. \end{aligned}$$

Substituting these two identities into the definition of \((v,u)\) yields

$$\begin{aligned} (v,u) = \biggl(\frac{1}{ \Vert e \Vert ^{2}} \bigl( y - \langle {f},{{v}} \rangle f \bigr), \frac{1}{ \Vert f \Vert ^{2}} \bigl( x - \langle {e},{u} \rangle e \bigr) \biggr). \end{aligned}$$


Equivalently,

$$\begin{aligned} y = \Vert e \Vert ^{2}v + \langle {f},{v} \rangle f\quad \text{and}\quad x = \Vert f \Vert ^{2}u + \langle {e},{u} \rangle e. \end{aligned}$$

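Therefore, using Proposition 2.3, the definition of \(\mathcal{A}\), the formula for the adjoint of a rank-one operator, Proposition 2.3 again, and the identities for y and x just derived, we obtain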

$$\begin{aligned} \mathcal{A}^{*}\mathcal{A}\mathcal{A}^{*}(v,u) &= \mathcal{A}^{*} \mathcal{A}(v\otimes e+f\otimes u) \end{aligned}$$
$$\begin{aligned} &= \mathcal{A}^{*} \bigl((v\otimes e)e + (f\otimes u) e, (v\otimes e)^{*} f+(f\otimes u)^{*} f \bigr) \end{aligned}$$
$$\begin{aligned} &= \mathcal{A}^{*} \bigl( \Vert e \Vert ^{2}v + \langle {e},{u} \rangle f, \langle {f},{v} \rangle e+ \Vert f \Vert ^{2} u \bigr) \end{aligned}$$
$$\begin{aligned} &= \bigl( \Vert e \Vert ^{2}v + \langle {e},{u} \rangle f \bigr) \otimes e + f\otimes \bigl( \langle {f},{v} \rangle e+ \Vert f \Vert ^{2} u \bigr) \end{aligned}$$
$$\begin{aligned} &= \Vert e \Vert ^{2}v\otimes e + \langle {e},{u} \rangle f \otimes e + \langle {f},{v} \rangle f\otimes e+ \Vert f \Vert ^{2}f \otimes u \end{aligned}$$
$$\begin{aligned} &= \bigl( \Vert e \Vert ^{2}v+ \langle {f},{v} \rangle f \bigr) \otimes e + f\otimes \bigl( \Vert f \Vert ^{2}u + \langle {e},{u} \rangle e \bigr) \end{aligned}$$
$$\begin{aligned} &=y \otimes e + f \otimes x \end{aligned}$$
$$\begin{aligned} &=\mathcal{A}^{*}(y,x). \end{aligned}$$

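To sum up, we found \(\mathcal{A}^{*}(v,u)\in \operatorname{ran}\mathcal{A}^{*} = (\ker \mathcal{A})^{\perp }\) (the range of \(\mathcal{A}^{*}\) is closed because that of \(\mathcal{A}\) is, by Proposition 2.2) such that \(\mathcal{A}^{*}\mathcal{A}\mathcal{A}^{*}(v,u) = \mathcal{A}^{*}(y,x)\). By [3, Proposition 3.30(i)], the above representation of \((v,u)\), and Proposition 2.3, we deduce that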

$$\begin{aligned} \mathcal{A}^{\dagger }(y,x) &=\mathcal{A}^{*}(v,u) \end{aligned}$$
$$\begin{aligned} &=\mathcal{A}^{*} \biggl(\frac{1}{ \Vert e \Vert ^{2}} \bigl( y - \langle {f},{{v}} \rangle f \bigr), \frac{1}{ \Vert f \Vert ^{2}} \bigl( x - \langle {e},{u} \rangle e \bigr) \biggr) \end{aligned}$$
$$\begin{aligned} &=\frac{1}{ \Vert e \Vert ^{2}} \bigl( y\otimes e - \langle {f},{{v}} \rangle f\otimes e \bigr) + \frac{1}{ \Vert f \Vert ^{2}} \bigl( f\otimes x - \langle {e},{u} \rangle f\otimes e \bigr), \end{aligned}$$

which yields the claimed formula after substituting the expressions for \(\langle f,v\rangle\) and \(\langle e,u\rangle\) computed above. □
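Theorem 2.4 can be compared against a generic numerical pseudoinverse. This is a sketch in the matrix setting (\(\mathcal{H}=\mathbb{R}^{m\times n}\), Frobenius inner product); `matrix_of_A` and `A_dagger` are hypothetical helper names, and `rcond=1e-10` is our choice so that the single zero singular value of \(\mathcal{A}\) is discarded.

```python
import numpy as np

def matrix_of_A(e, f):
    """Matrix of A: T -> (T e, T^T f) on row-major vec(T), T of size m x n."""
    m, n = f.size, e.size
    return np.vstack([np.kron(np.eye(m), e[None, :]),
                      np.kron(f[None, :], np.eye(n))])

def A_dagger(y, x, e, f):
    """Closed form of Theorem 2.4 (assumes e != 0 and f != 0)."""
    ne, nf = e @ e, f @ f
    total = ne + nf
    fe = np.outer(f, e)
    return ((np.outer(y, e) - (f @ y) / total * fe) / ne
            + (np.outer(f, x) - (e @ x) / total * fe) / nf)

rng = np.random.default_rng(4)
m, n = 5, 3
e, x = rng.standard_normal(n), rng.standard_normal(n)
f, y = rng.standard_normal(m), rng.standard_normal(m)

closed_form = A_dagger(y, x, e, f)
M = matrix_of_A(e, f)
numerical = (np.linalg.pinv(M, rcond=1e-10) @ np.concatenate([y, x])).reshape(m, n)
```

The closed form costs only a few outer products, whereas the numerical pseudoinverse of the \((m+n)\times mn\) matrix grows quickly with the problem size.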

Theorem 2.5

(Moore–Penrose inverse of \(\mathcal{A}\) part 2)

Let \((y,x)\in Y\times X\). Then the following hold:

(i) If \(e=0\) and \(f\neq 0\), then \(\mathcal{A}^{\dagger }(y,x) = \frac{1}{\|f\|^{2}} f \otimes x\).

(ii) If \(e\neq 0\) and \(f= 0\), then \(\mathcal{A}^{\dagger }(y,x) = \frac{1}{\|e\|^{2}} y \otimes e\).

(iii) If \(e=0\) and \(f=0\), then \(\mathcal{A}^{\dagger }(y,x) = 0\in \mathcal{H}\).


Proof. Let \(T\in \mathcal{H}\).

(i): In this case, \(\mathcal{A}(T) = (0,T^{*}f)\) and \(\mathcal{A}^{*}(y,x) = f\otimes x\). Let us verify the Penrose conditions [6, p.48]. First, using the definition of rank-one operators and the formula for their adjoints,

$$\begin{aligned} \mathcal{A}\mathcal{A}^{\dagger }(y,x) &= \mathcal{A} \bigl( \Vert f \Vert ^{-2}f \otimes x \bigr) = \Vert f \Vert ^{-2} \bigl((f\otimes x)e,(f\otimes x)^{*}f \bigr) \end{aligned}$$
$$\begin{aligned} &= \Vert f \Vert ^{-2} \bigl(0, \langle {f},{f} \rangle x \bigr) = (0,x) \end{aligned}$$


Hence, for every \((v,u)\in Y\times X\),

$$\begin{aligned} \bigl\langle {\mathcal{A}\mathcal{A}^{\dagger }(y,x)},{(v,u)} \bigr\rangle = \bigl\langle {(0,x)},{(v,u)} \bigr\rangle = \langle {x},{u} \rangle = \bigl\langle {\mathcal{A} \mathcal{A}^{\dagger }(v,u)},{(y,x)} \bigr\rangle , \end{aligned}$$

which shows that \(\mathcal{A}\mathcal{A}^{\dagger }\) is indeed self-adjoint.


Second,

$$\begin{aligned} \mathcal{A}^{\dagger }\mathcal{A}(T) = \mathcal{A}^{\dagger } \bigl(Te,T^{*}f\bigr) = \mathcal{A}^{\dagger }\bigl(0,T^{*}f \bigr) = \Vert f \Vert ^{-2}f\otimes \bigl(T^{*}f\bigr), \end{aligned}$$

and if \(S\in \mathcal{H}\) and B is any orthonormal basis of X, then

$$\begin{aligned} \bigl\langle {\mathcal{A}^{\dagger }\mathcal{A}(T)},{S} \bigr\rangle &= \Vert f \Vert ^{-2} \bigl\langle {f\otimes \bigl(T^{*}f \bigr)},{S} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \Vert f \Vert ^{-2}\sum_{b\in B} \bigl\langle {\bigl(f\otimes \bigl(T^{*}f\bigr)\bigr)b},{Sb} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \Vert f \Vert ^{-2}\sum_{b\in B} \bigl\langle { \bigl\langle {T^{*}f},{b} \bigr\rangle f},{Sb} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \Vert f \Vert ^{-2}\sum_{b\in B} \bigl\langle { \langle {f},{Tb} \rangle f},{Sb} \bigr\rangle \end{aligned}$$
$$\begin{aligned} &= \Vert f \Vert ^{-2}\sum_{b\in B} \langle {f},{Tb} \rangle \langle {f},{Sb} \rangle \end{aligned}$$
$$\begin{aligned} &= \bigl\langle {\mathcal{A}^{\dagger }\mathcal{A}(S)},{T} \bigr\rangle , \end{aligned}$$

which yields the symmetry of \(\mathcal{A}^{\dagger }\mathcal{A}\).

Thirdly, using the expression for \(\mathcal{A}^{\dagger }\mathcal{A}(T)\) obtained above and the assumption that \(e=0\), we have

$$\begin{aligned} \mathcal{A}\mathcal{A}^{\dagger }\mathcal{A}(T) &= \mathcal{A} \bigl( \Vert f \Vert ^{-2}f\otimes \bigl(T^{*}f\bigr) \bigr) = \Vert f \Vert ^{-2} \bigl(0,\bigl(f\otimes \bigl(T^{*}f\bigr) \bigr)^{*}f \bigr) \end{aligned}$$
$$\begin{aligned} &= \Vert f \Vert ^{-2} \bigl(0, \langle {f},{f} \rangle T^{*}f \bigr) = \bigl(0,T^{*}f\bigr) \end{aligned}$$
$$\begin{aligned} &=\mathcal{A}(T). \end{aligned}$$

And finally, using the identity \(\mathcal{A}\mathcal{A}^{\dagger }(y,x)=(0,x)\) established above, we have

$$\begin{aligned} \mathcal{A}^{\dagger }\mathcal{A}\mathcal{A}^{\dagger }(y,x) = \mathcal{A}^{\dagger }(0,x) = \Vert f \Vert ^{-2}f\otimes x = \mathcal{A}^{\dagger }(y,x). \end{aligned}$$

(ii): This can be proved similarly to (i).

(iii): In this case, \(\mathcal{A}\) is the zero operator and hence the Desoer–Whalen conditions (see [6, page 51]) make it obvious that \(\mathcal{A}^{\dagger }\) is the zero operator as well. □

Let us define the auxiliary function

$$\begin{aligned} \delta (\xi ):= \textstyle\begin{cases} \xi &\text{if $\xi \neq 0$;} \\ 1 &\text{if $\xi =0$,} \end{cases}\displaystyle \end{aligned}$$

which allows us to combine the previous two results into one.

Corollary 2.6

Let \((y,x)\in Y\times X\). Then

$$\begin{aligned} \mathcal{A}^{\dagger }(y,x) ={}& \frac{1}{\delta ( \Vert e \Vert ^{2})} \biggl( y \otimes e - \frac{ \langle {f},{{y}} \rangle }{\delta ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2})} f\otimes e \biggr) \end{aligned}$$
$$\begin{aligned} &{} + \frac{1}{\delta ( \Vert f \Vert ^{2})} \biggl( f \otimes x - \frac{ \langle {e},{x} \rangle }{\delta ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2})} f\otimes e \biggr). \end{aligned}$$
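The combined formula of Corollary 2.6 can be checked against a numerical pseudoinverse in all four cases for \((e,f)\). This is a sketch in the matrix setting (\(\mathcal{H}=\mathbb{R}^{m\times n}\)); `delta`, `matrix_of_A`, and `A_dagger` are our own illustrative names, and `rcond=1e-10` is an assumed cutoff for the numerical reference.

```python
import numpy as np

def matrix_of_A(e, f):
    """Matrix of A: T -> (T e, T^T f) on row-major vec(T), T of size m x n."""
    m, n = f.size, e.size
    return np.vstack([np.kron(np.eye(m), e[None, :]),
                      np.kron(f[None, :], np.eye(n))])

def delta(t):
    """Auxiliary function: delta(t) = t if t != 0, and 1 otherwise."""
    return t if t != 0 else 1.0

def A_dagger(y, x, e, f):
    """Corollary 2.6: one formula covering all four cases for (e, f)."""
    ne, nf = e @ e, f @ f
    total = delta(ne + nf)
    fe = np.outer(f, e)
    return ((np.outer(y, e) - (f @ y) / total * fe) / delta(ne)
            + (np.outer(f, x) - (e @ x) / total * fe) / delta(nf))

rng = np.random.default_rng(5)
m, n = 4, 3
e, x = rng.standard_normal(n), rng.standard_normal(n)
f, y = rng.standard_normal(m), rng.standard_normal(m)

cases = {
    "e != 0, f != 0": (e, f),
    "e = 0, f != 0": (np.zeros(n), f),
    "e != 0, f = 0": (e, np.zeros(m)),
    "e = 0, f = 0": (np.zeros(n), np.zeros(m)),
}
errors = {}
for name, (ec, fc) in cases.items():
    M = matrix_of_A(ec, fc)
    numerical = np.linalg.pinv(M, rcond=1e-10) @ np.concatenate([y, x])
    errors[name] = np.abs(A_dagger(y, x, ec, fc) - numerical.reshape(m, n)).max()
print(errors)
```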

We now turn to formulas for \(P_{\operatorname{ran}\mathcal{A}}\) and \(P_{\operatorname{ran}\mathcal{A}^{*}}\).

Corollary 2.7

(Projections onto \(\operatorname{ran}\mathcal{A}\) and \(\operatorname{ran}\mathcal{A}^{*}\))

Let \((y,x)\in {Y}\times {X}\) and let \(T\in \mathcal{H}\). If \(e\neq 0\) and \(f\neq 0\), then

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}}(y,x) = \mathcal{A}\mathcal{A}^{\dagger }(y,x) = \biggl( y- \frac{ \langle {f},{y} \rangle - \langle {e},{x} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}}f, x- \frac{ \langle {e},{x} \rangle - \langle {f},{y} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}}e \biggr) \end{aligned}$$


and

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}^{*}}(T) = \mathcal{A}^{\dagger }\mathcal{A}(T) = \frac{1}{ \Vert e \Vert ^{2}} (Te) \otimes e + \frac{1}{ \Vert f \Vert ^{2}}f\otimes \bigl(T^{*}f\bigr) - \frac{ \langle {f},{{Te}} \rangle }{ \Vert e \Vert ^{2} \Vert f \Vert ^{2}} f \otimes e. \end{aligned}$$


If \(e=0\) or \(f=0\), then

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}}(y,x) = \mathcal{A}\mathcal{A}^{\dagger }(y,x) = \textstyle\begin{cases} (0,x) &\textit{if $e=0$ and $f\neq 0$;} \\ (y,0) &\textit{if $e\neq 0$ and $f= 0$;} \\ (0,0) &\textit{if $e=0$ and $f= 0$;} \end{cases}\displaystyle \end{aligned}$$


and

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}^{*}}(T) = \mathcal{A}^{\dagger }\mathcal{A}(T) = \textstyle\begin{cases} \frac{1}{ \Vert f \Vert ^{2}}f\otimes (T^{*}f) &\textit{if $e=0$ and $f\neq 0$;} \\ \frac{1}{ \Vert e \Vert ^{2}}(Te)\otimes e &\textit{if $e\neq 0$ and $f= 0$;} \\ 0&\textit{if $e=0$ and $f= 0$.} \end{cases}\displaystyle \end{aligned}$$


Proof. Using [3, Proposition 3.30(ii)] and Theorem 2.4, we obtain for \(e\neq 0\) and \(f\neq 0\)

$$\begin{aligned} &P_{\operatorname{ran}\mathcal{A}}(y,x) \end{aligned}$$
$$\begin{aligned} &\quad= \mathcal{A}\mathcal{A}^{\dagger }(y,x) \end{aligned}$$
$$\begin{aligned} &\quad=\mathcal{A} \biggl(\frac{1}{ \Vert e \Vert ^{2}} \biggl( y\otimes e - \frac{ \langle {f},{{y}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr) + \frac{1}{ \Vert f \Vert ^{2}} \biggl( f \otimes x - \frac{ \langle {e},{x} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr) \biggr) \end{aligned}$$
$$\begin{aligned} &\quad= \biggl(\frac{1}{ \Vert e \Vert ^{2}} \biggl( y\otimes e - \frac{ \langle {f},{{y}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr)e + \frac{1}{ \Vert f \Vert ^{2}} \biggl( f \otimes x - \frac{ \langle {e},{x} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr)e, \end{aligned}$$
$$\begin{aligned} &\qquad \frac{1}{ \Vert e \Vert ^{2}} \biggl( y\otimes e - \frac{ \langle {f},{{y}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr)^{*}f \\ &\qquad{}+ \frac{1}{ \Vert f \Vert ^{2}} \biggl( f \otimes x - \frac{ \langle {e},{x} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr)^{*}f \biggr) \end{aligned}$$
$$\begin{aligned} &\quad= \biggl(\frac{1}{ \Vert e \Vert ^{2}} \biggl( \langle {e},{e} \rangle y - \frac{ \langle {f},{{y}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} \langle {e},{e} \rangle f \biggr) + \frac{1}{ \Vert f \Vert ^{2}} \biggl( \langle {x},{e} \rangle f - \frac{ \langle {e},{x} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} \langle {e},{e} \rangle f \biggr), \end{aligned}$$
$$\begin{aligned} &\qquad \frac{1}{ \Vert e \Vert ^{2}} \biggl( \langle {y},{f} \rangle e - \frac{ \langle {f},{{y}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} \langle {f},{f} \rangle e \biggr) + \frac{1}{ \Vert f \Vert ^{2}} \biggl( \langle {f},{f} \rangle x - \frac{ \langle {e},{x} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} \langle {f},{f} \rangle e \biggr) \biggr) \end{aligned}$$
$$\begin{aligned} &\quad= \biggl(y - \frac{ \langle {f},{{y}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}}f + \frac{ \langle {e},{x} \rangle }{ \Vert f \Vert ^{2}}f - \frac{ \langle {e},{x} \rangle \Vert e \Vert ^{2}}{ \Vert f \Vert ^{2} ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2} )}f, \end{aligned}$$
$$\begin{aligned} &\qquad x - \frac{ \langle {e},{{x}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}}e + \frac{ \langle {f},{y} \rangle }{ \Vert e \Vert ^{2}}e - \frac{ \langle {f},{y} \rangle \Vert f \Vert ^{2}}{ \Vert e \Vert ^{2} ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2} )}e \biggr) \end{aligned}$$
$$\begin{aligned} &\quad= \biggl( y + \frac{- \langle {f},{y} \rangle \Vert f \Vert ^{2}+ \langle {e},{x} \rangle ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2} ) - \langle {e},{x} \rangle \Vert e \Vert ^{2}}{ \Vert f \Vert ^{2} ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2} )} f, \end{aligned}$$
$$\begin{aligned} &\qquad x + \frac{- \langle {e},{x} \rangle \Vert e \Vert ^{2}+ \langle {f},{y} \rangle ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2} ) - \langle {f},{y} \rangle \Vert f \Vert ^{2}}{ \Vert e \Vert ^{2} ( \Vert e \Vert ^{2}+ \Vert f \Vert ^{2} )} e \biggr) \end{aligned}$$
$$\begin{aligned} &\quad = \biggl( y - \frac{ \langle {f},{y} \rangle - \langle {e},{x} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f, x - \frac{ \langle {e},{x} \rangle - \langle {f},{y} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} e \biggr), \end{aligned}$$

which verifies the claimed formula for \(P_{\operatorname{ran}\mathcal{A}}\).

Next, using [3, Proposition 3.30(v) and (vi)], Theorem 2.4, and the identity \(\langle e,T^{*}f\rangle = \langle Te,f\rangle\), we have

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}^{*}}(T) ={}& P_{\operatorname{ran} \mathcal{A}^{\dagger }}(T) = \mathcal{A}^{\dagger } \mathcal{A}(T) = \mathcal{A}^{\dagger }\bigl(Te,T^{*}f\bigr) \end{aligned}$$
$$\begin{aligned} ={}& \frac{1}{ \Vert e \Vert ^{2}} \biggl( (Te) \otimes e - \frac{ \langle {f},{{Te}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f \otimes e \biggr) \\ &{} + \frac{1}{ \Vert f \Vert ^{2}} \biggl( f \otimes \bigl(T^{*}f\bigr) - \frac{ \langle {e},{T^{*}f} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f\otimes e \biggr) \end{aligned}$$
$$\begin{aligned} ={}& \frac{1}{ \Vert e \Vert ^{2}} (Te) \otimes e + \frac{1}{ \Vert f \Vert ^{2}}f\otimes \bigl(T^{*}f\bigr) - \frac{ \langle {f},{{Te}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} \biggl(\frac{1}{ \Vert e \Vert ^{2}} + \frac{1}{ \Vert f \Vert ^{2}} \biggr) f\otimes e \end{aligned}$$
$$\begin{aligned} ={}& \frac{1}{ \Vert e \Vert ^{2}} (Te) \otimes e + \frac{1}{ \Vert f \Vert ^{2}}f\otimes \bigl(T^{*}f\bigr) - \frac{ \langle {f},{{Te}} \rangle }{ \Vert e \Vert ^{2} \Vert f \Vert ^{2}} f \otimes e, \end{aligned}$$

which establishes the claimed formula for \(P_{\operatorname{ran}\mathcal{A}^{*}}\).

If \(e=0\) and \(f\neq 0\), then

$$\begin{aligned} \mathcal{A}\mathcal{A}^{\dagger }(y,x) = \mathcal{A}\bigl( \Vert f \Vert ^{-2}f \otimes x\bigr) = \Vert f \Vert ^{-2}\bigl(0,(f \otimes x)^{*}f\bigr) = \Vert f \Vert ^{-2}\bigl(0, \langle {f},{f} \rangle x\bigr)=(0,x) \end{aligned}$$


and

$$\begin{aligned} \mathcal{A}^{\dagger }\mathcal{A}(T) = \mathcal{A}^{\dagger } \bigl(0,T^{*}f\bigr) = \frac{1}{ \Vert f \Vert ^{2}}f\otimes \bigl(T^{*}f \bigr). \end{aligned}$$

The case when \(e\neq 0\) and \(f=0\) is treated similarly.

Finally, if \(e=0\) and \(f=0\), then \(\mathcal{A}^{\dagger }=0\) and the result follows. □
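The two projection formulas of Corollary 2.7 (case \(e\neq 0\), \(f\neq 0\)) can be compared with \(\mathcal{A}\mathcal{A}^{\dagger}\) and \(\mathcal{A}^{\dagger}\mathcal{A}\) computed numerically. A sketch, assuming the matrix setting \(\mathcal{H}=\mathbb{R}^{m\times n}\); all function names are hypothetical.

```python
import numpy as np

def matrix_of_A(e, f):
    """Matrix of A: T -> (T e, T^T f) on row-major vec(T), T of size m x n."""
    m, n = f.size, e.size
    return np.vstack([np.kron(np.eye(m), e[None, :]),
                      np.kron(f[None, :], np.eye(n))])

def proj_ran_A(y, x, e, f):
    """Corollary 2.7 (e != 0, f != 0): projection onto {(f, -e)}^perp."""
    gap = (f @ y - e @ x) / (e @ e + f @ f)
    return y - gap * f, x + gap * e

def proj_ran_A_star(T, e, f):
    """Corollary 2.7 (e != 0, f != 0): projection of T onto ran A^*."""
    ne, nf = e @ e, f @ f
    Te, Tsf = T @ e, T.T @ f
    return (np.outer(Te, e) / ne + np.outer(f, Tsf) / nf
            - (f @ Te) / (ne * nf) * np.outer(f, e))

rng = np.random.default_rng(6)
m, n = 4, 3
e, x = rng.standard_normal(n), rng.standard_normal(n)
f, y = rng.standard_normal(m), rng.standard_normal(m)
T = rng.standard_normal((m, n))

M = matrix_of_A(e, f)
M_pinv = np.linalg.pinv(M, rcond=1e-10)
P_ran = M @ M_pinv          # A A^dagger
P_ran_star = M_pinv @ M     # A^dagger A
py, px = proj_ran_A(y, x, e, f)
```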

Theorem 2.8

(Main projection theorem)

Let \((s,r)\in Y\times X\) and set \((\bar{s},\bar{r})=P_{\operatorname{ran}\mathcal{A}}(s,r)\). Then

$$\begin{aligned} C:= \mathcal{A}^{-1}(\bar{s},\bar{r}) = \bigl\{ {T\in \mathcal{H}} | {Te = \bar{s} \textit{ and } T^{*} f = \bar{r}} \bigr\} \neq \varnothing. \end{aligned}$$

Let \(T\in \mathcal{H}\). If \(e\neq 0\) and \(f\neq 0\), then

$$\begin{aligned} P_{C}(T) ={}& T - \frac{1}{ \Vert e \Vert ^{2}} \biggl( (Te-s)\otimes e - \frac{ \langle {f},{Te-s} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}}f \otimes e \biggr) \end{aligned}$$
$$\begin{aligned} &{} - \frac{1}{ \Vert f \Vert ^{2}} \biggl( f\otimes \bigl(T^{*}f-r\bigr) - \frac{ \langle {e},{T^{*}f-r} \rangle }{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}} f\otimes e \biggr). \end{aligned}$$


If \(e=0\) or \(f=0\), then

$$\begin{aligned} P_{C}(T) = \textstyle\begin{cases} T- \frac{1}{ \Vert f \Vert ^{2}}f\otimes (T^{*}f-r) &\textit{if $e=0$ and $f\neq 0$;} \\ T- \frac{1}{ \Vert e \Vert ^{2}}(Te-s)\otimes e &\textit{if $e\neq 0$ and $f=0$;} \\ T &\textit{if $e=0$ and $f=0$.} \end{cases}\displaystyle \end{aligned}$$


Proof. Since \((\bar{s},\bar{r})\in \operatorname{ran}\mathcal{A}\), we have \(C\neq \varnothing \). Now Proposition 1.1 and the definition of \(\mathcal{A}\) yield

$$\begin{aligned} P_{C}(T) &= T-\mathcal{A}^{\dagger }\bigl(\mathcal{A}T-(s,r)\bigr) = T - \mathcal{A}^{\dagger }\bigl(Te-s,T^{*}f -r\bigr). \end{aligned}$$

Now we consider all possible cases. If \(e\neq 0\) and \(f\neq 0\), then, using Theorem 2.4,

$$\begin{aligned} P_{C}(T) ={}& T- \frac{1}{ \Vert e \Vert ^{2}} \biggl( (Te-s) \otimes e - \frac{ \langle {f},{{Te-s}} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f\otimes e \biggr) \\ &{} - \frac{1}{ \Vert f \Vert ^{2}} \biggl( f \otimes \bigl(T^{*}f-r\bigr) - \frac{ \langle {e},{T^{*}f-r} \rangle }{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f\otimes e \biggr) \end{aligned}$$

as claimed.

Next, if \(e=0\) and \(f\neq 0\), then using Theorem 2.5(i) yields

$$\begin{aligned} P_{C}(T) = T- \frac{1}{ \Vert f \Vert ^{2}} f \otimes \bigl(T^{*}f-r \bigr). \end{aligned}$$

Similarly, if \(e\neq 0\) and \(f= 0\), then using Theorem 2.5(ii) yields

$$\begin{aligned} P_{C}(T) = T- \frac{1}{ \Vert e \Vert ^{2}} (Te-s) \otimes e. \end{aligned}$$

And finally, if \(e=0\) and \(f=0\), then \(\mathcal{A}^{\dagger }=0\) and hence \(P_{C}(T) = T\). □

Remark 2.9

Consider Theorem 2.8 and its notation. If \((s,r)\in \operatorname{ran}\mathcal{A}\), then \((\bar{s},\bar{r})=(s,r)\) and hence \(C = \mathcal{A}^{-1}(s,r)\); thus the consistent case is covered as well. The auxiliary function δ defined before Corollary 2.6 allows us to combine all four cases into

$$\begin{aligned} P_{C}(T) ={}& T - \frac{1}{\delta ( \Vert e \Vert ^{2})} \biggl( (Te-s)\otimes e - \frac{ \langle {f},{Te-s} \rangle }{\delta ( \Vert e \Vert ^{2} + \Vert f \Vert ^{2})}f \otimes e \biggr) \end{aligned}$$
$$\begin{aligned} &{} - \frac{1}{\delta ( \Vert f \Vert ^{2})} \biggl( f\otimes \bigl(T^{*}f-r\bigr) - \frac{ \langle {e},{T^{*}f-r} \rangle }{\delta ( \Vert e \Vert ^{2} + \Vert f \Vert ^{2})} f\otimes e \biggr). \end{aligned}$$
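The combined formula is straightforward to implement for matrices. The sketch below assumes \(\mathcal{H}=\mathbb{R}^{m\times n}\), takes e to be the all-ones vector (so Te is the vector of row sums) and f a weight vector (so \(T^{\mathsf{T}}f\) gives weighted column sums); the names `delta` and `project_C` and all data are illustrative. The prescribed sums are chosen to be inconsistent, so the result satisfies the projected targets \((\bar{s},\bar{r})\) rather than \((s,r)\).

```python
import numpy as np

def delta(t):
    """Auxiliary function: delta(t) = t if t != 0, and 1 otherwise."""
    return t if t != 0 else 1.0

def project_C(T, e, f, s, r):
    """Remark 2.9: projection of the m x n matrix T onto
    C = {T : T e = s_bar and T^T f = r_bar}, (s_bar, r_bar) = P_ran A (s, r).
    """
    ne, nf = e @ e, f @ f
    total = delta(ne + nf)
    fe = np.outer(f, e)
    ys = T @ e - s          # residual of the scaled row sums
    xr = T.T @ f - r        # residual of the scaled column sums
    return (T - (np.outer(ys, e) - (f @ ys) / total * fe) / delta(ne)
              - (np.outer(f, xr) - (e @ xr) / total * fe) / delta(nf))

m, n = 3, 4
e = np.ones(n)                          # T e = row sums of T
f = np.array([1.0, 2.0, 3.0])           # T^T f = weighted column sums
s = np.array([1.0, 2.0, 3.0])           # prescribed row sums
r = np.array([4.0, 0.0, 1.0, 2.0])      # prescribed weighted column sums
# Here <f, s> = 14 but <e, r> = 7, so the constraints are inconsistent.
gap = (f @ s - e @ r) / (e @ e + f @ f)
s_bar, r_bar = s - gap * f, r + gap * e  # P_ran A (s, r), cf. Corollary 2.7

T = np.arange(12.0).reshape(m, n)
P = project_C(T, e, f, s, r)
```

Applying `project_C` to its own output returns the same matrix, as expected of a projection.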

The last two results in this section are inspired by [5, Theorem 2.1] and [8, Theorem on page 566], respectively. See also Corollary 3.6 and Example 3.8.

Corollary 2.10

Suppose that \(Y=X\), let \(e\in X\smallsetminus \{0\}\), let \(f\in X\smallsetminus \{0\}\), set

$$\begin{aligned} E:= \frac{1}{ \Vert e \Vert ^{2}}e\otimes e = P_{ \mathbb{R}e} \quad\textit{and}\quad F:= \frac{1}{ \Vert f \Vert ^{2}}f\otimes f = P_{ \mathbb{R}f}, \end{aligned}$$

and let \(\gamma \in \mathbb{R}\). Then

$$\begin{aligned} C:= \bigl\{ {T\in \mathcal{H}} | {Te=\gamma e \textit{ and } T^{*} f=\gamma f} \bigr\} \neq \varnothing \end{aligned}$$


and

$$\begin{aligned} (\forall T\in \mathcal{H})\quad P_{C}(T) = \gamma \operatorname{Id}+( \operatorname{Id}-F) (T-\gamma \operatorname{Id}) (\operatorname{Id}-E). \end{aligned}$$


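Proof. The projection identities for E and F follow from the identity \(u\otimes u = \Vert u \Vert ^{2}P_{\mathbb{R}u}\). Because \(\langle f,\gamma e\rangle = \langle e,\gamma f\rangle \), we have \((\gamma e,\gamma f)\in \{(f,-e)\}^{\perp }=\operatorname{ran}\mathcal{A}\) by Proposition 2.2(iv); hence \(C\neq \varnothing \) by Theorem 2.8. (If X is finite-dimensional, one may also simply note that \(\gamma \operatorname{Id}\in C\).) Since replacing e and f by \(e/\|e\|\) and \(f/\|f\|\) changes neither C nor E and F, we may and do assume without loss of generality that \(\|e\|=1=\|f\|\).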

Now let \(T\in \mathcal{H}\). Applying the formula of Theorem 2.8 for the case \(e\neq 0\) and \(f\neq 0\) with \(r:=\gamma f\) and \(s:=\gamma e\), we deduce that

$$\begin{aligned} P_{C}(T) ={}& T - \biggl( (Te-\gamma e)\otimes e - \frac{ \langle {f},{Te-\gamma e} \rangle }{2}f \otimes e \biggr) \end{aligned}$$
$$\begin{aligned} &{} - \biggl( f\otimes \bigl(T^{*}f-\gamma f\bigr) - \frac{ \langle {e},{T^{*}f-\gamma f} \rangle }{2} f \otimes e \biggr) \end{aligned}$$
$$\begin{aligned} ={}& T-(Te)\otimes e + \gamma e\otimes e + \frac{ \langle {f},{Te} \rangle -\gamma \langle {f},{e} \rangle }{2}f \otimes e \end{aligned}$$
$$\begin{aligned} &{} - f\otimes \bigl(T^{*}f\bigr) + \gamma f\otimes f + \frac{ \langle {Te},{f} \rangle -\gamma \langle {e},{f} \rangle }{2}f \otimes e \end{aligned}$$
$$\begin{aligned} ={}& T - (Te)\otimes e - f\otimes \bigl(T^{*}f\bigr) +\gamma (E+F) + \bigl( \langle {f},{Te} \rangle -\gamma \langle {e},{f} \rangle \bigr)f\otimes e \end{aligned}$$
$$\begin{aligned} ={}& T - TE - FT + \gamma (E+F) + \bigl( \langle {f},{Te} \rangle -\gamma \langle {e},{f} \rangle \bigr)f\otimes e \end{aligned}$$
$$\begin{aligned} ={}&T-TE-FT+\gamma (E+F) +FTE -\gamma FE \end{aligned}$$
$$\begin{aligned} ={}&\gamma \operatorname{Id}+T-TE-FT+FTE-\gamma \operatorname{Id}+ \gamma E+\gamma F-\gamma FE \end{aligned}$$
$$\begin{aligned} ={}&\gamma \operatorname{Id}+ (\operatorname{Id}-F)T( \operatorname{Id}-E)-\gamma (\operatorname{Id}-F) ( \operatorname{Id}-E) \end{aligned}$$
$$\begin{aligned} ={}& \gamma \operatorname{Id}+ (\operatorname{Id}-F) (T-\gamma \operatorname{Id}) (\operatorname{Id}-E) \end{aligned}$$

as claimed. □
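As a quick numerical sanity check of Corollary 2.10, the sketch below takes the matrix instance \(X=\mathbb{R}^{n}\), evaluates the closed form \(\gamma\operatorname{Id}+(\operatorname{Id}-F)(T-\gamma\operatorname{Id})(\operatorname{Id}-E)\), and verifies both constraints together with the orthogonality of \(T-P_{C}(T)\) to a random direction parallel to C. The function name `proj_eig`, the seed, and the dimensions are our own choices; e and f are deliberately not normalized.

```python
import numpy as np

def proj_eig(T, e, f, gamma):
    """P_C(T) for C = {T : T e = gamma e, T^T f = gamma f} (Corollary 2.10, X = R^n)."""
    I = np.eye(T.shape[0])
    E = np.outer(e, e) / (e @ e)     # orthogonal projector onto span{e}
    F = np.outer(f, f) / (f @ f)     # orthogonal projector onto span{f}
    return gamma * I + (I - F) @ (T - gamma * I) @ (I - E)

rng = np.random.default_rng(1)
n, gamma = 5, 2.5
T = rng.standard_normal((n, n))
e, f = rng.standard_normal(n), rng.standard_normal(n)
P = proj_eig(T, e, f, gamma)

# Any D with D e = 0 and D^T f = 0 is parallel to C, so T - P must be
# orthogonal to it in the Frobenius inner product.
I = np.eye(n)
E = np.outer(e, e) / (e @ e)
F = np.outer(f, f) / (f @ f)
D = (I - F) @ rng.standard_normal((n, n)) @ (I - E)
print(np.allclose(P @ e, gamma * e), np.allclose(P.T @ f, gamma * f))
print(abs(np.sum((T - P) * D)) < 1e-9)
```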

We conclude this section with a beautiful projection formula that arises when the last result is specialized even further.

Corollary 2.11

Suppose that \(Y=X\), let \(e\in X\smallsetminus \{0\}\), and set

$$\begin{aligned} E:= \frac{1}{ \Vert e \Vert ^{2}}e\otimes e = P_{ \mathbb{R}e}. \end{aligned}$$


Then

$$\begin{aligned} C:= \bigl\{ {T\in \mathcal{H}} | {Te=e=T^{*} e} \bigr\} \neq \varnothing \end{aligned}$$

and

$$\begin{aligned} (\forall T\in \mathcal{H})\quad P_{C}(T) = E+(\operatorname{Id}-E)T( \operatorname{Id}-E). \end{aligned}$$


Proof. Let \(T\in \mathcal{H}\). Applying Corollary 2.10 with \(f=e\) and \(\gamma =1\), we obtain

$$\begin{aligned} P_{C}(T) &= \operatorname{Id}+ (\operatorname{Id}-E) (T- \operatorname{Id}) (\operatorname{Id}-E) \end{aligned}$$
$$\begin{aligned} &= \operatorname{Id}+(\operatorname{Id}-E)T(\operatorname{Id}-E)-( \operatorname{Id}-E)^{2} \end{aligned}$$
$$\begin{aligned} &= \operatorname{Id}+(\operatorname{Id}-E)T(\operatorname{Id}-E)-( \operatorname{Id}-E) \end{aligned}$$
$$\begin{aligned} &= (\operatorname{Id}-E)T(\operatorname{Id}-E)+E \end{aligned}$$

because \(\operatorname{Id}-E=P_{\{e\}^{\perp }}\) is idempotent. □
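Corollary 2.11 can be illustrated the same way. The sketch below, again for \(X=\mathbb{R}^{n}\) and with our own function name `proj_fixed`, evaluates \(E+(\operatorname{Id}-E)T(\operatorname{Id}-E)\) and compares its distance to T with that of randomly generated members of C. It relies on the observation that every \(M\in C\) satisfies \(M=P_{C}(M)\) and is therefore of the form \(E+(\operatorname{Id}-E)Z(\operatorname{Id}-E)\).

```python
import numpy as np

def proj_fixed(T, e):
    """P_C(T) for C = {T : T e = e = T^T e} (Corollary 2.11, X = R^n)."""
    I = np.eye(T.shape[0])
    E = np.outer(e, e) / (e @ e)     # projector onto span{e}
    Q = I - E                        # projector onto {e}^perp, idempotent
    return E + Q @ T @ Q

rng = np.random.default_rng(2)
n = 4
e = rng.standard_normal(n)
T = rng.standard_normal((n, n))
P = proj_fixed(T, e)

# Random members E + Q Z Q of C; none of them should be closer to T than P.
E = np.outer(e, e) / (e @ e)
Q = np.eye(n) - E
others = [E + Q @ rng.standard_normal((n, n)) @ Q for _ in range(200)]
gap = min(np.linalg.norm(T - M) for M in others) - np.linalg.norm(T - P)
print(gap >= 0)
```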

3 Rectangular matrices

In this section, we specialize the results of Sect. 2 to

$$\begin{aligned} X=\mathbb{R}^{n} \quad\text{and}\quad Y=\mathbb{R}^{m}, \end{aligned}$$

which gives rise to

$$\begin{aligned} \mathcal{H}= \mathbb{R}^{m\times n}, \end{aligned}$$

the space of real \(m\times n\) matrices. Given u and x in \(\mathbb{R}^{n}\), and v and y in \(\mathbb{R}^{m}\), we have \(v\otimes u = vu^{\intercal }\), \((v\otimes u)x=vu^{\intercal }x = (u^{\intercal }x)v\), and \((v\otimes u)^{*}y = (v^{\intercal }y) u\). Corresponding to (11), we have

$$\begin{aligned} \mathcal{A}\colon \mathbb{R}^{m\times n}\to \mathbb{R}^{m+n} \colon T\mapsto \begin{bmatrix} Te \\ T^{\intercal }f \end{bmatrix} . \end{aligned}$$

The counterpart of (24) reads

$$\begin{aligned} \mathcal{A}^{*}\colon \mathbb{R}^{m+n}\to \mathbb{R}^{m\times n} \colon \begin{bmatrix} y \\ x \end{bmatrix} \mapsto ye^{\intercal }+ fx^{\intercal }. \end{aligned}$$
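In NumPy, \(\mathcal{A}\) and \(\mathcal{A}^{*}\) are one-liners. The following sketch (function names ours) checks the defining adjoint identity \(\langle \mathcal{A}T,[y;x]\rangle = \langle T,\mathcal{A}^{*}[y;x]\rangle\), where \(\mathbb{R}^{m\times n}\) carries the Frobenius (trace) inner product.

```python
import numpy as np

def A_op(T, e, f):
    """A(T) = [T e; T^T f] in R^{m+n}, for T in R^{m x n}, e in R^n, f in R^m."""
    return np.concatenate([T @ e, T.T @ f])

def A_adj(z, e, f):
    """A^*([y; x]) = y e^T + f x^T, with y = z[:m] and x = z[m:]."""
    m = f.shape[0]
    y, x = z[:m], z[m:]
    return np.outer(y, e) + np.outer(f, x)

rng = np.random.default_rng(3)
m, n = 3, 5
e, f = rng.standard_normal(n), rng.standard_normal(m)
T = rng.standard_normal((m, n))
z = rng.standard_normal(m + n)
lhs = A_op(T, e, f) @ z                 # <A T, [y; x]> in R^{m+n}
rhs = np.sum(T * A_adj(z, e, f))        # <T, A^*[y; x]> (Frobenius)
print(np.isclose(lhs, rhs))
```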

Translated to the matrix setting, Theorem 2.4 and Theorem 2.5 turn into the following.

Theorem 3.1

Let \(x\in \mathbb{R}^{n}\) and \(y\in \mathbb{R}^{m}\). If \(e\neq 0\) and \(f\neq 0\), then

$$\begin{aligned} \mathcal{A}^{\dagger } \begin{bmatrix} y \\ x \end{bmatrix} = \frac{1}{ \Vert e \Vert ^{2}} \biggl( ye^{\intercal }- \frac{{f}^{\intercal }{{y}}}{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f e^{\intercal } \biggr) + \frac{1}{ \Vert f \Vert ^{2}} \biggl( f x^{\intercal }- \frac{{e}^{\intercal }{x}}{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} f e^{\intercal } \biggr). \end{aligned}$$


If \(e=0\) or \(f=0\), then

$$\begin{aligned} \mathcal{A}^{\dagger } \begin{bmatrix} y \\ x \end{bmatrix} = \textstyle\begin{cases} \frac{1}{ \Vert f \Vert ^{2}}fx^{\intercal }&\textit{if $e=0$ and $f\neq 0$;} \\ \frac{1}{ \Vert e \Vert ^{2}}ye^{\intercal } &\textit{if $e\neq 0$ and $f=0$;} \\ 0 &\textit{if $e=0$ and $f=0$.} \end{cases}\displaystyle \end{aligned}$$
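Theorem 3.1 is easy to test against a generic pseudoinverse. In the sketch below (names ours), `A_matrix` writes \(\mathcal{A}\) as an \((m+n)\times mn\) matrix acting on the row-major vectorization `T.ravel()`, and `A_pinv` implements the closed form, including the degenerate cases; `numpy.linalg.pinv` serves only as an independent reference.

```python
import numpy as np

def A_pinv(y, x, e, f):
    """A^+ [y; x] following Theorem 3.1, including the degenerate cases."""
    ne2, nf2 = e @ e, f @ f
    if ne2 > 0 and nf2 > 0:
        c = ne2 + nf2
        fe = np.outer(f, e)
        return ((np.outer(y, e) - (f @ y) / c * fe) / ne2
                + (np.outer(f, x) - (e @ x) / c * fe) / nf2)
    if nf2 > 0:                                   # e = 0 and f != 0
        return np.outer(f, x) / nf2
    if ne2 > 0:                                   # e != 0 and f = 0
        return np.outer(y, e) / ne2
    return np.zeros((f.shape[0], e.shape[0]))     # e = 0 and f = 0

def A_matrix(e, f):
    """Matrix of A acting on T.ravel() (row-major vectorization of T in R^{m x n})."""
    m, n = f.shape[0], e.shape[0]
    return np.vstack([np.kron(np.eye(m), e[None, :]),    # rows producing T e
                      np.kron(f[None, :], np.eye(n))])   # rows producing T^T f
```

For example, `A_pinv(y, x, e, f).ravel()` should match `np.linalg.pinv(A_matrix(e, f)) @ np.concatenate([y, x])` up to rounding.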

In turn, Corollary 2.7 now states the following.

Corollary 3.2

Let \(x\in \mathbb{R}^{n}\), let \(y\in \mathbb{R}^{m}\), and let \(T\in \mathbb{R}^{m\times n}\). If \(e\neq 0\) and \(f\neq 0\), then

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}} \begin{bmatrix} y \\ x\end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix} -\frac{f^{\intercal }y - e^{\intercal }x}{ \Vert e \Vert ^{2}+ \Vert f \Vert ^{2}} \begin{bmatrix} f \\ -e \end{bmatrix} \end{aligned}$$


and

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}^{*}}(T) = \frac{1}{ \Vert e \Vert ^{2}}Tee^{\intercal }+ \frac{1}{ \Vert f \Vert ^{2}}ff^{\intercal }T - \frac{f^{\intercal }Te}{ \Vert e \Vert ^{2} \Vert f \Vert ^{2}}fe^{\intercal }. \end{aligned}$$


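If \(e=0\) or \(f=0\), then

$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}} \begin{bmatrix} y \\ x \end{bmatrix} = \textstyle\begin{cases} \begin{bmatrix} 0 \\ x \end{bmatrix} &\textit{if $e=0$ and $f\neq 0$;} \\ \begin{bmatrix} y \\ 0 \end{bmatrix} &\textit{if $e\neq 0$ and $f=0$;} \\ \begin{bmatrix} 0 \\ 0 \end{bmatrix} &\textit{if $e=0$ and $f=0$;} \end{cases}\displaystyle \end{aligned}$$

and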


$$\begin{aligned} P_{\operatorname{ran}\mathcal{A}^{*}}(T) = \textstyle\begin{cases} \frac{1}{ \Vert f \Vert ^{2}}ff^{\intercal }T &\textit{if $e=0$ and $f\neq 0$;} \\ \frac{1}{ \Vert e \Vert ^{2}}Tee^{\intercal } &\textit{if $e\neq 0$ and $f= 0$;} \\ 0 &\textit{if $e=0$ and $f= 0$.} \end{cases}\displaystyle \end{aligned}$$
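The projectors of Corollary 3.2 can likewise be compared with \(MM^{+}\) and \(M^{+}M\), the orthogonal projectors onto \(\operatorname{ran}M\) and \(\operatorname{ran}M^{\intercal}\) for any matrix M. The sketch below (function names ours) uses the same vectorization convention as above, with M the matrix of \(\mathcal{A}\) acting on `T.ravel()`.

```python
import numpy as np

def P_ranA(y, x, e, f):
    """Projection of [y; x] onto ran A (Corollary 3.2)."""
    ne2, nf2 = e @ e, f @ f
    if ne2 > 0 and nf2 > 0:
        t = (f @ y - e @ x) / (ne2 + nf2)
        return np.concatenate([y - t * f, x + t * e])   # [y; x] - t [f; -e]
    # Degenerate cases: keep y only if e != 0 and keep x only if f != 0.
    return np.concatenate([y if ne2 > 0 else np.zeros_like(y),
                           x if nf2 > 0 else np.zeros_like(x)])

def P_ranA_adj(T, e, f):
    """Projection of T onto ran A^* (Corollary 3.2)."""
    ne2, nf2 = e @ e, f @ f
    P = np.zeros_like(T)
    if ne2 > 0:
        P += np.outer(T @ e, e) / ne2                   # T e e^T / ||e||^2
    if nf2 > 0:
        P += np.outer(f, T.T @ f) / nf2                 # f f^T T / ||f||^2
    if ne2 > 0 and nf2 > 0:
        P -= (f @ T @ e) / (ne2 * nf2) * np.outer(f, e)
    return P

def A_matrix(e, f):
    """Matrix of A acting on T.ravel() (row-major vectorization)."""
    m, n = f.shape[0], e.shape[0]
    return np.vstack([np.kron(np.eye(m), e[None, :]),
                      np.kron(f[None, :], np.eye(n))])
```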

Next, Theorem 2.8 turns into the following result.

Theorem 3.3

Let \(r\in \mathbb{R}^{n}\), let \(s\in \mathbb{R}^{m}\), and set \([\bar{s},\bar{r}]^{\intercal }= P_{\operatorname{ran}\mathcal{A}}[s,r]^{\intercal }\). Then

$$\begin{aligned} C:= \bigl\{ {T\in \mathbb{R}^{m\times n}} | {Te=\bar{s} \textit{ and } T^{\intercal }f = \bar{r}} \bigr\} \neq \varnothing. \end{aligned}$$

Now let \(T\in \mathbb{R}^{m\times n}\). If \(e\neq 0\) and \(f\neq 0\), then

$$\begin{aligned} P_{C}(T) ={}& T - \frac{1}{ \Vert e \Vert ^{2}} \biggl( (Te-s)e^{\intercal }- \frac{{f^{\intercal }}(Te-s)}{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}}fe^{\intercal } \biggr) \end{aligned}$$
$$\begin{aligned} & {}- \frac{1}{ \Vert f \Vert ^{2}} \biggl( f\bigl(f^{\intercal }T-r^{\intercal } \bigr) - \frac{{e^{\intercal }}{(T^{\intercal }f-r)}}{ \Vert e \Vert ^{2} + \Vert f \Vert ^{2}} fe^{\intercal } \biggr). \end{aligned}$$


If \(e=0\) or \(f=0\), then

$$\begin{aligned} P_{C}(T) = \textstyle\begin{cases} T- \frac{1}{ \Vert f \Vert ^{2}}f(f^{\intercal }T-r^{\intercal }) &\textit{if $e=0$ and $f\neq 0$;} \\ T- \frac{1}{ \Vert e \Vert ^{2}}(Te-s) e^{\intercal } &\textit{if $e\neq 0$ and $f=0$;} \\ T&\textit{if $e=0$ and $f=0$.} \end{cases}\displaystyle \end{aligned}$$
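A point worth seeing numerically is that Theorem 3.3 is evaluated with the raw data \((s,r)\), yet its output satisfies the projected constraints \(Te=\bar{s}\) and \(T^{\intercal}f=\bar{r}\); for inconsistent data the prescribed sums are adjusted automatically. The sketch below covers only the case \(e\neq 0\), \(f\neq 0\); the name `proj_C` and the random data are ours, and \((\bar{s},\bar{r})\) is computed from the formula for \(P_{\operatorname{ran}\mathcal{A}}\) in Corollary 3.2.

```python
import numpy as np

def proj_C(T, e, f, s, r):
    """Theorem 3.3 for e != 0 and f != 0, with possibly inconsistent (s, r)."""
    ne2, nf2 = e @ e, f @ f
    c = ne2 + nf2
    fe = np.outer(f, e)
    row_res, col_res = T @ e - s, T.T @ f - r
    return (T - (np.outer(row_res, e) - (f @ row_res) / c * fe) / ne2
              - (np.outer(f, col_res) - (e @ col_res) / c * fe) / nf2)

rng = np.random.default_rng(6)
m, n = 4, 3
e, f = rng.standard_normal(n), rng.standard_normal(m)
s, r = rng.standard_normal(m), rng.standard_normal(n)   # generically <f,s> != <e,r>
T = rng.standard_normal((m, n))

t = (f @ s - e @ r) / (e @ e + f @ f)      # from Corollary 3.2
s_bar, r_bar = s - t * f, r + t * e         # [s_bar; r_bar] = P_ranA [s; r]
P = proj_C(T, e, f, s, r)
print(np.allclose(P @ e, s_bar), np.allclose(P.T @ f, r_bar))
```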

Let us specialize Theorem 3.3 further to the following interesting case.

Corollary 3.4

(Projection onto matrices with prescribed row/column sums)

Suppose that \(e=[1,1,\ldots,1]^{\intercal }\in \mathbb{R}^{n}\) and that \(f=[1,1,\ldots,1]^{\intercal }\in \mathbb{R}^{m}\). Let \(r\in \mathbb{R}^{n}\), let \(s\in \mathbb{R}^{m}\), and set \([\bar{s},\bar{r}]^{\intercal }= P_{\operatorname{ran}\mathcal{A}}[s,r]^{\intercal }\). Then

$$\begin{aligned} C:= \bigl\{ {T\in \mathbb{R}^{m\times n}} | {Te=\bar{s} \textit{ and } T^{\intercal }f = \bar{r}} \bigr\} \neq \varnothing, \end{aligned}$$

and for every \(T\in \mathbb{R}^{m\times n}\),

$$\begin{aligned} P_{C}(T) ={}& T - \frac{1}{n} \biggl( (Te-s)e^{\intercal }- \frac{{f^{\intercal }}(Te-s)}{n+m}fe^{\intercal } \biggr) \end{aligned}$$
$$\begin{aligned} &{} - \frac{1}{m} \biggl( f\bigl(f^{\intercal }T-r^{\intercal } \bigr) - \frac{{e^{\intercal }}{(T^{\intercal }f-r)}}{n + m} fe^{\intercal } \biggr). \end{aligned}$$
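With all-ones e and f, the rank-one terms in Corollary 3.4 reduce to broadcasting: \((Te-s)e^{\intercal}\) repeats the row residuals across columns, \(f(f^{\intercal}T-r^{\intercal})\) repeats the column residuals across rows, and \(fe^{\intercal}\) is the all-ones matrix. The sketch below (function name ours) implements this and checks both a consistent and an inconsistent data set; in the latter case the sums shift to \(\bar{s}=s-t\,f\) and \(\bar{r}=r+t\,e\) with \(t=(\langle f,s\rangle-\langle e,r\rangle)/(n+m)\).

```python
import numpy as np

def proj_sums(T, s, r):
    """Corollary 3.4: project T onto matrices with row sums s_bar and column sums r_bar."""
    m, n = T.shape
    row_res = T.sum(axis=1) - s          # Te - s
    col_res = T.sum(axis=0) - r          # T^T f - r
    # Row residuals broadcast along columns, column residuals along rows,
    # and the scalar corrections broadcast over every entry (f e^T = ones).
    return (T - (row_res[:, None] - row_res.sum() / (n + m)) / n
              - (col_res[None, :] - col_res.sum() / (n + m)) / m)

rng = np.random.default_rng(7)
T = rng.standard_normal((4, 5))
s = rng.standard_normal(4)
r = rng.standard_normal(5)
r += (s.sum() - r.sum()) / 5             # make the data consistent: sum(s) == sum(r)
P = proj_sums(T, s, r)
```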

Remark 3.5

(Romero; 1990)

Consider Corollary 3.4 and its notation. Assume that \([s,r]^{\intercal }\in \operatorname{ran}\mathcal{A}\), which is equivalent to requiring that \(\langle {e},{r} \rangle = \langle {f},{s} \rangle \) (a condition sometimes jokingly called the “Fundamental Theorem of Accounting”). Then one verifies that the entries of the matrix in (78a)–(78b) can also be expressed as

$$\begin{aligned} \bigl(P_{C}(T) \bigr)_{i,j} = T_{i,j} + \frac{s_{i}-(Te)_{i}}{n} + \frac{r_{j}-(T^{\intercal }f)_{j}}{m} + \frac{{f^{\intercal }}{Te}-{e^{\intercal }}{r}}{mn} \end{aligned}$$

for every \(i\in \{1,\ldots,m\}\) and \(j\in \{1,\ldots,n\}\). Formula (79) is due to Romero (see [11, Corollary 2.1]), who proved it using Lagrange multipliers and even obtained a K-dimensional extension (where (79) corresponds to \(K=2\)). We also refer the reader to [4] for the use of (79) in computing the projection onto the transportation polytope.
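Romero's entrywise formula (79) and the operator form (78a)–(78b) can be compared directly. The sketch below (function names ours) implements both and, under the consistency assumption \(\langle e,r\rangle=\langle f,s\rangle\) made in this remark, they coincide; for inconsistent data only the operator form is meant to be used.

```python
import numpy as np

def proj_sums(T, s, r):
    """Corollary 3.4 (matrix form, all-ones e and f)."""
    m, n = T.shape
    row_res, col_res = T.sum(axis=1) - s, T.sum(axis=0) - r
    return (T - (row_res[:, None] - row_res.sum() / (n + m)) / n
              - (col_res[None, :] - col_res.sum() / (n + m)) / m)

def romero(T, s, r):
    """Entrywise formula (79); assumes sum(s) == sum(r)."""
    m, n = T.shape
    return (T + (s - T.sum(axis=1))[:, None] / n      # (s_i - (Te)_i) / n
              + (r - T.sum(axis=0))[None, :] / m      # (r_j - (T^T f)_j) / m
              + (T.sum() - r.sum()) / (m * n))        # (f^T T e - e^T r) / (mn)
```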

Next, Corollary 2.10 turns into the following result.

Corollary 3.6

(Glunt–Hayden–Reams; 1998 [5, Theorem 2.1])

Suppose that e and f lie in \(\mathbb{R}^{n}\smallsetminus \{0\}\), and set

$$\begin{aligned} E:= \frac{1}{ \Vert e \Vert ^{2}}ee^{\intercal } \quad\textit{and}\quad F:= \frac{1}{ \Vert f \Vert ^{2}}ff^{\intercal }, \end{aligned}$$

and let \(\gamma \in \mathbb{R}\). Then

$$\begin{aligned} C:= \bigl\{ {T\in \mathbb{R}^{n\times n}} | {Te=\gamma e \textit{ and } T^{\intercal }f=\gamma f} \bigr\} \neq \varnothing \end{aligned}$$

and

$$\begin{aligned} \bigl(\forall T\in \mathbb{R}^{n\times n}\bigr)\quad P_{C}(T) = \gamma \operatorname{Id}+( \operatorname{Id}-F) (T-\gamma \operatorname{Id}) (\operatorname{Id}-E). \end{aligned}$$

Specializing Corollary 2.11 to \(X=Y=\mathbb{R}^{n}\), and thus \(\mathcal{H}= \mathbb{R}^{n\times n}\), immediately yields the following result.

Corollary 3.7

Suppose that \(e\in \mathbb{R}^{n}\smallsetminus \{0\}\), and set

$$\begin{aligned} E:= \frac{1}{ \Vert e \Vert ^{2}}ee^{\intercal }. \end{aligned}$$


Then

$$\begin{aligned} C:= \bigl\{ {T\in \mathbb{R}^{n\times n}} | {Te=e=T^{\intercal }e} \bigr\} \neq \varnothing \end{aligned}$$

and

$$\begin{aligned} \bigl(\forall T\in \mathbb{R}^{n\times n}\bigr)\quad P_{C}(T) = E+( \operatorname{Id}-E)T(\operatorname{Id}-E). \end{aligned}$$

Example 3.8

(Projection formula for generalized bistochastic matrices; 1998; see [8, Theorem on page 566] and [5, Corollary 2.1])


Set

$$\begin{aligned} u:=[1,1,\ldots,1]^{\intercal }\in \mathbb{R}^{n},\qquad C:= \bigl\{ {T \in \mathbb{R}^{n\times n}} | {Tu=u=T^{\intercal }u} \bigr\} \quad \text{and}\quad J:=(1/n)uu^{\intercal }. \end{aligned}$$

Then

$$\begin{aligned} \bigl(\forall T\in \mathbb{R}^{n\times n}\bigr)\quad P_{C}(T)=J+( \operatorname{Id}-J)T(\operatorname{Id}-J). \end{aligned}$$


Proof. Apply Corollary 3.7 with \(e=u\), for which \(\|e\|^{2}=n\) and hence \(E=J\). □
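Example 3.8 and Romero's formula (79) describe the same projection when \(m=n\) and \(s=r=u\). The sketch below (function name and seed ours) evaluates \(J+(\operatorname{Id}-J)T(\operatorname{Id}-J)\), checks that all row and column sums equal 1, and compares the result with (79), where \(e^{\intercal}r=n\).

```python
import numpy as np

def proj_gen_bistochastic(T):
    """Example 3.8: J + (I - J) T (I - J) with J = (1/n) u u^T."""
    n = T.shape[0]
    J = np.full((n, n), 1.0 / n)
    I = np.eye(n)
    return J + (I - J) @ T @ (I - J)

rng = np.random.default_rng(9)
n = 5
T = rng.uniform(-1, 1, size=(n, n))
P = proj_gen_bistochastic(T)
# Entrywise formula (79) with m = n and s = r = u (all ones).
P79 = (T + (1 - T.sum(axis=1))[:, None] / n + (1 - T.sum(axis=0))[None, :] / n
         + (T.sum() - n) / n**2)
print(np.allclose(P.sum(axis=1), 1), np.allclose(P.sum(axis=0), 1), np.allclose(P, P79))
```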

Remark 3.9

For some applications of Example 3.8, we refer the reader to [12] and also to the recent preprint [2].

Remark 3.10

A reviewer pointed out that projection algorithms can also be employed to solve linear programming problems, provided a strict complementarity condition holds (see Nurminski’s work [9]). This suggests a possibly interesting future project: exploring whether the projections in this paper are useful in solving some linear programming problems on rectangular matrices with prescribed row and column sums.

4 Numerical experiments

We consider the problem of finding a rectangular matrix with prescribed row and column sums that also satisfies some additional constraints on its entries. To be specific, and inspired by [1], we seek a real matrix of size \(m\times n = 4\times 5\) whose row and column sums are equal to \(\bar{s}:= \begin{bmatrix} 32,43,33,23\end{bmatrix} ^{\intercal }\) and \(\bar{r}:= \begin{bmatrix} 24,18,37,27,25\end{bmatrix} ^{\intercal }\), respectively. One solution to this problem, which in fact has nonnegative integer entries, is given by

$$\begin{aligned} \textstyle\begin{array}{rrrrr|r} 9&4&8&4&7&32 \\ 7&9&15&7&5&43 \\ 3&2&9&10&9&33 \\ 5&3&5&6&4&23 \\ \hline 24&18&37&27&25&131 \end{array}\displaystyle \end{aligned}$$
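The arithmetic in this table is easy to confirm. The short sketch below (variable names ours) recomputes the row sums, column sums, and grand total; the equal totals reflect the consistency condition \(\langle e,\bar{r}\rangle=\langle f,\bar{s}\rangle\) of Remark 3.5.

```python
import numpy as np

# The 4 x 5 nonnegative integer solution displayed above.
T_example = np.array([[9, 4, 8, 4, 7],
                      [7, 9, 15, 7, 5],
                      [3, 2, 9, 10, 9],
                      [5, 3, 5, 6, 4]])
s_bar = np.array([32, 43, 33, 23])          # prescribed row sums
r_bar = np.array([24, 18, 37, 27, 25])      # prescribed column sums

print(T_example.sum(axis=1))   # [32 43 33 23]
print(T_example.sum(axis=0))   # [24 18 37 27 25]
print(T_example.sum())         # 131
```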

Adopting the notation of Corollary 3.4, we see that the set

$$\begin{aligned} B:= \bigl\{ {T\in \mathbb{R}^{4\times 5}} | {Te =\bar{s} \text{ and } T^{\intercal }f = \bar{r}} \bigr\} \neq \varnothing \end{aligned}$$

is an affine subspace of \(\mathbb{R}^{4\times 5}\) and that an explicit formula for \(P_{B}\) is available through Corollary 3.4. Next, we define the closed convex “hyper box”

$$\begin{aligned} A:= \prod_{\substack{i\in \{1,2,3,4\}\\ j\in \{1,2,3,4,5\}}} \bigl[0,\min \{\bar{s}_{i},\bar{r}_{j}\}\bigr]. \end{aligned}$$

For instance, the \((1,3)\)-entry of any nonnegative integer solution must lie between 0 and \(32=\min \{32,37\}\); thus \(A_{1,3} = [0,32]\). The projection of a real number ξ onto the interval \([0,\min \{\bar{s}_{i},\bar{r}_{j}\}]\) is given by \(\max \{0,\min \{\bar{s}_{i},\bar{r}_{j},\xi \}\}\). Because A is the Cartesian product of such intervals, the projection operator \(P_{A}\) is nothing but the corresponding product of interval projection operators.
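Since \(P_{A}\) acts entrywise, it amounts to a single clipping operation against the matrix of upper bounds \(U_{i,j}=\min\{\bar{s}_{i},\bar{r}_{j}\}\). In the minimal sketch below (names ours), `np.minimum.outer` builds U, and `np.clip` computes \(\min\{\max\{T,0\},U\}\), which equals \(\max\{0,\min\{U,T\}\}\) because \(U\geq 0\).

```python
import numpy as np

s_bar = np.array([32., 43., 33., 23.])
r_bar = np.array([24., 18., 37., 27., 25.])
U = np.minimum.outer(s_bar, r_bar)    # U[i, j] = min{s_bar_i, r_bar_j}

def P_A(T):
    """Entrywise projection onto the box prod_{i,j} [0, U[i, j]]."""
    return np.clip(T, 0.0, U)

print(U[0, 2])   # 32.0, the bound for the (1,3)-entry (0-based indices)
```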

Our problem is thus to

$$\begin{aligned} \text{find a matrix $T$ in $A\cap B$.} \end{aligned}$$

We shall tackle (90) with three well-known algorithms: Douglas–Rachford (DR), the method of alternating projections (MAP), and Dykstra’s algorithm (Dyk). Here is a quick review of how these methods operate, given a starting matrix \(T_{0}\in \mathbb{R}^{4\times 5}\) and a current matrix \(T_{k}\in \mathbb{R}^{4\times 5}\).

DR updates via

$$\begin{aligned} T_{k+1}:= T_{k} - P_{A}(T_{k}) + P_{B}\bigl(2P_{A}(T_{k})-T_{k} \bigr), \end{aligned}$$

MAP updates via

$$\begin{aligned} T_{k+1}:= P_{B}\bigl(P_{A}(T_{k}) \bigr), \end{aligned}$$

and finally Dyk additionally initializes \(R_{0}=0\in \mathbb{R}^{4\times 5}\) and updates via

$$\begin{aligned} A_{k+1}:= P_{A}(T_{k}+R_{k}),\qquad R_{k+1}:= T_{k}+R_{k}-A_{k+1},\qquad T_{k+1}:= P_{B}(A_{k+1}). \end{aligned}$$

For all three algorithms, it is known that

$$\begin{aligned} P_{A}(T_{k}) \to \text{some matrix in $A\cap B$;} \end{aligned}$$

in fact, Dyk satisfies even \(P_{A}(T_{k})\to P_{A\cap B}(T_{0})\) (see, e.g., [3, Corollary 28.3, Corollary 5.26, and Theorem 30.7]). Consequently, for each of the three algorithms, we will focus on the sequence

$$\begin{aligned} \bigl(P_{A}(T_{k})\bigr)_{k\in \mathbb{N}}, \end{aligned}$$

which obviously lies in A and which thus prompts the simple feasibility criterion given by

$$\begin{aligned} \delta _{k}:= \bigl\Vert P_{A}(T_{k})-P_{B} \bigl(P_{A}(T_{k})\bigr) \bigr\Vert . \end{aligned}$$
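The three iterations and the criterion \(\delta_{k}\) are straightforward to code. The following is a minimal sketch, not the authors' experimental code: the names, the seed, and the use of the Frobenius norm in \(\delta_{k}\) are our assumptions, while the 250-iteration budget and the sampling of \(T_{0}\) from \([-100,100]\) follow Sect. 4.1. Here \(P_{B}\) is Corollary 3.4 with all-ones e and f, and \(P_{A}\) clips entrywise.

```python
import numpy as np

s_bar = np.array([32., 43., 33., 23.])
r_bar = np.array([24., 18., 37., 27., 25.])
U = np.minimum.outer(s_bar, r_bar)             # box bounds defining A

def P_A(T):
    return np.clip(T, 0.0, U)

def P_B(T):
    # Corollary 3.4; the data are consistent (both totals equal 131).
    m, n = T.shape
    row_res, col_res = T.sum(axis=1) - s_bar, T.sum(axis=0) - r_bar
    return (T - (row_res[:, None] - row_res.sum() / (n + m)) / n
              - (col_res[None, :] - col_res.sum() / (n + m)) / m)

def gap(T):
    # delta_k = ||P_A(T_k) - P_B(P_A(T_k))||, measured here in the Frobenius norm.
    a = P_A(T)
    return np.linalg.norm(a - P_B(a))

def run(method, T0, iters=250):
    """Return P_A(T_iters) and the gaps delta_1, ..., delta_iters."""
    T, R = T0.copy(), np.zeros_like(T0)
    deltas = []
    for _ in range(iters):
        if method == "DR":
            a = P_A(T)
            T = T - a + P_B(2 * a - T)
        elif method == "MAP":
            T = P_B(P_A(T))
        elif method == "Dyk":
            a = P_A(T + R)          # A_{k+1}
            R = T + R - a           # R_{k+1}, using the old T_k and R_k
            T = P_B(a)              # T_{k+1}
        else:
            raise ValueError(f"unknown method {method!r}")
        deltas.append(gap(T))
    return P_A(T), np.array(deltas)

rng = np.random.default_rng(10)
T0 = rng.uniform(-100, 100, size=(4, 5))
results = {name: run(name, T0) for name in ("DR", "MAP", "Dyk")}
for name, (X, deltas) in results.items():
    print(name, deltas[-1])
```

For MAP, \(\delta_{k}\) is the distance from \(P_{A}(T_{k})\) to B, which cannot increase from one iteration to the next; DR and Dyk carry no such guarantee for this particular quantity.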

4.1 The convex case

Each algorithm is run for 250 iterations on each of \(100{,}000\) instances of \(T_{0}\) whose entries are drawn uniformly from \([-100,100]\). Figure 1 plots the median value of \(\delta _{k}\) over these instances; the shaded region around each line shows the range of values attained at that iteration. We consider an algorithm to have achieved feasibility when \(\delta _{k}=0\). As the ranges in Fig. 1 show, MAP and DR always achieve feasibility, and DR achieves it the fastest in most cases.

To support this, Table 1 orders the algorithms according to their performance. The first column reports the percentage of instances in which the algorithms achieved feasibility in the given order, and also records whether any of the algorithms did not converge. For example, the row labeled “DR<MAP” represents the cases where DR achieved feasibility the fastest, MAP was second, and Dyk did not converge. The second column reports the percentage of instances in which the first feasible matrices obtained were closest to the starting point \(T_{0}\) in the given order. Closeness is measured by \(\lVert {T_{0}-T}\rVert\), where \(\lVert {\cdot}\rVert\) is the operator norm and T is the first feasible matrix obtained by a given algorithm (Dyk, DR, or MAP). We consider two algorithms tied if their distances to the starting point differ by at most \(10^{-15}\).

As is evident, DR leads in feasibility in a majority of the cases. However, the feasible matrices it finds are not as close to \(T_{0}\) as those given by MAP and Dyk. This is consistent with the fact that DR explores regions further away from the starting point, whereas Dyk is designed to find the nearest point \(P_{A\cap B}(T_{0})\). It is also worth noting that at least one of the algorithms converges in every instance. (Convergence of all three algorithms is guaranteed in theory.)

Figure 1

Convergence of iterates with the nonnegative matrix constraint

Table 1 Results for nonnegative matrices

Last but not least, because our problem deals with unscaled row and column sums, we point out that the sought-after projection may also be computed by using the algorithm proposed by Calvillo and Romero [4] which even converges in finitely many steps!

4.2 The nonconvex case

We repeat the experiment of Sect. 4.1 exactly, the only difference being that the (new) set A in this section is the intersection of the (old) set A from the previous section (see (89)) with \(\mathbb{Z}^{4\times 5}\). This enforces nonnegative integer solutions. The projection operator \(P_{A}\) is now obtained by simply rounding to the nearest integer after applying \(P_{A}\) from Sect. 4.1.
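Because every bound \(\min\{\bar{s}_{i},\bar{r}_{j}\}\) is an integer, clipping followed by rounding yields, entry by entry, a nearest point of \(A\cap\mathbb{Z}^{4\times 5}\). The minimal sketch below (name `P_A_int` ours) uses `np.rint`, which rounds halves to the nearest even integer; at such ties both neighbouring integers are equally close, so either choice is a valid (nonunique) projection onto this nonconvex set.

```python
import numpy as np

s_bar = np.array([32., 43., 33., 23.])
r_bar = np.array([24., 18., 37., 27., 25.])
U = np.minimum.outer(s_bar, r_bar)    # integer-valued upper bounds

def P_A_int(T):
    """A nearest point of the box intersected with Z^{4x5}: clip entrywise, then round."""
    return np.rint(np.clip(T, 0.0, U))
```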

In this nonconvex case, MAP fails to converge in most cases, whereas DR and Dyk converge to solutions, as shown in Fig. 2. This is corroborated by Table 2, where the rows in which MAP converges account for only a quarter of the total cases. Again, DR achieves feasibility the fastest in more than half of the cases, but Dykstra’s algorithm gives the solution closest to \(T_{0}\) among the three algorithms, as shown in the second column of Table 2. In this nonconvex case, convergence of the algorithms is not guaranteed; in fact, there are several instances in which no solution is found. However, in the \(10^{5}\) runs considered, we did discover many distinct solutions (see Table 3). All solutions found turned out to be distinct, even across the three algorithms, resulting in 113,622 different nonnegative integer solutions in total.

Figure 2

Convergence of iterates with the integer matrix constraint

Table 2 Results for nonnegative integer matrices
Table 3 Integer matrix solutions found by the three algorithms

Availability of data and materials

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


  1. Aneikei: Find a matrix with given row and column sums (2016)

  2. Aragón Artacho, F.J., Campoy, R., Tam, M.K.: Strengthened splitting methods for computing resolvents (2020)

  3. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)

  4. Calvillo, G., Romero, D.: On the closest point to the origin in transportation polytopes. Discrete Appl. Math. 210, 88–102 (2016).

  5. Glunt, W., Hayden, T.L., Reams, R.: The nearest “doubly stochastic” matrix to a real matrix with the same first moment. Numer. Linear Algebra Appl. 5, 475–482 (1998)

  6. Groetsch, C.W.: Generalized Inverses of Linear Operators. Dekker, New York (1977)

  7. Kadison, R.V., Ringrose, J.R.: Fundamentals of the Theory of Operator Algebras I: Elementary Theory. Am. Math. Soc., Providence (1997)

  8. Khoury, R.N.: Closest matrices in the space of generalized doubly stochastic matrices. J. Math. Anal. Appl. 222, 561–568 (1998).

  9. Nurminski, E.A.: Single-projection procedure for linear optimization. J. Glob. Optim. 66, 95–110 (2016).

  10. Reed, M., Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis, revised and enlarged edn. Academic Press, San Diego (1980)

  11. Romero, D.: Easy transportation-like problems on K-dimensional arrays. J. Optim. Theory Appl. 66, 137–147 (1990).

  12. Takouda, P.L.: Un problème d’approximation matricielle: quelle est la matrice bistochastique la plus proche d’une matrice donnée? RAIRO Oper. Res. 39, 35–54 (2005).

  13. Wikipedia: Discrete tomography. Retrieved September 13, 2021



We thank the editor Aviv Gibali, three anonymous reviewers, and Matt Tam for constructive comments and several pointers to literature we were previously unaware of.


The research of HHB and XW was partially supported by Discovery Grants from the Natural Sciences and Engineering Research Council of Canada.

Author information


All authors contributed equally in writing this article. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Heinz H. Bauschke.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Bauschke, H.H., Singh, S. & Wang, X. Projecting onto rectangular matrices with prescribed row and column sums. Fixed Point Theory Algorithms Sci Eng 2021, 23 (2021).
