Personal tools
You are here: Home Documentation User's Guide 8.7 Exploiting Structure
Document Actions

8.7 Exploiting Structure

8.7 Exploiting Structure By default, the functions conelp() and coneqp() exploit no problem structure except (to some limited extent) sparsity. Two mechanisms are provided for implementing customized solvers that take advantage of problem structure.

Providing a function for solving KKT equations.
The most expensive step of each iteration of conelp() or coneqp() is the solution of a set of linear equations (`KKT equations') of the form
\begin{displaymath}
\left[\begin{array}{ccc}
P & A^T & G^T \\
A & 0 & 0 \\
...
... = \left[\begin{array}{c} b_x \\ b_y \\ b_z \end{array}\right]
\end{displaymath} (8.3)

(with P=0 in conelp()). The matrix W depends on the current iterates and is defined as follows. We use the notation of sections 8.1 and 8.2. Suppose

\begin{displaymath}
u = \left(u_\mathrm{l}, \; u_{\mathrm{q},0}, \; \ldots, \; ...
...\mathrm{s},k} \in{\mbox{\bf S}}^{t_k}, \quad k = 0,\ldots,N-1.
\end{displaymath}

Then W is a block-diagonal matrix,

\begin{displaymath}
Wu = \left( W_\mathrm{l} u_\mathrm{l}, \;
W_{\mathrm{q},0}...
...rm{s},N-1} \mathop{\mathbf{vec}}{(u_{\mathrm{s},N-1})} \right)
\end{displaymath}

with the following diagonal blocks.
  • The first block is a positive diagonal scaling with a vector d:

    \begin{displaymath}
W_\mathrm{l} = \mbox{\bf diag}\,(d), \qquad W_\mathrm{l}^{-1} = \mbox{\bf diag}\,(d)^{-1}.
\end{displaymath}

    This transformation is symmetric:

    \begin{displaymath}
W_\mathrm{l}^T = W_\mathrm{l}.
\end{displaymath}

  • The next M blocks are positive multiples of hyperbolic Householder transformations:

    \begin{displaymath}
W_{\mathrm{q},k} = \beta_k ( 2 v_k v_k^T - J),
\qquad
W_{...
...c{1}{\beta_k} ( 2 Jv_k v_k^T J - J),
\qquad k = 0,\ldots,M-1,
\end{displaymath}

    where

    \begin{displaymath}
\beta_k > 0, \qquad v_{k0} > 0, \qquad
v_k^T Jv_k = 1, \q...
...= \left[\begin{array}{cc}
1 & 0 \\ 0 & -I \end{array}\right].
\end{displaymath}

    These transformations are also symmetric:

    \begin{displaymath}
W_{\mathrm{q},k}^T = W_{\mathrm{q},k}.
\end{displaymath}

  • The last N blocks are congruence transformations with nonsingular matrices:

    \begin{displaymath}
W_{\mathrm{s},k} \mathop{\mathbf{vec}}{(u_{\mathrm{s},k})} ...
..._k^{-T} u_{\mathrm{s},k} r_k^{-1})}, \qquad
k = 0,\ldots,N-1.
\end{displaymath}

    In general, this operation is not symmetric:

    \begin{displaymath}
W_{\mathrm{s},k}^T \mathop{\mathbf{vec}}{(u_{\mathrm{s},k})...
...u_{\mathrm{s},k} r_k^{-T})}, \qquad
\qquad
k = 0,\ldots,N-1.
\end{displaymath}

It is often possible to exploit problem structure to solve (8.3) faster than by standard methods. The last argument kktsolver of conelp() and coneqp() allows the user to supply a Python function for solving the KKT equations. This function will be called as "f = kktsolver(W)", where W is a dictionary that contains the parameters of the scaling:

  • W['d'] is the positive vector that defines the diagonal scaling. W['di'] is its componentwise inverse.
  • W['beta'] and W['v'] are lists of length M with the coefficients and vectors that define the hyperbolic Householder transformations.
  • W['r'] is a list of length N with the matrices that define the the congruence transformations. W['rti'] is a list of length N with the transposes of the inverses of the matrices in W['r'].

The function call "f = kktsolver(W)" should return a routine for solving the KKT system (8.3) defined by W. It will be called as "f(bx, by, bz)". On entry, bx, by, bz contain the righthand side. On exit, they should contain the solution of the KKT system, with the last component scaled, i.e., on exit,

\begin{displaymath}
b_x := u_x, \qquad b_y := u_y, \qquad b_z := W u_z.
\end{displaymath}

In other words, the function returns the solution of

\begin{displaymath}
\left[\begin{array}{ccc}
P & A^T & G^TW^{-1} \\
A & 0 & ...
...= \left[\begin{array}{c} b_x \\ b_y \\ b_z \end{array}\right].
\end{displaymath}

Specifying constraints via Python functions.
In the default use of conelp() and coneqp(), the linear constraints and the quadratic term in the objective are parameterized by CVXOPT matrices G, A, P. It is possible to specify these parameters via Python functions that evaluate the corresponding matrix-vector products and their adjoints.

If the argument G of conelp() or coneqp() is a Python function, it should be defined as follows:

G(x, y [, alpha[, beta[, trans]]])


This evaluates the matrix-vector products

\begin{displaymath}
y := \alpha Gx + \beta y \quad
(\mathrm{trans} = \mathrm{'...
...alpha G^T x + \beta y \quad
(\mathrm{trans} = \mathrm{'T'}).
\end{displaymath}

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.

Similarly, if the argument A is a Python function, then it must be defined as follows.

A(x, y [, alpha[, beta[, trans]]])


This evaluates the matrix-vector products

\begin{displaymath}
y := \alpha Ax + \beta y \quad (\mathrm{trans} = \mathrm{'N'...
... \alpha A^T x + \beta y \quad (\mathrm{trans} = \mathrm{'T'}).
\end{displaymath}

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.

If the argument P of coneqp() is a Python function, then it must be defined as follows:

P(x, y [, alpha[, beta]])


This evaluates the matrix-vector products

\begin{displaymath}
y := \alpha Px + \beta y.
\end{displaymath}

The default values of the optional arguments must be alpha = 1.0, beta = 0.0.

If G, A or P are Python functions, then the argument kktsolver must also be provided.

We illustrate these features with three applications.

Example: 1-norm approximation

The optimization problem

\begin{displaymath}
\begin{array}{ll}
\mbox{minimize} & \Vert Pu-q\Vert _1
\end{array}\end{displaymath}

can be formulated as a linear program

\begin{displaymath}
\begin{array}{ll}
\mbox{minimize} & {\bf 1}^T v \\
\mbox{subject to} & -v \preceq Pu - q \preceq v.
\end{array}\end{displaymath}

By exploiting the structure in the inequalities, the cost of an iteration of an interior-point method can be reduced to the cost of least-squares problem of the same dimensions. (See section 11.8.2 in the book Convex Optimization.) The code belows takes advantage of this fact.

from cvxopt import base, blas, lapack, solvers
from cvxopt.base import matrix, spmatrix, mul, div

def l1(P, q):
"""

Returns the solution u, w of the l1 approximation problem

(primal) minimize ||P*u - q||_1

(dual) maximize q'*w
subject to P'*w = 0
||w||_infty <= 1.
"""

m, n = P.size

# Solve the equivalent LP
#
# minimize [0; 1]' * [u; v]
# subject to [P, -I; -P, -I] * [u; v] <= [q; -q]
#
# maximize -[q; -q]' * z
# subject to [P', -P']*z = 0
# [-I, -I]*z + 1 = 0
# z >= 0.

c = matrix(n*[0.0] + m*[1.0])

def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):

if trans=='N':
# y := alpha * [P, -I; -P, -I] * x + beta*y
u = P*x[:n]
y[:m] = alpha * ( u - x[n:]) + beta * y[:m]
y[m:] = alpha * (-u - x[n:]) + beta * y[m:]

else:
# y := alpha * [P', -P'; -I, -I] * x + beta*y
y[:n] = alpha * P.T * (x[:m] - x[m:]) + beta * y[:n]
y[n:] = -alpha * (x[:m] + x[m:]) + beta * y[n:]

h = matrix([q, -q])
dims = {'l': 2*m, 'q': [], 's': []}

def F(W):

"""
Returns a function f(x, y, z) that solves

[ 0 0 P' -P' ] [ x[:n] ] [ bx[:n] ]
[ 0 0 -I -I ] [ x[n:] ] [ bx[n:] ]
[ P -I -D1^{-1} 0 ] [ z[:m] ] = [ bz[:m] ]
[-P -I 0 -D2^{-1} ] [ z[m:] ] [ bz[m:] ]

where D1 = diag(di[:m])^2, D2 = diag(di[m:])^2 and di = W['di'].
"""

# Factor A = 4*P'*D*P where D = d1.*d2 ./(d1+d2) and d1 = di[:m].^2, d2 = di[m:].^2.

di = W['di']
d1, d2 = di[:m]**2, di[m:]**2
D = div( mul(d1,d2), d1+d2 )
A = P.T * spmatrix(4*D, range(m), range(m)) * P
lapack.potrf(A)

def f(x, y, z):

"""
On entry bx, bz are stored in x, z. On exit x, z contain the solution,
with z scaled: z./di is returned instead of z.
""""

# Solve for x[:n]:
#
# A*x[:n] = bx[:n] + P' * ( ((D1-D2)*(D1+D2)^{-1})*bx[n:]
# + (2*D1*D2*(D1+D2)^{-1}) * (bz[:m] - bz[m:]) ).

x[:n] += P.T * ( mul(div(d1-d2, d1+d2), x[n:]) + mul(2*D, z[:m]-z[m:]) )
lapack.potrs(A, x)

# x[n:] := (D1+D2)^{-1} * (bx[n:] - D1*bz[:m] - D2*bz[m:] + (D1-D2)*P*x[:n])

u = P*x[:n]
x[n:] = div(x[n:] - mul(d1, z[:m]) - mul(d2, z[m:]) + mul(d1-d2, u), d1+d2)

# z[:m] := d1[:m] .* ( P*x[:n] - x[n:] - bz[:m])
# z[m:] := d2[m:] .* (-P*x[:n] - x[n:] - bz[m:])

z[:m] = mul(d[:m], u - x[n:] - z[:m])
z[m:] = mul(d[m:], -u - x[n:] - z[m:])

return f

sol = solvers.conelp(c, G, h, dims, kktsolver = F)
return sol['x'][:n], sol['z'][m:] - sol['z'][:m]

Example: SDP with diagonal linear term

The SDP

\begin{displaymath}
\begin{array}{ll}
\mbox{minimize} & {\bf 1}^T x \\
\mbox{subject to} & W + \mbox{\bf diag}\,(x) \succeq 0
\end{array}
\end{displaymath}

can be solved efficiently by exploiting properties of the diag operator.

from cvxopt import base, blas, lapack, solvers
from cvxopt.base import matrix

def mcsdp(w):
"""
Returns solution x, z to

(primal) minimize sum(x)
subject to w + diag(x) >= 0

(dual) maximize -tr(w*z)
subject to diag(z) = 1
z >= 0.
"""

n = w.size[0]
c = matrix(1.0, (n,1))

def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
"""
y := alpha*(-diag(x)) + beta*y.
"""

if trans=='N':
# x is a vector; y is a symmetric matrix in column major order.
y *= beta
y[::n+1] -= alpha * x

else:
# x is a symmetric matrix in column major order; y is a vector.
y *= beta
y -= alpha * x[::n+1]


def cngrnc(r, x, alpha = 1.0):
"""
Congruence transformation

x := alpha * r'*x*r.

r and x are square matrices.
"""

# Scale diagonal of x by 1/2.
x[::n+1] *= 0.5

# a := tril(x)*r
a = +r
tx = matrix(x, (n,n))
blas.trmm(tx, a, side = 'L')

# x := alpha*(a*r' + r*a')
blas.syr2k(r, a, tx, trans = 'T', alpha = alpha)
x[:] = tx[:]

dims = {'l': 0, 'q': [], 's': [n]}

def F(W):
"""
Returns a function f(x, y, z) that solves

-diag(z) = bx
-diag(x) - r*r'*z*r*r' = bz

where r = W['r'][0] = W['rti'][0]^{-T}.
"""

rti = W['rti'][0]

# t = rti*rti' as a nonsymmetric matrix.
t = matrix(0.0, (n,n))
blas.gemm(rti, rti, t, transB = 'T')

# Cholesky factorization of tsq = t.*t.
tsq = t**2
lapack.potrf(tsq)

def f(x, y, z):
"""
On entry, x contains bx, y is empty, and z contains bz stored
in column major order.
On exit, they contain the solution, with z scaled
(vec(r'*z*r) is returned instead of z).

We first solve

((rti*rti') .* (rti*rti')) * x = bx - diag(t*bz*t)

and take z = - rti' * (diag(x) + bz) * rti.
"""

# tbst := t * bz * t
tbst = +z
cngrnc(t, tbst)

# x := x - diag(tbst) = bx - diag(rti*rti' * bz * rti*rti')
x -= tbst[::n+1]

# x := (t.*t)^{-1} * x = (t.*t)^{-1} * (bx - diag(t*bz*t))
lapack.potrs(tsq, x)

# z := z + diag(x) = bz + diag(x)
z[::n+1] += x

# z := -vec(rti' * z * rti)
# = -vec(rti' * (diag(x) + bz) * rti
cngrnc(rti, z, alpha = -1.0)

return f

sol = solvers.conelp(c, G, w[:], dims, kktsolver = F)
return sol['x'], sol['z']

Example: Minimizing 1-norm subject to a 2-norm constraint
In the second example, we use a similar trick to solve the problem

\begin{displaymath}
\begin{array}{ll}
\mbox{minimize} & \Vert u\Vert _1 \\
\mbox{subject to} & \Vert Au - b\Vert _2 \leq 1.
\end{array}\end{displaymath}

The code below is efficient, if we assume that the number of rows in A is greater than or equal to the number of columns.

def qcl1(A, b):
"""
Returns the solution u, z of

(primal) minimize || u ||_1
subject to || A * u - b ||_2 <= 1

(dual) maximize b^T z - ||z||_2
subject to || A'*z ||_inf <= 1.

Exploits structure, assuming A is m by n with m >= n.
"""

m, n = A.size

# Solve equivalent cone LP with variables x = [u; v].
#
# minimize [0; 1]' * x
# subject to [ I -I ] * x <= [ 0 ] (componentwise)
# [-I -I ] * x <= [ 0 ] (componentwise)
# [ 0 0 ] * x <= [ 1 ] (SOC)
# [-A 0 ] [ -b ]
#
# maximize -t + b' * w
# subject to z1 - z2 = A'*w
# z1 + z2 = 1
# z1 >= 0, z2 >=0, ||w||_2 <= t.

c = matrix(n*[0.0] + n*[1.0])
h = matrix( 0.0, (2*n + m + 1, 1))
h[2*n] = 1.0
h[2*n+1:] = -b

def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
y *= beta
if trans=='N':
# y += alpha * G * x
y[:n] += alpha * (x[:n] - x[n:2*n])
y[n:2*n] += alpha * (-x[:n] - x[n:2*n])
y[2*n+1:] -= alpha * A*x[:n]

else:
# y += alpha * G'*x
y[:n] += alpha * (x[:n] - x[n:2*n] - A.T * x[-m:])
y[n:] -= alpha * (x[:n] + x[n:2*n])


def Fkkt(W):
"""
Returns a function f(x, y, z) that solves

[ 0 G' ] [ x ] = [ bx ]
[ G -W'*W ] [ z ] [ bz ].
"""

# First factor
#
# S = G' * W**-1 * W**-T * G
# = [0; -A]' * W3^-2 * [0; -A] + 4 * (W1**2 + W2**2)**-1
#
# where
#
# W1 = diag(d1) with d1 = W['d'][:n] = 1 ./ W['di'][:n]
# W2 = diag(d2) with d2 = W['d'][n:] = 1 ./ W['di'][n:]
# W3 = beta * (2*v*v' - J), W3^-1 = 1/beta * (2*J*v*v'*J - J)
# with beta = W['beta'][0], v = W['v'][0], J = [1, 0; 0, -I].

# As = W3^-1 * [ 0 ; -A ] = 1/beta * ( 2*J*v * v' - I ) * [0; A]
beta, v = W['beta'][0], W['v'][0]
As = 2 * v * (v[1:].T * A)
As[1:,:] *= -1.0
As[1:,:] -= A
As /= beta

# S = As'*As + 4 * (W1**2 + W2**2)**-1
S = As.T * As
d1, d2 = W['d'][:n], W['d'][n:]
d = 4.0 * (d1**2 + d2**2)**-1
S[::n+1] += d
lapack.potrf(S)

def f(x, y, z):

# z := - W**-T * z
z[:n] = -div( z[:n], d1 )
z[n:2*n] = -div( z[n:2*n], d2 )
z[2*n:] -= 2.0*v*( v[0]*z[2*n] - blas.dot(v[1:], z[2*n+1:]) )
z[2*n+1:] *= -1.0
z[2*n:] /= beta

# x := x - G' * W**-1 * z
x[:n] -= div(z[:n], d1) - div(z[n:2*n], d2) + As.T * z[-(m+1):]
x[n:] += div(z[:n], d1) + div(z[n:2*n], d2)

# Solve for x[:n]:
#
# S*x[:n] = x[:n] - (W1**2 - W2**2)(W1**2 + W2**2)^-1 * x[n:]

x[:n] -= mul( div(d1**2 - d2**2, d1**2 + d2**2), x[n:])
lapack.potrs(S, x)

# Solve for x[n:]:
#
# (d1**-2 + d2**-2) * x[n:] = x[n:] + (d1**-2 - d2**-2)*x[:n]

x[n:] += mul( d1**-2 - d2**-2, x[:n])
x[n:] = div( x[n:], d1**-2 + d2**-2)

# z := z + W^-T * G*x
z[:n] += div( x[:n] - x[n:2*n], d1)
z[n:2*n] += div( -x[:n] - x[n:2*n], d2)
z[2*n:] += As*x[:n]

return f

dims = {'l': 2*n, 'q': [m+1], 's': []}
sol = solvers.conelp(c, G, h, dims, kktsolver = Fkkt)
if sol['status'] == 'optimal':
return sol['x'][:n], sol['z'][-m:]
else:
return None, None

Example: 1-norm regularized least-squares
As an example that illustrates how structure can be exploited in coneqp(), we consider the 1-norm regularized least-squares problem

\begin{displaymath}
\begin{array}{ll}
\mbox{minimize} & \Vert Ax - y\Vert _2^2 + \Vert x\Vert _1
\end{array}\end{displaymath}

with variable x. The problem is equivalent to the quadratic program

\begin{displaymath}
\begin{array}{ll}
\mbox{minimize} & \Vert Ax - y\Vert _2^2...
... u \\
\mbox{subject to} & -u \preceq x \preceq u
\end{array}\end{displaymath}

with variables x and u. The implementation below is efficient when A has many more columns than rows.

from cvxopt.base import matrix, spdiag, mul, div
from cvxopt import base, blas, lapack, solvers
import math

def l1regls(A, y):
"""

Returns the solution of l1-norm regularized least-squares problem

minimize || A*x - y ||_2^2 + || x ||_1.

"""

m, n = A.size
q = matrix(1.0, (2*n,1))
q[:n] = -2.0 * A.T * y

def P(u, v, alpha = 1.0, beta = 0.0 ):
"""
v := alpha * 2.0 * [ A'*A, 0; 0, 0 ] * u + beta * v
"""
v *= beta
v[:n] += alpha * 2.0 * A.T * (A * u[:n])


def G(u, v, alpha=1.0, beta=0.0, trans='N'):
"""
v := alpha*[I, -I; -I, -I] * u + beta * v (trans = 'N' or 'T')
"""

v *= beta
v[:n] += alpha*(u[:n] - u[n:])
v[n:] += alpha*(-u[:n] - u[n:])

h = matrix(0.0, (2*n,1))


# Customized solver for the KKT system
#
# [ 2.0*A'*A 0 I -I ] [x[:n] ] [bx[:n] ]
# [ 0 0 -I -I ] [x[n:] ] = [bx[n:] ].
# [ I -I -D1^-1 0 ] [zl[:n]] [bzl[:n]]
# [ -I -I 0 -D2^-1 ] [zl[n:]] [bzl[n:]]
#
# where D1 = W['di'][:n]**2, D2 = W['di'][:n]**2.
#
# We first eliminate zl and x[n:]:
#
# ( 2*A'*A + 4*D1*D2*(D1+D2)^-1 ) * x[:n] =
# bx[:n] - (D2-D1)*(D1+D2)^-1 * bx[n:] +
# D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -
# D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:]
#
# x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n] - D2*bzl[n:] )
# - (D2-D1)*(D1+D2)^-1 * x[:n]
#
# zl[:n] = D1 * ( x[:n] - x[n:] - bzl[:n] )
# zl[n:] = D2 * (-x[:n] - x[n:] - bzl[n:] ).
#
# The first equation has the form
#
# (A'*A + D)*x[:n] = rhs
#
# and is equivalent to
#
# [ D A' ] [ x:n] ] = [ rhs ]
# [ A -I ] [ v ] [ 0 ].
#
# It can be solved as
#
# ( A*D^-1*A' + I ) * v = A * D^-1 * rhs
# x[:n] = D^-1 * ( rhs - A'*v ).

S = matrix(0.0, (m,m))
Asc = matrix(0.0, (m,n))
v = matrix(0.0, (m,1))

def Fkkt(W):

# Factor
#
# S = A*D^-1*A' + I
#
# where D = 2*D1*D2*(D1+D2)^-1, D1 = d[:n]**-2, D2 = d[n:]**-2.

d1, d2 = W['di'][:n]**2, W['di'][n:]**2

# ds is square root of diagonal of D
ds = math.sqrt(2.0) * div( mul( W['di'][:n], W['di'][n:]),
base.sqrt(d1+d2) )
d3 = div(d2 - d1, d1 + d2)

# Asc = A*diag(d)^-1/2
Asc = A * spdiag(ds**-1)

# S = I + A * D^-1 * A'
blas.syrk(Asc, S)
S[::m+1] += 1.0
lapack.potrf(S)

def g(x, y, z):

x[:n] = 0.5 * ( x[:n] - mul(d3, x[n:]) +
mul(d1, z[:n] + mul(d3, z[:n])) - mul(d2, z[n:] -
mul(d3, z[n:])) )
x[:n] = div( x[:n], ds)

# Solve
#
# S * v = 0.5 * A * D^-1 * ( bx[:n] -
# (D2-D1)*(D1+D2)^-1 * bx[n:] +
# D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -
# D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:] )

blas.gemv(Asc, x, v)
lapack.potrs(S, v)

# x[:n] = D^-1 * ( rhs - A'*v ).
blas.gemv(Asc, v, x, alpha=-1.0, beta=1.0, trans='T')
x[:n] = div(x[:n], ds)

# x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n] - D2*bzl[n:] )
# - (D2-D1)*(D1+D2)^-1 * x[:n]
x[n:] = div( x[n:] - mul(d1, z[:n]) - mul(d2, z[n:]), d1+d2 )\
- mul( d3, x[:n] )

# zl[:n] = D1^1/2 * ( x[:n] - x[n:] - bzl[:n] )
# zl[n:] = D2^1/2 * ( -x[:n] - x[n:] - bzl[n:] ).
z[:n] = mul( W['di'][:n], x[:n] - x[n:] - z[:n] )
z[n:] = mul( W['di'][n:], -x[:n] - x[n:] - z[n:] )

return g

return solvers.coneqp(P, q, G, h, kktsolver = Fkkt)['x'][:n]

 

Powered by Plone CMS, the Open Source Content Management System