8.7 Exploiting Structure
- Providing a function for solving KKT equations.
- The most expensive step of each iteration of
conelp() or coneqp() is the
solution of a set of linear equations (`KKT equations') of the form
(with P=0 in conelp()). The matrix W depends on the current iterates and is defined as follows. We use the notation of sections 8.1 and 8.2. Suppose
Then W is a block-diagonal matrix,
with the following diagonal blocks.- The first block is a positive diagonal scaling with a
vector d:
This transformation is symmetric:
- The next M blocks are positive multiples of hyperbolic
Householder transformations:
where
These transformations are also symmetric:
- The last N blocks are congruence transformations with
nonsingular matrices:
In general, this operation is not symmetric:
- W['d'] is the positive vector that defines the diagonal scaling. W['di'] is its componentwise inverse.
- W['beta'] and W['v'] are lists of length M with the coefficients and vectors that define the hyperbolic Householder transformations.
- W['r'] is a list of length N with the matrices that define the the congruence transformations. W['rti'] is a list of length N with the transposes of the inverses of the matrices in W['r'].
The function call "f = kktsolver(W)" should return a routine for solving the KKT system (8.3) defined by W. It will be called as "f(bx, by, bz)". On entry, bx, by, bz contain the righthand side. On exit, they should contain the solution of the KKT system, with the last component scaled, i.e., on exit,
In other words, the function returns the solution of
- The first block is a positive diagonal scaling with a
vector d:
- Specifying constraints via Python functions.
- In the default use of conelp() and coneqp(), the
linear constraints and the quadratic term in the objective are
parameterized by CVXOPT matrices G, A, P. It is
possible to specify these parameters via Python functions that evaluate
the corresponding matrix-vector products and their adjoints.
If the argument G of conelp() or coneqp() is a Python function, it should be defined as follows:
G(x, y [, alpha[, beta[, trans]]])
-
- This evaluates the matrix-vector products
The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.
Similarly, if the argument A is a Python function, then it must be defined as follows.
A(x, y [, alpha[, beta[, trans]]])
-
- This evaluates the matrix-vector products
The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.
If the argument P of coneqp() is a Python function, then it must be defined as follows:
P(x, y [, alpha[, beta]])
-
- This evaluates the matrix-vector products
The default values of the optional arguments must be alpha = 1.0, beta = 0.0.
If G, A or P are Python functions, then the argument kktsolver must also be provided.
-
We illustrate these features with three applications.
- Example: 1-norm approximation
The optimization problem
can be formulated as a linear program
By exploiting the structure in the inequalities, the cost of an iteration of an interior-point method can be reduced to the cost of least-squares problem of the same dimensions. (See section 11.8.2 in the book Convex Optimization.) The code belows takes advantage of this fact.from cvxopt import base, blas, lapack, solvers
from cvxopt.base import matrix, spmatrix, mul, div
def l1(P, q):
"""
Returns the solution u, w of the l1 approximation problem
(primal) minimize ||P*u - q||_1
(dual) maximize q'*w
subject to P'*w = 0
||w||_infty <= 1.
"""
m, n = P.size
# Solve the equivalent LP
#
# minimize [0; 1]' * [u; v]
# subject to [P, -I; -P, -I] * [u; v] <= [q; -q]
#
# maximize -[q; -q]' * z
# subject to [P', -P']*z = 0
# [-I, -I]*z + 1 = 0
# z >= 0.
c = matrix(n*[0.0] + m*[1.0])
def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
if trans=='N':
# y := alpha * [P, -I; -P, -I] * x + beta*y
u = P*x[:n]
y[:m] = alpha * ( u - x[n:]) + beta * y[:m]
y[m:] = alpha * (-u - x[n:]) + beta * y[m:]
else:
# y := alpha * [P', -P'; -I, -I] * x + beta*y
y[:n] = alpha * P.T * (x[:m] - x[m:]) + beta * y[:n]
y[n:] = -alpha * (x[:m] + x[m:]) + beta * y[n:]
h = matrix([q, -q])
dims = {'l': 2*m, 'q': [], 's': []}
def F(W):
"""
Returns a function f(x, y, z) that solves
[ 0 0 P' -P' ] [ x[:n] ] [ bx[:n] ]
[ 0 0 -I -I ] [ x[n:] ] [ bx[n:] ]
[ P -I -D1^{-1} 0 ] [ z[:m] ] = [ bz[:m] ]
[-P -I 0 -D2^{-1} ] [ z[m:] ] [ bz[m:] ]
where D1 = diag(di[:m])^2, D2 = diag(di[m:])^2 and di = W['di'].
"""
# Factor A = 4*P'*D*P where D = d1.*d2 ./(d1+d2) and d1 = di[:m].^2, d2 = di[m:].^2.
di = W['di']
d1, d2 = di[:m]**2, di[m:]**2
D = div( mul(d1,d2), d1+d2 )
A = P.T * spmatrix(4*D, range(m), range(m)) * P
lapack.potrf(A)
def f(x, y, z):
"""
On entry bx, bz are stored in x, z. On exit x, z contain the solution,
with z scaled: z./di is returned instead of z.
""""
# Solve for x[:n]:
#
# A*x[:n] = bx[:n] + P' * ( ((D1-D2)*(D1+D2)^{-1})*bx[n:]
# + (2*D1*D2*(D1+D2)^{-1}) * (bz[:m] - bz[m:]) ).
x[:n] += P.T * ( mul(div(d1-d2, d1+d2), x[n:]) + mul(2*D, z[:m]-z[m:]) )
lapack.potrs(A, x)
# x[n:] := (D1+D2)^{-1} * (bx[n:] - D1*bz[:m] - D2*bz[m:] + (D1-D2)*P*x[:n])
u = P*x[:n]
x[n:] = div(x[n:] - mul(d1, z[:m]) - mul(d2, z[m:]) + mul(d1-d2, u), d1+d2)
# z[:m] := d1[:m] .* ( P*x[:n] - x[n:] - bz[:m])
# z[m:] := d2[m:] .* (-P*x[:n] - x[n:] - bz[m:])
z[:m] = mul(d[:m], u - x[n:] - z[:m])
z[m:] = mul(d[m:], -u - x[n:] - z[m:])
return f
sol = solvers.conelp(c, G, h, dims, kktsolver = F)
return sol['x'][:n], sol['z'][m:] - sol['z'][:m]- Example: SDP with diagonal linear term
The SDP
can be solved efficiently by exploiting properties of the diag operator.from cvxopt import base, blas, lapack, solvers
from cvxopt.base import matrix
def mcsdp(w):
"""
Returns solution x, z to
(primal) minimize sum(x)
subject to w + diag(x) >= 0
(dual) maximize -tr(w*z)
subject to diag(z) = 1
z >= 0.
"""
n = w.size[0]
c = matrix(1.0, (n,1))
def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
"""
y := alpha*(-diag(x)) + beta*y.
"""
if trans=='N':
# x is a vector; y is a symmetric matrix in column major order.
y *= beta
y[::n+1] -= alpha * x
else:
# x is a symmetric matrix in column major order; y is a vector.
y *= beta
y -= alpha * x[::n+1]
def cngrnc(r, x, alpha = 1.0):
"""
Congruence transformation
x := alpha * r'*x*r.
r and x are square matrices.
"""
# Scale diagonal of x by 1/2.
x[::n+1] *= 0.5
# a := tril(x)*r
a = +r
tx = matrix(x, (n,n))
blas.trmm(tx, a, side = 'L')
# x := alpha*(a*r' + r*a')
blas.syr2k(r, a, tx, trans = 'T', alpha = alpha)
x[:] = tx[:]
dims = {'l': 0, 'q': [], 's': [n]}
def F(W):
"""
Returns a function f(x, y, z) that solves
-diag(z) = bx
-diag(x) - r*r'*z*r*r' = bz
where r = W['r'][0] = W['rti'][0]^{-T}.
"""
rti = W['rti'][0]
# t = rti*rti' as a nonsymmetric matrix.
t = matrix(0.0, (n,n))
blas.gemm(rti, rti, t, transB = 'T')
# Cholesky factorization of tsq = t.*t.
tsq = t**2
lapack.potrf(tsq)
def f(x, y, z):
"""
On entry, x contains bx, y is empty, and z contains bz stored
in column major order.
On exit, they contain the solution, with z scaled
(vec(r'*z*r) is returned instead of z).
We first solve
((rti*rti') .* (rti*rti')) * x = bx - diag(t*bz*t)
and take z = - rti' * (diag(x) + bz) * rti.
"""
# tbst := t * bz * t
tbst = +z
cngrnc(t, tbst)
# x := x - diag(tbst) = bx - diag(rti*rti' * bz * rti*rti')
x -= tbst[::n+1]
# x := (t.*t)^{-1} * x = (t.*t)^{-1} * (bx - diag(t*bz*t))
lapack.potrs(tsq, x)
# z := z + diag(x) = bz + diag(x)
z[::n+1] += x
# z := -vec(rti' * z * rti)
# = -vec(rti' * (diag(x) + bz) * rti
cngrnc(rti, z, alpha = -1.0)
return f
sol = solvers.conelp(c, G, w[:], dims, kktsolver = F)
return sol['x'], sol['z']- Example: Minimizing 1-norm subject to a 2-norm constraint
- In the second example, we use a similar trick to solve the problem
The code below is efficient, if we assume that the number of rows in A is greater than or equal to the number of columns.def qcl1(A, b):
"""
Returns the solution u, z of
(primal) minimize || u ||_1
subject to || A * u - b ||_2 <= 1
(dual) maximize b^T z - ||z||_2
subject to || A'*z ||_inf <= 1.
Exploits structure, assuming A is m by n with m >= n.
"""
m, n = A.size
# Solve equivalent cone LP with variables x = [u; v].
#
# minimize [0; 1]' * x
# subject to [ I -I ] * x <= [ 0 ] (componentwise)
# [-I -I ] * x <= [ 0 ] (componentwise)
# [ 0 0 ] * x <= [ 1 ] (SOC)
# [-A 0 ] [ -b ]
#
# maximize -t + b' * w
# subject to z1 - z2 = A'*w
# z1 + z2 = 1
# z1 >= 0, z2 >=0, ||w||_2 <= t.
c = matrix(n*[0.0] + n*[1.0])
h = matrix( 0.0, (2*n + m + 1, 1))
h[2*n] = 1.0
h[2*n+1:] = -b
def G(x, y, alpha = 1.0, beta = 0.0, trans = 'N'):
y *= beta
if trans=='N':
# y += alpha * G * x
y[:n] += alpha * (x[:n] - x[n:2*n])
y[n:2*n] += alpha * (-x[:n] - x[n:2*n])
y[2*n+1:] -= alpha * A*x[:n]
else:
# y += alpha * G'*x
y[:n] += alpha * (x[:n] - x[n:2*n] - A.T * x[-m:])
y[n:] -= alpha * (x[:n] + x[n:2*n])
def Fkkt(W):
"""
Returns a function f(x, y, z) that solves
[ 0 G' ] [ x ] = [ bx ]
[ G -W'*W ] [ z ] [ bz ].
"""
# First factor
#
# S = G' * W**-1 * W**-T * G
# = [0; -A]' * W3^-2 * [0; -A] + 4 * (W1**2 + W2**2)**-1
#
# where
#
# W1 = diag(d1) with d1 = W['d'][:n] = 1 ./ W['di'][:n]
# W2 = diag(d2) with d2 = W['d'][n:] = 1 ./ W['di'][n:]
# W3 = beta * (2*v*v' - J), W3^-1 = 1/beta * (2*J*v*v'*J - J)
# with beta = W['beta'][0], v = W['v'][0], J = [1, 0; 0, -I].
# As = W3^-1 * [ 0 ; -A ] = 1/beta * ( 2*J*v * v' - I ) * [0; A]
beta, v = W['beta'][0], W['v'][0]
As = 2 * v * (v[1:].T * A)
As[1:,:] *= -1.0
As[1:,:] -= A
As /= beta
# S = As'*As + 4 * (W1**2 + W2**2)**-1
S = As.T * As
d1, d2 = W['d'][:n], W['d'][n:]
d = 4.0 * (d1**2 + d2**2)**-1
S[::n+1] += d
lapack.potrf(S)
def f(x, y, z):
# z := - W**-T * z
z[:n] = -div( z[:n], d1 )
z[n:2*n] = -div( z[n:2*n], d2 )
z[2*n:] -= 2.0*v*( v[0]*z[2*n] - blas.dot(v[1:], z[2*n+1:]) )
z[2*n+1:] *= -1.0
z[2*n:] /= beta
# x := x - G' * W**-1 * z
x[:n] -= div(z[:n], d1) - div(z[n:2*n], d2) + As.T * z[-(m+1):]
x[n:] += div(z[:n], d1) + div(z[n:2*n], d2)
# Solve for x[:n]:
#
# S*x[:n] = x[:n] - (W1**2 - W2**2)(W1**2 + W2**2)^-1 * x[n:]
x[:n] -= mul( div(d1**2 - d2**2, d1**2 + d2**2), x[n:])
lapack.potrs(S, x)
# Solve for x[n:]:
#
# (d1**-2 + d2**-2) * x[n:] = x[n:] + (d1**-2 - d2**-2)*x[:n]
x[n:] += mul( d1**-2 - d2**-2, x[:n])
x[n:] = div( x[n:], d1**-2 + d2**-2)
# z := z + W^-T * G*x
z[:n] += div( x[:n] - x[n:2*n], d1)
z[n:2*n] += div( -x[:n] - x[n:2*n], d2)
z[2*n:] += As*x[:n]
return f
dims = {'l': 2*n, 'q': [m+1], 's': []}
sol = solvers.conelp(c, G, h, dims, kktsolver = Fkkt)
if sol['status'] == 'optimal':
return sol['x'][:n], sol['z'][-m:]
else:
return None, None - Example: 1-norm regularized least-squares
- As an example that illustrates how structure can be exploited
in coneqp(), we consider the 1-norm regularized
least-squares problem
with variable x. The problem is equivalent to the quadratic program
with variables x and u. The implementation below is efficient when A has many more columns than rows.from cvxopt.base import matrix, spdiag, mul, div
from cvxopt import base, blas, lapack, solvers
import math
def l1regls(A, y):
"""
Returns the solution of l1-norm regularized least-squares problem
minimize || A*x - y ||_2^2 + || x ||_1.
"""
m, n = A.size
q = matrix(1.0, (2*n,1))
q[:n] = -2.0 * A.T * y
def P(u, v, alpha = 1.0, beta = 0.0 ):
"""
v := alpha * 2.0 * [ A'*A, 0; 0, 0 ] * u + beta * v
"""
v *= beta
v[:n] += alpha * 2.0 * A.T * (A * u[:n])
def G(u, v, alpha=1.0, beta=0.0, trans='N'):
"""
v := alpha*[I, -I; -I, -I] * u + beta * v (trans = 'N' or 'T')
"""
v *= beta
v[:n] += alpha*(u[:n] - u[n:])
v[n:] += alpha*(-u[:n] - u[n:])
h = matrix(0.0, (2*n,1))
# Customized solver for the KKT system
#
# [ 2.0*A'*A 0 I -I ] [x[:n] ] [bx[:n] ]
# [ 0 0 -I -I ] [x[n:] ] = [bx[n:] ].
# [ I -I -D1^-1 0 ] [zl[:n]] [bzl[:n]]
# [ -I -I 0 -D2^-1 ] [zl[n:]] [bzl[n:]]
#
# where D1 = W['di'][:n]**2, D2 = W['di'][:n]**2.
#
# We first eliminate zl and x[n:]:
#
# ( 2*A'*A + 4*D1*D2*(D1+D2)^-1 ) * x[:n] =
# bx[:n] - (D2-D1)*(D1+D2)^-1 * bx[n:] +
# D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -
# D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:]
#
# x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n] - D2*bzl[n:] )
# - (D2-D1)*(D1+D2)^-1 * x[:n]
#
# zl[:n] = D1 * ( x[:n] - x[n:] - bzl[:n] )
# zl[n:] = D2 * (-x[:n] - x[n:] - bzl[n:] ).
#
# The first equation has the form
#
# (A'*A + D)*x[:n] = rhs
#
# and is equivalent to
#
# [ D A' ] [ x:n] ] = [ rhs ]
# [ A -I ] [ v ] [ 0 ].
#
# It can be solved as
#
# ( A*D^-1*A' + I ) * v = A * D^-1 * rhs
# x[:n] = D^-1 * ( rhs - A'*v ).
S = matrix(0.0, (m,m))
Asc = matrix(0.0, (m,n))
v = matrix(0.0, (m,1))
def Fkkt(W):
# Factor
#
# S = A*D^-1*A' + I
#
# where D = 2*D1*D2*(D1+D2)^-1, D1 = d[:n]**-2, D2 = d[n:]**-2.
d1, d2 = W['di'][:n]**2, W['di'][n:]**2
# ds is square root of diagonal of D
ds = math.sqrt(2.0) * div( mul( W['di'][:n], W['di'][n:]),
base.sqrt(d1+d2) )
d3 = div(d2 - d1, d1 + d2)
# Asc = A*diag(d)^-1/2
Asc = A * spdiag(ds**-1)
# S = I + A * D^-1 * A'
blas.syrk(Asc, S)
S[::m+1] += 1.0
lapack.potrf(S)
def g(x, y, z):
x[:n] = 0.5 * ( x[:n] - mul(d3, x[n:]) +
mul(d1, z[:n] + mul(d3, z[:n])) - mul(d2, z[n:] -
mul(d3, z[n:])) )
x[:n] = div( x[:n], ds)
# Solve
#
# S * v = 0.5 * A * D^-1 * ( bx[:n] -
# (D2-D1)*(D1+D2)^-1 * bx[n:] +
# D1 * ( I + (D2-D1)*(D1+D2)^-1 ) * bzl[:n] -
# D2 * ( I - (D2-D1)*(D1+D2)^-1 ) * bzl[n:] )
blas.gemv(Asc, x, v)
lapack.potrs(S, v)
# x[:n] = D^-1 * ( rhs - A'*v ).
blas.gemv(Asc, v, x, alpha=-1.0, beta=1.0, trans='T')
x[:n] = div(x[:n], ds)
# x[n:] = (D1+D2)^-1 * ( bx[n:] - D1*bzl[:n] - D2*bzl[n:] )
# - (D2-D1)*(D1+D2)^-1 * x[:n]
x[n:] = div( x[n:] - mul(d1, z[:n]) - mul(d2, z[n:]), d1+d2 )\
- mul( d3, x[:n] )
# zl[:n] = D1^1/2 * ( x[:n] - x[n:] - bzl[:n] )
# zl[n:] = D2^1/2 * ( -x[:n] - x[n:] - bzl[n:] ).
z[:n] = mul( W['di'][:n], x[:n] - x[n:] - z[:n] )
z[n:] = mul( W['di'][n:], -x[:n] - x[n:] - z[n:] )
return g
return solvers.coneqp(P, q, G, h, kktsolver = Fkkt)['x'][:n]
![\begin{displaymath}
\left[\begin{array}{ccc}
P & A^T & G^T \\
A & 0 & 0 \\
...
... = \left[\begin{array}{c} b_x \\ b_y \\ b_z \end{array}\right]
\end{displaymath}](img140.gif)