Personal tools
You are here: Home Documentation User's Guide 9.4 Exploiting Structure
Document Actions

9.4 Exploiting Structure

9.4 Exploiting Structure By default, the functions cp() and cpl() do not exploit problem structure. Two mechanisms are provided for implementing customized solvers that take advantage of problem structure.

Providing a function for solving KKT equations.
The most expensive step of each iteration of cp() is the solution of a set of linear equations (`KKT equations') of the form
\begin{displaymath}
\left[\begin{array}{ccc}
H & A^T & \tilde G^T \\
A & 0 &...
...= \left[\begin{array}{c} b_x \\ b_y \\ b_z \end{array}\right],
\end{displaymath} (9.2)

where

\begin{displaymath}
H = \sum_{k=0}^m z_k \nabla^2f_k(x), \qquad
\tilde G = \le...
...la f_1(x) & \cdots & \nabla f_m(x) & G^T \end{array}\right]^T.
\end{displaymath}

The matrix W depends on the current iterates and is defined as follows. Suppose

\begin{displaymath}
u = \left(u_\mathrm{nl}, \; u_\mathrm{l}, \; u_{\mathrm{q},...
...\mathrm{s},k} \in{\mbox{\bf S}}^{t_k}, \quad k = 0,\ldots,N-1.
\end{displaymath}

Then W is a block-diagonal matrix,

\begin{displaymath}
Wu = \left( W_\mathrm{nl} u_\mathrm{nl}, \;
W_\mathrm{l} ...
...rm{s},N-1} \mathop{\mathbf{vec}}{(u_{\mathrm{s},N-1})} \right)
\end{displaymath}

with the following diagonal blocks.
  • The first block is a positive diagonal scaling with a vector $d_{\mathrm{nl}}$:

    \begin{displaymath}
W_\mathrm{nl} = \mbox{\bf diag}\,(d_\mathrm{nl}),
\qquad W_\mathrm{nl}^{-1} = \mbox{\bf diag}\,(d_\mathrm{nl})^{-1}.
\end{displaymath}

    This transformation is symmetric:

    \begin{displaymath}
W_\mathrm{nl}^T = W_\mathrm{nl}.
\end{displaymath}

  • The second block is a positive diagonal scaling with a vector $d_{\mathrm{l}}$:

    \begin{displaymath}
W_\mathrm{l} = \mbox{\bf diag}\,(d_\mathrm{l}),
\qquad W_\mathrm{l}^{-1} = \mbox{\bf diag}\,(d_\mathrm{l})^{-1}.
\end{displaymath}

    This transformation is symmetric:

    \begin{displaymath}
W_\mathrm{l}^T = W_\mathrm{l}.
\end{displaymath}

  • The next M blocks are positive multiples of hyperbolic Householder transformations:

    \begin{displaymath}
W_{\mathrm{q},k} = \beta_k ( 2 v_k v_k^T - J),
\qquad
W_{...
...c{1}{\beta_k} ( 2 Jv_k v_k^T J - J),
\qquad k = 0,\ldots,M-1,
\end{displaymath}

    where

    \begin{displaymath}
\beta_k > 0, \qquad v_{k0} > 0, \qquad
v_k^T Jv_k = 1, \q...
...= \left[\begin{array}{cc}
1 & 0 \\ 0 & -I \end{array}\right].
\end{displaymath}

    These transformations are also symmetric:

    \begin{displaymath}
W_{\mathrm{q},k}^T = W_{\mathrm{q},k}.
\end{displaymath}

  • The last N blocks are congruence transformations with nonsingular matrices:

    \begin{displaymath}
W_{\mathrm{s},k} \mathop{\mathbf{vec}}{(u_{\mathrm{s},k})} ...
..._k^{-T} u_{\mathrm{s},k} r_k^{-1})}, \qquad
k = 0,\ldots,N-1.
\end{displaymath}

    In general, this operation is not symmetric, and

    \begin{displaymath}
W_{\mathrm{s},k}^T \mathop{\mathbf{vec}}{(u_{\mathrm{s},k})...
...u_{\mathrm{s},k} r_k^{-T})}, \qquad
\qquad
k = 0,\ldots,N-1.
\end{displaymath}

It is often possible to exploit problem structure to solve (9.2) faster than by standard methods. The last argument kktsolver of cp() allows the user to supply a Python function for solving the KKT equations. This function will be called as "f = kktsolver(x, z, W)". The argument x is the point at which the derivatives in the KKT matrix are evaluated. z is a positive vector of length it m + 1, containing the coefficients in the 1,1 block H. W is a dictionary that contains the parameters of the scaling:

  • W['dnl'] is the positive vector that defines the diagonal scaling for the nonlinear inequalities. W['dnli'] is its componentwise inverse.
  • W['d'] is the positive vector that defines the diagonal scaling for the componentwise linear inequalities. W['di'] is its componentwise inverse.
  • W['beta'] and W['v'] are lists of length M with the coefficients and vectors that define the hyperbolic Householder transformations.
  • W['r'] is a list of length N with the matrices that define the the congruence transformations. W['rti'] is a list of length N with the transposes of the inverses of the matrices in W['r'].

The function call "f = kktsolver(x, z, W)" should return a routine for solving the KKT system (9.2) defined by x, z, W. It will be called as "f(bx, by, bz)". On entry, bx, by, bz contain the righthand side. On exit, they should contain the solution of the KKT system, with the last component scaled, i.e., on exit,

\begin{displaymath}
b_x := u_x, \qquad b_y := u_y, \qquad b_z := W u_z.
\end{displaymath}

The role of the argument kktsolver in the function cpl() is similar, except that in (9.2),

\begin{displaymath}
H = \sum_{k=0}^{m-1} z_k \nabla^2f_k(x), \qquad
\tilde G =...
..._0(x) & \cdots & \nabla f_{m-1}(x) & G^T \end{array}\right]^T.
\end{displaymath}

Specifying constraints via Python functions.
In the default use of cp(), the arguments G and A are the coefficient matrices in the constraints of (9.2). It is also possible to specify these matrices by providing Python functions that evaluate the corresponding matrix-vector products and their adjoints.

If the argument G of conelp() is a Python function, it should be defined as follows:

G(u, v [, alpha[, beta[, trans]]])


This evaluates the matrix-vector products

\begin{displaymath}
v := \alpha Gu + \beta v \quad (\mathrm{trans} = \mathrm{'N'...
... \alpha G^T u + \beta v \quad (\mathrm{trans} = \mathrm{'T'}).
\end{displaymath}

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.

Similarly, if the argument A is a Python function, then it must be defined as follows.

A(u, v [, alpha[, beta[, trans]]])


This evaluates the matrix-vector products

\begin{displaymath}
v := \alpha Au + \beta v \quad (\mathrm{trans} = \mathrm{'N'...
... \alpha A^T u + \beta v \quad (\mathrm{trans} = \mathrm{'T'}).
\end{displaymath}

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.

In a similar way, when the first argument F() of cp() returns matrices of first derivatives or second derivatives Df, H, these matrices can be specified as Python functions. If Df is a Python function, it should be defined as follows:

Df(u, v [, alpha[, beta[, trans]]])


This evaluates the matrix-vector products

\begin{displaymath}
v := \alpha Df(x) u + \beta v \quad (\mathrm{trans} = \mathr...
...pha Df(x)^T u + \beta v \quad (\mathrm{trans} = \mathrm{'T'}).
\end{displaymath}

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.

If H is a Python function, it should be defined as follows:

H(u, v [, alpha[, beta]])


This evaluates the matrix-vector product

\begin{displaymath}
v := \alpha H u + \beta v.
\end{displaymath}

The default values of the optional arguments must be alpha = 1.0, beta = 0.0, trans = 'N'.

If G, A, Df, or H are Python functions, then the argument kktsolver must also be provided.

As an example, we consider the unconstrained problem

\begin{displaymath}
\begin{array}{ll}
\mbox{minimize} & (1/2)\Vert Ax-b\Vert _2^2 - \sum_{i=1}^n \log(1-x_i^2)
\end{array}\end{displaymath}

where A is an m by n matrix with m less than n. The Hessian of the objective is diagonal plus a low-rank term:

\begin{displaymath}
H = A^TA + \mbox{\bf diag}\,(d), \qquad d_i = \frac{2(1+x_i^2)}{(1-x_i^2)^2}.
\end{displaymath}

We can exploit this property when solving (9.2) by applying the matrix inversion lemma. We first solve

\begin{displaymath}
(A \mbox{\bf diag}\,(d)^{-1}A^T + I) v = (1/z_0) A \mbox{\bf diag}\,(d)^{-1}b_x, \qquad
\end{displaymath}

and then obtain

\begin{displaymath}
u_x = \mbox{\bf diag}\,(d)^{-1}(b_x/z_0 - A^T v).
\end{displaymath}

The following code follows this method. It also uses BLAS functions for matrix-matrix and matrix-vector products.

from cvxopt.base import matrix, spdiag, mul, div, log 
from cvxopt import base, blas, lapack, solvers, base

def l2ac(A, b):
"""
Solves

minimize (1/2) * ||A*x-b||_2^2 - sum log (1-xi^2)

assuming A is m x n with m << n.
"""

m, n = A.size
def F(x = None, z = None):
if x is None:
return 0, matrix(0.0, (n,1))
if max(abs(x)) >= 1.0:
return None
# r = A*x - b
r = -b
blas.gemv(A, x, r, beta = -1.0)
w = x**2
f = 0.5 * blas.nrm2(r)**2 - sum(log(1-w))
# gradf = A'*r + 2.0 * x ./ (1-w)
gradf = div(x, 1.0 - w)
blas.gemv(A, r, gradf, trans = 'T', beta = 2.0)
if z is None:
return f, gradf.T
else:
def Hf(u, v, alpha = 1.0, beta = 0.0):
# v := alpha * (A'*A*u + 2*((1+w)./(1-w)).*u + beta *v
v *= beta
v += 2.0 * alpha * mul(div(1.0+w, (1.0-w)**2), u)
blas.gemv(A, u, r)
blas.gemv(A, r, v, alpha = alpha, beta = 1.0, trans = 'T')
return f, gradf.T, Hf


# Custom solver for the Newton system
#
# z[0]*(A'*A + D)*x = bx
#
# where D = 2 * (1+x.^2) ./ (1-x.^2).^2. We apply the matrix inversion
# lemma and solve this as
#
# (A * D^-1 *A' + I) * v = A * D^-1 * bx / z[0]
# D * x = bx / z[0] - A'*v.

S = matrix(0.0, (m,m))
Asc = matrix(0.0, (m,n))
v = matrix(0.0, (m,1))
def Fkkt(x, z, W):
ds = (2.0 * div(1 + x**2, (1 - x**2)**2))**-0.5
base.gemm(A, spdiag(ds), Asc)
blas.syrk(Asc, S)
S[::m+1] += 1.0
lapack.potrf(S)
a = z[0]
def g(x, y, z):
x[:] = mul(x, ds) / a
blas.gemv(Asc, x, v)
lapack.potrs(S, v)
blas.gemv(Asc, v, x, alpha = -1.0, beta = 1.0, trans = 'T')
x[:] = mul(x, ds)
return g

return solvers.cp(F, kktsolver = Fkkt)['x']

 

Powered by Plone CMS, the Open Source Content Management System