2  A Gentle Introduction

NoteWhat you’ll be able to do

By the end of this chapter you can state a convex problem in CVXR, solve it, read its status and solution, modify and re-solve it, and recognize when a problem is not something CVXR will accept.

2.1 Convex optimization in one picture

A convex optimization problem has the form

\[ \begin{array}{ll} \mbox{minimize} & f_0(x)\\ \mbox{subject to} & f_i(x) \leq 0, \quad i = 1,\ldots,m\\ & g_i(x) = 0, \quad i = 1,\ldots,p \end{array} \]

where \(x\) is the variable, the objective \(f_0\) and the inequality functions \(f_1,\ldots,f_m\) are convex, and the equality functions \(g_1,\ldots,g_p\) are affine.

A handy mental picture:

  • a convex function is upward-curving (a chord lies above the graph),
  • a concave function is downward-curving,
  • an affine function is flat — both convex and concave.

We minimize a convex function (or maximize a concave one) over a convex region. The payoff for staying convex: any local solution is automatically global. For much more, see Boyd and Vandenberghe (2004).

2.2 ‘Hello World’

Here is about the simplest problem that shows all three ingredients — an objective, an inequality constraint, and an equality constraint:

\[ \begin{array}{ll} \mbox{minimize} & x^2 + y^2 \\ \mbox{subject to} & x \geq 0, \quad 2x + y = 1 \end{array} \]

with scalar variables \(x\) and \(y\). It is simple enough to solve by hand, so we can check CVXR’s answer. Here it is in CVXR:

# Variables minimized over
x <- Variable(1)
y <- Variable(1)

# Problem definition
objective   <- Minimize(x^2 + y^2)
constraints <- list(x >= 0, 2 * x + y == 1)
prob1 <- Problem(objective, constraints)

# Solve it
sol1 <- psolve(prob1)

status(prob1)   # did it solve?
[1] "optimal"
sol1            # optimal objective value
[1] 0.2
value(x)
     [,1]
[1,]  0.4
value(y)
     [,1]
[1,]  0.2

Let’s walk through that carefully, because every CVXR problem is built the same way.

Variables are placeholders. The first two lines create scalar Variable objects. They have no value yet — not until we solve.

x <- Variable(1)
y <- Variable(1)

Minimize() builds an objective, not a number. Unlike base R’s min(), this does not compute anything (there is nothing to compute — x and y have no values). It records the goal.

objective <- Minimize(x^2 + y^2)

>= and == build constraints, not TRUE/FALSE. The comparison operators are overloaded: x >= 0 returns a Constraint object the solver will enforce — not a logical. (Without the constraints, the solution would trivially be \(x = y = 0\).)

constraints <- list(x >= 0, 2 * x + y == 1)

Problem() assembles, psolve() solves. A problem pairs an objective with zero or more constraints. psolve() translates it into a form a convex solver understands, calls the solver, and returns the optimal objective value. Afterwards value() retrieves each variable and status() reports what happened.

Solving by hand: the equality gives \(y = 1 - 2x\), so we minimize \(x^2 + (1-2x)^2 = 5x^2 - 4x + 1\), which is smallest at \(x = 2/5\). Then \(y = 1/5\) and the objective is \(1/5\) — exactly what CVXR found. The world says ‘hi’ back.

2.2.1 Three things that can happen

When you psolve() a problem, status() is one of:

  1. "optimal" — solved; value() gives the optimal variables.
  2. "infeasible" — no point satisfies all constraints (e.g. x >= 1 and x <= 0). psolve() returns +Inf (minimization) and variable values are NA.
  3. "unbounded" — the objective can be driven to \(-\infty\). psolve() returns -Inf.

Always glance at status() before trusting a solution.

2.3 Watching the solver work

By default psolve() is quiet. But you can ask it to show its work by passing verbose = TRUE — invaluable when you want to understand what CVXR is doing, or diagnose a slow or stalling solve:

psolve(prob1, verbose = TRUE)
────────────────────────────────── CVXR v1.9.1 ─────────────────────────────────
ℹ Problem: 2 variables, 2 constraints (QP)
ℹ Compilation: "OSQP" via CVXR::Dcp2Cone -> CVXR::CvxAttr2Constr -> CVXR::ConeMatrixStuffing -> CVXR::OSQP_QP_Solver
ℹ Compile time: 0.234s
─────────────────────────────── Numerical solver ───────────────────────────────
-----------------------------------------------------------------
           OSQP v1.0.0  -  Operator Splitting QP Solver
              (c) The OSQP Developer Team
-----------------------------------------------------------------
problem:  variables n = 2, constraints m = 2
          nnz(P) + nnz(A) = 5
settings: algebra = Built-in,
          OSQPInt = 4 bytes, OSQPFloat = 8 bytes,
          linear system solver = QDLDL v0.1.8,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive: 50 iterations),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25, duality gap: on),
          time_limit: 1.00e+10 sec,
          scaling: on (10 iterations), scaled_termination: off
          warm starting: on, polishing: on, 
iter   objective    prim res   dual res   gap        rel kkt    rho         time
   1   0.0000e+00   1.00e+00   1.00e+02  -5.00e+01   1.00e+02   1.00e-01    3.54e-05s
  50   2.0000e-01   4.87e-10   6.73e-10   2.79e-10   6.73e-10   1.00e-01    7.67e-05s
plsh   2.0000e-01   1.57e-16   1.57e-16  -1.11e-16   1.57e-16   --------    1.03e-04s

status:               solved
solution polishing:   successful
number of iterations: 50
optimal objective:    0.2000
dual objective:       0.2000
duality gap:          -1.1102e-16
primal-dual integral: 5.0000e+01
run time:             1.03e-04s
optimal rho estimate: 7.60e-02
──────────────────────────────────── Summary ───────────────────────────────────
✔ Status: optimal
✔ Optimal value: 0.2
ℹ Compile time: 0.234s
ℹ Solver time: 0.049s
[1] 0.2

The output has three parts, and they preview the rest of this book:

  1. Compilation — CVXR translates your problem into a standard form a solver understands. The chain (here Dcp2Cone -> ... -> ConeMatrixStuffing -> ...) is the machinery we open up in Chapter 3; for now, note that your symbolic problem becomes solver data.
  2. Numerical solver — the chosen solver’s own iteration log, converging to the answer.
  3. Summary — the status, optimal value, and how long compilation vs. solving took.

Turn it on whenever you are curious or stuck; turn it off (the default) once things work. Every psolve() call in this book accepts it.

2.4 Modifying and re-solving

CVXR objects are ordinary R objects: you can build new problems from old ones. Here we reuse the constraints from prob1, change the objective, and add a constraint:

prob2 <- Problem(Minimize(x^2 + y^2 + abs(x - y)),
                 c(prob1@constraints, list(y <= 1)))
sol2 <- psolve(prob2)
status(prob2)
[1] "optimal"
sol2
[1] 0.2222222
value(x)
          [,1]
[1,] 0.3333333
value(y)
          [,1]
[1,] 0.3333333

This modify-and-re-solve loop — change one thing, solve again — is what makes CVXR so good for prototyping new methods.

2.5 An invalid problem

You can’t type any problem you like. Suppose we tried to maximize the convex objective \(x^2 + y^2\):

prob_bad <- Problem(Maximize(x^2 + y^2), prob1@constraints)
psolve(prob_bad)
Error in `construct_solving_chain()`:
! Problem is not DCP compliant.

CVXR refuses. Maximizing a convex function is not a convex problem, and CVXR checks this using a strict rule set called Disciplined Convex Programming (DCP). Follow the rules and your problem is guaranteed convex; break them and CVXR stops you rather than silently returning nonsense. The next chapter is devoted to DCP — this error is your first encounter with it.

2.6 A familiar example: least squares

Let’s connect to something every statistician knows. We generate data from a known linear model \(Y = X\beta + \epsilon\) and recover \(\beta\).

set.seed(123)
n <- 50; p <- 10
beta_true <- -4:5                      # the true coefficients, -4 through 5
X <- matrix(rnorm(n * p), nrow = n)
Y <- X %*% beta_true + rnorm(n)

The textbook tool is lm():

ls_fit <- lm(Y ~ 0 + X)                # no intercept in our model
coef(ls_fit)
         X1          X2          X3          X4          X5          X6 
-3.90253282 -2.96833856 -1.82342672 -1.15621330  0.07539855  0.92243167 
         X7          X8          X9         X10 
 1.87995112  3.11070199  4.10865028  5.09475916 

In CVXR we write the least-squares problem in its mathematical form, \(\min_\beta \|Y - X\beta\|_2^2\):

beta <- Variable(p)
obj  <- Minimize(sum((Y - X %*% beta)^2))
ols  <- Problem(obj)
sol  <- psolve(ols)

Notice the objective is written with the same sum, %*%, and ^ you use on ordinary R numbers — but beta is a CVXR variable. The estimates match lm():

cbind(CVXR = value(beta), lm = coef(ls_fit))
                         lm
X1  -3.90253282 -3.90253282
X2  -2.96833856 -2.96833856
X3  -1.82342672 -1.82342672
X4  -1.15621330 -1.15621330
X5   0.07539855  0.07539855
X6   0.92243167  0.92243167
X7   1.87995112  1.87995112
X8   3.11070199  3.11070199
X9   4.10865028  4.10865028
X10  5.09475916  5.09475916

2.6.1 Anything is a function of the solution

A solved problem gives you more than the coefficients. Once beta has a value, you can evaluate any expression built from it with value(). The fitted values \(X\hat\beta\) and the residuals, for instance, are one line each:

fitted <- value(X %*% beta)        # X times beta-hat
resid  <- Y - fitted
head(round(cbind(fitted, resid), 3))
        [,1]   [,2]
[1,]  10.437 -0.827
[2,]  -7.877 -0.880
[3,]  -9.711  0.927
[4,]  -0.504  0.367
[5,]   7.310 -1.323
[6,] -29.583  0.117

X %*% beta is a CVXR expression; value() evaluates it at the solution, and the result is ordinary R you can plot, summarize, or feed onward. This little move — any quantity you care about is value() of an expression — is one we will reuse in every chapter.

2.6.2 So what did we gain?

Fair question — we replaced one call to lm() with several lines, and it even runs slower. The payoff appears the moment the problem changes. Suppose we know the coefficients are nonnegative. That is nonnegative least squares, and lm() can no longer help. In CVXR it is one extra constraint:

nnls_prob <- Problem(obj, constraints = list(beta >= 0))
psolve(nnls_prob)
[1] 850.5519
value(beta)
               [,1]
 [1,] -3.663280e-18
 [2,] -2.775389e-18
 [3,]  2.116383e-18
 [4,]  4.424205e-19
 [5,]  1.190672e+00
 [6,]  1.460162e+00
 [7,]  1.817840e+00
 [8,]  3.147380e+00
 [9,]  5.828935e+00
[10,]  5.287308e+00

We can check against the dedicated nnls package:

cbind(CVXR = value(beta), nnls = nnls::nnls(X, Y)$x)
                        nnls
 [1,] -3.663280e-18 0.000000
 [2,] -2.775389e-18 0.000000
 [3,]  2.116383e-18 0.000000
 [4,]  4.424205e-19 0.000000
 [5,]  1.190672e+00 1.190672
 [6,]  1.460162e+00 1.460162
 [7,]  1.817840e+00 1.817840
 [8,]  3.147380e+00 3.147380
 [9,]  5.828935e+00 5.828935
[10,]  5.287308e+00 5.287308

Same answer — but in CVXR we got there by adding a line, not by finding a new package. That is the whole idea: you describe the problem, CVXR solves it.

2.7 Exercises

Try these by editing the chunks above and re-running. Solutions are one click away.

Exercise 1. Change the least-squares objective to least absolute deviation (minimize the sum of absolute residuals) and solve. Which objective gives a smaller sum of absolute residuals?

lad_obj  <- Minimize(sum(abs(Y - X %*% beta)))
lad_prob <- Problem(lad_obj)
psolve(lad_prob)
[1] 31.27002
cbind(LAD = value(beta), lm = coef(ls_fit))
                        lm
X1  -4.0324773 -3.90253282
X2  -2.8267845 -2.96833856
X3  -1.9473476 -1.82342672
X4  -1.1859615 -1.15621330
X5   0.1886891  0.07539855
X6   0.9447007  0.92243167
X7   1.9321485  1.87995112
X8   3.0070658  3.11070199
X9   4.2535749  4.10865028
X10  5.0186776  5.09475916

LAD minimizes absolute residuals, so by that yardstick it wins; lm minimizes squared residuals and would win on sum of squares. Note we reused beta — its value reflects whichever problem was last solved.

Exercise 2. Add the constraint that the coefficients are nondecreasing: \(\beta_1 \le \beta_2 \le \cdots \le \beta_p\).

Hint: what base-R function takes lagged differences?

base::diff() has a CVXR method, so the monotonicity constraint is just diff(beta) >= 0:

mono <- Problem(obj, constraints = list(diff(beta) >= 0))
psolve(mono)
[1] 34.13461
value(beta)
             [,1]
 [1,] -3.90253282
 [2,] -2.96833856
 [3,] -1.82342672
 [4,] -1.15621330
 [5,]  0.07539855
 [6,]  0.92243167
 [7,]  1.87995112
 [8,]  3.11070199
 [9,]  4.10865028
[10,]  5.09475916

Exercise 3 (⭐). Constrain the first four coefficients to sum to at most zero, \(\sum_{i=1}^{4}\beta_i \le 0\), using a matrix row rather than writing out four terms.

Build a row vector that picks out \(\beta_1,\ldots,\beta_4\):

A <- matrix(c(rep(1, 4), rep(0, 6)), nrow = 1)
sum_con <- Problem(obj, constraints = list(A %*% beta <= 0))
psolve(sum_con)
[1] 34.13461
value(beta)
             [,1]
 [1,] -3.90253282
 [2,] -2.96833856
 [3,] -1.82342672
 [4,] -1.15621330
 [5,]  0.07539855
 [6,]  0.92243167
 [7,]  1.87995112
 [8,]  3.11070199
 [9,]  4.10865028
[10,]  5.09475916

Working with matrices generalizes far better than writing beta[1] + beta[2] + ....

2.8 Takeaways

  • Every CVXR problem is Variable → objective → constraints → Problempsolve, then status() and value().
  • Operators like >=, ==, sum, %*% are overloaded to build optimization objects, not to compute numbers.
  • The power is flexibility: change the loss, add a constraint, re-solve — no closed form or special-purpose package required.
  • CVXR only accepts problems that pass the DCP rules, which we turn to next.

For a longer version of this introduction, see the CVXR website.