8  Parameters and Efficient Sweeps (DPP)

NoteWhat you’ll be able to do

Replace a numeric constant with a Parameter, recognize when a parametrized problem follows the DPP rules, and re-solve it over a grid of values far faster than rebuilding the problem each time.

8.1 A hidden cost in Chapter 3

In Chapter 5 we traced a regularization path by building a new problem for each \(\lambda\) and solving it. That works, but it hides a cost: every solve re-runs CVXR’s compilation — the translation of your symbolic problem into the matrices a solver wants. When you sweep dozens or hundreds of values, you pay that toll over and over.

Parameters fix this. A Parameter is a symbolic placeholder for a constant whose value you can change without rebuilding the problem. If the problem obeys the Disciplined Parametrized Programming (DPP) rules, CVXR compiles it once, caches the parameter-to-data mapping, and later solves just swap in the new number.

8.2 Parameters and the DPP rule

A Parameter looks like a variable but stands for data you will supply:

gamma <- Parameter(nonneg = TRUE)   # a nonnegative constant, value set later
x <- Variable(5)
prob <- Problem(Minimize(sum_squares(x - 1) + gamma * p_norm(x, 1)))
is_dpp(prob)
[1] TRUE

DPP adds one main restriction to DCP: you may not multiply two parametrized expressions (a parameter times a parameter, or a parameter times a parameter-containing expression). Such a problem is still DCP, just not DPP — so it solves, but without the speedup:

g <- Parameter(nonneg = TRUE); z <- Variable()
bad <- Problem(Minimize(g * g * z), list(z >= 1))
cat("Is DPP?", is_dpp(bad), "\n")
cat("Is DCP?", is_dcp(bad), "\n")
Is DPP? FALSE 
Is DCP? TRUE 

Just like the DCP reformulation skill from Chapter 3, a non-DPP problem can usually be rewritten to comply — here by introducing a helper variable:

z <- Variable(); w <- Variable(); g <- Parameter(nonneg = TRUE)
good <- Problem(Minimize(g * w), list(w == g * z, z >= 1))
cat("Is DPP?", is_dpp(good), "\n")
Is DPP? TRUE 

8.3 The payoff: watch the compile time vanish

Recall from Chapter 2 that verbose = TRUE splits each solve into a compile time and a solver time in its summary. Parameters target the compile time — so let’s just watch it. We set up a lasso whose penalty weight is a Parameter, and solve it the first time:

set.seed(1)
n <- 120; m <- 40
A <- matrix(rnorm(n * m), n, m); b <- rnorm(n)
x <- Variable(m)
error <- sum_squares(A %*% x - b)

gamma <- Parameter(nonneg = TRUE)
param_prob <- Problem(Minimize(error + gamma * p_norm(x, 1)))   # DPP

value(gamma) <- 0.5
psolve(param_prob, verbose = TRUE)        # first solve: pays compilation
────────────────────────────────── CVXR v1.9.1 ─────────────────────────────────
ℹ Problem: 1 variable, 0 constraints (QP)
ℹ Compilation: "OSQP" via CVXR::Dcp2Cone -> CVXR::CvxAttr2Constr -> CVXR::ConeMatrixStuffing -> CVXR::OSQP_QP_Solver
ℹ Compile time: 0.396s
─────────────────────────────── Numerical solver ───────────────────────────────
-----------------------------------------------------------------
           OSQP v1.0.0  -  Operator Splitting QP Solver
              (c) The OSQP Developer Team
-----------------------------------------------------------------
problem:  variables n = 200, constraints m = 200
          nnz(P) + nnz(A) = 5200
settings: algebra = Built-in,
          OSQPInt = 4 bytes, OSQPFloat = 8 bytes,
          linear system solver = QDLDL v0.1.8,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive: 50 iterations),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25, duality gap: on),
          time_limit: 1.00e+10 sec,
          scaling: on (10 iterations), scaled_termination: off
          warm starting: on, polishing: on, 
iter   objective    prim res   dual res   gap        rel kkt    rho         time
   1  -1.6000e+02   8.00e+00   6.83e+02  -3.13e+03   6.83e+02   1.00e-01    7.07e-04s
  50   8.0194e+01   6.18e-02   2.66e-03  -1.87e-01   6.18e-02   5.00e-01*   1.41e-03s
 100   8.0454e+01   8.44e-03   7.36e-05  -6.66e-03   8.44e-03   5.57e+00*   2.28e-03s
 200   8.0462e+01   1.33e-05   5.75e-07  -1.90e-06   1.33e-05   5.57e+00    3.82e-03s
plsh   8.0462e+01   1.46e-15   1.28e-14  -5.68e-14   1.28e-14   --------    4.35e-03s

status:               solved
solution polishing:   successful
number of iterations: 200
optimal objective:    80.4620
dual objective:       80.4620
duality gap:          -5.6843e-14
primal-dual integral: 3.1311e+03
run time:             4.35e-03s
optimal rho estimate: 2.78e+01
──────────────────────────────────── Summary ───────────────────────────────────
✔ Status: optimal
✔ Optimal value: 80.462
ℹ Compile time: 0.396s
ℹ Solver time: 0.026s
[1] 80.46205

Look at the Compile time in the summary. Now change the parameter and re-solve the same problem object:

value(gamma) <- 0.7
psolve(param_prob, verbose = TRUE)        # re-solve: compilation is cached
────────────────────────────────── CVXR v1.9.1 ─────────────────────────────────
ℹ Problem: 1 variable, 0 constraints (QP)
ℹ Using cached DPP tensor (fast path)
ℹ Compilation: "OSQP" via CVXR::Dcp2Cone -> CVXR::CvxAttr2Constr -> CVXR::ConeMatrixStuffing -> CVXR::OSQP_QP_Solver
ℹ Compile time: 0.038s
─────────────────────────────── Numerical solver ───────────────────────────────
-----------------------------------------------------------------
           OSQP v1.0.0  -  Operator Splitting QP Solver
              (c) The OSQP Developer Team
-----------------------------------------------------------------
problem:  variables n = 200, constraints m = 200
          nnz(P) + nnz(A) = 5200
settings: algebra = Built-in,
          OSQPInt = 4 bytes, OSQPFloat = 8 bytes,
          linear system solver = QDLDL v0.1.8,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive: 50 iterations),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25, duality gap: on),
          time_limit: 1.00e+10 sec,
          scaling: on (10 iterations), scaled_termination: off
          warm starting: on, polishing: on, 
iter   objective    prim res   dual res   gap        rel kkt    rho         time
   1  -2.2400e+02   8.00e+00   9.56e+02  -4.38e+03   9.56e+02   1.00e-01    7.68e-04s
 150   8.1079e+01   1.52e-02   9.34e-07  -2.57e-02   1.52e-02   1.44e+01*   2.85e-03s
 200   8.1117e+01   8.80e-07   7.04e-07  -2.81e-07   8.80e-07   1.44e+01    3.79e-03s
plsh   8.1117e+01   1.27e-15   1.67e-14   7.96e-14   7.96e-14   --------    4.30e-03s

status:               solved
solution polishing:   successful
number of iterations: 200
optimal objective:    81.1172
dual objective:       81.1172
duality gap:          7.9581e-14
primal-dual integral: 4.3836e+03
run time:             4.30e-03s
optimal rho estimate: 1.82e+01
──────────────────────────────────── Summary ───────────────────────────────────
✔ Status: optimal
✔ Optimal value: 81.1172
ℹ Compile time: 0.038s
ℹ Solver time: 0.043s
[1] 81.11721

The compile time has collapsed — CVXR reused the cached parameter-to-data mapping and went straight to the solver. A freshly built problem, by contrast, pays the full compile every time.

8.4 Quantifying it

Now let’s measure the per-solve saving with bench::mark, which reports the garbage-collection-free minimum time — the honest way to compare. We pit one DPP re-solve against one rebuild (a fresh problem with a numeric weight):

library(bench)
bench::mark(
  `DPP re-solve` = { value(gamma) <- 0.7; psolve(param_prob) },
  `rebuild`      = { psolve(Problem(Minimize(error + 0.7 * p_norm(x, 1)))) },
  check = FALSE, min_iterations = 5, filter_gc = FALSE
)[, c("expression", "min", "median", "mem_alloc")]
# A tibble: 2 × 4
  expression        min   median mem_alloc
  <bch:expr>   <bch:tm> <bch:tm> <bch:byt>
1 DPP re-solve   9.52ms    9.8ms   554.7KB
2 rebuild       26.62ms   28.5ms     9.7MB

On our machine the DPP re-solve is roughly 2–3× faster per solve and allocates less memory — precisely the compile time we just watched disappear. Over a sweep of \(K\) values you save that difference \(K\) times.

NoteHow big is the win, really?

The saving is exactly the recompilation you avoid, so it is largest when compilation is a meaningful share of each solve — small-to-medium problems solved many times (regularization paths, grid searches, simulations, bootstraps). For very large problems the solver itself dominates and the ratio shrinks toward 1×. Either way the parametrized code is cleaner, and — as we note below — parameters are also the gateway to differentiating through a problem.

8.5 Back to the regularization path

This is exactly Chapter 3’s sweep, done right. We make \(\lambda\) a Parameter and re-solve — the path looks identical, but only one compilation happens:

lambda_vals <- 10^seq(-3, 1, length.out = 30)
lambda <- Parameter(nonneg = TRUE)
beta <- Variable(m)
path_prob <- Problem(Minimize(error + lambda * p_norm(beta, 1)))

betas <- sapply(lambda_vals, function(l) { value(lambda) <- l; psolve(path_prob); value(beta) })
dim(betas)   # m coefficients x number of lambda values
[1] 40 30

Whenever you plan to solve the same shape of problem many times — a path, a grid search, a simulation, a bootstrap — reach for a Parameter.

8.6 The other half of the solve: warm starts

Recall that every solve splits into a compile time and a solver time. DPP attacks the compile time — it reuses the canonicalized problem structure so we never re-stuff the matrices. But the solver still starts cold each time, refining an internal guess from scratch until it converges.

A warm start attacks the other half. A numerical solver is iterative; if you hand it a starting point already near the answer, it converges in fewer iterations. In a sweep over a fine grid the solution at one \(\lambda\) is an excellent guess for the next, so the previous answer is exactly the start point we want. In CVXR that is one flag — warm_start = TRUE uses the current variable values as the solver’s initial point:

betas_warm <- sapply(lambda_vals, function(l) {
  value(lambda) <- l
  psolve(path_prob, warm_start = TRUE)   # start from the previous lambda's solution
  value(beta)
})
max(abs(betas_warm - betas))             # same path, to solver tolerance
[1] 4.337505e-12

The two ideas are orthogonal and complementary, not the same trick:

reuses cuts wins biggest when
DPP the compiled problem structure compile time compilation dominates — small problems solved many times
Warm start the previous solution solver time the solver dominates — large/expensive problems, close-together solves

DPP does nothing for the solver’s iterations; a warm start does nothing for compilation. Each pays off precisely where the other does not, so a tight sweep wants both — compile once (DPP), then step the solver from each previous answer (warm start). Warm-starting is solver-dependent (about 9 of CVXR’s 15 solvers support it — OSQP, SCS, Clarabel, MOSEK, Gurobi, PIQP, HiGHS, and more); a solver that cannot warm-start simply ignores the flag.

8.7 Exercise

Exercise (⭐). The expression gamma * p_norm(x, 1) is DPP, but (gamma * gamma) * p_norm(x, 1) is not. Suppose you really want the penalty weight to be \(\gamma^2\). How can you keep the problem DPP and sweep over \(\gamma\)?

Do the nonlinear part in R, outside the DSL: keep a single parameter that holds the value \(\gamma^2\), and set it from your \(\gamma\) grid.

w <- Parameter(nonneg = TRUE)            # will hold gamma^2
prob <- Problem(Minimize(error + w * p_norm(x, 1)))
is_dpp(prob)
[1] TRUE
for (g in lambda_vals) { value(w) <- g^2; psolve(prob) }   # sweep, still DPP

The rule of thumb: parameters may enter affinely; any nonlinear transform of a parameter should be computed in R and stored in another parameter.

8.8 Takeaways

  • A Parameter is a constant you can change without rebuilding the problem.
  • If the problem is DPP (is_dpp() — essentially “no parameter × parameter”), CVXR compiles once and re-solves cheaply.
  • Use it for any repeated solve — paths, grid searches, simulations. It is the efficient way to do the sweep you met in Chapter 5.
  • DPP cuts compile time; a warm start cuts solver time. They are independent — combine Parameters with warm_start = TRUE to speed up both halves of a sweep.

For more, see the DPP Tutorial.