14  Beyond Convex: Nonlinear Programming

NoteWhat you’ll be able to do

Recognize a smooth but nonconvex problem, send it down CVXR’s nonlinear path with psolve(nlp = TRUE), and understand why a nonlinear solve gives only a local guarantee — and what to do about it.

TipYou already have what you need

This chapter uses two CRAN packages beyond CVXR’s built-in solvers — sparsediff (automatic derivatives) and Uno (the nonlinear solver) — and you installed both in the setup. So it is hands-on, like every other chapter. (The ipopt backend is optional and not on CRAN; you do not need it.)

14.1 From convex to merely smooth

Everything so far stayed inside the convex world, where a solution is guaranteed global. Disciplined Nonlinear Programming (DNLP), new in CVXR 1.9.1, relaxes that: it handles smooth problems built from differentiable atoms, even when they are not convex. You build the problem the usual way, check it with is_dnlp(), and solve with psolve(prob, nlp = TRUE). Every DCP problem is also a DNLP, so the convex world sits inside this larger one.

c(sparsediff = requireNamespace("sparsediff", quietly = TRUE),
  Uno        = requireNamespace("Uno", quietly = TRUE))
sparsediff        Uno 
      TRUE       TRUE 

A convex problem sent down the nonlinear path just works:

x <- Variable(2)
prob <- Problem(Minimize(sum_squares(x - c(1, 2))))
is_dnlp(prob)
[1] TRUE
psolve(prob, nlp = TRUE, solver = "UNO", logger = "SILENT")
[1] 0
value(x)
     [,1]
[1,]    1
[2,]    2

14.2 A genuinely nonconvex problem

Now something DCP cannot express. Maximizing a convex quadratic is nonconvex — but it is smooth, hence a DNLP. A classic instance is maximizing \(x^\top A x\) over the unit sphere, whose optimum is the largest eigenvalue of \(A\) — giving us a known answer to check against:

\[\text{maximize}\ \ x^\top A x \qquad \text{subject to}\qquad \|x\|_2^2 = 1.\]

set.seed(0)
n <- 3
A <- matrix(rnorm(n * n), n); A <- t(A) %*% A   # random PSD matrix

x <- Variable(n)
prob <- Problem(Maximize(quad_form(x, A)), list(sum_squares(x) == 1))
c(is_dcp = is_dcp(prob), is_dnlp = is_dnlp(prob))   # not DCP, but DNLP
 is_dcp is_dnlp 
  FALSE    TRUE 
value(x) <- rep(1, n)                                # a starting point
opt <- psolve(prob, nlp = TRUE, solver = "UNO", logger = "SILENT")
c(dnlp = opt, max_eigenvalue = max(eigen(A, only.values = TRUE)$values))
          dnlp max_eigenvalue 
      4.659154       4.659154 

The nonlinear solver lands exactly on the largest eigenvalue — a nonconvex problem solved from R, in the same modeling language.

14.3 The catch: only a local guarantee

This power comes with a caveat you must respect. Unlike convex optimization, a nonlinear solver returns a local optimum that can depend on where it starts — there is no global guarantee. Consider \(\sin(x) + \tfrac{1}{10}x^2\) on \([-10, 10]\), which has several local minima:

set.seed(1)
x <- Variable(bounds = list(-10, 10))
prob <- Problem(Minimize(sin(x) + 0.1 * x^2))

best <- psolve(prob, nlp = TRUE, solver = "UNO", logger = "SILENT", best_of = 5)
c(best_objective = best, x = value(x))
best_objective              x 
    -0.7945823     -1.3064400 

best_of = 5 solves from five random starting points and keeps the best — a simple, effective hedge against local minima. The lesson: convex problems give you the global optimum for free; nonconvex ones make you work for it (good starting points, random restarts, and a healthy skepticism of any single solve).

Warning

On a nonconvex problem an NLP solver may return a local (not global) optimum, give a result that depends on the starting point, or fail to find a feasible point. Supply a starting point with value(x) <- ..., or use best_of to try several.

14.4 Smooth atoms DCP doesn’t have

DNLP adds classically differentiable atoms that are not part of the convex grammar — sin, cos, tan, sinh, tanh, asinh, atanh, normcdf, prod — and lets existing smooth atoms (exp, log, quad_form, quad_over_lin, …) be used in either curvature direction. Three predicates classify an expression under the smooth grammar:

x <- Variable()
classify <- function(e) c(smooth      = is_smooth(e),
                          lin_convex  = is_linearizable_convex(e),
                          lin_concave = is_linearizable_concave(e))
rbind(`sin(x)`  = classify(sin(x)),
      `abs(x)`  = classify(abs(x)),
      `-abs(x)` = classify(-abs(x)))
        smooth lin_convex lin_concave
sin(x)    TRUE       TRUE        TRUE
abs(x)   FALSE       TRUE       FALSE
-abs(x)  FALSE      FALSE        TRUE

A smooth atom like sin(x) linearizes in both directions, so it may appear anywhere; nonsmooth convex/concave atoms keep the curvature DCP assigns them.

14.5 Exercises

Exercise 1. See the local-optimum trap for yourself. Solve \(\sin(x) + \tfrac{1}{10}x^2\) from a single starting point at x0 = 4, then compare with best_of = 20. Do they agree?

x <- Variable(bounds = list(-10, 10))
prob <- Problem(Minimize(sin(x) + 0.1 * x^2))

value(x) <- 4                                              # a poor starting point
one <- psolve(prob, nlp = TRUE, solver = "UNO", logger = "SILENT")

set.seed(2)
many <- psolve(prob, nlp = TRUE, solver = "UNO", logger = "SILENT", best_of = 20)
c(from_x0_4 = one, best_of_20 = many)
 from_x0_4 best_of_20 
 0.8315586  0.8315586 

The single solve from x0 = 4 gets stuck in a nearby local minimum (objective \(\approx 0.83\)); the random restarts escape it and find the global one (\(\approx -0.79\)). This is the whole story of nonconvex optimization in one example.

Exercise 2 (⭐). Turn on verbose = TRUE for one of the solves above and read UNO’s log. How is it different from the convex-solver output you saw in Chapter 2?

psolve(prob, nlp = TRUE, solver = "UNO", verbose = TRUE)

You will see UNO reformulate the model and run a nonlinear iteration (a Newton or SQP method), rather than the cone solver’s interior-point iterations. Same verbose = TRUE switch, a different engine underneath.

14.6 Takeaways

  • DNLP extends CVXR beyond convexity to smooth problems via psolve(nlp = TRUE); is_dnlp() checks membership, and every DCP problem qualifies.
  • It unlocks genuinely nonconvex models (and smooth atoms like sin/cos) — but trades the global guarantee for a local one; use starting points and best_of.
  • It needs the CRAN packages sparsediff + Uno (the practical R default; ipopt is optional and not on CRAN).

For the full treatment — duals on the NLP path, random-restart details, the smooth atom table — see the DNLP Tutorial.