Optimization

This page documents library components that attempt to find the minimum or maximum of a user-supplied function. An introduction to the general purpose non-linear optimizers in this section can be found here. For an example showing how to use the non-linear least squares routines look here.

General Purpose Optimizers: find_min, find_min_single_variable, find_min_using_approximate_derivatives, find_min_bobyqa, find_max, find_max_single_variable, find_max_using_approximate_derivatives, find_max_bobyqa, find_max_trust_region, find_min_trust_region

Special Purpose Optimizers: solve_qp_using_smo, solve_qp2_using_smo, solve_qp3_using_smo, oca, structural_svm_problem, solve_least_squares, solve_least_squares_lm, solve_trust_region_subproblem, max_cost_assignment

Strategies: cg_search_strategy, bfgs_search_strategy, newton_search_strategy, lbfgs_search_strategy, objective_delta_stop_strategy, gradient_norm_stop_strategy

Helper Routines: derivative, negate_function, make_line_search_function, poly_min_extrap, lagrange_poly_min_extrap, line_search
derivative
dlib/optimization.h, dlib/optimization/optimization_abstract.h
This is a function that takes another function as input and returns a function object that numerically computes the derivative of the input function.

negate_function
dlib/optimization.h, dlib/optimization/optimization_abstract.h
This is a function that takes another function as input and returns a function object that computes the negation of the input function.

make_line_search_function
dlib/optimization.h, dlib/optimization/optimization_line_search_abstract.h
This is a function that takes another function f(x) as input and returns a function object l(z) = f(start + z*direction). It is useful for turning multi-variable functions into single-variable functions for use with the line_search routine.

poly_min_extrap
dlib/optimization.h, dlib/optimization/optimization_line_search_abstract.h
This function finds the third degree polynomial that interpolates a set of points and returns the minimum of that polynomial.

lagrange_poly_min_extrap
dlib/optimization.h, dlib/optimization/optimization_line_search_abstract.h
This function finds the second degree polynomial that interpolates a set of points and returns the minimum of that polynomial.

line_search
dlib/optimization.h, dlib/optimization/optimization_line_search_abstract.h
Performs a gradient-based line search on a given function and returns the input that makes the function significantly smaller.

cg_search_strategy
dlib/optimization.h, dlib/optimization/optimization_search_strategies_abstract.h
This object represents a strategy for determining the direction along which a line search should be carried out. This particular object is an implementation of the Polak-Ribiere conjugate gradient method for determining this direction.

This method uses an amount of memory that is linear in the number of variables to be optimized. So it is capable of handling problems with a very large number of variables. However, it is generally not as good as the L-BFGS algorithm (see the lbfgs_search_strategy class).

Example: optimization_ex.cpp.html
bfgs_search_strategy
dlib/optimization.h, dlib/optimization/optimization_search_strategies_abstract.h
This object represents a strategy for determining the direction along which a line search should be carried out. This particular object is an implementation of the BFGS quasi-Newton method for determining this direction.

This method uses an amount of memory that is quadratic in the number of variables to be optimized. It is generally very effective but if your problem has a very large number of variables then it isn't appropriate. Instead, you should try the lbfgs_search_strategy.

Example: optimization_ex.cpp.html
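To make the calling convention concrete, here is a minimal sketch (not taken from the library's example programs; the Rosenbrock objective, the 1e-7 tolerance, and the starting point are arbitrary choices) showing find_min driven by bfgs_search_strategy:

```cpp
#include <dlib/optimization.h>
#include <iostream>
#include <cmath>

using namespace dlib;
typedef matrix<double,0,1> column_vector;

// Rosenbrock function: minimum of 0 at (1,1).
double rosen(const column_vector& m)
{
    const double x = m(0), y = m(1);
    return 100.0*std::pow(y - x*x, 2.0) + std::pow(1.0 - x, 2.0);
}

// Its analytic gradient, required by find_min().
column_vector rosen_gradient(const column_vector& m)
{
    const double x = m(0), y = m(1);
    column_vector g(2);
    g(0) = -400.0*x*(y - x*x) - 2.0*(1.0 - x);
    g(1) =  200.0*(y - x*x);
    return g;
}

int main()
{
    column_vector x(2);
    x = 4, 8;  // arbitrary starting point

    // BFGS direction + stop when the objective changes by less than 1e-7.
    // The final -1 is a lower bound on the objective (rosen() is >= 0).
    find_min(bfgs_search_strategy(),
             objective_delta_stop_strategy(1e-7),
             rosen, rosen_gradient, x, -1);

    std::cout << "solution:\n" << x;  // should be close to (1,1)
}
```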
newton_search_strategy
dlib/optimization.h, dlib/optimization/optimization_search_strategies_abstract.h
This object represents a strategy for determining the direction along which a line search should be carried out. This particular routine is an implementation of the Newton method for determining this direction. That means using it requires you to supply a method for creating Hessian matrices for the problem you are trying to optimize.

Note also that this is actually a helper function for creating newton_search_strategy_obj objects.
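If you can supply the Hessian, the Newton strategy drops into the same find_min call. A short sketch reusing rosen, rosen_gradient, and x from the BFGS sketch above (the 2x2 matrix below is the analytic Hessian of that function):

```cpp
// Hessian of the Rosenbrock function from the BFGS sketch above.
matrix<double> rosen_hessian(const column_vector& m)
{
    const double x = m(0), y = m(1);
    matrix<double> H(2,2);
    H = 1200.0*x*x - 400.0*y + 2.0, -400.0*x,
        -400.0*x,                    200.0;
    return H;
}

// Pass the Hessian function to the strategy helper; the gradient is still required.
find_min(newton_search_strategy(rosen_hessian),
         objective_delta_stop_strategy(1e-7),
         rosen, rosen_gradient, x, -1);
```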

lbfgs_search_strategy
dlib/optimization.h, dlib/optimization/optimization_search_strategies_abstract.h
This object represents a strategy for determining the direction along which a line search should be carried out. This particular object is an implementation of the L-BFGS quasi-Newton method for determining this direction.

This method uses an amount of memory that is linear in the number of variables to be optimized. This makes it an excellent method to use when an optimization problem has a large number of variables.

Example: optimization_ex.cpp.html
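Switching to L-BFGS is just a matter of swapping the search strategy. A short sketch, reusing rosen, rosen_gradient, and x from the BFGS sketch above (keeping 10 curvature pairs is an arbitrary choice):

```cpp
// Same call as before, but L-BFGS keeps only the 10 most recent curvature
// pairs, so memory use stays linear in the number of variables.
find_min(lbfgs_search_strategy(10),
         objective_delta_stop_strategy(1e-7).be_verbose(),
         rosen, rosen_gradient, x, -1);
```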
objective_delta_stop_strategy
dlib/optimization.h, dlib/optimization/optimization_stop_strategies_abstract.h
This object represents a strategy for deciding if an optimization algorithm should terminate. This particular object looks at the change in the objective function from one iteration to the next and bases its decision on how large this change is. If the change is below a user-given threshold then the search stops.
Example: optimization_ex.cpp.html

gradient_norm_stop_strategy
dlib/optimization.h, dlib/optimization/optimization_stop_strategies_abstract.h
This object represents a strategy for deciding if an optimization algorithm should terminate. This particular object looks at the norm (i.e. the length) of the current gradient vector and stops if it is smaller than a user-given threshold.

find_min
dlib/optimization.h, dlib/optimization/optimization_abstract.h
Performs an unconstrained minimization of a nonlinear function using some search strategy (e.g. bfgs_search_strategy).
Example: optimization_ex.cpp.html

find_min_single_variable
dlib/optimization.h, dlib/optimization/optimization_line_search_abstract.h
Performs a bound constrained minimization of a nonlinear function of a single variable. Derivatives are not required.
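For example, a minimal sketch (the quadratic objective, the [0,10] bracket, and the starting point are made up for illustration):

```cpp
#include <dlib/optimization.h>
#include <iostream>

int main()
{
    // Minimize (x-2)^2 + 1 over [0,10]; no derivative is needed.
    double x = 0.5;  // starting point, must lie inside the bounds
    const double min_value = dlib::find_min_single_variable(
        [](double v) { return (v - 2.0)*(v - 2.0) + 1.0; },
        x,       // updated in place with the located minimizer
        0, 10);  // lower and upper bound on x
    std::cout << "x = " << x << ", f(x) = " << min_value << std::endl;
}
```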
solve_trust_region_subproblem
dlib/optimization.h, dlib/optimization/optimization_trust_region_abstract.h
This function solves the following optimization problem:
   Minimize: f(p) == 0.5*trans(p)*B*p + trans(g)*p
   subject to the following constraint:
      length(p) <= radius
solve_qp_using_smo
dlib/optimization.h, dlib/optimization/optimization_solve_qp_using_smo_abstract.h
This function solves the following quadratic program:
   Minimize: f(alpha) == 0.5*trans(alpha)*Q*alpha - trans(alpha)*b
   subject to the following constraints:
      sum(alpha) == C 
      min(alpha) >= 0 
   Where f is convex.  This means that Q should be symmetric and positive-semidefinite.
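A minimal sketch of a call (the 2-variable Q, b, tolerance, and iteration cap are made up; note that the constant C in the sum constraint is taken from the sum of the alpha vector you pass in, so alpha must start at a feasible point):

```cpp
#include <dlib/optimization.h>
#include <iostream>

using namespace dlib;

int main()
{
    // A tiny 2-variable QP. Q must be symmetric and positive-semidefinite.
    matrix<double,2,2> Q;
    Q = 3, 1,
        1, 2;
    matrix<double,2,1> b;
    b = 1, 2;

    // The sum constraint is taken from the initial alpha, so start at a
    // feasible point with non-negative entries.  Here C == 1.
    matrix<double,2,1> alpha;
    alpha = 0.5, 0.5;

    // Run SMO until the solution is accurate to about 1e-9, or for at most
    // 1000 iterations.
    solve_qp_using_smo(Q, b, alpha, 1e-9, 1000);

    std::cout << "alpha:\n" << alpha;
}
```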
solve_qp2_using_smo
dlib/optimization.h, dlib/optimization/optimization_solve_qp2_using_smo_abstract.h
This function solves the following quadratic program:
   Minimize: f(alpha) == 0.5*trans(alpha)*Q*alpha 
   subject to the following constraints:
      sum(alpha) == nu*y.size() 
      0 <= min(alpha) && max(alpha) <= 1 
      trans(y)*alpha == 0

   Where all elements of y must be equal to +1 or -1 and f is convex.  
   This means that Q should be symmetric and positive-semidefinite.

This object implements the strategy used by the LIBSVM tool. The following papers can be consulted for additional details:
  • Chang and Lin, Training ν-Support Vector Classifiers: Theory and Algorithms
  • Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
solve_qp3_using_smo
dlib/optimization.h, dlib/optimization/optimization_solve_qp3_using_smo_abstract.h
This function solves the following quadratic program:
   Minimize: f(alpha) == 0.5*trans(alpha)*Q*alpha + trans(p)*alpha
   subject to the following constraints:
        for all i such that y(i) == +1:  0 <= alpha(i) <= Cp 
        for all i such that y(i) == -1:  0 <= alpha(i) <= Cn 
        trans(y)*alpha == B 

   Where all elements of y must be equal to +1 or -1 and f is convex.  
   This means that Q should be symmetric and positive-semidefinite.

This object implements the strategy used by the LIBSVM tool. The following papers can be consulted for additional details:
  • Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  • Working Set Selection Using Second Order Information for Training Support Vector Machines by Fan, Chen, and Lin. In the Journal of Machine Learning Research 2005.
max_cost_assignment
dlib/optimization.h, dlib/optimization/max_cost_assignment_abstract.h
This function is an implementation of the Hungarian algorithm (also known as the Kuhn-Munkres algorithm). It solves the optimal assignment problem. For example, suppose you have an equal number of workers and jobs and you need to decide which workers to assign to which jobs. Some workers are better at certain jobs than others, so you would like to assign them all to jobs such that overall productivity is maximized. You can use this routine to solve this problem and others like it.
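A short sketch of a typical call (the 3x3 benefit matrix is made up for illustration):

```cpp
#include <dlib/optimization.h>
#include <iostream>

using namespace dlib;

int main()
{
    // cost(i,j) == how much benefit we get from assigning worker i to job j.
    // The matrix must use an integer element type.
    matrix<long> cost(3,3);
    cost = 1, 2, 6,
           5, 3, 6,
           4, 5, 0;

    // assignment[i] is the job given to worker i.
    std::vector<long> assignment = max_cost_assignment(cost);
    for (unsigned long i = 0; i < assignment.size(); ++i)
        std::cout << "worker " << i << " -> job " << assignment[i] << std::endl;

    // Total benefit of the optimal assignment (here 6 + 5 + 5 == 16).
    std::cout << "total benefit: " << assignment_cost(cost, assignment) << std::endl;
}
```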
oca
dlib/optimization.h, dlib/optimization/optimization_oca_abstract.h
This object is a tool for solving the following optimization problem:
   Minimize: f(w) == 0.5*dot(w,w) + C*R(w)

   Where R(w) is a user-supplied convex function and C > 0.


For a detailed discussion you should consult the following papers from the Journal of Machine Learning Research:
  • Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization by Vojtech Franc and Soren Sonnenburg; 10(Oct):2157-2192, 2009.
  • Bundle Methods for Regularized Risk Minimization by Choon Hui Teo, S.V.N. Vishwanathan, Alex J. Smola, and Quoc V. Le; 11(Jan):311-365, 2010.
find_min_bobyqa
dlib/optimization.h, dlib/optimization/optimization_bobyqa_abstract.h
This function defines the dlib interface to the BOBYQA software developed by M.J.D. Powell. BOBYQA is a method for optimizing a function in the absence of derivative information. Powell described it as a method that seeks the least value of a function of many variables by applying a trust region method that forms quadratic models by interpolation. There is usually some freedom in the interpolation conditions, which is taken up by minimizing the Frobenius norm of the change to the second derivative of the model, beginning with the zero matrix. The values of the variables are constrained by upper and lower bounds.

The following paper, published in 2009 by Powell, describes the detailed working of the BOBYQA algorithm.

The BOBYQA algorithm for bound constrained optimization without derivatives by M.J.D. Powell

Note that BOBYQA only works on functions of two or more variables. So if you need to perform derivative-free optimization on a function of a single variable then you should use the find_min_single_variable function.

Examples: optimization_ex.cpp.html, model_selection_ex.cpp.html
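A sketch of a typical call (the objective, the bounds, the starting point, and the choice of 5 interpolation points are illustrative; 2n+1 points is the usual recommendation for n variables):

```cpp
#include <dlib/optimization.h>
#include <iostream>
#include <cmath>

using namespace dlib;
typedef matrix<double,0,1> column_vector;

// A smooth objective with its minimum at (3,-1); no gradient is supplied.
double objective(const column_vector& m)
{
    return std::pow(m(0) - 3.0, 2.0) + std::pow(m(1) + 1.0, 2.0);
}

int main()
{
    column_vector x(2);
    x = 0, 0;  // starting point, must satisfy the bounds

    find_min_bobyqa(objective,
                    x,
                    5,                                 // number of interpolation points
                    uniform_matrix<double>(2,1, -10),  // lower bounds on x
                    uniform_matrix<double>(2,1,  10),  // upper bounds on x
                    1.0,                               // initial trust region radius
                    1e-6,                              // stopping trust region radius
                    200);                              // max objective function evaluations
    std::cout << "solution:\n" << x;  // should be close to (3,-1)
}
```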
find_max_bobyqa
dlib/optimization.h, dlib/optimization/optimization_bobyqa_abstract.h
This function is identical to the find_min_bobyqa routine except that it negates the objective function before performing optimization. Thus this function will attempt to find the maximizer of the objective rather than the minimizer.

Note that BOBYQA only works on functions of two or more variables. So if you need to perform derivative-free optimization on a function of a single variable then you should use the find_max_single_variable function.

Examples: optimization_ex.cpp.html, model_selection_ex.cpp.html
find_min_using_approximate_derivatives
dlib/optimization.h, dlib/optimization/optimization_abstract.h
Performs an unconstrained minimization of a nonlinear function using some search strategy (e.g. bfgs_search_strategy). This version doesn't take a gradient function but instead numerically approximates the gradient.
Example: optimization_ex.cpp.html
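A minimal sketch (the quadratic objective and the starting point are made up):

```cpp
#include <dlib/optimization.h>
#include <iostream>
#include <cmath>

using namespace dlib;
typedef matrix<double,0,1> column_vector;

int main()
{
    column_vector x(2);
    x = 4, 8;  // starting point

    // Objective with its minimum at (1,3).  No gradient function is given,
    // so the gradient is approximated numerically.
    auto f = [](const column_vector& m)
    {
        return std::pow(m(0) - 1.0, 2.0) + std::pow(m(1) - 3.0, 2.0);
    };

    find_min_using_approximate_derivatives(bfgs_search_strategy(),
                                           objective_delta_stop_strategy(1e-7),
                                           f, x, -1);
    std::cout << x;  // should be close to (1,3)
}
```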

solve_least_squares
dlib/optimization.h, dlib/optimization/optimization_least_squares_abstract.h
This is a function for solving non-linear least squares problems. It uses a method which combines the traditional Levenberg-Marquardt technique with a quasi-Newton approach. It is appropriate for large residual problems (i.e. problems where the terms in the least squares function, the residuals, don't go to zero but remain large at the solution).
Example: least_squares_ex.cpp.html

solve_least_squares_lm
dlib/optimization.h, dlib/optimization/optimization_least_squares_abstract.h
This is a function for solving non-linear least squares problems. It uses the traditional Levenberg-Marquardt technique. It is appropriate for small residual problems (i.e. problems where the terms in the least squares function, the residuals, go to zero at the solution).
Example: least_squares_ex.cpp.html

find_min_trust_region
dlib/optimization.h, dlib/optimization/optimization_trust_region_abstract.h
Performs an unconstrained minimization of a nonlinear function using a trust region method.

find_max_trust_region
dlib/optimization.h, dlib/optimization/optimization_trust_region_abstract.h
Performs an unconstrained maximization of a nonlinear function using a trust region method.

find_max
dlib/optimization.h, dlib/optimization/optimization_abstract.h
Performs an unconstrained maximization of a nonlinear function using some search strategy (e.g. bfgs_search_strategy).

find_max_single_variable
dlib/optimization.h, dlib/optimization/optimization_line_search_abstract.h
Performs a bound constrained maximization of a nonlinear function of a single variable. Derivatives are not required.

find_max_using_approximate_derivatives
dlib/optimization.h, dlib/optimization/optimization_abstract.h
Performs an unconstrained maximization of a nonlinear function using some search strategy (e.g. bfgs_search_strategy). This version doesn't take a gradient function but instead numerically approximates the gradient.

structural_svm_problem
dlib/svm.h, dlib/svm/structural_svm_problem_abstract.h
This object is a tool for solving the optimization problem associated with a structural support vector machine. A structural SVM is a supervised machine learning method for learning to predict complex outputs. This is contrasted with a binary classifier, which makes only simple yes/no predictions. A structural SVM, on the other hand, can learn to predict outputs as complex as entire parse trees. To do this, it learns a function F(x,y) which measures how well a particular data sample x matches a label y. When used for prediction, the best label for a new x is then given by the y which maximizes F(x,y).

For further information you should consult the following paper:
  • T. Joachims, T. Finley, and Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning, 77(1):27-59, 2009.

Note that this object is essentially a tool for solving the 1-Slack structural SVM with margin-rescaling. Specifically, see Algorithm 3 in the above referenced paper.