// Copyright (C) 2010  Davis E. King (davis@dlib.net)
// License: Boost Software License   See LICENSE.txt for the full license.
#ifndef DLIB_OPTIMIZATION_SOLVE_QP_UsING_SMO_H__
#define DLIB_OPTIMIZATION_SOLVE_QP_UsING_SMO_H__

#include "optimization_solve_qp_using_smo_abstract.h"
#include "../matrix.h"

namespace dlib
{

// ----------------------------------------------------------------------------------------

    /*
        The algorithm defined in the solve_qp_using_smo() function below can be
        derived by using an important theorem from the theory of constrained optimization.
        This theorem tells us that any optimal point of a constrained function must
        satisfy what are called the KKT conditions (also sometimes called just the KT 
        conditions, especially in older literature).  A very good book to consult 
        regarding this topic is Practical Methods of Optimization (second edition) by 
        R. Fletcher.  Below I will try to explain the general idea of how this is 
        applied.

        Let e == ones_matrix(alpha.size(),1)

        First, note that the function below solves the following quadratic program.  
            Minimize: f(alpha) == 0.5*trans(alpha)*Q*alpha - trans(alpha)*b
            subject to the following constraints:
                - trans(e)*alpha == C (i.e. the sum of alpha values doesn't change)
                - min(alpha) >= 0 (i.e. all alpha values are nonnegative)


        To get from this problem formulation to the algorithm below we have to 
        consider the KKT conditions.  They tell us that any solution to the above
        problem must satisfy the following 5 conditions:
            1. trans(e)*alpha == C
            2. min(alpha) >= 0

            3. Let L(alpha, x, y) == f(alpha) - trans(x)*alpha - y*(trans(e)*alpha - C)
               Where x is a vector of length alpha.size() and y is a single scalar.
               Then the derivative of L with respect to alpha must == 0
               So we get the following as our 3rd condition:
               f'(alpha) - x - y*e == 0

            4. min(x) >= 0 (i.e. all x values are nonnegative)
            5. pointwise_multiply(x, alpha) == 0
               (i.e. at most one of x(i) and alpha(i) can be non-zero for each i)
        
        
        From 3 we can easily obtain this rule:
            for all i: f'(alpha)(i) - x(i) == y

        If we then consider 4 and 5 we see that we can infer that the following
        must also be the case:
            - if (alpha(i) > 0) then
                - x(i) == 0
                - f'(alpha)(i) == y
            - else
                - x(i) == some nonnegative number
                - f'(alpha)(i) >= y

        
        The important thing to take away is the final rule.  It tells us that at the
        optimal solution all elements of the gradient of f have the same value if 
        their corresponding alpha is non-zero.  It also tells us that all the other
        gradient values are greater than or equal to y.  We can use this
        information to help us
        pick which alpha variables to optimize at each iteration. 
    */
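
    /*
        A minimal usage sketch (the 2x2 Q, b, and starting alpha here are
        arbitrary illustration values).  Q is assumed symmetric, and the
        starting alpha fixes the constant C == sum(alpha):

            matrix<double> Q(2,2), b(2,1), alpha(2,1);
            Q = 3, 1,
                1, 2;
            b = 1, 1;
            alpha = 0.5, 0.5;   // feasible start: C == 1 and min(alpha) >= 0

            // Run until the duality gap falls below 1e-8, giving up after
            // 10000 iterations.  The return value is the iteration count.
            unsigned long iters = solve_qp_using_smo(Q, b, alpha, 1e-8, 10000);

            // alpha now holds an approximate minimizer of
            // 0.5*trans(alpha)*Q*alpha - trans(alpha)*b subject to
            // sum(alpha) == 1 and min(alpha) >= 0.
    */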

// ----------------------------------------------------------------------------------------

    template <
        typename EXP1,
        typename EXP2,
        typename T, long NR, long NC, typename MM, typename L
        >
    unsigned long solve_qp_using_smo ( 
        const matrix_exp<EXP1>& Q,
        const matrix_exp<EXP2>& b,
        matrix<T,NR,NC,MM,L>& alpha,
        T eps,
        unsigned long max_iter
    )
    {
        // make sure requires clause is not broken
        DLIB_ASSERT(Q.nr() == Q.nc() &&
                     is_col_vector(b) &&
                     is_col_vector(alpha) &&
                     b.size() == alpha.size() &&
                     b.size() == Q.nr() &&
                     alpha.size() > 0 &&
                     min(alpha) >= 0 &&
                     eps > 0 &&
                     max_iter > 0,
                     "\t unsigned long solve_qp_using_smo()"
                     << "\n\t Invalid arguments were given to this function"
                     << "\n\t Q.nr():               " << Q.nr()
                     << "\n\t Q.nc():               " << Q.nc()
                     << "\n\t is_col_vector(b):     " << is_col_vector(b)
                     << "\n\t is_col_vector(alpha): " << is_col_vector(alpha)
                     << "\n\t b.size():             " << b.size() 
                     << "\n\t alpha.size():         " << alpha.size() 
                     << "\n\t min(alpha):           " << min(alpha) 
                     << "\n\t eps:                  " << eps 
                     << "\n\t max_iter:             " << max_iter 
        );

        const T C = sum(alpha);

        // Compute f'(alpha) (i.e. the gradient of f(alpha)) for the current alpha.  
        matrix<T,NR,NC,MM,L> df = Q*alpha - b;

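        // tau puts a small positive floor under the quadratic coefficient of the
        // two-variable subproblem solved below, so the step size stays finite even
        // when Q is singular (or nearly so) along the chosen pair of variables.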
        const T tau = 1000*std::numeric_limits<T>::epsilon();

        T big, little;
        unsigned long iter = 0;
        for (; iter < max_iter; ++iter)
        {
            // Find the two elements of df that satisfy the following:
            //    - little_idx == index_of_min(df)
            //    - big_idx   == the index of the largest element in df such that alpha(big_idx) > 0
            // These two indices will tell us which two alpha values are most in violation of the KKT 
            // optimality conditions.  
            big = -std::numeric_limits<T>::max();
            long big_idx = 0;
            little = std::numeric_limits<T>::max();
            long little_idx = 0;
            for (long i = 0; i < df.nr(); ++i)
            {
                if (df(i) > big && alpha(i) > 0)
                {
                    big = df(i);
                    big_idx = i;
                }
                if (df(i) < little)
                {
                    little = df(i);
                    little_idx = i;
                }
            }

            // An alternative stopping rule: stop once the KKT conditions are
            // satisfied to within eps.
            //if (alpha(little_idx) > 0 && (big - little) < eps)
            //    break;

            // Check how big the duality gap is and stop when it goes below eps.  
            // The duality gap is the gap between the objective value of the function
            // we are optimizing and the value of its primal form.  This value is always 
            // greater than or equal to the distance to the optimum solution so it is a 
            // good way to decide if we should stop.   See the book referenced above for 
            // more information.  In particular, see the part about the Wolfe Dual.
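            //
            // For this problem the gap has a simple closed form: plugging
            // y == little and x == df - little*e into the Lagrangian L() defined
            // at the top of this file gives
            //    f(alpha) - L(alpha,x,y) == trans(alpha)*df - C*little,
            // which is exactly the quantity tested below.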
            if (trans(alpha)*df - C*little < eps)
                break;


            // Save these values, we will need them later.
            const T old_alpha_big = alpha(big_idx);
            const T old_alpha_little = alpha(little_idx);


            // Now optimize the two variables we just picked. 
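            // Transferring an amount t from alpha(big_idx) to alpha(little_idx)
            // preserves the sum constraint.  Restricted to that direction, f is a
            // quadratic in t with curvature quad_coef and slope (little - big) at
            // t == 0, so its minimizer is t == (big - little)/quad_coef, the delta
            // computed below.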
            T quad_coef = Q(big_idx,big_idx) + Q(little_idx,little_idx) - 2*Q(big_idx, little_idx);
            if (quad_coef <= tau)
                quad_coef = tau;
            const T delta = (big - little)/quad_coef;
            alpha(big_idx) -= delta;
            alpha(little_idx) += delta;

            // Make sure alpha stays feasible.  That is, make sure the updated alpha doesn't
            // violate the non-negativity constraint.  
            if (alpha(big_idx) < 0)
            {
                // Since an alpha can't be negative we will just set it to 0 and shift all the
                // weight to the other alpha.
                alpha(big_idx) = 0;
                alpha(little_idx) = old_alpha_big + old_alpha_little;
            }

            // Every 300 iterations
            if ((iter%300) == 299)
            {
                // Perform this form of the update every so often because doing so can help
                // avoid the buildup of numerical errors you get with the alternate update
                // below.
                df = Q*alpha - b;
            }
            else
            {
                // Now update the gradient. We will perform the equivalent of: df = Q*alpha - b;
                const T delta_alpha_big   = alpha(big_idx) - old_alpha_big;
                const T delta_alpha_little = alpha(little_idx) - old_alpha_little;

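                // Only two elements of alpha changed, so df changes by the two
                // columns of Q indexed by big_idx and little_idx, scaled by those
                // deltas.  (The code reads rows, which is the same thing for a
                // symmetric Q.)  This is O(n) work instead of the O(n^2) full
                // recompute above.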
                for(long k = 0; k < df.nr(); ++k)
                    df(k) += Q(big_idx,k)*delta_alpha_big + Q(little_idx,k)*delta_alpha_little;
            }
        }

        /*
        using namespace std;
        cout << "SMO: " << endl;
        cout << "   duality gap: "<< trans(alpha)*df - C*min(df) << endl;
        cout << "   KKT gap:     "<< big-little << endl;
        cout << "   iter:        "<< iter+1 << endl;
        cout << "   eps:         "<< eps << endl;
        */

        return iter+1;
    }

// ----------------------------------------------------------------------------------------

}

#endif // DLIB_OPTIMIZATION_SOLVE_QP_UsING_SMO_H__