Generalized Benders Decomposition¶
Danger
Non-linear features in BendersLib are experimental.
BendersLib currently supports only linear Benders cuts. Therefore, the user must ensure that the problem is linearly separable as described in the special cases for both optimality and feasibility cuts.
The Generalized Benders Decomposition (GBD) [1] extends the Classical Benders Decomposition from Mixed-Integer Linear Programming (MILP) to a broader class of Mixed-Integer Non-Linear Programming (MINLP). It maintains the same core idea of decomposing the problem into a master problem and a subproblem but uses more general principles from convex optimization.
Original Problem¶
The GBD is applied to MINLP where the variables can be partitioned into complicating variables \(x\) (typically integer) and easy variables \(y\) (typically continuous), such that when \(x\) is fixed, the remaining problem in \(y\) is a convex optimization problem.
Consider a general MINLP problem of the following form.
In the formulation above, \(f\) and \(g\) are functions, \(X\) is a non-empty compact set, and \(Y\) is a non-empty convex set. The key assumptions for GBD are as follows.
For any fixed \(\bar{x} \in X\), the functions \(f(\bar{x}, y)\) and \(g(\bar{x}, y)\) are convex in \(y\).
For any fixed \(\bar{x} \in X\), the subproblem satisfies a constraint qualification (e.g., Slater’s condition) to ensure strong duality.
Slater’s Condition
Slater’s condition is a constraint qualification in convex optimization. Its main purpose is to guarantee strong duality. The condition is satisfied if there exists a strictly feasible point (a point that lies in the interior of the feasible region). If Slater’s condition holds for a convex problem, the optimal value of the primal problem equals the optimal value of its dual.
Reformulation¶
The reformulation projects the original problem onto the space of the complicating variables \(x\). A master problem is defined in terms of \(x\) and a scalar \(\eta\) that represents the objective value \(f(x,y)\). The master problem is defined as follows and is iteratively refined with cuts.
For a candidate solution \(\bar{x}\) from the master problem, the primal subproblem is a convex Non-Linear Programming problem that evaluates the best possible outcome for that choice of \(\bar{x}\).
Instead of Linear Programming duality, GBD relies on the more general Lagrange Duality. The Lagrangian function for the subproblem is defined as follows.
Then the Lagrangian dual function is defined as follows.
Note
The Lagrangian dual function is a lower bound on the optimal value of the primal subproblem for any \(\lambda \geq 0\).
The Lagrangian dual subproblem is then defined to find the optimal Lagrange multipliers \(\lambda\) by maximizing the Lagrangian dual function.
The optimal Lagrange multipliers \(\bar{\lambda}\) for the subproblem are crucial for generating cuts. The values of Lagrange multipliers play a similar role to values of dual variables in classical Benders Decomposition.
Special Case: Separable Problems
A common special case is when the functions \(f(x, y)\) and \(g(x, y)\) are separable between \(x\) and \(y\). This property is essential for simplifying the subproblem formulation and is a prerequisite for deriving linear Benders cuts. This means they can be expressed as
where \(f_x(x)\) and \(g_x(x)\) depend only on \(x\), and \(f_y(y)\) and \(g_y(y)\) depend only on \(y\). In this case, the prime subproblem simplifies to
It Lagrangian function becomes
The Lagrangian dual subproblem is
By solving the dual subproblem we can obtain the optimal Lagrange multipliers and the objective value of the subproblem for generating Benders cuts.
Benders Cuts¶
Generalized Optimality Cut¶
When is it generated? When the Lagrangian dual subproblem for \(\bar{x}\) is feasible, yielding an optimal primal solution \(\bar{y}\) and optimal Lagrange multipliers \(\bar{\lambda}\).
What is the logic? For the fixed multipliers \(\bar{\lambda}\) obtained from solving the subproblem at \(\bar{x}\), the Lagrangian dual function \(\inf_{y \in Y} L(x, y, \bar{\lambda})\) provides a valid lower bound for the true value function \(v(x)\) for all \(x\). If the Lagrange multipliers are optimal and the strong duality holds, the bound is tight at \(\bar{x}\).
The Cut: We add the following constraint to the master problem, which may be nonlinear in \(x\).
\[\eta \geq \inf_{y \in Y} \{ f(x, y) + \bar{\lambda}^T g(x, y) \}\]
Caution
BendersLib currently supports only linear Benders cuts. Therefore, the user must ensure that the problem is linearly separable as described below.
Special Case: Linear Optimality Cuts
The generalized optimality cut becomes linear in \(x\) if the functions \(f(x, y)\) and \(g(x, y)\) meet two conditions: they are separable between \(x\) and \(y\), and the parts involving \(x\) are linear.
1. Separability
First, assume the functions are separable, the cut becomes
Since \(f_x(x)\) and \(g_x(x)\) do not depend on \(y\), they can be moved outside the infimum as
where \(\inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\) is a constant. It can be nonlinear due to \(f_x(x)\) and \(g_x(x)\).
2. Linearity in x
If we further assume the functions are linear with respect to \(x\), i.e., \(f_x(x) = c^T x\) and \(g_x(x) = - A x\), the cut simplifies to a linear expression
where \(K(\bar{\lambda}) = \inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\) is a constant, implying the cut is a linear inequality in \(x\). Note that here we use \(-A\) as coefficient matrix to be consistent with the Classical Benders Decomposition formulation. Consider the constraint \(Ax + Dy \geq b\) (\(\geq\) constraint) in the Classical Benders Decomposition. It can be rewritten as \(b - Ax - Dy \leq 0\) (\(\leq\) constraint) in the \(g(x,y) \leq 0\) format used here. Therefore, the coefficient matrix for \(x\) in \(g(x,y)\) is \(-A\).
Transforming to Classical Benders Cut
If the linearly separable conditions are met, the generalized optimality cut can be transformed into the first-order Taylor approximation form of classical Benders optimality cut.
For a given \(\bar{x}\), let (\(\bar{y}\), \(\bar{\lambda}\)) be the optimal dual solution. The primal subproblem objective is \(f_x(\bar{x}) + f_y(\bar{y})\). The dual subproblem objective is \(f_x(\bar{x}) + \bar{\lambda}^T g_x(\bar{x}) + \inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\). By strong duality, they are equal. Using the definitions \(f_x(x) = c^T x\), \(g_x(x) = - A x\), and \(K(\bar{\lambda}) = \inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\), we get
This simplifies to an expression for \(K(\bar{\lambda})\)
Substituting this back into (1), we have
By rearranging terms, we get
This simplifies to the classical Benders optimality cut (first-order Taylor approximation form)
where \(\eta' = \eta - c^T x\) is the estimator variable with the same definition in the Classical Benders Decomposition.
Tip
When the linearly separable condition is satisfied, and we have a black-box nonlinear solver that can provide the optimal objective value and Lagrange multipliers of convex Non-Linear Programming problems, the generalized optimality cut can be implemented using (2).
Generalized Feasibility Cut¶
When is it generated? When the primal subproblem for a given \(\bar{x}\) is infeasible. This means there is no \(y \in Y\) that satisfies the constraints \(g(\bar{x}, y) \leq 0\).
What is the logic? If the subproblem is infeasible, we need to add a constraint to the master problem to cut off the region of \(x\) values that leads to this infeasibility. This is achieved using a generalization of Farkas’ Lemma: if the set \(\{ y \in Y \mid g(\bar{x}, y) \leq 0 \}\) is empty, then there exists a vector \(\bar{\mu} \geq 0\), \(\bar{\mu} \neq 0\), such that for all \(y \in Y\), \(\bar{\mu}^T g(\bar{x}, y) > 0\). Or equivalently,
\[\inf_{y \in Y} \{ \bar{\mu}^T g(\bar{x}, y) \} > 0.\]The Cut: The feasibility cut forces any new candidate solution \(x\) to satisfy the condition that would have made the subproblem feasible. The cut added to the master problem is
(3)¶\[\inf_{y \in Y} \{ \bar{\mu}^T g(x, y) \} \leq 0.\]
Caution
BendersLib currently supports only linear Benders cuts. Therefore, the user must ensure that the problem is linearly separable as described below.
Special Case: Linear Feasibility Cuts
Similar to the optimality cut, the generalized feasibility cut becomes linear if the function \(g(x, y)\) is separable and linear in \(x\).
1. Separability
If \(g(x, y) = g_x(x) + g_y(y)\), the cut becomes
Moving the term involving only \(x\) outside the infimum, we get
2. Linearity in x
If we further assume \(g_x(x) = A x\), the cut simplifies to a linear inequality
where \(K'(\bar{\mu}) = \inf_{y \in Y} \{ \bar{\mu}^T g_y(y) \}\) is a constant that can be computed separately using unconstrained optimization methods over \(y\).
Obtaining the Farkas Multiplier \(\bar{\mu}\)
For a general convex Non-Linear Programming subproblem, you can solve an auxiliary “feasibility-seeking” problem to find \(\bar{\mu}\). A common approach is to introduce slack variables and minimize their sum.
If the original subproblem is infeasible, the optimal objective of this auxiliary problem will be positive. The optimal Lagrange multipliers associated with the constraints \(g_i(\bar{x}, y) \leq s_i\) are the components of the Farkas multiplier vector \(\bar{\mu}\).
Tip
When the linearly separable condition is satisfied, and we have a black-box nonlinear solver that can provide a \(\bar{\mu}\) for infeasible convex Non-Linear Programming problems, we can solve \(\min_{y \in Y} \{ \bar{\mu}^T g_y(y) \}\) to compute \(K'(\bar{\mu})\) and implement (4).
Transforming to Classical Benders Cut
When the subproblem is a Linear Programming problem, the generalized feasibility cut simplifies to the classical Benders feasibility cut.
Consider a subproblem with constraints \(Dy \geq b - Ax\), which in the format \(g(x,y) \leq 0\) is \(b - Ax - Dy \leq 0\). The generalized cut (3) is \(\bar{\mu}^T (b - Ax) + \inf_{y \geq 0} \{ -\bar{\mu}^T Dy \} \leq 0\). The key is to show that \(\inf_{y \geq 0} \{ -\bar{\mu}^T Dy \} = 0\).
The Farkas multiplier \(\bar{\mu}\) is an extreme ray from the dual of the subproblem, satisfying \(D^T \bar{\mu} \leq 0\) and \(\bar{\mu} \geq 0\). From \(D^T \bar{\mu} \leq 0\), we get \(\bar{\mu}^T D \leq 0\). Since \(y \geq 0\), the term \(\bar{\mu}^T D y\) is always non-positive. Therefore, \(-\bar{\mu}^T D y\) is always non-negative. The infimum of a non-negative expression that can be zero (e.g., at \(y=0\)) is zero.
This reduces the generalized cut to \(\bar{\mu}^T (b - Ax) \leq 0\), which is the feasibility cut in the Classical Benders Decomposition.
Algorithm¶
The algorithm is an iterative process that alternates between solving the master problem, solving the subproblem, and adding generalized Benders cuts to the master problem based on the subproblem’s results.
Initialization
Initialize a lower bound \(LB = -\infty\) and an upper bound \(UB = +\infty\).
Add any initial cuts to the master problem if available. Often, the algorithm starts with no cuts.
Set the iteration counter \(k = 1\).
Step 1: Solve the master problem
Solve the current master problem to get a solution \((\bar{x}_k, \bar{\eta}_k)\).
The objective value of the master problem, \(\bar{\eta}_k\), is a valid lower bound on the optimal solution of the original problem. Update \(LB = \max(LB, \bar{\eta}_k)\).
Step 2: Check for convergence
If \(UB - LB \leq \epsilon\) (where \(\epsilon\) is a small tolerance), then STOP. The optimal solution has been found within the desired tolerance. The best solution found so far that yielded a feasible subproblem is the optimal solution.
Step 3: Solve the subproblem
Using the fixed value \(\bar{x}_k\) from the master problem, solve the convex Non-Linear Programming subproblem.
Step 4: Generate and add a cut
Case A: Subproblem is feasible.
Obtain the optimal subproblem solution \(\bar{y}_k\) and the corresponding optimal Lagrange multipliers \(\bar{\lambda}_k\).
The objective value for this feasible solution is \(f(\bar{x}_k, \bar{y}_k)\).
Update the upper bound by \(UB = \min(UB, f(\bar{x}_k, \bar{y}_k))\).
Add a generalized optimality cut to the master problem for lower bounding \(\eta\).
Caution
For \(-\infty < \eta < +\infty\) introduced in the master problem for optimality cuts, \(-\infty\) should be replaced with a sufficiently large negative number that remains as small as possible to ensure validity, to avoid numerical issues in practice.
In BendersLib, \(-\infty\) is provided as a customizable parameter
BendersParams.theta_lb; \(+\infty\) is set to solvers’ default upper bound for unbounded variables.
Case B: Subproblem is infeasible.
Find a Farkas multiplier (or extreme ray) \(\bar{\mu}_k\) and add a generalized feasibility cut to the master problem.
Step 5: Loop
Increment \(k = k + 1\) and go back to Step 1.
Given the set of complicating variables, the generalized Benders Decomposition method has a standard way to formulate the master problem, subproblem, and Benders cuts. Therefore, these procedures can be automated, and users only need to provide the original problem and specify which variables are complicating variables.
See also
See Classical Benders Decomposition for basic concepts of Benders decomposition.
An in-depth introduction to GBD theory (in Chinese).
Papers using GBD: facility location [2]; budgeting [3]; inventory management [4].
BendersLib’s implementation of optimality and feasibility cuts:
GeneralizedOCandGeneralizedFC.BendersLib’s implementation of the Benders method:
GeneralizedBenders.Example: Generalized Benders Decomposition.