Generalized Benders Decomposition

Danger

Non-linear features in BendersLib are experimental.

BendersLib currently supports only linear Benders cuts. Therefore, the user must ensure that the problem is linearly separable as described in the special cases for both optimality and feasibility cuts.

The Generalized Benders Decomposition (GBD) [1] extends the Classical Benders Decomposition from Mixed-Integer Linear Programming (MILP) to a broader class of Mixed-Integer Non-Linear Programming (MINLP). It maintains the same core idea of decomposing the problem into a master problem and a subproblem but uses more general principles from convex optimization.

Original Problem

The GBD is applied to MINLP where the variables can be partitioned into complicating variables \(x\) (typically integer) and easy variables \(y\) (typically continuous), such that when \(x\) is fixed, the remaining problem in \(y\) is a convex optimization problem.

Consider a general MINLP problem of the following form.

\[\begin{split}\begin{aligned} \min_{x, y} \quad & f(x, y) \\ \text{s.t.} \quad & g(x, y) \leq 0 \\ & x \in X \\ & y \in Y \end{aligned}\end{split}\]

In the formulation above, \(f\) and \(g\) are functions, \(X\) is a non-empty compact set, and \(Y\) is a non-empty convex set. The key assumptions for GBD are as follows.

  • For any fixed \(\bar{x} \in X\), the functions \(f(\bar{x}, y)\) and \(g(\bar{x}, y)\) are convex in \(y\).

  • For any fixed \(\bar{x} \in X\), the subproblem satisfies a constraint qualification (e.g., Slater’s condition) to ensure strong duality.

Slater’s Condition

Slater’s condition is a constraint qualification in convex optimization. Its main purpose is to guarantee strong duality. The condition is satisfied if there exists a strictly feasible point (a point that lies in the interior of the feasible region). If Slater’s condition holds for a convex problem, the optimal value of the primal problem equals the optimal value of its dual.

Reformulation

The reformulation projects the original problem onto the space of the complicating variables \(x\). A master problem is defined in terms of \(x\) and a scalar \(\eta\) that represents the objective value \(f(x,y)\). The master problem is defined as follows and is iteratively refined with cuts.

\[\begin{split}\begin{aligned} \min_{x, \eta} \quad & \eta \\ \text{s.t.} \quad & x \in X \\ & \eta \geq \dots (\text{optimality cuts}) \\ & 0 \geq \dots (\text{feasibility cuts}) \end{aligned}\end{split}\]

For a candidate solution \(\bar{x}\) from the master problem, the primal subproblem is a convex Non-Linear Programming problem that evaluates the best possible outcome for that choice of \(\bar{x}\).

\[v(\bar{x}) = \min_{y \in Y} \{ f(\bar{x}, y) \mid g(\bar{x}, y) \leq 0 \}\]

Instead of Linear Programming duality, GBD relies on the more general Lagrange Duality. The Lagrangian function for the subproblem is defined as follows.

\[L(\bar{x}, y, \lambda) = f(\bar{x}, y) + \lambda^T g(\bar{x}, y)\]

Then the Lagrangian dual function is defined as follows.

\[\inf_{y \in Y} L(\bar{x}, y, \lambda) = \inf_{y \in Y} \{ f(\bar{x}, y) + \lambda^T g(\bar{x}, y) \}\]

Note

The Lagrangian dual function is a lower bound on the optimal value of the primal subproblem for any \(\lambda \geq 0\).

The Lagrangian dual subproblem is then defined to find the optimal Lagrange multipliers \(\lambda\) by maximizing the Lagrangian dual function.

\[\max_{\lambda \geq 0} \inf_{y \in Y} L(\bar{x}, y, \lambda)\]

The optimal Lagrange multipliers \(\bar{\lambda}\) for the subproblem are crucial for generating cuts. The values of Lagrange multipliers play a similar role to values of dual variables in classical Benders Decomposition.

Special Case: Separable Problems

A common special case is when the functions \(f(x, y)\) and \(g(x, y)\) are separable between \(x\) and \(y\). This property is essential for simplifying the subproblem formulation and is a prerequisite for deriving linear Benders cuts. This means they can be expressed as

\[f(x, y) = f_x(x) + f_y(y)\]
\[g(x, y) = g_x(x) + g_y(y)\]

where \(f_x(x)\) and \(g_x(x)\) depend only on \(x\), and \(f_y(y)\) and \(g_y(y)\) depend only on \(y\). In this case, the prime subproblem simplifies to

\[v(\bar{x}) = \min_{y \in Y} \{ f_x(\bar{x}) + f_y(y) \mid g_y(y) \leq -g_x(\bar{x}) \}.\]

It Lagrangian function becomes

\[L(\bar{x}, y, \lambda) = f_x(\bar{x}) + f_y(y) + \lambda^T (g_y(y) + g_x(\bar{x})).\]

The Lagrangian dual subproblem is

\[\max_{\lambda \geq 0} \{ f_x(\bar{x}) + \lambda^T g_x(\bar{x}) + \inf_{y \in Y} \{ f_y(y) + \lambda^T g_y(y) \} \}.\]

By solving the dual subproblem we can obtain the optimal Lagrange multipliers and the objective value of the subproblem for generating Benders cuts.

Benders Cuts

Generalized Optimality Cut

  • When is it generated? When the Lagrangian dual subproblem for \(\bar{x}\) is feasible, yielding an optimal primal solution \(\bar{y}\) and optimal Lagrange multipliers \(\bar{\lambda}\).

  • What is the logic? For the fixed multipliers \(\bar{\lambda}\) obtained from solving the subproblem at \(\bar{x}\), the Lagrangian dual function \(\inf_{y \in Y} L(x, y, \bar{\lambda})\) provides a valid lower bound for the true value function \(v(x)\) for all \(x\). If the Lagrange multipliers are optimal and the strong duality holds, the bound is tight at \(\bar{x}\).

  • The Cut: We add the following constraint to the master problem, which may be nonlinear in \(x\).

    \[\eta \geq \inf_{y \in Y} \{ f(x, y) + \bar{\lambda}^T g(x, y) \}\]

Caution

BendersLib currently supports only linear Benders cuts. Therefore, the user must ensure that the problem is linearly separable as described below.

Special Case: Linear Optimality Cuts

The generalized optimality cut becomes linear in \(x\) if the functions \(f(x, y)\) and \(g(x, y)\) meet two conditions: they are separable between \(x\) and \(y\), and the parts involving \(x\) are linear.

1. Separability

First, assume the functions are separable, the cut becomes

\[\eta \geq \inf_{y \in Y} \{ f_x(x) + f_y(y) + \bar{\lambda}^T (g_x(x) + g_y(y)) \}.\]

Since \(f_x(x)\) and \(g_x(x)\) do not depend on \(y\), they can be moved outside the infimum as

\[\eta \geq f_x(x) + \bar{\lambda}^T g_x(x) + \inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\]

where \(\inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\) is a constant. It can be nonlinear due to \(f_x(x)\) and \(g_x(x)\).

2. Linearity in x

If we further assume the functions are linear with respect to \(x\), i.e., \(f_x(x) = c^T x\) and \(g_x(x) = - A x\), the cut simplifies to a linear expression

(1)\[\eta \geq c^T x - \bar{\lambda}^T A x + K(\bar{\lambda}),\]

where \(K(\bar{\lambda}) = \inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\) is a constant, implying the cut is a linear inequality in \(x\). Note that here we use \(-A\) as coefficient matrix to be consistent with the Classical Benders Decomposition formulation. Consider the constraint \(Ax + Dy \geq b\) (\(\geq\) constraint) in the Classical Benders Decomposition. It can be rewritten as \(b - Ax - Dy \leq 0\) (\(\leq\) constraint) in the \(g(x,y) \leq 0\) format used here. Therefore, the coefficient matrix for \(x\) in \(g(x,y)\) is \(-A\).

Transforming to Classical Benders Cut

If the linearly separable conditions are met, the generalized optimality cut can be transformed into the first-order Taylor approximation form of classical Benders optimality cut.

For a given \(\bar{x}\), let (\(\bar{y}\), \(\bar{\lambda}\)) be the optimal dual solution. The primal subproblem objective is \(f_x(\bar{x}) + f_y(\bar{y})\). The dual subproblem objective is \(f_x(\bar{x}) + \bar{\lambda}^T g_x(\bar{x}) + \inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\). By strong duality, they are equal. Using the definitions \(f_x(x) = c^T x\), \(g_x(x) = - A x\), and \(K(\bar{\lambda}) = \inf_{y \in Y} \{ f_y(y) + \bar{\lambda}^T g_y(y) \}\), we get

\[c^T \bar{x} + f_y(\bar{y}) = c^T \bar{x} - \bar{\lambda}^T A \bar{x} + K(\bar{\lambda}).\]

This simplifies to an expression for \(K(\bar{\lambda})\)

\[K(\bar{\lambda}) = f_y(\bar{y}) + \bar{\lambda}^T A \bar{x}.\]

Substituting this back into (1), we have

\[\eta \geq c^T x - \bar{\lambda}^T A x + (f_y(\bar{y}) + \bar{\lambda}^T A \bar{x}).\]

By rearranging terms, we get

\[\eta - c^T x \geq f_y(\bar{y}) - \bar{\lambda}^T A (x - \bar{x}).\]

This simplifies to the classical Benders optimality cut (first-order Taylor approximation form)

(2)\[\eta' \geq f_y(\bar{y}) - \bar{\lambda}^T A (x - \bar{x}) \quad \text{[Optimality Cut]}\]

where \(\eta' = \eta - c^T x\) is the estimator variable with the same definition in the Classical Benders Decomposition.

Tip

When the linearly separable condition is satisfied, and we have a black-box nonlinear solver that can provide the optimal objective value and Lagrange multipliers of convex Non-Linear Programming problems, the generalized optimality cut can be implemented using (2).

Generalized Feasibility Cut

  • When is it generated? When the primal subproblem for a given \(\bar{x}\) is infeasible. This means there is no \(y \in Y\) that satisfies the constraints \(g(\bar{x}, y) \leq 0\).

  • What is the logic? If the subproblem is infeasible, we need to add a constraint to the master problem to cut off the region of \(x\) values that leads to this infeasibility. This is achieved using a generalization of Farkas’ Lemma: if the set \(\{ y \in Y \mid g(\bar{x}, y) \leq 0 \}\) is empty, then there exists a vector \(\bar{\mu} \geq 0\), \(\bar{\mu} \neq 0\), such that for all \(y \in Y\), \(\bar{\mu}^T g(\bar{x}, y) > 0\). Or equivalently,

    \[\inf_{y \in Y} \{ \bar{\mu}^T g(\bar{x}, y) \} > 0.\]
  • The Cut: The feasibility cut forces any new candidate solution \(x\) to satisfy the condition that would have made the subproblem feasible. The cut added to the master problem is

    (3)\[\inf_{y \in Y} \{ \bar{\mu}^T g(x, y) \} \leq 0.\]

Caution

BendersLib currently supports only linear Benders cuts. Therefore, the user must ensure that the problem is linearly separable as described below.

Special Case: Linear Feasibility Cuts

Similar to the optimality cut, the generalized feasibility cut becomes linear if the function \(g(x, y)\) is separable and linear in \(x\).

1. Separability

If \(g(x, y) = g_x(x) + g_y(y)\), the cut becomes

\[\inf_{y \in Y} \{ \bar{\mu}^T (g_x(x) + g_y(y)) \} \leq 0.\]

Moving the term involving only \(x\) outside the infimum, we get

\[\bar{\mu}^T g_x(x) + \inf_{y \in Y} \{ \bar{\mu}^T g_y(y) \} \leq 0.\]

2. Linearity in x

If we further assume \(g_x(x) = A x\), the cut simplifies to a linear inequality

(4)\[\bar{\mu}^T A x + K'(\bar{\mu}) \leq 0, \quad \text{[Feasibility Cut]}\]

where \(K'(\bar{\mu}) = \inf_{y \in Y} \{ \bar{\mu}^T g_y(y) \}\) is a constant that can be computed separately using unconstrained optimization methods over \(y\).

Obtaining the Farkas Multiplier \(\bar{\mu}\)

For a general convex Non-Linear Programming subproblem, you can solve an auxiliary “feasibility-seeking” problem to find \(\bar{\mu}\). A common approach is to introduce slack variables and minimize their sum.

\[\begin{split}\begin{aligned} \min_{y, s} \quad & \sum_i s_i \\ \text{s.t.} \quad & g_i(\bar{x}, y) \leq s_i, \quad \forall i \\ & s_i \geq 0, \quad \forall i \\ & y \in Y \end{aligned}\end{split}\]

If the original subproblem is infeasible, the optimal objective of this auxiliary problem will be positive. The optimal Lagrange multipliers associated with the constraints \(g_i(\bar{x}, y) \leq s_i\) are the components of the Farkas multiplier vector \(\bar{\mu}\).

Tip

When the linearly separable condition is satisfied, and we have a black-box nonlinear solver that can provide a \(\bar{\mu}\) for infeasible convex Non-Linear Programming problems, we can solve \(\min_{y \in Y} \{ \bar{\mu}^T g_y(y) \}\) to compute \(K'(\bar{\mu})\) and implement (4).

Transforming to Classical Benders Cut

When the subproblem is a Linear Programming problem, the generalized feasibility cut simplifies to the classical Benders feasibility cut.

Consider a subproblem with constraints \(Dy \geq b - Ax\), which in the format \(g(x,y) \leq 0\) is \(b - Ax - Dy \leq 0\). The generalized cut (3) is \(\bar{\mu}^T (b - Ax) + \inf_{y \geq 0} \{ -\bar{\mu}^T Dy \} \leq 0\). The key is to show that \(\inf_{y \geq 0} \{ -\bar{\mu}^T Dy \} = 0\).

The Farkas multiplier \(\bar{\mu}\) is an extreme ray from the dual of the subproblem, satisfying \(D^T \bar{\mu} \leq 0\) and \(\bar{\mu} \geq 0\). From \(D^T \bar{\mu} \leq 0\), we get \(\bar{\mu}^T D \leq 0\). Since \(y \geq 0\), the term \(\bar{\mu}^T D y\) is always non-positive. Therefore, \(-\bar{\mu}^T D y\) is always non-negative. The infimum of a non-negative expression that can be zero (e.g., at \(y=0\)) is zero.

This reduces the generalized cut to \(\bar{\mu}^T (b - Ax) \leq 0\), which is the feasibility cut in the Classical Benders Decomposition.

Algorithm

The algorithm is an iterative process that alternates between solving the master problem, solving the subproblem, and adding generalized Benders cuts to the master problem based on the subproblem’s results.

  1. Initialization

    • Initialize a lower bound \(LB = -\infty\) and an upper bound \(UB = +\infty\).

    • Add any initial cuts to the master problem if available. Often, the algorithm starts with no cuts.

    • Set the iteration counter \(k = 1\).

  2. Step 1: Solve the master problem

    • Solve the current master problem to get a solution \((\bar{x}_k, \bar{\eta}_k)\).

    • The objective value of the master problem, \(\bar{\eta}_k\), is a valid lower bound on the optimal solution of the original problem. Update \(LB = \max(LB, \bar{\eta}_k)\).

  3. Step 2: Check for convergence

    • If \(UB - LB \leq \epsilon\) (where \(\epsilon\) is a small tolerance), then STOP. The optimal solution has been found within the desired tolerance. The best solution found so far that yielded a feasible subproblem is the optimal solution.

  4. Step 3: Solve the subproblem

    • Using the fixed value \(\bar{x}_k\) from the master problem, solve the convex Non-Linear Programming subproblem.

  5. Step 4: Generate and add a cut

    • Case A: Subproblem is feasible.

      • Obtain the optimal subproblem solution \(\bar{y}_k\) and the corresponding optimal Lagrange multipliers \(\bar{\lambda}_k\).

      • The objective value for this feasible solution is \(f(\bar{x}_k, \bar{y}_k)\).

      • Update the upper bound by \(UB = \min(UB, f(\bar{x}_k, \bar{y}_k))\).

      • Add a generalized optimality cut to the master problem for lower bounding \(\eta\).

    Caution

    • For \(-\infty < \eta < +\infty\) introduced in the master problem for optimality cuts, \(-\infty\) should be replaced with a sufficiently large negative number that remains as small as possible to ensure validity, to avoid numerical issues in practice.

    • In BendersLib, \(-\infty\) is provided as a customizable parameter BendersParams.theta_lb; \(+\infty\) is set to solvers’ default upper bound for unbounded variables.

    • Case B: Subproblem is infeasible.

      • Find a Farkas multiplier (or extreme ray) \(\bar{\mu}_k\) and add a generalized feasibility cut to the master problem.

  6. Step 5: Loop

    • Increment \(k = k + 1\) and go back to Step 1.

Given the set of complicating variables, the generalized Benders Decomposition method has a standard way to formulate the master problem, subproblem, and Benders cuts. Therefore, these procedures can be automated, and users only need to provide the original problem and specify which variables are complicating variables.

See also

References