2 October 2024
Academic research papers can be a valuable source of material for creating and improving real world optimization models. But, as we have commented previously, we wish that academics would publish working code and data to accompany their papers.
In many papers, the mathematical formulations are poorly defined and hard to understand – often they are just plain wrong. But even when the math is correct, the lack of code and data accompanying most academic papers makes them difficult and time-consuming to replicate, which greatly undermines their value.
Some academic institutions understand the importance and value of publishing data and code, for example:
Reproducible science requires not just sharing the paper and the data, but the methods as well. When those methods are partly or entirely captured by computer code, sharing that code is a powerful step towards making your work fully reproducible. Not only does this allow verification and detailed exploration of your findings, but it also facilitates further research; well-written code can be re-used and even built upon, as a scientific product in and of itself.
Publishing code and software, Utrecht University, The Netherlands
But such a sentiment is an exception rather than the norm.
In this article:
- Firstly, we briefly look at some reasons why academics might be reluctant to publish their data and code.
- Then we replicate, modify, and explore a published model that has been done well, with the data and program code publicly available.
12 September 2024
We complete an exploration of methods to solve our device rack positioning problem. The methods are:
- Model 1. Enumerate all possible position orders.
- Model 2. Search randomly for a specified time.
- Model 3. Local search around random starting positions.
- Model 4. Constraint programming using OR-Tools.
- Model 5. Mixed integer linear programming using Pyomo.
Of the models we've looked at so far, the constraint programming method works best. Model 4 quickly finds optimal solutions for all cases except the largest (24 device) case. Even then, its solution for the 24 device case is probably optimal.
In this article, we develop Model 5, which formulates the situation as a Mixed Integer Linear Program (MILP) in Pyomo. Will this method work as well as constraint programming, or perhaps even better?
26 August 2024
We continue our exploration of methods to solve our device rack positioning problem. The methods are:
- Model 1. Enumerate all possible position orders.
- Model 2. Search randomly for a specified time.
- Model 3. Local search around random starting positions.
- Model 4. Constraint programming using OR-Tools.
- Model 5. Mixed integer linear programming using Pyomo.
The previous articles describe the enumeration and parallel random search methods. Enumeration finds optimal solutions for the small data sets, but performs poorly for larger data sets. The random search method finds good solutions for the larger data sets, but those solutions are unlikely to be optimal. Using a more focussed local search method, we improved the solutions even further.
In this article, we develop Model 4 which formulates the situation as a Constraint Programming problem and solves it using the CP-SAT solver in OR-Tools. Perhaps this more sophisticated method will enable us to find optimal solutions for all of our data sets?
13 August 2024
We continue our exploration of methods to solve our device rack positioning problem. The methods are:
- Model 1. Enumerate all possible position orders.
- Model 2. Search randomly for a specified time.
- Model 3. Local search around random starting positions.
- Model 4. Constraint programming using OR-Tools.
- Model 5. Mixed integer linear programming using Pyomo.
The previous article describes Model 2, which uses a parallel random search method to find good solutions. Model 2 works surprisingly well, even for the larger data sets, finding solutions that are up to 50% better than those we found by partial enumeration in Model 1.
In this article, we develop Model 3, which implements a variation of the random search method. We start with a solution (initially random), then swap the positions of a pair of devices. If that solution is better, then it becomes the incumbent solution and we repeat the process, otherwise we keep swapping positions in the initial solution. In effect, this process is a simple heuristic for searching the solution space around a solution. This method, known as a "local search", has the potential to improve the initial solutions. Will local search perform better than a pure random search in this situation?
23 July 2024
We continue our exploration of methods to solve our device rack positioning problem. The methods are:
- Model 1. Enumerate all possible position orders.
- Model 2. Search randomly for a specified time.
- Model 3. Local search around random starting positions.
- Model 4. Constraint programming using OR-Tools.
- Model 5. Mixed integer linear programming using Pyomo.
The previous article describes Model 1, which uses enumeration of all possible permutations. That method works well for a small number of devices, up to about 12. But it fails to find good solutions for larger data sets because it takes too long to run – up to tens of billions of years, for the larger cases.
In this article, we develop Model 2, which tries a parallel random search method. The idea is to randomly generate possible solutions for a specified time, reporting the best solution we find. This method has an advantage over enumeration: it samples from all parts of the solution space, unlike Model's 1 enumeration method which is limited by run time to a specific part of the solution space. But will Model 2 perform better than Model 1?
30 June 2024
Most of the optimization models we work with involve finding the best combination of discrete decision variables. For example, allocating people to teams, designing warehouses, or scheduling staff.
A frequent question is, "Why not just look at all possible combinations and pick the best one?"
The answer is, sometimes, that's exactly what we do. In some situations, enumerating all feasible combinations of the variables is the easiest method for solving a problem.
But, in most situations, the number of combinations is too large. Many orders of magnitude too large. That's when we need more sophisticated methods, like mixed integer linear programming and constraint programming.
In this series of articles, we look at a simple situation that requires deciding the best order for positioning devices in a rack. We use five methods for solving this problem:
- Model 1. Enumerate all possible position orders.
- Model 2. Search randomly for a specified time.
- Model 3. Local search around random starting positions.
- Model 4. Constraint programming using OR-Tools.
- Model 5. Mixed integer linear programming using Pyomo.
Along the way, we see how a problem's size can quickly escalate to a colossal magnitude. We also demonstrate how, contrary to popular belief, that magnitude is not necessarily a barrier to finding a good solution.
20 May 2024
When formulating an optimization model, a common question is "How do I express an IF function as a constraint?". Linear programs can't represent an IF function directly, so we need to use some linearization tricks to achieve the behaviour we want.
In previous articles, we've provided references to a variety of Transformation and linearization techniques, along with MIP formulations and linearizations. We've also described in detail how to formulate Logic conditions as constraints.
In this article, we examine the answers to a question on Operations Research Stack Exchange: Linear condition between two continuous variables. Specifically, the question asked:
There are two real variables \(x\) and \(y\). The conditions are such that:
\(\text{if } y \le 0, \text{then } x = 0\)
\(\text{if } y \gt 0, \text{then } x = y\)
How to write linear equations or inequalities to satisfy both the conditions?
Three answers are provided on Stack Exchange:
- Formulation 1. A special case method that has the advantage of being a pure linear program, though it works correctly only when the model has a specific form of objective function.
- Formulation 2. Uses a BigM approach that would normally work, but the answer has a subtle error.
- Formulation 3. Essentially the same as Formulation 2, except that it is correct.
We illustrate each of the methods both mathematically and graphically, to show how they are intended to mimic the required IF statements.
In addition, we derive a formulation from the more general situation for the constraint \(x = max(y, z)\).
18 April 2024
When doing analysis with an optimization model we commonly need to run many cases. For example, exploring a range of input value scenarios or evaluating a set of alternative designs.
In this article, we explore running optimization model cases in parallel. Specifically, we use the Python multiprocessing and mpi4py libraries to fully use the many CPU cores/threads in modern computers.
Our goals are to:
- Illustrate how to apply the multiprocessing and mpi4py libraries to running optimization model cases in parallel.
- Measure the performance of running cases in parallel compared with serially.
- Compare the performance of an old 4 core / 4 thread CPU with a new 20 core / 28 thread CPU, using the HiGHS solver.
As an example, we convert the serial model from a previous article, Warehouse space for free: Exogenous enumeration, to run multiple cases in parallel.
14 April 2024
Nathan Brixius has an interesting recent blog article about an old mixed integer programming (MIP) problem: Don Knuth’s MIP, 64 years later.
Don Knuth, a legendary computer scientist, wrote a paper in 1960 about optimizing computer performance, Minimizing drum latency time. Knuth formulated a MIP model, with 51 variables and 43 constraints. But he was unable to solve it after more than 10 hours of run time on an IBM 650 Data Processing Machine. One issue was that the machine had only 10 kB of memory. The model was eventually solved, 35 years later in 1995, using CPLEX and a more modern computer.
Knuth describes the model, and its solution, in a note An integer programming problem unsolved for 35 years (in the "Unpublications" section).
Today, Knuth's formulation is a very small model. Demonstrating how much MIP solvers have improved over time, Nathan recreates the model and solves it with SCIP and Gurobi – each taking a fraction of a second to find an optimal solution.
This situation is reminiscent of our article Solver performance: 1989 vs 2024, in which we estimate how much the performance of MIP optimization solvers has improved over the last 35 years. Our analysis was prompted by a 1989 paper about an unsuccessful attempt to use MIP models to compile crossword puzzles. We have somewhat more success in our articles Crossword MILP - Model 1 and Crossword MILP - Model 2.
As Nathan says, the tremendous improvement in MIP solvers is another testament to the amazingly wonderful effectiveness of operations research!
27 March 2024
We conclude our project to improve the efficiency of a pallet warehouse by redesigning the warehouse's racks and shelves.
In part 2 of this article series, we linearized our warehouse racks and shelves design model with the hope of making it easier to solve at the full scale we need. It didn't work. Model 2 is even slower than the non-linear Model 1 we built in part 1.
In this article, we simplify our modelling by taking some of the variables out of the model and making them exogenous inputs. We enumerate all possible combinations of those exogenous inputs, then solve the simplified model for each combination. Hopefully, this time, our model will be solvable at the scale we need.