3 April 2023

In the previous article, we built a staff scheduling optimization model in Excel, using OpenSolver. That model enumerates all possible shift patterns and then decides how many of each shift pattern to use so that we meet the various constraints at least cost.

Our focus in this article is on replicating the Excel model by translating it into Python, using the Pyomo library. That involves formalizing the mathematical formulation, creating appropriate data structures, and representing the formulation in Pyomo syntax. Both models use the CBC solver, so the differences between the models aren't in the solving process, but rather in the build process.

Excel can be a great tool for some types of optimization modelling, either for prototyping or as a production tool. Python offers a wider range of optimization options, and it is usually better for models that have varying data sizes or need to operate at a large scale. The advantages and disadvantages of modelling in Excel and Python are discussed in more detail in our article Optimization in Excel vs Python.

## Download the model

The model is available on GitHub.

## Situation

The modelling situation is identical to that in the previous article. That is, our business opens at 6:00am and closes at 6:00pm. We need to create a shift schedule for our staff to meet the demand, expressed as the minimum number of staff required in each half hour. Demand is low at the start of the working day, rises to a peak at around noon, then declines towards the end of the working day. Our objective is to minimize the total wage cost for the day.

We have three categories of staff. Each category may be assigned to shifts that start at the beginning of any half-hour, provided that:

- Full-time staff work 8 hours per day, plus a 1 hour unpaid break in the middle of their shift. Up to 7 of these staff are available, at a wage of $22.00/hour each.
- Some part-time staff work 6 hours per day, plus a 0.5 hour unpaid break in the middle of their shift. Up to 6 of these staff are available, at a wage of $21.00/hour each.
- Some part-time staff work 4 hours per day, with no unpaid break. Up to 14 of these staff are available, at a wage of $20.00/hour each.

## Model design

### Get the data

Our Python model needs some data. We already have all the data we need in the Excel version of this model, so we'll use that file for data input. A convenient way to read data from Excel is to use named ranges – our Excel file already has all the data ranges named, so we can use them to access the data.

### Define the sets

Excel and Python have very different ways of representing an optimization model. Excel uses a grid structure, which implicitly defines data dimensions. In Pyomo, we need to explicitly define a `Set`

for each dimension.

Examining the Excel workbook, we see that there are three implicit sets used to define the data. That is:

- People. The categories of people who work in our business, defined in terms of the hours they work: 8h, 6h, or 4h per day.
- Shifts. An enumerated list of shifts that our people can work. In this model, there are 36 possible shifts, representing various start times for each of the people categories.
- Slots. We need to meet demand for specific time slots over the working day, beginning at 6:00am through to 5:30pm in half-hour steps.

We will need to define each of these sets in our Pyomo model. Then we'll need to translate the raw data into Pyomo objects, mostly `Param`

parameters, defined over these sets.

### Define the variables

In the Solver/OpenSolver dialog, we can see that there is one set of variables, `vAllocation`

, defined over the Shifts dimension. These variables represent the number of people allocated to each of the enumerated shifts.

### Define the constraints

The extended Excel model has five constraints, as shown in the Solver or OpenSolver dialog. Similarly, we'll need to define five constraints in Pyomo, representing:

- Demand. Ensure we have the required number of staff, in total, in each time slot.
- Availability. Number of staff of each category is within their availability.
- Minimum people. Number of staff of each category is at least the minimum required.
- Shift must use. Must use specific shifts, if required.
- Surplus upper bound. The number of surplus staff in each time slot is no more than the maximum allowed.

In the Excel model, we construct each constraint using a combination for formulae and selections in the Solver/OpenSolver dialog. In Pyomo, we'll need to represent each constraint using Python code.

### Define the objective function

In the Solver/OpenSolver dialog, the objective function refers to a single cell. The value in that cell is the result of a series of calculations that involve many `SUM`

and `SUMIFS`

functions, along with other calculations. We'll need to untangle those calculations and translate them into Pyomo syntax – a process that, given the different perspectives of Excel and Python models, is often not straightforward.

## Model formulation

A key in translating the model is to formalize the mathematical formulation. To some extent we can ignore the maths when building a model in Excel. But having a formal formulation is very useful when writing the Pyomo version, as the Python code is an almost literal translation of the formulation.

The formulation for our model is shown in Figure 1.

## Model implementation

### Import dependencies

The first task is to import the libraries that are needed for our program, as shown in Figure 2. The dependencies include the `openpyxl`

library, which we'll use to read data from our Excel workbook.

### Globals

We have a few global objects that will make things easier, as shown in Figure 3. This includes `Model`

, which is the primary object used for building the model.

As an aside, we acknowledge that programming purists will frown upon the use of global values. However, these values are the inputs that the user is most likely to change, so it is convenient to have them collated near the top of the program and available throughout the program.

### Excel workbook generic loader

Since we'll get our data from named ranges in an Excel workbook, it is helpful to have a generic function to perform this task, as shown in Figure 4. We just supply this function an Excel workbook file name, a worksheet name, and a range name – it returns a Pandas dataframe containing that data.

### Get data

Figure 5 shows the process for getting each data item, given the range name where it is located in the workbook. Each of these dataframes is just a temporary placeholder – we'll put them into the `Model`

object in the next section.

### Declarations

The data declarations are shown in Figure 6. This is where we translate the Excel data into the `Model`

object.

We divide the declarations into three sections:

- Single parameter values. This includes things like the maximum surplus assumption (a single number) and the solver name (a string).
- Sets. The sets correspond to the indices in the formulation. That is, People, Shifts, and Slots. Note that we're using Pyomo's
`Set`

, which is 1-based, rather than Python's 0-based`set`

. - Parameters. In populating the set-based parameters, our general approach is to declare a parameter as mutable, and then separately populate it with data. Note that we need to adjust some of the indices to account for the difference between 0-based dataframes and 1-based Pyomo Sets.

### Check the data

It can be useful to check that we've correctly translated the Excel data into the `Model`

object. Checking that the data is as expected can save a lot of frustration when building the rest of the model, so we do that in the function shown in Figure 7. This function has a Boolean parameter that determines whether it outputs anything – effectively allowing us to skip the check.

### Define the model

The model definition is shown in Figure 8.

There are five constraint definitions, corresponding to the five constraints in the formulation:

`Model.PeopleRequired`

. Ensure we have the required number of staff, in total, in each time slot.`Model.PeopleAvailable`

. Number of staff of each category is within their availability.`Model.PeopleLB`

. Number of staff of each category is at least the minimum required.`Model.ShiftMustUse`

. Must use specific shifts, if required.`Model.SlotSurplus`

. The number of surplus staff in each time slot is no more than the maximum allowed. In Equation (6) of the formulation we moved the`Model.SlotRequired`

term to the right-hand-side for presentation reasons.

We provide two alternative objective function definitions. `Model.TotalCostAlt`

is a direct translation of what the spreadsheet formulae do. `Model.TotalCost`

is entirely equivalent, but we think it is easier to understand.

Note that the objective function and two of the constraints include the expression `if pyo.value(Model.ShiftLabel[s]) == pyo.value(Model.PeopleLabel[p])`

within the `sum`

. This test is used in the formulation to represent summing by the people categories 8h, 6h, and 4h – which we did in the spreadsheet using separate rows.

### Solve model

We're now ready to solve the model, as shown in Figure 9. We've allowed for a choice between different solvers, each of which specifies the time limit differently. In addition, when calling this function, we can specify whether we want the solver to produce verbose output as it works on the solution.

Note that, as of March 2023, the Pyomo interface to the HiGHS solver has a bug where mutable parameters with zero values produce an error. Therefore, the HiGHS solver doesn't currently work for this model – though hopefully it will soon.

### Process results

After the solve, we process the results, as shown in Figure 10. Note that in the solve process above, we defer loading the results in case there are no results to load (because the solver didn't return a result, perhaps due to an error).

### Write output

Having processed the results to see what we've able to output, now we do the actual output, as shown in Figure 11.

The outputs are simple, consisting of some summary information about the model, the objective function value, and key results that are of interest – like the results shown in the Excel version of this model.

### Main procedure

Finally, we bring everything together by calling each of the functions, starting with getting the data, and finishing with writing the output, as shown in Figure 12. Because the model is encapsulated within the global `Model`

object, we don't need to pass it to the functions.

## Comparison between Excel and Python models

### Optimal solution

The optimal solutions found by the Excel OpenSolver and Python Pyomo models are identical. Since this model has alternative optima, in general there's no guarantee that the two versions will find the same solution – even though both models use the CBC solver. That is, subtleties in how the model is presented to the solver may lead to it taking a different path during the solve process, resulting in a different alternative optima.

### Model build process

The Python model has about 270 lines of code, though some of those lines are not strictly necessary (like `def CheckData`

). Even so, the code appears complex – which it is, because we must manage every detail of the modelling process.

Conversely, the Excel model appears much simpler. That's largely because the spreadsheet takes care of some modelling aspects for us. But that appearance is somewhat deceptive. The Excel workbook contains more than 1,000 formulae (mostly in the Intermediate section, where we calculate the shift patterns multiplied by the usage variables). There are also quite complex relationships between the formulae, most of which is hidden from view behind the nice, neat grid interface.

Overall, the Excel model version was easier to build – mainly because it involves less low-level data manipulation for building the model and writing outputs. That's a key reason why Excel can be a useful tool for building prototype optimization models. An Excel model might even be sufficient for operational use. Though the flexibility of Python is generally better for models that require different data inputs. That's especially the case for models where the data varies in size, as that can be difficult to handle within the rigid grid structure of a spreadsheet.

## Conclusion

In this article we replicated in Python Pyomo the model previously built in Excel OpenSolver. The two models produce the same result, but they are very different in terms of build process. Each model version has its advantages and disadvantages, as is typical when looking at the choice of tools for a modelling project.

If you would like to know more about this model, or you want help with your own models, then please contact us.