Ryan Papale

Baseball/Lineup Optimization

Suppose we have a 40-man roster and want to choose a starting lineup. Naturally, we want the players who will maximize our chances of winning. One way to approach this is as an optimization problem. Suppose that we have some measure of each player's projected contribution, such as wins above average (WAA). First, define a player's overall contribution as the sum of a defensive and an offensive component: \[WAA_{iph} = dWAA_{ip} + oWAA_{ih},\] where \(dWAA_{ip}\) is player \(i\)'s defensive contribution at position \(p\) and \(oWAA_{ih}\) is their offensive contribution against pitchers with handedness \(h\). \(WAA_{iph}\) is therefore player \(i\)'s overall projected contribution when playing position \(p\) against a pitcher of handedness \(h\).

To set the optimal lineup, assume for simplicity that we are facing a RHP and drop the \(h\) subscript. We wish to solve the following optimization problem: \[\max_{x_{ip}}\sum_{i}\sum_{p}WAA_{ip}x_{ip}\] \[\text{s.t.} \ \sum_{i}x_{ip} = 1 \ \forall p,\] \[\ \ \ \ \ \ \ \sum_{p}x_{ip} \leq 1 \ \forall i,\] \[\ \ \ \ \ \ \ \sum_{i}\sum_{p}x_{ip} = 9,\] \[\ \ \ \ \ \ \ x_{ip} \in \{0, 1\} \ \forall i, p,\] where \(i \in \{1, \ldots, N\}\) indexes the \(N\) position players on the 40-man roster and \(p \in \{1, \ldots,9\}\) indexes the position (including DH). The choice variable \(x_{ip}\) is binary: \[x_{ip} = \begin{cases} 1 & \text{if player } i \text{ is assigned to position } p \\ 0 &\text{otherwise.} \end{cases} \] In words, we want to set the 9-man lineup against RHPs to maximize total projected wins above average subject to the roster constraints: each position is filled by exactly one player, no player appears more than once, and nine players are selected in total (the last constraint is implied by the first, since there are nine positions).
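To make the formulation concrete, here is a minimal sketch that solves a tiny version of the problem by brute force. It is not the method used in this post. The player names, positions, and WAA values are invented for illustration, and `brute_force_lineup` is a hypothetical helper. Each ordered selection of distinct players corresponds to one feasible \(x\): every position gets exactly one player and no player is used twice.

```python
from itertools import permutations

# Hypothetical projected WAA: rows are players, columns are positions.
players = ["P1", "P2", "P3", "P4"]
positions = ["C", "1B", "DH"]
waa = [
    [2.0, 1.8, 1.5],  # P1
    [1.9, 1.0, 1.2],  # P2
    [0.5, 1.4, 1.3],  # P3
    [0.2, 0.3, 0.9],  # P4
]

def brute_force_lineup(waa):
    """Enumerate every feasible assignment and return (total, lineup).

    lineup[p] is the index of the player assigned to position p. Using
    permutations guarantees one player per position and no repeats.
    """
    n_players, n_positions = len(waa), len(waa[0])
    best_total, best_lineup = float("-inf"), None
    for lineup in permutations(range(n_players), n_positions):
        total = sum(waa[i][p] for p, i in enumerate(lineup))
        if total > best_total:
            best_total, best_lineup = total, lineup
    return best_total, best_lineup

best_total, best_lineup = brute_force_lineup(waa)
for p, i in enumerate(best_lineup):
    print(positions[p], players[i])
print(round(best_total, 2))
```

In this toy example the optimum puts P2 at C, P1 at 1B, and P3 at DH for a total of 5.0. Filling positions greedily in order (P1 at C first) yields only 4.6, which is why the problem is solved jointly. Enumeration is only practical for tiny cases; with a full roster and nine positions the number of candidate lineups is far too large.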

To solve for the optimal lineup, i.e., the lineup that maximizes total projected WAA for a given pitcher handedness, I use the Kuhn-Munkres algorithm (also known as the Hungarian algorithm). In brief, the algorithm takes a cost matrix whose rows are workers and whose columns are the jobs they can be assigned to; each element is the cost of having that worker perform that job. The algorithm returns the one-to-one worker/job assignment that minimizes total cost.
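For readers who want to see the mechanics, below is a compact sketch of one common \(O(n^2 m)\) formulation of the Kuhn-Munkres algorithm based on row and column potentials. It is an illustration only and not necessarily the implementation behind the results in this post; the function name `hungarian` and the cost matrix are assumptions. This variant requires at least as many columns as rows, so for a lineup the positions would be the rows and the players the columns.

```python
def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    cost is an n x m list of lists with n <= m. Returns (total, assignment)
    where assignment[i] is the column chosen for row i.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "this variant needs at least as many columns as rows"
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 if unmatched
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so one more reduced cost becomes zero.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached an unmatched column
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment

# Made-up costs: rows are workers, columns are jobs.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
total, assignment = hungarian(cost)
print(total, assignment)
```

Here worker 0 takes job 1, worker 1 takes job 0, and worker 2 takes job 2, for a minimum total cost of 5.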

Because the algorithm minimizes cost, maximizing total projected WAA requires a simple reformulation, for example negating the WAA matrix and minimizing. There are several benefits to using this algorithm. First, it naturally imposes the constraints from the maximization problem above. Second, it is fast (it runs in polynomial time) and easy to use. Third, the solution is guaranteed to be a global optimum of the assignment problem, although it may not be unique.
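The reformulation mentioned above can be written out explicitly. This is one standard way to do it, offered as a sketch rather than a description of the exact code used here; the cost symbols \(c_{ip}\) and \(\tilde{c}_{ip}\) and the constant \(M\) are introduced only for illustration. Define costs \(c_{ip} = -WAA_{ip}\). Over the same feasible set of assignments, \[\max_{x}\sum_{i}\sum_{p}WAA_{ip}x_{ip} = -\min_{x}\sum_{i}\sum_{p}c_{ip}x_{ip},\] so the cost-minimizing assignment is also the WAA-maximizing lineup. If nonnegative costs are preferred, take \(\tilde{c}_{ip} = M - WAA_{ip}\) with \(M = \max_{i,p}WAA_{ip}\). Because exactly nine entries of \(x\) equal one, \[\sum_{i}\sum_{p}\tilde{c}_{ip}x_{ip} = 9M - \sum_{i}\sum_{p}WAA_{ip}x_{ip},\] which differs from the negated objective only by the constant \(9M\) and therefore has the same minimizer.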

Below I present an example using the San Diego Padres. These numbers are not true projections; they are prorated statistics from the 2025 season intended for demonstration purposes only. The table below shows the first rows of the data frame used for the assignment algorithm. The first three columns are self-explanatory. The fourth column contains each player's defensive projection. The fifth and sixth columns contain each player's offensive projections against RHPs and LHPs, respectively.

head(padres, 10)

Using the data above, I then solve for the optimal lineup using the Kuhn-Munkres algorithm. In the table below I show the optimal lineup when facing LHPs.

opt(padres, "l")

A relatively straightforward extension would be to incorporate uncertainty penalties into the objective function, allowing for risk-adjusted lineup selection. Additionally, positional adjustments have been omitted because they do not affect the optimal assignment: every position is filled exactly once, so a position-specific constant adds the same amount to every feasible lineup. Again, the data used here are illustrative rather than a genuine projection model.
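As a rough illustration of the risk-adjusted extension, the sketch below assumes one possible penalty form, replacing each projection with its mean minus \(\lambda\) times a per-player standard deviation. The penalty form, the parameter `lam`, the helper names, and all numbers are assumptions made for this example, not part of the project. A tiny brute-force search stands in for the assignment solver to keep the example short.

```python
from itertools import permutations

def risk_adjusted(mean, sd, lam):
    """Penalize each player's projections by lam times their uncertainty."""
    return [[m - lam * s for m in row] for row, s in zip(mean, sd)]

def best_lineup(score):
    """Return the lineup (player index per position) with the highest total."""
    n_players, n_positions = len(score), len(score[0])
    return max(
        permutations(range(n_players), n_positions),
        key=lambda lineup: sum(score[i][p] for p, i in enumerate(lineup)),
    )

# Hypothetical mean projections (3 players x 2 positions) and uncertainty.
mean = [[1.5, 1.0],
        [1.4, 0.9],
        [1.2, 1.1]]
sd = [0.9, 0.2, 0.1]  # player 0 has the best mean but is the least certain

risk_neutral = best_lineup(risk_adjusted(mean, sd, lam=0.0))
risk_averse = best_lineup(risk_adjusted(mean, sd, lam=1.0))
print(risk_neutral, risk_averse)
```

With no penalty the high-mean, high-variance player 0 starts at the first position; with `lam=1.0` the more certain player 1 replaces them. The penalized matrix could be passed to the same assignment algorithm unchanged, since only the objective coefficients differ.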

The code for this project can be found here.