Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/daniel-stoneuk/load-balancing
A Software Package for Simulating and Visualising Load Balancing Algorithms
- Host: GitHub
- URL: https://github.com/daniel-stoneuk/load-balancing
- Owner: daniel-stoneuk
- License: mit
- Created: 2022-07-02T22:33:26.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-12-12T17:49:45.000Z (about 2 years ago)
- Last Synced: 2023-03-07T13:56:07.701Z (almost 2 years ago)
- Language: Jupyter Notebook
- Size: 1.49 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# A Software Package for Simulating and Visualising Load Balancing Algorithms
**This repository contains the source code for my Computer Science BSc Individual Project at Durham (graded 80/100)**
## Paper Abstract
A simple abstraction of load balancing systems is the balls into bins probability model, which can be modified to represent more load balancing scenarios. Performance analysis of algorithms used to allocate balls into bins is often complex and not intuitive. This project aims to create a software package to aid in collecting and analysing experimental data for various balls into bins algorithms. The package provides an easy-to-use API that can run simulations faster than equivalent plain Python code whilst reducing the implementation complexity required to run and parallelise experiments. Simulation results contain several metrics that are calculated after the experiments are run. These can be visualised in the provided Jupyter Notebook to compare and evaluate different algorithms. The software was able to replicate results seen in literature with minimal effort in order to deliver an experimentation focused survey of commonly cited balls into bins algorithms.
## Project Paper
The project paper which describes the scientific process can be found [here](report.pdf).
## Usage Notebook
Get started by running `pip install .`. Libraries used in the notebook need to be installed separately.
This Notebook provides a) example usage of the software package and b) the required code to recreate the paper's results. To begin using the software package, the `core` and `simulation` packages need to be imported.
In these examples, the API's classes from the `simulation` package are imported into the current namespace. The `core` package has been imported under the alias `lb` (for load balancer).
```python
import load_balancing.core as lb
from load_balancing.simulation import Simulation, ParallelSimulation, SimulationResult

# We can also import the suggested matplotlib style parameters
from load_balancing import mpl_style
from load_balancing.utils import available_cores
```

The following imports are used by the notebook to run and visualise the experiments.
* `matplotlib` is used to plot charts
* `numpy` is used for math functions and to generate distributions
* `tikzplotlib` is used to output plots to a .TeX file

```python
import matplotlib.colors
import matplotlib.pyplot as plt
import matplotlib.style

# Initialise matplotlib for Jupyter Notebook and set style parameters
%matplotlib inline
matplotlib.style.use(mpl_style)  # type: ignore

import colorsys
from dataclasses import asdict
from functools import partial

# Used for a custom multiprocessing implementation
from multiprocessing import Pool
from operator import itemgetter
from typing import Callable, List, Tuple

import numpy as np
import tikzplotlib
from IPython.display import Markdown, display
from tabulate import tabulate
```

A few helper functions are defined for use in the notebook.
```python
def fig_to_tex(fig, filename: str):
    """Save a Matplotlib Figure to a .TeX file

    :param fig: Figure that should be saved
    :type fig: plt.Figure
    :param filename: Name of file to save to
    :type filename: str
    """
    tikzplotlib.clean_figure(fig)
    tikzplotlib.save(filename, figure=fig)


def plot_attributes(
    results: List[SimulationResult], x: str = "m", y: str = "maximum_load"
) -> Tuple[Tuple, Tuple]:
    """Convert a list of SimulationResult to two lists for the given attributes.

    :param results: List of simulation results
    :type results: List[SimulationResult]
    :param x: Attribute to be returned as xs, defaults to "m"
    :type x: str, optional
    :param y: Attribute to be returned as ys, defaults to "maximum_load"
    :type y: str, optional
    :return: Tuple of xs, ys
    :rtype: Tuple[Tuple, Tuple]
    """
    xs, ys = zip(*[(getattr(result, x), getattr(result, y)) for result in results])
    return xs, ys


def plot_gap(results: List[SimulationResult], x: str = "m") -> Tuple[Tuple, Tuple]:
    """Convert a list of SimulationResult to two lists containing the gap between maximum and minimum load.
    Similar to the plot_attributes function.

    :param results: List of simulation results
    :type results: List[SimulationResult]
    :param x: Attribute to be returned as xs, defaults to "m"
    :type x: str, optional
    :return: Tuple of xs, ys
    :rtype: Tuple[Tuple, Tuple]
    """
    xs, ys = zip(
        *[
            (getattr(result, x), result.maximum_load - result.minimum_load)
            for result in results
        ]
    )
    return xs, ys


def plot_function(min: float, max: float, f: Callable, num=1000) -> Tuple[List, List]:
    """Helper function to plot a function, useful to visualise bounds

    :param min: Value to plot from
    :type min: float
    :param max: Value to plot to
    :type max: float
    :param f: Function such that y = f(x)
    :type f: Callable
    :param num: Number of points, defaults to 1000
    :type num: int
    :return: Tuple of xs, ys
    :rtype: Tuple[List, List]
    """
    xs = list(np.linspace(min, max, num))
    ys = []
    for x in xs:
        ys.append(f(x))
    return xs, ys


def lighter(hex: str) -> Tuple[float, float, float]:
    """Return a lighter version of hex

    :param hex: Colour hex code
    :type hex: str
    :return: Lighter colour tuple
    :rtype: Tuple[float, float, float]
    """
    h, l, s = colorsys.rgb_to_hls(*matplotlib.colors.ColorConverter.to_rgb(hex))
    return colorsys.hls_to_rgb(h, min(1, l * 1.6), s=s)


def display_results_as_table(results: List[SimulationResult], head=False):
    """Display a list of SimulationResult as a table

    :param results: List of results
    :type results: List[SimulationResult]
    """
    if head:
        results = results[0:5]
        display(Markdown("Top 5 rows:"))
    display(
        tabulate(
            [asdict(result) for result in results], headers="keys", tablefmt="html"
        )
    )
```

### Describing the simple Greedy[d] algorithm
To describe the `greedy[d]` algorithm, write the algorithm as a function using the `load_balancing.core` API.
Here we choose $d$ bins from the list of bins and return the least loaded of the $d$ bins.
```python
def my_greedy_d(ball: lb.Ball, bins: lb.Bins, d: int):
    chosen_bins = bins.choose(d)
    return chosen_bins.min()
```

### Run a simulation
To run this as a simulation, pass the algorithm to the `Simulation` class and call it like a function. Make sure the number of balls $m$ and bins $n$ are provided. The `greedy[d]` parameter, $d$, is added to the end of the function arguments.
In this experiment we will simulate 100,000 balls and bins.
```python
my_greedy_d_sim = Simulation(my_greedy_d)

experiment_result = my_greedy_d_sim(100000, 100000, 2)
display_results_as_table([experiment_result])
```

| m | n | maximum_load | minimum_load | std_load | load_accesses |
|---|---|---|---|---|---|
| 100000 | 100000 | 4 | 0 | 0.703264 | 200000 |
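As a rough cross-check, the same process can be sketched in plain Python without the package. This is only an illustration: the function name `simulate_greedy_d` is hypothetical, and sampling the $d$ bins without replacement is an assumption, since the behaviour of `bins.choose` is not documented here.

```python
import random
from typing import List


def simulate_greedy_d(m: int, n: int, d: int, seed: int = 0) -> List[int]:
    """Allocate m balls into n bins with a plain-Python greedy[d].

    Assumption: each ball samples d distinct bins uniformly at random
    and joins the least loaded of them.
    """
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        candidates = rng.sample(range(n), d)
        # Pick the candidate bin with the smallest current load.
        target = min(candidates, key=loads.__getitem__)
        loads[target] += 1
    return loads


loads = simulate_greedy_d(m=10_000, n=10_000, d=2)
print("maximum load:", max(loads))
print("minimum load:", min(loads))
```

A loop like this is the "equivalent plain Python code" the package is meant to replace, so it is useful mainly for checking small cases by hand.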
### A more complicated algorithm
The `greedy[d]` algorithm is rather simple to describe and doesn't show the power of the simulation system. In the next cell we describe the `always-go-left` algorithm and run the same simulation.
```python
def my_always_go_left(ball: lb.Ball, bins: lb.Bins):
    left_bins, right_bins = bins.split(2)
    left_bin = left_bins.choose_one()
    right_bin = right_bins.choose_one()
    if left_bin.load <= right_bin.load:  # this is where the unfair tie-breaking is
        return left_bin
    else:
        return right_bin


my_always_go_left_sim = Simulation(my_always_go_left)
experiment_result = my_always_go_left_sim(100000, 100000)
display_results_as_table([experiment_result])
```

| m | n | maximum_load | minimum_load | std_load | load_accesses |
|---|---|---|---|---|---|
| 100000 | 100000 | 3 | 0 | 0.684485 | 200000 |
We can already see an improvement in the maximum load of any bin after this experiment.
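The asymmetric tie-break can also be written out in plain Python to make the mechanics explicit. This is a minimal sketch, not the package's implementation: `simulate_always_go_left` is a hypothetical name, and splitting the bins at index `n // 2` is an assumption about what `bins.split(2)` does.

```python
import random
from typing import List


def simulate_always_go_left(m: int, n: int, seed: int = 0) -> List[int]:
    """Plain-Python always-go-left with two groups of bins.

    Assumption: the left group is bins [0, n // 2) and the right group
    is bins [n // 2, n).
    """
    rng = random.Random(seed)
    half = n // 2
    loads = [0] * n
    for _ in range(m):
        left = rng.randrange(0, half)
        right = rng.randrange(half, n)
        # Ties are always broken in favour of the left bin.
        target = left if loads[left] <= loads[right] else right
        loads[target] += 1
    return loads


agl_loads = simulate_always_go_left(m=10_000, n=10_000)
print("maximum load:", max(agl_loads))
```

Replacing `<=` with a fair coin flip on ties gives the symmetric variant, which is the comparison the tie-breaking experiment later in the notebook is built around.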
## Paper Results
In this section we will use the simulation to recreate the results from the paper. An associated `algorithms.py` file will be used for the implementations. This is because the system requires functions to be defined in a Python file for parallelisation.
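One likely reason for this requirement, assuming `ParallelSimulation` hands work to processes from Python's `multiprocessing` module, is that functions are sent to workers by reference: only the module name and function name are pickled, and the worker re-imports them. The sketch below illustrates that behaviour with the standard library only; it does not use the package.

```python
import json
import pickle

# A function that lives in an importable module is pickled by reference
# (module name plus qualified name), so another process can re-import it.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)
same_function = restored is json.dumps
print("module-level function round-trips:", same_function)

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda ball, bins: None)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
print("lambda can be pickled:", lambda_picklable)
```

A function defined in a notebook cell belongs to the interactive `__main__` module, which a freshly started worker cannot import, so keeping the algorithms in `algorithms.py` avoids the problem.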
```python
from algorithms import (
    uniform,
    greedy_d,
    always_go_left,
    fair_tiebreak,
    one_plus_beta,
    threshold,
    adaptive,
)
```

The following flag can be set to True to run the larger experiments that were used in the paper. Otherwise, smaller experiments will be run to save time.

```python
USE_LARGE_NUMBERS = True
```

#### 4.1 Uniform Load Balancing
We will use the `ParallelSimulation` class to take advantage of the multicore processor. In this case it will be used with only one algorithm.
The warning is shown when the class is used in a notebook (like this one) and can be safely ignored since we are loading algorithms from a file.
```python
def uniform(ball: lb.Ball, bins: lb.Bins):
    """Select a bin at random"""
    return bins.choose_one()
```

```python
uniform_sim = ParallelSimulation((uniform, []))

low = 100
high = 70000100 if USE_LARGE_NUMBERS else 7000

experiment_results = uniform_sim.run(low, high, num=40, repeats=5)
uniform_result = experiment_results["uniform"]

display_results_as_table(uniform_result, head=True)
```

    Warning: Parallel Simulations may hang if functions are defined in interactive mode.
    Using 6 processes to run 40 jobs

      0%|          | 0/40 [00:00, ?it/s]

Top 5 rows:

| m | n | maximum_load | minimum_load | std_load | load_accesses |
|---|---|---|---|---|---|
| 100 | 70000100 | 1 | 0 | 0.00119523 | 0 |
| 141 | 70000100 | 1 | 0 | 0.00141925 | 0 |
| 199 | 70000100 | 1 | 0 | 0.00168607 | 0 |
| 281 | 70000100 | 1 | 0 | 0.00200356 | 0 |
| 397 | 70000100 | 1 | 0 | 0.00238147 | 0 |

The following code shows how this data can be plotted on a chart using the library.
We will use the `plot_function` helper to plot the bounds $\frac{\log n}{\log \frac{n \log n}{m}} \left( 1 + \alpha \frac{\log \log \frac{n \log n}{m}}{\log \frac{n \log n}{m}} \right)$ for different values of $\alpha$.
```python
fig, ax = plt.subplots()
ax.set_xlabel("$m$")
ax.set_ylabel("Average Maximum Load")

# Plot the results
ax.plot(*plot_attributes(uniform_result), label="Uniform")

# Plot the bounds.
n = high


def uniform_bounds(a):
    logn = np.log(n)

    def func(m):
        lognlognoverm = np.log((n * logn) / m)
        return (logn / lognlognoverm) * (
            1 + a * (np.log(lognlognoverm) / lognlognoverm)
        )

    return func


ax.plot(
    *plot_function(low, high, uniform_bounds(2)), label="$\\alpha = 2$", linestyle="--"
)
ax.plot(
    *plot_function(low, high, uniform_bounds(0.9)),
    label="$\\alpha = 0.9$",
    linestyle="--"
)

# Enable the legend
ax.legend()

# Save to .TeX file
fig_to_tex(fig, "charts/uniform.tex")
```
![png](examples_files/examples_19_0.png)
#### 4.2 Always-Go-Left
This experiment will produce a bar chart that will be used to verify the results of the Always-Go-Left paper. The results aim to show how asymmetry in the tiebreaking algorithm decreases the average maximum load.
```python
def always_go_left(ball: lb.Ball, bins: lb.Bins):
    """Divide the bins into two halves, and choose a random bin from each. Select the lowest loaded. Tiebreaks: Choose left."""
    left_bins, right_bins = bins.split(2)
    left_bin = left_bins.choose_one()
    right_bin = right_bins.choose_one()
    if left_bin.load <= right_bin.load:
        return left_bin
    else:
        return right_bin


def fair_tiebreak(ball: lb.Ball, bins: lb.Bins):
    """Divide the bins into two halves, and choose a random bin from each. Select the lowest loaded. Tiebreaks: Fair."""
    left_bins, right_bins = bins.split(2)
    left_bin = left_bins.choose_one()
    right_bin = right_bins.choose_one()
    if left_bin.load < right_bin.load:
        return left_bin
    elif left_bin.load > right_bin.load:
        return right_bin
    else:
        if lb.probability(0.5):
            return left_bin
        else:
            return right_bin
```

```python
parallel_simulation = ParallelSimulation(
    (uniform, []), (greedy_d, [2]), (always_go_left, []), (fair_tiebreak, [])
)

# 2**20 if USE_LARGE_NUMBERS else 2**12
m = n = 1048576 if USE_LARGE_NUMBERS else 4096

experiment_results = parallel_simulation(m, n, repeats=100)
```

    Warning: Parallel Simulations may hang if functions are defined in interactive mode.
    Using 4 processes to run 4 jobs

      0%|          | 0/4 [00:00, ?it/s]

```python
fig, ax = plt.subplots()
ax.set_xlabel("Algorithm")
ax.set_ylabel("Average Maximum load")


def get_maximum_loads(results):
    return [algorithm_results[0].maximum_load for algorithm_results in results.values()]


# Plot the results
labels = ["Uniform", "Greedy[2]", "Always-Go-Left", "Fair Tiebreak"]

x = np.arange(len(labels))
width = 0.35  # Width of the bars

paper_results = [8.86, 4, 3.04, 0] if USE_LARGE_NUMBERS else [6.16, 3.01, 3, 0]
ours = ax.bar(
    x - width / 2, get_maximum_loads(experiment_results), width, label="Our results"
)
paper = ax.bar(x + width / 2, paper_results, width, label="Paper")

# Set the labels
ax.set_xticks(x, labels)

# Enable values on top of the bars
ax.bar_label(ours, [bar.get_height() or "N/A" for bar in ours], padding=3)
ax.bar_label(paper, [bar.get_height() or "N/A" for bar in paper], padding=3)

# Enable the legend
ax.legend()

# Save to .TeX file
fig_to_tex(fig, "charts/always_go_left.tex")
```

    /Users/danielstone/.pyenv/versions/3.10.2/lib/python3.10/site-packages/tikzplotlib/_cleanfigure.py:200: UserWarning: Cleaning Bar Container (bar plot) is not supported yet.
      warnings.warn("Cleaning Bar Container (bar plot) is not supported yet.")

![png](examples_files/examples_22_1.png)
#### 4.3 Distributed Algorithms
In this section, the `greedy[2]` and `uniform` algorithms will be evaluated in a distributed scenario. The corresponding paper states that for the `greedy[2]` algorithm, the expected largest gap between maximum and minimum load of any bin is $\mathcal{O}(\log n)$. For the `uniform` algorithm it is proven to be $\Theta (\sqrt{\frac{m \log n}{n}})$.
Constants will be found so that the experimental data is tightly bounded.
This experiment runs for varying values of n.
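To get a feel for how differently the two bounds scale, they can be evaluated directly. The sketch below uses the constants 1.3 and 2.55 that appear in the plotting cells of this section; the function names are hypothetical and the natural logarithm is assumed, matching `np.log` in the notebook code.

```python
import math


def greedy_gap_bound(n: int, c: float = 1.3) -> float:
    """c * log(n): bound on the gap for greedy[2], independent of m."""
    return c * math.log(n)


def uniform_gap_bound(m: int, n: int, c: float = 2.55) -> float:
    """c * sqrt(m * log(n) / n): bound on the gap for uniform allocation."""
    return c * math.sqrt(m * math.log(n) / n)


m = 1_000_000
for n in (5000, 10000, 15000, 20000):
    print(
        f"n = {n:>5}: greedy[2] bound = {greedy_gap_bound(n):6.2f}, "
        f"uniform bound = {uniform_gap_bound(m, n):7.2f}"
    )
```

The `greedy[2]` bound does not depend on $m$, so it plots as a horizontal line, while the `uniform` bound grows with $\sqrt{m}$.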
```python
experiment_results = {}

high = 1000000
repeats = 20 if USE_LARGE_NUMBERS else 5

for n in [5000, 10000, 15000, 20000]:
    display(Markdown(f"**Running for n = {n}**"))
    parallel_simulation = ParallelSimulation((uniform, []), (greedy_d, [2]))
    experiment_results[n] = parallel_simulation.run(
        n + 1, high, num=40, repeats=repeats, bins=n, balancers=n
    )
```

**Running for n = 5000**

    Warning: Parallel Simulations may hang if functions are defined in interactive mode.
    Using 6 processes to run 80 jobs

      0%|          | 0/80 [00:00, ?it/s]

**Running for n = 10000**

    Warning: Parallel Simulations may hang if functions are defined in interactive mode.
    Using 6 processes to run 80 jobs

      0%|          | 0/80 [00:00, ?it/s]

**Running for n = 15000**

    Warning: Parallel Simulations may hang if functions are defined in interactive mode.
    Using 6 processes to run 80 jobs

      0%|          | 0/80 [00:00, ?it/s]

**Running for n = 20000**

    Warning: Parallel Simulations may hang if functions are defined in interactive mode.
    Using 6 processes to run 80 jobs

      0%|          | 0/80 [00:00, ?it/s]

```python
fig, ax = plt.subplots()
ax.set_xlabel("$m$")
ax.set_ylabel("Average Load Gap")


# Plot the bounds.
def distributed_greedy_d_bounds(n):
    logn = np.log(n)

    def func(m):
        return 1.3 * logn

    return func


for n, results in experiment_results.items():
    # Plot the bound in the default colour
    greedy_line = ax.plot(
        *plot_function(n, high, distributed_greedy_d_bounds(n)),
        linestyle="--",
        zorder=2,
    )[0]
    # Plot the data in a lighter colour
    ax.plot(
        *plot_gap(results["greedy_d_2"]),
        label=f"$n = {n}$",
        color=lighter(greedy_line.get_color()),
        zorder=1,
    )

# Enable the legend
ax.legend()

# Save to .TeX file
fig_to_tex(fig, "charts/distributed_greedy.tex")
```
![png](examples_files/examples_25_0.png)
```python
fig, ax = plt.subplots()
ax.set_xlabel("$m$")
ax.set_ylabel("Average Load Gap")


# Plot the bounds.
def distributed_uniform_bounds(n):
    logn = np.log(n)

    def func(m):
        return 2.55 * np.sqrt(m * logn / n)

    return func


for n, results in experiment_results.items():
    # Plot the bound in the default colour
    uniform_line = ax.plot(
        *plot_function(n, high, distributed_uniform_bounds(n)), linestyle="--", zorder=2
    )[0]
    # Plot the data in a lighter colour
    ax.plot(
        *plot_gap(results["uniform"]),
        label=f"$n = {n}$",
        color=lighter(uniform_line.get_color()),
        zorder=1,
    )

# Enable the legend
ax.legend()

# Save to .TeX file
fig_to_tex(fig, "charts/distributed_uniform.tex")
```
![png](examples_files/examples_26_0.png)
#### 4.4 Weighted Balls
This experiment on `greedy[2]` analyses the average maximum load for a sorted weight distribution as it becomes less sorted.
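The `sorting_steps` module used below is not reproduced in this README. Purely as an illustration of the idea, the following sketch makes up its own notion of a mixing step (one random swap of two positions) and its own weight distribution (the integers $n, n-1, \dots, 1$); both are assumptions and may differ from what the project actually does, and the function names are hypothetical.

```python
import random
from typing import List


def mixed_weights(n: int, mixing_steps: int, rng: random.Random) -> List[float]:
    """Start from weights sorted in decreasing order, then apply random swaps.

    Assumption: one mixing step swaps two uniformly chosen positions.
    """
    weights = [float(n - i) for i in range(n)]
    for _ in range(mixing_steps):
        i, j = rng.randrange(n), rng.randrange(n)
        weights[i], weights[j] = weights[j], weights[i]
    return weights


def weighted_greedy_2(weights: List[float], n: int, rng: random.Random) -> float:
    """Place each weighted ball into the lighter of two random bins.

    Returns the maximum (weighted) load over all bins.
    """
    loads = [0.0] * n
    for weight in weights:
        a, b = rng.randrange(n), rng.randrange(n)
        target = a if loads[a] <= loads[b] else b
        loads[target] += weight
    return max(loads)


rng = random.Random(0)
for mixing_steps in (0, 10, 100, 1000):
    weights = mixed_weights(100, mixing_steps, rng)
    print(mixing_steps, weighted_greedy_2(weights, 100, rng))
```

A single run like this is noisy, which is why the experiment below averages over many repeats for each number of balls and bins.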
```python
from sorting_steps import sorting_steps

steps = [100, 200, 500, 1000]
repeats = 10000 if USE_LARGE_NUMBERS else 100
cpu_count = min(len(steps), available_cores())
print(f"Using {cpu_count} processes")

with Pool(cpu_count) as p:
    experiment_results = p.map(partial(sorting_steps, repeats), steps)
```

    Using 4 processes
    Number of balls and bins: 100
    Number of balls and bins: 200
    Number of balls and bins: 500
    Number of balls and bins: 1000

```python
fig, ax = plt.subplots()
ax.set_xlabel("Mixing Steps")
ax.set_ylabel("Maximum load")

for i, result in enumerate(experiment_results):
    ax.plot(*plot_attributes(result, x="mixing_steps"), label=f"$n = {steps[i]}$")

# Enable the legend
ax.legend()

# Save to .TeX file
fig_to_tex(fig, "charts/weighted.tex")
```
![png](examples_files/examples_29_0.png)
#### 4.5 Communication Complexity
This experiment will compare the communication complexity of the previously seen algorithms, with the `one_plus_beta`, `threshold` and `adaptive` algorithms.
```python
def one_plus_beta(ball: lb.Ball, bins: lb.Bins, p: float):
    """With probability p choose two bins, otherwise choose one at random"""
    if lb.probability(p):
        return bins.choose(2).min()
    else:
        return bins.choose_one()


def threshold(ball: lb.Ball, bins: lb.Bins):
    threshold = ball.m / bins.n + 1
    while True:
        chosen_bin = bins.choose_one()
        if chosen_bin.load < threshold:
            return chosen_bin


def adaptive(ball: lb.Ball, bins: lb.Bins):
    n = bins.n
    while True:
        chosen_bin = bins.choose_one()
        if chosen_bin.load < ball.position / n + 1:
            return chosen_bin
```

```python
communication_complexity_simulation = ParallelSimulation(
    (uniform, []),
    (greedy_d, [2]),
    (one_plus_beta, [0.5]),
    (threshold, []),
    (adaptive, []),
)

# 2**20 if USE_LARGE_NUMBERS else 2**12
m = n = 1048576 if USE_LARGE_NUMBERS else 4096

results = communication_complexity_simulation(m, n, repeats=100)
```

    Warning: Parallel Simulations may hang if functions are defined in interactive mode.
    Using 5 processes to run 5 jobs

      0%|          | 0/5 [00:00, ?it/s]

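The load-access metric can be illustrated with a plain-Python version of the threshold rule. This is a sketch under the assumption that every read of a bin's load counts as one access; how the package actually counts accesses is not shown here, and `simulate_threshold` is a hypothetical name.

```python
import random
from typing import List, Tuple


def simulate_threshold(m: int, n: int, seed: int = 0) -> Tuple[List[int], int]:
    """Plain-Python threshold allocation that also counts load accesses.

    Each ball keeps sampling a uniformly random bin until it finds one
    whose load is below m / n + 1.
    """
    rng = random.Random(seed)
    loads = [0] * n
    limit = m / n + 1
    accesses = 0
    for _ in range(m):
        while True:
            candidate = rng.randrange(n)
            accesses += 1  # one load read per sampled bin (assumption)
            if loads[candidate] < limit:
                loads[candidate] += 1
                break
    return loads, accesses


threshold_loads, threshold_accesses = simulate_threshold(m=4096, n=4096)
print("maximum load:", max(threshold_loads))
print("load accesses:", threshold_accesses)
```

With $m = n$ the limit is 2, so no bin can exceed a load of 2, and the access count stays between $m$ and the $2m$ reads that `greedy[2]` always performs in the typical case.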
```python
sorted_results = sorted(
    [
        {
            "Algorithm": name,
            "Maximum Load": result[0].maximum_load,
            "Load Access Count": result[0].load_accesses,
        }
        for name, result in results.items()
    ],
    key=itemgetter("Maximum Load"),
    reverse=True,
)

display(tabulate(sorted_results, headers="keys", tablefmt="html"))
```

| Algorithm | Maximum Load | Load Access Count |
|---|---|---|
| uniform | 8.75 | 0 |
| one_plus_beta_0.5 | 7.05 | 1048480 |
| greedy_d_2 | 4 | 2097152 |
| threshold | 2 | 1201901 |
| adaptive | 2 | 1201912 |

## References (including those cited in the corresponding paper)
* Afzal, S. & Kavitha, G. (2019) Load balancing in cloud computing–A hierarchical taxonomical classification. Journal of Cloud Computing, 8, 1-24.
* AmazonWebServices How Elastic Load Balancing works.
* Angluin, D. et al. (2006) Computation in networks of passively mobile finite-state sensors. Distributed computing, 18, 235-253.
* Azar, Y. et al. (1994) Balanced allocations. Proceedings of the twenty-sixth annual ACM symposium on theory of computing, 593-602.
* Behnel, S. et al. (2010) Cython: The best of both worlds. Computing in Science and Engineering, 13, 31-39.
* Berenbrink, P. et al. (2008) On weighted balls-into-bins games. Theoretical Computer Science, 409, 511-520.
* Berenbrink, P. et al. (2012) Multiple-choice balanced allocation in (almost) parallel. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Springer, pp. 411-422.
* Berenbrink, P. et al. (2013) Balls-into-bins with nearly optimal load distribution. Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures, 326-335.
* Chaczko, Z. et al. (2011) Availability and load balancing in cloud computing. International Conference on Computer and Software Modeling, Singapore 14, 134-140.
* Cole, R. et al. (1998) On balls and bins with deletions. International Workshop on Randomization and Approximation Techniques in Computer Science, 145-158.
* Czumaj, A. & Stemann, V. (2001) Randomized allocation processes. Random Structures and Algorithms, 18, 297-331.
* Doty, D. & Severson, E. (2021) ppsim: A software package for efficiently simulating and visualizing population protocols. International Conference on Computational Methods in Systems Biology, 245-253.
* Feuersänger, C. (2011) Manual for Package pgfplots. 17,
* Fischer, S. & Vöcking, B. (2009) Adaptive routing with stale information. Theoretical Computer Science, 410, 3357-3371.
* Gonnet, G.H. (1981) Expected length of the longest probe sequence in hash code searching. Journal of the ACM (JACM), 28, 289-304.
* Harris, C.R. et al. (2020) Array programming with NumPy. Nature, 585, 357-362.
* Hunter, J.D. (2007) Matplotlib: A 2D graphics environment. Computing in science and engineering, 9, 90-95.
* Kluyver, T. et al. (2016) Jupyter Notebooks-a publishing format for reproducible computational workflows.
* Lam, S.K., Pitrou, A. & Seibert, S. (2015) Numba: A llvm-based python jit compiler. Proceedings of the Second Workshop on …,
* LLVM The LLVM Compiler Infrastructure - Users.
* Los, D. & Sauerwald, T. (2022) Balanced Allocations in Batches: Simplified and Generalized. arXiv preprint arXiv:2203.13902,
* Mitzenmacher, M. (2001) The power of two choices in randomized load balancing. IEEE Transactions on Parallel and …,
* Peres, Y., Talwar, K. & Wieder, U. (2010) The (1+ Beta)-choice process and weighted balls-into-bins. Proceedings of the twenty-first annual ACM …,
* Raab, M. & Steger, A. (1998) “Balls into bins”—A simple and tight analysis. International Workshop on Randomization and Approximation Techniques in Computer Science, 159-170.
* Scheibler, R., Bezzam, E. & Dokmanić, I. (2018) Pyroomacoustics: A python package for audio room simulation and array processing algorithms. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 351-355.
* Talwar, K. & Wieder, U. (2007) Balanced allocations: the weighted case. Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, 256-265.
* Vöcking, B. (2003) How asymmetry helps load balancing. Journal of the ACM (JACM), 50, 568-589.