{"id":15988854,"url":"https://github.com/d-krupke/algbench","last_synced_at":"2025-03-10T21:30:55.855Z","repository":{"id":169432990,"uuid":"644365295","full_name":"d-krupke/AlgBench","owner":"d-krupke","description":"Experiment execution and result management for empirical evaluations of algorithms in Python.","archived":false,"fork":false,"pushed_at":"2025-01-06T17:02:46.000Z","size":1781,"stargazers_count":4,"open_issues_count":7,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-08T13:18:07.085Z","etag":null,"topics":["algorithm-engineering","python","result-mangement"],"latest_commit_sha":null,"homepage":"https://algbench.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/d-krupke.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-23T11:16:52.000Z","updated_at":"2025-01-06T17:02:50.000Z","dependencies_parsed_at":"2023-10-11T12:42:26.163Z","dependency_job_id":"4cbd6d5b-7546-4752-af19-7a8ea802c7ed","html_url":"https://github.com/d-krupke/AlgBench","commit_stats":{"total_commits":73,"total_committers":4,"mean_commits":18.25,"dds":"0.31506849315068497","last_synced_commit":"ad038c3fe5fada68e3beb3302d2b42d3e125c98a"},"previous_names":["d-krupke/algbench"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d-krupke%2FAlgBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d-krupke%2FAlgBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/d-krupke%2FAlgBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d-krupke%2FAlgBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/d-krupke","download_url":"https://codeload.github.com/d-krupke/AlgBench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242930089,"owners_count":20208396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm-engineering","python","result-mangement"],"created_at":"2024-10-08T04:21:44.916Z","updated_at":"2025-03-10T21:30:55.404Z","avatar_url":"https://github.com/d-krupke.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"AlgBench: A Python-util to run benchmarks for the empirical evaluation of algorithms.\n=====================================================================================\n\n.. image:: https://img.shields.io/pypi/v/algbench.svg\n   :target: https://pypi.python.org/pypi/algbench\n\n.. image:: https://img.shields.io/pypi/pyversions/algbench.svg\n   :target: https://pypi.python.org/pypi/algbench\n\n.. image:: https://img.shields.io/pypi/l/algbench.svg\n   :target: https://pypi.python.org/pypi/algbench\n\n.. 
image:: https://github.com/d-krupke/algbench/actions/workflows/pytest.yml/badge.svg\n   :target: https://github.com/d-krupke/AlgBench\n\nThere are a number of challenges when performing benchmarks for\n(long-running) algorithms.\n\n-  Saving all information requires a lot of **boilerplate code** and\n   often you forget something.\n-  If you add some further instances or want to compare an additional\n   parameter, you have to check which data is already available to\n   **skip existing entries**. Same if you need to interrupt the\n   benchmark.\n-  Just piping the results into a file can create a **huge amount of\n   data**, no longer fitting into memory.\n-  Proper benchmarks often take days or even weeks to run, such that\n   **parallelization** is necessary (e.g., with slurm) which requires a\n   thread-safe database.\n-  Many file formats and databases are **difficult to access or\n   impossible to repair** once corrupted.\n\nAlgBench tries to ease your life by\n\n-  saving a lot of the information and data (function arguments, return\n   values, runtime, environment information, stdout, etc.) automatically\n   with a single line of code\n-  remembering which function arguments have already run and skipping\n   those\n-  providing a compressible database to save memory, and saving highly\n   redundant information, e.g., of the environment, only once\n-  providing an NFS-compatible parallel database and compatibility to\n   distribution libraries, such as slurminade\n-  using a simple format based on JSON and Zip to allow simple parsing\n   and even repairing broken databases by hand\n\nThere is a predecessor project, called\n`AeMeasure \u003chttps://github.com/d-krupke/AeMeasure\u003e`__. 
AeMeasure made\nsaving the data easy, but required more boilerplate code and reading the\ndata was more difficult and less efficient.\n\nOther things you should know about for empirical/experimental evaluations\n-------------------------------------------------------------------------\n\nI consider the following tools essential for empirical evaluations (of\nalgorithms):\n\n-  `pandas \u003chttps://pandas.pydata.org/\u003e`__: Simple and powerful tool for\n   working with data tables. Do your experiments and parse the important\n   data into a pandas DataFrame.\n-  `seaborn \u003chttps://seaborn.pydata.org/\u003e`__ and\n   `matplotlib \u003chttps://matplotlib.org/\u003e`__: Creating beautiful plots\n   from pandas DataFrames with little work.\n-  `JupyterLab \u003chttps://jupyterlab.readthedocs.io/en/latest/\u003e`__:\n   Interactive Python+Markdown documents. Great for analyzing data and\n   sharing the insights. Works great with pandas and seaborn.\n\nAlgBench essentially takes over the part of saving the information from\nthe runs and allows you to easily extract pandas DataFrames from it.\nFor very simple studies, you could also save your data directly into a\npandas DataFrame, but in nearly every serious experiment you will run\ninto the problems mentioned at the beginning.\n\nNote that the actual algorithms can also be written in another, more\nefficient programming language. It is reasonably easy to create\nPython bindings, e.g., for C++ with\n`pybind11 \u003chttps://pybind11.readthedocs.io/\u003e`__, or just call the\nbinaries with Python.\n\nPublishable evaluations often require extensive experiments that are\nbest performed on a cluster of shared workstations. Many institutes and\ncompanies use\n`slurm \u003chttps://slurm.schedmd.com/documentation.html\u003e`__ to schedule and\ndistribute the workloads. The data is usually shared via a network file\nsystem (NFS), for which AlgBench is designed. 
While you usually also\nhave databases available, they are not made for simply dumping all the\ndata you may need for analysis and potential debugging. We\ndeveloped an additional tool\n`slurminade \u003chttps://github.com/d-krupke/slurminade\u003e`__ that allows you\nto distribute your experiments with just a few additional lines. You can\nsee this in an example: `original\nscript \u003c./examples/graph_coloring/02_run_benchmark.py\u003e`__ vs `script\nwith\nslurminade \u003c./examples/graph_coloring/02b_run_benchmark_with_slurminade.py\u003e`__.\n\nLet me further recommend the book `A Guide To Experimental Algorithmics\nby Catherine\nMcGeoch \u003chttps://www.cambridge.org/core/books/guide-to-experimental-algorithmics/CDB0CB718F6250E0806C909E1D3D1082\u003e`__,\nwhich gives a good introduction to the big picture of performing\nempirical evaluations of algorithms. If you want to know more about\nactually implementing complex algorithms for difficult problems, I\nrecommend reading `In Pursuit of the Traveling Salesman by Bill\nCook \u003chttps://press.princeton.edu/books/paperback/9780691163529/in-pursuit-of-the-traveling-salesman\u003e`__\nor `The Traveling Salesman Problem: A Computational Study by Applegate\net al. \u003chttps://www.math.uwaterloo.ca/tsp/book/index.html\u003e`__ to really\ngo into detail. The Traveling Salesman Problem is an excellent example\nfor this because it has probably received the most attention of any\nNP-hard combinatorial problem. However, it can also be intimidating, as\nyou probably won’t have the funds to look into any problem as deeply as\nthe Traveling Salesman Problem has been studied. 
Maybe you want to\nread some papers from the SIAM Symposium on Algorithm Engineering and\nExperiments (ALENEX) to see how smaller studies can be performed\n(though, for most papers you will find aspects that could be improved).\n\nBefore you submit any paper (or thesis) with an empirical analysis,\nI also recommend to first go through `this checklist \u003chttps://blog.sigplan.org/2019/08/28/a-checklist-manifesto-for-empirical-evaluation-a-preemptive-strike-against-a-replication-crisis-in-computer-science/\u003e`__.\n\nInstallation\n------------\n\nYou can install AlgBench using pip\n\n.. code:: bash\n\n   pip install -U algbench\n\nUsage\n-----\n\nThere is one important class ``Benchmark`` to run the benchmark, and two\nimportant functions ``describe`` and ``read_as_pandas`` to analyze the\nresults.\n\n1. Create a function that creates an entry in the database. Name all\n   arguments that should be saved and used for identifying entries\n   without ``_`` in the front. They should be JSON-compatible. Name all\n   arguments that provide higher objects, such as the instance database,\n   with an ``_`` in the front to tell *algbench* not to try to save or\n   compare them. Return everything you want to be saved for the\n   benchmark, best as a dictionary.\n\n.. code:: python\n\n   def create_benchmark_entry(\n       instance_name: str,  # instance identifier for the database\n       alg_parameters: dict,  # readable parameters for the algorithm\n       _instance,  # the parsed instance (not to be added to the database)\n   ):\n       solution = alg(_instance, **alg_parameters)\n       return {\"objective_value\": solution.obj()}\n\n2. Create a ``Benchmark``-object by passing it a path for the database.\n\n.. 
code:: python\n\n   from algbench import Benchmark\n\n   benchmark = Benchmark(\"./my_benchmark\")\n\n   # Optionally (if logging is used):\n   import logging\n\n   # Configure which logger should be captured and with which level\n   benchmark.capture_logger(\"my_alg\", logging.INFO)\n   benchmark.capture_logger(\"my_alg.submodule\", logging.WARNING)\n\n3. Use ``Benchmark.add`` to run the function for all missing entries.\n\n.. code:: python\n\n   for instance_name, instance in instance_db:\n       for params in params_to_compare:\n           benchmark.add(\n               create_benchmark_entry,  # function (could also be a lambda)\n               # arguments for function\n               instance_name=instance_name,\n               alg_parameters=params,\n               _instance=instance,\n           )\n   benchmark.compress()  # reduce the size of the database by file compression\n\n4. Use a for loop to iterate over all raw entries\n\n.. code:: python\n\n   benchmark = Benchmark(\"./my_benchmark\")\n   for entry in benchmark:\n       print(entry)  # dictionary\n\nor ``read_as_pandas`` to extract a simple pandas table\n\n.. code:: python\n\n   from algbench import read_as_pandas\n\n   t = read_as_pandas(\n       \"./my_benchmark/\",\n       lambda result: {\n           # the keys under \"args\" are the argument names of create_benchmark_entry\n           \"instance\": result[\"parameters\"][\"args\"][\"instance_name\"],\n           \"alg_params\": result[\"parameters\"][\"args\"][\"alg_parameters\"],\n           \"obj\": result[\"result\"][\"objective_value\"],\n           \"runtime\": result[\"runtime\"],  # automatically saved\n       },\n   )\n\nYou can use ``describe(\"./my_benchmark\")`` to get an overview of the\navailable entries.\n\nThe ``Benchmark`` class provides further functionality, e.g., for\ndeleting selected entries or repairing a broken database.\n\nYou can find `an example for graph\ncoloring \u003c./examples/graph_coloring/\u003e`__. The important parts are shown\nbelow.\n\nRunning a benchmark\n~~~~~~~~~~~~~~~~~~~\n\n.. 
code:: python\n\n   from _utils import InstanceDb\n   from algbench import Benchmark\n   import networkx as nx\n\n   benchmark = Benchmark(\"03_benchmark_data\")\n   instances = InstanceDb(\"./01_instances.zip\")\n\n\n   def load_instance_and_run(instance_name: str, alg_params):\n       # load the instance outside the actual measurement\n       g = instances[instance_name]\n\n       def eval_greedy_alg(instance_name: str, alg_params, _instance: nx.Graph):\n           # arguments starting with `_` are not saved.\n           coloring = nx.coloring.greedy_coloring.greedy_color(_instance, **alg_params)\n           return {  # the returned values are saved to the database\n               \"num_vertices\": _instance.number_of_nodes(),\n               \"num_edges\": _instance.number_of_edges(),\n               \"coloring\": coloring,\n               \"n_colors\": max(coloring.values()) + 1,\n           }\n\n       benchmark.add(eval_greedy_alg, instance_name, alg_params, g)\n\n\n   alg_params_to_evaluate = [\n       {\"strategy\": \"largest_first\", \"interchange\": True},\n       {\"strategy\": \"largest_first\", \"interchange\": False},\n       {\"strategy\": \"random_sequential\", \"interchange\": True},\n       {\"strategy\": \"random_sequential\", \"interchange\": False},\n       {\"strategy\": \"smallest_last\", \"interchange\": True},\n       {\"strategy\": \"smallest_last\", \"interchange\": False},\n       {\"strategy\": \"independent_set\"},\n       {\"strategy\": \"connected_sequential_bfs\", \"interchange\": True},\n       {\"strategy\": \"connected_sequential_bfs\", \"interchange\": False},\n       {\"strategy\": \"connected_sequential_dfs\", \"interchange\": True},\n       {\"strategy\": \"connected_sequential_dfs\", \"interchange\": False},\n       {\"strategy\": \"saturation_largest_first\"},\n   ]\n\n   if __name__ == \"__main__\":\n       for instance_name in instances:\n           print(instance_name)\n           for conf in alg_params_to_evaluate:\n    
           load_instance_and_run(instance_name, conf)\n       benchmark.compress()\n\nAnalyzing the data\n~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n   from algbench import describe, read_as_pandas, Benchmark\n\n   describe(\"./03_benchmark_data/\")\n\n\nOutput:\n\n::\n\n    result:\n   | num_vertices: 68\n   | num_edges: 697\n   | coloring:\n   || 0: 7\n   || 1: 8\n   || 2: 2\n   || 3: 5\n   || 4: 3\n   || 5: 7\n   || 6: 7\n   || 7: 6\n   || 8: 5\n   || 9: 4\n   || 10: 5\n   || 11: 4\n   || 12: 0\n   || 13: 6\n   || 14: 0\n   || 15: 3\n   || 16: 5\n   || 17: 5\n   || 18: 7\n   || 19: 0\n   || ...\n   | n_colors: 9\n    timestamp: 2023-05-25T21:58:39.201553\n    runtime: 0.002952098846435547\n    stdout:\n    stderr:\n    env_fingerprint: 53ad3b5b29d082d7e2bca6881ec9fe35fe441ae1\n    args_fingerprint: 10ce65b7a61d5ecbfcb1f4e390d72122f7a1f6ec\n    parameters:\n   | func: eval_greedy_alg\n   | args:\n   || instance_name: graph_0\n   || alg_params:\n   ||| strategy: largest_first\n   ||| interchange: True\n    argv: ['02_run_benchmark.py']\n    env:\n   | hostname: workstation-r7\n   | python_version: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]\n   | python: /home/krupke/anaconda3/envs/mo310/bin/python3\n   | cwd: /home/krupke/Repositories/AlgBench/examples/graph_coloring\n   | environment: [{'name': 'virtualenv', 'path': '/home/krupke/.local/lib/python3.10/site-pack...\n   | git_revision: 5357426feb4b49174c313ffa33e2cadf6a83e226\n   | python_file: /home/krupke/Repositories/AlgBench/examples/graph_coloring/02_run_benchmark.py\n\n\n\n\n.. 
code:: python\n\n   # we can also see the raw data of the first entry using `front`\n   Benchmark(\"./03_benchmark_data/\").front()\n\n\n\nOutput:\n\n::\n\n   {'result': {'num_vertices': 68,\n     'num_edges': 697,\n     'coloring': {'0': 7,\n      '1': 8,\n      '2': 2,\n      '3': 5,\n      '4': 3,\n      '5': 7,\n      '6': 7,\n      '7': 6,\n      '8': 5,\n      '9': 4,\n      '10': 5,\n      '11': 4,\n      '12': 0,\n      '13': 6,\n      '14': 0,\n      '15': 3,\n      '16': 5,\n      '17': 5,\n      '18': 7,\n      '19': 0,\n      '20': 2,\n      '21': 3,\n       ...},\n     'n_colors': 9},\n    'timestamp': '2023-05-25T21:58:39.201553',\n    'runtime': 0.002952098846435547,\n    'stdout': '',\n    'stderr': '',\n    'env_fingerprint': '53ad3b5b29d082d7e2bca6881ec9fe35fe441ae1',\n    'args_fingerprint': '10ce65b7a61d5ecbfcb1f4e390d72122f7a1f6ec',\n    'parameters': {'func': 'eval_greedy_alg',\n     'args': {'instance_name': 'graph_0',\n      'alg_params': {'strategy': 'largest_first', 'interchange': True}}},\n    'argv': ['02_run_benchmark.py'],\n    'env': {'hostname': 'workstation-r7',\n     'python_version': '3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]',\n     'python': '/home/krupke/anaconda3/envs/mo310/bin/python3',\n     'cwd': '/home/krupke/Repositories/AlgBench/examples/graph_coloring',\n     'environment': [{'name': 'virtualenv',\n       'path': '/home/krupke/.local/lib/python3.10/site-packages',\n       'version': '20.14.1'},\n      {'name': 'cfgv',\n       'path': '/home/krupke/.local/lib/python3.10/site-packages',\n       'version': '3.3.1'},\n     ...],\n     'git_revision': '5357426feb4b49174c313ffa33e2cadf6a83e226',\n     'python_file': '/home/krupke/Repositories/AlgBench/examples/graph_coloring/02_run_benchmark.py'}}\n\n\n.. 
code:: python\n\n   # we can extract a full pandas tables using `read_as_pandas`\n   t = read_as_pandas(\n       \"./03_benchmark_data/\",\n       lambda result: {\n           \"instance\": result[\"parameters\"][\"args\"][\"instance_name\"],\n           \"strategy\": result[\"parameters\"][\"args\"][\"alg_params\"][\"strategy\"],\n           \"interchange\": result[\"parameters\"][\"args\"][\"alg_params\"].get(\n               \"interchange\", None\n           ),\n           \"colors\": result[\"result\"][\"n_colors\"],\n           \"runtime\": result[\"runtime\"],\n           \"num_vertices\": result[\"result\"][\"num_vertices\"],\n           \"num_edges\": result[\"result\"][\"num_edges\"],\n       },\n   )\n   print(t)\n\nOutput:\n\n::\n\n          instance                  strategy interchange  colors   runtime ...\n   0       graph_0             largest_first        True       9  0.002952\n   1       graph_0             largest_first       False      10  0.000183\n   2       graph_0         random_sequential        True       9  0.003562\n   3       graph_0         random_sequential       False      12  0.000173\n   4       graph_0             smallest_last        True       9  0.003813\n   ...         ...                       ...         ...     ...       
...\n   5995  graph_499  connected_sequential_bfs        True       3  0.000216\n   5996  graph_499  connected_sequential_bfs       False       3  0.000132\n   5997  graph_499  connected_sequential_dfs        True       3  0.000231\n   5998  graph_499  connected_sequential_dfs       False       4  0.000132\n   5999  graph_499  saturation_largest_first        None       3  0.000202\n\n\n   [6000 rows x 7 columns]\n\n\nWhich information is saved?\n---------------------------\n\nThe following information is saved automatically:\n\n-  function name\n-  all arguments that do not begin with “\\_” (use this to pass parsed\n   instances etc.)\n-  the returned values\n-  runtime\n-  current date and time\n-  hostname\n-  Python version\n-  Python binary path\n-  current working directory\n-  stdout and stderr\n-  all installed modules and their versions\n-  git revision\n-  path of the python file\n\nThings to be aware of\n---------------------\n\n-  Only the function name and the arguments not starting with “\\_” are used to\n   compare entries. If an argument (or part of it) is not\n   JSON-compatible, its string representation is used.\n-  Arguments and return values that cannot be translated to JSON are\n   converted to strings in the database. The default string conversion\n   may not be very useful.\n-  The stdout/stderr capturing only works if Python’s stdout/stderr are\n   used. For example, C++ code writes by default to the system’s stdout/stderr,\n   which cannot be captured (if you have been wondering why C++ modules\n   produce bad output in Jupyter notebooks, this is the reason). pybind11\n   allows you `to change that\n   behavior \u003chttps://pybind11.readthedocs.io/en/stable/advanced/pycpp/utilities.html#using-python-s-print-function-in-c\u003e`__.\n-  Global variables are not saved. 
Try to pass all important parameters\n   as function arguments, as they can also alter the benchmark and are\n   important to distinguish entries. For example, you would want to recompute\n   an entry if the time limit has been changed. This is only possible if\n   you tell algbench about it by making it an argument.\n-  ‘sys.argv’ and the filename are saved, but not used for\n   distinguishing entries.\n\nOn doing good empirical evaluations of algorithms\n-------------------------------------------------\n\nTo get a feeling for the interesting instances and parameters, or\ngenerally for where to look deeper, you should first perform an\nexploratory study. For such an exploratory study, you should select some\nrandom parameters and instances, and just see how the numbers look.\nIteratively change the parameters and instances until you know what to\nevaluate properly. At that point, you can state some research questions\nand design corresponding workhorse studies to answer them.\n\nHere are some general hints:\n\n-  Do not mix algorithm code and experiment code, even if it saves you\n   rebuilding your package after every change. Such a mixed setup may\n   save you a command line, but it is harder to log and many problems\n   may remain unnoticed until you try to publish your algorithm. The\n   little overhead is worth it in the long run.\n-  Create a separate folder for every study. Don’t mix studies just because\n   you want to reduce redundancy: once things become complicated, you\n   may draw conclusions from the wrong data without noticing.\n-  Add a README.md into each folder that describes the study. 
At least\n   describe in a sentence who created this study, when, and in which context.\n-  Have separate, numbered files for preparing, running, processing,\n   checking, and evaluating the study.\n-  Extract a simplified pandas table from the database with only the\n   important data (e.g., stdout or environment information are only\n   necessary for debugging and don’t need to be shared for evaluation).\n   You can save pandas tables as ``.json.zip`` such that they are small\n   and can simply be added to your Git repository, even when the full data is too\n   large.\n-  The file for checking the generated data should also describe it.\n-  Use a separate Jupyter notebook for each family of plots you want to\n   generate.\n-  Save the plots into files whose names you can easily trace back to the\n   generating notebook. You will probably copy them later into some\n   paper and half a year later, when you receive the reviews and want to\n   make some changes, you will have to find the code that generated them.\n\n\nOn gaining more insights using logging\n---------------------------------------\n\nIf you develop complex algorithms, you often want to not only measure\nthe runtime of the whole algorithm, but also of its parts, as well as\nother information, such as the number of iterations, the current\nsolution, etc. You can use the Python logging framework for this. The\nlogging framework allows you to create loggers that can be configured\nindividually. You can also create a logger for each module and\nsubmodule, and configure them individually. You can further configure\nhandlers for the loggers, e.g., to write them to a file or to the\nconsole. The level of the loggers and handlers can also be configured,\nsuch that you can easily switch between different levels of logging.\nAlgBench allows you to capture the loggers and save them to the\ndatabase. 
You can then extract and analyze them.\n\nYou can also use simple ``print`` statements, but they are not as\nflexible as the logging framework. While AlgBench can actually\nadd the runtime to the print statements, it is not as easy to\nconfigure the output as with the logging framework. There is no\nway to disable the output for individual parts of your algorithm,\nor to change the level of the output. The logging framework is\nas easy to use as print statements, but much more flexible.\nIt can be more expensive, but ``print`` statements are also not\nfree and should be used with care.\n\nHere is an example for using the logging framework:\n\n.. code:: python\n\n   import logging\n\n\n   def my_alg():\n       logger = logging.getLogger(\"my_alg\")\n       logger.info(\"Starting my_alg\")\n       # do something\n       logger.info(\"Finished my_alg\")\n\n\n   logger = logging.getLogger(\"my_alg\")\n   logger.setLevel(logging.INFO)\n   logger.addHandler(logging.StreamHandler())\n   my_alg()\n\nA further advantage of the logging framework is that you can separate\nthe message structure from the data. This allows you to easily query\nfor specific events and directly extract the data you want to analyze.\n\n.. code:: python\n\n   logger.info(\"Submodule X needed %d iterations\", 42)\n\nWill be saved as a dictionary with a separate field for the message and\nthe data:\n\n::\n\n   {\n       \"msg\": \"Submodule X needed %d iterations\",\n       \"args\": [42],\n   }\n\nA further alternative is to use a dedicated class for stats that you\npass around. This is generally a good idea, but takes more work and\nrequires you to change the code. The logging framework is a good\ncompromise between flexibility and ease of use.\n\nIf your algorithm may be run in parallel or different contexts, you may want to allow\nto pass a logger to the algorithm. 
This allows you to create a\nseparate logger for each context to separate the logs.\n\n::\n\n   Note that AlgBench v2 automatically adds the runtime to print statements and log entries.\n\nUsing Git LFS for the data\n--------------------------\n\nThe data are large binary files. Use Git LFS to add them to your\nrepository more efficiently.\n\nYou can find a guide `here \u003chttps://git-lfs.com/\u003e`__ on how to install\nGit LFS.\n\nRun\n\n.. code:: bash\n\n   git lfs install\n\nto set up Git LFS and\n\n.. code:: bash\n\n   git lfs track \"*.zip\"\n\nto manage all zips via LFS.\n\nAlternatively, you can also just edit ``.gitattributes`` by hand\n\n::\n\n   *.zip filter=lfs diff=lfs merge=lfs -text\n\nFinally, add ``.gitattributes`` to git via\n\n.. code:: bash\n\n   git add .gitattributes\n\nVersion History\n===============\n\n- **2.4.1** Fixes bug when path ends with ``/``.\n- **2.4.0** Removing information on installed packages due to deprecated ``pkg_resources``. New apply-function.\n- **2.2.2** Fixing problem with Jupyter notebooks, because they may not have a ``__file__`` attribute.\n- **2.2.1** Should be able to deal with corrupt zip files now.\n- **2.2.0** Allowing to skip entries in ``read_as_pandas`` by returning a None for the row.\n- **2.1.0** More flexible stream handling. You can now disable the output saving and hiding. The default behavior still is to save the output with time stamps and hide it from the console.\n- **2.0.0** Extensive change of stdout/stderr handling and new logging functionality.\n  By default, stdout and stderr will now be saved with the runtime of the function.\n  Additionally, you can now capture loggers of the Python logging framework and save them to the database.\n  This is especially useful if you use a library that uses the logging framework. Prefer ``logging`` over ``print`` for logging information.\n-  **1.1.0** Some changes for efficiency turned out to be less robust in\n   case of, e.g., keyboard interrupt. 
Fixed that.\n-  **1.0.0** Changing the database layout, making it more efficient\n   (breaking change!).\n-  **0.2.0** Changing database slightly to contain meta data and doing\n   more caching. Saving some more information.\n-  **0.1.3** Fixed bug in arg fingerprint set.\n-  **0.1.2** Fixed bug with empty rows in pandas table.\n-  **0.1.1** Fixed bug with ``delete_if``.\n-  **0.1.0** First complete version\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd-krupke%2Falgbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fd-krupke%2Falgbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd-krupke%2Falgbench/lists"}