{"id":13542026,"url":"https://github.com/eXascaleInfolab/PyExPool","last_synced_at":"2025-04-02T09:33:14.151Z","repository":{"id":62580251,"uuid":"48866232","full_name":"eXascaleInfolab/PyExPool","owner":"eXascaleInfolab","description":"Python Multi-Process Execution Pool: concurrent asynchronous execution pool with custom resource constraints (memory, timeouts, affinity, CPU cores and caching), load balancing and profiling capabilities of the external apps on NUMA architecture","archived":false,"fork":false,"pushed_at":"2019-08-28T03:08:40.000Z","size":2187,"stargazers_count":164,"open_issues_count":1,"forks_count":12,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-10-29T22:47:24.829Z","etag":null,"topics":["application-framework","benchmarking-framework","cache-control","execution-pool","in-memory-computations","load-balancing","monitoring-server","multiprocessing","numa","parallel-computing","parallel-processing","task-queue"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eXascaleInfolab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-01-01T00:53:53.000Z","updated_at":"2024-09-19T09:34:46.000Z","dependencies_parsed_at":"2022-11-03T22:01:00.386Z","dependency_job_id":null,"html_url":"https://github.com/eXascaleInfolab/PyExPool","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eXascaleInfolab%2FPyExPool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eXascaleInfolab%2FPyExPool/tag
s","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eXascaleInfolab%2FPyExPool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eXascaleInfolab%2FPyExPool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eXascaleInfolab","download_url":"https://codeload.github.com/eXascaleInfolab/PyExPool/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246789067,"owners_count":20834225,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["application-framework","benchmarking-framework","cache-control","execution-pool","in-memory-computations","load-balancing","monitoring-server","multiprocessing","numa","parallel-computing","parallel-processing","task-queue"],"created_at":"2024-08-01T10:01:00.356Z","updated_at":"2025-04-02T09:33:13.539Z","avatar_url":"https://github.com/eXascaleInfolab.png","language":"Python","readme":"# PyExPool\n\nA Lightweight Multi-Process Execution Pool with load balancing and customizable resource consumption constraints.\n\n\\author: (c) Artem Lutov \u003cartem@exascale.info\u003e  \n\\license:  [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)  \n\\organizations: [eXascale Infolab](http://exascale.info/), [Lumais](http://www.lumais.com/), [ScienceWise](http://sciencewise.info/)  \n\\date: 2015-07 v1, 2017-06 v2, 2018-05 v3  \n\\grants: Swiss National Science Foundation grant number `CRSII2_147609`, European Commission grant `Graphint 683253`\n\nBibTeX:\n```bibtex\n@misc{pyexpool,\n\tauthor = 
{Artem Lutov and Philippe Cudré-Mauroux},\n\turl = {https://github.com/eXascaleInfolab/PyExPool},\n\ttitle = {PyExPool-v.3: A Lightweight Execution Pool with Constraint-aware Load-Balancer.},\n\tyear = {2018}\n}\n```\n\n## Content  \u003c!-- omit in toc --\u003e\n- [Overview](#overview)\n- [Installation](#installation)\n- [Requirements](#requirements)\n- [API](#api)\n\t- [Job](#job)\n\t- [Task](#task)\n\t- [AffinityMask](#affinitymask)\n\t- [ExecPool](#execpool)\n\t- [Optional WebUi](#optional-webui)\n\t\t- [WebUiApp](#webuiapp)\n\t\t- [UiCmd](#uicmd)\n\t- [Accessory Routines](#accessory-routines)\n- [Usage](#usage)\n\t- [Usage Example](#usage-example)\n\t- [Failsafe Termination](#failsafe-termination)\n- [Related Projects](#related-projects)\n\n\n## Overview\n\nA Lightweight Multi-Process Execution Pool with load balancing to schedule Job execution with *per-job timeout*, optionally grouping the jobs into Tasks and specifying optional execution parameters considering NUMA architecture peculiarities:\n- automatic rescheduling and *load balancing* (reduction) of the worker processes on low memory condition for the *in-RAM computations* (requires [psutil](https://pypi.python.org/pypi/psutil), can be disabled)\n- *chained termination* of the related worker processes (started jobs) and rescheduling of the non-started jobs to satisfy *timeout* and *memory limit* constraints\n- automatic CPU affinity management and maximization of the dedicated CPU cache vs parallelization for a worker process\n- *timeout per each Job* (this was the main initial motivation to implement this module, because this feature is not provided by any Python implementation out of the box)\n- onstart/ondone *callbacks*; ondone is called only on successful completion (not termination) for both Jobs and Tasks (groups of jobs)\n- stdout/err output, which can be redirected to any custom file or PIPE\n- custom parameters for each Job and the respective owner Task besides the name/id\n\n\u003e Automatic rescheduling 
of the workers on low memory condition for the in-RAM computations is optional and is the only feature that requires an external package, [psutil](https://pypi.python.org/pypi/psutil).  \nAll scheduled jobs share the same CPU affinity policy, which is convenient for benchmarking, but not so suitable for scheduling both single and multi-threaded apps with distinct demands for the CPU cache.\n\nAll main functionality is implemented as a *single-file module* to be *easily included into your project and customized as a part of your distribution* (like in [PyCaBeM](https://github.com/eXascaleInfolab/PyCABeM) to execute multiple apps in parallel on the dedicated CPU cores while avoiding their swapping from the main memory); it can also be installed as a library. An optional minimalistic Web interface is provided in a separate file to inspect and profile the load balancer and execution pool.  \nThe main purpose of the single-file module is the **concurrent execution of modules and external executables with custom resource consumption constraints, cache / parallelization tuning and automatic balancing of the worker processes for the in-memory computations on a single server**. PyExPool is typically used as an application framework for benchmarking or heavy-loaded multi-process execution activities on constrained computational resources.  \nIf concurrent execution of *Python functions* is required, usage of external modules is not a problem, and automatic job scheduling for the in-RAM computations is not necessary, then a handier and more straightforward approach is to use the [Pebble](https://pypi.python.org/pypi/Pebble) library. Convenient transparent parallel computations are provided by [Joblib](https://pythonhosted.org/joblib/). If a distributed task queue is required with advanced monitoring and reporting facilities, then [Celery](http://www.celeryproject.org/) might be a good choice. 
For comprehensive parallel computing, [Dask](http://dask.pydata.org) is a good choice. For the parallel execution of shell scripts only, [GNU parallel](https://en.wikipedia.org/wiki/GNU_parallel) might be a good option.  \nThe only other existing open-source load balancer I am aware of with wider functionality than PyExPool (though it cannot be integrated into your Python scripts as seamlessly) is the [Slurm Workload Manager](https://slurm.schedmd.com/overview.html).\n\nThe **load balancing** is enabled when the global variables `_LIMIT_WORKERS_RAM` and `_CHAINED_CONSTRAINTS` are set, and the jobs' `.category` and relative `.size` (if known) are specified. The balancing is performed to use as much RAM and CPU resources as possible, performing in-RAM computations while meeting the specified timeout and memory constraints for each job and for the whole pool.  \nLarge jobs can be postponed for later execution with a smaller number of worker processes after completion of the smaller jobs. The number of workers is reduced automatically (balanced) during the job queue processing to meet memory constraints. It is recommended to add jobs in the order of increasing memory/time complexity if possible, to reduce the number of worker process terminations on job postponing (rescheduling).\n\nDemo of the *scheduling with memory constraints* for the worker processes:\n![mpepool_memory](images/mpepool_mem.png)\n\nDemo of the *scheduling with cache L1 maximization* for single-threaded processes on a server with cross-node CPU enumeration. 
A whole physical CPU core consisting of two hardware threads is assigned to each worker process, so the L1 cache is dedicated (not shared), but the maximal loading over all CPUs is 50%:\n![mpepool_cacheL1_1](images/mpepool_cacheL1_1.png)\n![mpepool_cacheL1_2](images/mpepool_cacheL1_2.png)\n\nDemo of the WebUI for the Jobs and Tasks tracing and profiling:\n![WebUI, Failures page (root)](images/webui.png)\nExactly the same fully functional interface is accessible from the console using [w3m](http://w3m.sourceforge.net/) or other terminal browsers:\n![WebUI Console, Failures page (root)](images/webui_console.png)\nTo explore the WebUI demo, execute the following testcase\n```sh\n$ MANUAL=1 python -m unittest mpetests.TestWebUI.test_failures\n```\nand open http://localhost:8081 (or :8080) in the browser.\n\n## Installation\n\nInclude the following modules:\n- [mpepool](mpepool.py)  - execution pool with load balancer, the only mandatory module,\n- [mpewui](mpewui.py)  - optional WebUI for the interactive profiling of the scheduled Jobs and Tasks.\n\nThese modules can be installed either manually from [GitHub](https://github.com/eXascaleInfolab/PyExPool) or from the [pypi repository](https://pypi.org/project/pyexpool/):\n```sh\n$ pip install pyexpool\n```\n\u003e The WebUI (`mpewui` module) renders the interface from the bottle HTML templates located in `.`, `./views/` or any other folder from the `bottle.TEMPLATE_PATH` list, where custom views can be placed to override the default pages.\n\nAdditionally, [hwloc / lstopo](http://www.admin-magazine.com/HPC/Articles/hwloc-Which-Processor-Is-Running-Your-Service) should be installed if customized CPU affinity masking and cache control are required; see the [Requirements](#requirements) section.\n\n\n## Requirements\n\nMulti-Process Execution Pool *can be run without any external modules* with automatically disabled load balancing.  
\nThe external modules / apps are required only for the extended functionality:\n- Platform-specific requirements:\n  - [hwloc](http://www.admin-magazine.com/HPC/Articles/hwloc-Which-Processor-Is-Running-Your-Service) (includes `lstopo`) is required to identify enumeration type of logical CPUs to perform correct CPU affinity masking. Required only for the automatic affinity masking with cache usage optimization and only if the CPU enumeration type is not specified manually.\n\t```sh\n\t$ sudo apt-get install -y hwloc\n\t```\n- Cross-platform Python requirements:\n\t- [psutil](https://pypi.python.org/pypi/psutil) is required for the dynamic jobs balancing to perform the in-RAM computations (`_LIMIT_WORKERS_RAM = True`) and limit memory consumption of the workers.\n\t\t```sh\n\t\t$ sudo pip install psutil\n\t\t```\n\t\t\u003e To perform in-memory computations dedicating almost all available RAM (specifying *memlimit ~= physical memory*), it is recommended to set swappiness to 1 .. 10: `$ sudo sysctl -w vm.swappiness=5` or set it permanently in `/etc/sysctl.conf`: `vm.swappiness = 5`.\n\t- [bottle](http://bottlepy.org) is required for the minimalistic optional WebUI to monitor executing jobs.\n\t\t```sh\n\t\t$ sudo pip install bottle\n\t\t```\n\t\t\u003e WebUI(`mpewui` module) renders interface from the bottle html templates located in the `.`, `./views/` or any other folder from the `bottle.TEMPLATE_PATH` list, where custom views can be placed to overwrite the default pages.\n\t\t\n\t- [mock](https://pypi.python.org/pypi/mock) is required exclusively for the unit testing under Python2, `mock` is included in the standard lib of Python3.\n\t\t```sh\n\t\t$ sudo pip install mock\n\t\t```\n\nAll Python requirements are optional and installed automatically from the `pip` distribution (`$ pip install pyexpool`) or  can be installed manually from the `pyreqsopt.txt` file:\n```sh\n$ sudo pip install -r pyreqsopt.txt\n```\n\u003e `lstopo` app of `hwloc` package is a system 
requirement and should be installed manually from the system-specific package repository or built from the [sources](https://www.open-mpi.org/projects/hwloc/).\n\n\n## API\n\nThe flexible API provides *automatic CPU affinity management, maximization of the dedicated CPU cache, limitation of the minimal dedicated RAM per worker process, balancing of the worker processes and rescheduling of chains of the related jobs on low memory condition for the in-RAM computations*, optional automatic restart of jobs on timeout, access to the job's process, parent task, start and stop execution time and more...  \n`ExecPool` represents a pool of worker processes to execute `Job`s that can be grouped into a hierarchy of `Task`s for more flexible management.\n\n\n### Job\n\n```python\n# Global Parameters\n# Limit the amount of memory (\u003c= RAM) used by worker processes\n# NOTE: requires import of psutil\n_LIMIT_WORKERS_RAM = True\n\n# Use chained constraints (timeout and memory limitation) in jobs to terminate\n# also related worker processes and/or reschedule jobs, which have the same\n# category and are heavier than the origin job violating the constraints\n_CHAINED_CONSTRAINTS = True\n\n\nJob(name, workdir=None, args=(), timeout=0, rsrtonto=False, task=None #,*\n\t, startdelay=0., onstart=None, ondone=None, onfinish=None, params=None, category=None, size=0, slowdown=1.\n\t, omitafn=False, memkind=1, memlim=0., stdout=sys.stdout, stderr=sys.stderr, poutlog=None, perrlog=None):\n\t\"\"\"Initialize job to be executed\n\n\tJob is executed in a separate process via Popen or Process object and is\n\tmanaged by the Process Pool Executor\n\n\t\tMain parameters:\n\t\tname: str  - job name\n\t\tworkdir  - working directory for the corresponding process, None means the dir of the benchmarking\n\t\targs  - execution arguments including the executable itself for the process\n\t\t\tNOTE: can be None to make a stub process and execute the callbacks\n\t\ttimeout  - execution timeout in seconds. 
Default: 0, means infinity\n\t\trsrtonto  - restart the job on timeout, Default: False. Can be used for\n\t\t\tnon-deterministic Jobs like generation of synthetic networks, to regenerate\n\t\t\tthe network on border cases, overcoming getting stuck on specific values of the random variables.\n\t\ttask: Task  - origin task if this job is a part of the task\n\t\tstartdelay  - delay after the job process is started,\n\t\t\texecuted in the CONTEXT OF THE CALLER (main process).\n\t\t\tATTENTION: should be small (0.1 .. 1 sec)\n\t\tonstart  - a callback, which is executed on the job starting (before the execution\n\t\t\tstarted) in the CONTEXT OF THE CALLER (main process) with the single argument,\n\t\t\tthe job. Default: None.\n\t\t\tIf onstart() raises an exception then the job is completed before being started (.proc = None),\n\t\t\treturning the error code (can be 0) and tracing the cause to stderr.\n\t\t\tATTENTION: must be lightweight\n\t\t\tNOTE:\n\t\t\t\t- It can be executed several times if the job is restarted on timeout\n\t\t\t\t- Most of the runtime job attributes are not defined yet\n\t\tondone  - a callback, which is executed on successful completion of the job in the\n\t\t\tCONTEXT OF THE CALLER (main process) with the single argument, the job. Default: None\n\t\t\tATTENTION: must be lightweight\n\t\tonfinish  - a callback, which is executed on either completion or termination of the job in the\n\t\t\tCONTEXT OF THE CALLER (main process) with the single argument, the job. 
Default: None\n\t\t\tATTENTION: must be lightweight\n\t\tparams  - additional parameters to be used in callbacks\n\t\tstdout  - None or file name or PIPE for the buffered output to be APPENDED.\n\t\t\tThe path is interpreted in the CONTEXT of the CALLER\n\t\tstderr  - None or file name or PIPE or STDOUT for the unbuffered error output to be APPENDED\n\t\t\tATTENTION: PIPE is a buffer in RAM, so do not use it if the output data is huge or unlimited.\n\t\t\tThe path is interpreted in the CONTEXT of the CALLER\n\t\tpoutlog: str  - file name to log non-empty piped stdout prepended with the timestamp. Relevant only if stdout is PIPE.\n\t\tperrlog: str  - file name to log non-empty piped stderr prepended with the timestamp. Relevant only if stderr is PIPE.\n\n\t\tScheduling parameters:\n\t\tomitafn  - omit affinity policy of the scheduler, which is relevant when the affinity is enabled\n\t\t\tand the process has multiple threads\n\t\tcategory  - classification category, typically semantic context or part of the name,\n\t\t\tused to identify related jobs;\n\t\t\trequires _CHAINED_CONSTRAINTS\n\t\tsize  - expected relative memory complexity of the jobs of the same category,\n\t\t\ttypically it is the size of the processed data, \u003e= 0, 0 means undefined size\n\t\t\tand prevents jobs chaining on constraints violation;\n\t\t\tused on _LIMIT_WORKERS_RAM or _CHAINED_CONSTRAINTS\n\t\tslowdown  - execution slowdown ratio, \u003e= 0, where (0, 1) - speedup, \u003e 1 - slowdown; 1 by default;\n\t\t\tused for the accurate timeout estimation of the jobs having the same .category and .size.\n\t\t\trequires _CHAINED_CONSTRAINTS\n\t\tmemkind  - kind of memory to be evaluated (average of virtual and resident memory\n\t\t\tto not overestimate the instant potential consumption of RAM):\n\t\t\t0  - mem for the process itself omitting the spawned sub-processes (if any)\n\t\t\t1  - mem for the heaviest process of the process tree spawned by the original process\n\t\t\t\t(including the origin 
itself)\n\t\t\t2  - mem for the whole spawned process tree including the origin process\n\t\tmemlim: float  - max amount of memory in GB allowed for the job execution, 0 - unlimited\n\n\t\tExecution parameters, initialized automatically on execution:\n\t\ttstart  - start time, filled automatically on the execution start (before onstart). Default: None\n\t\ttstop  - termination / completion time after ondone\n\t\t\tNOTE: onstart() and ondone() callbacks execution is included in the job execution time\n\t\tproc  - process of the job, can be used in the ondone() to read its PIPE\n\t\tpipedout  - contains output from the PIPE supplied to stdout if any, None otherwise\n\t\t\tNOTE: pipedout is used to avoid a deadlock waiting on the process completion having a piped stdout\n\t\t\thttps://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait\n\t\tpipederr  - contains output from the PIPE supplied to stderr if any, None otherwise\n\t\t\tNOTE: pipederr is used to avoid a deadlock waiting on the process completion having a piped stderr\n\t\t\thttps://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait\n\t\tmem  - consuming memory (smooth max of average of VMS and RSS, not just the current value)\n\t\t\tor the least expected value inherited from the jobs of the same category having non-smaller size;\n\t\t\trequires _LIMIT_WORKERS_RAM\n\t\tterminates  - accumulated number of the received termination requests caused by the constraints violation\n\t\t\tNOTE: \u003e 0 (1 .. 
ExecPool._KILLDELAY) for the apps terminated by the execution pool\n\t\t\t\t(resource constraints violation or ExecPool exception),\n\t\t\t\t== 0 for the crashed apps\n\t\twkslim  - worker processes limit (max number) on the job postponing if any,\n\t\t\tthe job is postponed until at most this number of worker processes operate;\n\t\t\trequires _LIMIT_WORKERS_RAM\n\t\tchtermtime  - chained termination: None - disabled, False - by memory, True - by time;\n\t\t\trequires _CHAINED_CONSTRAINTS\n\t\"\"\"\n```\n\n### Task\n```python\nTask(name, timeout=0, onstart=None, ondone=None, onfinish=None, params=None\n\t, task=None, latency=1.5, stdout=sys.stdout, stderr=sys.stderr):\n\t\"\"\"Initialize task, which is a group of subtasks including jobs to be executed\n\n\tTask is a managing container for subtasks and Jobs.\n\tNote: the task is considered failed if at least one subtask / job has failed\n\t(terminated or completed with a non-zero return code).\n\n\tname: str  - task name\n\ttimeout  - execution timeout in seconds. Default: 0, means infinity. ATTENTION: not implemented\n\tonstart  - a callback, which is executed on the task start (before the subtasks/jobs execution\n\t\tstarted) in the CONTEXT OF THE CALLER (main process) with the single argument,\n\t\tthe task. Default: None\n\t\tATTENTION: must be lightweight\n\tondone  - a callback, which is executed on the SUCCESSFUL completion of the task in the\n\t\tCONTEXT OF THE CALLER (main process) with the single argument, the task. Default: None\n\t\tATTENTION: must be lightweight\n\tonfinish  - a callback, which is executed on either completion or termination of the task in the\n\t\tCONTEXT OF THE CALLER (main process) with the single argument, the task. 
Default: None\n\t\tATTENTION: must be lightweight\n\tparams  - additional parameters to be used in callbacks\n\ttask: Task  - optional owner super-task\n\tlatency: float  - lock timeout in seconds: None means infinite,\n\t\t\u003c= 0 means non-blocking, \u003e 0 is the actual timeout\n\tstdout  - None or file name or PIPE for the buffered output to be APPENDED\n\tstderr  - None or file name or PIPE or STDOUT for the unbuffered error output to be APPENDED\n\t\tATTENTION: PIPE is a buffer in RAM, so do not use it if the output data is huge or unlimited\n\n\tAutomatically initialized and updated properties:\n\ttstart  - start time is filled automatically on the execution start (before onstart). Default: None\n\ttstop  - termination / completion time after ondone.\n\tnumadded: uint  - the number of directly added subtasks\n\tnumdone: uint  - the number of completed DIRECT subtasks\n\t\t(each subtask may contain multiple jobs or sub-sub-tasks)\n\tnumterm: uint  - the number of terminated direct subtasks (including jobs) that are not restarting\n\t\tnumdone + numterm \u003c= numadded\n\t\"\"\"\n```\n\n### AffinityMask\n```python\nAffinityMask(afnstep, first=True, sequential=cpusequential())\n\t\"\"\"Affinity mask\n\n\tThe affinity table is the CPU table reduced by the non-primary HW threads in each core.\n\tTypically, CPUs are enumerated across the nodes:\n\tNUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30\n\tNUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31\n\tIn case the number of HW threads per core is 2 then the physical CPU cores are 0 .. 
15:\n\tNUMA node0 CPU(s):     0,2,4,6,8,10,12,14\t(16,18,20,22,24,26,28,30  - 2nd HW threads)\n\tNUMA node1 CPU(s):     1,3,5,7,9,11,13,15\t(17,19,21,23,25,27,29,31  - 2nd HW threads)\n\tBut the enumeration can also be sequential:\n\tNUMA node0 CPU(s):     0,(1),2,(3),...\n\t...\n\n\tHardware threads share all levels of the CPU cache, physical CPU cores share only the\n\tlast level of the CPU cache (L2/3).\n\tThe number of worker processes in the pool should be equal to:\n\t- the physical CPU cores for the cache L1/2 maximization\n\t- the NUMA nodes for the cache L2/3 maximization\n\n\tNOTE: the `hwloc` utility can be used to detect the type of logical CPUs enumeration:\n\t`$ sudo apt-get install hwloc`\n\tSee details: http://www.admin-magazine.com/HPC/Articles/hwloc-Which-Processor-Is-Running-Your-Service\n\n\tafnstep: int  - affinity step, integer if applied, allowed values:\n\t\t1, CORE_THREADS * n,  n E {1, 2, ... CPUS / (NODES * CORE_THREADS)}\n\n\t\tUsed to bind worker processes to the logical CPUs to have a warm cache and,\n\t\toptionally, maximize cache size per worker process.\n\t\tGroups of logical CPUs are selected in a way to maximize the cache locality:\n\t\ta single physical CPU is used, taking all its hardware threads in each core\n\t\tbefore allocating another core.\n\n\t\tTypical Values:\n\t\t1  - maximize parallelization for the single-threaded apps\n\t\t\t(the number of worker processes = logical CPUs)\n\t\tCORE_THREADS  - maximize the dedicated CPU cache L1/2\n\t\t\t(the number of worker processes = physical CPU cores)\n\t\tCPUS / NODES  - maximize the dedicated CPU cache L3\n\t\t\t(the number of worker processes = physical CPUs)\n\tfirst  - mask the first logical unit or all units in the selected group.\n\t\tOne unit per group maximizes the dedicated CPU cache for the\n\t\tsingle-threaded worker, all units should be used for the multi-threaded\n\t\tapps.\n\tsequential  - sequential or cross-node enumeration of the CPUs in the NUMA nodes:\n\t\tNone  
- undefined, interpreted as cross-nodes (the most widely used on servers)\n\t\tFalse  - cross-nodes\n\t\tTrue  - sequential\n\n\t\tFor two hardware threads per physical CPU core, where secondary HW threads\n\t\tare taken in brackets:\n\t\tCross-nodes enumeration, often used for the server CPUs\n\t\tNUMA node0 CPU(s):     0,2(,4,6)\n\t\tNUMA node1 CPU(s):     1,3(,5,7)\n\t\tSequential enumeration, often used for the laptop CPUs\n\t\tNUMA node0 CPU(s):     0(,1),2(,3)\n\t\tNUMA node1 CPU(s):     4(,5),6(,7)\n\t\"\"\"\n```\n\n### ExecPool\n```python\nExecPool(wksnum=max(cpu_count()-1, 1), afnmask=None, memlimit=0., latency=0., name=None, webuiapp=None)\n\t\"\"\"Multi-process execution pool of jobs\n\n\tA worker in the pool executes only a single job; a new worker is created for\n\teach subsequent job.\n\n\twksnum: int  - number of resident worker processes, \u003e=1. The reasonable\n\t\tvalue \u003c= logical CPUs (returned by cpu_count()) = NUMA nodes * node CPUs,\n\t\twhere node CPUs = CPU cores * HW threads per core.\n\t\tThe recommended value is max(cpu_count() - 1, 1) to leave one logical\n\t\tCPU for the benchmarking framework and OS applications.\n\n\t\tTo guarantee minimal average RAM per process, for example 2.5 GB\n\t\twithout the _LIMIT_WORKERS_RAM flag (not using psutil for the dynamic\n\t\tcontrol of memory consumption):\n\t\t\twksnum = min(cpu_count(), max(ramfracs(2.5), 1))\n\tafnmask  - affinity mask for the worker processes, AffinityMask\n\t\tNone if not applied\n\tmemlimit  - limit the total amount of memory (automatically reduced to\n\t\tthe amount of physical RAM if a larger value is specified) in gigabytes\n\t\tthat can be used by worker processes to provide in-RAM computations, \u003e= 0.\n\t\tDynamically reduces the number of workers to consume no more memory\n\t\tthan specified. 
The workers are rescheduled starting from the\n\t\tmost memory-heavy processes.\n\t\tNOTE:\n\t\t\t- applicable only if _LIMIT_WORKERS_RAM\n\t\t\t- 0 means unlimited (some jobs might be [partially] swapped)\n\t\t\t- value \u003e 0 is automatically limited with the total physical RAM to process\n\t\t\t\tjobs in RAM almost without swapping\n\tlatency  - approximate minimal latency of the workers monitoring in sec, float \u003e= 0;\n\t\t0 means automatically defined value (recommended, typically 2-3 sec)\n\tname  - name of the execution pool to distinguish traces from subsequently\n\t\tcreated execution pools (only on creation or termination)\n\twebuiapp: WebUiApp  - WebUI app to inspect the load balancer remotely\n\n\tInternal attributes:\n\talive  - whether the execution pool is alive or terminating, bool.\n\t\tNOTE: should be reset to True if the execution pool is reused\n\t\tafter the joining or termination.\n\tfailures: [JobInfo]  - failed (terminated or crashed) jobs with timestamps.\n\t\tNOTE: failures contain terminated and crashed jobs as well as jobs completed with a non-zero return code,\n\t\texcluding the jobs terminated by timeout that have .rsrtonto set (will be restarted)\n\tjobsdone: uint  - the number of successfully completed (non-terminated) jobs with zero code\n\ttasks: set(Task)  - tasks associated with the scheduled jobs\n\t\"\"\"\n\n\texecute(job, concur=True):\n\t\t\"\"\"Schedule the job for the execution\n\n\t\tjob: Job  - the job to be executed, instance of Job\n\t\tconcur: bool  - concurrent execution or wait until execution completed\n\t\t\t NOTE: concurrent tasks are started at once\n\t\treturn int  - 0 on successful execution, process return code otherwise\n\t\t\"\"\"\n\n\tjoin(timeout=0):\n\t\t\"\"\"Execution cycle\n\n\t\ttimeout: int  - execution timeout in seconds before the workers termination, \u003e= 0.\n\t\t\t0 means unlimited time. 
The time is measured SINCE the first job\n\t\t\twas scheduled UNTIL the completion of all scheduled jobs.\n\t\treturn bool  - True on graceful completion, False on termination by the specified\n\t\t\tconstraints (timeout, memory limit, etc.)\n\t\t\"\"\"\n\n\tclear():\n\t\t\"\"\"Clear execution pool to reuse it\n\n\t\tRaises:\n\t\t\tValueError: attempt to clear a terminating execution pool\n\t\t\"\"\"\n\n\t__del__():\n\t\t\"\"\"Force termination of the pool\"\"\"\n\n\t__finalize__():\n\t\t\"\"\"Force termination of the pool\"\"\"\n```\n\n### Optional WebUi\n\nA simple Web UI is designed to profile Jobs and Tasks, interactively trace their failures and resource consumption. It is implemented in the optional module [mpewui](mpewui.py) and can be spawned by instantiating the `WebUiApp` class. A dedicated `WebUiApp` instance can be created per each `ExecPool`, serving the interfaces on the dedicated addresses (host:port). However, typically, a *single global instance of `WebUiApp` is created and supplied to all employed `ExecPool` instances*.  \nWeb UI module requires HTML templates installed by default from the `pip` distribution, which can be overwritten with the custom pages located in the [views](./views/) directory.\n\nSee [WebUI queries manual](views/restapi.md) for API details. An example of the WebUI usage is shown in the `mpetests.TestWebUI.test_failures` of the [mpetests](mpetests.py).\n\u003c!-- webui.md#webui-queries --\u003e\n\u003e `WebUiApp` instance works in the *dedicated thread* of the load balancer application and designed for the internal profiling with relatively small number of queries but not as a public web interface for the huge number of clients.  
\n\u003e *WARNING: a high load on the WebUI may increase the latency of the load balancer.*\n\n#### WebUiApp\n\n```python\nWebUiApp(host='localhost', port=8080, name=None, daemon=None, group=None, args=(), kwargs={})\n\t\"\"\"WebUI App starting in the dedicated thread and providing a remote interface to inspect ExecPool\n\n\tATTENTION: Once constructed, the WebUI App lives in the dedicated thread until the main program exit.\n\n\tArgs:\n\t\tuihost: str  - Web UI host\n\t\tuiport: uint16  - Web UI port\n\t\tname: str  - The thread name. By default, a unique name\n\t\t\tis constructed of the form “Thread-N” where N is a small decimal number.\n\t\tdaemon: bool  - Start the thread in the daemon mode to\n\t\t\tbe automatically terminated on the main app exit.\n\t\tgroup  - Reserved for future extension\n\t\t\twhen a ThreadGroup class is implemented.\n\t\targs: tuple  - The argument tuple for the target invocation.\n\t\tkwargs: dict  - A dictionary of keyword arguments for the target invocation.\n\n\tInternal attributes:\n\t\tcmd: UiCmd  - UI command to be executed, which includes (reserved) attribute(s) for the invocation result.\n\t\"\"\"\n```\n\n#### UiCmd\n```python\nUiCmdId = IntEnum('UiCmdId', 'FAILURES LIST_JOBS LIST_TASKS API_MANUAL')\n\"\"\"UI Command Identifier associated with the REST URL\"\"\"\n```\n\n\n### Accessory Routines\n```python\ndef ramfracs(fracsize):\n\t\"\"\"Evaluate the minimal number of RAM fractions of the specified size in GB\n\n\tUsed to estimate the reasonable number of processes with the specified minimal\n\tdedicated RAM.\n\n\tfracsize  - minimal size of each fraction in GB, can be a fractional number\n\treturn the minimal number of RAM fractions having the specified size in GB\n\t\"\"\"\n\ndef cpucorethreads():\n\t\"\"\"The number of hardware threads per CPU core\n\n\tUsed to specify CPU affinity dedicating the maximal amount of CPU cache L1/2.\n\t\"\"\"\n\ndef cpunodes():\n\t\"\"\"The number of NUMA nodes, where CPUs are located\n\n\tUsed to 
evaluate CPU index from the affinity table index considering the NUMA architecture.\n\t\"\"\"\n\t\ndef cpusequential():\n\t\"\"\"Enumeration type of the logical CPUs: cross-nodes or sequential\n\n\tThe enumeration can be cross-nodes starting with one hardware thread per each\n\tNUMA node, or sequential by enumerating all cores and hardware threads in each\n\tNUMA node first.\n\tFor two hardware threads per a physical CPU core, where secondary hw threads\n\tare taken in brackets:\n\t\tCrossnodes enumeration, often used for the server CPUs\n\t\tNUMA node0 CPU(s):     0,2(,4,6)\t\t=\u003e PU L#1 (P#4)\n\t\tNUMA node1 CPU(s):     1,3(,5,7)\n\t\tSequential enumeration, often used for the laptop CPUs\n\t\tNUMA node0 CPU(s):     0(,1),2(,3)\t\t=\u003e PU L#1 (P#1)  - indicates sequential\n\t\tNUMA node1 CPU(s):     4(,5),6(,7)\n\tATTENTION: `hwloc` utility is required to detect the type of logical CPUs\n\tenumeration:  `$ sudo apt-get install hwloc`\n\tSee details: http://www.admin-magazine.com/HPC/Articles/hwloc-Which-Processor-Is-Running-Your-Service\n\n\treturn  - enumeration type of the logical CPUs, bool or None:\n\t\tNone  - was not defined, most likely cross-nodes\n\t\tFalse  - cross-nodes\n\t\tTrue  - sequential\n\t\"\"\"\n```\n\n\n## Usage\n\nTarget version of the Python is 2.7+ including 3.x, also works fine on PyPy.\n\nThe workflow consists of the following steps:\n\n1. Create Execution Pool.\n2. Create and schedule Jobs with required parameters, callbacks and optionally packing them into Tasks.\n3. 
Wait on Execution pool until all the jobs are completed or terminated, or until the global timeout is elapsed.\n\nSee [unit tests](mpetests.py) (`TestExecPool`, `TestProcMemTree`, `TestTasks` classes) for the advanced examples.\n\n\n### Usage Example\n```python\nfrom multiprocessing import cpu_count\nfrom sys import executable as PYEXEC  # Full path to the current Python interpreter\nfrom mpepool import AffinityMask, ExecPool, Job, Task  # Import all required classes\n\n# 1. Create Multi-process execution pool with the optimal affinity step to maximize the dedicated CPU cache size\nexecpool = ExecPool(max(cpu_count() - 1, 1), cpucorethreads())\nglobal_timeout = 30 * 60  # 30 min, timeout to execute all scheduled jobs or terminate them\n\n\n# 2. Schedule jobs execution in the pool\n\n# 2.a Job scheduling using external executable: \"ls -la\"\nexecpool.execute(Job(name='list_dir', args=('ls', '-la')))\n\n\n# 2.b Job scheduling using python function / code fragment,\n# which is not a goal of the design, but is possible.\n\n# 2.b.1 Create the job with specified parameters\njobname = 'NetShuffling'\njobtimeout = 3 * 60  # 3 min\n\n# The network shuffling routine to be scheduled as a job,\n# which can also be a call of any external executable (see 2.alt below)\nargs = (PYEXEC, '-c',\n\"\"\"import os\nimport subprocess\n\nbasenet = '{jobname}' + '{_EXTNETFILE}'\n#print('basenet:', basenet, file=sys.stderr)\nfor i in range(1, {shufnum} + 1):\n\tnetfile = ''.join(('{jobname}', '.', str(i), '{_EXTNETFILE}'))\n\tif {overwrite} or not os.path.exists(netfile):\n\t\t# sort -R pgp_udir.net -o pgp_udir_rand3.net\n\t\tsubprocess.call(('sort', '-R', basenet, '-o', netfile))\n\"\"\".format(jobname=jobname, _EXTNETFILE='.net', shufnum=5, overwrite=False))\n\n# 2.b.2 Schedule the job execution, which might be postponed\n# if there are no any free executor processes available\nexecpool.execute(Job(name=jobname, workdir='this_sub_dir', args=args, timeout=jobtimeout\n\t# Note: 
onstart/ondone callbacks, custom parameters and others can be also specified here!\n))\n\n# Add another jobs\n# ...\n\n\n# 3. Wait for the jobs execution for the specified timeout at most\nexecpool.join(global_timeout)  # 30 min\n```\n\nIn case the execution pool is required locally then it can be used in the following way:\n```python\n...\n# Limit of the memory consumption for the all worker processes with max(32 GB, RAM)\n# and provide latency of 1.5 sec for the jobs rescheduling\nwith ExecPool(max(cpu_count()-1, 1), vmlimit=32, latency=1.5) as xpool:\n\tjob = Job('jmem_proc', args=(PYEXEC, '-c', TestProcMemTree.allocAndSpawnProg(\n\t\tallocDelayProg(inBytes(amem), duration), allocDelayProg(inBytes(camem), duration)))\n\t\t, timeout=timeout, memkind=0, ondone=mock.MagicMock())\n\tjobx = Job('jmem_max-subproc', args=(PYEXEC, '-c', TestProcMemTree.allocAndSpawnProg(\n\t\tallocDelayProg(inBytes(amem), duration), allocDelayProg(inBytes(camem), duration)))\n\t\t, timeout=timeout, memkind=1, ondone=mock.MagicMock())\n\t...\n\txpool.execute(job)\n\txpool.execute(jobx)\n\t...\n\txpool.join(10)  # Timeout for the execution of all jobs is 10 sec [+latency]\n```\nThe code shown above is fetched from the [`TestProcMemTree` unit test](mpetests.py).\n\n\n### Failsafe Termination\nTo perform *graceful termination* of the Jobs in case of external termination of your program, signal handlers can be set:\n```python\nimport signal  # Intercept kill signals\n\n# Use execpool as a global variable, which is set to None when all jobs are done,\n# and recreated on jobs scheduling\nexecpool = None\n\ndef terminationHandler(signal=None, frame=None, terminate=True):\n\t\"\"\"Signal termination handler\n\n\tsignal  - raised signal\n\tframe  - origin stack frame\n\tterminate  - whether to terminate the application\n\t\"\"\"\n\tglobal execpool\n\n\tif execpool:\n\t\tdel execpool  # Destructors are called later\n\t\t# Define _execpool to avoid unnecessary trash in the error log, which 
might\n\t\t# be caused by the attempt of subsequent deletion on destruction\n\t\texecpool = None  # Note: otherwise _execpool becomes undefined\n\tif terminate:\n\t\tsys.exit()  # exit(0), 0 is the default exit code.\n\n# Set handlers of external signals, which can be the first lines inside\n# if __name__ == '__main__':\nsignal.signal(signal.SIGTERM, terminationHandler)\nsignal.signal(signal.SIGHUP, terminationHandler)\nsignal.signal(signal.SIGINT, terminationHandler)\nsignal.signal(signal.SIGQUIT, terminationHandler)\nsignal.signal(signal.SIGABRT, terminationHandler)\n\n# Ignore terminated children procs to avoid zombies\n# ATTENTION: signal.SIG_IGN affects the return code of the former zombie resetting it to 0,\n# where signal.SIG_DFL works fine and without any the side effects.\nsignal.signal(signal.SIGCHLD, signal.SIG_DFL)\n\n# Define execpool to schedule some jobs\nexecpool = ExecPool(max(cpu_count() - 1, 1))\n\n# Failsafe usage of execpool ...\n```\nAlso it is recommended to register the termination handler for the normal interpreter termination using [**atexit**](https://docs.python.org/2/library/atexit.html):\n```python\nimport atexit\n...\n# Set termination handler for the internal termination\natexit.register(terminationHandler, terminate=False)\n```\n\n**Note:** Please, [star this project](//github.com/eXascaleInfolab/PyExPool) if you use it.\n\n\n## Related Projects\n- [ExecTime](https://bitbucket.org/lumais/exectime)  -  *failover* lightweight resource consumption profiler (*timings and memory*), applicable to multiple processes with optional *per-process results labeling* and synchronized *output to the specified file* or `stderr`\n- [PyCABeM](https://github.com/eXascaleInfolab/PyCABeM) - Python Benchmarking Framework for the Clustering Algorithms Evaluation. 
Uses extrinsic (NMIs) and intrinsic (Q) measures for the clusters quality evaluation considering overlaps (nodes membership by multiple clusters).\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FeXascaleInfolab%2FPyExPool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FeXascaleInfolab%2FPyExPool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FeXascaleInfolab%2FPyExPool/lists"}