{"id":16907918,"url":"https://github.com/cjrh/coroexecutor","last_synced_at":"2025-03-23T17:30:43.142Z","repository":{"id":51132168,"uuid":"216823621","full_name":"cjrh/coroexecutor","owner":"cjrh","description":"A CoroutineExecutor for asyncio, similar to nurseries and task groups","archived":false,"fork":false,"pushed_at":"2022-08-20T12:34:31.000Z","size":35,"stargazers_count":13,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-18T22:09:13.476Z","etag":null,"topics":["asyncio","coroutines","taskgroup","taskgroups"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cjrh.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["cjrh"]}},"created_at":"2019-10-22T13:41:28.000Z","updated_at":"2023-05-18T00:39:01.000Z","dependencies_parsed_at":"2022-08-26T03:25:05.464Z","dependency_job_id":null,"html_url":"https://github.com/cjrh/coroexecutor","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cjrh%2Fcoroexecutor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cjrh%2Fcoroexecutor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cjrh%2Fcoroexecutor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cjrh%2Fcoroexecutor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cjrh","download_url":"https://codeload.github.com/cjrh/coroexecutor/tar.gz/refs/hea
ds/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245140704,"owners_count":20567431,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asyncio","coroutines","taskgroup","taskgroups"],"created_at":"2024-10-13T18:49:20.283Z","updated_at":"2025-03-23T17:30:42.878Z","avatar_url":"https://github.com/cjrh.png","language":"Python","funding_links":["https://github.com/sponsors/cjrh"],"categories":[],"sub_categories":[],"readme":".. image:: https://github.com/cjrh/coroexecutor/workflows/Python%20application/badge.svg\n    :target: https://github.com/cjrh/coroexecutor/actions\n\n.. image:: https://img.shields.io/badge/stdlib--only-yes-green.svg\n    :target: https://img.shields.io/badge/stdlib--only-yes-green.svg\n\n.. image:: https://coveralls.io/repos/github/cjrh/coroexecutor/badge.svg?branch=master\n    :target: https://coveralls.io/github/cjrh/coroexecutor?branch=master\n\n.. image:: https://img.shields.io/pypi/pyversions/coroexecutor.svg\n    :target: https://pypi.python.org/pypi/coroexecutor\n\n.. image:: https://img.shields.io/github/tag/cjrh/coroexecutor.svg\n    :target: https://img.shields.io/github/tag/cjrh/coroexecutor.svg\n\n.. image:: https://img.shields.io/badge/install-pip%20install%20coroexecutor-ff69b4.svg\n    :target: https://img.shields.io/badge/install-pip%20install%20coroexecutor-ff69b4.svg\n\n.. image:: https://img.shields.io/pypi/v/coroexecutor.svg\n    :target: https://img.shields.io/pypi/v/coroexecutor.svg\n\n.. 
image:: https://img.shields.io/badge/calver-YYYY.MM.MINOR-22bfda.svg\n    :target: http://calver.org/\n\n.. warning::\n    This is alpha. Please don't rely on this in a production\n    setting yet. I will remove this warning when it is ready.\n\ncoroexecutor\n============\n\n.. contents::\n    :local:\n    :depth: 2\n    :backlinks: top\n\nProvides an ``Executor`` interface for running a group of coroutines\ntogether in asyncio-native applications.\n\nDemo\n----\n\n.. code-block:: python3\n\n    import asyncio\n    from coroexecutor import CoroutineExecutor\n\n    async def f(dt, msg=''):\n        await asyncio.sleep(dt)\n        print(f'completion message: {msg}')\n\n    async def main():\n        async with CoroutineExecutor(max_workers=10) as exe:\n            t1 = await exe.submit(f, 0.01, msg=\"task 1\")\n            t2 = await exe.submit(f, 0.05, msg=\"task 2\")\n\n        assert t1.done()\n        assert t2.done()\n\n    asyncio.run(main())\n\n``max_workers`` controls how many submitted jobs can run concurrently.\nThese internal workers are lightweight of course, they're just\n``asyncio.Task`` instances. Millions of jobs can be pushed through\nthe executor. As is normal for asyncio, concurrency requires\nthat these jobs be IO-bound, and the upper bound for setting\n``max_workers`` is mainly going to depend on your CPU and RAM resources.\n\nDiscussion\n----------\n\nThe ``CoroutineExecutor`` context manager works very much like\nthe ``Executor`` implementations in the ``concurrent.futures``\npackage in the standard library. This is the intention of\nthis package. 
The basic components of the interface are:\n\n- The executor applies a context over the creation of jobs\n- Jobs are submitted to the executor\n- All jobs must be complete when the context manager for the executor exits\n\nAfter creating a context manager using ``CoroutineExecutor``, the two\nmain features are the ``submit()`` method and the ``map()`` method.\n\nIt is impossible to *exactly* match the ``Executor`` interface in the\n``concurrent.futures`` package because some functions in this interface\nneed to be ``async`` functions. But we can get close; certainly close\nenough that a user with experience using the ``ThreadPoolExecutor`` or\n``ProcessPoolExecutor`` should be able to figure things out pretty quickly.\n\nThere is a great deal of complexity that can arise. The \"happy path\" is\nsimple. You just submit jobs to the executor, and they will get\nexecuted accordingly. But there are many corner cases:\n\n- asyncio can execute thousands, or even tens of thousands\n  of (IO-bound) jobs concurrently. But how to handle more, say, millions\n  of jobs?\n- If one job raises an exception, how to terminate all the other jobs?\n  In the CTRL-C case, this is desired, but what about other cases? Do\n  you always want a single task failure (with an unexpected exception)\n  to cancel the entire batch? 
And is there a difference between\n  a job raising ``CancelledError`` versus raising some other kind of\n  exception?\n- The ``CoroutineExecutor`` provides a context manager API: if\n  some code within the body of the context manager (that is not a task)\n  raises an exception, should all the submitted tasks also\n  be cancelled?\n\nEach of these will be discussed in more detail in the sections\nthat follow.\n\nThrottling, using ``max_workers``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEven though it is possible to concurrently execute a much larger number\nof (IO-bound) tasks with asyncio compared to threads or processes, there\nwill still be an upper limit the machine can handle based on either:\n\n- memory limitations: too many task object instances\n- CPU limitations: too many concurrent task objects and events for the event loop to process.\n\nThus, we also have a ``max_workers`` setting to limit concurrency. It might\nnot be obvious how that limitation is applied, say, in the scenario of\nmillions of jobs.\n\n``CoroutineExecutor.submit()`` is an ``async def`` method. This means\nthat you will have to await it, like so:\n\n.. code-block:: python3\n\n    import asyncio\n    from coroexecutor import CoroutineExecutor\n\n    async def f():\n        print('hi!')\n\n    async def main():\n        async with CoroutineExecutor(max_workers=10) as exe:\n            t1 = await exe.submit(f)\n\n    asyncio.run(main())\n\nIf the number of concurrently-running jobs is less than ``max_workers``,\nthe call to ``await exe.submit()`` will return immediately: the job will\nbegin executing, and ``submit()`` returns an ``asyncio.Task`` instance\nfor that job. However, if the number of concurrently-running jobs\nhas reached the ``max_workers`` setting, this call will wait until\nthe number of currently-running jobs drops below the threshold before\nadding the new job. 
This means that ``submit()`` applies *back-pressure*.\n\nSay you have a file containing ten million URLs that you want to fetch\nusing aiohttp. That program might look something like this:\n\n.. code-block:: python3\n\n    import asyncio, aiohttp\n    from coroexecutor import CoroutineExecutor\n\n    async def fetch(url: str):\n        try:\n            async with aiohttp.ClientSession() as session:\n                async with session.get(url) as response:\n                    # text() is a coroutine, so it must be awaited\n                    print('body:', await response.text())  # or whatever\n        except Exception:\n            print('Problem with url:', url)\n\n    async def main():\n        async with CoroutineExecutor(max_workers=10000) as exe:\n            for line in open('urls.txt'):\n                # strip the trailing newline from each line\n                await exe.submit(fetch, line.strip())\n\n    asyncio.run(main())\n\nAssuming it takes 3 seconds to fetch a single url, this program\nshould take around (1e7 / 1e4) * 3 =\u003e 3000 seconds to fetch all of them.\nAbout 50 minutes, since even though there are 10 million urls, we're\ndoing 10k concurrently. (In practice, some of the endpoints will be\nvery slow to respond, if they respond at all. So for real code you're\ngoing to want to either use aiohttp facilities for timeouts on the\n``.get()``, or wrap the work inside an ``asyncio.wait_for()`` wrapper.)\n\nNote that we're handling errors inside our job function ``fetch()``.\nBy default, if jobs raise exceptions these will cancel all pending jobs\ninside the executor, and shut it down. 
For long batch jobs, that may\nnot be what we want, and this is discussed next.\n\nDealing with errors and cancellation\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nGenerally, there are these kinds of error situations:\n\n- A job is cancelled, and you want the executor to be shut down\n- A job is cancelled, and the executor must NOT be shut down\n- A job raises an exception (not ``CancelledError``), and\n  you want the executor to shut down\n- A job raises an exception (not ``CancelledError``), and the\n  executor must NOT be shut down\n\nConsider the previous example using aiohttp to fetch URLs: inside\nthe ``fetch()`` function, we're handling ``Exception``, which on\nPython 3.7 and earlier includes ``asyncio.CancelledError`` (since\nPython 3.8 it derives from ``BaseException`` and must be caught\nseparately). In general, handling errors inside the job is the\ncorrect thing to do because you can control what happens in\neach of the scenarios presented above. But what happens\nif your code is not supplying the jobs and you don't control\nhow error handling inside them is being managed? By default,\nif any job raises an exception (cancellation or otherwise)\nthat will initiate \"shutdown\" of the executor instance, and\nall other pending jobs on that executor will be cancelled.\n\nIf you have a situation where this is not desired, you can\nask ``CoroutineExecutor`` to ignore all task errors for you:\n\n.. code-block:: python3\n\n    import asyncio, aiohttp\n    from coroexecutor import CoroutineExecutor\n\n    async def naive_fetch(url: str):\n        async with aiohttp.ClientSession() as session:\n            async with session.get(url) as response:\n                # text() is a coroutine, so it must be awaited\n                print('body:', await response.text())  # or whatever\n\n    async def main():\n        async with CoroutineExecutor(\n                max_workers=10000,\n                suppress_task_errors=True,\n        ) as exe:\n            for line in open('urls.txt'):\n                # strip the trailing newline from each line\n                await exe.submit(naive_fetch, line.strip())\n\n    asyncio.run(main())\n\nIn this modified example, the job function ``naive_fetch`` has\nno error handling. 
No matter, the ``suppress_task_errors``\nparameter will allow the executor to absorb them all. Be careful\nwith this. I recommend avoiding this wherever possible and\nhandling exceptions and ``CancelledError`` explicitly within\nyour job functions instead.\n\nExamples\n--------\n\nUsing ``map``\n^^^^^^^^^^^^^\n\nThe ``concurrent.futures.Executor`` interface also defines ``map()`` which\nreturns an iterator. However, it makes more sense for us to use an\n*asynchronous generator* for this purpose. Here's an example from the tests:\n\n.. code-block:: python3\n\n    import asyncio\n    from coroexecutor import CoroutineExecutor\n\n    times = [0.01, 0.02, 0.03]\n\n    async def f(dt):\n        await asyncio.sleep(dt)\n        return dt\n\n    async def main():\n        async with CoroutineExecutor() as exe:\n            results = exe.map(f, times)\n            assert [v async for v in results] == times\n\n    asyncio.run(main())\n\nYou can see how ``async for`` is used to asynchronously loop over the\nresult from calling ``map``.\n\nIf one of the function calls raises an error, all unfinished calls will\nbe cancelled, but you may still have received partial results. Here's\nanother example from the tests:\n\n.. code-block:: python3\n\n    import asyncio\n    import pytest\n    from coroexecutor import CoroutineExecutor\n\n    times = [0.01, 0.02, 0.1, 0.2]\n    results = []\n\n    async def f(dt):\n        await asyncio.sleep(dt)\n        if dt == 0.1:\n            raise Exception('oh noes')\n        return dt\n\n    async def main():\n        async with CoroutineExecutor() as exe:\n            async for r in exe.map(f, times):\n                results.append(r)\n\n    with pytest.raises(Exception):\n        asyncio.run(main())\n\n    assert results == [0.01, 0.02]\n\nThe first two values of the batch finish quickly, and these are saved to the\n``results`` list in the outer scope. Then, one of the jobs fails with\nan exception. 
This results in the other pending jobs being cancelled (i.e.,\nthe \"0.2\" case in this example), the ``CoroutineExecutor`` instance\nre-raising the exception, and in this example, the exception propagates all\nthe way out to the invocation of the ``run()`` function itself. However,\nnote that we still have the results from jobs that succeeded.\n\nNesting\n^^^^^^^\n\nYou don't always have to submit tasks to the executor in a single function.\nThe executor instance can be passed around and work can be added to it\nfrom several different places.\n\n.. code-block:: python3\n\n    import asyncio\n    from random import random\n    from coroexecutor import CoroutineExecutor\n\n    async def f(dt):\n        await asyncio.sleep(dt)\n\n    async def producer1(executor: CoroutineExecutor):\n        # submit() is an async def method, so each call is awaited\n        await executor.submit(f, random())\n        await executor.submit(f, random())\n        await executor.submit(f, random())\n\n    async def producer2(executor: CoroutineExecutor):\n        await executor.submit(f, random())\n        await executor.submit(f, random())\n        await executor.submit(f, random())\n\n    async def main():\n        async with CoroutineExecutor(timeout=0.5) as executor:\n            await executor.submit(f, random())\n            await executor.submit(f, random())\n            await executor.submit(f, random())\n\n            await executor.submit(producer1, executor)\n            await executor.submit(producer2, executor)\n\n    asyncio.run(main())\n\nYou can not only submit jobs within the executor context manager, but also\npass the instance around and collect jobs from other functions too. And the\ntimeout set when creating the ``CoroutineExecutor`` instance will still\nbe applied.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcjrh%2Fcoroexecutor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcjrh%2Fcoroexecutor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcjrh%2Fcoroexecutor/lists"}