{"id":31531521,"url":"https://github.com/hyriver/async-retriever","last_synced_at":"2025-10-26T05:32:44.131Z","repository":{"id":38020810,"uuid":"363422978","full_name":"hyriver/async-retriever","owner":"hyriver","description":"A part of HyRiver software stack for asynchronous requests with persistent caching","archived":false,"fork":false,"pushed_at":"2025-06-25T20:11:12.000Z","size":658,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-04T23:53:38.751Z","etag":null,"topics":["async","asyncio","caching","python","requests"],"latest_commit_sha":null,"homepage":"https://docs.hyriver.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hyriver.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.rst","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["cheginit"]}},"created_at":"2021-05-01T13:57:21.000Z","updated_at":"2025-06-25T20:11:18.000Z","dependencies_parsed_at":"2023-12-04T20:00:54.322Z","dependency_job_id":"8d25b536-a14e-4953-bd56-ae90ede109bf","html_url":"https://github.com/hyriver/async-retriever","commit_stats":{"total_commits":634,"total_committers":5,"mean_commits":126.8,"dds":0.06940063091482651,"last_synced_commit":"10d40d5e1f93fedba3020848684cbe8772395049"},"previous_names":["cheginit/async_retriever"],"tags_count":29,"template":false,"template_full_name":null,"purl":"pkg:github/hyriver/async-retriever","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyri
ver%2Fasync-retriever","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyriver%2Fasync-retriever/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyriver%2Fasync-retriever/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyriver%2Fasync-retriever/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hyriver","download_url":"https://codeload.github.com/hyriver/async-retriever/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyriver%2Fasync-retriever/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278254466,"owners_count":25956604,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["async","asyncio","caching","python","requests"],"created_at":"2025-10-04T02:13:06.158Z","updated_at":"2025-10-04T02:13:07.208Z","avatar_url":"https://github.com/hyriver.png","language":"Python","funding_links":["https://github.com/sponsors/cheginit"],"categories":[],"sub_categories":[],"readme":".. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/async_retriever_logo.png\n    :target: https://github.com/hyriver/HyRiver\n\n|\n\n.. 
image:: https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77/status.svg\n    :target: https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77\n    :alt: JOSS\n\n|\n\n.. |pygeohydro| image:: https://github.com/hyriver/pygeohydro/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/pygeohydro/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |pygeoogc| image:: https://github.com/hyriver/pygeoogc/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/pygeoogc/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |pygeoutils| image:: https://github.com/hyriver/pygeoutils/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/pygeoutils/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |pynhd| image:: https://github.com/hyriver/pynhd/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/pynhd/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |py3dep| image:: https://github.com/hyriver/py3dep/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/py3dep/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |pydaymet| image:: https://github.com/hyriver/pydaymet/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/pydaymet/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |pygridmet| image:: https://github.com/hyriver/pygridmet/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/pygridmet/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |pynldas2| image:: https://github.com/hyriver/pynldas2/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/pynldas2/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. 
|async| image:: https://github.com/hyriver/async-retriever/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/async-retriever/actions/workflows/test.yml\n    :alt: Github Actions\n\n.. |signatures| image:: https://github.com/hyriver/hydrosignatures/actions/workflows/test.yml/badge.svg\n    :target: https://github.com/hyriver/hydrosignatures/actions/workflows/test.yml\n    :alt: Github Actions\n\n================ ====================================================================\nPackage          Description\n================ ====================================================================\nPyNHD_           Navigate and subset NHDPlus (MR and HR) using web services\nPy3DEP_          Access topographic data through National Map's 3DEP web service\nPyGeoHydro_      Access NWIS, NID, WQP, eHydro, NLCD, CAMELS, and SSEBop databases\nPyDaymet_        Access daily, monthly, and annual climate data via Daymet\nPyGridMET_       Access daily climate data via GridMET\nPyNLDAS2_        Access hourly NLDAS-2 data via web services\nHydroSignatures_ A collection of tools for computing hydrological signatures\nAsyncRetriever_  High-level API for asynchronous requests with persistent caching\nPyGeoOGC_        Send queries to any ArcGIS RESTful-, WMS-, and WFS-based services\nPyGeoUtils_      Utilities for manipulating geospatial, (Geo)JSON, and (Geo)TIFF data\n================ ====================================================================\n\n.. _PyGeoHydro: https://github.com/hyriver/pygeohydro\n.. _AsyncRetriever: https://github.com/hyriver/async-retriever\n.. _PyGeoOGC: https://github.com/hyriver/pygeoogc\n.. _PyGeoUtils: https://github.com/hyriver/pygeoutils\n.. _PyNHD: https://github.com/hyriver/pynhd\n.. _Py3DEP: https://github.com/hyriver/py3dep\n.. _PyDaymet: https://github.com/hyriver/pydaymet\n.. _PyGridMET: https://github.com/hyriver/pygridmet\n.. _PyNLDAS2: https://github.com/hyriver/pynldas2\n.. 
_HydroSignatures: https://github.com/hyriver/hydrosignatures\n\nAsyncRetriever: Asynchronous requests with persistent caching\n-------------------------------------------------------------\n\n.. image:: https://img.shields.io/pypi/v/async-retriever.svg\n    :target: https://pypi.python.org/pypi/async-retriever\n    :alt: PyPi\n\n.. image:: https://img.shields.io/conda/vn/conda-forge/async-retriever.svg\n    :target: https://anaconda.org/conda-forge/async-retriever\n    :alt: Conda Version\n\n.. image:: https://codecov.io/gh/hyriver/async-retriever/branch/main/graph/badge.svg\n    :target: https://codecov.io/gh/hyriver/async-retriever\n    :alt: CodeCov\n\n.. image:: https://img.shields.io/pypi/pyversions/async-retriever.svg\n    :target: https://pypi.python.org/pypi/async-retriever\n    :alt: Python Versions\n\n.. image:: https://static.pepy.tech/badge/async-retriever\n    :target: https://pepy.tech/project/async-retriever\n    :alt: Downloads\n\n|\n\n.. image:: https://img.shields.io/badge/security-bandit-green.svg\n    :target: https://github.com/PyCQA/bandit\n    :alt: Security Status\n\n.. image:: https://www.codefactor.io/repository/github/hyriver/async-retriever/badge\n   :target: https://www.codefactor.io/repository/github/hyriver/async-retriever\n   :alt: CodeFactor\n\n.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json\n    :target: https://github.com/astral-sh/ruff\n    :alt: Ruff\n\n.. image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white\n    :target: https://github.com/pre-commit/pre-commit\n    :alt: pre-commit\n\n|\n\nFeatures\n--------\n\nAsyncRetriever is a part of `HyRiver \u003chttps://github.com/hyriver/HyRiver\u003e`__ software stack that\nis designed to aid in hydroclimate analysis through web services. 
This package serves as HyRiver's\nengine for asynchronously sending requests and retrieving responses as ``text``, ``binary``, or\n``json`` objects. It uses persistent caching via\n`aiohttp-client-cache \u003chttps://aiohttp-client-cache.readthedocs.io\u003e`__ to speed up the retrieval\neven further. Moreover, thanks to `nest_asyncio \u003chttps://github.com/erdewit/nest_asyncio\u003e`__\nyou can use this package in Jupyter notebooks. Although this package is part of the HyRiver\nsoftware stack, it can be used for any web calls. There are four functions that you can\nuse to make web calls:\n\n* ``retrieve_text``: Get responses as ``text`` objects.\n* ``retrieve_binary``: Get responses as ``binary`` objects.\n* ``retrieve_json``: Get responses as ``json`` objects.\n* ``stream_write``: Stream responses and write them to disk in chunks.\n\nYou can also use the general-purpose ``retrieve`` function to get responses as any\nof the three types. All responses are returned as a list that has the same order as the\ninput list of requests. Moreover, there is another function called ``delete_url_cache``\nfor removing all cached requests that contain a given URL from a cache file.\n\nYou can control the request/response caching behavior and verbosity of the package\nby setting the following environment variables:\n\n* ``HYRIVER_CACHE_NAME``: Path to the caching SQLite database. It defaults to\n  ``./cache/aiohttp_cache.sqlite``.\n* ``HYRIVER_CACHE_EXPIRE``: Expiration time for cached requests in seconds. It defaults to\n  one week.\n* ``HYRIVER_CACHE_DISABLE``: Disable reading/writing from/to the cache. The default is false.\n* ``HYRIVER_SSL_CERT``: Path to an SSL certificate file.\n\nFor example, in your code before making any requests you can do:\n\n.. 
code-block:: python\n\n    import os\n\n    os.environ[\"HYRIVER_CACHE_NAME\"] = \"path/to/file.sqlite\"\n    os.environ[\"HYRIVER_CACHE_EXPIRE\"] = \"3600\"\n    os.environ[\"HYRIVER_CACHE_DISABLE\"] = \"true\"\n    os.environ[\"HYRIVER_SSL_CERT\"] = \"path/to/cert.pem\"\n\nYou can find some example notebooks `here \u003chttps://github.com/hyriver/HyRiver-examples\u003e`__.\n\nYou can also try using AsyncRetriever without installing\nit on your system by clicking on the binder badge. A Jupyter Lab\ninstance with the HyRiver stack pre-installed will be launched in your web browser, and you\ncan start coding!\n\nMoreover, requests for additional functionalities can be submitted via\n`issue tracker \u003chttps://github.com/hyriver/async-retriever/issues\u003e`__.\n\nCitation\n--------\nIf you use any of HyRiver packages in your research, we appreciate citations:\n\n.. code-block:: bibtex\n\n    @article{Chegini_2021,\n        author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},\n        doi = {10.21105/joss.03175},\n        journal = {Journal of Open Source Software},\n        month = {10},\n        number = {66},\n        pages = {1--3},\n        title = {{HyRiver: Hydroclimate Data Retriever}},\n        volume = {6},\n        year = {2021}\n    }\n\nInstallation\n------------\n\nYou can install ``async-retriever`` using ``pip``:\n\n.. code-block:: console\n\n    $ pip install async-retriever\n\nAlternatively, ``async-retriever`` can be installed from the ``conda-forge`` repository\nusing `Conda \u003chttps://docs.conda.io/en/latest/\u003e`__:\n\n.. code-block:: console\n\n    $ conda install -c conda-forge async-retriever\n\nQuick start\n-----------\n\nAsyncRetriever by default creates and/or uses ``./cache/aiohttp_cache.sqlite`` as the cache\nthat you can customize by the ``cache_name`` argument. 
Also, by default, the cache doesn't\nhave any expiration date and the ``delete_url_cache`` function should be used if you know\nthat a database on a server was updated, and you want to retrieve the latest data.\nAlternatively, you can use the ``expire_after`` to set the expiration date for the cache.\n\nAs an example for retrieving a ``binary`` response, let's use the DAAC server to get\n`NDVI \u003chttps://daac.ornl.gov/VEGETATION/guides/US_MODIS_NDVI.html\u003e`_.\nThe responses can be directly passed to ``xarray.open_mfdataset`` to get the data as\na ``xarray`` Dataset. We can also disable SSL certificate verification by setting\n``ssl=False``.\n\n.. code-block:: python\n\n    import io\n    import xarray as xr\n    import async_retriever as ar\n    from datetime import datetime\n\n    west, south, east, north = (-69.77, 45.07, -69.31, 45.45)\n    base_url = \"https://thredds.daac.ornl.gov/thredds/ncss/ornldaac/1299\"\n    dates_itr = ((datetime(y, 1, 1), datetime(y, 1, 31)) for y in range(2000, 2005))\n    urls, kwds = zip(\n        *[\n            (\n                f\"{base_url}/MCD13.A{s.year}.unaccum.nc4\",\n                {\n                    \"params\": {\n                        \"var\": \"NDVI\",\n                        \"north\": f\"{north}\",\n                        \"west\": f\"{west}\",\n                        \"east\": f\"{east}\",\n                        \"south\": f\"{south}\",\n                        \"disableProjSubset\": \"on\",\n                        \"horizStride\": \"1\",\n                        \"time_start\": s.strftime(\"%Y-%m-%dT%H:%M:%SZ\"),\n                        \"time_end\": e.strftime(\"%Y-%m-%dT%H:%M:%SZ\"),\n                        \"timeStride\": \"1\",\n                        \"addLatLon\": \"true\",\n                        \"accept\": \"netcdf\",\n                    }\n                },\n            )\n            for s, e in dates_itr\n        ]\n    )\n    resp = ar.retrieve_binary(urls, kwds, max_workers=8, 
ssl=False)\n    data = xr.open_mfdataset(io.BytesIO(r) for r in resp)\n\nWe can remove these requests and their responses from the cache like so:\n\n.. code-block:: python\n\n    ar.delete_url_cache(base_url)\n\n.. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/ndvi.png\n    :target: https://github.com/hyriver/HyRiver-examples/blob/main/notebooks/async.ipynb\n\nFor a ``json`` response example, let's get water level recordings of an NOAA's water level station,\n8534720 (Atlantic City, NJ), during 2012, using CO-OPS API. Note that this CO-OPS product has a\n31-day limit for a single request, so we have to break the request down accordingly.\n\n.. code-block:: python\n\n    import pandas as pd\n\n    station_id = \"8534720\"\n    start = pd.to_datetime(\"2012-01-01\")\n    end = pd.to_datetime(\"2012-12-31\")\n\n    s = start\n    dates = []\n    for e in pd.date_range(start, end, freq=\"m\"):\n        dates.append((s.date(), e.date()))\n        s = e + pd.offsets.MonthBegin()\n\n    url = \"https://api.tidesandcurrents.noaa.gov/api/prod/datagetter\"\n\n    urls, kwds = zip(\n        *[\n            (\n                url,\n                {\n                    \"params\": {\n                        \"product\": \"water_level\",\n                        \"application\": \"web_services\",\n                        \"begin_date\": f'{s.strftime(\"%Y%m%d\")}',\n                        \"end_date\": f'{e.strftime(\"%Y%m%d\")}',\n                        \"datum\": \"MSL\",\n                        \"station\": f\"{station_id}\",\n                        \"time_zone\": \"GMT\",\n                        \"units\": \"metric\",\n                        \"format\": \"json\",\n                    }\n                },\n            )\n            for s, e in dates\n        ]\n    )\n\n    resp = ar.retrieve_json(urls, kwds)\n    wl_list = []\n    for rjson in resp:\n        wl = pd.DataFrame.from_dict(rjson[\"data\"])\n        wl[\"t\"] 
= pd.to_datetime(wl.t)\n        wl = wl.set_index(wl.t).drop(columns=\"t\")\n        wl[\"v\"] = pd.to_numeric(wl.v, errors=\"coerce\")\n        wl_list.append(wl)\n    water_level = pd.concat(wl_list).sort_index()\n    water_level.attrs = rjson[\"metadata\"]\n\n.. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/water_level.png\n    :target: https://github.com/hyriver/HyRiver-examples/blob/main/notebooks/async.ipynb\n\nNow, let's see an example without any payload or headers. Here's how we can retrieve\nharmonic constituents of several NOAA stations from CO-OPS:\n\n.. code-block:: python\n\n    stations = [\n        \"8410140\",\n        \"8411060\",\n        \"8413320\",\n        \"8418150\",\n        \"8419317\",\n        \"8419870\",\n        \"8443970\",\n        \"8447386\",\n    ]\n\n    base_url = \"https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations\"\n    urls = [f\"{base_url}/{i}/harcon.json?units=metric\" for i in stations]\n    resp = ar.retrieve_json(urls)\n\n    amp_list = []\n    phs_list = []\n    for rjson in resp:\n        sid = rjson[\"self\"].rsplit(\"/\", 2)[1]\n        const = pd.DataFrame.from_dict(rjson[\"HarmonicConstituents\"]).set_index(\"name\")\n        amp = const.rename(columns={\"amplitude\": sid})[sid]\n        phase = const.rename(columns={\"phase_GMT\": sid})[sid]\n        amp_list.append(amp)\n        phs_list.append(phase)\n\n    amp = pd.concat(amp_list, axis=1)\n    phs = pd.concat(phs_list, axis=1)\n\n.. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/tides.png\n    :target: https://github.com/hyriver/HyRiver-examples/blob/main/notebooks/async.ipynb\n\nContributing\n------------\n\nContributions are appreciated and very welcome. 
Please read\n`CONTRIBUTING.rst \u003chttps://github.com/hyriver/async-retriever/blob/main/CONTRIBUTING.rst\u003e`__\nfor instructions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyriver%2Fasync-retriever","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyriver%2Fasync-retriever","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyriver%2Fasync-retriever/lists"}