{"id":15402698,"url":"https://github.com/sethaxen/python_utilities","last_synced_at":"2026-03-11T03:02:47.768Z","repository":{"id":36550723,"uuid":"40856577","full_name":"sethaxen/python_utilities","owner":"sethaxen","description":"Useful tools for common Python tasks.","archived":false,"fork":false,"pushed_at":"2022-06-02T20:31:22.000Z","size":55,"stargazers_count":5,"open_issues_count":4,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-02-26T00:55:42.294Z","etag":null,"topics":["io","logging","parallelization","plotting","python","scripting","utilities"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sethaxen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-08-17T02:25:37.000Z","updated_at":"2024-11-03T17:09:06.000Z","dependencies_parsed_at":"2022-09-08T18:31:51.919Z","dependency_job_id":null,"html_url":"https://github.com/sethaxen/python_utilities","commit_stats":null,"previous_names":["sdaxen/python_utilities"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/sethaxen/python_utilities","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sethaxen%2Fpython_utilities","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sethaxen%2Fpython_utilities/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sethaxen%2Fpython_utilities/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sethaxen%2Fpython_utilities/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seth
axen","download_url":"https://codeload.github.com/sethaxen/python_utilities/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sethaxen%2Fpython_utilities/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30368573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T21:41:54.280Z","status":"online","status_checked_at":"2026-03-11T02:00:07.027Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["io","logging","parallelization","plotting","python","scripting","utilities"],"created_at":"2024-10-01T16:04:34.683Z","updated_at":"2026-03-11T03:02:47.748Z","avatar_url":"https://github.com/sethaxen.png","language":"Python","readme":"# Python Utilities\nUseful tools for common Python tasks.\n\n## Introduction\nThis package arose from a desire to standardize useful methods and classes I found myself reusing in many projects. 
These fall into several sub-packages:\n- [`scripting`](python_utilities/scripting.py): method with useful defaults and settings for log format, verbosity, and destination\n- [`io_tools`](python_utilities/io_tools.py): methods for intelligently guessing file compression from extension and safely buffering numerical data before writing to an HDF5 file\n- [`parallel`](python_utilities/parallel.py): determine which options for parallelization are available in the current environment, and run a method on a dataset using a master-slave paradigm. The `Parallelizer` class arose from a common use case of writing/testing/running scripts on a local machine using multiprocessing or multithreading for parallelization but then needing to modify the scripts to use MPI on a large cluster. The `Parallelizer` allows the same script to be run in both contexts without changing any code.\n- [`plotting`](python_utilities/plotting): useful color schemes for maximum contrast and methods for conversion between color spaces\n\n## Installation\n`python_utilities` may be installed in three ways, in order of preference:\n1. Using conda: `conda install -c conda-forge sdaxen_python_utilities`\n2. Using pip: `pip install sdaxen_python_utilities`\n3. From GitHub:\n   1. Download this repository to your machine\n      - Clone this repository to your machine with `git clone https://github.com/sdaxen/python_utilities.git`\n      - OR download an archive by navigating to [https://github.com/sdaxen/python_utilities](https://github.com/sdaxen/python_utilities) and clicking \"Clone or download \u003e Download ZIP\". Extract the archive.\n   2. Add the path to the repository to your `$PYTHONPATH`. 
On Unix, this can be done with `export PYTHONPATH=[PATH/TO/REPO]:$PYTHONPATH` where `[PATH/TO/REPO]` is replaced with the path on your machine.\n\n## Usage\nAn example usage of the most common methods/classes is given below.\nIn this example, we read in a file that contains a range of numbers.\nWe then compute the product between each of those numbers and a single\nnumber. We do this in parallel, so that as each slave node is ready,\nthe master sends it a number from the file. All results are logged\nto `log.txt`, and the results are saved to a file `products.txt`.\n```python\nfrom python_utilities.scripting import setup_logging\nfrom python_utilities.io_tools import smart_open\nfrom python_utilities.parallel import Parallelizer, make_data_iterator\n\n\n# Methods written for parallel have non-keyword (num1) and keyword (num2)\n# arguments. All keyword arguments must be constant across all parallel\n# runs, while non-keyword arguments may vary. Here, we will vary num1, but\n# num2 will be constant.\ndef product(num1, num2=100):\n    return num1 * num2\n\n\n# log everything, including logging.debug messages, to log.txt\nsetup_logging(\"log.txt\", verbose=True)\n\ndata_list = []\n# smart_open recognizes the .gz extension\nwith smart_open(\"numbers.txt.gz\", \"r\") as f:\n    for line in f:\n        data_list.append(float(line.strip()))\n\n# items in iterator must be lists or tuples (non-keyword args)\ndata_iterator = make_data_iterator(data_list)\n# use multiprocessing if available\nparallelizer = Parallelizer(parallel_mode=\"processes\")\nrun_kwargs = {\"out_file\": \"products.txt\",  # save one result per line\n              \"out_str\": \"%d\\n\",  # formatting of output line\n              \"out_format\": lambda x: x,  # modify result before saving\n              \"logging_str\": \"Multiplied by %d\",  # format log line\n              \"logging_format\": lambda x: (x),  # modify result before logging\n              \"kwargs\": {\"num2\": 100}}  # pass constant 
keyword argument\n\n# run the method on every item in the iterator. If out_file specified,\n# boolean success is returned. Otherwise, result is returned. Use\n# parallelizer.run to run method on all data before returning and return\n# in order.\nfor success, data in parallelizer.run_gen(product, data_iterator,\n                                          **run_kwargs):\n    print(success)\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsethaxen%2Fpython_utilities","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsethaxen%2Fpython_utilities","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsethaxen%2Fpython_utilities/lists"}