Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sethaxen/python_utilities
Useful tools for common Python tasks.
- Host: GitHub
- URL: https://github.com/sethaxen/python_utilities
- Owner: sethaxen
- License: MIT
- Created: 2015-08-17T02:25:37.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2022-06-02T20:31:22.000Z (over 2 years ago)
- Last Synced: 2024-10-19T11:40:48.683Z (4 months ago)
- Topics: io, logging, parallelization, plotting, python, scripting, utilities
- Language: Python
- Size: 53.7 KB
- Stars: 4
- Watchers: 4
- Forks: 1
- Open Issues: 4
- Metadata Files:
  - Readme: README.md
  - License: LICENSE.txt
Awesome Lists containing this project
README
# Python Utilities
Useful tools for common Python tasks.

## Introduction
This package arose from a desire to standardize useful methods and classes I found myself reusing in many projects. These fall into several sub-packages:
- [`scripting`](python_utilities/scripting.py): a logging setup method with useful defaults for log format, verbosity, and destination
- [`io_tools`](python_utilities/io_tools.py): methods for intelligently guessing file compression from extension and safely buffering numerical data before writing to an HDF5 file
- [`parallel`](python_utilities/parallel.py): determine which parallelization options are available in the current environment, and run a method on a dataset using a master-slave paradigm. The `Parallelizer` class arose from a common use case: writing, testing, and running scripts on a local machine with multiprocessing or multithreading for parallelization, then needing to modify them to use MPI on a large cluster. `Parallelizer` lets the same script run in both contexts without any code changes (see the sketch after this list).
- [`plotting`](python_utilities/plotting): useful color schemes for maximum contrast and methods for conversion between color spaces
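
For intuition, here is a minimal sketch of the dispatch pattern that `Parallelizer` abstracts, written with only the standard library's `concurrent.futures`. This is illustration, not `python_utilities` code: each worker pulls the next item as soon as it finishes its current one, so results arrive in completion order.

```python
# Minimal stdlib sketch of ready-worker dispatch (illustration only,
# not python_utilities code).
from concurrent.futures import ProcessPoolExecutor, as_completed

def product(num1, num2=100):
    return num1 * num2

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    with ProcessPoolExecutor() as pool:
        # submit one task per item; workers pick them up as they free up
        futures = [pool.submit(product, x) for x in data]
        for future in as_completed(futures):
            print(future.result())  # results arrive in completion order
```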
## Installation

`python_utilities` may be installed in three ways, in order of preference:
1. Using conda: `conda install -c conda-forge sdaxen_python_utilities`
2. Using pip: `pip install sdaxen_python_utilities`
3. From GitHub:
   1. Download this repository to your machine:
      - Clone this repository with `git clone https://github.com/sdaxen/python_utilities.git`
      - OR download an archive by navigating to [https://github.com/sdaxen/python_utilities](https://github.com/sdaxen/python_utilities) and clicking "Clone or download > Download ZIP". Extract the archive.
   2. Add the path to the repository to your `$PYTHONPATH`. On Unix, this can be done with `export PYTHONPATH=[PATH/TO/REPO]:$PYTHONPATH`, where `[PATH/TO/REPO]` is replaced with the path on your machine.
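
Whichever route you use, a quick import check confirms the package is on your path. Note that the import name is `python_utilities` even though the conda/pip package is named `sdaxen_python_utilities`; the imports below are the same ones used in this README's usage example.

```python
# Sanity check that the package is importable after installation.
from python_utilities.scripting import setup_logging
from python_utilities.io_tools import smart_open
from python_utilities.parallel import Parallelizer

print("python_utilities imported successfully")
```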
## Usage

An example usage of the most common methods/classes is given below.
In this example, we read in a file that contains a range of numbers.
We then compute the product of each of those numbers and a single
number. We do this in parallel, so that as each slave node is ready,
the master sends it a number from the file. All results are logged
to `log.txt` and saved to `products.txt`.
```python
from python_utilities.scripting import setup_logging
from python_utilities.io_tools import smart_open
from python_utilities.parallel import Parallelizer, make_data_iterator

# Methods written for parallel have non-keyword (num1) and keyword (num2)
# arguments. All keyword arguments must be constant across all parallel
# runs, while non-keyword arguments may vary. Here, we will vary num1, but
# num2 will be constant.
def product(num1, num2=100):
    return num1 * num2

# log everything, including logging.debug messages, to log.txt
setup_logging("log.txt", verbose=True)

data_list = []
# smart_open recognizes the .gz extension
with smart_open("numbers.txt.gz", "r") as f:
    for line in f:
        data_list.append(float(line.strip()))

# items in iterator must be lists or tuples (non-keyword args)
data_iterator = make_data_iterator(data_list)

# use multiprocessing if available
parallelizer = Parallelizer(parallel_mode="processes")

run_kwargs = {"out_file": "products.txt",         # save one result per line
              "out_str": "%d\n",                  # formatting of output line
              "out_format": lambda x: x,          # modify result before saving
              "logging_str": "Multiplied by %d",  # format log line
              "logging_format": lambda x: x,      # modify result before logging
              "kwargs": {"num2": 100}}            # pass constant keyword argument

# run the method on every item in the iterator. If out_file is specified,
# a boolean success flag is returned; otherwise, the result is returned.
# Use parallelizer.run to run the method on all data before returning,
# with results in order.
for success, data in parallelizer.run_gen(product, data_iterator,
                                          **run_kwargs):
    print(success)
```
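
The example assumes `numbers.txt.gz` already exists. A minimal way to generate it, using only the standard library (the file name and contents here are illustrative):

```python
# Create the gzipped input file used by the example above (stdlib only,
# not part of python_utilities). One number per line.
import gzip

with gzip.open("numbers.txt.gz", "wt") as f:
    for n in range(1, 11):
        f.write(f"{n}\n")
```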