Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cthoyt/pystow
👜 Easily pick a place to store data for your Python code.
file-management file-manager filesystem pathlib python reproducibility reproducible-research reproducible-science
- Host: GitHub
- URL: https://github.com/cthoyt/pystow
- Owner: cthoyt
- License: mit
- Created: 2020-12-03T12:58:43.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-03T10:01:20.000Z (about 1 month ago)
- Last Synced: 2024-10-14T02:23:38.071Z (about 1 month ago)
- Topics: file-management, file-manager, filesystem, pathlib, python, reproducibility, reproducible-research, reproducible-science
- Language: Python
- Homepage: https://pystow.readthedocs.io
- Size: 283 KB
- Stars: 35
- Watchers: 4
- Forks: 6
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PyStow

👜 Easily pick a place to store data for your Python code.
## 🚀 Getting Started
Get a directory for your application.
```python
import pystow

# Get a directory (as a pathlib.Path) for ~/.data/pykeen
pykeen_directory = pystow.join('pykeen')

# Get a subdirectory (as a pathlib.Path) for ~/.data/pykeen/experiments
pykeen_experiments_directory = pystow.join('pykeen', 'experiments')

# You can go as deep as you want
pykeen_deep_directory = pystow.join('pykeen', 'experiments', 'a', 'b', 'c')
```

If you reuse the same directory structure a lot, you can save it in a module:
```python
import pystow

pykeen_module = pystow.module("pykeen")

# Access the module's directory with .base
assert pystow.join("pykeen") == pystow.module("pykeen").base

# Get a subdirectory (as a pathlib.Path) for ~/.data/pykeen/experiments
pykeen_experiments_directory = pykeen_module.join('experiments')

# You can go as deep as you want past the original "pykeen" module
pykeen_deep_directory = pykeen_module.join('experiments', 'a', 'b', 'c')
```

Get a file path for your application by adding the `name` keyword argument. This is made explicit so PyStow knows which
parent directories to automatically create. This works with `pystow` or any module you create with `pystow.module`.

```python
import pystow

# Get a file path (as a pathlib.Path) for ~/.data/indra/database/database.tsv
indra_database_path = pystow.join('indra', 'database', name='database.tsv')
```

Ensure a file from the internet is available in your application's directory:
```python
import pystow

url = 'https://raw.githubusercontent.com/pykeen/pykeen/master/src/pykeen/datasets/nations/test.txt'
path = pystow.ensure('pykeen', 'datasets', 'nations', url=url)
```

Ensure a tabular data file (tab-separated by default) from the internet and load it for usage (requires `pip install pandas`):
```python
import pystow
import pandas as pd

url = 'https://raw.githubusercontent.com/pykeen/pykeen/master/src/pykeen/datasets/nations/test.txt'
df: pd.DataFrame = pystow.ensure_csv('pykeen', 'datasets', 'nations', url=url)
```

Ensure a comma-separated tabular data file from the internet and load it for usage (requires `pip install pandas`):
```python
import pystow
import pandas as pd

url = 'https://raw.githubusercontent.com/cthoyt/pystow/main/tests/resources/test_1.csv'
df: pd.DataFrame = pystow.ensure_csv('pykeen', 'datasets', 'nations', url=url, read_csv_kwargs=dict(sep=","))
```

Ensure an RDF file from the internet and load it for usage (requires `pip install rdflib`):
```python
import pystow
import rdflib

url = 'https://ftp.expasy.org/databases/rhea/rdf/rhea.rdf.gz'
rdf_graph: rdflib.Graph = pystow.ensure_rdf('rhea', url=url)
```

Also see `pystow.ensure_excel()`, `pystow.ensure_rdf()`, `pystow.ensure_zip_df()`, and `pystow.ensure_tar_df()`.
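The `ensure*` functions appear to follow a download-if-missing pattern: the file is fetched only when it isn't already in the target directory, so later runs reuse the local copy. The following is a minimal stdlib sketch of that idea, not pystow's implementation. It assumes the local file is named after the last segment of the URL, and `ensure_file` and `load_tsv` are hypothetical helpers (the latter is a stand-in for what pandas does in `ensure_csv`):

```python
import csv
import urllib.parse
import urllib.request
from pathlib import Path


def ensure_file(directory: Path, url: str) -> Path:
    """Download ``url`` into ``directory`` unless it is already there (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: the local file name is the last segment of the URL's path
    name = Path(urllib.parse.urlparse(url).path).name
    path = directory / name
    if not path.is_file():
        # Only hit the network (or file:// source) when there is no cached copy
        urllib.request.urlretrieve(url, path)
    return path


def load_tsv(path: Path) -> list:
    """Parse a tab-separated file into a list of rows (hypothetical helper)."""
    with path.open(newline="") as file:
        return list(csv.reader(file, delimiter="\t"))
```

Calling `ensure_file` twice with the same URL downloads once; the second call just returns the cached path.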
If your data comes with a lot of different files in an archive,
you can ensure the archive is downloaded and get specific files from it:

```python
import numpy as np
import pystow

url = "https://cloud.enterprise.informatik.uni-leipzig.de/index.php/s/LHPbMCre7SLqajB/download/MultiKE_D_Y_15K_V1.zip"
# the path inside the archive to the file you want
inner_path = "MultiKE/D_Y_15K_V1/721_5fold/1/20210219183115/ent_embeds.npy"
with pystow.ensure_open_zip("kiez", url=url, inner_path=inner_path) as file:
    emb = np.load(file)
```

Also see `pystow.module.ensure_open_lzma()`, `pystow.module.ensure_open_tarfile()` and `pystow.module.ensure_open_gz()`.
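Reading a single member of an archive doesn't require unpacking the whole thing. Once the archive is on disk, the standard library's `zipfile` module can do this directly; the sketch below shows the idea behind `ensure_open_zip()` with the download step left out. `open_inner` is a hypothetical helper, not part of pystow:

```python
import zipfile
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator


@contextmanager
def open_inner(archive: Path, inner_path: str) -> Iterator[IO[bytes]]:
    """Yield a binary file handle for one member of a ZIP archive (hypothetical helper)."""
    with zipfile.ZipFile(archive) as zf:
        # Only this member is decompressed, and only as it is read
        with zf.open(inner_path) as file:
            yield file
```

The handle is file-like, so it can be passed to readers such as `np.load` in the same way as in the example above.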
## ⚙️ Configuration
By default, data is stored in the `$HOME/.data` directory, and the `<app>` app will create the
`$HOME/.data/<app>` folder.

If you want to use a folder name other than `.data` inside the home directory, you can set the `PYSTOW_NAME`
environment variable. For example, if you set `PYSTOW_NAME=mydata`, then the following code for the `pykeen` app will
create the `$HOME/mydata/pykeen/` directory:

```python
import os
import pystow

# Only for demonstration purposes. You should set environment
# variables either in your .bashrc or on the command line.
os.environ['PYSTOW_NAME'] = 'mydata'

# Get a directory (as a pathlib.Path) for ~/mydata/pykeen
pykeen_directory = pystow.join('pykeen')
```

If you want to specify a completely custom directory that isn't relative to your home directory, you can set
the `PYSTOW_HOME` environment variable. For example, if you set `PYSTOW_HOME=/usr/local/`, then the following code for
the `pykeen` app will create the `/usr/local/pykeen/` directory:

```python
import os
import pystow

# Only for demonstration purposes. You should set environment
# variables either in your .bashrc or on the command line.
os.environ['PYSTOW_HOME'] = '/usr/local/'

# Get a directory (as a pathlib.Path) for /usr/local/pykeen
pykeen_directory = pystow.join('pykeen')
```

Note: if you set `PYSTOW_HOME`, then `PYSTOW_NAME` is disregarded.
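Taken together, the configuration rules give a simple precedence: `PYSTOW_HOME` wins if it is set; otherwise the base folder is `$HOME/<PYSTOW_NAME>`, falling back to `$HOME/.data`. The sketch below models that lookup plus the directory creation that `pystow.join` performs, including the `name` keyword. It is a hypothetical re-implementation for illustration only: `resolve_base` and `sketch_join` are not pystow functions, the `PYSTOW_USE_APPDIRS` option is not modeled, and pystow's actual code may differ in other details:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_base(env: Mapping[str, str] = os.environ) -> Path:
    """Return the root data folder following the PYSTOW_HOME / PYSTOW_NAME rules."""
    home = env.get("PYSTOW_HOME")
    if home:
        # An explicit PYSTOW_HOME makes PYSTOW_NAME irrelevant
        return Path(home).expanduser()
    return Path.home() / env.get("PYSTOW_NAME", ".data")


def sketch_join(*parts: str, name: Optional[str] = None, env: Mapping[str, str] = os.environ) -> Path:
    """Create base/parts... and return it, or the file path inside it if ``name`` is given."""
    directory = resolve_base(env).joinpath(*parts)
    directory.mkdir(parents=True, exist_ok=True)
    # With ``name``, only the parent directories are created, not the file itself
    return directory / name if name is not None else directory
```

Passing the environment as a mapping (rather than reading `os.environ` inside the function) keeps the sketch easy to test without touching real environment variables.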
### X Desktop Group (XDG) Compatibility
While PyStow's main goal is to make application data less opaque and less
hidden, some users might want to use the
[XDG specifications](http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html)
for storing their app data.

If you set the environment variable `PYSTOW_USE_APPDIRS` to `true` or `True`, then the
[`appdirs`](https://pypi.org/project/appdirs/) package will be used to choose
the base directory based on its `user_data_dir` option. This can still be
overridden by `PYSTOW_HOME`.

## 🚀 Installation
The most recent release can be installed from
[PyPI](https://pypi.org/project/pystow/) with:

```bash
$ pip install pystow
```

Note: as of v0.3.0, Python 3.6 isn't officially supported (its
end-of-life was in December 2021). For the time being, `pystow` might still
work on Python 3.6, but only coincidentally.

The most recent code and data can be installed directly from GitHub with:
```bash
$ pip install git+https://github.com/cthoyt/pystow.git
```

To install in development mode, use the following:
```bash
$ git clone https://github.com/cthoyt/pystow.git
$ cd pystow
$ pip install -e .
```

## ⚖️ License
The code in this package is licensed under the MIT License.