https://github.com/mathause/filefinder
- Host: GitHub
- URL: https://github.com/mathause/filefinder
- Owner: mathause
- License: mit
- Created: 2022-07-22T14:44:54.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-06T13:43:58.000Z (8 months ago)
- Last Synced: 2024-06-11T17:00:53.487Z (7 months ago)
- Language: Python
- Size: 90.8 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# FileFinder
_Find and parse file and folder names._
Define regular folder and file patterns with the intuitive Python syntax:

```python
from filefinder import FileFinder

path_pattern = "/root/{category}"
file_pattern = "{category}_file_{number}"

ff = FileFinder(path_pattern, file_pattern)
```

## Create file and path names
Everything enclosed in curly brackets is a placeholder. Thus, you can create file and
path names like so:

```python
ff.create_path_name(category="a")
>>> /root/a/

ff.create_file_name(category="a", number=1)
>>> a_file_1

ff.create_full_name(category="a", number=1)
>>> /root/a/a_file_1
```

## Find files on disk
However, the strength of filefinder is parsing file names on disk. Assuming you have the
following folder structure:

```
/root/a1/a1_file_1
/root/a1/a1_file_2
/root/b2/b2_file_1
/root/b2/b2_file_2
/root/c3/c3_file_1
/root/c3/c3_file_2
```

You can then look for paths:

```python
ff.find_paths()
>>>
>>>      filename  category
>>> 0  /root/a1/*        a1
>>> 1  /root/b2/*        b2
>>> 2  /root/c3/*        c3
```
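
To illustrate the idea behind the parsing shown above, here is a minimal sketch using only the standard library. It is not filefinder's implementation: the helper names `pattern_to_regex` and `parse_name` are hypothetical, and joining the path and file pattern with `/` is an assumption made for this example.

```python
import re


def pattern_to_regex(pattern):
    """Turn "{name}" placeholders into named regex groups (illustrative only)."""
    regex = ""
    seen = set()
    pos = 0
    for match in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between placeholders must match exactly.
        regex += re.escape(pattern[pos:match.start()])
        name = match.group(1)
        if name in seen:
            # A repeated placeholder has to match the same text again.
            regex += f"(?P={name})"
        else:
            # Assumption: a placeholder never spans a path separator.
            regex += f"(?P<{name}>[^/]+?)"
            seen.add(name)
        pos = match.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)


def parse_name(pattern, name):
    """Return the placeholder values found in `name`, or None if it does not match."""
    match = pattern_to_regex(pattern).fullmatch(name)
    return match.groupdict() if match else None


# Assumed combination of the path and file patterns used in this README.
full_pattern = "/root/{category}/{category}_file_{number}"
parsed = parse_name(full_pattern, "/root/a1/a1_file_1")
```

In this sketch every value comes back as a string, and a name whose folder and file prefix disagree (for example `/root/a1/b2_file_1`) does not match at all.
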
The placeholders (here `{category}`) are parsed and returned. You can also look for
files:

```python
ff.find_files()
>>>
>>>              filename  category  number
>>> 0  /root/a1/a1_file_1        a1       1
>>> 1  /root/a1/a1_file_2        a1       2
>>> 2  /root/b2/b2_file_1        b2       1
>>> 3  /root/b2/b2_file_2        b2       2
>>> 4  /root/c3/c3_file_1        c3       1
>>> 5  /root/c3/c3_file_2        c3       2
```

It's also possible to filter for certain files:

```python
ff.find_files(category=["a1", "b2"], number=1)
>>>
>>>              filename  category  number
>>> 0  /root/a1/a1_file_1        a1       1
>>> 2  /root/b2/b2_file_1        b2       1
```

Often we need to be sure to find _exactly one_ file or path. This can be achieved using:

```python
ff.find_single_file(category="a1", number=1)
>>>
>>>              filename  category  number
>>> 0  /root/a1/a1_file_1        a1       1
```

If no file or more than one file is found, a `ValueError` is raised.

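
The "exactly one" contract can be sketched in plain Python as follows. The helper name `exactly_one` is hypothetical and only mirrors the behaviour described above; it is not part of filefinder.

```python
def exactly_one(candidates):
    """Return the only element of `candidates`; raise ValueError otherwise."""
    candidates = list(candidates)
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one match, found {len(candidates)}")
    return candidates[0]


names = ["/root/a1/a1_file_1", "/root/a1/a1_file_2"]

# Exactly one name ends in "_1", so this succeeds.
only = exactly_one(name for name in names if name.endswith("_1"))

# Both names match here, so the helper raises.
try:
    exactly_one(names)
    raised = False
except ValueError:
    raised = True
```

Failing loudly in the zero-match and multi-match cases keeps a silently wrong file from slipping into later processing steps.
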
## Format syntax
You can pass format specifiers to allow more complex formats, see
[format-specification](https://github.com/r1chardj0n3s/parse#format-specification) for details.
Using format specifiers, you can parse names that could not be parsed otherwise.

### Example

```python
from filefinder import FileFinder

paths = ["a1_abc", "ab200_abcdef",]
ff = FileFinder("", "{letters:l}{num:d}_{beg:2}{end}", test_paths=paths)
fc = ff.find_files()
fc
```

which results in the following:

```python
       filename  letters  num  beg   end
0        a1_abc        a    1   ab     c
1  ab200_abcdef       ab  200   ab  cdef
```

Note that `fc.df.num` now has a data type of `int`, while without the `:d` it would be a
string (or more precisely an `object`, as pandas uses this dtype to represent strings).

## Filters

Filters can postprocess the found paths. Currently only a `priority_filter`
is implemented.

### Example

Assuming you have data for several models with different time resolutions, e.g., 1 hourly
(`"1h"`), 6 hourly (`"6h"`), and daily (`"1d"`), but not all models have all time resolutions:

```
/root/a/a_1h
/root/a/a_6h
/root/a/a_1d

/root/b/b_1h
/root/b/b_6h

/root/c/c_1h
```

You now want to get the `"1d"` data if available, then the `"6h"` data, and so on. This can be achieved with the `priority_filter`. Let's first parse the file names:

```python
ff = FileFinder("/root/{model}", "{model}_{time_res}")

files = ff.find_files()
files
```

which yields:

```
       filename  model  time_res
0  /root/a/a_1d      a        1d
1  /root/a/a_1h      a        1h
2  /root/a/a_6h      a        6h
3  /root/b/b_1h      b        1h
4  /root/b/b_6h      b        6h
5  /root/c/c_1h      c        1h
```

We can now apply a `priority_filter` as follows:

```python
from filefinder.filters import priority_filter

files = priority_filter(files, "time_res", ["1d", "6h", "1h"])
files
```

Resulting in the desired selection:

```
       filename  model  time_res
0  /root/a/a_1d      a        1d
1  /root/b/b_6h      b        6h
2  /root/c/c_1h      c        1h
```
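
The selection logic can be sketched in plain Python, without pandas, to make the idea concrete. This is an assumption-laden illustration rather than the library's code: `pick_by_priority` is a hypothetical name, and the explicit `group_by` argument is introduced only for this sketch (the real `priority_filter` call above does not take one).

```python
def pick_by_priority(records, key, priorities, group_by):
    """Keep, per group, the record whose `key` value ranks first in `priorities`."""
    rank = {value: position for position, value in enumerate(priorities)}
    best = {}
    for record in records:
        if record[key] not in rank:
            continue  # in this sketch, values outside the priority list are dropped
        group = record[group_by]
        if group not in best or rank[record[key]] < rank[best[group][key]]:
            best[group] = record
    # dicts preserve insertion order, so groups come back in first-seen order
    return list(best.values())


records = [
    {"filename": "/root/a/a_1d", "model": "a", "time_res": "1d"},
    {"filename": "/root/a/a_1h", "model": "a", "time_res": "1h"},
    {"filename": "/root/a/a_6h", "model": "a", "time_res": "6h"},
    {"filename": "/root/b/b_1h", "model": "b", "time_res": "1h"},
    {"filename": "/root/b/b_6h", "model": "b", "time_res": "6h"},
    {"filename": "/root/c/c_1h", "model": "c", "time_res": "1h"},
]

selected = pick_by_priority(records, "time_res", ["1d", "6h", "1h"], group_by="model")
selected_names = [record["filename"] for record in selected]
```

With the example data, `selected_names` contains one entry per model, matching the table above.
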