Python Wrappers for Hadoop FileSystem
https://github.com/sneaksanddata/hadoop-fs-wrapper
- Host: GitHub
- URL: https://github.com/sneaksanddata/hadoop-fs-wrapper
- Owner: SneaksAndData
- License: mit
- Created: 2022-01-20T13:08:44.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-24T11:27:20.000Z (3 months ago)
- Last Synced: 2024-10-25T05:32:18.767Z (3 months ago)
- Topics: distributed-computing, hadoop, spark
- Language: Python
- Homepage:
- Size: 134 KB
- Stars: 4
- Watchers: 4
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
# Hadoop FileSystem Java Class Wrapper
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Typed Python wrappers for the [Hadoop FileSystem](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html) class family.
## Installation
You can install this package from PyPI on any Hadoop or Spark runtime:
```commandline
pip install hadoop-fs-wrapper
```

Select a version that matches the Hadoop version you are using:

| Hadoop Version / Spark version | Compatible hadoop-fs-wrapper version |
|--------------------------------|:------------------------------------:|
| 3.2.x / 3.2.x | 0.4.x |
| 3.3.x / 3.3.x | 0.4.x, 0.5.x |
| 3.3.x / 3.4.x | 0.6.x |
| 3.5.x / 3.5.x | 0.7.x |

## Usage
A common use case is accessing the Hadoop FileSystem from a Spark session object:

```python
from hadoop_fs_wrapper.wrappers.file_system import FileSystem

# spark_session is assumed to be an existing pyspark.sql.SparkSession
file_system = FileSystem.from_spark_session(spark=spark_session)
```

Then, for example, one can check whether any files exist under a specified path:
```python
from hadoop_fs_wrapper.wrappers.file_system import FileSystem


def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob pattern path resolves to at least one existing path

    :param file_system: pyHadoopWrapper FileSystem
    :param path: path e.g. (s3a|abfss|file|...)://[email protected]/path/part*.csv
    :return: True if path resolves to existing paths, otherwise False
    """
    return len(file_system.glob_status(path)) > 0
```

## Contribution
Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you require is not yet wrapped, please open an issue or create a PR.

All changes are tested against Spark 3.4 running in local mode.
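
Code that calls the wrapper, such as the `is_valid_source_path` check from the Usage section, can sometimes be exercised without a JVM by swapping in a small test double. The sketch below is only an illustration and is not part of hadoop-fs-wrapper: `SupportsGlobStatus`, `LocalGlobStub` and `demo` are hypothetical names, and the stub resolves patterns with Python's `glob` module on the local disk, returning plain path strings rather than Hadoop `FileStatus` objects. It assumes the caller only needs the number of matches, as the original check does.

```python
import glob
import os
import tempfile
from typing import Protocol, Sequence, Tuple


class SupportsGlobStatus(Protocol):
    """Hypothetical structural type: anything exposing glob_status(path)."""

    def glob_status(self, path: str) -> Sequence[object]:
        ...


class LocalGlobStub:
    """Hypothetical test double, NOT part of hadoop-fs-wrapper.

    Resolves glob patterns against the local disk and returns plain
    path strings instead of Hadoop FileStatus objects.
    """

    def glob_status(self, path: str) -> Sequence[object]:
        return glob.glob(path)


def is_valid_source_path(file_system: SupportsGlobStatus, path: str) -> bool:
    """Same check as in the Usage section, typed against the protocol."""
    return len(file_system.glob_status(path)) > 0


def demo() -> Tuple[bool, bool]:
    """Creates one CSV file in a temporary directory and checks two patterns."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create an empty file that the first pattern should match.
        open(os.path.join(tmp, "part-0001.csv"), "w").close()
        stub = LocalGlobStub()
        found = is_valid_source_path(stub, os.path.join(tmp, "part*.csv"))
        missing = is_valid_source_path(stub, os.path.join(tmp, "part*.json"))
        return found, missing


if __name__ == "__main__":
    print(demo())
```

Because the function depends only on a `glob_status` method, the real `FileSystem` wrapper and the stub are interchangeable here. Local glob semantics are not guaranteed to match Hadoop's in every case, so a check against Spark in local mode is still the more faithful test.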