https://github.com/florents-tselai/pandas-sets
Set-oriented Operations in Pandas
data-science pandas set-operations sets
- Host: GitHub
- URL: https://github.com/florents-tselai/pandas-sets
- Owner: Florents-Tselai
- License: bsd-3-clause
- Created: 2018-12-26T13:53:04.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-05-27T17:23:50.000Z (over 4 years ago)
- Last Synced: 2024-10-11T15:19:50.246Z (about 1 month ago)
- Topics: data-science, pandas, set-operations, sets
- Language: Python
- Homepage: https://tselai.com/pandas-sets.html
- Size: 9.77 KB
- Stars: 24
- Watchers: 3
- Forks: 3
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE
# Pandas Sets: Set-oriented Operations in Pandas
If you store standard Python `set`s or `frozenset`s in your `Series` or `DataFrame` objects, you'll find this useful.
The `pandas_sets` package adds a `.set` accessor to any pandas `Series` object;
it's like `.dt` for `datetime` or `.str` for `string`, but for [`set`](https://docs.python.org/3.7/library/stdtypes.html#set). It exposes all public methods available in the standard [`set`](https://docs.python.org/3.7/library/stdtypes.html#set).
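To make the accessor idea concrete, here is a minimal sketch of how a custom `Series` accessor can be registered with pandas' public extension API, `pd.api.extensions.register_series_accessor`. It is not the `pandas_sets` implementation: the accessor name `demo_set` and the class `DemoSetAccessor` are hypothetical, chosen so the sketch does not clash with the real `.set` accessor, and only two methods are shown.

```python
import pandas as pd


# "demo_set" is a hypothetical accessor name used only for this sketch.
@pd.api.extensions.register_series_accessor("demo_set")
class DemoSetAccessor:
    """Illustrative accessor; not the pandas_sets implementation."""

    def __init__(self, series: pd.Series) -> None:
        # pandas passes the Series the accessor is attached to.
        self._series = series

    def contains(self, item) -> pd.Series:
        # Element-wise membership test, one boolean per row.
        return self._series.map(lambda s: item in s).astype(bool)

    def len(self) -> pd.Series:
        # Element-wise set size.
        return self._series.map(len)


tags = pd.Series([{"python", "pandas"}, {"philosophy"}, {"pandas"}])
mask = tags.demo_set.contains("pandas")
sizes = tags.demo_set.len()
```

Here `mask` can be used for boolean indexing, in the same way the `.str` accessor's results are typically used.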
## Installation
```bash
pip install pandas-sets
```
Just import the `pandas_sets` package and it will register a `.set` accessor on any `Series` object.
```python
import pandas_sets
```
## Examples
```python
import pandas_sets  # importing registers the .set accessor
import pandas as pd

df = pd.DataFrame({
    'post': [1, 2, 3, 4],
    'tags': [{'python', 'pandas'}, {'philosophy', 'strategy'}, {'scikit-learn'}, {'pandas'}],
})

# Keep only the posts whose tag set contains 'pandas'
pandas_posts = df[df.tags.set.contains('pandas')]

# Call the usual set methods on every row's tag set
pandas_posts.tags.set.add('data')
pandas_posts.tags.set.update({'data', 'analysis'})
pandas_posts.tags.set.len()
```
## Notes
* The implementation is primitive for now. It's based heavily on pandas' core [`StringMethods`](https://github.com/pandas-dev/pandas/blob/52a2bb490556a86c5f756465320c18977dbe1c36/pandas/core/strings.py#L1783) implementation.
* The public API has been tested for most expected scenarios.
* The API will need to be extended to handle `NA` values appropriately.
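Until `NA` handling lands in the package, missing entries can be guarded against with plain pandas. The sketch below is one possible approach, not part of `pandas_sets`: the helper `set_len_skipna` is hypothetical, and it assumes a pandas version that provides `pd.NA` and the nullable `Int64` dtype (1.0 or later).

```python
import pandas as pd


def set_len_skipna(series: pd.Series) -> pd.Series:
    """Hypothetical helper: size of each set, with non-set entries left as NA."""

    def _len(value):
        # Only real sets are measured; None, NaN, etc. become NA.
        if isinstance(value, (set, frozenset)):
            return len(value)
        return pd.NA

    # The nullable integer dtype keeps the counts as integers despite the gaps.
    return series.map(_len).astype("Int64")


tags = pd.Series([{"python", "pandas"}, None, frozenset({"pandas"})])
lengths = set_len_skipna(tags)
```

The same `isinstance` guard can be wrapped around other element-wise set operations when a column mixes sets with missing values.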