https://github.com/elysian01/data-purifier-dataset
Data repository for Data Purifier examples
data-purifer data-science datase ml-datasets nlp-datasets
- Host: GitHub
- URL: https://github.com/elysian01/data-purifier-dataset
- Owner: Elysian01
- Created: 2021-05-12T13:20:25.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-08-22T06:49:26.000Z (almost 5 years ago)
- Last Synced: 2025-05-20T00:39:24.449Z (about 1 year ago)
- Topics: data-purifer, data-science, datase, ml-datasets, nlp-datasets
- Homepage:
- Size: 5.92 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data-Purifier-Dataset
Data repository for [Data-Purifier](https://pypi.org/project/data-purifier/) examples.
This repository exists only to provide a convenient target from which the `datapurifier.load_dataset` function can download sample datasets. It makes it easy to document datapurifier without spending time loading and munging data. The datasets may change or be removed at any time if they are no longer useful for the datapurifier documentation. Some of the datasets have also been modified from their canonical sources.
The data is sourced from Kaggle.
## Get Started
Install the package and the spaCy English model:
```bash
pip install data-purifier
```
```bash
python -m spacy download en_core_web_sm
```
Load the module:
```python
import datapurifier as dp
from datapurifier import Mleda, Nleda, Nlpurifier
print(dp.__version__)
```
List the available example datasets:
```python
print(dp.get_dataset_names()) # to get all dataset names
print(dp.get_text_dataset_names()) # to get all text dataset names
```
Load an example dataset by passing one of the names from the list above as an argument:
```python
df = dp.load_dataset("womens_clothing_e-commerce_reviews")
```
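Because `load_dataset` expects one of the listed names, it can help to check a name before loading it. The following is a minimal sketch, not part of datapurifier: `resolve_dataset_name` is a hypothetical helper, and it assumes that `dp.get_dataset_names()` returns an iterable of strings. A stand-in list is used here so that the snippet runs without the package installed.

```python
import difflib


def resolve_dataset_name(name, available):
    """Return `name` if it is in `available`; otherwise raise with suggestions.

    Hypothetical helper for illustration; not part of datapurifier.
    """
    names = list(available)
    if name in names:
        return name
    # Suggest near matches to help catch typos in long dataset names.
    suggestions = difflib.get_close_matches(name, names, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown dataset {name!r}.{hint}")


# Stand-in list; in practice, pass dp.get_dataset_names() instead
# (assumed to return an iterable of strings).
example_names = ["womens_clothing_e-commerce_reviews"]

checked = resolve_dataset_name("womens_clothing_e-commerce_reviews", example_names)
# With datapurifier installed, the checked name can then be loaded:
# df = dp.load_dataset(checked)
```

A misspelled name such as `"womens_clothing_ecommerce_reviews"` raises a `ValueError` that suggests the closest listed name instead of failing during the download.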
## Example
[Colab Notebook](https://colab.research.google.com/drive/1J932G1uzqxUHCMwk2gtbuMQohYZsze8U?usp=sharing)
Official Documentation: https://cutt.ly/CbFT5Dw
Python Package: https://pypi.org/project/data-purifier/