https://github.com/codito/nlp-expt
Some experiments and datasets for natural language processing and classification
- Host: GitHub
- URL: https://github.com/codito/nlp-expt
- Owner: codito
- Created: 2019-09-02T00:47:57.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-02-26T19:43:35.000Z (about 3 years ago)
- Last Synced: 2025-01-07T06:41:49.641Z (about 1 year ago)
- Topics: datasets, experiment, nlp, simplewiki
- Language: Jupyter Notebook
- Size: 33.6 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
A bunch of random experiments in NLP.
## Usage
```bash
# create a virtualenv; --system-site-packages reuses pandas etc. installed system-wide
python -m venv .venv --system-site-packages
source .venv/bin/activate
# if the base packages are not installed system-wide, install them in the venv
pip install scikit-learn pandas seaborn jupyterlab
pip install cython
pip install -r requirements.txt
# create an ipython kernel that uses the virtualenv
ipython kernel install --user --name=nlp-expt
# edit the generated kernel.json so it points to the python executable in the venv,
# then start jupyter lab
jupyter lab
```
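The kernel.json edit mentioned in the usage steps can be scripted. The following is a minimal sketch, not part of the repository: `point_kernel_at_venv` is a hypothetical helper, and the paths in the commented call assume the Linux default location for `--user` kernels and a venv at `./.venv`. Running `jupyter kernelspec list` prints the actual kernel directory on your machine.

```python
import json
from pathlib import Path


def point_kernel_at_venv(kernel_json: Path, venv_python: Path) -> dict:
    """Rewrite argv[0] in a kernel.json so the kernel starts with venv_python.

    Hypothetical helper for illustration; returns the updated kernel spec.
    """
    spec = json.loads(kernel_json.read_text(encoding="utf-8"))
    # argv[0] is the interpreter that launches the kernel; the remaining
    # arguments (module name, connection file placeholder) are left untouched.
    spec["argv"][0] = str(venv_python)
    kernel_json.write_text(json.dumps(spec, indent=1) + "\n", encoding="utf-8")
    return spec


# Example call (assumed paths, adjust to your setup):
# point_kernel_at_venv(
#     Path.home() / ".local/share/jupyter/kernels/nlp-expt/kernel.json",
#     Path(".venv/bin/python").absolute(),
# )
```

Only the first `argv` entry changes, so the display name and launch arguments written by `ipython kernel install` are preserved.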
## Data
### Simplewiki
A cleaned, category-labeled dataset of articles/pages in the Simple English
Wikipedia (simplewiki).
See [data/simplewiki][simplewiki_data] and [README][simplewiki_readme].
[simplewiki_data]: https://github.com/codito/nlp-expt/blob/master/data/simplewiki/
[simplewiki_readme]: https://github.com/codito/nlp-expt/blob/master/data/simplewiki/README.md
## License
Datasets are licensed under terms similar to those of their upstream sources.
Check the individual sections above.