https://github.com/trbndev/wikibert
Experiment: Training a Neural Network with Wikipedia Articles
- Host: GitHub
- URL: https://github.com/trbndev/wikibert
- Owner: trbndev
- Created: 2024-12-15T15:21:42.000Z (10 days ago)
- Default Branch: main
- Last Pushed: 2024-12-15T15:30:26.000Z (10 days ago)
- Last Synced: 2024-12-15T16:30:20.891Z (10 days ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 4.66 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Wikibert
This is an experimental weekend project for testing purposes.
## Overview
**Wikibert** is a project that demonstrates text generation using Wikipedia data. It consists of two main components:
- A Python script (`get_data.py`) for fetching and saving Wikipedia pages.
- A Jupyter notebook (`wikibert.ipynb`) for training a text generation model using TensorFlow.

## get_data.py
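For illustration, here is a minimal sketch of the crawl-and-save flow that `get_data.py` performs. The actual script is not reproduced in this README, so the function names (`sanitize_filename`, `crawl`, `save_pages`), the `fetch` callback, and the exact set of replaced filename characters are all assumptions. `fetch` is a hypothetical stand-in for a Wikipedia lookup (with `wikipedia-api`, a page object's `.text` and the keys of its `.links` could supply the two values); keeping it pluggable lets the sketch run offline. Pages are expanded level by level (breadth-first) rather than by literal recursion, which gives the same depth limit while the visited check stops link cycles from looping forever.

```python
import re
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical fetcher: title -> (page text, titles of pages it links to).
Fetcher = Callable[[str], Tuple[str, List[str]]]


def sanitize_filename(title: str) -> str:
    """Replace characters reserved in Windows filenames (a superset of the
    POSIX '/' and NUL) with '_'."""
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    # Leading/trailing dots and spaces are stripped too (Windows drops trailing ones).
    cleaned = cleaned.strip(" .")
    return cleaned or "untitled"


def crawl(start: str, fetch: Fetcher, max_depth: int) -> Dict[str, str]:
    """Fetch `start` plus every page reachable within `max_depth` link hops."""
    pages: Dict[str, str] = {}
    frontier = [start]
    for depth in range(max_depth + 1):
        next_frontier: List[str] = []
        for title in frontier:
            if title in pages:  # already fetched (duplicate link or cycle)
                continue
            text, links = fetch(title)
            pages[title] = text
            if depth < max_depth:
                next_frontier.extend(t for t in links if t not in pages)
        frontier = next_frontier
    return pages


def save_pages(pages: Dict[str, str], folder: str = "data") -> List[Path]:
    """Write each page to <folder>/<sanitized title>.txt and return the paths."""
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for title, text in pages.items():
        path = out / f"{sanitize_filename(title)}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Tiny offline link graph standing in for Wikipedia.
    fake_wiki = {
        "Neural network": ("About neural networks.", ["Perceptron", "AI/ML"]),
        "Perceptron": ("About perceptrons.", ["Neural network"]),
        "AI/ML": ("About AI.", ["Deep learning"]),
        "Deep learning": ("About deep learning.", []),
    }
    pages = crawl("Neural network", lambda t: fake_wiki[t], max_depth=1)
    print(sorted(pages))  # ['AI/ML', 'Neural network', 'Perceptron']
    print(sanitize_filename("AI/ML"))  # AI_ML
```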
The `get_data.py` script performs the following tasks:
- Fetches a Wikipedia page and its linked pages recursively, up to a specified depth.
- Saves the content of each page into an individual text file in a `data` folder.
- Sanitizes filenames to ensure compatibility with the file system.

## wikibert.ipynb
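The notebook itself is not reproduced here. As a rough, TensorFlow-free sketch of its data preparation and generation steps, the code below assumes a character-level model in the style of TensorFlow's "Text generation with an RNN" tutorial. The README only says the model is a GRU-based RNN, so the character-level vocabulary, the `seq_length` windowing, and every function name are assumptions. `next_logits` is a hypothetical stand-in for the trained GRU: it receives the ids generated so far and returns one logit per vocabulary character. Dividing the logits by `temperature` before the softmax controls randomness: values below 1 make the output more conservative, values above 1 more varied.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def build_vocab(text: str) -> Tuple[Dict[str, int], List[str]]:
    """Map each distinct character to an integer id (sorted, so ids are stable)."""
    idx2char = sorted(set(text))
    char2idx = {ch: i for i, ch in enumerate(idx2char)}
    return char2idx, idx2char


def make_examples(ids: Sequence[int], seq_length: int) -> List[Tuple[List[int], List[int]]]:
    """Cut `ids` into non-overlapping windows of seq_length + 1 ids and split
    each into (input, target), the target being the input shifted by one."""
    window = seq_length + 1
    examples = []
    for start in range(0, len(ids) - window + 1, window):
        chunk = list(ids[start:start + window])
        examples.append((chunk[:-1], chunk[1:]))
    return examples


def sample_next(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw one id from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(
    next_logits: Callable[[List[int]], Sequence[float]],
    char2idx: Dict[str, int],
    idx2char: List[str],
    seed: str,
    n_chars: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Extend `seed` one character at a time using the model's next-char logits."""
    if rng is None:
        rng = random.Random(0)
    ids = [char2idx[c] for c in seed]
    for _ in range(n_chars):
        ids.append(sample_next(next_logits(ids), temperature, rng))
    return "".join(idx2char[i] for i in ids)


if __name__ == "__main__":
    corpus = "wikipedia articles about wikipedia"
    char2idx, idx2char = build_vocab(corpus)
    ids = [char2idx[c] for c in corpus]
    print(len(idx2char), "characters,", len(make_examples(ids, seq_length=10)), "examples")
    # A dummy "model" with equal logits samples uniformly from the vocabulary.
    uniform = lambda history: [0.0] * len(idx2char)
    print(generate(uniform, char2idx, idx2char, seed="wiki", n_chars=20))
```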
The `wikibert.ipynb` notebook includes the following steps:
- Loads and preprocesses text data from the saved Wikipedia pages.
- Builds and trains a GRU-based RNN model for text generation using TensorFlow.
- Saves model checkpoints during training.
- Generates text using the trained model.

## Usage
1. Run `get_data.py` to fetch and save Wikipedia pages.
2. Open `wikibert.ipynb` in Jupyter Notebook or Google Colab.
3. Run the notebook cells in order to train the text generation model and generate text.

## Requirements
- Python 3.x
- `wikipedia-api` library
- TensorFlow
- Jupyter Notebook (for `wikibert.ipynb`)

## Installation
Install the required libraries using pip:
```bash
pip install wikipedia-api tensorflow
```

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.