https://github.com/markiskorova/machine-learning-nlp-predict-author
Machine Learning & Natural Language Processing: Predict the author of literary text snippets. Built with TensorFlow and Keras, this project trains an LSTM model on classic literature to identify writing style and authorship.
keras machine-learning natural-language-processing python tensorflow text-tokenization text-vectorization
- Host: GitHub
- URL: https://github.com/markiskorova/machine-learning-nlp-predict-author
- Owner: markiskorova
- License: mit
- Created: 2024-07-10T21:06:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-25T03:25:06.000Z (9 months ago)
- Last Synced: 2025-05-25T04:59:57.712Z (9 months ago)
- Topics: keras, machine-learning, natural-language-processing, python, tensorflow, text-tokenization, text-vectorization
- Language: Python
- Homepage:
- Size: 3.49 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🧠 Machine Learning & NLP: Predicting Authors from Classic Literature
This project uses machine learning and natural language processing (NLP) to predict the author of a text snippet drawn from classic literature. An LSTM model is trained on labeled excerpts so that it learns the textual patterns and stylistic cues that distinguish one author from another.
## 📚 Overview
- **Objective**: Develop a model that can predict the author of a text snippet from classic literature.
- **Techniques Used**:
  - Text vectorization and tokenization
  - Sequential modeling with LSTM (Long Short-Term Memory) networks
- **Tools & Libraries**:
  - Python
  - TensorFlow & Keras
  - Pandas & NumPy
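To make the vectorization step concrete, here is a minimal sketch using the Keras `TextVectorization` layer. The script's actual preprocessing is not shown in this README, so the layer choice, the sample sentences, and the values of `max_tokens` and `output_sequence_length` are all assumptions made for illustration.

```python
import tensorflow as tf

# Made-up snippets standing in for dataset rows (not taken from Text_Author.csv).
sample_texts = tf.constant([
    "the quick brown fox jumps over the lazy dog",
    "a quiet evening settled over the old town",
])

# Assumed settings: vocabulary capped at 1000 tokens, sequences padded or
# truncated to 12 token ids.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=1000,
    output_mode="int",
    output_sequence_length=12,
)

# adapt() builds the vocabulary from the sample texts.
vectorizer.adapt(sample_texts)

# Each snippet becomes a fixed-length row of integer token ids; 0 is padding.
token_ids = vectorizer(sample_texts)
vocabulary = vectorizer.get_vocabulary()
print(token_ids.numpy())
print(vocabulary[:5])
```

By default the layer lowercases the text and strips punctuation before splitting on whitespace. Index 0 is reserved for padding and index 1 for out-of-vocabulary tokens.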
## 📁 Repository Structure
- `Text_Author.csv`: Dataset containing text excerpts and corresponding author labels.
- `text-analysis-detect-author-seq-lstm.py`: Python script for data preprocessing, model training, and evaluation.
- `README.md`: Project documentation.
- `LICENSE`: MIT License.
## 🚀 Getting Started
### Prerequisites
Ensure you have the following installed:
- Python 3.x
- pip (Python package installer)
### Installation
1. **Clone the repository**:
```bash
git clone https://github.com/markiskorova/Machine-Learning-NLP-Predict-Author.git
cd Machine-Learning-NLP-Predict-Author
```
2. **Create and activate a virtual environment**:
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. **Install required packages**:
```bash
pip install tensorflow pandas numpy
```
### Running the Model
Execute the script to train and evaluate the model:
```bash
python text-analysis-detect-author-seq-lstm.py
```
*The script will process the data, train the LSTM model, and output evaluation metrics.*
## 📊 Dataset Details
- **Source**: Curated collection of classic literary texts.
- **Format**: CSV file with two columns:
  - `text`: Excerpt from a literary work.
  - `author`: Name of the author.
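The two-column layout described above can be loaded and turned into integer class labels roughly as follows. This is a sketch only: the rows are placeholders held in memory rather than read from `Text_Author.csv`, and `pd.factorize` is one possible way to encode authors, not necessarily what the script does.

```python
import io

import pandas as pd

# Placeholder rows in the documented layout. Author names and texts are
# invented for illustration.
sample_csv = io.StringIO(
    "text,author\n"
    "first placeholder excerpt,Author A\n"
    "second placeholder excerpt,Author B\n"
    "third placeholder excerpt,Author A\n"
)
df = pd.read_csv(sample_csv)

# Map each author name to an integer class id, in order of first appearance.
labels, author_names = pd.factorize(df["author"])

print(df.head())
print(labels, list(author_names))
```

To run this against the real dataset, replace `sample_csv` with the path `"Text_Author.csv"`, assuming the file's header row uses the column names listed above.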
## 🔍 Model Architecture
- **Embedding Layer**: Converts words into vector representations.
- **LSTM Layer**: Captures sequential dependencies in the text.
- **Dense Output Layer**: Outputs probabilities for each author class.
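The three layers listed above could be assembled in Keras along these lines. The README does not state layer sizes, vocabulary size, number of authors, optimizer, or loss, so every constant and the `compile` settings below are assumptions chosen only to make the sketch runnable.

```python
import numpy as np
import tensorflow as tf

# Assumed hyperparameters (not taken from the repository).
VOCAB_SIZE = 1000    # number of distinct token ids
NUM_AUTHORS = 3      # number of author classes
SEQ_LEN = 12         # token ids per snippet

model = tf.keras.Sequential([
    # Embedding: token id -> dense vector.
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    # LSTM: reads the sequence and returns its final hidden state.
    tf.keras.layers.LSTM(32),
    # Dense softmax: one probability per author.
    tf.keras.layers.Dense(NUM_AUTHORS, activation="softmax"),
])

# Integer labels are assumed, hence the sparse categorical cross-entropy loss.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random token ids stand in for a vectorized batch of four snippets.
rng = np.random.default_rng(0)
dummy_batch = rng.integers(1, VOCAB_SIZE, size=(4, SEQ_LEN))
probs = model(dummy_batch).numpy()
print(probs.shape)
```

Each row of `probs` is a probability distribution over authors. The predicted author is the index of the largest entry in that row.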
## 📈 Evaluation Metrics
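- **Accuracy**: Proportion of snippets whose predicted author matches the true author.
- **Loss**: Value of the training loss function on the evaluated data; lower values mean the predicted author probabilities are closer to the true labels.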
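As a worked illustration of the two metrics, the sketch below computes accuracy and a cross-entropy loss by hand with NumPy. The probabilities and labels are invented, and cross-entropy is assumed as the loss because it is the usual choice for a softmax classifier; the README does not name the loss the script uses.

```python
import numpy as np

# Invented model outputs for three snippets and three author classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
true_labels = np.array([0, 1, 2])

# Accuracy: fraction of rows where the most probable class is the true one.
predicted = probs.argmax(axis=1)
accuracy = (predicted == true_labels).mean()

# Cross-entropy: mean negative log-probability assigned to the true class.
true_class_probs = probs[np.arange(len(true_labels)), true_labels]
loss = -np.log(true_class_probs).mean()

print(f"accuracy={accuracy:.3f} loss={loss:.3f}")  # accuracy=0.667 loss=0.595
```

The third snippet is misclassified, which lowers accuracy to two out of three. It also dominates the loss, because the model gave the true author only a probability of 0.3.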
## 🛠️ Future Enhancements
- Incorporate more diverse literary works to improve model generalization.
- Experiment with advanced architectures like Bidirectional LSTMs or Transformers.
- Implement a user interface for interactive author prediction.
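For the Bidirectional LSTM idea, one possible starting point is to wrap the recurrent layer in Keras's `Bidirectional` wrapper. This is a hypothetical sketch, not code from the repository, and all sizes are assumed.

```python
import numpy as np
import tensorflow as tf

# Assumed sizes, for illustration only.
VOCAB_SIZE = 1000
NUM_AUTHORS = 3

bi_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    # Runs one LSTM forward and one backward over the sequence and
    # concatenates their final states (2 * 32 = 64 features).
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(NUM_AUTHORS, activation="softmax"),
])

# Random token ids stand in for two vectorized snippets of length 12.
rng = np.random.default_rng(0)
bi_probs = bi_model(rng.integers(1, VOCAB_SIZE, size=(2, 12))).numpy()
print(bi_probs.shape)
```

Reading the snippet in both directions lets the final representation depend on the words that follow a token as well as those before it. The trade-off is roughly twice as many recurrent parameters.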
## 📄 License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## 🤝 Contributing
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
## 📬 Contact
For questions or suggestions, feel free to open an issue or contact the repository maintainer.