https://github.com/mancrurod/linguaanimae

Exploring emotions and meaning in Bible verses with NLP, transformers, and a custom Streamlit app.
bert corpus-linguistics digital-humanities emotion-detection huggingface-transformers humanities multi-label-classification natural-language-processing nlp python semantic-analysis streamlit text-classification theme-detection web-scraping

# 📖 LinguaAnimae

**LinguaAnimae** is a multilingual NLP pipeline that classifies and explores sacred texts through the lens of **themes** and **emotions**, culminating in a **Streamlit-based chatbot** that retrieves Bible verses aligned with natural language prompts.

---

## 🔍 Project Goals

- Extract and normalize full Bible corpora (English + Spanish)
- Annotate every verse with emotion and theme labels
- Translate annotations for multilingual consistency
- Power a semantic chatbot that suggests aligned verses in real time
- Support additional domains like poetry or music lyrics (planned)

---

## 🧠 Core Technologies

- **Python 3.10+**
- `transformers`, `torch`, `sentence-transformers`
- `pandas`, `scikit-learn`, `regex`
- `beautifulsoup4`, `requests`
- `streamlit` – multilingual app for emotion/theme-based verse recommendation

---

## 📁 Project Structure

```
LinguaAnimae/
├── .streamlit/                 # Streamlit secrets and config
│   └── secrets.toml
├── app/                        # Streamlit app frontend
│   ├── assets/                 # Visual assets (background image)
│   │   └── old-wrinkled-paper.jpg
│   ├── components/             # UI rendering components
│   │   ├── render_emotion.py
│   │   └── render_theme.py
│   ├── app.py                  # Main Streamlit entry point
│   └── texts.py                # Multilingual UI dictionary
├── data/
│   ├── raw/                    # Original scraped texts
│   ├── processed/              # Cleaned and merged verse data
│   └── labeled/                # Emotion and theme-labeled corpora
│       └── /
│           ├── emotion/
│           └── emotion_theme/
├── logs/
│   ├── labeling_logs/          # Logs from the labeling pipeline
│   └── cleaning_logs/          # Logs from cleaning steps
├── notebooks/                  # Data exploration and validation
│   ├── 01_scraping_exploration.ipynb
│   ├── 02_cleaning.ipynb
│   ├── 03_label_emotions_and_themes.ipynb
│   ├── 04_translate_labels.ipynb
│   └── 05_evaluation.ipynb
├── src/
│   ├── interface/
│   │   ├── recommender.py
│   │   └── labeling_pipeline.py
│   ├── modeling/
│   │   ├── emotion_theme_labeling.py
│   │   ├── theme_labeling.py
│   │   └── labeling_pipeline.py
│   ├── preprocessing/
│   │   ├── cleaning.py
│   │   ├── merge.py
│   │   └── translate_and_apply_labels.py
│   ├── scraping/
│   │   ├── bible_scraper.py
│   │   └── parse_osis_kjv.py
│   └── utils/
│       ├── save_feedback_to_gsheet.py
│       └── translation_maps.py
├── tests/                      # Future test coverage
├── .gitignore
├── requirements.txt
├── environment.yml
├── README.md
└── CHANGELOG.md
```

---

## 🚀 Getting Started

You can set up the environment using either `conda` (recommended) or `pip`.

### 🧪 Option 1: Using Conda (recommended)

```bash
conda env create -f environment.yml
conda activate linguaanimae
```

### 💡 Option 2: Using pip

1. Clone the repository
```bash
git clone https://github.com/mancrurod/LinguaAnimae.git
cd LinguaAnimae
```

2. Create a virtual environment
```bash
python -m venv venv
source venv/bin/activate # or .\venv\Scripts\activate on Windows
```

3. Install dependencies
```bash
pip install -r requirements.txt
```

4. Run the Bible scraper to download all books
```bash
python src/scraping/bible_scraper.py
```

---

## 🧰 Usage

### 1. Scrape the Bible (RV60)

Use the scraping script to extract the full Reina-Valera 1960 Bible and save it as structured CSVs:

```bash
python src/scraping/bible_scraper.py
```

### 2. Label Verses with Emotions + Themes

Use the labeling pipeline to classify English Bible verses (`bible_kjv`) with pretrained Hugging Face models:

```bash
python src/interface/labeling_pipeline.py --bible bible_kjv
```

Optional flags:

- `--skip-emotion`: skip emotion classification
- `--skip-theme`: skip theme labeling
- `--device -1`: force CPU mode (the default is `--device 0`, which uses the GPU)
- `--dry-run path/to/file.csv`: test the pipeline on a single file
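
The flags above could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the project's actual parser: the `build_parser` helper, the help strings, and the default value for `--bible` are assumptions. Only the flag names and the device values come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        description="Label Bible verses with emotions and themes."
    )
    # Assumed default; the README always passes --bible explicitly.
    parser.add_argument("--bible", default="bible_kjv",
                        help="Corpus folder name, e.g. bible_kjv")
    parser.add_argument("--skip-emotion", action="store_true",
                        help="Skip emotion classification")
    parser.add_argument("--skip-theme", action="store_true",
                        help="Skip theme labeling")
    parser.add_argument("--device", type=int, default=0,
                        help="0 for GPU (default), -1 to force CPU")
    parser.add_argument("--dry-run", metavar="CSV", default=None,
                        help="Process a single CSV file only")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["--bible", "bible_kjv", "--device", "-1", "--skip-theme"]
)
print(args.bible, args.device, args.skip_emotion, args.skip_theme)
```

`argparse` turns `--skip-emotion` into the attribute `skip_emotion` and `--dry-run` into `dry_run`, so downstream code can branch on plain booleans.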

### 3. Translate Labels into Spanish

Align the English emotion/theme annotations with their Spanish verse equivalents in `bible_rv60`:

```bash
python src/preprocessing/translate_and_apply_labels.py
```

This creates a labeled Spanish version under:

```
data/labeled/bible_rv60/emotion_theme/
```
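
Conceptually, this step is a join: each Spanish verse inherits the labels of the English verse with the same reference. The sketch below illustrates that idea under several assumptions: verses are keyed by a language-independent `(book, chapter, verse)` tuple, label names are translated through a lookup table, and plain dictionaries stand in for whatever data structures `translate_and_apply_labels.py` really uses. The `transfer_labels` helper, the `EMOTION_ES` map, and the toy labels are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

VerseKey = Tuple[str, int, int]  # (book code, chapter, verse); an assumed key format

# Hypothetical English -> Spanish label map (illustrative entries only).
EMOTION_ES: Dict[str, str] = {"joy": "alegría", "fear": "miedo", "sadness": "tristeza"}


def transfer_labels(
    english_labels: Dict[VerseKey, str],
    spanish_verses: Dict[VerseKey, str],
) -> List[dict]:
    """Attach translated English emotion labels to Spanish verses sharing a key."""
    rows = []
    for key, text in spanish_verses.items():
        label_en: Optional[str] = english_labels.get(key)  # None if no counterpart
        rows.append({
            "book": key[0],
            "chapter": key[1],
            "verse": key[2],
            "text": text,
            # Unknown or missing labels stay None instead of being guessed.
            "emotion": EMOTION_ES.get(label_en) if label_en else None,
        })
    return rows


# Toy data with placeholder text and labels (not real annotations).
english = {("GEN", 1, 1): "joy"}
spanish = {("GEN", 1, 1): "<texto>", ("GEN", 1, 2): "<texto>"}
for row in transfer_labels(english, spanish):
    print(row["book"], row["chapter"], row["verse"], row["emotion"])
```

Keeping unmatched verses with a `None` label, rather than dropping them, makes gaps in the alignment easy to count afterwards.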

---

## 💬 Streamlit Interface

The interactive Streamlit app allows users to input a free-form emotional message and receive recommended Bible verses matching its **emotion** and **theme**.

### Features

- 🔄 **Automatic translation** of input (EN/ES)
- 🧠 **Emotion detection** (6 Plutchik categories)
- 🏷️ **Theme classification** (5 canonical themes)
- 📖 **Context-aware verse matching** from KJV or RV60
- 🎨 **Stylized cards** with emotion/theme color, emoji, and verse metadata
- ✅ **User feedback collection** via like/dislike buttons (stored in Google Sheets)

### Example

Input:

> *Tengo miedo y necesito consuelo...*

Returns:

📖 *Génesis 40:7*: *"¿Por qué parecen hoy mal vuestros semblantes?"*
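
The matching step can be pictured as a filter over the labeled corpus: keep the verses whose emotion equals the detected emotion and whose themes include the detected theme. The sketch below assumes exactly that exact-match logic. The `LabeledVerse` type, the `recommend` function, and the label names are hypothetical and do not reproduce `src/interface/recommender.py`.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class LabeledVerse:
    """One corpus row; the field names are assumed for illustration."""
    reference: str
    text: str
    emotion: str
    themes: FrozenSet[str]


def recommend(
    corpus: Sequence[LabeledVerse],
    emotion: str,
    theme: str,
    k: int = 3,
    seed: Optional[int] = None,
) -> List[LabeledVerse]:
    """Return up to k verses whose emotion matches and whose themes include `theme`."""
    matches = [v for v in corpus if v.emotion == emotion and theme in v.themes]
    random.Random(seed).shuffle(matches)  # vary the suggestions between calls
    return matches[:k]


# Toy corpus with placeholder references and labels (not real annotations).
corpus = [
    LabeledVerse("Book 1:1", "<verse text>", "fear", frozenset({"hope", "faith"})),
    LabeledVerse("Book 1:2", "<verse text>", "joy", frozenset({"hope"})),
    LabeledVerse("Book 2:5", "<verse text>", "fear", frozenset({"love"})),
]
print([v.reference for v in recommend(corpus, "fear", "hope")])  # → ['Book 1:1']
```

A real recommender would likely also rank candidates, for example by classifier confidence or embedding similarity, but that goes beyond what this section states.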

---

## 📤 Feedback System

Users can now rate the relevance of the emotion/theme detection with a 👍 / 👎 system.
Feedback is saved to a **Google Sheet** along with:

- Original input
- Detected emotion and score
- Detected theme and score
- User name (optional)
- Feedback value (`like` / `dislike`)

This enables future model refinement and analytics.
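
A record with the fields listed above might be assembled like this before being appended to the sheet. This is a sketch only: the `FeedbackEntry` dataclass, its field names, and the column order are assumptions, and the actual upload performed by `save_feedback_to_gsheet.py` is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class FeedbackEntry:
    """Hypothetical container for one feedback submission."""
    user_input: str
    emotion: str
    emotion_score: float
    theme: str
    theme_score: float
    feedback: str                    # "like" or "dislike"
    user_name: Optional[str] = None  # optional, per the list above

    def __post_init__(self) -> None:
        # Reject anything other than the two documented feedback values.
        if self.feedback not in {"like", "dislike"}:
            raise ValueError(f"unexpected feedback value: {self.feedback!r}")

    def to_row(self) -> List[Union[str, float]]:
        """Flatten the entry into one spreadsheet row (assumed column order)."""
        return [
            self.user_input,
            self.emotion,
            self.emotion_score,
            self.theme,
            self.theme_score,
            self.user_name or "",
            self.feedback,
        ]


# Toy scores and labels, chosen only to show the row shape.
entry = FeedbackEntry("Tengo miedo y necesito consuelo...", "fear", 0.91, "hope", 0.5, "like")
print(entry.to_row())
```

Validating the feedback value at construction time keeps malformed rows out of the sheet, which simplifies any later analytics.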

---

## ✨ UI Enhancements

- Feedback buttons styled with semantic colors and **hover animation**
- Subtitles, emotion/theme blocks, and translation notices are now **centered and consistently styled**
- Merriweather font applied to all key UI blocks for elegance and readability

---

## 📊 Outputs

Labeled files are saved as:

- `*_emotion.csv`: adds an emotion column using 6 Plutchik labels
- `*_emotion_theme.csv`: adds a multilabel theme column drawn from 5 canonical themes

Logs are saved to `logs/labeling_logs/`, with per-file runtime and a pipeline summary.
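
Downstream code can read the labeled files with the standard `csv` module. The sketch below assumes column names (`verse`, `emotion`, `theme`) and a `;`-separated multilabel `theme` cell. Both are guesses about the schema, so adjust them to the real headers. The `read_labeled_rows` helper is hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_labeled_rows(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield CSV rows with the multilabel theme cell split into a list."""
    for row in csv.DictReader(handle):
        parsed: Dict[str, object] = dict(row)
        # Assumed separator: themes joined with ';' inside a single cell.
        parsed["theme"] = [t.strip() for t in row["theme"].split(";") if t.strip()]
        yield parsed


# In-memory stand-in for a *_emotion_theme.csv file (placeholder values).
sample = io.StringIO(
    "verse,emotion,theme\n"
    "Book 1:1,joy,hope;faith\n"
    "Book 1:2,fear,\n"
)
for row in read_labeled_rows(sample):
    print(row["verse"], row["emotion"], row["theme"])
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object.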

---

## 📌 Roadmap

### ✅ Completed (Weeks 1–3)
- Full Bible scraping (KJV + RV60)
- Corpus cleaning and normalization
- Emotion and theme labeling using pretrained HuggingFace models
- Cross-lingual label transfer and alignment
- Manual evaluation with accuracy and F1 metrics
- Streamlit interface: emotion + theme detection, stylized results
- Multilingual support: automatic input translation and corpus selection
- Recommendation system based on emotion + theme match

### 🔄 Week 4: Model + Interface Integration and User Testing
- [ ] Connect model inference to real-time recommendations in the interface
- [ ] Run test sessions with 5–10 users
- [ ] Deploy and collect feedback via form (Google Forms or equivalent)

### 🔄 Week 5: Iteration Based on Feedback
- [ ] Refine model behavior and recommendation logic
- [ ] Improve clarity of explanations and label rendering
- [ ] Implement user-suggested improvements

### 🏁 Week 6: Final Demo and Documentation
- [ ] Consolidate the MVP into a cohesive narrative
- [ ] Write technical and functional report
- [ ] Prepare public demo with real examples
- [ ] (Optional) Add export features (PDF), voice synthesis, or word cloud summaries

[See CHANGELOG.md](CHANGELOG.md) for complete history.

---

## 📖 License

For academic and research use only. Sources are derived from public domain Bibles (e.g., RV60, KJV) and open ML models from Hugging Face. The license will be finalized before v1.0.

---

## ✨ Acknowledgements

Developed by [Manuel Cruz Rodríguez](https://github.com/mancrurod) as part of an NLP and Data Science learning journey.