https://github.com/mancrurod/linguaanimae
Exploring emotions and meaning in Bible verses with NLP, transformers, and a custom Streamlit app.
bert corpus-linguistics digital-humanities emotion-detection huggingface-transformers humanities multi-label-classification natural-language-processing nlp python semantic-analysis streamlit text-classification theme-detection web-scraping
- Host: GitHub
- URL: https://github.com/mancrurod/linguaanimae
- Owner: mancrurod
- Created: 2025-04-21T14:10:40.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-05-08T20:09:56.000Z (7 months ago)
- Last Synced: 2025-05-08T20:23:58.225Z (7 months ago)
- Topics: bert, corpus-linguistics, digital-humanities, emotion-detection, huggingface-transformers, humanities, multi-label-classification, natural-language-processing, nlp, python, semantic-analysis, streamlit, text-classification, theme-detection, web-scraping
- Language: Jupyter Notebook
- Homepage:
- Size: 16.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
# LinguaAnimae
**LinguaAnimae** is a multilingual NLP pipeline that classifies and explores sacred texts through the lens of **themes** and **emotions**, culminating in a **Streamlit-based chatbot** that retrieves Bible verses aligned with natural language prompts.
---
## Project Goals
- Extract and normalize full Bible corpora (English + Spanish)
- Annotate every verse with emotion and theme labels
- Translate annotations for multilingual consistency
- Power a semantic chatbot that suggests aligned verses in real time
- Support additional domains like poetry or music lyrics (planned)
---
## Core Technologies
- **Python 3.10+**
- `transformers`, `torch`, `sentence-transformers`
- `pandas`, `scikit-learn`, `regex`
- `beautifulsoup4`, `requests`
- `streamlit`: multilingual app for emotion/theme-based verse recommendation
---
## Project Structure
```
LinguaAnimae/
├── .streamlit/                     # Streamlit secrets and config
│   └── secrets.toml
├── app/                            # Streamlit app frontend
│   ├── assets/                     # Visual assets (background image)
│   │   └── old-wrinkled-paper.jpg
│   ├── components/                 # UI rendering components
│   │   ├── render_emotion.py
│   │   └── render_theme.py
│   ├── app.py                      # Main Streamlit entry point
│   └── texts.py                    # Multilingual UI dictionary
├── data/
│   ├── raw/                        # Original scraped texts
│   ├── processed/                  # Cleaned and merged verse data
│   └── labeled/                    # Emotion and theme-labeled corpora
│       └── /
│           ├── emotion/
│           └── emotion_theme/
├── logs/
│   ├── labeling_logs/              # Logs from the labeling pipeline
│   └── cleaning_logs/              # Logs from cleaning steps
├── notebooks/                      # Data exploration and validation
│   ├── 01_scraping_exploration.ipynb
│   ├── 02_cleaning.ipynb
│   ├── 03_label_emotions_and_themes.ipynb
│   ├── 04_translate_labels.ipynb
│   └── 05_evaluation.ipynb
├── src/
│   ├── interface/
│   │   ├── recommender.py
│   │   └── labeling_pipeline.py
│   ├── modeling/
│   │   ├── emotion_theme_labeling.py
│   │   ├── theme_labeling.py
│   │   └── labeling_pipeline.py
│   ├── preprocessing/
│   │   ├── cleaning.py
│   │   ├── merge.py
│   │   └── translate_and_apply_labels.py
│   ├── scraping/
│   │   ├── bible_scraper.py
│   │   └── parse_osis_kjv.py
│   └── utils/
│       ├── save_feedback_to_gsheet.py
│       └── translation_maps.py
├── tests/                          # Future test coverage
├── .gitignore
├── requirements.txt
├── environment.yml
├── README.md
└── CHANGELOG.md
```
---
## Getting Started
You can set up the environment using either `conda` (recommended) or `pip`.
### Option 1: Using Conda (recommended)
```bash
conda env create -f environment.yml
conda activate linguaanimae
```
### Option 2: Using pip
1. Clone the repository
```bash
git clone https://github.com/mancrurod/LinguaAnimae.git
cd LinguaAnimae
```
2. Create a virtual environment
```bash
python -m venv venv
source venv/bin/activate # or .\venv\Scripts\activate on Windows
```
3. Install dependencies
```bash
pip install -r requirements.txt
```
4. Run the Bible scraper to download all books
```bash
python src/scraping/bible_scraper.py
```
---
## Usage
### 1. Scrape the Bible (RV60)
Use the scraping script to extract the full Reina-Valera 1960 Bible and save it as structured CSVs:
```bash
python src/scraping/bible_scraper.py
```
### 2. Label Verses with Emotions + Themes
Use the labeling pipeline to classify the English Bible verses (`bible_kjv`) with pretrained Hugging Face models:
```bash
python src/interface/labeling_pipeline.py --bible bible_kjv
```
Optional flags:
- `--skip-emotion` to skip emotion classification
- `--skip-theme` to skip theme labeling
- `--device -1` to force CPU mode (default is `--device 0` for GPU)
- `--dry-run path/to/file.csv` to test a single file
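
The flags above could be parsed along the following lines. This is only a sketch of one possible `argparse` setup, not the actual contents of `labeling_pipeline.py`; the `build_parser` helper and the `bible_kjv` default for `--bible` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Label Bible verses with emotions and themes."
    )
    # Assumed default; the README only shows the flag being passed explicitly.
    parser.add_argument("--bible", default="bible_kjv",
                        help="Name of the corpus folder to label")
    parser.add_argument("--skip-emotion", action="store_true",
                        help="Skip emotion classification")
    parser.add_argument("--skip-theme", action="store_true",
                        help="Skip theme labeling")
    parser.add_argument("--device", type=int, default=0,
                        help="0 for GPU (default), -1 to force CPU")
    parser.add_argument("--dry-run", metavar="CSV", default=None,
                        help="Run on a single CSV file only")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--bible", "bible_kjv", "--device", "-1", "--skip-theme"])
print(args.bible, args.device, args.skip_emotion, args.skip_theme, args.dry_run)
```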
### 3. Translate Labels into Spanish
Align the English emotion/theme annotations with their Spanish verse equivalents in `bible_rv60`:
```bash
python src/preprocessing/translate_and_apply_labels.py
```
This creates a labeled Spanish version under:
```bash
data/labeled/bible_rv60/emotion_theme/
```
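As a rough illustration of the alignment idea, labels can be copied from English rows onto Spanish rows that share the same verse key. The sketch below is not the code in `translate_and_apply_labels.py`: the column names (`book`, `chapter`, `verse`, `emotion`, `theme`), the shared book identifier, and the `EMOTION_ES` map are all assumptions.

```python
from typing import Dict, List, Tuple

VerseKey = Tuple[str, int, int]  # (book id, chapter, verse)

# Hypothetical label translations; the real project keeps its own maps.
EMOTION_ES = {"fear": "miedo", "joy": "alegría", "sadness": "tristeza"}


def transfer_labels(english_rows: List[dict], spanish_rows: List[dict]) -> List[dict]:
    """Copy emotion/theme labels onto Spanish rows that share a verse key."""
    labels: Dict[VerseKey, dict] = {
        (r["book"], r["chapter"], r["verse"]): {
            # Fall back to the English label when no translation is known.
            "emotion": EMOTION_ES.get(r["emotion"], r["emotion"]),
            "theme": r["theme"],
        }
        for r in english_rows
    }
    merged = []
    for row in spanish_rows:
        key = (row["book"], row["chapter"], row["verse"])
        if key in labels:  # Spanish verses without an English match are skipped
            merged.append({**row, **labels[key]})
    return merged
```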
---
## Streamlit Interface
The interactive Streamlit app allows users to input a free-form emotional message and receive recommended Bible verses matching its **emotion** and **theme**.
### Features
- **Automatic translation** of input (EN/ES)
- **Emotion detection** (6 Plutchik categories)
- **Theme classification** (5 canonical themes)
- **Context-aware verse matching** from KJV or RV60
- **Stylized cards** with emotion/theme color, emoji, and verse metadata
- **User feedback collection** via like/dislike buttons (stored in Google Sheets)
### Example
Input:
> *Tengo miedo y necesito consuelo...* ("I am afraid and I need comfort...")

Returns:
*Génesis 40:7*: *"¿Por qué parecen hoy mal vuestros semblantes?"* ("Why do your faces look so sad today?")
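
The matching step can be pictured as a filter over the labeled corpus: keep verses whose emotion equals the detected emotion and whose theme list contains the detected theme. The sketch below is an assumption about how such a filter might look, not the logic in `recommender.py`; the `recommend` function, the emotion-only fallback, the field names, and the sample records (with placeholder references and invented labels) are all hypothetical. With the sample data, `recommend(SAMPLE_VERSES, "fear", "comfort")` returns only the record tagged with both labels.

```python
import random
from typing import List, Optional

# Placeholder records with invented labels, for illustration only.
SAMPLE_VERSES = [
    {"ref": "verse-a", "emotion": "fear", "themes": ["comfort", "faith"]},
    {"ref": "verse-b", "emotion": "fear", "themes": ["justice"]},
    {"ref": "verse-c", "emotion": "joy", "themes": ["comfort"]},
]


def recommend(verses: List[dict], emotion: str, theme: str,
              k: int = 3, seed: Optional[int] = None) -> List[dict]:
    """Return up to k verses matching the emotion and containing the theme."""
    matches = [v for v in verses
               if v["emotion"] == emotion and theme in v["themes"]]
    if not matches:
        # Assumed fallback: relax the theme constraint and match on emotion only.
        matches = [v for v in verses if v["emotion"] == emotion]
    rng = random.Random(seed)  # seeded for reproducible sampling
    return rng.sample(matches, min(k, len(matches)))


print([v["ref"] for v in recommend(SAMPLE_VERSES, "fear", "comfort", seed=0)])
```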
---
## Feedback System
Users can now rate the relevance of the emotion/theme detection with a 👍 / 👎 system.
Feedback is saved to a **Google Sheet** along with:
- Original input
- Detected emotion and score
- Detected theme and score
- User name (optional)
- Feedback value (`like` / `dislike`)
This enables future model refinement and analytics.
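
A feedback record with the fields listed above might be assembled as in the sketch below. It is not the code in `save_feedback_to_gsheet.py`: the `build_feedback_row` helper, the column names, and their order are assumptions, the scores are made up, and an in-memory CSV stands in for the Google Sheet so the example runs without credentials.

```python
import csv
import io
from typing import Optional

# Assumed column names and order; the README only lists which values are stored.
FIELDS = ["user_input", "emotion", "emotion_score",
          "theme", "theme_score", "user_name", "feedback"]


def build_feedback_row(user_input: str, emotion: str, emotion_score: float,
                       theme: str, theme_score: float, feedback: str,
                       user_name: Optional[str] = None) -> dict:
    """Assemble one feedback record; reject anything but like/dislike."""
    if feedback not in ("like", "dislike"):
        raise ValueError(f"feedback must be 'like' or 'dislike', got {feedback!r}")
    return {
        "user_input": user_input,
        "emotion": emotion,
        "emotion_score": round(emotion_score, 4),
        "theme": theme,
        "theme_score": round(theme_score, 4),
        "user_name": user_name or "",  # the name is optional
        "feedback": feedback,
    }


# Local stand-in for the spreadsheet: write one row to an in-memory CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(build_feedback_row(
    "Tengo miedo y necesito consuelo...", "fear", 0.91, "comfort", 0.78, "like"))
print(buffer.getvalue())
```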
---
## UI Enhancements
- Feedback buttons styled with semantic colors and **hover animation**
- Subtitles, emotion/theme blocks, and translation notices are now **centered and consistently styled**
- Merriweather font applied to all key UI blocks for elegance and readability
---
## Outputs
Labeled files are saved as:
- `*_emotion.csv`: emotion column using 6 Plutchik labels
- `*_emotion_theme.csv`: adds a multilabel theme column drawn from 5 canonical themes

Logs are saved to `logs/labeling_logs/` with per-file runtime and a pipeline summary.
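
A labeled file could be summarized with the standard `csv` module, as sketched below. The column names (`emotion`, `theme`), the `;` separator for multiple themes, and the sample rows are assumptions chosen for illustration; check the actual CSV headers before reusing this.

```python
import csv
import io
from collections import Counter
from typing import TextIO, Tuple

# Made-up rows standing in for a *_emotion_theme.csv file (assumed layout).
SAMPLE = """book,chapter,verse,text,emotion,theme
GEN,1,1,placeholder text,joy,creation;faith
GEN,1,2,placeholder text,fear,creation
"""


def summarize(handle: TextIO) -> Tuple[Counter, Counter]:
    """Count emotion labels and individual theme labels in a labeled CSV."""
    emotions: Counter = Counter()
    themes: Counter = Counter()
    for row in csv.DictReader(handle):
        emotions[row["emotion"]] += 1
        # Assumed multilabel encoding: themes joined with ';' in one cell.
        themes.update(t.strip() for t in row["theme"].split(";") if t.strip())
    return emotions, themes


emotion_counts, theme_counts = summarize(io.StringIO(SAMPLE))
print(emotion_counts.most_common(), theme_counts.most_common())
```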
---
## Roadmap
### Completed (Weeks 1–3)
- Full Bible scraping (KJV + RV60)
- Corpus cleaning and normalization
- Emotion and theme labeling using pretrained Hugging Face models
- Cross-lingual label transfer and alignment
- Manual evaluation with accuracy and F1 metrics
- Streamlit interface: emotion + theme detection, stylized results
- Multilingual support: automatic input translation and corpus selection
- Recommendation system based on emotion + theme match
### Week 4: Model + Interface Integration and User Testing
- [ ] Connect model inference to real-time recommendations in the interface
- [ ] Run test sessions with 5–10 users
- [ ] Deploy and collect feedback via form (Google Forms or equivalent)
### Week 5: Iteration Based on Feedback
- [ ] Refine model behavior and recommendation logic
- [ ] Improve clarity of explanations and label rendering
- [ ] Implement user-suggested improvements
### Week 6: Final Demo and Documentation
- [ ] Consolidate the MVP into a cohesive narrative
- [ ] Write technical and functional report
- [ ] Prepare public demo with real examples
- [ ] (Optional) Add export features (PDF), voice synthesis, or word cloud summaries
[See CHANGELOG.md](CHANGELOG.md) for complete history.
---
## License
For academic and research use only. Sources are derived from public domain Bibles (e.g., RV60, KJV) and open ML models from Hugging Face. The license will be finalized before v1.0.
---
## Acknowledgements
Developed by [Manuel Cruz Rodríguez](https://github.com/mancrurod) as part of an NLP and Data Science learning journey.