{"id":30290427,"url":"https://github.com/a904guy/emolanguage","last_synced_at":"2026-05-17T03:06:47.304Z","repository":{"id":310166194,"uuid":"1038941286","full_name":"a904guy/EmoLanguage","owner":"a904guy","description":"🤖 Semantic emoji encoder using LLMs to transform text into meaningful emoji sequences with grammar preservation and reversible decoding. UI-generated communication system inspired by Pantheon.","archived":false,"fork":false,"pushed_at":"2025-08-16T06:25:57.000Z","size":583,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-16T08:25:12.730Z","etag":null,"topics":["artificial-intelligence","communication","emoji","encoding","linguistics","llm","machine-learning","morphology","natural-language-processing","nlp","pantheon","python","semantic-encoding","text-processing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/a904guy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-16T06:06:39.000Z","updated_at":"2025-08-16T06:26:00.000Z","dependencies_parsed_at":"2025-08-16T08:25:17.481Z","dependency_job_id":"ef9dfda4-c79d-40e9-ad21-050cc35011f0","html_url":"https://github.com/a904guy/EmoLanguage","commit_stats":null,"previous_names":["a904guy/emolanguage"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/a904guy/EmoLanguage","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a904guy%2FEmoLanguage","tags_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a904guy%2FEmoLanguage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a904guy%2FEmoLanguage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a904guy%2FEmoLanguage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a904guy","download_url":"https://codeload.github.com/a904guy/EmoLanguage/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a904guy%2FEmoLanguage/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278411261,"owners_count":25982368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","communication","emoji","encoding","linguistics","llm","machine-learning","morphology","natural-language-processing","nlp","pantheon","python","semantic-encoding","text-processing"],"created_at":"2025-08-16T23:13:18.014Z","updated_at":"2025-10-05T05:07:03.835Z","avatar_url":"https://github.com/a904guy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🤖 EmoLanguage: Semantic Emoji Encoding System\n\nA sophisticated language encoding system that transforms English text into semantically meaningful emoji sequences 
using Large Language Models (LLMs). Inspired by the Netflix series **[Pantheon](https://www.netflix.com/title/81937398)**, this project creates a reversible emoji-based communication protocol with advanced morphological transformation detection and intelligent collision resolution.\n\n**🧠 AI-Generated Project**: Much like the uploaded intelligences in **[Pantheon](https://www.netflix.com/title/81937398)** who developed their own symbolic communication methods, this entire project was conceived, designed, and implemented through **artificial intelligence collaboration**. Every line of code, architectural decision, and algorithmic approach emerged from AI reasoning and iterative development—creating a meta-commentary on machine intelligence developing communication systems for other machine intelligences.\n\n**🎭 From Fiction to Reality**: While we don't yet have the uploaded human consciousness technology from **[Pantheon](https://www.netflix.com/title/81937398)**, this project demonstrates how current AI can replicate the conceptual framework of advanced digital beings creating their own semantic languages. 
The irony is intentional: an AI system building tools for AI communication, mirroring the show's premise of uploaded minds developing new forms of expression beyond human linguistic limitations.\n\n## 🎯 What Makes This Special\n\n- **🧠 Context-Aware Grammar**: Automatically preserves plurals, tenses, and morphological variations using intelligent modifier emojis\n- **🎯 Semantic Intelligence**: LLM-generated mappings based on meaning and context, not just visual similarity\n- **🔄 Perfect Reversibility**: Encode→decode maintains functional equivalence with grammatical reconstruction\n- **🌟 Multi-Pass Generation**: Advanced consensus-building algorithms with quality scoring and collision detection\n- **⚡ Collision-Free Architecture**: Intelligent multi-pass resolution of duplicate emoji assignments using LLM consensus\n- **🔤 Character Fallback System**: Unmapped words automatically encode as individual characters using consistent emoji mappings\n- **🏗️ Production Ready**: Comprehensive error handling, logging, validation, and quality assurance\n\n## 🚀 Quick Demo\n\n```bash\n# Context-aware encoding with morphological preservation\n$ python3 encode.py \"The cats were running quickly\"\nOriginal: The cats were running quickly\nEncoded : 🏠✨🌍🔠 🐈🔢 👤💡 🏃‍♂️ 🏃⚡🎯\n\n# Decode back to text\n$ python3 decode.py \"🏠✨🌍🔠 🐈🔢 👤💡 🏃‍♂️ 🏃⚡🎯\"\nOriginal: 🏠✨🌍🔠 🐈🔢 👤💡 🏃‍♂️ 🏃⚡🎯\nDecoded : The cats were running quickly\n\n# Standard greeting encoding\n$ python3 encode.py \"Hello world, how are you today?\"\nOriginal: Hello world, how are you today?\nEncoded : 👋🌐🔠 🧭🌏, 🤨🔍 🚪🏠 👤🤗 📆🌞?\n\n# Perfect decode for simple phrases\n$ python3 decode.py \"👋🌐🔠 🧭🌏, 🤨🔍 🚪🏠 👤🤗 📆🌞?\"\nOriginal: 👋🌐🔠 🧭🌏, 🤨🔍 🚪🏠 👤🤗 📆🌞?\nDecoded : Hello world, how are you today?\n\n# Process files with piping\n$ cat your_text_file.txt | python3 encode.py \u003e encoded_output.txt\n```\n\n## 🎨 Use Cases\n\n- **🔒 Secure Communication**: Semantic encoding for privacy-conscious messaging\n- **🔬 Language Research**: Morphological analysis 
and linguistic pattern recognition  \n- **🎭 Creative Projects**: Digital art, emoji poetry, and experimental writing\n- **📚 Educational Tools**: Teaching semantic meaning, word roots, and language structure\n- **🤖 LLM Research**: Prompt engineering, NLP experiments, and model evaluation\n- **🎉 Fun Applications**: Confuse friends, create unique social content, emoji games\n\n### 🐝 Community Projects\n\n**The Bee Movie Script in EmoLanguage**: For the ultimate test of semantic encoding (and pure entertainment), the entire Bee Movie script has been translated into EmoLanguage! Check out this epic exercise in AI-generated emoji communication: [Bee Movie EmoLanguage Script](https://gist.github.com/a904guy/a60071a516ea60d559a36b23f907679a)\n\n*\"According to all known laws of aviation, there is no way a bee should be able to fly...\"* becomes a beautiful sequence of semantically meaningful emojis. It's both a technical demonstration of the system's capabilities and a delightfully absurd homage to internet meme culture.\n\n**📟 EmojiPager - Web Interface**: Experience EmoLanguage through a nostalgic retro pager interface! This web-based UI brings the encode/decode functionality to your browser with an authentic old-school pager design. Complete with LCD-style display, chunky buttons, and that aesthetic of 90s communication devices. Perfect for demonstrating the system to others or just enjoying some retro-tech vibes while encoding your messages.\n\nTry it live at: [https://emojipager.com](https://emojipager.com)\n\nFeatures include real-time encoding/decoding, responsive design for mobile devices, and all the semantic intelligence of the command-line tools wrapped in a beautifully crafted vintage interface. 
Because sometimes the future of communication needs a little blast from the past!\n\n## 📂 Architecture \u0026 Components\n\n### Core System\n- **`lib/semantic_mapping_generator.py`** - Sophisticated mapping generation with multi-pass consensus, collision resolution, and quality scoring\n- **`build_mapping.py`** - Main interface for generating mappings with various strategies (batch, multipass, dictionary processing)\n- **`encode.py`** - Context-aware encoder with morphological transformation detection\n- **`decode.py`** - Intelligent decoder with grammatical reconstruction capabilities\n\n### Advanced Processing\n- **`lib/word_normalizer.py`** - NLTK + rule-based word normalization with extensive morphological handling\n- **`lib/collision_manager.py`** - Multi-pass collision detection and LLM-based resolution system\n- **`lib/file_manager.py`** - Robust file I/O with validation, backup, and collision tracking\n- **`lib/llm_client.py`** - LLM integration with score-aware responses and error handling\n\n### Utilities \u0026 Support\n- **`normalize_dictionary.py`** - Dictionary cleaning, deduplication, and sorting\n- **`settle_duplications.py`** - Legacy collision resolution (replaced by integrated system)\n\n### Data \u0026 Configuration\n- **`documents/dictionary.txt`** - Normalized word list derived from `/usr/share/dict/words` (17,265+ entries including short words)\n- **`mappings/mapping.json`** - Primary emoji-to-word mapping database\n- **`lib/config.py`** - Centralized configuration and prompt templates\n\n## 🛠️ Setup \u0026 Installation\n\n### Prerequisites\n\n1. **Local LLM Server** (for generating new mappings)\n   ```bash\n   # Install LM Studio, Ollama, or similar\n   # Download a compatible model (e.g., Llama, Mistral, GPT variants)\n   # Start server on http://127.0.0.1:1234 (default)\n   ```\n\n2. 
**Python Dependencies**\n   ```bash\n   # Recommended: Use Makefile for automated setup\n   make install\n   \n   # Alternative: Manual installation\n   pip install -r requirements.txt\n   # NLTK data downloads automatically when needed\n   ```\n\n### Quick Start\n\n```bash\n# 1. Encode text to emojis (uses existing mappings)\npython3 encode.py \"The quick brown fox jumps over the lazy dog\"\n# Output:\n# Original: The quick brown fox jumps over the lazy dog\n# Encoded : 🏠✨🌍🔠 🏃⚡ 🍂🟫 🦊 🦘3️⃣ 🔝💨 🏠✨🌍 🛌🤤 🐕\n\n# 2. Decode emojis back to text\npython3 decode.py \"🏠✨🌍🔠 🏃⚡ 🍂🟫 🦊 🦘3️⃣ 🔝💨 🏠✨🌍 🛌🤤 🐕\"\n# Output:\n# Original: 🏠✨🌍🔠 🏃⚡ 🍂🟫 🦊 🦘3️⃣ 🔝💨 🏠✨🌍 🛌🤤 🐕\n# Decoded : The quick brown fox jumps over the lazy dog\n```\n\n## 🏗️ Building Mappings: Multiple Methods\n\n### Method 1: Standard Batch Generation\n```bash\n# Process words in batches with collision resolution\npython3 build_mapping.py --mapping-size 50 --collision-size 25\n```\n\n### Method 2: Multi-Pass Generation (Recommended)\n```bash\n# Higher quality with consensus scoring across multiple LLM passes\npython3 build_mapping.py --multipass --mapping-size 50 --passes 3 --collision-passes 2\n```\n\n### Advanced Options\n```bash\n# Custom LLM configuration\npython3 build_mapping.py --base-url http://localhost:8080 --model custom-model\n\n# Dry run to preview results\npython3 build_mapping.py --dry-run --mapping-size 20\n\n# Multi-pass with custom settings\npython3 build_mapping.py --multipass --passes 5 --collision-passes 3 --mapping-size 100\n```\n\n## 🔤 Morphological System\n\n### Advanced Word Normalization\nThe system uses sophisticated normalization combining NLTK lemmatization with extensive rule-based processing:\n\n- **Contractions**: `didn't` → `did` + negation modifier `❌`\n- **Plurals**: `cats` → `cat` + plurality modifier `🔢` \n- **Verb Forms**: `running` → `run` + progressive modifier `🔄`\n- **Comparatives**: `bigger` → `big` + comparative modifier `➕`\n- **Complex Forms**: `children` → `child` + irregular plural 
modifier `🔢👑`\n\n### Grammar Preservation\nContext-aware encoding preserves grammatical information through modifier emojis:\n\n| Grammar Type | Modifier | Example |\n|-------------|----------|---------|\n| Plural | 🔢 | cats → 🐱🔢 |\n| Past Tense | ⏪ | walked → 🚶‍♂️⏪ |  \n| Progressive | 🔄 | running → 🏃🔄 |\n| Comparative | ➕ | bigger → 📏➕ |\n| Superlative | ⭐ | biggest → 📏⭐ |\n| Negation | ❌ | didn't → 🏃‍♀️✅❌ |\n| Capitalization | 🔠 | Hello → 👋🔠 |\n\n## 🔤 Character Fallback System\n\n### Handling Unmapped Words\nWhen words aren't found in the emoji dictionary, the system automatically falls back to character-by-character encoding:\n\n- **Letters A-Z**: Encoded using consistent squared letter emojis (🅰️🅱️🅾️🅿️...)\n- **Numbers 0-9**: Encoded using distinct number emojis that don't conflict with modifiers\n- **Proper Nouns**: Names and places like \"Andy Hawkins\" → 🅰️🔠🄽🄳🅨 🅷🅰️🅦🄺🄸🄽🅂\n- **Technical Terms**: Specialized vocabulary automatically handled without manual mapping\n- **Mixed Content**: \"test123\" → 🆃🄴🅂🆃①②③\n\n### Character Fallback Features\n- **Capitalization Preserved**: Uppercase letters get individual capitalization modifiers (🔠)\n- **Perfect Reversibility**: Character sequences decode back to original text exactly\n- **Visual Consistency**: All letters use the same emoji style for clean appearance\n- **No Conflicts**: Character emojis never clash with morphological modifiers\n\n## 🎛️ System Performance\n\n### Current Metrics (2025)\n- **Dictionary Size**: 17,265 unique normalized words  \n- **Mapping Database**: 17,794 emoji mappings (0.44MB file, ~0.4MB RAM)\n- **Memory Usage**: ~5.5MB incremental (mappings + dictionary + normalizer)\n- **Encoding Speed**: ~2 seconds startup + processing time (includes NLTK/normalization)\n- **Decoding Speed**: ~0.1 seconds (fast lookup-based decoding)\n- **Character Fallback**: Instant encoding for any unmapped content\n- **Coverage**: 100% text encoding (dictionary words + character fallback)\n- **Collision Rate**: 
\u003c0.1% after multi-pass resolution\n- **Reversibility**: \u003e99.9% functional accuracy\n\n### Quality Assurance\n- **Multi-Pass Scoring**: LLM confidence scores and consensus validation  \n- **Collision Detection**: Real-time duplicate emoji detection with automatic resolution\n- **Semantic Validation**: Context-aware quality checks and coherence testing\n- **Grammar Preservation**: Morphological transformation tracking and reconstruction\n- **Error Recovery**: Robust fallback systems and comprehensive logging\n\n## 🔧 Configuration \u0026 Customization\n\n### LLM Settings\n```python\n# lib/config.py\nDEFAULT_BASE_URL = \"http://127.0.0.1:1234\"\nDEFAULT_MODEL = \"openai/gpt-oss-20b\"  \nDEFAULT_MAPPING_BATCH_SIZE = 50\nDEFAULT_COLLISION_BATCH_SIZE = 10\n```\n\n### File Paths\n```python\n# lib/config.py\nDEFAULT_DICTIONARY_PATH = \"documents/dictionary.txt\"\nMAPPING_FILE_PATH = \"mappings/mapping.json\"\nLOGS_DIR = \"logs\"\n```\n\n### Morphological Modifiers\nAll modifier emojis are configurable in `encode.py`:\n```python\nMORPHOLOGICAL_MODIFIERS = {\n    'plural_s': '🔢',\n    'verb_ed': '⏪', \n    'verb_ing': '🔄',\n    'comparative': '➕',\n    'superlative': '⭐',\n    'contraction_nt': '❌',\n    # ... 
extensive customization options\n}\n```\n\n## 🔍 Troubleshooting \u0026 Development\n\n### Common Issues\n```bash\n# Check dictionary and mapping status\npython3 -c \"\nimport json\nwith open('mappings/mapping.json') as f: \n    mappings = json.load(f)\nprint(f'Loaded {len(mappings)} mappings')\n\"\n\n# Test specific words\necho \"test words here\" | python3 encode.py\n\n# Validate collision-free mappings\npython3 -c \"\nimport json\nfrom collections import Counter\nwith open('mappings/mapping.json') as f:\n    mappings = json.load(f)\nemoji_counts = Counter(mappings.values()) \nduplicates = {k:v for k,v in emoji_counts.items() if v \u003e 1}\nprint(f'Found {len(duplicates)} duplicate emojis: {duplicates}')\n\"\n```\n\n### Development Features\n- **Comprehensive Logging**: All operations logged with timestamps and context\n- **Dry Run Mode**: Preview changes without modifying files\n- **Statistics Tracking**: Detailed metrics and performance monitoring\n- **Modular Architecture**: Clean separation of concerns for easy extension\n\n## 🎓 Technical Background\n\n### Inspiration: **[Pantheon](https://www.netflix.com/title/81937398)** Series\nIn the Netflix series **[Pantheon](https://www.netflix.com/title/81937398)**, the \"Emo Language\" represents a symbolic communication method used by uploaded intelligences. 
This project reimagines that concept with:\n\n- **Semantic Encoding**: Meaning-based rather than visual emoji selection\n- **Linguistic Intelligence**: Advanced morphological and grammatical awareness\n- **Reversible Communication**: Perfect round-trip encoding/decoding capability\n- **Cultural Neutrality**: Universal emoji selection avoiding regional biases\n\n### Architecture Philosophy\n- **LLM-First Design**: Leveraging language models for semantic understanding\n- **Collision-Free Guarantee**: Ensuring one-to-one word-emoji correspondence  \n- **Context Preservation**: Maintaining grammatical and morphological information\n- **Production Quality**: Robust error handling, logging, and validation systems\n\n## 🤝 Contributing\n\nThis project welcomes contributions! Key areas:\n\n- **New Language Support**: Extend beyond English with multilingual mappings\n- **Alternative LLM Backends**: Support for different model architectures\n- **Grammar Extensions**: Enhanced morphological transformation detection\n- **Performance Optimization**: Faster encoding/decoding algorithms\n- **Quality Metrics**: Advanced semantic validation and scoring systems\n\n## 📜 License\n\nMIT License — use it, break it, improve it, just give credit where it's due.\n\n---\n\n*\"The future of communication is not just digital—it's semantic. Every emoji carries meaning, every sequence tells a story.\"*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa904guy%2Femolanguage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fa904guy%2Femolanguage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa904guy%2Femolanguage/lists"}