{"id":27058473,"url":"https://github.com/ethanke/translation-complexity-score","last_synced_at":"2025-04-09T19:20:56.373Z","repository":{"id":286162789,"uuid":"960573412","full_name":"ethanke/translation-complexity-score","owner":"ethanke","description":"Python tool for scoring translation complexity using multiple metrics and approaches.","archived":false,"fork":false,"pushed_at":"2025-04-04T17:19:25.000Z","size":35,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-04T18:23:36.560Z","etag":null,"topics":["complexity-measure","complexity-score","python","translation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ethanke.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-04-04T17:06:51.000Z","updated_at":"2025-04-04T17:19:28.000Z","dependencies_parsed_at":"2025-04-04T18:23:39.171Z","dependency_job_id":"61427167-9700-4dbd-aef3-832706d30f02","html_url":"https://github.com/ethanke/translation-complexity-score","commit_stats":null,"previous_names":["ethanke/translation-complexity-score"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethanke%2Ftranslation-complexity-score","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethanke%2Ftranslation-complexity-score/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethanke%2Ftranslation-complexity-score/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethanke%2Ftranslation-complexity-score/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ethanke","download_url":"https://codeload.github.com/ethanke/translation-complexity-score/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248095083,"owners_count":21046785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["complexity-measure","complexity-score","python","translation"],"created_at":"2025-04-05T12:15:21.179Z","updated_at":"2025-04-09T19:20:56.345Z","avatar_url":"https://github.com/ethanke.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Translation Complexity Scorer\n\nA tool for scoring translation complexity using multiple metrics and machine learning approaches. 
This tool helps translators, language service providers, and researchers assess text complexity to optimize translation workflows and resource allocation.\n\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n## 🌟 Features\n\n### Multiple Scoring Methods\n- **Traditional Readability Metrics**\n  - Flesch-Kincaid Grade Level\n  - Coleman-Liau Index\n  - Gunning Fog Index\n  - SMOG Index\n  - Flesch Reading Ease\n\n- **Linguistic Feature Analysis**\n  - Average sentence length\n  - Lexical diversity (Type-Token Ratio)\n  - Syntactic complexity (Dependency tree depth)\n  - Vocabulary rarity analysis\n\n- **Translation-Specific Metrics**\n  - Semantic complexity using transformer embeddings\n  - Idiomatic expression density\n  - Domain-specific terminology detection\n\n### Technical Highlights\n- 🚀 GPU acceleration support (CUDA compatible)\n- 📊 Normalized scores (0-1 range)\n- 🎯 Configurable weights for different metrics\n- 📈 Batch processing support\n- 🔍 Detailed score breakdown\n- 🧪 Comprehensive test suite\n\n## 📋 Requirements\n\n- Python 3.8+\n- CUDA-compatible GPU (optional, for faster processing)\n- 4GB+ RAM\n\n## 🔧 Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/ethanke/translation-complexity-score.git\ncd translation-complexity-score\n\n# Create a virtual environment (recommended)\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n\n# Download required models\npython -m spacy download en_core_web_sm\npython -m nltk.downloader punkt averaged_perceptron_tagger\n```\n\n## 🚀 Quick Start\n\n```python\nfrom translation_complexity import TranslationComplexityScorer\n\n# Initialize the scorer\nscorer = TranslationComplexityScorer()\n\n# Score a single text\ntext = \"Your text to 
analyze here.\"\nscores = scorer.score_text(text)\n\nprint(f\"Overall Complexity: {scores['overall_complexity']:.2f}\")\nprint(f\"Complexity Level: {scorer.config.get_complexity_level(scores['overall_complexity'])}\")\n\n# Batch processing\ntexts = [\"Text 1\", \"Text 2\", \"Text 3\"]\nbatch_scores = scorer.batch_score(texts)\n```\n\n## 📊 Example Output\n\n```json\n{\n  \"flesch_kincaid\": 0.45,\n  \"coleman_liau\": 0.38,\n  \"gunning_fog\": 0.52,\n  \"smog\": 0.41,\n  \"flesch_reading_ease\": 0.65,\n  \"avg_sentence_length\": 0.48,\n  \"lexical_diversity\": 0.72,\n  \"syntactic_complexity\": 0.55,\n  \"vocabulary_rarity\": 0.33,\n  \"semantic_complexity\": 0.61,\n  \"idiomatic_density\": 0.25,\n  \"domain_specificity\": 0.44,\n  \"overall_complexity\": 0.48\n}\n```\n\n## 🏗️ Project Structure\n\n```\ntranslation_complexity/\n├── metrics/\n│   ├── readability.py    # Traditional readability metrics\n│   ├── linguistic.py     # Linguistic feature analysis\n│   └── translation.py    # Translation-specific metrics\n├── utils/\n│   └── helpers.py        # Utility functions\n├── config.py             # Configuration settings\n└── scorer.py            # Main scoring interface\n```\n\n## ⚙️ Configuration\n\nYou can customize the scoring weights and thresholds:\n\n```python\nfrom translation_complexity import Config, TranslationComplexityScorer\n\nconfig = Config(\n    READABILITY_WEIGHT=0.3,\n    LINGUISTIC_WEIGHT=0.4,\n    TRANSLATION_WEIGHT=0.3,\n    LOW_COMPLEXITY_THRESHOLD=0.25,\n    MEDIUM_COMPLEXITY_THRESHOLD=0.45,\n    HIGH_COMPLEXITY_THRESHOLD=0.65\n)\n\nscorer = TranslationComplexityScorer(config)\n```\n\n## 🤝 Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. 
Push to the branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n## 📝 Citation\n\nIf you use this tool in your research, please cite:\n\n```bibtex\n@software{translation_complexity_scorer,\n  title = {Translation Complexity Scorer},\n  author = {Ethan Kerdelhue},\n  year = {2025},\n  url = {https://github.com/ethanke/translation-complexity-score}\n}\n```\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- Thanks to the Hugging Face team for their transformer models\n- spaCy team for their excellent NLP library\n- NLTK contributors for their linguistic analysis tools\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fethanke%2Ftranslation-complexity-score","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fethanke%2Ftranslation-complexity-score","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fethanke%2Ftranslation-complexity-score/lists"}