{"id":19056831,"url":"https://github.com/andrianllmm/taglid","last_synced_at":"2026-04-10T10:02:16.335Z","repository":{"id":252387320,"uuid":"840280607","full_name":"andrianllmm/tagLID","owner":"andrianllmm","description":"A word-level Language Identification (LID) tool for Tagalog-English (Taglish) text","archived":false,"fork":false,"pushed_at":"2025-06-22T16:55:28.000Z","size":632,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-23T01:02:05.499Z","etag":null,"topics":["code-mixing","code-switching","english","language-identification","linguistics","nlp","tagalog","taglish"],"latest_commit_sha":null,"homepage":"https://andrianllmm.github.io/projects/taglid","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrianllmm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-09T11:01:08.000Z","updated_at":"2025-06-22T16:55:31.000Z","dependencies_parsed_at":"2024-08-24T06:37:19.472Z","dependency_job_id":"64e39e0d-401a-4e16-a9bb-626fb429eccb","html_url":"https://github.com/andrianllmm/tagLID","commit_stats":null,"previous_names":["andrianllmm/taglid"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andrianllmm/tagLID","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrianllmm%2FtagLID","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrianllmm%2FtagLID/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrianllmm%2FtagLID/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrianllmm%2FtagLID/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrianllmm","download_url":"https://codeload.github.com/andrianllmm/tagLID/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrianllmm%2FtagLID/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31637747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-10T07:40:12.752Z","status":"ssl_error","status_checked_at":"2026-04-10T07:40:11.664Z","response_time":98,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-mixing","code-switching","english","language-identification","linguistics","nlp","tagalog","taglish"],"created_at":"2024-11-08T23:52:01.626Z","updated_at":"2026-04-10T10:02:16.329Z","avatar_url":"https://github.com/andrianllmm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# TagLID\n\n**A word-level Language Identification (LID) tool for Tagalog-English (Taglish)\ntext**\n\n[![Demo](https://asciinema.org/a/674332.svg)](https://asciinema.org/a/674332)\n\n\u003c/div\u003e\n\n## About\n\nTagLID is a library that labels each word in a Taglish (Tagalog-English mix)\ntext by language. It gives either a simple tag (`tgl` or `eng`) or detailed\nfrequency info with flags indicating how the word was identified. It is a\nrule-based and opinionated system that mostly uses dictionary lookups. It also\nhandles cases like skipping numbers, names, and interjections, and includes\nlogic for dealing with slang, abbreviations, contractions, stemming or\nlemmatizing inflected words, intrawords, and correcting misspellings.\n\n## Installation\n\n```sh\npip install git+https://github.com/andrianllmm/taglid.git@main\n```\n\n## Usage\n\nTagLID can act as a standalone library that can be imported via `import taglid`\nor as a CLI application via `python -m taglid`.\n\n### Library Mode\n\n#### Textual data\n\nUse the `lid` module for textual data.\n\nUse `lang_identify` to identify each word in a text. This takes any string and\nreturns a list of words and their corresponding English and Tagalog values,\nflag, and correction.\n\n```python\nfrom taglid.lid import lang_identify\n\nlabeled_text = lang_identify(\"hello, mundo\")\nprint(labeled_text)\n```\n\nOutput:\n\n```\n[{'Word': 'hello', 'eng': 1.0, 'tgl': 0.0, 'Flag': 'DICT', 'Correction': None}, {'Word': 'mundo', 'eng': 0.0, 'tgl': 1.0, 'Flag': 'DICT', 'Correction': None}]\n```\n\nUse [`tabulate`](https://pypi.org/project/tabulate/) to view output in tabular\nformat.\n\n```python\nfrom tabulate import tabulate\n\nprint(tabulate(labeled_text, headers=\"keys\"))\n```\n\nOutput:\n\n```\nword      eng    tgl  flag    correction\n------  -----  -----  ------  ------------\nhello       1      0  DICT\nmundo       0      1  DICT\n```\n\nUse `simplify` to only show the words and their language. This takes the return\nvalue of `lang_identify` and returns a list of tuples containing the word and\nits language.\n\n```python\nfrom taglid.lid import simplify\n\nsimplified_text = simplify(labeled_text)\nprint(simplified_text)\n```\n\nOutput:\n\n```\n[('hello', 'eng'), ('mundo', 'tgl')]\n```\n\n#### Datasets\n\nUse the `lid_dataset` module for datasets.\n\nUse `lang_identify_df` to label each word in each cell in a\n[`pandas`](https://pypi.org/project/pandas/) DataFrame. This takes a DataFrame\nof multiple rows and columns with each cell containing textual data and returns\na labeled DataFrame where each token is a row labeled by its original row,\noriginal column, and token index.\n\n```python\nimport pandas as pd\nfrom taglid.lid_dataset import lang_identify_df\n\ndata = [['hello po', 'ano?'], ['mag-aask lang po', 'what?']]\n\ndf = pd.DataFrame(data)\n\nlabeled_df = lang_identify_df(df)\nprint(labeled_df)\n```\n\nOutput:\n\n```\n     col  token_index      word  eng  tgl  flag correction\nrow\n0      0            1     hello  1.0  0.0  DICT       None\n0      0            2        po  0.0  1.0  DICT       None\n0      1            1       ano  0.0  1.0  FREQ       None\n1      0            1  mag-aask  0.5  0.5  INTW       None\n1      0            2      lang  0.0  1.0  FREQ       None\n1      0            3        po  0.0  1.0  DICT       None\n1      1            1      what  1.0  0.0  DICT       None\n```\n\n### CLI Mode\n\nRun TagLID from the terminal.\n\n```sh\npython -m taglid.lid\n```\n\nThen type a sentence when prompted.\n\n```\ntext: hello, mundo\n```\n\nOutput:\n\n```\nword      eng    tgl  flag    correction\n------  -----  -----  ------  ------------\nhello       1      0  DICT\nmundo       0      1  DICT\n```\n\nAdd `--simplify` to only show the words and their language.\n\n```sh\npython -m taglid.lid --simplify --text hello, mundo\n```\n\nOutput:\n\n```\n-----  ---\nhello  eng\nmundo  tgl\n-----  ---\n```\n\nUse `lid_dataset` with Excel files to directly label spreadsheets.\n\n```sh\npython -m taglid.lid_dataset in_path out_path\n```\n\n## Accuracy\n\nThe accuracy hasn't been tested yet.\n\n## Contributing\n\nContributions are welcome! To get started:\n\n1. Fork the project\n2. Create your feature branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature/AmazingFeature`)\n5. Open a pull request\n\n## Issues\n\nFound a bug or issue? Report it on the\n[issues page](https://github.com/andrianllmm/taglid/issues).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrianllmm%2Ftaglid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrianllmm%2Ftaglid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrianllmm%2Ftaglid/lists"}