{"id":18993521,"url":"https://github.com/infinitode/valx","last_synced_at":"2026-04-15T10:30:19.113Z","repository":{"id":232762027,"uuid":"785156474","full_name":"Infinitode/ValX","owner":"Infinitode","description":"ValX is an open-source Python package for text cleaning tasks, including profanity detection and removal. Now also includes sensitive information detection, and removal.","archived":false,"fork":false,"pushed_at":"2024-04-13T07:28:12.000Z","size":37,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-14T05:12:09.711Z","etag":null,"topics":["ai","cleaner","datasets","nlp","profanity-detection","profanity-filter","python","removal","sensitive-data","sensitive-data-detection","text-cleaning"],"latest_commit_sha":null,"homepage":"https://infinitode.netlify.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Infinitode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-11T10:04:49.000Z","updated_at":"2024-06-13T19:13:12.757Z","dependencies_parsed_at":null,"dependency_job_id":"6131f57c-9090-4451-9f20-ac05a0959a46","html_url":"https://github.com/Infinitode/ValX","commit_stats":null,"previous_names":["infinitode/valx"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FValX","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FValX/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/Infinitode%2FValX/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FValX/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Infinitode","download_url":"https://codeload.github.com/Infinitode/ValX/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240008130,"owners_count":19733261,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","cleaner","datasets","nlp","profanity-detection","profanity-filter","python","removal","sensitive-data","sensitive-data-detection","text-cleaning"],"created_at":"2024-11-08T17:21:45.271Z","updated_at":"2026-04-15T10:30:19.036Z","avatar_url":"https://github.com/Infinitode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ValX\n![Python Version](https://img.shields.io/badge/python-3.12-blue.svg)\n[![Code Size](https://img.shields.io/github/languages/code-size/infinitode/valx)](https://github.com/infinitode/valx)\n![Downloads](https://pepy.tech/badge/valx)\n![License Compliance](https://img.shields.io/badge/license-compliance-brightgreen.svg)\n![PyPI Version](https://img.shields.io/pypi/v/valx)\n\nAn open-source Python library for data cleaning tasks. It includes functions for profanity detection and removal, and for detecting and removing personal information. 
It also includes hate speech and offensive language detection and removal, using AI.\n\n\u003e [!IMPORTANT]\n\u003e If you are using a `scikit-learn` version older than `1.3.0`, please also downgrade your version of `numpy` as stated below. Otherwise, you can continue to use your preferred version of `scikit-learn` without downgrading `numpy`.\n\u003e\n\u003e Please downgrade to `numpy` version `1.26.4`. The ValX **DecisionTreeClassifier** AI model relies on older versions of `numpy` because it was trained with them.\n\u003e For more information, see: https://techoverflow.net/2024/07/23/how-to-fix-numpy-dtype-size-changed-may-indicate-binary-incompatibility-expected-96-from-c-header-got-88-from-pyobject/\n\n\u003e [!NOTE]\n\u003e ValX will automatically install a version of `scikit-learn` that is compatible with your device if you don't have one already.\n\n## Changes in 0.2.4\n\nFixed a major incompatibility with `scikit-learn`: changes introduced in `scikit-learn` `1.3.0` caused compatibility issues with versions later than `1.2.2`. 
ValX can now be used with `scikit-learn` versions both earlier and later than `1.3.0`!\n\nWe've also removed the pinned `scikit-learn==1.2.2` dependency, as most versions of `scikit-learn` will now work.\n\n## Changes in 0.2.3\n\nWe have added a new optional `info_type` parameter to the `detect_sensitive_information` and `remove_sensitive_information` functions, giving you fine-grained control over which types of sensitive information to detect or remove.\n\nWe have also added detection patterns for more types of sensitive information, including:\n- `\"iban\"`: International Bank Account Number.\n- `\"mrn\"`: Medical Record Number (may not work correctly, depending on provider and country).\n- `\"icd10\"`: International Classification of Diseases, Tenth Revision.\n- `\"geo_coords\"`: Geo-coordinates (latitude and longitude in decimal degrees format).\n- `\"username\"`: Username handles (@username).\n- `\"file_path\"`: File paths (general patterns for both Windows and Unix paths).\n- `\"bitcoin_wallet\"`: Bitcoin wallet addresses.\n- `\"ethereum_wallet\"`: Ethereum wallet addresses.\n\n## Changes in 0.2.2\n\nWe have refactored the `detect_profanity` function:\n- Removed unnecessary printing.\n- It now returns more information about each detected profanity, including `Line`, `Column`, `Word`, and `Language`.\n\n\u003e [!NOTE]\n\u003e You can view [ValX's package documentation](https://infinitode-docs.gitbook.io/documentation/package-documentation/valx-package-documentation) for more information on changes.\n\n## Changes in 0.2.1\n\nUsing the AI models in ValX, you can now automatically remove hate speech or offensive speech from your text data, without having to run detection yourself and write a custom removal step.\n\n## Installation\n\nYou can install ValX using pip:\n\n```bash\npip install valx\n```\n\n## Supported Python Versions\n\nValX supports the following Python versions:\n\n- Python 3.6\n- Python 3.7\n- Python 3.8\n- 
Python 3.9\n- Python 3.10\n- Python 3.11 or later (preferred)\n\nPlease ensure that you have one of these Python versions installed before using ValX. ValX may not work as expected on Python versions older than those listed.\n\n## Features\n\n- **Profanity Detection**: Detect profane and NSFW words or terms.\n- **Remove Profanity**: Remove profane and NSFW words or terms.\n- **Detect Sensitive Information**: Detect sensitive information in text data.\n- **Remove Sensitive Information**: Remove sensitive information from text data.\n- **Detect Hate Speech**: Detect hate speech or offensive speech in text, using AI.\n- **Remove Hate Speech**: Remove hate speech or offensive speech in text, using AI.\n\n### List of supported languages for profanity detection and removal\nThe following languages are supported by ValX's profanity detection and removal functions and are valid values for the `language` parameter:\n\n- **All**\n- Arabic\n- Czech\n- Danish\n- German\n- English\n- Esperanto\n- Persian\n- Finnish\n- Filipino\n- French\n- French (CA)\n- Hindi\n- Hungarian\n- Italian\n- Japanese\n- Kabyle\n- Korean\n- Dutch\n- Norwegian\n- Polish\n- Portuguese\n- Russian\n- Swedish\n- Thai\n- Klingon\n- Turkish\n- Chinese\n\n## Usage\n\n### Detect Profanity\n\n```python\nfrom valx import detect_profanity\n\n# Detect profanity (`sample_text` is your own text data, defined beforehand)\nresults = detect_profanity(sample_text, language=\"English\")\nprint(\"Profanity Evaluation Results\", results)\n```\n\n### Remove Profanity\n\n```python\nfrom valx import remove_profanity\n\n# Remove profanity (`sample_text` is your own text data, defined beforehand)\nremoved = remove_profanity(sample_text, \"text_cleaned.txt\", language=\"English\")\n```\n\n### Detect Sensitive Information\n\n```python\nfrom valx import detect_sensitive_information\n\n# Detect sensitive information (`sample_text` is your own text data, defined beforehand)\ndetected_sensitive_info = detect_sensitive_information(sample_text)\n```\n\n\u003e [!NOTE]\n\u003e This function now accepts an optional `info_type` argument, which can be used to detect 
only specific types of sensitive information. The same argument is also available in `remove_sensitive_information`.\n\n### Remove Sensitive Information\n\n```python\nfrom valx import remove_sensitive_information\n\n# Remove sensitive information (`sample_text2` is your own text data, defined beforehand)\ncleaned_text = remove_sensitive_information(sample_text2)\n```\n\n### Detect Hate Speech And Offensive Language\n\n```python\nfrom valx import detect_hate_speech\n\n# Detect hate speech or offensive language\noutcome_of_detection = detect_hate_speech(\"You are stupid.\")\n```\n\n\u003e [!IMPORTANT]\n\u003e The model's possible outputs are:\n\u003e - `['Hate Speech']`: The text was flagged as containing hate speech.\n\u003e - `['Offensive Speech']`: The text was flagged as containing offensive speech.\n\u003e - `['No Hate and Offensive Speech']`: The text was not flagged for hate speech or offensive speech.\n\n\u003e [!NOTE]\n\u003e See our [official documentation](https://infinitode-docs.gitbook.io/documentation/package-documentation/valx-package-documentation) for more examples of how to use **ValX**.\n\n## Contributing\n\nContributions are welcome! If you encounter any issues, have suggestions, or want to contribute to ValX, please open an issue or submit a pull request on [GitHub](https://github.com/infinitode/valx).\n\n## License\n\nValX is released under the terms of the **MIT License (Modified)**. Please see the [LICENSE](https://github.com/infinitode/valx/blob/main/LICENSE) file for the full text.\n\n### Derived licenses\n---\nValX uses data from this GitHub repository:\nhttps://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/\n© 2012-2020 Shutterstock, Inc.\n\nCreative Commons Attribution 4.0 International License:\nhttps://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/blob/master/LICENSE\n\n---\n\n**Modified License Clause**\n\nThe modified license clause grants users permission to make derivative works based on the ValX software. 
However, it requires any substantial changes to the software to be clearly distinguished from the original work and distributed under a different name.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitode%2Fvalx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinitode%2Fvalx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitode%2Fvalx/lists"}