{"id":30669120,"url":"https://github.com/artlabss/open-data-anonymizer","last_synced_at":"2026-03-04T23:01:36.280Z","repository":{"id":37027934,"uuid":"424237793","full_name":"ArtLabss/open-data-anonymizer","owner":"ArtLabss","description":"Python Data Anonymization \u0026 Masking Library For Data Science Tasks","archived":false,"fork":false,"pushed_at":"2023-07-12T09:58:07.000Z","size":42136,"stargazers_count":295,"open_issues_count":6,"forks_count":35,"subscribers_count":7,"default_branch":"main","last_synced_at":"2026-02-15T15:56:22.676Z","etag":null,"topics":["anonymization","data-anonymization","data-encoding","data-science","machine-learning","pandas","pdf","pdf-anonymization","python","python-data-anonymization"],"latest_commit_sha":null,"homepage":"https://www.artlabs.tech","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ArtLabss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-03T13:37:27.000Z","updated_at":"2026-02-13T18:43:49.000Z","dependencies_parsed_at":"2024-06-19T16:11:50.289Z","dependency_job_id":null,"html_url":"https://github.com/ArtLabss/open-data-anonymizer","commit_stats":null,"previous_names":["artlabss/open-data-anonimizer"],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/ArtLabss/open-data-anonymizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArtLabss%2Fopen-data-anonymizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArtLabss%2Fopen-data-anonym
izer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArtLabss%2Fopen-data-anonymizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArtLabss%2Fopen-data-anonymizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ArtLabss","download_url":"https://codeload.github.com/ArtLabss/open-data-anonymizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArtLabss%2Fopen-data-anonymizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30098085,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T22:49:54.894Z","status":"ssl_error","status_checked_at":"2026-03-04T22:49:48.883Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymization","data-anonymization","data-encoding","data-science","machine-learning","pandas","pdf","pdf-anonymization","python","python-data-anonymization"],"created_at":"2025-09-01T01:01:27.385Z","updated_at":"2026-03-04T23:01:36.214Z","avatar_url":"https://github.com/ArtLabss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align='center'\u003e\n  \u003ca href=\"https://artlabs.tech/\"\u003e\n    \u003cimg src='https://raw.githubusercontent.com/ArtLabss/tennis-tracking/main/VideoOutput/artlabs%20logo.jpg' 
width=\"150\" height=\"170\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003ch1 align='center'\u003eanonympy 🕶️\u003c/h1\u003e\n\n\u003cp align='center'\u003e\n\u003cimg src=\"https://img.shields.io/github/forks/ArtLabss/open-data-anonimizer.svg\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/stars/ArtLabss/open-data-anonimizer.svg\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/watchers/ArtLabss/open-data-anonimizer.svg\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/last-commit/ArtLabss/open-data-anonimizer.svg\"\u003e\n  \u003cbr\u003e\n  \u003cimg src=\"https://img.shields.io/pypi/v/anonympy.svg\"\u003e\n  \u003cimg src=\"https://img.shields.io/pypi/l/anonympy.svg\"\u003e\n  \u003cimg src=\"https://hits.sh/github.com/ArtLabss/open-data-anonimizer.svg\"\u003e\n  \u003ca href=\"https://pepy.tech/project/anonympy\"\u003e\u003cimg src=\"https://pepy.tech/badge/anonympy\"\u003e\u003c/a\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://github.com/ArtLabss/open-data-anonymizer/actions/workflows/pylinter.yml\"\u003e\u003cimg src=\"https://github.com/ArtLabss/open-data-anonymizer/actions/workflows/pylinter.yml/badge.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ArtLabss/open-data-anonymizer/actions/workflows/python-app.yml\"\u003e\u003cimg src=\"https://github.com/ArtLabss/open-data-anonymizer/actions/workflows/python-app.yml/badge.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ArtLabss/open-data-anonymizer/actions/workflows/codeql-analysis.yml\"\u003e\u003cimg src=\"https://github.com/ArtLabss/open-data-anonymizer/actions/workflows/codeql-analysis.yml/badge.svg\"\u003e\u003c/a\u003e\n  \u003cbr\u003e\n  \u003ccode\u003eWith ❤️ by ArtLabs\u003c/code\u003e\n  \n\u003ch2\u003eOverview\u003c/h2\u003e\n\u003cp\u003eGeneral Data Anonymization library for images, PDFs and tabular data. 
See \u003ca href=\"https://artlabs.tech/projects/\"\u003eArtLabs/projects\u003c/a\u003e for similar projects.\u003c/p\u003e\n\u003cbr\u003e\n\u003ch2\u003eMain Features\u003c/h2\u003e\n\n\u003cp\u003eEase of use - this package was written to be as intuitive as possible.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eTabular\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eEfficient - based on pd.DataFrame\u003c/li\u003e\n  \u003cli\u003eNumerous anonymization methods\u003c/li\u003e\n    \u003cul\u003e\n      \u003cli\u003eNumeric data\u003c/li\u003e\n        \u003cul\u003e\n          \u003cli\u003eGeneralization - Binning\u003c/li\u003e\n          \u003cli\u003ePerturbation\u003c/li\u003e\n          \u003cli\u003ePCA Masking\u003c/li\u003e\n          \u003cli\u003eGeneralization - Rounding\u003c/li\u003e\n        \u003c/ul\u003e\n      \u003cli\u003eCategorical data\u003c/li\u003e\n        \u003cul\u003e\n          \u003cli\u003eSynthetic Data\u003c/li\u003e\n          \u003cli\u003eResampling\u003c/li\u003e\n          \u003cli\u003eTokenization\u003c/li\u003e\n          \u003cli\u003ePartial Email Masking\u003c/li\u003e\n        \u003c/ul\u003e\n      \u003cli\u003eDatetime data\u003c/li\u003e\n        \u003cul\u003e\n          \u003cli\u003eSynthetic Date\u003c/li\u003e\n          \u003cli\u003ePerturbation\u003c/li\u003e\n        \u003c/ul\u003e\n      \u003c/ul\u003e\n\u003c/ul\u003e\n\n\u003cp\u003e\u003cstrong\u003eImages\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eAnonymization techniques\u003c/li\u003e\n  \u003cul\u003e\n    \u003cli\u003ePersonal Images (faces)\u003c/li\u003e\n    \u003cul\u003e\n      \u003cli\u003eBlurring\u003c/li\u003e\n      \u003cli\u003ePixelated Face Blurring\u003c/li\u003e\n      \u003cli\u003eSalt and Pepper Noise\u003c/li\u003e\n    \u003c/ul\u003e\n    \u003cli\u003eGeneral Images\u003c/li\u003e\n    \u003cul\u003e\n      \u003cli\u003eBlurring\u003c/li\u003e\n    \u003c/ul\u003e\n  
\u003c/ul\u003e\n\u003c/ul\u003e\n\n\u003cp\u003e\u003cstrong\u003ePDF\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eFind sensitive information and cover it with black boxes\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003e\u003cstrong\u003eText, Sound\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eIn Development\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cbr\u003e\n\n\u003ch2\u003eInstallation\u003c/h2\u003e\n\n\u003ch3\u003eDependencies\u003c/h3\u003e\n\u003col\u003e\n  \u003cli\u003e Python (\u003e= 3.7)\u003c/li\u003e\n  \u003cli\u003ecape-dataframes\u003c/li\u003e\n  \u003cli\u003efaker\u003c/li\u003e\n  \u003cli\u003epandas\u003c/li\u003e\n  \u003cli\u003eOpenCV\u003c/li\u003e\n  \u003cli\u003epytesseract\u003c/li\u003e\n  \u003cli\u003etransformers\u003c/li\u003e\n  \u003cli\u003e\u003ca href=\"https://github.com/ArtLabss/open-data-anonimizer/blob/main/requirements.txt\"\u003e.         .  .  .  .  \u003c/a\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003ch3\u003eInstall with pip\u003c/h3\u003e\n\n\u003cp\u003eEasiest way to install anonympy is using \u003ccode\u003epip\u003c/code\u003e\u003c/p\u003e\n\n```\npip install anonympy\n```\n\n\u003ch3\u003eInstall from source\u003c/h3\u003e\n\n\u003cp\u003eInstalling the library from source code is also possible\u003c/p\u003e\n\n```\ngit clone https://github.com/ArtLabss/open-data-anonimizer.git\ncd open-data-anonimizer\npip install -r requirements.txt\nmake bootstrap\n```\n\n\u003ch3\u003eDownloading Repository\u003c/h3\u003e\n\n\u003cp\u003eOr you could download this repository from \u003ca href=\"https://pypi.org/project/anonympy/\"\u003epypi\u003c/a\u003e and run the following:\n\n```\ncd open-data-anonimizer\npython setup.py install\n```\n\n\n\u003cbr\u003e\n\n\u003ch2\u003eUsage Example \u003c/h2\u003e\n\n[![Google 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wg4g4xWTSLvThYHYLKDIKSJEC4ChQHaM?usp=sharing)\n\n\u003cp\u003eMore examples \u003ca href=\"https://github.com/ArtLabss/open-data-anonimizer/blob/b5d5f2df94b80011a8a93fa08f0046d1390cec49/examples/examples.ipynb\"\u003ehere\u003c/a\u003e\u003c/p\u003e\n  \n\u003cp\u003e\u003cstrong\u003eTabular\u003c/strong\u003e\u003c/p\u003e\n\n```python\n\u003e\u003e\u003e from anonympy.pandas import dfAnonymizer\n\u003e\u003e\u003e from anonympy.pandas.utils_pandas import load_dataset\n\n\u003e\u003e\u003e df = load_dataset() \n\u003e\u003e\u003e print(df)\n```\n\n|   |  name | age |  birthdate |   salary |                                  web |                email |       ssn |\n|--:|------:|----:|-----------:|---------:|-------------------------------------:|---------------------:|----------:|\n| 0 | Bruce | 33  | 1915-04-17 | 59234.32 | http://www.alandrosenburgcpapc.co.uk | josefrazier@owen.com | 343554334 |\n| 1 | Tony  | 48  | 1970-05-29 | 49324.53 | http://www.capgeminiamerica.co.uk    | eryan@lewis.com      | 656564664 |\n  \n```python\n# Calling the generic function\n\u003e\u003e\u003e anonym = dfAnonymizer(df)\n\u003e\u003e\u003e anonym.anonymize(inplace = False) # changes will be returned, not applied\n```\n\n|      | name            | age    | birthdate  | salary  | web        |         email       |     ssn     |\n|------|-----------------|--------|------------|---------|------------|---------------------|-------------|\n| 0    | Stephanie Patel | 30     | 1915-05-10 | 60000.0 | 5968b7880f | pjordan@example.com | 391-77-9210 |\n| 1    | Daniel Matthews | 50     | 1971-01-21 | 50000.0 | 2ae31d40d4 | tparks@example.org  | 872-80-9114 |\n  \n```python\n# Or applying a specific anonymization technique to a column\n\u003e\u003e\u003e from anonympy.pandas.utils_pandas import available_methods\n\n\u003e\u003e\u003e anonym.categorical_columns\n... 
['name', 'web', 'email', 'ssn']\n\u003e\u003e\u003e available_methods('categorical') \n... categorical_fake\tcategorical_fake_auto\tcategorical_resampling\tcategorical_tokenization\tcategorical_email_masking\n\n\u003e\u003e\u003e anonym.anonymize({'name': 'categorical_fake',  # {'column_name': 'method_name'}\n                  'age': 'numeric_noise',\n                  'birthdate': 'datetime_noise',\n                  'salary': 'numeric_rounding',\n                  'web': 'categorical_tokenization', \n                  'email':'categorical_email_masking', \n                  'ssn': 'column_suppression'})\n\u003e\u003e\u003e print(anonym.to_df())\n```\n|   |  name | age |  birthdate |   salary |                                  web |                email |\n|--:|------:|----:|-----------:|---------:|-------------------------------------:|---------------------:|\n| 0 | Paul Lang | 31  | 1915-04-17 | 60000.0 | 8ee92fb1bd | j*****r@owen.com |\n| 1 | Michael Gillespie  | 42  | 1970-05-29 | 50000.0 | 51b615c92e    | e*****n@lewis.com      | \n \n\u003cbr \u003e\n\u003cp\u003e\u003cstrong\u003eImages\u003c/strong\u003e\u003c/p\u003e\n\n```python\n# Passing an Image\n\u003e\u003e\u003e import cv2\n\u003e\u003e\u003e from anonympy.images import imAnonymizer\n\n\u003e\u003e\u003e img = cv2.imread('salty.jpg')\n\u003e\u003e\u003e anonym = imAnonymizer(img)\n\n\u003e\u003e\u003e blurred = anonym.face_blur((31, 31), shape='r', box = 'r')  # blurring shape and bounding box ('r' / 'c')\n\u003e\u003e\u003e pixel = anonym.face_pixel(blocks=20, box=None)\n\u003e\u003e\u003e sap = anonym.face_SaP(shape = 'c', box=None)\n```\nblurred            |  pixel           |    sap\n:-------------------------:|:-------------------------:|:-------------------------:\n![input_img1](https://raw.githubusercontent.com/ArtLabss/open-data-anonimizer/d61127f7a8fdff603af21dcab8edbf72f2aab292/examples/files/sad_boy_blurred.jpg)  |  
![output_img1](https://raw.githubusercontent.com/ArtLabss/open-data-anonimizer/d61127f7a8fdff603af21dcab8edbf72f2aab292/examples/files/sad_boy_pixel.jpg)    |   ![sap_image](https://raw.githubusercontent.com/ArtLabss/open-data-anonimizer/d61127f7a8fdff603af21dcab8edbf72f2aab292/examples/files/sad_boy_sap.jpg) \n\n```python\n# Passing a Folder \n\u003e\u003e\u003e path = 'C:/Users/shakhansho.sabzaliev/Downloads/Data' # images are inside `Data` folder\n\u003e\u003e\u003e dst = 'D:/' # destination folder\n\u003e\u003e\u003e anonym = imAnonymizer(path, dst)\n\n\u003e\u003e\u003e anonym.blur(method = 'median', kernel = 11) \n```\n\n\u003cp\u003eThis will create a folder \u003ci\u003eOutput\u003c/i\u003e in \u003ccode\u003edst\u003c/code\u003e directory.\u003c/p\u003e\n\n```python\n# The Data folder had the following structure\n\n|   1.jpg\n|   2.jpg\n|   3.jpeg\n|   \n\\---test\n    |   4.png\n    |   5.jpeg\n    |   \n    \\---test2\n            6.png\n\n# The Output folder will have the same structure and file names but blurred images\n```\n\n\u003cbr\u003e\n\n\u003cp\u003e\u003cstrong\u003ePDF\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp\u003eIn order to initialize \u003ccode\u003epdfAnonymizer\u003c/code\u003e object we have to install \u003ccode\u003epytesseract\u003c/code\u003e and \u003ccode\u003epoppler\u003c/code\u003e, and provide path to the binaries of both as arguments or add paths to system variables\u003c/p\u003e\n\n```python\n\u003e\u003e\u003e from anonympy.pdf import pdfAnonymizer\n\n# need to specify paths, since I don't have them in system variables\n\u003e\u003e\u003e anonym = pdfAnonymizer(path_to_pdf = \"Downloads\\\\test.pdf\",\n                       pytesseract_path = r\"C:\\Program Files\\Tesseract-OCR\\tesseract.exe\",\n                       poppler_path = r\"C:\\Users\\shakhansho\\Downloads\\Release-22.01.0-0\\poppler-22.01.0\\Library\\bin\")\n\n# Calling the generic function\n\u003e\u003e\u003e anonym.anonymize(output_path = 'output.pdf',\n 
                    remove_metadata = True,\n                     fill = 'black',\n                     outline = 'black')\n```\n\n`test.pdf`            |  `output.pdf`            | \n:-------------------------:|:-------------------------:|\n![test_img](https://raw.githubusercontent.com/ArtLabss/open-data-anonymizer/f09e98c05380ffda6cecdd5b332e3dc66a30e17c/examples/files/test-1.jpg)  |  ![output_img](https://raw.githubusercontent.com/ArtLabss/open-data-anonymizer/be3f376e6d93e7a726f083bf28db3bcbd7f592a3/examples/files/test_output.jpg)    |\n\n\u003cp\u003eIn case you only want to hide specific information, instead of \u003ccode\u003eanonymize\u003c/code\u003e use other methods\u003c/p\u003e\n\n```python\n\u003e\u003e\u003e anonym = pdfAnonymizer(path_to_pdf = r\"Downloads\\test.pdf\")\n\u003e\u003e\u003e anonym.pdf2images() #  images are stored in anonym.images variable \n\u003e\u003e\u003e anonym.images2text(anonym.images) # texts are stored in anonym.texts\n\n#  Entities of interest \n\u003e\u003e\u003e locs: dict = anonym.find_LOC(anonym.texts[0])  # index refers to page number\n\u003e\u003e\u003e emails: dict = anonym.find_emails(anonym.texts[0])  # {page_number: [coords]}\n\u003e\u003e\u003e coords: list = locs['page_1'] + emails['page_1'] \n\n\u003e\u003e\u003e anonym.cover_box(anonym.images[0], coords)\n\u003e\u003e\u003e display(anonym.images[0])\n```\n\n\u003ch2\u003eDevelopment\u003c/h2\u003e\n\n\u003ch3\u003eContributions\u003c/h3\u003e\n\n\u003cp\u003eThe \u003ca href=\"https://github.com/ArtLabss/open-data-anonimizer/blob/main/CONTRIBUTING.md\"\u003eContributing Guide\u003c/a\u003e has detailed information about contributing code and documentation.\u003c/p\u003e\n\n\u003ch3\u003eImportant Links\u003c/h3\u003e\n\u003cul\u003e\n  \u003cli\u003eOfficial source code repo: \u003ca href=\"https://github.com/ArtLabss/open-data-anonimizer\"\u003ehttps://github.com/ArtLabss/open-data-anonimizer\u003c/a\u003e\u003c/li\u003e\n  \u003cli\u003eDownload releases: 
\u003ca href=\"https://pypi.org/project/anonympy/\"\u003ehttps://pypi.org/project/anonympy/\u003c/a\u003e\u003c/li\u003e\n  \u003cli\u003eIssue tracker: \u003ca href=\"https://github.com/ArtLabss/open-data-anonimizer/issues\"\u003ehttps://github.com/ArtLabss/open-data-anonimizer/issues\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch2\u003eLicense\u003c/h2\u003e\n\n\u003cp\u003e\u003ca href=\"https://github.com/ArtLabss/open-data-anonimizer/blob/main/LICENSE\"\u003eBSD-3\u003c/a\u003e\u003c/p\u003e\n\n\n\u003ch2\u003eCode of Conduct\u003c/h2\u003e\n\u003cp\u003ePlease see \u003ca href=\"https://github.com/ArtLabss/open-data-anonimizer/blob/main/CODE_OF_CONDUCT.md\"\u003eCode of Conduct\u003c/a\u003e. \nAll community members are expected to follow it.\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartlabss%2Fopen-data-anonymizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fartlabss%2Fopen-data-anonymizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartlabss%2Fopen-data-anonymizer/lists"}