{"id":18993525,"url":"https://github.com/infinitode/duplipy","last_synced_at":"2026-06-28T07:31:58.802Z","repository":{"id":179469203,"uuid":"663551393","full_name":"Infinitode/DupliPy","owner":"Infinitode","description":"DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.","archived":false,"fork":false,"pushed_at":"2026-04-09T07:35:52.000Z","size":95,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-09T09:25:11.110Z","etag":null,"topics":["ai","augmentation","data-analysis","data-preprocessing","data-science","images","language-models","nlp","preprocessing","text-data","text-datasets","text-formatting"],"latest_commit_sha":null,"homepage":"https://infinitode.netlify.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Infinitode.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-07-07T14:58:51.000Z","updated_at":"2026-04-09T07:35:05.000Z","dependencies_parsed_at":"2024-01-19T10:13:43.103Z","dependency_job_id":"4df0f6c9-dcf2-465a-88e3-68ff1541a5ce","html_url":"https://github.com/Infinitode/DupliPy","commit_stats":{"total_commits":15,"total_committers":2,"mean_commits":7.5,"dds":0.06666666666666665,"last_synced_commit":"d8a70bb8f35c32034b9fd67589234b61c8167af7"},"previous_names":["infinitode
/duplipy"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/Infinitode/DupliPy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FDupliPy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FDupliPy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FDupliPy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FDupliPy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Infinitode","download_url":"https://codeload.github.com/Infinitode/DupliPy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infinitode%2FDupliPy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34881384,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-28T02:00:05.809Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","augmentation","data-analysis","data-preprocessing","data-science","images","language-models","nlp","preprocessing","text-data","text-datasets","text-formatting"],"created_at":"2024-11-08T17:21:46.434Z","updated_at":"2026-06-28T07:31:58.796Z","avatar_url":"https://github.com/Infinitode.png","language":"Python","funding_links":[],"categories":[],"sub_categori
es":[],"readme":"# DupliPy 0.2.6\n\n![Python Version](https://img.shields.io/badge/python-3.12-blue.svg)\n![Code Size](https://img.shields.io/github/languages/code-size/infinitode/duplipy)\n![Downloads](https://pepy.tech/badge/duplipy)\n![License Compliance](https://img.shields.io/badge/license-compliance-brightgreen.svg)\n![PyPI Version](https://img.shields.io/pypi/v/duplipy)\n\n[DupliPy Documentation](https://infinitode-docs.gitbook.io/documentation/package-documentation/duplipy-package-documentation)\n\nAn open source Python library for text formatting, augmentation, and similarity calculation tasks in NLP. The package now also includes methods for image and tabular data augmentation.\n\n## Changes in DupliPy 0.2.6\n\n- Added a new CSV augmentation function, `augment_csv_data`, for automatic imputation, balancing, and expansion of tabular datasets.\n- Added new numerical data augmentation functions: `add_noise`, `scale_data`, and `shift_data`.\n- Added new time-series augmentation function: `augment_time_series` with automatic date pattern identification.\n- Added standalone dataset balancing function: `balance_dataset`.\n- Improved type hints throughout the library to support Python 3.10+ features.\n- Enhanced docstrings and documentation for better code clarity and linting.\n\n## Changes in DupliPy 0.2.5\n\n- Added new text augmentation functions: `swap_random_words` and `random_word_deletion`.\n- Added new text similarity metrics: `sorensen_dice_coefficient` and `cosine_similarity_score`.\n- Added new image similarity metrics: `mean_squared_error` and `psnr`.\n- Added new text analysis function: `named_entity_recognition`.\n- Improved progress bars for augmentation functions.\n\n## Changes in DupliPy 0.2.4\n\n- Created new functions in `duplipy.replication` for image augmentation: `random_flip`, `random_color_jitter`, and `noise_overlay`.\n- Created a new function (`post_format_text`) for post-formatting after DupliPy processing or augmentation tasks 
that cleans up extra whitespace and normalizes punctuation spacing.\n\n## Changes in DupliPy 0.2.3\n\nDupliPy now uses ValX, another of our Python packages, which provides quick methods for cleaning and formatting text data during preprocessing, before training.\n\n## Installation\n\nYou can install DupliPy using pip:\n\n```bash\npip install duplipy\n```\n\n## Supported Python Versions\n\nDupliPy supports the following Python versions:\n\n- Python 3.6\n- Python 3.7\n- Python 3.8\n- Python 3.9\n- Python 3.10\n- Python 3.11\n- Python 3.12 or later\n\nPlease ensure that you have one of these Python versions installed before using DupliPy. DupliPy may not work as expected on Python versions older than those listed.\n\n## Features\n\n- Text Formatting: Remove special characters, standardize text formatting.\n- Text Replication: Generate replicated instances of text for data augmentation.\n- Sentiment Analysis: Determine the sentiment of a sentence.\n- Similarity Calculation: Calculate text and image similarity using various metrics.\n- BLEU Score Calculation: Calculate how well your text-based NLP model performs.\n- Named Entity Recognition: Identify and categorize key information in text.\n- Image Augmentation Tasks.\n- Tabular and Numerical Data Augmentation (CSV expansion, imputation, balancing).\n- Profanity removal, hate speech removal, offensive speech removal, and sensitive information removal.\n\n_For full reference documentation, see [DupliPy's official documentation](https://infinitode-docs.gitbook.io/documentation/package-documentation/duplipy-package-documentation)._\n\n## Usage\n\n### Text Formatting\n\n```python\nfrom duplipy.formatting import remove_special_characters, standardize_text\n\ntext = \"Hello! 
This is some text.\"\n\n# Remove special characters\nformatted_text = remove_special_characters(text)\nprint(formatted_text)  # Output: Hello This is some text\n\n# Standardize text formatting\nstandardized_text = standardize_text(text)\nprint(standardized_text)  # Output: hello! this is some text\n```\n\n### Text Replication\n\n```python\nfrom duplipy.replication import replace_word_with_synonym, augment_text_with_synonyms, swap_random_words, random_word_deletion\n\ntext = \"Hello! This is some text.\"\n\n# Replace words with synonyms\naugmented_text = augment_text_with_synonyms(text, augmentation_factor=3, probability=0.5)\nprint(augmented_text)\n\n# Swap random words\nswapped_text = swap_random_words(text)\nprint(swapped_text)\n\n# Delete random words\ndeleted_text = random_word_deletion(text, num_deletions=1)\nprint(deleted_text)\n```\n\n### Sentiment Analysis\n\n```python\nfrom duplipy.text_analysis import analyze_sentiment\n\ntext = \"I love this product! It's amazing!\"\n\n# Analyze sentiment\nsentiment = analyze_sentiment(text)\nprint(sentiment)  # Output: Positive\n```\n\n### Similarity Calculation\n\n```python\nfrom duplipy.similarity import edit_distance_score, sorensen_dice_coefficient, cosine_similarity_score\n\ntext1 = \"Hello! How are you?\"\ntext2 = \"Hi! How are you doing?\"\n\n# Calculate edit distance\nedit_distance = edit_distance_score(text1, text2)\nprint(edit_distance)  # Output: 4\n\n# Calculate Sorensen-Dice coefficient\ndice_coefficient = sorensen_dice_coefficient(text1, text2)\nprint(dice_coefficient)\n\n# Calculate cosine similarity\ncosine_sim = cosine_similarity_score(text1, text2)\nprint(cosine_sim)\n```\n\n### BLEU Score Calculation\n\n```python\nfrom duplipy.similarity import bleu_score\n\ntext1 = \"Hello! How are you?\"\ntext2 = \"Hi! 
How are you doing?\"\n\n# Calculate BLEU score\nbleu_value = bleu_score(text1, text2)\nprint(bleu_value)  # Output: 0.434\n```\n\n### Image Augmentation\n\n```python\nfrom PIL import Image\nfrom duplipy.replication import flip_horizontal, flip_vertical, rotate, random_rotation, resize, crop, random_crop\n\n# Load an image for testing\nimage_path = \"path/to/image.jpg\"\nimage = Image.open(image_path)\n\n# Flip the image horizontally\nflipped_horizontal_image = flip_horizontal(image)\n\n# Flip the image vertically\nflipped_vertical_image = flip_vertical(image)\n\n# Rotate the image by a specific angle (e.g., 45 degrees)\nrotated_image = rotate(image, 45)\n\n# Apply a random rotation to the image within a specified range of angles (e.g., -30 to 30 degrees)\nrandomly_rotated_image = random_rotation(image, max_angle=30)\n\n# Resize the image to a specific target size (e.g., 224x224 pixels)\nresized_image = resize(image, target_size=(224, 224))\n\n# Crop a random region from the image (e.g., 150x150 pixels)\nrandomly_cropped_image = random_crop(image, crop_size=(150, 150))\n\n# Save the augmented images (optional, if you want to view the results)\nflipped_horizontal_image.save(\"path/to/flipped_horizontal.jpg\")\nflipped_vertical_image.save(\"path/to/flipped_vertical.jpg\")\nrotated_image.save(\"path/to/rotated.jpg\")\nrandomly_rotated_image.save(\"path/to/randomly_rotated.jpg\")\nresized_image.save(\"path/to/resized.jpg\")\nrandomly_cropped_image.save(\"path/to/randomly_cropped.jpg\")\n```\n\n### Image Similarity\n\n```python\nfrom PIL import Image\nfrom duplipy.similarity import mean_squared_error, psnr\n\n# Load two images for testing\nimage1 = Image.open(\"path/to/image1.jpg\")\nimage2 = Image.open(\"path/to/image2.jpg\")\n\n# Calculate Mean Squared Error (MSE)\nmse = mean_squared_error(image1, image2)\nprint(f\"Mean Squared Error: {mse}\")\n\n# Calculate Peak Signal-to-Noise Ratio (PSNR)\npsnr_value = psnr(image1, image2)\nprint(f\"PSNR: 
{psnr_value}\")\n\n```\n\n### Named Entity Recognition\n\n```python\nfrom duplipy.text_analysis import named_entity_recognition\n\ntext = \"Apple is looking at buying U.K. startup for $1 billion\"\n\n# Perform NER\nentities = named_entity_recognition(text)\nprint(entities)\n```\n\n### Tabular Data Augmentation (CSV)\n\n```python\nfrom duplipy.replication import augment_csv_data\n\n# Augment a CSV file with automatic imputation and balancing\naugment_csv_data(\n    input_path=\"data.csv\",\n    output_path=\"augmented_data.csv\",\n    augmentation_factor=2,\n    balance_column=\"gender\",\n    fill_missing=True\n)\n```\n\n### Numerical and Time-Series Augmentation\n\n```python\nfrom duplipy.replication import add_noise, augment_time_series\n\n# Add noise to numerical data\ndata = [1.0, 2.0, 3.0, 4.0]\nnoisy_data = add_noise(data, noise_factor=0.1)\nprint(noisy_data)\n\n# Augment time-series data\ntimestamps = [\"2023-01-01\", \"2023-01-02\", \"2023-01-05\"]\naugmented_timestamps = augment_time_series(timestamps, augmentation_factor=1)\nprint(augmented_timestamps)\n```\n\n### Hate speech and Offensive speech removal using AI\n\n```python\nfrom duplipy.formatting import remove_hate_speech_from_text\n\ntext = \"I hate all of you bad word! Can't you just bad word leave me alone! Hi, I'm Katy.\"\n\nprint(remove_hate_speech_from_text(text))\n\n### Output\n# \"Hi, I'm Katy.\"\n```\n\n## Contributing\n\nContributions are welcome! If you encounter any issues, have suggestions, or want to contribute to DupliPy, please open an issue or submit a pull request on [GitHub](https://github.com/infinitode/duplipy).\n\n## License\n\nDupliPy is released under the terms of the **MIT License (Modified)**. Please see the [LICENSE](https://github.com/infinitode/duplipy/blob/main/LICENSE) file for the full text.\n\n**Modified License Clause**\n\nThe modified license clause grants users the permission to make derivative works based on the DupliPy software. 
However, it requires any substantial changes to the software to be clearly distinguished from the original work and distributed under a different name.\n\nBy enforcing this distinction, it aims to prevent direct publishing of the source code without changes while allowing users to create derivative works that incorporate the code but are not exactly the same.\n\nPlease read the full license terms in the [LICENSE](https://github.com/infinitode/duplipy/blob/main/LICENSE) file for complete details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitode%2Fduplipy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinitode%2Fduplipy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitode%2Fduplipy/lists"}