{"id":18774696,"url":"https://github.com/firefly-cpp/arm-preprocessing","last_synced_at":"2025-04-13T09:21:46.671Z","repository":{"id":215449964,"uuid":"542056509","full_name":"firefly-cpp/arm-preprocessing","owner":"firefly-cpp","description":"Implementation of several preprocessing techniques for Association Rule Mining (ARM)","archived":false,"fork":false,"pushed_at":"2025-03-19T15:35:40.000Z","size":1151,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-27T00:54:57.018Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/firefly-cpp.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-27T11:47:48.000Z","updated_at":"2025-03-19T15:35:41.000Z","dependencies_parsed_at":"2024-01-04T14:38:32.820Z","dependency_job_id":"796835e8-8dd5-45d8-b099-79023443b620","html_url":"https://github.com/firefly-cpp/arm-preprocessing","commit_stats":{"total_commits":92,"total_committers":7,"mean_commits":"13.142857142857142","dds":0.3586956521739131,"last_synced_commit":"febff228ec08a55d1ef7c73842b965f2fd38472c"},"previous_names":["firefly-cpp/arm-preprocessing"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firefly-cpp%2Farm-preprocessing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firefly-cpp%2Farm-preprocessing/tags","releases_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firefly-cpp%2Farm-preprocessing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firefly-cpp%2Farm-preprocessing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/firefly-cpp","download_url":"https://codeload.github.com/firefly-cpp/arm-preprocessing/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248689376,"owners_count":21145923,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T19:39:06.264Z","updated_at":"2025-04-13T09:21:46.649Z","avatar_url":"https://github.com/firefly-cpp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"logo\" width=\"300\" src=\".github/images/logo_black.png\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003e\n  arm-preprocessing\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"PyPI Version\" src=\"https://img.shields.io/pypi/v/arm-preprocessing.svg\"\u003e\n  \u003cimg alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/arm-preprocessing.svg\"\u003e\n  \u003cimg alt=\"PyPI - Downloads\" src=\"https://img.shields.io/pypi/dm/arm-preprocessing.svg\" href=\"https://pepy.tech/project/arm-preprocessing\"\u003e\n  \u003ca href=\"https://repology.org/project/python:arm-preprocessing/versions\"\u003e\n    \u003cimg alt=\"Packaging status\" 
src=\"https://repology.org/badge/tiny-repos/python:arm-preprocessing.svg\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/arm-preprocessing\"\u003e\n    \u003cimg alt=\"Downloads\" src=\"https://static.pepy.tech/badge/arm-preprocessing\"\u003e\n  \u003c/a\u003e\n  \u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/firefly-cpp/arm-preprocessing.svg\"\u003e\n  \u003ca href=\"https://github.com/firefly-cpp/arm-preprocessing/actions/workflows/test.yml\"\u003e\n    \u003cimg alt=\"arm-preprocessing\" src=\"https://github.com/firefly-cpp/arm-preprocessing/actions/workflows/test.yml/badge.svg\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://arm-preprocessing.readthedocs.io/en/latest/?badge=latest\"\u003e\n    \u003cimg alt=\"Documentation Status\" src=\"https://readthedocs.org/projects/arm-preprocessing/badge/?version=latest\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"Repository size\" src=\"https://img.shields.io/github/repo-size/firefly-cpp/arm-preprocessing\"\u003e\n  \u003cimg alt=\"Open issues\" src=\"https://isitmaintained.com/badge/open/firefly-cpp/arm-preprocessing.svg\"\u003e\n  \u003ca href='http://isitmaintained.com/project/firefly-cpp/arm-preprocessing \"Average time to resolve an issue\"'\u003e\n    \u003cimg alt=\"Average time to resolve an issue\" src=\"http://isitmaintained.com/badge/resolution/firefly-cpp/arm-preprocessing.svg\"\u003e\n  \u003c/a\u003e\n  \u003cimg alt=\"GitHub commit activity\" src=\"https://img.shields.io/github/commit-activity/w/firefly-cpp/arm-preprocessing.svg\"\u003e\n  \u003cimg alt=\"GitHub contributors\" src=\"https://img.shields.io/github/contributors/firefly-cpp/arm-preprocessing.svg\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#-why-arm-preprocessing\"\u003e💡 Why arm-preprocessing?\u003c/a\u003e •\n  \u003ca href=\"#-key-features\"\u003e✨ Key features\u003c/a\u003e •\n  \u003ca 
href=\"#-installation\"\u003e📦 Installation\u003c/a\u003e •\n  \u003ca href=\"#-usage\"\u003e🚀 Usage\u003c/a\u003e •\n  \u003ca href=\"#-related-frameworks\"\u003e🔗 Related frameworks\u003c/a\u003e •\n  \u003ca href=\"#-references\"\u003e📚 References\u003c/a\u003e •\n  \u003ca href=\"#-license\"\u003e🔑 License\u003c/a\u003e\n\u003c/p\u003e\n\narm-preprocessing is a lightweight Python library that supports several key steps in data preparation, manipulation, and discretisation for Association Rule Mining (ARM). 🧠 Embrace its minimalistic design that prioritises simplicity. 💡 The framework is intended to be fully extensible and offers seamless integration with related ARM libraries (e.g., [NiaARM](https://github.com/firefly-cpp/NiaARM)). 🔗\n\n* **Free software:** MIT license\n* **Documentation:** [http://arm-preprocessing.readthedocs.io](http://arm-preprocessing.readthedocs.io)\n* **Python:** 3.9.x, 3.10.x, 3.11.x, 3.12.x\n* **Tested OS:** Windows, Ubuntu, Fedora, Alpine, Arch, macOS. **It may also work on other systems, but these have not been tested.**\n\n## 💡 Why arm-preprocessing?\n\nWhile numerous libraries facilitate data mining preprocessing tasks, this library is designed to integrate seamlessly with association rule mining. It harmonises well with the NiaARM library, a robust numerical association rule mining framework. The primary aim is to bridge the gap between preprocessing and rule mining, simplifying the workflow/pipeline. Additionally, its design allows for the effortless incorporation of new preprocessing methods and fast benchmarking.\n\n## ✨ Key features\n\n- Loading datasets in various formats (CSV, JSON, TXT, TCX) 📊\n- Converting datasets to different formats 🔄\n- Loading different types of datasets (numerical dataset, discrete dataset, time-series data, text, etc.) 
📉\n- Dataset identification (detecting the type of dataset) 🔍\n- Dataset statistics 📈\n- Discretisation methods 📏\n- Data squashing methods 🤏\n- Feature scaling methods ⚖️\n- Feature selection methods 🎯\n\n## 📦 Installation\n\n### pip\n\nTo install ``arm-preprocessing`` with pip, use:\n```bash\npip install arm-preprocessing\n```\n\nTo install ``arm-preprocessing`` on Alpine Linux, please use:\n```sh\n$ apk add py3-arm-preprocessing\n```\n\nTo install ``arm-preprocessing`` on Arch Linux, please use an [AUR helper](https://wiki.archlinux.org/title/AUR_helpers):\n```sh\n$ yay -Syyu python-arm-preprocessing\n```\n\n## 🚀 Usage\n\n### Data loading\n\nThe following example demonstrates how to load a dataset from a file (CSV, JSON, TXT). More examples can be found in the [examples/data_loading](./examples/data_loading/) directory:\n- [Loading a dataset from a CSV file](./examples/data_loading/load_dataset_csv.py)\n- [Loading a dataset from a JSON file](./examples/data_loading/load_dataset_json.py)\n- [Loading a dataset from a TCX file](./examples/data_loading/load_dataset_tcx.py)\n- [Loading a time-series dataset](./examples/data_loading/load_dataset_timeseries.py)\n\n```python\nfrom arm_preprocessing.dataset import Dataset\n\n# Initialise dataset with filename (without extension) and format (csv, json, txt)\ndataset = Dataset('path/to/datasets', format='csv')\n\n# Load dataset\ndataset.load_data()\ndf = dataset.data\n```\n\n### Missing values\n\nThe following example demonstrates how to handle missing values in a dataset using imputation. 
More examples can be found in the [examples/missing_values](./examples/missing_values) directory:\n- [Handling missing values in a dataset using row deletion](./examples/missing_values/missing_values_rows.py)\n- [Handling missing values in a dataset using column deletion](./examples/missing_values/missing_values_columns.py)\n- [Handling missing values in a dataset using imputation](./examples/missing_values/missing_values_impute.py)\n\n```python\nfrom arm_preprocessing.dataset import Dataset\n\n# Initialise dataset with filename and format\ndataset = Dataset('examples/missing_values/data', format='csv')\ndataset.load_data()\n\n# Impute missing data\ndataset.missing_values(method='impute')\n```\n\n### Data discretisation\n\nThe following example demonstrates how to discretise a dataset using the equal width method. More examples can be found in the [examples/discretisation](./examples/discretisation) directory:\n- [Discretising a dataset using the equal width method](./examples/discretisation/equal_width_discretisation.py)\n- [Discretising a dataset using the equal frequency method](./examples/discretisation/equal_frequency_discretisation.py)\n- [Discretising a dataset using k-means clustering](./examples/discretisation/kmeans_discretisation.py)\n\n```python\nfrom arm_preprocessing.dataset import Dataset\n\n# Initialise dataset with filename (without extension) and format (csv, json, txt)\ndataset = Dataset('datasets/sportydatagen', format='csv')\ndataset.load_data()\n\n# Discretise dataset using equal width discretisation\ndataset.discretise(method='equal_width', num_bins=5, columns=['calories'])\n```\n\n### Data squashing\n\nThe following example demonstrates how to squash a dataset using Euclidean similarity. 
More examples can be found in the [examples/squashing](./examples/squashing) directory:\n- [Squashing a dataset using Euclidean similarity](./examples/squashing/squash_euclidean.py)\n- [Squashing a dataset using cosine similarity](./examples/squashing/squash_cosine.py)\n\n```python\nfrom arm_preprocessing.dataset import Dataset\n\n# Initialise dataset with filename and format\ndataset = Dataset('datasets/breast', format='csv')\ndataset.load_data()\n\n# Squash dataset\ndataset.squash(threshold=0.75, similarity='euclidean')\n```\n\n### Feature scaling\n\nThe following example demonstrates how to scale the dataset's features. More examples can be found in the [examples/scaling](./examples/scaling) directory:\n- [Scale features using normalisation](./examples/scaling/normalisation.py)\n- [Scale features using standardisation](./examples/scaling/standardisation.py)\n\n```python\nfrom arm_preprocessing.dataset import Dataset\n\n# Initialise dataset with filename and format\ndataset = Dataset('datasets/Abalone', format='csv')\ndataset.load_data()\n\n# Scale dataset using normalisation\ndataset.scale(method='normalisation')\n```\n\n### Feature selection\n\nThe following example demonstrates how to select features from a dataset. 
More examples can be found in the [examples/feature_selection](./examples/feature_selection) directory:\n- [Select features using the Kendall Tau correlation coefficient](./examples/feature_selection/feature_selection.py)\n\n```python\nfrom arm_preprocessing.dataset import Dataset\n\n# Initialise dataset with filename and format\ndataset = Dataset('datasets/sportydatagen', format='csv')\ndataset.load_data()\n\n# Feature selection\ndataset.feature_selection(\n    method='kendall', threshold=0.15, class_column='calories')\n```\n\n## 🔗 Related frameworks\n\n[1] [NiaARM: A minimalistic framework for Numerical Association Rule Mining](https://github.com/firefly-cpp/NiaARM)\n\n[2] [uARMSolver: universal Association Rule Mining Solver](https://github.com/firefly-cpp/uARMSolver)\n\n## 📚 References\n\n[1] I. Fister, I. Fister Jr., D. Novak and D. Verber, [Data squashing as preprocessing in association rule mining](https://iztok-jr-fister.eu/static/publications/300.pdf), 2022 IEEE Symposium Series on Computational Intelligence (SSCI), Singapore, Singapore, 2022, pp. 1720-1725, doi: 10.1109/SSCI51031.2022.10022240.\n\n[2] I. Fister Jr. and I. Fister, [A brief overview of swarm intelligence-based algorithms for numerical association rule mining](https://arxiv.org/abs/2010.15524), arXiv preprint arXiv:2010.15524, 2020.\n\n## 🔑 License\n\nThis package is distributed under the MIT License. This license can be found online\nat \u003chttp://www.opensource.org/licenses/MIT\u003e.\n\n## Disclaimer\n\nThis framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirefly-cpp%2Farm-preprocessing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffirefly-cpp%2Farm-preprocessing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirefly-cpp%2Farm-preprocessing/lists"}