{"id":23110592,"url":"https://github.com/wanghui5801/usmerge","last_synced_at":"2025-09-09T18:17:37.921Z","repository":{"id":206360330,"uuid":"716462479","full_name":"wanghui5801/usmerge","owner":"wanghui5801","description":"A tool package for one-dimensional data clustering.","archived":false,"fork":false,"pushed_at":"2024-11-21T20:02:28.000Z","size":249,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-13T04:43:27.995Z","etag":null,"topics":["binning","clustering","discretization","feature-engineering","one-dimensional"],"latest_commit_sha":null,"homepage":"https://wanghui5801.github.io/usmerge/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wanghui5801.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-09T07:29:58.000Z","updated_at":"2024-11-22T05:49:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"43adfd0d-b364-4ca8-b654-aeffebfbae24","html_url":"https://github.com/wanghui5801/usmerge","commit_stats":null,"previous_names":["wanghui5801/usmerge"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/wanghui5801/usmerge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wanghui5801%2Fusmerge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wanghui5801%2Fusmerge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wanghui5801%2Fusmerge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/wanghui5801%2Fusmerge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wanghui5801","download_url":"https://codeload.github.com/wanghui5801/usmerge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wanghui5801%2Fusmerge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274340885,"owners_count":25267295,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-09T02:00:10.223Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binning","clustering","discretization","feature-engineering","one-dimensional"],"created_at":"2024-12-17T01:49:27.582Z","updated_at":"2025-09-09T18:17:37.857Z","avatar_url":"https://github.com/wanghui5801.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://ice.frostsky.com/2024/11/22/eabf1a3e6df982243482db582277b7c2.png\" alt=\"usmerge logo\" width=\"200\"/\u003e\n\u003c/p\u003e\n\n# Unsupervised Merge\n\n[![PyPI version](https://badge.fury.io/py/usmerge.svg)](https://badge.fury.io/py/usmerge)\n[![Python 
versions](https://img.shields.io/pypi/pyversions/usmerge.svg)](https://pypi.org/project/usmerge/)\n[![License](https://img.shields.io/github/license/wanghui5801/usmerge.svg)](https://github.com/wanghui5801/usmerge/blob/main/LICENSE)\n[![Downloads](https://static.pepy.tech/badge/usmerge)](https://pepy.tech/project/usmerge)\n[![GitHub last commit](https://img.shields.io/github/last-commit/wanghui5801/usmerge.svg)](https://github.com/wanghui5801/usmerge/commits/main)\n\n\nA simple Python package for one-dimensional data clustering, implementing various clustering algorithms including traditional and novel approaches.\n\n## Installation\n\nInstall the package using pip:\n\n```\npip install usmerge\n```\n\n## Features\n\nThis package provides multiple one-dimensional clustering methods:\n\n- Equal Width Binning (equal_wid_merge)\n- Equal Frequency Binning (equal_fre_merge)\n- K-means Clustering (kmeans_merge)\n- SOM-K Clustering (som_k_merge)\n- Fuzzy C-Means (fcm_merge)\n- Kernel Density Based (kernel_density_merge)\n- Information Theoretic (information_merge)\n- Gaussian Mixture (gaussian_mixture_merge)\n- Hierarchical Density (hierarchical_density_merge)\n- Jenks Natural Breaks (jenks_breaks_merge)\n- Quantile-based (quantile_merge)\n- DBSCAN (dbscan_1d_merge)\n\n## Usage\n\n### Data Format\nThe package accepts various input formats:\n- pandas Series/DataFrame\n- numpy array\n- Python list/tuple\n- Any iterable of numbers\n\n### Basic Usage Examples\n\n1. Equal Width Binning:\n```python\nfrom usmerge import equal_wid_merge\nlabels, edges = equal_wid_merge(data, n=3)\n```\n\n2. Equal Frequency Binning:\n```python\nfrom usmerge import equal_fre_merge\nlabels, edges = equal_fre_merge(data, n=3)\n```\n\n3. K-means Clustering:\n```python\nfrom usmerge import kmeans_merge\nlabels, edges = kmeans_merge(data, n=3, max_iter=100)\n```\n\n### Advanced Usage\n\n1. 
SOM-K Clustering:\n```python\nfrom usmerge import som_k_merge\nlabels, edges = som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000)\n```\n\n2. Fuzzy C-Means:\n```python\nfrom usmerge import fcm_merge\nlabels, edges = fcm_merge(data, n=3, m=2.0, max_iter=100, epsilon=1e-6)\n```\n\n3. Kernel Density Based:\n```python\nfrom usmerge import kernel_density_merge\nlabels, edges = kernel_density_merge(data, n=3, bandwidth=None)\n```\n\n4. Jenks Natural Breaks:\n```python\nfrom usmerge import jenks_breaks_merge\nlabels, edges = jenks_breaks_merge(data, n=3)\n```\n\n5. Quantile-based Clustering:\n```python\nfrom usmerge import quantile_merge\nlabels, edges = quantile_merge(data, n=3)\n```\n\n6. DBSCAN Clustering:\n```python\nfrom usmerge import dbscan_1d_merge\nlabels, edges = dbscan_1d_merge(data, n=3, min_samples=3)\n```\n\n### Return Values\nAll clustering methods return two values:\n- labels: List of cluster labels for each data point\n- edges: List of cluster boundaries\n\n## Example Analysis\n\n```python\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom usmerge import som_k_merge, fcm_merge, kmeans_merge, hierarchical_density_merge, dbscan_1d_merge\n\n# Generate synthetic data with three clear clusters\nnp.random.seed(42)\ndata = np.concatenate([\n    np.random.normal(0, 0.3, 50),    # First cluster\n    np.random.normal(5, 0.4, 50),    # Second cluster\n    np.random.normal(10, 0.3, 50)    # Third cluster\n])\n\n# Compare different clustering methods\nmethods = {\n    'SOM-K': som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000),\n    'FCM': fcm_merge(data, n=3, m=2.0, max_iter=100),\n    'K-means': kmeans_merge(data, n=3),\n    'DBSCAN': dbscan_1d_merge(data, n=3, min_samples=3),\n    'Hierarchical Density': hierarchical_density_merge(data, n=3)\n}\n\n# Visualize results\nplt.figure(figsize=(15, 5))\nfor i, (name, (labels, edges)) in enumerate(methods.items(), 1):\n    plt.subplot(1, 5, i)\n    plt.scatter(data, np.zeros_like(data), 
c=labels, cmap='viridis')\n    plt.title(f'{name} Clustering')\n    # Plot cluster boundaries\n    for edge in edges:\n        plt.axvline(x=edge, color='r', linestyle='--', alpha=0.5)\n    plt.ylim(-0.5, 0.5)\n\nplt.tight_layout()\nplt.show()\n```\n\n## Parameters Guide\n\nEach clustering method has its own set of parameters:\n\n- SOM-K: `sigma` (neighborhood size), `learning_rate` (step size for weight updates), `epochs` (number of training iterations)\n- FCM: `m` (fuzziness), `max_iter` (maximum iterations), `epsilon` (convergence threshold)\n- Kernel Density: `bandwidth` (kernel width)\n- Information Theoretic: `alpha` (compression-accuracy trade-off)\n- Gaussian Mixture: `max_iter` (maximum iterations), `epsilon` (convergence threshold)\n- Hierarchical Density: `min_cluster_size` (minimum points per cluster)\n- Jenks Natural Breaks: requires only the number of clusters\n- Quantile-based: requires only the number of clusters\n- DBSCAN: `n` (target number of clusters), `eps` (optional neighborhood size), `min_samples` (minimum points needed to form a cluster), `max_iter` (maximum iterations for adjusting `eps`)\n\n## Contributing\n\nFeel free to contribute to this project by submitting issues or pull requests.\n\n## License\n\nMIT License","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwanghui5801%2Fusmerge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwanghui5801%2Fusmerge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwanghui5801%2Fusmerge/lists"}