{"id":21510551,"url":"https://github.com/elkronos/sample_py","last_synced_at":"2025-03-17T15:20:41.163Z","repository":{"id":246987141,"uuid":"824258809","full_name":"elkronos/sample_py","owner":"elkronos","description":"Sampling functions in python.","archived":false,"fork":false,"pushed_at":"2025-02-18T05:13:17.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-18T06:23:28.496Z","etag":null,"topics":["data-science","methodology","python","sampling-methods","statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elkronos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-04T17:47:51.000Z","updated_at":"2025-02-18T05:13:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"3ea03e10-699c-428e-b934-b1839fcdb71d","html_url":"https://github.com/elkronos/sample_py","commit_stats":null,"previous_names":["elkronos/sample_py"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elkronos%2Fsample_py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elkronos%2Fsample_py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elkronos%2Fsample_py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elkronos%2Fsample_py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elkronos","download_url":"https://codeload.github.com/elkronos/sample_py/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244056428,"owners_count":20390720,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","methodology","python","sampling-methods","statistics"],"created_at":"2024-11-23T21:49:21.387Z","updated_at":"2025-03-17T15:20:41.157Z","avatar_url":"https://github.com/elkronos.png","language":"Python","readme":"# Sample_Py\n\nA Python project implementing various sampling methods for data analysis.\n\n## Overview\n\nThis project provides a collection of functions for performing different types of sampling on datasets. The sampling methods include:\n\n- Simple Random Sampling\n- Stratified Sampling\n- Systematic Sampling\n- Cluster Sampling\n- Multi-Stage Sampling\n- Weighted Sampling\n- Reservoir Sampling\n- Bootstrap Sampling\n- Temporal Sampling\n- Spatial Sampling\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Simple Random Sampling](#simple-random-sampling)\n  - [Stratified Sampling](#stratified-sampling)\n  - [Systematic Sampling](#systematic-sampling)\n  - [Cluster Sampling](#cluster-sampling)\n  - [Multi-Stage Sampling](#multi-stage-sampling)\n  - [Weighted Sampling](#weighted-sampling)\n  - [Reservoir Sampling](#reservoir-sampling)\n  - [Bootstrap Sampling](#bootstrap-sampling)\n  - [Temporal Sampling](#temporal-sampling)\n  - [Spatial Sampling](#spatial-sampling)\n- [Testing](#testing)\n- [Contributing](#contributing)\n- [License](#license)\n- [Contact](#contact)\n\n## Installation\n\nTo install the project, clone the repository and install the dependencies using `pip`:\n\n```bash\ngit clone https://github.com/elkronos/sample_py.git\ncd sample_py\npip install -r requirements.txt\n```\n\n## Usage\n\nHere are examples of how to use the sampling functions provided in this project:\n\n### Simple Random Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform simple random sampling\nsampled_data = sampling_methods.simple_random_sampling(data, sample_size=100, seed=42)\n\nprint(sampled_data)\n```\n\n### Stratified Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform stratified sampling based on a 'category' column\nstratified_sampled_data = sampling_methods.stratified_sampling(\n    data, \n    strata_column='category', \n    sample_size=50, \n    seed=42\n)\n\nprint(stratified_sampled_data)\n```\n\n### Systematic Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform systematic sampling with an interval of 10\nsystematic_sampled_data = sampling_methods.systematic_sampling(data, interval=10, seed=42)\n\nprint(systematic_sampled_data)\n```\n\n### Cluster Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform cluster sampling based on a 'cluster_id' column\ncluster_sampled_data = sampling_methods.cluster_sampling(\n    data, \n    cluster_column='cluster_id', \n    num_clusters=5, \n    seed=42\n)\n\nprint(cluster_sampled_data)\n```\n\n### Multi-Stage Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform multi-stage sampling: first by clusters, then sample within clusters\nmulti_stage_sampled_data = sampling_methods.multi_stage_sampling(\n    data, \n    cluster_column='cluster_id', \n    num_clusters=5, \n    stage_two_sample_size=10, \n    seed=42\n)\n\nprint(multi_stage_sampled_data)\n```\n\n### Weighted Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform weighted sampling based on a 'weights' column\nweighted_sampled_data = sampling_methods.weighted_sampling(\n    data, \n    weights_column='weights', \n    sample_size=100, \n    seed=42\n)\n\nprint(weighted_sampled_data)\n```\n\n### Reservoir Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data and convert it to a data stream (list of records)\ndata = pd.read_csv('your_dataset.csv')\ndata_stream = data.to_dict('records')\n\n# Perform reservoir sampling\nreservoir_sample = sampling_methods.reservoir_sampling(data_stream, sample_size=100, seed=42)\n\nprint(reservoir_sample)\n```\n\n### Bootstrap Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform bootstrap sampling to generate 10 bootstrap samples\nbootstrap_samples = sampling_methods.bootstrap_sampling(data, num_samples=10, sample_size=100, seed=42)\n\nfor i, sample in enumerate(bootstrap_samples):\n    print(f\"Bootstrap Sample {i+1}\")\n    print(sample)\n```\n\n### Temporal Sampling\n\n```python\nimport pandas as pd\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Perform temporal sampling on a time-based column 'timestamp'\ntemporal_sampled_data = sampling_methods.temporal_sampling(\n    data, \n    time_column='timestamp', \n    start_time=pd.Timestamp('2021-01-01'), \n    end_time=pd.Timestamp('2021-12-31'), \n    interval=7, \n    sample_size=10, \n    seed=42\n)\n\nprint(temporal_sampled_data)\n```\n\n### Spatial Sampling\n\n```python\nimport pandas as pd\nfrom shapely.geometry import Polygon\nfrom sampling import sampling_methods\n\n# Load your data\ndata = pd.read_csv('your_dataset.csv')\n\n# Define a region as a polygon\nregion = Polygon([(-10, -10), (-10, 10), (10, 10), (10, -10)])\n\n# Perform spatial sampling based on latitude and longitude columns\nspatial_sampled_data = sampling_methods.spatial_sampling(\n    data, \n    latitude_column='lat', \n    longitude_column='lon', \n    region=region, \n    sample_size=100, \n    seed=42\n)\n\nprint(spatial_sampled_data)\n```\n\n## Testing\n\nTo run the tests, use the following command:\n\n```bash\npytest\n```\n\n## Contributing\n\nIf you would like to contribute to this project, please follow these steps:\n\n1. Fork the repository.\n2. Create a new branch:\n   ```bash\n   git checkout -b feature-branch-name\n   ```\n3. Make your changes and commit them:\n   ```bash\n   git commit -m 'Add some feature'\n   ```\n4. Push to the branch:\n   ```bash\n   git push origin feature-branch-name\n   ```\n5. Submit a pull request.\n\nPlease ensure your code adheres to the project's coding standards and includes appropriate tests.\n\n## License\n\nThis project is licensed under the MIT License. See the `LICENSE` file for details.\n\n## Contact\n\nIf you have any questions or suggestions, feel free to reach out to:\n\n- [jchase.msu@gmail.com](mailto:jchase.msu@gmail.com)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felkronos%2Fsample_py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felkronos%2Fsample_py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felkronos%2Fsample_py/lists"}