https://github.com/deva-246/data-sampling-using-python
- Host: GitHub
- URL: https://github.com/deva-246/data-sampling-using-python
- Owner: deva-246
- Created: 2023-12-05T15:16:31.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-05T16:51:14.000Z (over 1 year ago)
- Last Synced: 2025-01-27T06:43:51.059Z (4 months ago)
- Topics: clusters, datasampling, model, numpy, pandas, python, randomsamples, scikit-learn, strata, typesofsampling
- Language: Jupyter Notebook
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data-Sampling-using-Python
In Python, you can perform data sampling with libraries such as NumPy, pandas, or scikit-learn. The examples below show random sampling and stratified sampling with these libraries:
### Random Sampling:
**Using NumPy:**
```python
import numpy as np

# Assuming you have a dataset 'data'
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Perform random sampling without replacement (each element is picked at most once)
sample_size = 5
random_sample = np.random.choice(data, size=sample_size, replace=False)
print("Random Sample:", random_sample)
```

**Using pandas:**
```python
import pandas as pd

# Assuming you have a DataFrame 'df' with a column 'column_name'
df = pd.DataFrame({'column_name': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Perform random sampling without replacement
sample_size = 5
random_sample = df['column_name'].sample(n=sample_size, replace=False)
print("Random Sample:", random_sample.tolist())
```

### Stratified Sampling:
**Using scikit-learn:**
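```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assuming you have features 'X' and labels 'y' (two classes, four samples each).
# With stratify=y the test set must contain at least one sample per class, so
# test_size=0.25 needs at least 8 samples here; with only 4 samples it raises a ValueError.
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Perform stratified sampling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y)
print("Stratified X_train:", X_train)
print("Stratified y_train:", y_train)
print("Stratified X_test:", X_test)
print("Stratified y_test:", y_test)
```

In the stratified sampling example above, `stratify=y` makes the split preserve the class proportions of the target variable `y`. Here each class makes up 50% of the data, so the training set (3 samples per class) and the test set (1 sample per class) both keep a 50/50 split.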
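The same idea also works outside a train/test split, when you simply want a stratified sample of a DataFrame. The sketch below assumes pandas 1.1 or later (where `groupby(...).sample` is available) and uses a hypothetical DataFrame whose `label` column is the stratum. Each group is sampled independently at the same fraction, so the class proportions carry over to the sample.

```python
import pandas as pd

# Hypothetical dataset: 'label' is the stratum column (6 rows of class 0, 4 of class 1).
df = pd.DataFrame({
    "value": range(10),
    "label": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Draw 50% of the rows from each label group independently.
# random_state makes the draw reproducible.
stratified_sample = df.groupby("label").sample(frac=0.5, random_state=42)

print(stratified_sample)
print(stratified_sample["label"].value_counts().sort_index())
```

With these numbers the sample keeps 3 rows of class 0 and 2 rows of class 1, the same 60/40 split as the full data.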
These are basic examples, and you may need to adapt the code to your specific dataset and requirements. The right sampling method and parameters (sample size, sampling with or without replacement, which variable to stratify on) depend on the characteristics of your data and the goal of your analysis or machine learning task.
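One parameter worth calling out is `replace`. The random sampling examples draw without replacement, so each element appears at most once and the sample cannot be larger than the data. Setting `replace=True` allows repeats, which is what a bootstrap resample needs: a sample the same size as the data, drawn with replacement. A minimal sketch using NumPy's `Generator` API; the seed and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded for reproducibility
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Without replacement: no duplicates, and size must not exceed len(data).
without = rng.choice(data, size=5, replace=False)

# With replacement (bootstrap): same size as the data; duplicates are allowed,
# so some values may appear more than once and others not at all.
bootstrap = rng.choice(data, size=data.size, replace=True)

print("Without replacement:", without)
print("Bootstrap sample:   ", bootstrap)
print("Bootstrap mean:     ", bootstrap.mean())

# Asking for more unique items than exist raises a ValueError.
try:
    rng.choice(data, size=20, replace=False)
except ValueError as err:
    print("Cannot draw 20 unique items from 10:", err)
```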