
https://github.com/abhishek-patidar066/synthetic-datasets

Synthetic datasets are artificially generated data used for training machine learning models, simulating real-world data while ensuring privacy.

clustering datasets jupyter-notebook libraries matplotlib-pyplot numpy pandas-dataframe python random sklearn


# Synthetic-Datasets
Overview:
This repository contains synthetic datasets for machine learning and data analysis tasks. The datasets cover a variety of patterns and structures, which makes them useful for testing, benchmarking, and visualizing algorithms. Each dataset follows a distinct distribution or geometric pattern, such as moons, blobs, clusters, and more.

Datasets:
Moon Pattern:
A crescent moon-shaped dataset with two distinct classes forming semi-circular shapes.
Useful for evaluating clustering and classification algorithms on problems with non-linear decision boundaries.
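The README does not show how its moon data is generated. A minimal sketch, assuming scikit-learn (listed in the repository topics) is installed, uses `sklearn.datasets.make_moons`. The sample count, noise level, and seed below are illustrative choices, not values taken from the repository.

```python
from sklearn.datasets import make_moons

# Two interleaving half-circles, one per class.
# `noise` is the standard deviation of Gaussian noise added to every point.
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

print(X.shape)                 # (200, 2): one (x, y) coordinate pair per sample
print(sorted(set(y.tolist())))  # [0, 1]: one label per crescent
```

Raising `noise` makes the two crescents overlap more, which makes the classes harder to separate with a linear boundary.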

Blob Pattern:
A set of isotropic Gaussian blobs for clustering.
Adjustable parameters allow control over the number of clusters, cluster size, and standard deviation.
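As a sketch, assuming scikit-learn is available, `sklearn.datasets.make_blobs` exposes the parameters described above: `centers` (number of clusters), `n_samples` (total points, split evenly across clusters), and `cluster_std` (spread of each isotropic Gaussian). The values chosen here, and the k-means check at the end, are illustrative and not taken from the repository.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 points split evenly across 4 isotropic Gaussian clusters in 2-D.
X, y = make_blobs(n_samples=300, centers=4, n_features=2,
                  cluster_std=0.8, random_state=0)

print(np.bincount(y))  # [75 75 75 75]: points per cluster

# Quick clustering check: fit k-means with the known number of clusters.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
```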

Name-Like Patterns:
Datasets resembling the shape of characters or words.
Often used for creative visualization or pattern recognition tasks.
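The README does not say how the name-like datasets are built. One hypothetical approach is to describe each letter as line segments in a unit box and sample noisy points along them. The `STROKES` table and `text_points` helper below are illustrative names invented for this sketch, not part of the repository, and only a few letters are defined.

```python
import numpy as np

# Hypothetical stroke table: each letter is a list of line segments
# ((x0, y0), (x1, y1)) drawn inside a unit box. Only a few letters are defined.
STROKES = {
    "L": [((0.0, 1.0), (0.0, 0.0)), ((0.0, 0.0), (0.6, 0.0))],
    "T": [((0.0, 1.0), (0.8, 1.0)), ((0.4, 1.0), (0.4, 0.0))],
    "V": [((0.0, 1.0), (0.4, 0.0)), ((0.4, 0.0), (0.8, 1.0))],
    "I": [((0.3, 1.0), (0.3, 0.0))],
}

def text_points(text, points_per_stroke=50, noise=0.02, seed=None):
    """Sample noisy 2-D points along the strokes of `text` (hypothetical helper).

    Returns (X, y) where y is the index of the letter each point belongs to.
    Raises KeyError for letters missing from STROKES.
    """
    rng = np.random.default_rng(seed)
    segments, labels = [], []
    for i, ch in enumerate(text.upper()):
        for (x0, y0), (x1, y1) in STROKES[ch]:
            t = rng.uniform(0.0, 1.0, size=points_per_stroke)  # position along the segment
            seg = np.column_stack([x0 + t * (x1 - x0), y0 + t * (y1 - y0)])
            seg[:, 0] += i  # shift each letter one unit to the right
            segments.append(seg)
            labels.append(np.full(points_per_stroke, i))
    X = np.vstack(segments)
    X += rng.normal(scale=noise, size=X.shape)  # Gaussian jitter around the strokes
    return X, np.concatenate(labels)

X, y = text_points("TIL", seed=0)
print(X.shape)          # (250, 2): T and L have 2 strokes each, I has 1
print(np.bincount(y))   # [100  50 100]
```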

Grid/Checkerboard Patterns:
Structured grids or alternating square patterns.
Ideal for exploring spatial relationships and segmentation tasks.
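A checkerboard can be sketched with NumPy alone: draw points uniformly over an `n_squares` x `n_squares` board and label each point by the parity of its square's row and column, so adjacent squares alternate between classes. The `checkerboard` function and its parameters are hypothetical, chosen for illustration rather than taken from the repository.

```python
import numpy as np

def checkerboard(n_samples=1000, n_squares=4, seed=None):
    """Uniform points on an n_squares x n_squares board (hypothetical helper).

    Label is the color of the square a point falls in: 0 or 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, n_squares, size=(n_samples, 2))  # values in [0, n_squares)
    cells = np.floor(X).astype(int)                        # integer (column, row) of each point's square
    y = (cells[:, 0] + cells[:, 1]) % 2                    # neighboring squares get opposite labels
    return X, y

X, y = checkerboard(n_samples=1000, n_squares=4, seed=0)
```

Because no single line separates the two classes, this pattern is a common stress test for classifiers with limited capacity.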

Custom Shapes:
Arbitrary shapes or patterns to simulate unique data distributions.
Can be used for specialized tasks like anomaly detection or unsupervised learning.
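For anomaly detection, one simple custom shape is a noisy ring of normal points plus a handful of outliers scattered uniformly around it. The sketch below assumes that setup; `ring_with_outliers` and its defaults are hypothetical. Note that a few uniform outliers may happen to land on the ring itself.

```python
import numpy as np

def ring_with_outliers(n_inliers=500, n_outliers=25, radius=1.0, width=0.05, seed=None):
    """Noisy ring (label 0) plus uniform outliers (label 1). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    # Inliers: uniform angle, radius jittered around `radius`.
    theta = rng.uniform(0.0, 2 * np.pi, size=n_inliers)
    r = radius + rng.normal(scale=width, size=n_inliers)
    inliers = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    # Outliers: uniform over a square that covers the ring.
    outliers = rng.uniform(-1.5 * radius, 1.5 * radius, size=(n_outliers, 2))
    X = np.vstack([inliers, outliers])
    y = np.concatenate([np.zeros(n_inliers, dtype=int), np.ones(n_outliers, dtype=int)])
    return X, y

X, y = ring_with_outliers(seed=0)
```

The labels are kept only for evaluation; an unsupervised detector would be fit on `X` alone.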

Features:

Customizable Parameters: Modify the size, density, noise, and dimensions to fit your requirements.
Easy Integration: Datasets are provided in formats ready for popular machine learning libraries like scikit-learn, TensorFlow, or PyTorch.
Versatility: Suitable for tasks like supervised/unsupervised learning, visualization, and algorithm prototyping.
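The README does not specify the exact file format. As a sketch, assuming the data is kept as a pandas DataFrame (pandas is among the repository topics), the same table can be saved to CSV and turned back into NumPy arrays that scikit-learn estimators accept directly. The column names `x1`, `x2`, and `label` are illustrative.

```python
import io

import pandas as pd
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Tabular form: one row per sample, feature columns plus a label column.
df = pd.DataFrame(X, columns=["x1", "x2"])
df["label"] = y

# Round-trip through CSV (an in-memory buffer here; a file path works the same way).
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
restored = pd.read_csv(buf)

# NumPy arrays: usable directly by scikit-learn estimators.
X_np = restored[["x1", "x2"]].to_numpy(dtype="float32")
y_np = restored["label"].to_numpy()
```

For PyTorch, `torch.from_numpy(X_np)` wraps the same array as a tensor. TensorFlow accepts NumPy arrays as model inputs as well.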

![abhishek](https://github.com/user-attachments/assets/8d0d311a-82ef-4ee2-b134-d341500af2a3)
![moon_img](https://github.com/user-attachments/assets/b9346c39-34b7-4b59-9d1a-a5b0d248ab53)
![Blob_4img](https://github.com/user-attachments/assets/6e1a9489-0b90-49c2-a1d2-f1ceef13033f)
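Scatter plots similar to the images above can be drawn with matplotlib (listed in the repository topics). This is a sketch, not the author's plotting code: it assumes scikit-learn's `make_blobs` for the data and colors points by their cluster label.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=4, random_state=0)

fig, ax = plt.subplots(figsize=(5, 4))
points = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", s=15)
ax.set_title("Blob pattern (4 centers)")
ax.set_xlabel("x1")
ax.set_ylabel("x2")
fig.colorbar(points, ax=ax, label="cluster label")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # or fig.savefig("blobs.png") to write a file
plt.close(fig)
```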