An open API service indexing awesome lists of open source software.

https://github.com/marcramonmoreno/pycuda_pearson_correlation_function

Pearson Correlation Function Accelerated with PyCuda
https://github.com/marcramonmoreno/pycuda_pearson_correlation_function

Last synced: 2 months ago
JSON representation

Pearson Correlation Function Accelerated with PyCuda

Awesome Lists containing this project

README

        

## GPU-Accelerated Pearson Correlation Coefficient Computation
# Overview

This Python script utilizes PyCUDA, a Python interface to CUDA (Compute Unified Device Architecture), for performing high-performance computing on NVIDIA GPUs. It focuses on calculating Pearson's correlation coefficients, a measure of the linear correlation between two sets of data, with GPU acceleration.
Key Features

GPU Acceleration: Uses PyCUDA to leverage GPU for efficient computation, especially suitable for large datasets.
Reduction Kernels: Includes custom reduction kernels for summing elements, summing squares, and summing products.
Pearson Correlation Function: A function to compute Pearson's correlation coefficient between two data arrays.

# Prerequisites

NVIDIA GPU with CUDA support
Python environment
PyCUDA installed (pip install pycuda)

# Usage

Initialization: The script begins by importing necessary modules and setting up reduction kernels using PyCUDA.

Reduction Kernels:
sum_kernel: Sums the elements of an array.
sum_sq_kernel: Sums the squares of the elements of an array.
product_sum_kernel: Sums the products of corresponding elements from two arrays.

Pearson Correlation Coefficient Calculation:
The function pearson_correlation(x, y) takes two arrays x and y and computes their Pearson correlation coefficient.
The function handles data transfer between CPU and GPU, and performs necessary computations on the GPU.

Sample Data Generation:
The script generates random sample data for demonstration purposes, representing tobacco use, cancer cases, and infertility cases.
The Pearson correlation coefficient is then calculated for tobacco-cancer and tobacco-infertility datasets.

# Example

python

N = 50000000 # Number of data points
np.random.seed(0) # Seed for reproducibility
tobacco_use = np.random.rand(N)
cancer_cases = np.random.rand(N)
infertility_cases = np.random.rand(N)

# # Compute Pearson correlation
correlation_tobacco_cancer = pearson_correlation(tobacco_use, cancer_cases)
correlation_tobacco_infertility = pearson_correlation(tobacco_use, infertility_cases)

# Limitations

The script requires an NVIDIA GPU with CUDA support.
The size of the data is limited by the GPU's memory capacity.

# License

This script is released under the MIT License.

Feel free to modify the content to better suit your project's specifics.