{"id":30772181,"url":"https://github.com/query-farm/stochastic","last_synced_at":"2025-09-05T00:52:43.225Z","repository":{"id":309627290,"uuid":"1035482370","full_name":"Query-farm/stochastic","owner":"Query-farm","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-12T22:42:47.000Z","size":38,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-13T00:27:29.083Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Query-farm.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-10T13:51:52.000Z","updated_at":"2025-08-12T22:42:50.000Z","dependencies_parsed_at":"2025-08-13T00:27:33.254Z","dependency_job_id":"85da54cb-25e3-42b6-80de-3576ad1aa5d6","html_url":"https://github.com/Query-farm/stochastic","commit_stats":null,"previous_names":["query-farm/stochastic"],"tags_count":null,"template":false,"template_full_name":"duckdb/extension-template","purl":"pkg:github/Query-farm/stochastic","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fstochastic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fstochastic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fstochastic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fstochastic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Query-farm","download_url":"https://codeload.github.com/Query-farm/stochastic/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Query-farm%2Fstochastic/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273695251,"owners_count":25151484,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-05T00:52:38.796Z","updated_at":"2025-09-05T00:52:43.208Z","avatar_url":"https://github.com/Query-farm.png","language":"C++","readme":"# Stochastic Extension for DuckDB by [Query.Farm](https://query.farm)\n\nThe `stochastic` extension adds comprehensive statistical distribution functions to DuckDB, enabling advanced statistical analysis, probability calculations, and random sampling directly within SQL queries.\n\n## Installation\n\n**`stochastic` is a [DuckDB Community Extension](https://github.com/duckdb/community-extensions).**\n\nYou can install and use it in DuckDB SQL:\n\n```sql\nINSTALL stochastic FROM community;\nLOAD stochastic;\n```\n\n## What are statistical distributions?\n\nStatistical distributions are mathematical functions that describe the probability of different outcomes in a dataset. They are fundamental to statistics, data science, machine learning, and scientific computing. This extension provides functions to:\n\n- Calculate probability density and mass functions (PDF/PMF)\n- Compute cumulative distribution functions (CDF)\n- Generate quantiles (inverse CDF)\n- Sample random values from distributions\n- Access distribution properties (mean, variance, etc.)\n\n## Available Distributions\n\nThe extension supports a comprehensive set of probability distributions:\n\n### Continuous Distributions\n- **Beta** - `dist_beta_*` functions\n- **Cauchy** - `dist_cauchy_*` functions\n- **Chi-squared** - `dist_chi_squared_*` functions\n- **Exponential** - `dist_exponential_*` functions\n- **Extreme Value** - `dist_extreme_value_*` functions\n- **Fisher F** - `dist_fisher_f_*` functions\n- **Gamma** - `dist_gamma_*` functions\n- **Log-normal** - `dist_lognormal_*` functions\n- **Logistic** - `dist_logistic_*` functions\n- **Normal (Gaussian)** - `dist_normal_*` functions\n- **Pareto** - `dist_pareto_*` functions\n- **Rayleigh** - `dist_rayleigh_*` functions\n- **Student's t** - `dist_students_t_*` functions\n- **Uniform (Real)** - `dist_uniform_real_*` functions\n- **Weibull** - `dist_weibull_*` functions\n\n### Discrete Distributions\n- **Bernoulli** - `dist_bernoulli_*` functions\n- **Binomial** - `dist_binomial_*` functions\n- **Negative Binomial** - `dist_negative_binomial_*` functions\n- **Poisson** - `dist_poisson_*` functions\n- **Uniform (Integer)** - `dist_uniform_int_*` functions\n\n## Function Categories\n\nEach distribution provides the following function types:\n\n### Sampling Functions\n- `dist_{distribution}_sample(params...)` - Generate random samples\n\n### Density/Mass Functions\n- `dist_{distribution}_pdf(params..., x)` - Probability density function\n- `dist_{distribution}_log_pdf(params..., x)` - Log probability density function\n\n### Cumulative Functions\n- `dist_{distribution}_cdf(params..., x)` - Cumulative distribution function\n- `dist_{distribution}_log_cdf(params..., x)` - Log cumulative distribution function\n- `dist_{distribution}_cdf_complement(params..., x)` - Survival function (1 - CDF)\n- `dist_{distribution}_log_cdf_complement(params..., x)` - Log survival function\n\n### Quantile Functions\n- `dist_{distribution}_quantile(params..., p)` - Quantile function (inverse CDF)\n- `dist_{distribution}_quantile_complement(params..., p)` - Complementary quantile function\n\n### Hazard Functions\n- `dist_{distribution}_hazard(params..., x)` - Hazard function\n- `dist_{distribution}_chf(params..., x)` - Cumulative hazard function\n\n### Distribution Properties\n- `dist_{distribution}_kurtosis_excess(params...)` - Excess kurtosis\n- `dist_{distribution}_kurtosis(params...)` - Kurtosis\n- `dist_{distribution}_mean(params...)` - Expected value\n- `dist_{distribution}_median(params...)` - Median (50th percentile)\n- `dist_{distribution}_mode(params...)` - Mode (most likely value)\n- `dist_{distribution}_range(params...)` - Support range\n- `dist_{distribution}_skewness(params...)` - Skewness\n- `dist_{distribution}_stddev(params...)` - Standard deviation\n- `dist_{distribution}_support(params...)` - Distribution support\n- `dist_{distribution}_variance(params...)` - Variance\n\n## Usage Examples\n\n### Normal Distribution\n\n```sql\n-- Generate random samples from N(0, 1)\nSELECT dist_normal_sample(0.0, 1.0) AS random_value;\n\n-- Calculate PDF at x = 0.5 for N(0, 1)\nSELECT dist_normal_pdf(0.0, 1.0, 0.5) AS density;\n\n-- Calculate CDF (probability that X ≤ 1.96)\nSELECT dist_normal_cdf(0.0, 1.0, 1.96) AS probability;\n\n-- Find 95th percentile\nSELECT dist_normal_quantile(0.0, 1.0, 0.95) AS percentile_95;\n\n-- Get distribution properties\nSELECT\n    dist_normal_mean(0.0, 1.0) AS mean,\n    dist_normal_variance(0.0, 1.0) AS variance,\n    dist_normal_skewness(0.0, 1.0) AS skewness;\n```\n\n### Binomial Distribution\n\n```sql\n-- Probability mass function for 10 trials, p=0.3\nSELECT dist_binomial_pdf(10, 0.3, 7) AS prob_exactly_7;\n\n-- Cumulative probability (≤ 5 successes)\nSELECT dist_binomial_cdf(10, 0.3, 5) AS prob_at_most_5;\n\n-- Generate random binomial samples\nSELECT dist_binomial_sample(10, 0.3) AS random_successes;\n```\n\n### Working with Data Tables\n\n```sql\n-- Generate synthetic dataset\nCREATE TABLE synthetic_data AS\nSELECT\n    i,\n    dist_normal_sample(100, 15) AS height_cm,\n    dist_normal_sample(70, 10) AS weight_kg,\n    dist_binomial_sample(1, 0.5) AS gender  -- 0 or 1\nFROM range(1000) t(i);\n\n-- Calculate z-scores\nSELECT\n    height_cm,\n    (height_cm - dist_normal_mean(100, 15)) / dist_normal_stddev(100, 15) AS height_zscore\nFROM synthetic_data;\n\n-- Probability calculations\nSELECT\n    weight_kg,\n    dist_normal_cdf(70, 10, weight_kg) AS percentile\nFROM synthetic_data\nLIMIT 10;\n```\n\n## Real-World Applications\n\n### A/B Testing and Statistical Significance\n**Common Task**: Determine if there's a statistically significant difference between conversion rates.\n**Relevant Functions**: `dist_normal_cdf`, `dist_normal_cdf_complement`, `dist_normal_pdf`\n\n### Financial Risk Assessment and VaR Calculation\n**Common Task**: Calculate Value at Risk (VaR) for portfolio management.\n**Relevant Functions**: `dist_normal_sample`, `dist_normal_quantile`, `dist_normal_cdf`\n\n### Quality Control and Process Monitoring\n**Common Task**: Monitor manufacturing processes and detect out-of-control conditions.\n**Relevant Functions**: `dist_normal_sample`, `dist_normal_cdf`, `dist_normal_pdf`\n\n### Predictive Analytics and Confidence Intervals\n**Common Task**: Build prediction intervals for forecasting models.\n**Relevant Functions**: `dist_normal_quantile`, `dist_normal_cdf`, `dist_normal_sample`\n\n### Customer Analytics and CLV Modeling\n**Common Task**: Model customer lifetime value with uncertainty quantification.\n**Relevant Functions**: `dist_normal_sample`, `dist_exponential_sample`, `dist_normal_quantile`, `dist_normal_cdf`\n\n### Anomaly Detection and Outlier Analysis\n**Common Task**: Detect anomalies in time series data using statistical methods.\n**Relevant Functions**: `dist_normal_pdf`, `dist_normal_cdf`, `dist_normal_cdf_complement`\n\n### Monte Carlo Simulations\n**Common Task**: Run Monte Carlo simulations for risk analysis, optimization, or modeling.\n**Relevant Functions**: `dist_normal_sample`, `dist_uniform_real_sample`, `dist_gamma_sample`, `dist_beta_sample`\n\n### Hypothesis Testing\n**Common Task**: Perform statistical hypothesis tests (t-tests, chi-square tests, etc.).\n**Relevant Functions**: `dist_students_t_cdf`, `dist_chi_squared_cdf`, `dist_normal_cdf`, `dist_fisher_f_cdf`\n\n### Bayesian Analysis\n**Common Task**: Implement Bayesian statistical models and posterior analysis.\n**Relevant Functions**: `dist_beta_pdf`, `dist_gamma_pdf`, `dist_normal_pdf`, `dist_beta_sample`\n\n### Survival Analysis\n**Common Task**: Analyze time-to-event data in medical research or reliability engineering.\n**Relevant Functions**: `dist_exponential_pdf`, `dist_weibull_pdf`, `dist_gamma_pdf`, `dist_exponential_cdf`\n\n## Why Use DuckDB + Stochastic vs Python/R?\n\n### ✅ **Advantages**\n- **No Data Movement**: Analysis happens where your data lives\n- **SQL Familiarity**: Use existing SQL skills instead of learning specialized libraries\n- **Performance**: Columnar processing with vectorized statistical operations\n- **Integration**: Works seamlessly with existing BI tools and SQL workflows\n- **Real-time**: Analyze streaming data without export/import cycles\n\n### 📊 **Performance Benefits**\nStatistical operations are vectorized and optimized for DuckDB's columnar engine.\n\n## Parameter Validation\n\nAll distribution functions include comprehensive parameter validation:\n\n```sql\n-- This will throw an error: standard deviation must be \u003e 0\nSELECT dist_normal_pdf(0.0, -1.0, 0.5);\n-- Error: normal: Standard deviation must be \u003e 0 was: -1.000000\n\n-- This will throw an error: probability must be between 0 and 1\nSELECT dist_binomial_pdf(10, 1.5, 5);\n-- Error: binomial: Probability must be between 0 and 1 was: 1.500000\n```\n\n## License\n\nMIT Licensed\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fstochastic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquery-farm%2Fstochastic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquery-farm%2Fstochastic/lists"}