Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jbytecode/sqlite3stats.jl
Injecting statistical functions into any SQLite database in Julia
https://github.com/jbytecode/sqlite3stats.jl
Last synced: 22 days ago
JSON representation
Injecting statistical functions into any SQLite database in Julia
- Host: GitHub
- URL: https://github.com/jbytecode/sqlite3stats.jl
- Owner: jbytecode
- License: mit
- Created: 2021-12-30T18:02:53.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-02T19:34:56.000Z (9 months ago)
- Last Synced: 2024-11-14T17:05:46.301Z (about 1 month ago)
- Language: Julia
- Homepage:
- Size: 225 KB
- Stars: 26
- Watchers: 1
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
[![Doc](https://img.shields.io/badge/docs-stable-blue.svg)](https://jbytecode.github.io/Sqlite3Stats.jl/dev/)
[![codecov](https://codecov.io/gh/jbytecode/Sqlite3Stats.jl/branch/main/graph/badge.svg?token=71E9Y9BCT6)](https://codecov.io/gh/jbytecode/Sqlite3Stats.jl)# Sqlite3Stats
Injecting StatsBase functions into any SQLite database in Julia.# In Short
Makes it possible to call```sql
select MEDIAN(fieldname) from tablename
```in Julia where median is defined in Julia and related packages and the function is *injected* to use within SQLite. **Database file is not modified**.
# Installation
```julia
julia> using Pkg
julia> Pkg.add("Sqlite3Stats")
```# Simple use
```julia
using SQLite
using Sqlite3Stats
using DataFrames# Any SQLite database
# In our case, it is dbfile.db
db = SQLite.DB("dbfile.db")# Injecting functions
Sqlite3Stats.register_functions(db)
```# Registered Functions and Examples
```Julia
using SQLite
using Sqlite3Stats
using DataFramesdb = SQLite.DB("dbfile.db")
# Injecting functions
Sqlite3Stats.register_functions(db)# 1st Quartile
result = DBInterface.execute(db, "select Q1(num) from table") |> DataFrame# 2st Quartile
result = DBInterface.execute(db, "select Q2(num) from table") |> DataFrame# Median (Equals to Q2)
result = DBInterface.execute(db, "select MEDIAN(num) from table") |> DataFrame# 3rd Quartile
result = DBInterface.execute(db, "select Q3(num) from table") |> DataFrame# QUANTILE
result = DBInterface.execute(db, "select QUANTILE(num, 0.25) from table") |> DataFrame
result = DBInterface.execute(db, "select QUANTILE(num, 0.50) from table") |> DataFrame
result = DBInterface.execute(db, "select QUANTILE(num, 0.75) from table") |> DataFrame# Covariance
result = DBInterface.execute(db, "select COV(num, other) from table") |> DataFrame# Pearson Correlation
result = DBInterface.execute(db, "select COR(num, other) from table") |> DataFrame# Spearman Correlation
result = DBInterface.execute(db, "select SPEARMANCOR(num, other) from table") |> DataFrame# Kendall Correlation
result = DBInterface.execute(db, "select KENDALLCOR(num, other) from table") |> DataFrame# Median Absolute Deviations
result = DBInterface.execute(db, "select MAD(num) from table") |> DataFrame# Inter-Quartile Range
result = DBInterface.execute(db, "select IQR(num) from table") |> DataFrame# Skewness
result = DBInterface.execute(db, "select SKEWNESS(num) from table") |> DataFrame# Kurtosis
result = DBInterface.execute(db, "select KURTOSIS(num) from table") |> DataFrame# Geometric Mean
result = DBInterface.execute(db, "select GEOMEAN(num) from table") |> DataFrame# Harmonic Mean
result = DBInterface.execute(db, "select HARMMEAN(num) from table") |> DataFrame# Maximum absolute deviations
result = DBInterface.execute(db, "select MAXAD(num) from table") |> DataFrame# Mean absolute deviations
result = DBInterface.execute(db, "select MEANAD(num) from table") |> DataFrame# Mean squared deviations
result = DBInterface.execute(db, "select MSD(num) from table") |> DataFrame# Mode
result = DBInterface.execute(db, "select MODE(num) from table") |> DataFrame# WMEAN for weighted mean
result = DBInterface.execute(db, "select WMEAN(num, weights) from table") |> DataFrame# WMEDIAN for weighted mean
result = DBInterface.execute(db, "select WMEDIAN(num, weights) from table") |> DataFrame# Entropy
result = DBInterface.execute(db, "select ENTROPY(probs) from table") |> DataFrame# Slope (a) of linear regression y = b + ax
result = DBInterface.execute(db, "select LINSLOPE(x, y) from table") |> DataFrame# Intercept (b) of linear regression y = b + ax
result = DBInterface.execute(db, "select LININTERCEPT(x, y) from table") |> DataFrame
```# Well-known Probability Related Functions
This family of functions implement QXXX(), PXXX(), and RXXX() for a probability density or mass function XXX. Q for quantile, p for propability or cdf value, R for random number.`QNORM(p, mean, stddev)` returns the quantile value $q$
whereas
`PNORM(q, mean, stddev)` returns $p$ using the equation
$$
\int_{-\infty}^{q} f(x; \mu, \sigma)dx = p
$$and `RNORM(mean, stddev)` draws a random number from a Normal distribution with mean `mean` ( $\mu$ ) and standard deviation `stddev` ( $\sigma$ ) which is defined as
$$
f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2} (\frac{x-\mu}{\sigma})^2}
$$and $-\infty < x < \infty$.
```julia
# Quantile of Normal Distribution with mean 0 and standard deviation 1
result = DBInterface.execute(db, "select QNORM(0.025, 0.0, 1.0) from table") |> DataFrame# Probability of Normal Distribution with mean 0 and standard deviation 1
result = DBInterface.execute(db, "select PNORM(-1.96, 0.0, 1.0) from table") |> DataFrame# Random number drawn from a Normal Distribution with mean * and standard deviation 1
result = DBInterface.execute(db, "select RNORM(0.0, 1.0) from table") |> DataFrame
```# Other functions for distributions
Note that Q, P, and R prefix correspond to Quantile, CDF (Probability), and Random (number), respectively.- `QT(x, dof)`, `PT(x, dof)`, `RT(dof)` for Student-T Distribution
- `QCHISQ(x, dof)`, `PCHISQ(x, dof)`, `RCHISQ(dof)` for ChiSquare Distribution
- `QF(x, dof1, dof2)`, `PF(x, dof1, dof2)`, `RF(dof1, dof2)` for F Distribution
- `QPOIS(x, lambda)`,`RPOIS(x, lambda)`, `RPOIS(lambda)` for Poisson Distribution
- `QBINOM(x, n, p)`, `PBINOM(x, n, p)`, `RBINOM(n, p)` for Binomial Distribution
- `QUNIF(x, a, b)`, `PUNIF(x, a, b)`, `RUNIF(a, b)` for Uniform Distribution
- `QEXP(x, theta)`, `PEXP(x, theta)`, `REXP(theta)` for Exponential Distribution
- `QBETA(x, alpha, beta)`, `PGAMMA(x, alpha, beta)`, `RGAMMA(alpha, beta)` for Beta Distribution
- `QCAUCHY(x, location, scale)`, `PCAUCHY(x, location, scale)`, `RCAUCHY(location, scale)` for Cauchy Distribution
- `QGAMMA(x, alpha, theta)`, `PGAMMA(x, alpha, theta)`, `RGAMMA(alpha, theta)` for Gamma Distribution
- `QFRECHET(x, alpha)`, `PFRECHET(x, alpha)`, `RFRECHET(alpha)` for Frechet Distribution
- `QPARETO(x, alpha, theta)`, `PPARETO(x, alpha, theta)`, `RPARETO(alpha, theta)` for Pareto Distribution
- `QWEIBULL(x, alpha, theta)`, `PWEIBULL(x, alpha, theta)`, `RWEIBULL(alpha, theta)` for Weibull Distribution# Hypothesis Tests
- `JB(x)` for Jarque-Bera Normality Test (returns the p-value)# The Logic
The package mainly uses the ```register``` function. For example, a single variable
function ```MEDIAN``` is registered as```julia
SQLite.register(db, [],
(x,y) -> vcat(x, y),
x -> StatsBase.quantile(x, 0.50),
name = "MEDIAN")
```whereas, the two-variable function ```COR``` is registered as
```julia
SQLite.register(db, Array{Float64, 2}(undef, (0, 2)),
(x, a, b) -> vcat(x, [a, b]'),
x -> StatsBase.cor(x[:,1], x[:,2]),
name = "COR", nargs = 2)
```for Pearson's correlation coefficient.