https://github.com/mhahsler/fit_dist
Simple R script to fit distributions to data
- Host: GitHub
- URL: https://github.com/mhahsler/fit_dist
- Owner: mhahsler
- License: cc-by-sa-4.0
- Created: 2019-11-25T15:40:52.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-01-21T15:52:18.000Z (over 5 years ago)
- Last Synced: 2025-02-05T00:41:38.007Z (over 1 year ago)
- Topics: distribution, educational, statistics
- Language: R
- Homepage:
- Size: 69.3 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# fit_dist
Simple R script to fit distributions to data based on
the R package `fitdistrplus`. This script is intended to provide students with a simple way to fit distributions (e.g., for input analysis in a simulation course).

This work is licensed under the
[Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/). For questions please contact
[Michael Hahsler](http://michael.hahsler.net).
## Required Software
* Install [R](https://cran.r-project.org/)
* Optional: Install [RStudio](https://rstudio.com/products/rstudio/download/)
* Load the script: `source('https://raw.githubusercontent.com/mhahsler/fit_dist/master/fit_dist.R')`.
## Usage
```
source('https://raw.githubusercontent.com/mhahsler/fit_dist/master/fit_dist.R')
fit_dist(x, distributions = NULL, discrete = NULL, plot = TRUE, ...)
```
where `x` is a vector with the data, `distributions` is a vector with the distributions to fit,
`discrete` indicates if discrete or continuous distributions should be fit, and
`plot` indicates if a Q-Q plot should be displayed. The function displays the results of statistical tests and returns a list with the estimated parameters.
_Note:_ The plot might be too large for the small plot pane in RStudio. Use `X11()` (`quartz()` on a Mac) to open a bigger window for plotting.
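As a hedged illustration of the arguments described above, the following sketch fits a chosen subset of continuous distributions and skips the plot. It assumes an internet connection, an installed `fitdistrplus` package, and that the returned list is named by distribution (as the printed results in the examples suggest); the sample data and the selection of distributions are made up for illustration.

```r
# Load the script (needs internet access; assumes fitdistrplus is installed).
source('https://raw.githubusercontent.com/mhahsler/fit_dist/master/fit_dist.R')

set.seed(42)                             # make the random sample reproducible
x <- rgamma(200, shape = 2, rate = 0.5)  # positive, right-skewed example data

# Fit only three candidate distributions, treat the data as continuous,
# and do not draw the Q-Q plot.
fit <- fit_dist(x,
  distributions = c("norm", "gamma", "weibull"),
  discrete = FALSE,
  plot = FALSE)

# Assumption: the list elements are named after the fitted distributions.
names(fit)
```

Restricting `distributions` to plausible candidates keeps the test output short and avoids errors from distributions that cannot be fit to the data.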
## Examples
Load the script first.
```
source('https://raw.githubusercontent.com/mhahsler/fit_dist/master/fit_dist.R')
```
Fit some random data drawn from a normal distribution.
```
x <- rnorm(100, mean = 10, sd = 1)
fit <- fit_dist(x)
```

```
Fitting unif, norm, lnorm, exp, gamma, beta, weibull
Error in computing default starting values.
Error in manageparam(start.arg = start, fix.arg = fix.arg, obs = data, :
Error in start.arg.default(obs, distname) :
values must be in [0-1] to fit a beta distribution
Test results:
        Kolmogorov.iSmirnov.test Cramer.von.Mises.test Anderson.Darling.test Chi.Square.p.value
unif                not rejected          not computed          not computed       1.677783e-02
norm                not rejected          not computed          not computed       7.435807e-01
lnorm               not rejected          not computed          not computed       5.362755e-01
exp                     rejected              rejected              rejected      2.148867e-127
gamma               not rejected          not rejected          not rejected       6.183843e-01
weibull             not rejected          not rejected          not rejected       7.797596e-01
*** Best fit using the AIC is: norm ***
*** Best fit using the BIC is: norm ***
```
The code is unable to fit a beta distribution because the data do not lie between 0 and 1. The remaining distributions are fitted as usual, and the result can be printed:
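If a beta fit is wanted for data outside the unit interval, one possible workaround is to rescale the data first. The sketch below is not part of the script: it calls `fitdist()` from `fitdistrplus` directly, and the rescaling constant `eps` is an arbitrary choice made for illustration. The estimated parameters then describe the rescaled values, not the original ones, so whether this is appropriate depends on the application.

```r
library(fitdistrplus)  # assumption: installed via install.packages("fitdistrplus")

set.seed(1)
x <- rnorm(100, mean = 10, sd = 1)

# Min-max rescaling maps the sample onto [0, 1]. Shrinking the result
# slightly keeps every value strictly inside (0, 1), away from the endpoints.
eps <- 1e-3
x01 <- (x - min(x)) / (max(x) - min(x))
x01 <- eps + (1 - 2 * eps) * x01

# Fit a beta distribution to the rescaled data by maximum likelihood.
fit_beta <- fitdist(x01, "beta")
fit_beta$estimate  # estimated shape1 and shape2
```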
```
fit
```
```
$unif
Fitting of the distribution ' unif ' by maximum likelihood
Parameters:
     estimate Std. Error
min  7.570601         NA
max 12.224757         NA

$norm
Fitting of the distribution ' norm ' by maximum likelihood
Parameters:
     estimate Std. Error
mean 9.986087 0.09884770
sd   0.988477 0.06989556

$lnorm
Fitting of the distribution ' lnorm ' by maximum likelihood
Parameters:
         estimate  Std. Error
meanlog 2.2962177 0.010021790
sdlog   0.1002179 0.007083301

$exp
Fitting of the distribution ' exp ' by maximum likelihood
Parameters:
      estimate Std. Error
rate 0.1001393 0.01001293

$gamma
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
       estimate Std. Error
shape 100.68077   14.21488
rate   10.08203    1.42700

$weibull
Fitting of the distribution ' weibull ' by maximum likelihood
Parameters:
      estimate Std. Error
shape 11.16064 0.84788207
scale 10.43341 0.09891693

attr(,"gof")
Goodness-of-fit statistics
                                  unif       norm      lnorm        exp      gamma    weibull
Kolmogorov-Smirnov statistic 0.1675067 0.06388482 0.08325544  0.5419494 0.07685832 0.06338501
Cramer-von Mises statistic   0.7575686 0.05303841 0.07260888  8.0778787 0.06344904 0.08042399
Anderson-Darling statistic         Inf 0.34161070 0.44366264 37.3407902 0.39257035 0.57981244

Goodness-of-fit criteria
                               unif     norm    lnorm      exp    gamma  weibull
Akaike's Information Criterion   NA 285.4697 286.9495 662.2386 286.1829 289.6857
Bayesian Information Criterion   NA 290.6801 292.1599 664.8437 291.3932 294.8961
```
__Note:__ Look for the closest match in the Q-Q plot and the smallest numbers in Goodness-of-fit statistics and criteria. You can look up the different [goodness-of-fit statistics on Wikipedia](https://en.wikipedia.org/wiki/Goodness_of_fit). It is often also helpful to look at the
[relationship between distributions](https://en.wikipedia.org/wiki/Relationships_among_probability_distributions) when choosing a fitted distribution.
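Picking the smallest criterion can also be done programmatically. The following is a minimal sketch that uses `fitdistrplus` directly rather than the script; the choice of candidate distributions and the variable names are assumptions made for illustration.

```r
library(fitdistrplus)  # assumption: installed via install.packages("fitdistrplus")

set.seed(123)
x <- rnorm(100, mean = 10, sd = 1)

# Fit a few candidate distributions by maximum likelihood.
fits <- list(
  norm    = fitdist(x, "norm"),
  gamma   = fitdist(x, "gamma"),
  weibull = fitdist(x, "weibull")
)

# Each fitdist object stores its AIC and BIC; smaller values are better.
aic <- sapply(fits, function(f) f$aic)
bic <- sapply(fits, function(f) f$bic)

best_aic <- names(which.min(aic))
best_bic <- names(which.min(bic))
best_aic
best_bic

# gofstat() prints the goodness-of-fit statistics and criteria side by side.
gofstat(fits, fitnames = names(fits))
```

AIC and BIC differences of a few units are small, so the Q-Q plot and knowledge about the process that generated the data should still inform the final choice.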
### Fit a specific distribution
```
x <- rexp(100)
fit_dist(x, distributions = "exp")
```
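For the exponential distribution, the maximum likelihood estimate of the rate has the closed form `1 / mean(x)`, which gives a quick sanity check on the fitted value. The sketch below calls `fitdist()` from `fitdistrplus` directly; that `fit_dist()` reports the same estimate is an assumption based on the script being built on that package.

```r
library(fitdistrplus)  # assumption: installed via install.packages("fitdistrplus")

set.seed(7)
x <- rexp(100)  # default rate = 1

# Numerical maximum likelihood fit.
fit_exp <- fitdist(x, "exp")
rate_mle <- unname(fit_exp$estimate["rate"])

# Closed-form maximum likelihood estimate for comparison.
rate_closed <- 1 / mean(x)

c(numerical = rate_mle, closed_form = rate_closed)
```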
### The function automatically recognizes data for discrete distributions
```
x <- rpois(100, lambda = 2)
fit_dist(x)
Trying to fit binom, pois, nbinom, geom, hyper
...
```
To avoid this behavior and fit continuous distributions, use `discrete = FALSE`.
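How the script decides that data are discrete is not described here. One plausible rule, shown purely as an assumption, is to check whether all values are whole numbers. The helper `looks_discrete()` below is hypothetical and not part of `fit_dist.R`.

```r
# Hypothetical helper (not part of fit_dist.R): treat data as discrete
# when every non-missing value is a whole number.
looks_discrete <- function(x) {
  x <- x[!is.na(x)]
  length(x) > 0 && all(x == round(x))
}

set.seed(3)
counts <- rpois(100, lambda = 2)          # integer-valued counts
measurements <- rnorm(100, mean = 10)     # real-valued measurements

looks_discrete(counts)
looks_discrete(measurements)

# Whole-number data that should still be treated as continuous
# (for example, rounded measurements) can be fit with:
# fit_dist(x, discrete = FALSE)
```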
### Fit your own data
You can use your own data by reading in CSV files. In RStudio, use the `Environment` tab and `Import Dataset` (in the pane on the right), or type `my_data <- read.csv("my_data.csv")`. You can then fit distributions to the appropriate column (in this example `x`) using `fit_dist(my_data$x)`.
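Before fitting, it can help to confirm that the column was read as numbers and to drop missing values. The sketch below writes a small made-up CSV file to a temporary location so that it runs on its own; the file contents and the column name `x` are illustrative. Whether `fit_dist()` handles missing values itself is not stated, so removing them first is a cautious choice.

```r
# Create a small example file so the sketch is self-contained.
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(2.3, 4.1, NA, 3.7, 5.2)), csv_file, row.names = FALSE)

# Read the data and inspect the column types.
my_data <- read.csv(csv_file)
str(my_data)

# Keep only the non-missing values of the numeric column.
stopifnot(is.numeric(my_data$x))
x <- my_data$x[!is.na(my_data$x)]
length(x)

# With the script loaded, the cleaned vector can then be passed on:
# fit_dist(x)
```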