https://github.com/mobeets/psychometric-fits
Exploring various methods of fitting psychometric functions
- Host: GitHub
- URL: https://github.com/mobeets/psychometric-fits
- Owner: mobeets
- Created: 2014-06-18T18:18:27.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2014-06-24T23:02:58.000Z (almost 12 years ago)
- Last Synced: 2025-01-12T17:11:49.705Z (over 1 year ago)
- Language: Python
- Size: 590 KB
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
A good overview of various Markov-chain Monte Carlo methods is available [here](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.7133&rep=rep1&type=pdf).
### Metropolis-Hastings sampling
_Metropolis-Hastings_ sampling (M-H) aims to draw samples from a posterior using only a _posterior-ish function_ (e.g. the unnormalized posterior given some measured data) and a _proposal function_. M-H is provided with an initial draw from the posterior and aims to generate a series of samples by moving from that initial draw to the next, following a certain rule. (More details can be found [here](http://www.journalofvision.org/content/5/5/8.short).)
The posterior-ish function is used to determine whether or not the next potential sample is in a region of higher posterior probability compared with the previous sample.
The proposal function describes how to generate the next potential sample. (If the proposal function is symmetric, this sampling procedure is sometimes called _Metropolis sampling_. If it is independent of the location of the most recent sample, it is called an _independent sampler_.) The result is a sort of random walk of samples through the posterior-ish function; often the proposal function is a Gaussian centered on the most recent sample with a given standard deviation. The choice of standard deviation strongly affects how well M-H samples the entire posterior, which you can see in the image below.
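To make the accept/reject rule concrete, here is a minimal sketch of a random-walk Metropolis sampler with a Gaussian proposal. It is not the repo's `metropolis_hastings()`; the function name `metropolis_sketch`, its arguments, and the toy target (an unnormalized standard normal) are all assumptions chosen for illustration.

```python
import numpy as np

def metropolis_sketch(log_p, x0, n_samples, step_sd, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    log_p is the log of the unnormalized target (the "posterior-ish" function).
    """
    samples = np.empty(n_samples)
    x, lp = x0, log_p(x0)
    for i in range(n_samples):
        # Propose a point near the most recent sample.
        x_new = x + step_sd * rng.standard_normal()
        lp_new = log_p(x_new)
        # Accept with probability min(1, p(x_new) / p(x)), compared in log space.
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new
        samples[i] = x  # on rejection the current sample is repeated
    return samples

# Toy target (assumed): unnormalized standard normal density.
rng = np.random.default_rng(0)
samples = metropolis_sketch(lambda x: -0.5 * x**2, x0=0.0,
                            n_samples=20000, step_sd=1.0, rng=rng)
kept = samples[2000:]  # discard the early samples as burn-in
```

Making `step_sd` very small or very large is a quick way to see the effect of the proposal width: the chain either creeps along or rejects most proposals.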

#### Example 1: Generating samples from weibull pdf
The `example_1()` function in `examples.py` currently uses a gaussian proposal function to draw samples from a weibull pdf with given shape and scale parameters. As you can see, the samples generated by `metropolis_hastings()` (after pruning) are a close match to the actual weibull pdf:

#### Example 2: Fitting weibull shape parameter of data generated by a weibull pdf
But `example_1()` is more like a sanity-check, isn't it? If we hand M-H the pdf, it can generate samples from that pdf--not that impressive.
On the other hand, `example_2()` is a little closer to what we'd want M-H to do. It simulates data from a weibull pdf given shape and scale parameters, and M-H tries to generate samples of (i.e. fit) the shape parameter, by calculating the log-likelihood of the simulated data.
So now, the generated samples are all estimates of the shape parameter. The shape parameter used to simulate the data was 3, with a fairly small sample size of only 1000. The estimates cluster near but not quite at 3, which is fine, though repeated simulations would probably show more encouraging results (since here I'm simulating such a low sample size).
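A rough sketch of the idea behind Example 2, under stated assumptions: the scale is treated as known (set to 1 here), the prior on the shape is flat over positive values, and the proposal width of 0.1 and starting value of 1.0 are arbitrary choices. This is not the code in `examples.py`, just the same recipe written out in one place.

```python
import numpy as np

rng = np.random.default_rng(1)
true_shape, scale, n = 3.0, 1.0, 1000
data = scale * rng.weibull(true_shape, size=n)  # simulated observations

def log_lik(k):
    """Weibull log-likelihood of `data` as a function of the shape k."""
    if k <= 0:
        return -np.inf  # shape must be positive
    z = data / scale
    return np.sum(np.log(k / scale) + (k - 1) * np.log(z) - z**k)

# Random-walk Metropolis over the shape parameter.
k, lp, draws = 1.0, log_lik(1.0), []
for _ in range(5000):
    k_new = k + 0.1 * rng.standard_normal()
    lp_new = log_lik(k_new)
    if np.log(rng.uniform()) < lp_new - lp:
        k, lp = k_new, lp_new
    draws.append(k)

posterior = np.array(draws[1000:])  # drop burn-in
```

The mean and spread of `posterior` then summarize the estimate of the shape parameter and its uncertainty.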

### Simulated annealing
Metropolis-Hastings aims to approximate the entire posterior distribution. However, in the case of fitting, often all you want is the maximum a posteriori (MAP) estimate of the posterior: in other words, you don't need to approximate the entire posterior--you just want to know the mode!
Simulated annealing (good overview [here](http://stuff.mit.edu/~dbertsim/papers/Optimization/Simulated%20annealing.pdf)) is a generalization of Metropolis-Hastings, with an added parameter function called the "cooling schedule" that is non-increasing with each iteration of your sampler.
Using M-H to approximate the mode of the posterior is inefficient since it tries to spread itself along the entire posterior. Simulated annealing, however, uses its cooling schedule function to slowly home in on the mode of the posterior.
One common choice of a cooling schedule function `T(i)` is, for a given `d`:
`T = lambda i: d/np.log(i+2)`
(The only rules for `T` are that it must be non-increasing and that `T(i) -> 0` as `i -> ∞`.)
Now, at each iteration, the `p_pdf_fcn` of `metropolis_hastings()` is instead calculated as `p_pdf_fcn(x)^(1/T_i)`, where i is the current iteration of the sampler.
The very last sample generated is your MAP estimate of the posterior.
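The recipe above can be sketched as follows. Raising the posterior-ish function to the power `1/T_i` is the same as dividing its log by `T_i`, which is what this version does. The function name `anneal_map_sketch`, the toy target (a sharply peaked log-density with its mode at 2), and the step size are assumptions for illustration, not the repo's implementation.

```python
import numpy as np

def anneal_map_sketch(log_p, x0, n_iter, step_sd, d, rng):
    """Metropolis steps on log_p / T(i), with T(i) = d / log(i + 2)."""
    x, lp = x0, log_p(x0)
    for i in range(n_iter):
        T = d / np.log(i + 2)  # cooling schedule, non-increasing in i
        x_new = x + step_sd * rng.standard_normal()
        lp_new = log_p(x_new)
        # p(x)^(1/T) in log space is log_p(x) / T.
        if np.log(rng.uniform()) < (lp_new - lp) / T:
            x, lp = x_new, lp_new
    return x  # the last sample is taken as the MAP estimate

rng = np.random.default_rng(2)
# Toy unnormalized log-density (assumed) with its mode at x = 2.
x_map = anneal_map_sketch(lambda x: -200.0 * (x - 2.0) ** 2, x0=1.0,
                          n_iter=5000, step_sd=0.05, d=1.0, rng=rng)
```

Because this schedule cools only logarithmically, the final temperature is still well above zero after a few thousand iterations, so the last sample is close to the mode rather than exactly on it.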
#### Example 3: Simulated annealing MAP estimate
Just as the spread of the proposal function is crucial to fully exploring the posterior in M-H, here the choice and parameters of your cooling function are crucial to appropriately estimating the mode of the posterior.
Just as in Example 2, the sample size of my simulated data set is only 1000, so though the theta that generated the data was 3, it's not too surprising that our MAP estimate for theta is not exactly 3. Also, I somewhat hastily set the parameter `d = 1` for my cooling function. Larger sample size and better parameter fitting would definitely improve your simulated annealing experience.

So what are the relative tradeoffs apparent so far between M-H and simulated annealing? Well, M-H aims to simulate the entire posterior, which is inefficient if you're just looking for an estimate of the mode. Simulated annealing, on the other hand, should converge on this mode in fewer iterations, but you now have an additional parameter to adjust. (And adjusting this parameter involves, ironically enough, assessing the shape of your posterior.)
### Scipy's minimization methods
`scipy.optimize` actually has a pretty broad selection of [minimization methods](http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html), including simulated annealing. So if you're not looking to view the posterior but are instead looking for its mode, this is probably your best bet (in terms of development time, testing, _and_ solver speed).
I simulated data in the same way as I did in Examples 2 and 3 and called `scipy.optimize.minimize` with all of its relevant method arguments: `["nelder-mead", "powell", "Anneal", "BFGS", "TNC", "L-BFGS-B", "SLSQP"]`. Every single one of these solvers found solutions in no time at all, and all of their solutions were basically identical. Hooray!
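Here is a hedged sketch of that comparison, not the original script. It fits only the weibull shape (scale fixed at 1), minimizes the mean rather than the summed negative log-likelihood to keep gradients small, and starts from an arbitrary guess of 2. `"Anneal"` is left out because recent SciPy releases no longer accept it as a `method`.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.weibull(3.0, size=1000)  # shape 3, scale 1

def neg_log_lik(params):
    """Mean negative weibull log-likelihood as a function of the shape."""
    k = params[0]
    if k <= 0:
        return np.inf  # shape must be positive
    return -np.mean(np.log(k) + (k - 1) * np.log(data) - data**k)

methods = ["Nelder-Mead", "Powell", "BFGS", "TNC", "L-BFGS-B", "SLSQP"]
fits = {}
for m in methods:
    res = minimize(neg_log_lik, x0=[2.0], method=m)
    fits[m] = float(np.atleast_1d(res.x)[0])  # fitted shape for this method
```

Printing `fits` gives one shape estimate per solver, which makes it easy to check that they agree.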
Only problem is that I'm still fitting only one parameter, and I'm not including a prior. When you fit a real-life psychometric function, you need up to four parameters _and_ a prior. Do all of these methods scale with more parameters and messier function evaluations?
#### Example 4: Fitting four parameters of data generated by a weibull pdf
A general form of the psychometric curve is as follows:
`Ψ(x; α, β, γ, λ) = γ + (λ - γ)F(x; α, β)`.
`α` is the scale parameter, `β` the shape parameter, `λ` the upper-bound of performance, and `γ` the lower-bound. `F` is typically a sigmoid function--in this case I'm using the cdf of a weibull distribution.
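As a small illustration, the curve can be written directly from the formula above. The function names `weibull_cdf` and `psi` and the ASCII argument names are my own; the parameter values in the usage lines are the ones listed for Simulation 2.

```python
import numpy as np

def weibull_cdf(x, alpha, beta):
    """F(x; alpha, beta): weibull cdf with scale alpha and shape beta."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(-(x / alpha) ** beta)

def psi(x, alpha, beta, gamma, lam):
    """Psychometric curve: gamma at F = 0, rising to lam at F = 1."""
    return gamma + (lam - gamma) * weibull_cdf(x, alpha, beta)

x_grid = np.linspace(0.0, 1.0, 5)
curve = psi(x_grid, alpha=0.3, beta=0.9, gamma=0.045, lam=0.96)
```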
I wanted to compare all the fitting methods of `scipy.optimize` that allow bounds on the parameters. These are `["TNC", "L-BFGS-B", "SLSQP"]`, but `TNC` ended up being too slow so I dropped it from consideration.
I simulated a series of datasets from a model psychometric curve with fixed parameters. Each dataset contained trials collected at `x_i = i/20` for `i = 1..20`. For each `x`, I simulated 50 trials. Each trial was either 1 or 0, for success/failure, where the probability of success on each trial was `Ψ(x; α, β, γ, λ)`. In other words, each trial was a draw from a bernoulli distribution with probability `Ψ(x; α, β, γ, λ)`.
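One way to simulate a single dataset as described, sketched with the Simulation 2 parameter values. The variable names and the seed are assumptions; the original script may organize this differently.

```python
import numpy as np

def psi(x, alpha, beta, gamma, lam):
    """Psychometric curve built on a weibull cdf."""
    F = 1.0 - np.exp(-(x / alpha) ** beta)
    return gamma + (lam - gamma) * F

rng = np.random.default_rng(4)
x = np.arange(1, 21) / 20.0  # x_i = i/20 for i = 1..20
n_trials = 50
p = psi(x, alpha=0.3, beta=0.9, gamma=0.045, lam=0.96)

# One row per stimulus level, one column per bernoulli trial (1 = success).
trials = (rng.uniform(size=(x.size, n_trials)) < p[:, None]).astype(int)
successes = trials.sum(axis=1)  # number of successes at each x
```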
For each data set, I estimated the model parameters by passing `L-BFGS-B` and `SLSQP` as the `method` argument to `scipy.optimize.minimize`, representing the [limited-memory BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) and [sequential least squares programming](http://www.pyopt.org/reference/optimizers.slsqp.html) algorithms, respectively. (Both of these are [quasi-Newton](https://en.wikipedia.org/wiki/Quasi-Newton_method) methods, meaning they use approximations of the Hessian to find a minimum.)
The solutions for `L-BFGS-B` and `SLSQP` are essentially identical, so I'll plot the results of only one of them.
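A minimal sketch of one such fit, assuming a binomial likelihood over the success counts at each `x` and no prior. The bounds below are illustrative guesses, not the ones used for the plots, and the initial guess is the generating parameter set.

```python
import numpy as np
from scipy.optimize import minimize

def psi(x, alpha, beta, gamma, lam):
    """Psychometric curve built on a weibull cdf."""
    F = 1.0 - np.exp(-(x / alpha) ** beta)
    return gamma + (lam - gamma) * F

true_params = np.array([0.3, 0.9, 0.045, 0.96])  # alpha, beta, gamma, lam
rng = np.random.default_rng(5)
x = np.arange(1, 21) / 20.0
n = 50
k = rng.binomial(n, psi(x, *true_params))  # successes per stimulus level

def neg_log_lik(theta):
    # Clip probabilities so the logs stay finite at the bounds.
    p = np.clip(psi(x, *theta), 1e-9, 1 - 1e-9)
    return -np.sum(k * np.log(p) + (n - k) * np.log(1 - p))

# Assumed bounds for (alpha, beta, gamma, lam).
bounds = [(0.01, 2.0), (0.1, 5.0), (0.0, 0.5), (0.5, 1.0)]
fits = {m: minimize(neg_log_lik, x0=true_params, method=m, bounds=bounds)
        for m in ("L-BFGS-B", "SLSQP")}
```

Comparing `fits["L-BFGS-B"].x` with `fits["SLSQP"].x` is a direct way to check how closely the two solvers agree on a given dataset.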
Simulation 1: Results from fitting 100 datasets drawn from model with parameters: α = 0.1, β=0.5, γ=0.03, and λ=0.98.

Simulation 2: Results from fitting 1000 datasets drawn from model with parameters: α = 0.3, β=0.9, γ=0.045, and λ=0.96.

When interpreting these plots it should be noted that each model fit _kinda_ got to cheat because I gave the fitting function the actual generating set of parameters as its initial guess. The resulting spread in the parameter estimates, then, is very much a best-case scenario.
Overall, the scale and shape parameter estimates appear unbiased in the long run. But look how poorly it fits the upper-bound, `λ`! And only in the second simulation did it fit `γ` well on average. Why is this? Not really sure yet, but both of these parameters represent performance at the extremes of `x`, which means their estimates will be very dependent on the presence of rare events in the datasets.
### Future examples
* Example 5: Fitting with a prior over parameters