https://github.com/bearloga/interleaved-python

Library for analyzing interleaved search A/B tests to determine preference between competing ranking functions

ab-testing information-retrieval interleaved search-ranking


Host: GitHub
URL: https://github.com/bearloga/interleaved-python
Owner: bearloga
License: bsd-3-clause
Created: 2021-06-15T16:58:19.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2021-06-21T14:37:07.000Z (almost 5 years ago)
Last Synced: 2025-04-24T06:33:51.649Z (about 1 year ago)
Topics: ab-testing, information-retrieval, interleaved, search-ranking
Language: Python
Homepage:
Size: 138 KB
Stars: 4
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# interleaved

Library for analyzing interleaved search A/B tests to determine preference between competing [ranking functions](https://en.wikipedia.org/wiki/Ranking_(information_retrieval))

## Installing

```
pip install --upgrade git+https://github.com/bearloga/interleaved-python.git@main
```

## Usage

```python
from interleaved import load_example_data

data = load_example_data(preference='a')  # alternatively: 'none' or 'b'
data.head()
```

```
                  timestamp   search_id  event  position ranking_function
0 2018-08-01 00:01:31+00:00  p2tvgm3clu   serp       NaN              NaN
1 2018-08-01 00:04:09+00:00  p2tvgm3clu  click      14.0                A
2 2018-08-01 00:04:29+00:00  p2tvgm3clu  click       4.0                A
3 2018-08-01 00:06:10+00:00  p2tvgm3clu  click       1.0                A
4 2018-08-01 00:06:42+00:00  p2tvgm3clu  click       7.0                B
```

```python
from interleaved import Experiment

# Keep only click events; 'serp' rows carry no ranking_function value
clicks = data[data['event'] == 'click']

ex = Experiment(
    queries=clicks['search_id'].to_numpy(),
    clicks=clicks['ranking_function'].to_numpy()
)

# Build the bootstrapped distribution of the preference statistic
ex.bootstrap(seed=42)

print(ex.summary(ranker_labels=['New Algorithm', 'Old Algorithm'], rescale=True))
```

```
 In this interleaved search experiment, 906 searches were used to determine whether the
results from ranker 'New Algorithm' or 'Old Algorithm' were preferred by users (based on
their clicks to the results from those rankers interleaved into a single search result
set).
 The preference statistic, as defined by Chapelle et al. (2012), was estimated to be 74.3%
with a 95% (bootstrapped) confidence interval of (70.0%, 77.9%) on [-100%, 100%] scale
with -100% indicating total preference for 'Old Algorithm', 100% indicating total
preference for 'New Algorithm', and 0% indicating complete lack of preference between the
two -- indicating that the users had preference for ranker 'New Algorithm'.
```

Quite a strong preference for that new algorithm!
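
For intuition, the preference statistic can be sketched by hand. The snippet below is a minimal illustration and not this library's implementation: it assumes each search is credited to whichever ranker received more clicks, with ties counted as half a win, following the win/tie formulation in Chapelle et al. (2012). The function name `preference_statistic` and the toy inputs are hypothetical.

```python
from collections import Counter, defaultdict


def preference_statistic(queries, clicks):
    """Sketch of the preference statistic for ranker 'A' (assumed formulation).

    queries: search IDs, one per click
    clicks:  'A' or 'B', the ranker credited with each click
    Returns a value in [-0.5, 0.5]; positive favors 'A', negative favors 'B'.
    """
    # Count clicks per ranker within each search
    per_search = defaultdict(Counter)
    for search_id, ranker in zip(queries, clicks):
        per_search[search_id][ranker] += 1

    # A search is a win for the ranker with more clicks, otherwise a tie
    wins_a = wins_b = ties = 0
    for counts in per_search.values():
        if counts['A'] > counts['B']:
            wins_a += 1
        elif counts['B'] > counts['A']:
            wins_b += 1
        else:
            ties += 1

    total = wins_a + wins_b + ties
    if total == 0:
        raise ValueError('at least one search with clicks is required')
    return (wins_a + 0.5 * ties) / total - 0.5


# Toy data: 's1' and 's2' are wins for 'A', 's3' is a tie
queries = ['s1', 's1', 's1', 's2', 's3', 's3']
clicks = ['A', 'A', 'B', 'A', 'A', 'B']
stat = preference_statistic(queries, clicks)
```

Under this formulation a value of 0 means no preference, which is why the summary above reports a rescaled value between -100% and 100%.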

**Additional methods:**

- `.distribution(rescale=False)` returns the bootstrapped distribution of the preference statistic (useful for visualization)

- `.preference_statistic(rescale=False)` returns the estimated preference statistic

- `.conf_int(conf_level=0.95, rescale=False)` returns the confidence interval based on the bootstrapped distribution

**Note**: `rescale=True` rescales the preference statistic from the [-0.5, 0.5] scale to the [-1, 1] scale, which may help with interpretability of the results.
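
To illustrate how a bootstrapped confidence interval and the rescaling fit together, here is a rough sketch using only the standard library. It is not the library's code: it assumes a percentile bootstrap that resamples searches with replacement, and that each search is scored 1.0 (win for 'A'), 0.0 (win for 'B'), or 0.5 (tie). The function name `bootstrap_conf_int`, the `n_boot` parameter, and the toy data are hypothetical.

```python
import random


def bootstrap_conf_int(outcomes, n_boot=1000, conf_level=0.95,
                       rescale=False, seed=42):
    """Percentile-bootstrap interval for a preference statistic (sketch).

    outcomes: per-search scores (1.0 = 'A' won, 0.0 = 'B' won, 0.5 = tie)
    """
    rng = random.Random(seed)
    n = len(outcomes)
    stats = []
    for _ in range(n_boot):
        # Resample searches with replacement and recompute the statistic
        sample = rng.choices(outcomes, k=n)
        stat = sum(sample) / n - 0.5        # lies in [-0.5, 0.5]
        stats.append(2 * stat if rescale else stat)  # [-1, 1] if rescaled
    stats.sort()

    # Take the empirical quantiles of the bootstrapped distribution
    alpha = 1 - conf_level
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Toy data: 70 wins for 'A', 20 wins for 'B', 10 ties
outcomes = [1.0] * 70 + [0.0] * 20 + [0.5] * 10
lo, hi = bootstrap_conf_int(outcomes)
lo_rescaled, hi_rescaled = bootstrap_conf_int(outcomes, rescale=True)
```

With the same seed, the rescaled bounds are exactly twice the unrescaled ones, so rescaling changes only how the interval is read, not the conclusion.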

## References

- Chapelle, O., Joachims, T., Radlinski, F., & Yue, Y. (2012). Large-scale validation and analysis of interleaved search evaluation. *ACM Transactions on Information Systems*, **30**(1), 1-41. [doi:10.1145/2094072.2094078](https://doi.org/10.1145/2094072.2094078)

- Radlinski, F. and Craswell, N. (2013). [Optimized interleaving for online retrieval evaluation](https://www.microsoft.com/en-us/research/publication/optimized-interleaving-for-online-retrieval-evaluation/). *ACM International Conference on Web Search and Data Mining (WSDM)*. [doi:10.1145/2433396.2433429](https://doi.org/10.1145/2433396.2433429)
