https://github.com/kiran94/prfiesta
Collect and Analyze Individual Contributor Pull Requests
- Host: GitHub
- URL: https://github.com/kiran94/prfiesta
- Owner: kiran94
- License: mit
- Created: 2023-05-04T19:09:16.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-01-06T11:03:26.000Z (over 1 year ago)
- Last Synced: 2025-04-19T10:13:34.552Z (about 1 year ago)
- Topics: performance-review, pull-request, pull-request-review
- Language: Jupyter Notebook
- Homepage: https://pypi.org/project/prfiesta/
- Size: 667 KB
- Stars: 8
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# prfiesta 🦜🥳
[CI workflow](https://github.com/kiran94/prfiesta/actions/workflows/main.yml) | [PyPI](https://pypi.org/project/prfiesta/)
> Collect and Analyze Individual Contributor Pull Requests
`prfiesta` allows you to collect, analyze and celebrate pull requests made by an individual 🎉.
It can be used by engineers or managers to gain insight into the work a contributor has done over a specified period of time. A typical use case is a performance review, where you want to analyze all the contributions made over the year.
[Terminal recording on asciinema](https://asciinema.org/a/587987)
## Install
```bash
python -m pip install prfiesta
```
> [!TIP]
> If you are using [`uvx`](https://docs.astral.sh/uv/guides/tools/) then you can also just run this:
> ```bash
> uvx prfiesta --help
> ```
Dependencies:
- Python 3.9+
## Usage
```bash
# Authenticate yourself
export GITHUB_TOKEN=... # or GITHUB_ENTERPRISE_TOKEN
# Get all pull requests for a user
prfiesta -u kiran94
# Get all pull requests for a user created after a date
prfiesta -u kiran94 --after 2023-01-01
# Get all pull requests for a user created between two dates
prfiesta -u kiran94 --after 2023-01-01 --before 2023-06-01
# Get all pull requests for a user updated after a date
prfiesta -u kiran94 --after 2023-01-01 --use-updated
# Get all pull requests with a custom output file name
prfiesta -u kiran94 --output my_pull_requests.csv
# Get all pull requests in parquet format with a custom file name
prfiesta -u kiran94 --output-type parquet --output my_pull_requests.parquet
# Get all pull requests and export to a duckdb database
prfiesta -u kiran94 --output-type duckdb --output mydatabase.duckdb
# Get all pull requests for more than one user
prfiesta -u kiran94 -u user2
# Get all pull requests and drop specific columns from the output
prfiesta -u kiran94 -dc events_url -dc comments_url -dc node_id
# Get all pull requests where the user was involved (as opposed to just authored)
prfiesta -u kiran94 --use-involves
# Get all pull requests that the user reviewed rather than authored
prfiesta -u charliermarsh --after 2023-05-01 --use-reviewed-by
# Get all pull requests where a review was requested from the user rather than authored by them
prfiesta -u charliermarsh --after 2023-05-01 --use-review-requested
# Get all pull requests that contain a reference (e.g. a JIRA card reference) within the PR title or body
prfiesta --reference PA-12765
# Get help
prfiesta --help
# Show the current version
prfiesta --version
```
You can also leverage `prfiesta` directly in your own application:
```python
import pandas as pd
from datetime import datetime
from prfiesta.collectors.github import GitHubCollector
github = GitHubCollector()
frame: pd.DataFrame = github.collect('kiran94', 'user2', after=datetime(2023, 1, 1))
print(frame)
```
### Output
You can control the output type using the `--output-type` option. Supported options:
- `csv` (default)
- `parquet`
- [`duckdb`](https://duckdb.org/)
You can also customize the output file name using the `--output` option. When using `duckdb`, this argument is the DuckDB database file to export into. You can see an example of a DuckDB workflow [here](https://github.com/kiran94/prfiesta/blob/main/notebooks/misc/duckdb_integration.ipynb).
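Once exported, the data can be analyzed with any tool that reads the chosen format. As a rough sketch using only the standard library, the snippet below counts pull requests per month from a CSV export. It assumes the export keeps GitHub's `created_at` column with ISO 8601 timestamps; the helper name `pull_requests_per_month` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def pull_requests_per_month(csv_file) -> Counter:
    """Count rows per YYYY-MM, based on an assumed `created_at` column."""
    reader = csv.DictReader(csv_file)
    # ISO 8601 timestamps start with YYYY-MM, so the first 7 characters suffice.
    return Counter(row["created_at"][:7] for row in reader)


# Made-up sample standing in for a file written by `prfiesta --output ...`.
sample = io.StringIO(
    "title,created_at\n"
    "Add parquet output,2023-05-04T19:09:16Z\n"
    "Fix date filter,2023-05-20T08:00:00Z\n"
    "Bump version,2023-06-01T12:30:00Z\n"
)
counts = pull_requests_per_month(sample)
print(counts)
```

With a real export, pass `open("my_pull_requests.csv", newline="")` instead of the in-memory sample.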
### User Filter
By default, `prfiesta` takes the users provided in the `--user` option and searches the Git provider for any pull requests that the user **authored**. In more collaborative environments this may not be enough, because you may also want visibility into the secondary contributions a user made (e.g. commenting on others' pull requests).
*The options listed here are mutually exclusive.*
#### User Involvement
`prfiesta` exposes the `--use-involves` flag which will search for pull requests that were:
- Created by a certain user
- Assigned to that user
- Mentioning that user
- Commented on by that user
Learn more about `involves` [here](https://docs.github.com/en/search-github/searching-on-github/searching-issues-and-pull-requests#search-by-a-user-thats-involved-in-an-issue-or-pull-request).
#### User Reviewed
`prfiesta` exposes a `--use-reviewed-by` flag which will collect pull requests where the user *has reviewed* others' pull requests.
Learn more about searching by reviewer [here](https://docs.github.com/en/search-github/searching-on-github/searching-issues-and-pull-requests#search-by-pull-request-review-status-and-reviewer).
#### User Requested Review
`prfiesta` exposes a `--use-review-requested` flag which will collect pull requests where a review was *requested* from the user by other collaborators.
Learn more about searching review requests [here](https://docs.github.com/en/search-github/searching-on-github/searching-issues-and-pull-requests#search-by-pull-request-review-status-and-reviewer).
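To make the relationship between these flags concrete, here is a minimal sketch of how each one could map onto a GitHub search qualifier (`author`, `involves`, `reviewed-by`, `review-requested`). The function `build_user_query` is hypothetical and the mapping is an assumption based on the GitHub search documentation linked above, not a copy of `prfiesta`'s internals. It handles a single user for simplicity.

```python
def build_user_query(
    user: str,
    use_involves: bool = False,
    use_reviewed_by: bool = False,
    use_review_requested: bool = False,
) -> str:
    """Build a GitHub search query for one user (illustrative only)."""
    flags = {
        "involves": use_involves,
        "reviewed-by": use_reviewed_by,
        "review-requested": use_review_requested,
    }
    chosen = [qualifier for qualifier, enabled in flags.items() if enabled]
    # The user filter options are mutually exclusive.
    if len(chosen) > 1:
        raise ValueError("user filter flags are mutually exclusive")
    # With no flag set, fall back to pull requests the user authored.
    qualifier = chosen[0] if chosen else "author"
    return f"type:pr {qualifier}:{user}"


print(build_user_query("kiran94"))                     # type:pr author:kiran94
print(build_user_query("kiran94", use_involves=True))  # type:pr involves:kiran94
```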
### Date Filter
By default, the `--after` and `--before` date filters apply to the `created` date of a pull request on the Git provider (e.g. GitHub). If this does not fit your use case and you want to filter on when a pull request was `updated` instead, use the `--use-updated` flag.
Learn more about date filters [here](https://docs.github.com/en/search-github/searching-on-github/searching-issues-and-pull-requests#search-by-when-an-issue-or-pull-request-was-created-or-last-updated).
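As an illustration, the date options could translate into GitHub's `created:` or `updated:` search qualifiers roughly as follows. The helper `build_date_qualifier` is hypothetical, and treating the bounds as inclusive (`>=`, `<=`) is an assumption; `prfiesta` itself may build the query differently.

```python
from datetime import date
from typing import Optional


def build_date_qualifier(
    after: Optional[date] = None,
    before: Optional[date] = None,
    use_updated: bool = False,
) -> str:
    """Return a GitHub search date qualifier, or '' if no bound is given."""
    field = "updated" if use_updated else "created"
    if after and before:
        # GitHub's range syntax: FROM..TO
        return f"{field}:{after.isoformat()}..{before.isoformat()}"
    if after:
        return f"{field}:>={after.isoformat()}"
    if before:
        return f"{field}:<={before.isoformat()}"
    return ""


print(build_date_qualifier(after=date(2023, 1, 1)))
# created:>=2023-01-01
print(build_date_qualifier(after=date(2023, 1, 1), before=date(2023, 6, 1), use_updated=True))
# updated:2023-01-01..2023-06-01
```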
### Reference Search
You may come across a use case where you want to filter pull requests on a specific reference. For example, it may be a team practice to put a JIRA card reference within the pull request title or body.
For this you can use the `--reference` filter.
> [!NOTE]
> Results from reference search are entirely up to the GitHub Search API. On some occasions, it may provide inaccurate results.
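If search results include unrelated pull requests, one option is to re-check each result locally. The sketch below is not part of `prfiesta`. It assumes each result is a dictionary with GitHub's `title` and `body` fields, and the helper `mentions_reference` and the sample data are hypothetical.

```python
def mentions_reference(pull_request: dict, reference: str) -> bool:
    """Check whether the reference appears in the title or body (case-insensitive)."""
    # `body` can be missing or None when a pull request has no description.
    title = pull_request.get("title") or ""
    body = pull_request.get("body") or ""
    return reference.lower() in f"{title}\n{body}".lower()


# Made-up results for illustration.
results = [
    {"title": "PA-12765: add export", "body": None},
    {"title": "Refactor collector", "body": "Follow-up to pa-12765"},
    {"title": "Unrelated change", "body": "No card for this one"},
]
confirmed = [pr for pr in results if mentions_reference(pr, "PA-12765")]
print([pr["title"] for pr in confirmed])
```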
## Analysis
`prfiesta` ships with built-in plots to help analyze your pull request data. These serve as a starting point for your analysis. See more information on the built-in plots and views [here](https://github.com/kiran94/prfiesta/blob/main/docs/analysis.md).
## Using GitHub Enterprise
If you are trying to fetch data from a [GitHub Enterprise](https://docs.github.com/en/enterprise-cloud@latest/rest/enterprise-admin?apiVersion=2022-11-28) server, most of the functionality should work the same way. You just need to make sure that:
- `GH_HOST` is set to your enterprise instance's API URL. Reach out to your internal GitHub team if you are not sure what this should be.
- `GITHUB_ENTERPRISE_TOKEN` is set to a [personal access token](https://docs.github.com/en/enterprise-cloud@latest/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) generated on your GitHub Enterprise instance.
## GitHub Rate Limiting
Depending on your input parameters, you may end up in a situation where you are being [Rate Limited](https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#rate-limiting) by the GitHub API.
See this [Notebook](https://github.com/kiran94/prfiesta/blob/main/notebooks/misc/rate_limit.ipynb) on a way to handle this.
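For context, GitHub reports rate limit state in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, the latter as a UTC epoch timestamp. The sketch below shows one generic way to decide how long to wait before retrying. The helper `seconds_until_reset` is hypothetical, and the linked notebook may take a different approach.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, or 0.0 if requests remain."""
    now = time.time() if now is None else now
    # Plain dict lookups are case-sensitive; HTTP client libraries usually
    # expose a case-insensitive mapping instead.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


# Made-up header values for illustration.
wait = seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"},
    now=1000.0,
)
print(wait)  # 60.0
```

A caller could pass the result to `time.sleep` before issuing the next request.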
## Environment Variables
| Variable | Description | Default |
| --------------- | --------------- | ------ |
| `GITHUB_TOKEN` | The GitHub [`Token`](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) to use. Must have the `repo` scope. | |
| `GITHUB_ENTERPRISE_TOKEN` | Takes precedence over `GITHUB_TOKEN` when set | |
| `GH_HOST` | The GitHub host to communicate with (override this with your company's GitHub Enterprise server if needed) | `https://api.github.com` |
| `LOGGING_LEVEL` | The [Logging Level](https://docs.python.org/3/library/logging.html#logging-levels) to use | `INFO` |
| `LOGGING_FORMAT` | The [Logging Format](https://docs.python.org/3/library/logging.html#logrecord-attributes) to use | `%(message)s` |
| `SPINNER_STYLE` | The [Spinner Style](https://rich.readthedocs.io/en/stable/reference/spinner.html) to use | `blue` |
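The precedence and defaults in the table can be summarized in a few lines. This is a sketch of the documented behavior, not `prfiesta`'s actual configuration code, and the function `resolve_settings` and its dictionary keys are hypothetical.

```python
import os
from typing import Mapping


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve settings from environment variables using the documented defaults."""
    return {
        # GITHUB_ENTERPRISE_TOKEN takes precedence over GITHUB_TOKEN when set.
        "token": env.get("GITHUB_ENTERPRISE_TOKEN") or env.get("GITHUB_TOKEN"),
        "host": env.get("GH_HOST", "https://api.github.com"),
        "logging_level": env.get("LOGGING_LEVEL", "INFO"),
        "logging_format": env.get("LOGGING_FORMAT", "%(message)s"),
        "spinner_style": env.get("SPINNER_STYLE", "blue"),
    }


# Made-up values for illustration.
settings = resolve_settings({"GITHUB_TOKEN": "public", "GITHUB_ENTERPRISE_TOKEN": "enterprise"})
print(settings["token"])  # enterprise
print(settings["host"])   # https://api.github.com
```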
## Developer Setup
The following assumes you have cloned the repository and are at its root in your terminal:
```bash
poetry shell
poetry install
poetry build
```
This should leave you in a state where you have the virtual environment sourced, all dependencies are installed and `prfiesta` is installed locally.
You can then leverage the various commands in the [makefile](https://github.com/kiran94/prfiesta/blob/main/makefile) for development tasks:
```bash
# Run all unit tests
make test
# Produce code coverage reports
make coverage
# Code linting
make lint
```
Optionally you can also install [pre-commit](https://github.com/pre-commit/pre-commit) to run some sanity checks before your commits.
```bash
# Install it into your git hooks (one time setup)
# from this point onwards, any commits will run pre-commit checks
precommit_install
# If you want to run all checks on all files without committing.
precommit_run
```
### Creating Prereleases
When you create a pull request on this repository, various CI checks are run. Towards the end of those checks there is a `release` job.
Usually, when running under `main`, this job is responsible for publishing new versions to PyPI. However, when running under a pull request, it creates a special prerelease package specific to that pull request.
The versioning of this package follows [PEP-440](https://peps.python.org/pep-0440/#pre-releases) and will look something like this:
```
0.8.1b125
```
Where
- `0.8.1` = The bumped version of what is currently within the `pyproject.toml` of that pull request. We don't attempt to do any analysis to figure out if we should be bumping with a higher severity in this context.
- `b` = Beta; indicates to PyPI that this is a prerelease package.
- `125` = The `github.run_number` from [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/contexts#github-context).
An example prerelease package looks like this: https://pypi.org/project/prfiesta/0.8.1b125/
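The version scheme described above can be sketched as a pair of small helpers. The names `make_prerelease` and `split_prerelease` are hypothetical, the regular expression only covers the `X.Y.ZbN` shape shown here rather than all of PEP 440, and the actual CI job may assemble the version differently.

```python
import re

# Matches versions of the form <major>.<minor>.<patch>b<run number>.
PRERELEASE = re.compile(r"^(?P<release>\d+\.\d+\.\d+)b(?P<run_number>\d+)$")


def make_prerelease(bumped_version: str, run_number: int) -> str:
    """Combine the bumped version with the CI run number as a beta version."""
    return f"{bumped_version}b{run_number}"


def split_prerelease(version: str) -> tuple:
    """Split a beta version back into (release, run_number)."""
    match = PRERELEASE.match(version)
    if match is None:
        raise ValueError(f"not a beta prerelease version: {version!r}")
    return match["release"], int(match["run_number"])


print(make_prerelease("0.8.1", 125))   # 0.8.1b125
print(split_prerelease("0.8.1b125"))   # ('0.8.1', 125)
```

To try a prerelease, pin the exact version, for example `python -m pip install prfiesta==0.8.1b125`.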
Downstream users can then do a full end-to-end test with the prerelease package before the change is merged into `main`. The prerelease is automatically posted as a comment on the pull request ([example](https://github.com/kiran94/prfiesta/pull/36#issuecomment-1564909558)).
You can find the full version history of the package [here](https://pypi.org/project/prfiesta/#history).