https://github.com/softloud/retention
Simulate unity analytics game events daily retention data
- Host: GitHub
- URL: https://github.com/softloud/retention
- Owner: softloud
- License: other
- Created: 2024-05-30T17:53:34.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-02T21:00:07.000Z (7 months ago)
- Last Synced: 2024-10-12T21:28:59.198Z (3 months ago)
- Language: R
- Homepage: https://softloud.github.io/retention/
- Size: 1.98 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
---
title: README
---

# Retention
`Retention` is an R package that contains datasets related to user activity, user details, and build versions. These datasets can be used for analyses such as user retention, activity patterns, and the impact of different build versions on user activity.
It has functions for simulating builds, users, and activity data, each of which is customisable to control the scale of the data output.
The core of the simulation is a decrease in activity with the number of days since a player first became active. On day 0 there is a 100% chance of activity, on day 1 only a 30% chance, and after day 30 a very small chance. This mimics how player retention data is usually shaped.
This is done by creating a probability function that depends on the number of days from the start of activity, and sampling from a binomial distribution. See `get_activity_probability` and `get_activity` for more details.
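As a rough illustration of that idea, here is a minimal sketch in base R. The functions `sketch_activity_probability` and `sketch_activity` are hypothetical stand-ins, not the package's `get_activity_probability` and `get_activity`, and the power-law decay is an assumption chosen only to match the day 0 and day 1 figures above.

```r
# Hypothetical decay curve, NOT the package's get_activity_probability():
# certain activity on day 0, 30% on day 1, then a power-law decline.
sketch_activity_probability <- function(days_since_start) {
  ifelse(days_since_start == 0, 1, 0.3 / days_since_start)
}

# One binomial draw with size = 1 per day decides whether the user is active.
sketch_activity <- function(days_since_start) {
  p <- sketch_activity_probability(days_since_start)
  rbinom(n = length(days_since_start), size = 1, prob = p) == 1
}

set.seed(42)
days <- 0:30
probs <- sketch_activity_probability(days)
active <- sketch_activity(days)
```

Here `probs` holds the assumed chance of activity for each day and `active` is one simulated user's activity over 31 days. Day 0 is always active because its probability is 1.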
## Inspiration
The inspiration for these data is Unity Analytics game events. To present retention analytics on game data as open source, I need simulated data that imitates the data structure I worked with at a video game studio.
## Installation
You can install the `Retention` package from GitHub using the `devtools` package. Run the following commands in your R console:
``` r
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install the retention package from GitHub
devtools::install_github("softloud/retention")
```

## Using the package
```{r}
library(retention)
library(dplyr) # supplies %>% for the examples below
```
## Data
The data in this package was generated using the `simulate_retention_data.R` script. The datasets are stored as RDS files, which the retention package loads, and are also available as CSV files in `retention_data/`. They are:
1. `user_activity`: This dataset tracks the activity of users across different build versions and dates. It contains 146,463 rows and 3 variables: `user`, `build`, and `activity_date`.
```{r}
dim(retention::user_activity)
retention::user_activity %>% head()
```
2. `users`: This dataset tracks the activity of users from their first build version and date. It contains 47,031 rows and 4 variables: `user`, `first_build`, `activity_start`, and `activity_days`.
```{r}
dim(retention::users)
retention::users %>% head()
```
3. `builds`: This dataset tracks the release information of different build versions. It contains 57 rows and 4 variables: `build`, `release_length`, `release_start`, and `release_end`.
```{r}
dim(retention::builds)
retention::builds %>% head()
```
For more detailed information about these datasets, please refer to the documentation in the `pkg_data.R` file.
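To show the kind of retention analysis these datasets are meant for, here is a minimal base R sketch. It uses small made-up data frames that borrow the `user`, `activity_start`, and `activity_date` column names described above, so it runs without the package installed. The values and the retention definition (share of all users active on each day since their start) are assumptions for illustration.

```r
# Toy stand-ins with the same column names as retention::users and
# retention::user_activity; the values are made up for illustration.
users <- data.frame(
  user = c("a", "b", "c"),
  activity_start = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02"))
)
user_activity <- data.frame(
  user = c("a", "a", "a", "b", "c", "c"),
  activity_date = as.Date(c(
    "2024-01-01", "2024-01-02", "2024-01-03",
    "2024-01-01",
    "2024-01-02", "2024-01-03"
  ))
)

# Days elapsed between each active day and that user's first active day.
activity <- merge(user_activity, users, by = "user")
activity$day <- as.integer(activity$activity_date - activity$activity_start)

# Share of all users who were active on each day since their start.
active_users <- tapply(
  activity$user, activity$day,
  function(u) length(unique(u))
)
daily_retention <- active_users / nrow(users)
```

For this toy data, `daily_retention` is 1 on day 0, 2/3 on day 1, and 1/3 on day 2. The same steps should carry over to the packaged datasets, provided their date columns are stored as `Date` values.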
## Functions that simulate user activity for different versions of an app
Simulate builds.
```{r}
versions <-
  get_versions(
    major_change_max = 2,
    minor_change_max = 1,
    hot_fix_max = 1
  )
versions
```
Simulate release dates for builds.
```{r}
# Pipe in the versions simulated above to add release dates
builds <- versions %>% set_build_releases(release_length_max = 7)
builds
```
Simulate users for builds.
```{r}
users <- get_users(
  builds,
  new_users_max = 3,
  max_activity_days = 14
)
users
```
Simulate activity.
```{r}
user_activity_data <- get_activity(builds, users) %>%
  dplyr::filter(active_on_date == TRUE)
user_activity_data
```
## Limitation
One limitation of these data is that the simulation assumes users update as soon as a build is released, which is not necessarily the case. However, for the retention analytics I intend to generate with these data, that shouldn't be too much of an issue.