{"id":13710271,"url":"https://github.com/DoktorMike/dammmdatagen","last_synced_at":"2025-05-06T18:34:50.420Z","repository":{"id":70914255,"uuid":"110481972","full_name":"DoktorMike/dammmdatagen","owner":"DoktorMike","description":"Marketing Mix Modeling Data Generator","archived":false,"fork":false,"pushed_at":"2021-12-27T16:26:06.000Z","size":11437,"stargazers_count":44,"open_issues_count":0,"forks_count":6,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-19T05:22:38.281Z","etag":null,"topics":["benchmark","data","data-generator","marketing-mix-modeling"],"latest_commit_sha":null,"homepage":"https://doktormike.github.io/dammmdatagen/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DoktorMike.png","metadata":{"files":{"readme":"README.Rmd","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-13T00:39:17.000Z","updated_at":"2024-08-26T01:15:00.000Z","dependencies_parsed_at":"2023-03-11T09:28:03.309Z","dependency_job_id":null,"html_url":"https://github.com/DoktorMike/dammmdatagen","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DoktorMike%2Fdammmdatagen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DoktorMike%2Fdammmdatagen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DoktorMike%2Fdammmdatagen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DoktorMike%2Fdammmdatagen/manifests","
owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DoktorMike","download_url":"https://codeload.github.com/DoktorMike/dammmdatagen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224521687,"owners_count":17325279,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","data","data-generator","marketing-mix-modeling"],"created_at":"2024-08-02T23:00:53.800Z","updated_at":"2024-11-13T20:31:37.222Z","avatar_url":"https://github.com/DoktorMike.png","language":"R","funding_links":[],"categories":["Media / Marketing Mix Models"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n        collapse = TRUE,\n        comment = \"#\u003e\",\n        fig.path = \"man/figures/README-\",\n        out.width = \"100%\"\n)\n```\n\n# dammmdatagen \u003ca href=\"https://doktormike.github.io/dammmdatagen/\"\u003e\u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"139\" /\u003e\u003c/a\u003e\n\nThe goal of dammmdatagen is to make it easy for marketing mix modeling professionals to get access to realistic data sets where the ground truth is known. 
This facilitates our development and ultimately provides more value for all MMM stakeholders.\n\n## Build status etc\n\n\u003c!-- badges: start --\u003e\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n[![R-CMD-check](https://github.com/DoktorMike/dammmdatagen/workflows/R-CMD-check/badge.svg)](https://github.com/DoktorMike/dammmdatagen/actions)\n[![Codecov test coverage](https://codecov.io/gh/DoktorMike/dammmdatagen/branch/master/graph/badge.svg)](https://app.codecov.io/gh/DoktorMike/dammmdatagen?branch=master)\n\u003c!-- badges: end --\u003e\n\n## Installation\n\nYou can install dammmdatagen from GitHub with:\n\n```{r gh-installation, eval = FALSE}\n# install.packages(\"devtools\")\ndevtools::install_github(\"DoktorMike/dammmdatagen\")\n```\n\n## Quick start\n\nThis is a basic example that shows how to generate a small one-year data set.\n\n```{r example, cache=TRUE}\n# load useful libraries\nlibrary(dammmdatagen)\n\n# generate a basic data set\nmydf \u003c- generateCovariatesData()\nhead(mydf)\n```\n\nSay you would also like to generate a response variable to fit a model to. 
Then you could use the high-level API function below.\n\n```{r fullexample, cache=TRUE}\nlibrary(dammmdatagen)\nlibrary(ggplot2)\nlibrary(dplyr)\nlibrary(tidyr)\nlibrary(scales)\n# generate a full retail data set (covariates, effects and response)\nret \u003c- generateRetailData()\ndates \u003c- ret[[\"covariates\"]][[\"Macro\"]][[\"date\"]]\n# plot the response over time\nqplot(dates, ret[[\"response\"]]) + geom_line() + ylim(0, NA)\n# entrytocolname \u003c- function(x) a \u003c- ret[[\"effects\"]][[x]] %\u003e% setNames(c(tolower(paste0(x, \"_\", names(.)))))\n# sum each effect group row-wise and plot the stacked contributions\nentrytocolname \u003c- function(x) tibble::tibble(rowSums(ret[[\"effects\"]][[x]])) %\u003e% setNames(x)\nReduce(dplyr::bind_cols, lapply(names(ret[[\"effects\"]]), entrytocolname)) %\u003e%\n        dplyr::mutate(date = dates) %\u003e%\n        tidyr::pivot_longer(-date, names_to = \"variable\", values_to = \"value\") %\u003e%\n        ggplot2::ggplot(ggplot2::aes(x = date, y = value, fill = variable)) +\n        ggplot2::geom_bar(stat = \"identity\") + ggplot2::theme_minimal() +\n        ggplot2::ylab(\"Units sold\") +\n        ggplot2::xlab(\"\")\n```\n\nWe can do a lot more, of course! In this small snippet we'll generate one month's worth of competitor media spend data and plot it.\n\n```{r competitorspendplot, fig.width=10, message=FALSE, warning=FALSE, paged.print=FALSE, cache=TRUE}\nlibrary(dammmdatagen)\nlibrary(ggplot2)\nlibrary(dplyr)\nlibrary(tidyr)\nlibrary(scales)\n\ngenerateCompetitorData(fromDate = Sys.Date() - 30, toDate = Sys.Date()) %\u003e%\n        gather(\"competitor\", \"spend\", -\"date\") %\u003e%\n        ggplot(aes(y = spend, x = date, fill = competitor)) +\n        geom_bar(stat = \"identity\", position = position_stack()) +\n        theme_minimal() +\n        scale_y_continuous(labels = dollar_format(prefix = \"kr. \"))\n```\n\nJust as we can generate competitor spending data, we can also generate macroeconomic data. 
These indicators typically move slowly, changing only slightly from one period to the next.\n\n```{r macroecondataplot, fig.width=10, message=FALSE, warning=FALSE, paged.print=FALSE, cache=TRUE}\ngenerateMacroData(fromDate = Sys.Date() - 30, toDate = Sys.Date()) %\u003e%\n        gather(\"indicator\", \"value\", -\"date\") %\u003e%\n        ggplot(aes(y = value, x = date, color = indicator)) +\n        geom_line(size = 1.5) +\n        theme_minimal()\n```\n\n## Event type data\n\nEvent data are modeled with a Poisson distribution with low incidence.\n\n```{r eventdata1, fig.width=10, message=FALSE, warning=FALSE, paged.print=FALSE, cache=TRUE}\ngenerateEventData(Sys.Date() - 265, Sys.Date()) %\u003e%\n        gather(type, value, -date) %\u003e%\n        ggplot(aes(y = value, x = date, fill = type)) +\n        geom_bar(stat = \"identity\") +\n        theme_minimal()\n```\n\nThe incidence can of course be controlled via the `freq` parameter.\n\n```{r eventdata2, fig.width=10, message=FALSE, warning=FALSE, paged.print=FALSE, cache=TRUE}\ngenerateEventData(Sys.Date() - 265, Sys.Date(), freq = 0.1) %\u003e%\n        gather(type, value, -date) %\u003e%\n        ggplot(aes(y = value, x = date, fill = type)) +\n        geom_bar(stat = \"identity\") +\n        theme_minimal()\n```\n\n## Media generation\n\nGenerating media data is a bit more complicated because media is what we primarily care about in MMM, so more information is needed. We need three data.frames: the net, the impressions and the CPMs. We also differentiate between offline and online media. 
This distinction is rather artificial right now, but it is there to future-proof the package.\n\n```{r onlineimpdata, fig.width=10, message=FALSE, warning=FALSE, paged.print=FALSE, cache=TRUE}\nmydflist \u003c- generateOnlineData(Sys.Date() - 30, Sys.Date())\nmydflist[[\"impression\"]] %\u003e%\n        gather(type, impression, -date) %\u003e%\n        ggplot(aes(y = impression, x = date, fill = type)) +\n        geom_bar(stat = \"identity\") +\n        theme_minimal()\n```\n\n## Code of Conduct\n\nPlease note that the dammmdatagen project is released with a [Contributor Code of Conduct](https://doktormike.github.io/dammmdatagen/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDoktorMike%2Fdammmdatagen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDoktorMike%2Fdammmdatagen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDoktorMike%2Fdammmdatagen/lists"}