{"id":16163460,"url":"https://github.com/marianfoo/podcastanalyticsbwb","last_synced_at":"2025-06-21T05:02:24.129Z","repository":{"id":98721874,"uuid":"420693045","full_name":"marianfoo/PodcastAnalyticsBWB","owner":"marianfoo","description":"Podcast Analytics 'Baywatch Berlin'","archived":false,"fork":false,"pushed_at":"2021-11-12T17:08:49.000Z","size":2976,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-07T04:31:19.144Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marianfoo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-24T13:31:16.000Z","updated_at":"2021-11-12T17:08:52.000Z","dependencies_parsed_at":"2023-05-24T23:15:35.161Z","dependency_job_id":null,"html_url":"https://github.com/marianfoo/PodcastAnalyticsBWB","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/marianfoo/PodcastAnalyticsBWB","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marianfoo%2FPodcastAnalyticsBWB","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marianfoo%2FPodcastAnalyticsBWB/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marianfoo%2FPodcastAnalyticsBWB/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marianfoo%2FPodcastAnalyticsBWB/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/marianfoo","download_url":"https://codeload.github.com/marianfoo/PodcastAnalyticsBWB/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marianfoo%2FPodcastAnalyticsBWB/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261065246,"owners_count":23104761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-10T02:35:45.803Z","updated_at":"2025-06-21T05:02:19.094Z","avatar_url":"https://github.com/marianfoo.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Analyse des Podcasts 'Baywatch Berlin'\n\n[English version below](README.md#analysis-of-the-podcast-baywatch-berlin)\n\n# Einleitung\n\nDer Podcast-Markt ist in den letzten Jahren stark gewachsen. Viele neue Podcasts treffen auf viele neue Hörer.\nDas ermöglicht es auch, Podcasts zu vermarkten und in ihnen Werbung zu schalten.\nDas tut auch der Podcast 'Baywatch Berlin' von Klaas Heufer-Umlauf, Thomas Schmitt und Jakob Lundt.\nDie Werbeeinblendungen sind dort durch einen Insert klar vom Inhalt getrennt.\n\n# Ziel\n\nDurch diesen Insert sollte es einfach sein, Anfang und Ende der Werbeeinblendungen herauszufiltern.\nDazu wird der Insert aus einer Folge herausgeschnitten und anschließend in allen Folgen gesucht.\n\n# Vorgehensweise\n\nHier werden die einzelnen Schritte des Vorgehens beschrieben.\n\n## Recherche\n\nAls Grundlage wird die Programmiersprache Python genutzt. 
Als IDE wird Jupyter mit Anaconda verwendet.\nEs sollte eine möglichst einfache Herangehensweise genutzt werden.\nNach kurzer Recherche wurde eine Stack-Overflow-Antwort gefunden, die nützlich sein könnte:\nhttps://stackoverflow.com/a/67469084\n\nDiese Antwort wurde angepasst und auf diesen Use Case zugeschnitten.\n\n## MP3-Dateien umformatieren\n\n[Jupyter-Datei: format mp3 to wav](format_mp3_to_wav.ipynb)\n\nFür die Analyse mussten die Dateien in das WAV-Format gebracht werden.\n\nFür eine geringere Dateigröße und eine schnellere Bearbeitung wurden die Dateien vorher in MP3-Dateien mit kleinerer Bitrate konvertiert:\n\nmp3 Original --\u003e mp3 kleinere Bitrate --\u003e wav\n\n## Timestamps der Werbung finden\n\n[Jupyter-Datei: extract ad timestamps](extract_ad_timestamps.ipynb)\n\nDas [Snippet](https://stackoverflow.com/a/67469084) wurde hier angepasst und verwendet.\nEinzig die Korrelationswerte mussten angepasst werden; sie steuern, ab wann eine Übereinstimmung markiert wird.\nDa nur die Daten gespeichert wurden, waren die Grafiken nicht notwendig und wurden entfernt.\n\n## Analyse der Timestamps\n\n[Jupyter-Datei: analyze ad timestamps](analyze_ad_timestamps.ipynb)\n\nFür die Analyse wurden die Daten aufbereitet und mit plotly dargestellt.\n\n## Analyse der Metadaten\n\n[Jupyter-Datei: analyze episodes metadata](analyze_episodes_metadata.ipynb)\n\nFür die Analyse wurden die Metadaten aus den MP3-Dateien ausgelesen.\nDargestellt wurden die Laufzeit je Episode sowie deren Verlauf mit einer Trendlinie.\n\n# Ergebnis\n\n## Liste aller Episoden (ohne Spezialfolgen)\n\nHier ist die Liste aller Folgen ohne Spezialfolgen:\n\n[Liste aller Episoden](list_episodes.csv)\n\n![CSV-Liste aller Folgen](list_episodes_csv.png \"CSV-Liste aller Folgen\")\n\n## Werbung\n\nSchlussendlich wurde das Ziel erreicht: Die Timestamps konnten erfolgreich ausgelesen werden.\nBei der Auswertung wurde deutlich, dass es immer zwei Werbeblöcke gibt.\nErstaunlicherweise 
beträgt die Summe dieser Blöcke fast immer 284 Sekunden.\nDie zwei Blöcke setzen sich aus einem 152-sekündigen und einem 132-sekündigen Block zusammen.\nOb zuerst der lange oder der kurze Block kommt, ist unterschiedlich.\nSeit Folge 94 gilt dieses Muster jedoch anscheinend nicht mehr.\nDie Auswertung fand am 24.10.2021 statt. Die Werbeblöcke sind so konstant, dass einiges darauf hindeutet, dass sie im Nachhinein geändert wurden.\nDie Verträge mit Werbepartnern laufen üblicherweise nicht lange, sodass die Werbeblöcke später noch ausgetauscht werden.\nDass die letzten Folgen nicht konstant sind, deutet ebenfalls darauf hin, da hier noch die 'originalen' Werbeblöcke enthalten sind.\n\n![Plot Ad Duration](plot_ads.png \"Plot Ad Duration\")\n\n## Metadaten\n\nAuch die Länge der Folgen ist volatiler geworden.\nZwischen Folge 30 und 66 waren die Folgen meist rund 90 Minuten lang.\nInsgesamt ist die Folgenlänge jedoch weniger konsistent und schwankt stark zwischen 67 und 93 Minuten.\n\n![Plot Podcast Duration](plot_duration.png \"Plot Podcast Duration\")\n\n## Jakobs Seufzer\n\nIn der Folge 'Das Gottesteilchen von Cala Ratata' wurde zum ersten Mal ein Seufzer von Jakob eingespielt.\nDieser wurde danach öfter wiederholt.\n\n[Liste aller Seufzer](jakob_clean.csv)\n\n\n# Analysis of the podcast 'Baywatch Berlin'\n\n# Introduction\n\nThe podcast market has grown strongly in recent years. 
Many new podcasts are reaching many new listeners.\nThis also makes it possible to market podcasts and place advertisements in them.\nThe podcast 'Baywatch Berlin' by Klaas Heufer-Umlauf, Thomas Schmitt and Jakob Lundt does this as well.\nIts advertisements are clearly separated from the content by an audio insert.\n\n# Goal\n\nThanks to this insert, it should be easy to identify the beginning and end of the ad breaks.\nTo do so, the insert is cut out of one episode and then searched for in all episodes.\n\n# Procedure\n\nThis section describes the individual steps that were taken.\n\n## Research\n\nThe project is based on the Python programming language, with Jupyter (via Anaconda) as the IDE.\nThe approach should be as simple as possible.\nAfter a short search, a Stack Overflow answer was found that looked useful:\nhttps://stackoverflow.com/a/67469084\n\nThis answer was adapted and tailored to this use case.\n\n## Converting MP3 files\n\n[Jupyter file: format mp3 to wav](format_mp3_to_wav.ipynb)\n\nFor the analysis, the files had to be converted to WAV format.\n\nTo reduce file size and speed up processing, the files were first re-encoded as MP3 at a lower bitrate:\n\nmp3 original --\u003e mp3 lower bitrate --\u003e wav\n\n## Finding the ad timestamps\n\n[Jupyter file: extract ad timestamps](extract_ad_timestamps.ipynb)\n\nThe [snippet](https://stackoverflow.com/a/67469084) was adapted and used here.\nOnly the correlation values had to be adjusted; they control when a match is marked.\nSince only the data was saved, the plots were not needed and were removed.\n\n## Analysis of the timestamps\n\n[Jupyter file: analyze ad timestamps](analyze_ad_timestamps.ipynb)\n\nFor the analysis, the data was prepared and plotted with plotly.\n\n## Analysis of the metadata\n\n[Jupyter file: analyze episodes metadata](analyze_episodes_metadata.ipynb)\n\nFor this analysis, the metadata was read from the MP3 files.\nThe running time per episode as well as the 
running time with a trend line were plotted.\n\n# Result\n\n## List of all episodes (excluding special episodes)\n\nHere is the list of all episodes, excluding special episodes:\n\n[List of all episodes](list_episodes.csv)\n\n![CSV list of all episodes](list_episodes_csv.png \"CSV list of all episodes\")\n\n## Ads\n\nIn the end, the goal was achieved and the timestamps were extracted successfully.\nThe evaluation showed that there are always two ad breaks.\nSurprisingly, their combined length is almost always 284 seconds.\nThe two blocks consist of a 152-second block and a 132-second block.\nWhether the long or the short block comes first varies.\nSince episode 94, however, this pattern apparently no longer holds.\nThe evaluation was carried out on 24 October 2021. The ad blocks are so uniform that this suggests they were changed after publication.\nContracts with advertising partners usually do not run for long, so the ad blocks are swapped out later.\nThe fact that the most recent episodes are not uniform also points to this, since they still contain the 'original' ad blocks.\n\n![Plot Ad Duration](plot_ads.png \"Plot Ad Duration\")\n\n## Metadata\n\nThe episode length has also become more volatile.\nBetween episodes 30 and 66, episodes were usually around 90 minutes long.\nOverall, however, the length is less consistent and fluctuates strongly between 67 and 93 minutes.\n\n![Plot Podcast Duration](plot_duration.png \"Plot Podcast Duration\")\n\n## Jakob's sigh\n\nIn the episode 'Das Gottesteilchen von Cala Ratata' ('The God Particle of Cala Ratata'), a sigh from Jakob was played for the first time.\nIt has been repeated several times since.\n\n[List of all sighs](jakob_clean.csv)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarianfoo%2Fpodcastanalyticsbwb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarianfoo%2Fpodcastanalyticsbwb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarianfoo%2Fpodcastanalyticsbwb/lists"}