https://github.com/defgsus/frontpage-archive-2024

Hourly commits of German online press headlines.

# Archive of news front pages

[![Scraper](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml/badge.svg)](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml)

Collects front-page articles from German online news sites every hour and stores them in [JSON files](docs/snapshots).
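
Because the archive lives in git history, rewriting a file with identical content costs nothing, while any changed byte shows up as a diff and grows the repository. A minimal sketch of writing one site's articles deterministically, assuming a hypothetical layout `docs/snapshots/<id>/articles.json` and a list of article dicts (neither is taken from the scraper itself):

```python
import json
from pathlib import Path


def write_snapshot(root: Path, site_id: str, articles: list[dict]) -> bool:
    """Write a site's articles to JSON; return True only if the file changed.

    The file name "articles.json" is hypothetical, chosen for illustration.
    """
    path = root / site_id / "articles.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys and a fixed indent make the output byte-for-byte stable,
    # so identical data always yields an identical file (and no git diff).
    # ensure_ascii=False keeps umlauts readable instead of \u escapes.
    text = json.dumps(articles, sort_keys=True, indent=2, ensure_ascii=False) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False  # unchanged: skip the write entirely
    path.write_text(text, encoding="utf-8")
    return True
```

`sort_keys=True` makes the output independent of dict insertion order, so reordered but otherwise identical data does not create a diff.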

## Scraped sites

| id | since | files | url |
|:------------------------------------------------------------|:-----------|--------:|:---------------------------------|
| [bild.de](docs/snapshots/bild.de) | 2022-01-28 | 26 | https://www.bild.de |
| [compact-online.de](docs/snapshots/compact-online.de) | 2022-01-29 | 5 | https://www.compact-online.de/ |
| [faz.net](docs/snapshots/faz.net) | 2022-01-29 | 14 | https://www.faz.net/ |
| [fr.de](docs/snapshots/fr.de) | 2022-01-28 | 8 | https://www.fr.de/ |
| [gmx.net](docs/snapshots/gmx.net) | 2022-01-29 | 7 | https://www.gmx.net/ |
| [heise.de](docs/snapshots/heise.de) | 2022-01-28 | 12 | https://www.heise.de/ |
| [spiegel.de](docs/snapshots/spiegel.de) | 2022-01-28 | 21 | https://www.spiegel.de/ |
| [spiegeldaily.de](docs/snapshots/spiegeldaily.de) | 2022-01-28 | 5 | https://www.spiegeldaily.de/ |
| [sueddeutsche.de](docs/snapshots/sueddeutsche.de) | 2022-01-29 | 7 | https://www.sueddeutsche.de/ |
| [t-online.de](docs/snapshots/t-online.de) | 2022-01-29 | 8 | https://www.t-online.de/ |
| [volksstimme.de](docs/snapshots/volksstimme.de) | 2022-01-29 | 8 | https://www.volksstimme.de/ |
| [web.de](docs/snapshots/web.de) | 2022-01-29 | 15 | https://web.de |
| [welt.de](docs/snapshots/welt.de) | 2022-01-29 | 16 | https://www.welt.de |
| [zeit.de](docs/snapshots/zeit.de) | 2022-01-29 | 16 | https://www.zeit.de/ |
| [zeitfuerdieschule.de](docs/snapshots/zeitfuerdieschule.de) | 2022-01-29 | 5 | https://www.zeitfuerdieschule.de |

Well, let's see how far this goes with a free GitHub account.
Many websites embed click IDs and random UUIDs in their
documents, so every file changes in every snapshot, even when
the headlines themselves stay the same.
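
One way to cut that churn is to normalize volatile values before writing. The sketch below drops tracking query parameters from article URLs and replaces UUIDs with a fixed placeholder. The names in `TRACKING_PARAMS` are common tracking parameters chosen for illustration, not a list taken from the scraped sites, and this is a suggestion rather than necessarily what the scraper does.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters treated as click/tracking IDs;
# the real names differ from site to site.
TRACKING_PARAMS = {"fbclid", "gclid", "utm_source", "utm_medium", "utm_campaign"}

# Canonical 8-4-4-4-12 hex UUID, matched case-insensitively.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def strip_tracking(url: str) -> str:
    """Remove tracking query parameters, keeping the rest of the URL intact."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # urlunsplit omits the "?" when the remaining query string is empty.
    return urlunsplit(parts._replace(query=urlencode(query)))


def normalize_text(text: str) -> str:
    """Replace every random UUID with a fixed placeholder."""
    return UUID_RE.sub("<uuid>", text)
```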

Anyway, each snapshot currently adds about 10 MB to the
repository size (the size of the `.git` directory). That's not going
to work for long :-(
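
A rough sketch for tracking that growth: `dir_size_mb` sums the file sizes under a directory such as `.git`, and `days_until` extrapolates linearly, assuming one snapshot per hour (as the repository description suggests) and the ~10 MB per snapshot mentioned above. Both helpers are hypothetical; `git count-objects -vH` reports similar numbers from git itself.

```python
import os
from pathlib import Path


def dir_size_mb(path: Path) -> float:
    """Total size of regular files below `path` in MB (1 MB = 10**6 bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):  # don't count symlink targets
                total += os.path.getsize(file_path)
    return total / 1_000_000


def days_until(limit_mb: float, current_mb: float = 0.0,
               mb_per_snapshot: float = 10.0, snapshots_per_day: int = 24) -> float:
    """Days until the repository reaches `limit_mb`, assuming linear growth."""
    growth_per_day = mb_per_snapshot * snapshots_per_day
    return max(0.0, (limit_mb - current_mb) / growth_per_day)
```

With the defaults, the repository grows by 10 MB × 24 = 240 MB per day, so `days_until(1000)` is about 4.2 days.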

### TODO

- https://www.n-tv.de/
- https://www.handelsblatt.com/
- https://www.taz.de/
- https://www.wa.de/
- https://www.rnd.de/
- https://www.nzz.ch/
- https://www.bazonline.ch/
- https://www.focus.de/
- https://www.tagesschau.de/
- https://www.heise.de/tp/
- https://www.golem.de/
- https://www.kicker.de/
- https://www.achgut.com/
- https://www.stern.de/