Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/defgsus/frontpage-archive-2024
hourly commits of german online press headlines
Last synced: 28 days ago
- Host: GitHub
- URL: https://github.com/defgsus/frontpage-archive-2024
- Owner: defgsus
- Created: 2023-12-31T16:24:48.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-04-14T02:28:53.000Z (7 months ago)
- Last Synced: 2024-04-14T03:01:25.191Z (7 months ago)
- Language: Python
- Homepage:
- Size: 311 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Archive of news front pages
[![Scraper](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml/badge.svg)](https://github.com/defgsus/frontpage-archive-2024/actions/workflows/scraper.yml)
Collects articles and stores them in [json files](docs/snapshots).
## Scraped sites:
| id | since | files | url |
|:------------------------------------------------------------|:-----------|--------:|:---------------------------------|
| [bild.de](docs/snapshots/bild.de) | 2022-01-28 | 26 | https://www.bild.de |
| [compact-online.de](docs/snapshots/compact-online.de) | 2022-01-29 | 5 | https://www.compact-online.de/ |
| [faz.net](docs/snapshots/faz.net) | 2022-01-29 | 14 | https://www.faz.net/ |
| [fr.de](docs/snapshots/fr.de) | 2022-01-28 | 8 | https://www.fr.de/ |
| [gmx.net](docs/snapshots/gmx.net) | 2022-01-29 | 7 | https://www.gmx.net/ |
| [heise.de](docs/snapshots/heise.de) | 2022-01-28 | 12 | https://www.heise.de/ |
| [spiegel.de](docs/snapshots/spiegel.de) | 2022-01-28 | 21 | https://www.spiegel.de/ |
| [spiegeldaily.de](docs/snapshots/spiegeldaily.de) | 2022-01-28 | 5 | https://www.spiegeldaily.de/ |
| [sueddeutsche.de](docs/snapshots/sueddeutsche.de) | 2022-01-29 | 7 | https://www.sueddeutsche.de/ |
| [t-online.de](docs/snapshots/t-online.de) | 2022-01-29 | 8 | https://www.t-online.de/ |
| [volksstimme.de](docs/snapshots/volksstimme.de) | 2022-01-29 | 8 | https://www.volksstimme.de/ |
| [web.de](docs/snapshots/web.de) | 2022-01-29 | 15 | https://web.de |
| [welt.de](docs/snapshots/welt.de) | 2022-01-29 | 16 | https://www.welt.de |
| [zeit.de](docs/snapshots/zeit.de) | 2022-01-29 | 16 | https://www.zeit.de/ |
| [zeitfuerdieschule.de](docs/snapshots/zeitfuerdieschule.de) | 2022-01-29 | 5 | https://www.zeitfuerdieschule.de |

Well, let's see how far this goes with a free GitHub account.

Many websites transmit click IDs and random UUIDs in their
documents, so every file changes in each snapshot.

Anyway, each snapshot currently adds about 10 MB to the
repository size (size of the `.git` directory). That's not going
to work for long :-(

### TODO
- https://www.n-tv.de/
- https://www.handelsblatt.com/
- https://www.taz.de/
- https://www.wa.de/
- https://www.rnd.de/
- https://www.nzz.ch/
- https://www.bazonline.ch/
- https://www.focus.de/
- https://www.tagesschau.de/
- https://www.heise.de/tp/
- https://www.golem.de/
- https://www.kicker.de/
- https://www.achgut.com/
- https://www.stern.de/
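
One possible way to reduce the per-snapshot churn caused by click IDs and random UUIDs is to replace them with fixed placeholders before a snapshot is written. The following is only a minimal sketch, not the scraper's actual code: the function name `normalize_volatile`, the placeholder strings, and the query parameter names (`clickid`, `cid`, `trackingid`) are assumptions made for illustration, and real sites would likely need their own rules.

```python
import re

# Matches the canonical textual UUID form (8-4-4-4-12 hex digits).
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)

# Assumed tracking parameter names; these are hypothetical and not
# taken from the scraped sites or from the scraper itself.
CLICK_ID_RE = re.compile(
    r"([?&](?:clickid|cid|trackingid)=)[^&\"'\s]+",
    re.IGNORECASE,
)


def normalize_volatile(text: str) -> str:
    """Replace UUIDs and click-id values with fixed placeholders.

    Two snapshots that differ only in these tokens become identical,
    so git has nothing new to store for them.
    """
    text = UUID_RE.sub("<uuid>", text)
    # Keep the parameter name, replace only its value.
    text = CLICK_ID_RE.sub(r"\1<id>", text)
    return text
```

Applied to each document before it is saved, this would make two fetches of an otherwise unchanged page compare equal. The trade-off is that the stored snapshot no longer matches the transmitted bytes exactly.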