{"id":15094085,"url":"https://github.com/muchdogesec/history4feed","last_synced_at":"2025-04-05T22:08:42.379Z","repository":{"id":245822332,"uuid":"814166823","full_name":"muchdogesec/history4feed","owner":"muchdogesec","description":"Creates a complete full text historical archive for an RSS or ATOM feed.","archived":false,"fork":false,"pushed_at":"2025-03-28T11:31:12.000Z","size":404,"stargazers_count":116,"open_issues_count":1,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T21:07:10.965Z","etag":null,"topics":["atom","rss","wayback-machine"],"latest_commit_sha":null,"homepage":"https://www.dogesec.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/muchdogesec.png","metadata":{"files":{"readme":"README.md","changelog":"history4feed/__init__.py","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-12T13:18:27.000Z","updated_at":"2025-03-28T14:07:17.000Z","dependencies_parsed_at":"2024-06-24T11:01:11.880Z","dependency_job_id":"722b7365-ee3c-484f-9a98-0f8f1f045e61","html_url":"https://github.com/muchdogesec/history4feed","commit_stats":null,"previous_names":["muchdogesec/history4feed"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muchdogesec%2Fhistory4feed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muchdogesec%2Fhistory4feed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muchdogesec%2Fhistory4feed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/muchdogesec%2Fhistory4feed/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/muchdogesec","download_url":"https://codeload.github.com/muchdogesec/history4feed/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247406091,"owners_count":20933803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atom","rss","wayback-machine"],"created_at":"2024-09-25T12:01:37.950Z","updated_at":"2025-04-05T22:08:42.345Z","avatar_url":"https://github.com/muchdogesec.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# history4feed\n\n## Overview\n\n![](docs/history4feed.png)\n\nIt is common for feeds (RSS or ATOM) to only include a limited number of posts. I generally see the latest 3 - 5 posts of a blog in a feed. For blogs that have been operating for years, this means potentially thousands of posts are missed.\n\nThere is no way to page through historic articles using an RSS or ATOM feed (they were not designed for this), which means the first poll of the feed will only contain the limited number of articles in the feed. This limit is defined by the blog owner.\n\nhistory4feed can be used to create a complete history for a blog and output it as an RSS feed.\n\nhistory4feed offers an API that:\n\n1. takes an RSS / ATOM feed URL\n2. downloads a Wayback Machine archive for the feed\n3. identifies all unique blog posts in the historic feeds downloaded\n4. downloads an HTML version of the article content on each page\n5. 
stores the post record in the database\n6. exposes the posts as JSON or XML RSS\n\n## tl;dr\n\n[![history4feed](https://img.youtube.com/vi/z1ATbiecbg4/0.jpg)](https://www.youtube.com/watch?v=z1ATbiecbg4)\n\n[Watch the demo](https://www.youtube.com/watch?v=z1ATbiecbg4).\n\n## Install\n\n### Download and configure\n\n```shell\n# clone the latest code\ngit clone https://github.com/muchdogesec/history4feed\n```\n\n### Configuration options\n\nhistory4feed has various settings that are defined in an `.env` file.\n\nTo create a template for the file:\n\n```shell\ncp .env.example .env\n```\n\nTo see more information about how to set the variables, and what they do, read the `.env.markdown` file.\n\n### Build the Docker Image\n\n```shell\nsudo docker compose build\n```\n\n### Start the server\n\n```shell\nsudo docker compose up\n```\n\n### Access the server\n\nThe webserver (Django) should now be running on: http://127.0.0.1:8002/\n\nYou can access the Swagger UI for the API in a browser at: http://127.0.0.1:8002/api/schema/swagger-ui/\n\n## Useful supporting tools\n\n* [Full Text, Full Archive RSS Feeds for any Blog](https://www.dogesec.com/blog/full_text_rss_atom_blog_feeds/)\n* [An up-to-date list of threat intel blogs that post cyber threat intelligence research](https://github.com/muchdogesec/awesome_threat_intel_blogs)\n* [Donate to the Wayback Machine](https://archive.org/donate)\n\n## Support\n\n[Minimal support provided via the DOGESEC community](https://community.dogesec.com/).\n\n## License\n\n[Apache 2.0](/LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuchdogesec%2Fhistory4feed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmuchdogesec%2Fhistory4feed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuchdogesec%2Fhistory4feed/lists"}