{"id":17202107,"url":"https://github.com/colcarroll/feed_seeker","last_synced_at":"2025-03-25T09:14:47.407Z","repository":{"id":148526172,"uuid":"116340296","full_name":"ColCarroll/feed_seeker","owner":"ColCarroll","description":"Find rss, atom, xml, and rdf feeds on webpages","archived":false,"fork":false,"pushed_at":"2018-01-08T21:37:58.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-30T08:29:46.438Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ColCarroll.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-01-05T04:12:04.000Z","updated_at":"2018-01-05T04:13:26.000Z","dependencies_parsed_at":"2023-05-20T10:00:24.476Z","dependency_job_id":null,"html_url":"https://github.com/ColCarroll/feed_seeker","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColCarroll%2Ffeed_seeker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColCarroll%2Ffeed_seeker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColCarroll%2Ffeed_seeker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColCarroll%2Ffeed_seeker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ColCarroll","download_url":"https://codeload.github.com/ColCarroll/
feed_seeker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245431721,"owners_count":20614184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T02:13:40.928Z","updated_at":"2025-03-25T09:14:47.375Z","avatar_url":"https://github.com/ColCarroll.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"===========\nFeed Seeker\n===========\n*It slant rhymes with \"heat seeker\"*\n\n|Build Status| |Coverage|\n\nA library for finding atom, rss, rdf, and xml feeds from web pages. Produced at the `mediacloud \u003chttps://mediacloud.org\u003e`_ project. An incremental improvement over `feedfinder2 \u003chttps://github.com/dfm/feedfinder2\u003e`_, which was itself based on `feedfinder \u003chttp://www.aaronsw.com/2002/feedfinder/\u003e`_, written by Mark Pilgrim, and maintained by Aaron Swartz until his untimely death. \n\nQuickstart\n==========\nBy default, the library uses :code:`requests` to grab html and inspect it and find the most\nlikely feed url:\n\n.. code-block:: python\n\n    from feed_seeker import find_feed_url\n\n    \u003e\u003e\u003e find_feed_url('https://github.com/ColCarroll/feed_seeker') \n    'https://github.com/ColCarroll/feed_seeker/commits/master.atom'\n\n\nTo do a more thorough search, use :code:`generate_feed_urls`, which returns more likely candidates first.\n\n.. code-block:: python\n\n    from feed_seeker import generate_feed_urls\n    \n    \u003e\u003e\u003e for url in generate_feed_urls('https://xkcd.com'):\n    ...     
print(url)\n    ... \n    https://xkcd.com/atom.xml\n    https://xkcd.com/rss.xml\n\n\nFor the most thorough search, add a :code:`spider` argument to do depth-first spidering of urls on the same hostname. Note the below call takes nearly four minutes, compared to 0.5 seconds for :code:`find_feed_url`.\n\n\n.. code-block:: python\n\n    \u003e\u003e\u003e for url in generate_feed_urls('https://github.com/ColCarroll/feed_seeker', spider=1):\n    ...     print(url)\n    ... \n    https://github.com/ColCarroll/feed_seeker/commits/master.atom\n    https://github.com/ColCarroll/feed_seeker/commits/a8f7b86eac2cedd9209ac5d2ddcceb293d2404c9.atom\n    https://github.com/ColCarroll/feed_seeker/commits/3b5245b46a10fb3647a1f08b8e584b471683fbbd.atom\n    https://github.com/ColCarroll/feed_seeker/commits/659311b8853c4c4a67e3b4bc67a78461d825a064.atom\n    https://github.com/ColCarroll/feed_seeker/commits/3e93490cb91f7652325c2fe41ef29a5be4558d6a.atom\n    https://github.com/index.atom\n    https://github.com/articles.atom\n    https://github.com/dfm/feedfinder2/commits/master.atom\n    https://github.com/ColCarroll.atom\n    https://github.com/blog.atom\n    https://github.com/blog/all.atom\n    https://github.com/blog/broadcasts.atom\n\n\n\nInstallation\n------------\n\nThe library is not yet available on PyPI, so installation is via github only for now:\n\n.. code-block:: bash\n\n    pip install git+https://github.com/ColCarroll/feed_seeker\n                                                  \n\n\nDifferences with :code:`feedfinder2`\n====================================\nThe biggest difference is that all functions are implemented as generators, and are evaluated lazily. Candidate feed links are actually accessed and inspected to determine whether or not they are a feed, which can be quite time consuming. 
We expose one function, :code:`find_feed_url`, to find the most likely feed link, and another, :code:`generate_feed_urls`, to lazily generate links in rough order from most prominent to least.\n\nThere are also a few more heuristics based on our experience at `mediacloud \u003chttps://mediacloud.org\u003e`_.\n\n.. |Build Status| image:: https://travis-ci.org/ColCarroll/feed_seeker.png?branch=master\n   :target: https://travis-ci.org/ColCarroll/feed_seeker\n.. |Coverage| image:: https://coveralls.io/repos/github/ColCarroll/feed_seeker/badge.svg?branch=master\n   :target: https://coveralls.io/github/ColCarroll/feed_seeker?branch=master\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcolcarroll%2Ffeed_seeker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcolcarroll%2Ffeed_seeker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcolcarroll%2Ffeed_seeker/lists"}