{"id":15297496,"url":"https://github.com/ssokolow/get_user_headers","last_synced_at":"2026-01-19T18:31:02.372Z","repository":{"id":57434086,"uuid":"63434739","full_name":"ssokolow/get_user_headers","owner":"ssokolow","description":"Python module to retrieve identifying request headers from the user's browser for use by local bots","archived":false,"fork":false,"pushed_at":"2017-11-27T03:23:07.000Z","size":64,"stargazers_count":1,"open_issues_count":10,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-28T02:16:43.274Z","etag":null,"topics":["automation","bot","browser","helper","http","library","module","python","python-2","python-3","python2","python3","scraping","spider","utility","web"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ssokolow.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-07-15T16:07:53.000Z","updated_at":"2024-07-01T14:18:22.000Z","dependencies_parsed_at":"2022-08-27T21:11:10.131Z","dependency_job_id":null,"html_url":"https://github.com/ssokolow/get_user_headers","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ssokolow/get_user_headers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssokolow%2Fget_user_headers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssokolow%2Fget_user_headers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssokolow%2Fget_user_headers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/ssokolow%2Fget_user_headers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ssokolow","download_url":"https://codeload.github.com/ssokolow/get_user_headers/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssokolow%2Fget_user_headers/sbom","scorecard":{"id":844889,"data":{"date":"2025-08-11","repo":{"name":"github.com/ssokolow/get_user_headers","commit":"1741e022dbd4e35ad09b9b062d36500c03ca9e80"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no 
GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: MIT License: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-23T21:12:06.575Z","repository_id":57434086,"created_at":"2025-08-23T21:12:06.575Z","updated_at":"2025-08-23T21:12:06.575Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28580128,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T18:29:59.827Z","status":"ssl_error","status_checked_at":"2026-01-19T18:29:40.878Z","response_time":67,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","bot","browser","helper","http","library","module","python","python-2","python-3","python2","python3","scraping","spider","utility","web"],"created_at":"2024-09-30T19:17:51.065Z","updated_at":"2026-01-19T18:31:02.355Z","avatar_url":"https://github.com/ssokolow.png","language":"Python","readme":"Module for retrieving identifying headers from the user's preferred browser\n===========================================================================\n\n**Code Health:**\n\n.. image:: https://landscape.io/github/ssokolow/get_user_headers/master/landscape.svg?style=flat\n   :target: https://landscape.io/github/ssokolow/get_user_headers/master\n   :alt: Code Health\n\n.. 
image:: https://scrutinizer-ci.com/g/ssokolow/get_user_headers/badges/quality-score.png?b=master\n   :target: https://scrutinizer-ci.com/g/ssokolow/get_user_headers/?branch=master\n   :alt: Scrutinizer Code Quality\n\n.. image:: https://codeclimate.com/github/ssokolow/get_user_headers/badges/gpa.svg\n   :target: https://codeclimate.com/github/ssokolow/get_user_headers\n   :alt: Code Climate\n\n**Unit Tests:**\n\n.. image:: https://travis-ci.org/ssokolow/get_user_headers.svg?branch=master\n   :target: https://travis-ci.org/ssokolow/get_user_headers\n   :alt: Unit Tests\n\n.. image:: https://ci.appveyor.com/api/projects/status/1ds9dwd85vl94nsi?svg=true\n   :target: https://ci.appveyor.com/project/ssokolow/get-user-headers\n   :alt: Unit Tests (Windows)\n\n.. image:: https://coveralls.io/repos/github/ssokolow/get_user_headers/badge.svg?branch=master\n   :target: https://coveralls.io/github/ssokolow/get_user_headers?branch=master\n   :alt: Coverage\n\n**Project Status:**\n\n.. image:: https://badge.waffle.io/ssokolow/get_user_headers.svg?label=ready\u0026title=Ready%20Tasks\n   :target: https://waffle.io/ssokolow/get_user_headers\n   :alt: 'Tasks ready to be worked on'\n\n.. image:: https://img.shields.io/pypi/pyversions/get-user-headers.svg\n   :target: https://travis-ci.org/ssokolow/get_user_headers\n   :alt: Python 2.7 and 3 compatible\n\n.. image:: https://img.shields.io/pypi/l/get-user-headers.svg\n   :target: http://opensource.org/licenses/MIT\n   :alt: MIT Licensed\n\n.. image:: https://img.shields.io/pypi/v/get-user-headers.svg\n   :target: https://pypi.python.org/pypi/get-user-headers\n   :alt: PyPI\n\n\n.. 
image:: https://img.shields.io/pypi/wheel/get-user-headers.svg\n   :target: https://pypi.python.org/pypi/get-user-headers\n   :alt: Wheel available\n\nDeveloped under Python 2.7 and 3.4.\n\nRationale\n---------\n\nSome sites don't provide an API for automating commonly desired tasks and can\nbe overly aggressive in blocking user agents which merely do what the user\ncould do anyway (i.e. Ctrl+S on every chapter of a story so it can be converted\ninto an eBook for reading on the go) but faster... *Even when they go out of\ntheir way to be kinder to the website than real browsers by not loading\nimages/CSS/JavaScript/fonts/etc. and using a stricter caching policy.*\n\nThis module makes it easier for well-intentioned convenience bots to disguise\nthemselves as the user's regular browser. When combined with a randomized\ndelay between each request, this makes it difficult for sites to distinguish\nactions performed by the user directly from actions performed by a bot acting\non behalf of the user... 
thus forcing such sites to address the root problem\n(abusive behaviour) rather than singling out bots which are only doing what\nhumans otherwise would.\n\nI understand that the desire to display advertising may be a factor, but I feel\nthat ad-blocking extensions will always be far more popular than this ever\ncould be and those are part of the browsers that are let through, un-molested.\n\nMessage to Website Developers\n-----------------------------\n\n**I write bots to streamline tasks I was already doing by hand.** While I\nunderstand the need to prevent abusive behaviour, **not every bot is abusive**\nand I hate drudgework.\n\nThe two classes of bots I write are RSS feed generators to watch a specific\nthread/tag/category/search for updates and simplified HTML exporters for\nreading fiction offline (either directly on my OpenPandora_ or on my old Sony\nReader PRS-505_ via ebook-convert_).\n\n**I always prefer official feeds/exporters if they meet my needs.** If you\nwrite one, and you announce it well enough for me to discover it, and you don't\ncharge extra for it, I'll stop using my bots. They're always doomed to be more\nfragile anyway. (But, no, **iTunes is not acceptable**. I refuse to use\nproprietary clients and/or DRMed formats.)\n\n**I'm always conservative in my update polling**. I can't remember a time I've\never had an RSS generator poll for updates more frequently than once per day\nand my story exporters tend to cache chapters forever unless I manually evict\nstale content.\n\n**All of my bots will properly obey any HTTP cache-control headers you set**\nand, since hard drive space is relatively cheap and the bots are limited in\nscope, they will never prematurely expire cached data the way actual browsers\ndo.\n\nDouble-check that your server setup can efficiently return a\n``304 Not Modified`` response when faced with headers like\n``If-Modified-Since`` and ``If-None-Match``. 
A surprising number of sites are wasting a\nton of CPU time and bandwidth with *real browsers* that way.\n\nLikewise, **my example code also caches properly**, so feel free to ban any\nbot which does not respect your cache directives. People who write non-caching\nbots have no excuse and will get no sympathy from me.\n\n**My bots also do request throttling** that's *stricter* than what you'll see\nfrom my browser when I middle-click two dozen links in rapid succession so the\nlater ones can be loading while I look at the earlier ones.\n\nFurthermore, while **I can't trust ROBOTS.TXT to be reasonable**, I write\nmy own spiders and whitelists to ensure **I only retrieve the bare minimum\nnecessary** to generate my desired outputs, and I have yet to find a site where\nthat requires downloading more than specific HTML pages and certain inline\nimages used as thumbnails or fancy horizontal rulings. (And, while doing so, my\nscrapers *permanently* cache static files, regardless of HTTP headers, to be\nextra nice.)\n\n**However, don't mistake my kindness for weakness.**\n\nI'm willing to bet anything that, long **before you make my convenience\nbots unfeasible, you'll annoy all of your users into leaving**.\n\nIf you start trying to identify bots by their refusal to download supplementary\nfiles, I have no problem downloading and then throwing away CSS/JS/etc.\njust to appear more browser-like.\n\nIf you do statistical analysis to identify likely bots, I'll do the labwork to\nimprove the statistical distribution of my ``randomize_delay()`` function to\nthe point where you start banning too many real humans.\n\nIf you start requiring a CAPTCHA or JavaScript, I'll pretend to be a bot you\ncan't afford to exclude, like GoogleBot.\n\nIf you start going to the trouble of maintaining a list of IPs used by the real\nGoogleBot or if you actually *are* big enough to survive banning GoogleBot,\nI'll extend this into something that makes it easy to embed a full browser\nengine and 
JDownloader_-style \"please fill this CAPTCHA\" popups into any bot\nanyone wants to write.\n\nIf your browser fingerprinting gets good enough to foil that, I'll convert this\ninto a framework that allows my bots to easily puppet my actual day-to-day web\nbrowser via a custom extension (so it doesn't announce itself like Selenium\nWebDriver) in order to perform their requests.\n\n**Detect abuse, not bots!**\n\n.. _ebook-convert: http://manual.calibre-ebook.com/generated/en/ebook-convert.html\n.. _JDownloader: https://en.wikipedia.org/wiki/JDownloader\n.. _OpenPandora: http://openpandora.org/\n.. _PRS-505: https://en.wikipedia.org/wiki/PRS-505#2007_Model_.28Discontinued_late_2009.29\n\nInstallation\n------------\n\n.. code:: bash\n\n    pip install get-user-headers\n\nI also *strongly* recommend using the requests_ and CacheControl_ libraries to\nmake your HTTP requests so you can get proper HTTP caching semantics for free.\n\n.. code:: bash\n\n    pip install requests cachecontrol[filecache]\n\n.. _Betamax: https://github.com/sigmavirus24/betamax\n.. _CacheControl: https://cachecontrol.readthedocs.io/\n.. _FileCache: https://cachecontrol.readthedocs.io/en/latest/storage.html#filecache\n.. _requests: http://docs.python-requests.org/\n\nUsage\n-----\n\n.. 
code:: python\n\n    import os, time\n\n    import requests\n    from cachecontrol import CacheControl\n    from cachecontrol.caches import FileCache\n\n    from get_user_headers import UserHeaderGetter, randomize_delay\n\n    # Measure and average the time a human takes (per page, in seconds)\n    # for your specific application and use that number here\n    BASE_DELAY = 3\n\n    # requests.Session provides cookie handling and default headers\n    # CacheControl automates proper HTTP caching so you don't get banned\n    # FileCache ensures your cache survives across multiple runs of your bot\n    session = CacheControl(requests.Session(),\n        cache=FileCache(os.path.expanduser('~/.cache/http_cache')))\n    session.headers.update(UserHeaderGetter().get_safe())\n\n    urls = [(None, 'http://www.example.com/')]\n    while urls:\n        parent_url, url = urls.pop(0)\n\n        req_headers = {}\n        if parent_url:\n            req_headers['Referer'] = parent_url\n\n        response = session.get(url, headers=req_headers)\n\n        # TODO: Do actual stuff with the response and maybe urls.append(...)\n        print(response)\n\n        # Simulate human limits to foil statistical analysis\n        time.sleep(randomize_delay(BASE_DELAY))\n\nAlso, while developing your bot, be sure to use some mechanism to cache your\ntest URLs permanently, such as passing ``forever=True`` when initializing\nFileCache_ or using Betamax_. (Both options will make your tests more reliable\nand protect you from getting banned for re-running your code too often in a\nvery short period of time.)\n\n**Example Headers Gathered:**\n\n.. 
code::\n\n            Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\n        User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0\n               DNT: 1\n   Accept-Language: en-US,en;q=0.5\n\nImportant Dynamic Headers to Mimic\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDon't forget to also provide proper values for the following headers, which\n``get_safe()`` cannot return because they change from request to request:\n\nHTTP cache-control headers\n    If you are not using my example code, make sure you implement proper HTTP\n    caching.\n\n    If your bot doesn't implement HTTP caching and visits a URL more than once,\n    then that's abusive behaviour and I won't shed a tear if the website\n    administrator blocks you.\n\n``Referer`` (Note the intentional mis-spelling)\n   The second-easiest way for a site to detect hastily-written bots after\n   checking the ``User-Agent`` header is to check for a missing or incorrect\n   URL in the ``Referer`` header.\n\n   Ideally, you want to keep track of which URLs led to which other URLs so you\n   can do this perfectly, but most sites will be happy if you set ``Referer``\n   to ``http://www.example.com/`` for every request that begins with that root.\n   (And various privacy-enhancing browser extensions like RefControl and\n   uMatrix also have an option to cause real browsers to behave this way.)\n\n   My example code also demonstrates this.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssokolow%2Fget_user_headers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssokolow%2Fget_user_headers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssokolow%2Fget_user_headers/lists"}