{"id":15568635,"url":"https://github.com/jayvdb/pypidb","last_synced_at":"2025-04-15T17:23:59.001Z","repository":{"id":57456273,"uuid":"248654793","full_name":"jayvdb/pypidb","owner":"jayvdb","description":"PyPI client side database with SCM/VCS URLs","archived":false,"fork":false,"pushed_at":"2024-07-01T12:16:29.000Z","size":357,"stargazers_count":13,"open_issues_count":73,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-15T20:43:19.402Z","etag":null,"topics":["git","gitea","github","gitlab","launchpad","openstack","pypi","scm","vcs"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jayvdb.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-20T02:56:31.000Z","updated_at":"2024-04-25T07:17:18.000Z","dependencies_parsed_at":"2024-10-25T17:06:49.103Z","dependency_job_id":"3a9d0ee2-b236-4fca-b5a5-08b41640caeb","html_url":"https://github.com/jayvdb/pypidb","commit_stats":{"total_commits":152,"total_committers":3,"mean_commits":"50.666666666666664","dds":"0.013157894736842146","last_synced_commit":"2af9d402b81da6aa5598ea6e1d5015b48a75d20d"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayvdb%2Fpypidb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayvdb%2Fpypidb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayvdb%2Fpypidb/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayvdb%2Fpypidb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jayvdb","download_url":"https://codeload.github.com/jayvdb/pypidb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249116981,"owners_count":21215313,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["git","gitea","github","gitlab","launchpad","openstack","pypi","scm","vcs"],"created_at":"2024-10-02T17:19:56.594Z","updated_at":"2025-04-15T17:23:58.983Z","avatar_url":"https://github.com/jayvdb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pypidb\nPyPI client side database with SCM/VCS URLs\n\nThis project provides a client-side database of [PyPI project metadata](https://pypi.org/),\nprimarily for the purpose of finding a SCM URL for any PyPI project.\nMore of the internals of the database will be exposed.  Time has been the\nmain limiting factor in exposing more.\n\nMost programming languages created in the last 10 years directly connect\nevery library to a SCM.  PyPI offers several mechanisms for package uploads\nto provide URLs, including for the SCM; however, this information is frequently omitted,\noften invalid, and frequently outdated as projects move their development\nactivities between free hosting services, especially when services are\ndiscontinued.  
There are also projects which have been deleted\nfrom a hosted service and not republished at a new location\n(perhaps due to \"right to vanish\" provisions).\n\nThis project attempts to locate the current development URL, and has\ndeep analysis in the test suite to verify the resolution process is\ncorrect for thousands of projects.\n\nEach resolution process stops after a limited number of web fetches,\nand almost all projects tested require less than one minute per project.\nDisk caching is used so that subsequent resolutions of the same projects\nare almost instantaneous.\n\nThe objective is to always give an appropriate URL for any project,\nif there is one, and if there is a credible rationale that the project\nin question is, or was, an important project.\n\nIf you encounter a project which returns the wrong result, or no result,\nfirst check the PyPI metadata for a suitable SCM link.  If none exists,\ntry to find the development project manually, and create an issue in\nthat project asking its maintainers to improve the metadata they submit to PyPI.\n\nOnly if the target project's maintainers are uncooperative should you create\nan issue in the pypidb project for assistance.\n\n## Details\n\nThere are over 8000 tests; however, a few projects appear in multiple tests,\nso the total number of projects checked is slightly lower.\n\nTests currently cover all PyPI projects in:\n\n* [`4,000 most-downloaded packages`](https://github.com/hugovk/top-pypi-packages)\n* [`Fedora portingdb`](https://github.com/fedora-python/portingdb)\n* [`openSUSE devel:languages:python*`](https://build.opensuse.org/project)\n\nOf those, approximately 340 projects do not return a URL, such as\n[`mysql-connector-python`](https://pypistats.org/packages/mysql-connector-python)\nwith over 55,000 downloads per day.\n\nThere are a few situations where the returned result may not be stable, and\nmay alternate between two URLs.  The fluctuation is due to how URLs are\nqueued and fetched.  
There are no known cases where this happens; however,\noverride rules have been added to avoid them.\nIt is a high priority for any such occurrences to be fixed so that results are\nalways stable.  Please raise an issue if you encounter this.\n\nThere are many rules which drive the resolution, and each package can have\nan associated [unified patch](https://pypi.org/project/unidiff/) URL,\nwhich will be fetched and used to guide the resolution.\nThis mechanism is used for packages which have moved, but have not yet been re-released\nto PyPI with updated metadata.\n\nThe rules for projects may also exclude URLs in the metadata from the resolution\nprocess.\nThe rules do not allow for explicitly setting the target URL.\nFor projects which do not have a SCM, and only have a webpage, that webpage\ncan be added as a 'fake' SCM so that it will be used; however, this approach\nshould only be used for moribund projects where no SCM can be found.\n\n## Usage\n\n```\n$ pip install pypidb\n$ pypidb requests-threads\nhttps://github.com/requests/requests-threads\n$ pypidb does-not-exist\nInvalid package name does-not-exist\n```\n\n```py\n\u003e\u003e\u003e from pypidb import Database\n\n\u003e\u003e\u003e db = Database()\n\u003e\u003e\u003e db.find_project_scm_url(\"requests-threads\")\n'https://github.com/requests/requests-threads'\n\u003e\u003e\u003e db.find_project_scm_url(\"mercurial\")\n'https://www.mercurial-scm.org/repo/hg'\n\u003e\u003e\u003e db.find_project_scm_url(\"cffi\")\n'https://foss.heptapod.net/pypy/cffi'\n\u003e\u003e\u003e db.find_project_scm_url(\"mysql-connector-python\")\nTraceback (most recent call last):\n    ...\npypidb._exceptions.IncompletePackageMetadata: mysql-connector-python has no email in PyPI metadata\nhttps://pypi.org/project/mysql-connector-python/8.0.19/: 500 Server Error: HTTPS Everywhere for url: https://pypi.org/project/mysql-connector-python/8.0.19/\nhttps://pypi.org/project/mysql-connector-python/: 500 Server Error: HTTPS Everywhere for url: 
https://pypi.org/project/mysql-connector-python/\nhttps://downloads.mysql.com/docs/licenses/connector-python-8.0-com-en.pdf: 500 Server Error: HTTPS Everywhere for url: https://downloads.mysql.com/docs/licenses/connector-python-8.0-com-en.pdf\nhttps://downloads.mysql.com/docs/licenses/connector-python-8.0-gpl-en.pdf: 500 Server Error: HTTPS Everywhere for url: https://downloads.mysql.com/docs/licenses/connector-python-8.0-gpl-en.pdf\nhttps://downloads.mysql.com/docs/licenses/connector-python-gpl-en.pdf: 500 Server Error: HTTPS Everywhere for url: https://downloads.mysql.com/docs/licenses/connector-python-gpl-en.pdf\nhttps://downloads.mysql.com/docs/connector-python-en.pdf: 500 Server Error: HTTPS Everywhere for url: https://downloads.mysql.com/docs/connector-python-en.pdf\nhttps://downloads.mysql.com/docs/licenses/connector-python-com-en.pdf: 500 Server Error: HTTPS Everywhere for url: https://downloads.mysql.com/docs/licenses/connector-python-com-en.pdf\n```\n\nResolution of many packages uses Read the Docs metadata, which performs\nbetter when using a token which can be obtained from\nhttps://readthedocs.org/accounts/tokens/\n\nIt should be stored in the default [`.netrc`](https://docs.python.org/3/library/netrc.html)\nfile, in the user home, and should have the following format.\n\n```\nmachine readthedocs.io\n    login deadbeef\n    password x-oauth-basic\n```\n\nTo a lesser extent, the GitHub API is also used.  Depending on the volume of lookups,\nit may be necessary to add a GitHub token, also stored in `.netrc`.\n\n## Testing\n\nTesting requires a GitHub token in `.netrc`.\nWithout a GitHub token, many tests will be skipped, and some will\nfail.\n(The tests can be easily fixed to detect that the API limit was reached)\n\n```sh\ngit clone https://github.com/jayvdb/pypidb\ncd pypidb\ntox\n```\nA complete test run takes several hours.  
There is aggressive caching\nof web content using `CacheControl` and DNS results using `dns-cache`,\nso subsequent runs should complete in a little over an hour.\n\nThe tests for only the top 360 most popular PyPI packages can be run\nwithout any tokens, and complete within approximately five minutes.\n```sh\ntox -- -k TestTop360\n```\nSimilarly, the tests for the top 4000 most popular PyPI packages can\nbe run without any tokens, and complete within approximately twenty minutes;\ntests requiring a GitHub token will be skipped.\n```sh\ntox -- tests/test_top.py\n```\n\nThe tests inspect and validate the results for live project\nmetadata. Those projects are constantly on the move, and the resolution\noften includes accessing websites which may be temporarily inoperative for\nvarious reasons (usually certificate expiration!), so it is not unusual for\ntests to fail.\n\nFor example, there are approximately 700 expected URLs in `tests/data.py`,\ndivided into four subsets.  When a project has moved, and\nthe algorithm has correctly followed the move, its expected URL needs to be\nupdated.\n\nThere is rudimentary support for marking PyPI projects as untestable.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayvdb%2Fpypidb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjayvdb%2Fpypidb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayvdb%2Fpypidb/lists"}