{"id":17184065,"url":"https://github.com/jamesjarvis/web-graph","last_synced_at":"2026-05-04T04:31:54.062Z","repository":{"id":55113677,"uuid":"292922305","full_name":"jamesjarvis/web-graph","owner":"jamesjarvis","description":"Experiment with web scraping","archived":false,"fork":false,"pushed_at":"2022-11-10T23:25:43.000Z","size":359,"stargazers_count":0,"open_issues_count":2,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-25T02:43:39.573Z","etag":null,"topics":["colly","crawler","database","golang","web-graph"],"latest_commit_sha":null,"homepage":"https://jamesjarvis.github.io/web-graph/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jamesjarvis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-04T18:43:05.000Z","updated_at":"2022-07-04T19:33:19.000Z","dependencies_parsed_at":"2022-08-14T12:20:54.993Z","dependency_job_id":null,"html_url":"https://github.com/jamesjarvis/web-graph","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/jamesjarvis/web-graph","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesjarvis%2Fweb-graph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesjarvis%2Fweb-graph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesjarvis%2Fweb-graph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesjarvis%2Fweb-graph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jamesjarvis","download_url":"https://codeload.github.c
om/jamesjarvis/web-graph/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesjarvis%2Fweb-graph/sbom","scorecard":{"id":503868,"data":{"date":"2025-08-11","repo":{"name":"github.com/jamesjarvis/web-graph","commit":"4d4cd754f57face7ace3816da99d5ef22ed851da"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.5,"checks":[{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/26 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: 
.github/workflows/deploy-ui.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/deploy-ui.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/jamesjarvis/web-graph/deploy-ui.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/deploy-ui.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/jamesjarvis/web-graph/deploy-ui.yml/master?enable=pin","Warn: containerImage not pinned by hash: Dockerfile-link-api:2","Warn: containerImage not pinned by hash: Dockerfile-link-processor:2","Info:   0 out of   1 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 third-party GitHubAction dependencies pinned","Info:   0 out of   2 containerImage dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge 
detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: GNU Affero General Public License v3.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 7 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"18 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GO-2024-2955 / GHSA-869c-j7wc-8jqv","Warn: Project is vulnerable to: GO-2021-0052 / GHSA-h395-qcrw-5vmq","Warn: Project is vulnerable to: GHSA-3vp4-m3rf-835h","Warn: Project is vulnerable to: GO-2023-1737 / GHSA-2c4m-59x9-fr2g","Warn: Project is vulnerable to: GO-2022-0236 / GHSA-h86h-8ppg-mxmh","Warn: Project is vulnerable to: GO-2021-0238 / GHSA-83g2-8m93-v3w7","Warn: Project is vulnerable to: GO-2022-0288","Warn: Project is vulnerable to: GO-2022-0969 / GHSA-69cg-p879-7622","Warn: Project is vulnerable to: GO-2022-1144 / GHSA-xrjj-mj9h-534m","Warn: Project is vulnerable to: GO-2023-1571 / GHSA-vvpx-j8f3-3w6h","Warn: Project is vulnerable to: GO-2023-1988 / GHSA-2wrh-6pvc-2jm9","Warn: Project is vulnerable to: GO-2023-2102 / GHSA-4374-p667-p6c8","Warn: Project is vulnerable to: GHSA-qppj-fm5r-hxr3","Warn: Project is vulnerable to: GO-2024-2687 / GHSA-4v7x-pqxf-cx7m","Warn: Project is vulnerable to: GO-2024-3333","Warn: Project is vulnerable to: GO-2025-3503 / GHSA-qxp5-gwg8-xv66","Warn: Project is vulnerable to: GO-2025-3595 / GHSA-vvgc-356p-c3xw","Warn: Project is vulnerable to: GO-2022-0493 / GHSA-p782-xgp4-8hr8"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-19T22:46:55.161Z","repository_id":55113677,"created_at":"2025-08-19T22:46:55.161Z","updated_at":"2025-08-19T22:46:55.161Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32595078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T22:12:39.696Z","status":"online","status_checked_at":"2026-05-04T02:00:06.625Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colly","crawler","database","golang","web-graph"],"created_at":"2024-10-15T00:42:19.890Z","updated_at":"2026-05-04T04:31:54.047Z","avatar_url":"https://github.com/jamesjarvis.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Graph\n\n\u003e Experiment with web scraping\n\nView it live! 
\u003chttps://jamesjarvis.github.io/web-graph/\u003e\n\nIf you want to start from a different url, you can change the query string!\n(Note that you can only look at urls that are indirectly discoverable from the root jamesjarvis.io).\n\nExample: \u003chttps://jamesjarvis.github.io/web-graph/?url=https://en.wikipedia.org/wiki/London\u003e\n\nThe basic idea of this is that I wanted to be able to crawl from a single URL, and scrape the entire tree of links it can traverse.\n\nRough overview:\n\nCrawler is given a url.\nIt first checks that this url has not been crawled already, if it has, then it just moves on.\nThen it checks that the url is accessible, it'll do some small exponential backoff, but then returns PageDeadError\nIf it can, it will download the page source, and scrape all 'a' elements, and the href attribute from that.\nThen it sends all these scraped URLs to the back of a list, and the process repeats.\n\nEssentially, this is a breadth first crawl of the whole internet, or at least until either my 1TB hard drive runs out of space, or virgin media cuts me off.\n\n## The API\n\n\u003chttps://api.jamesjarvis.io\u003e\n\nIf you want to mess about with the API directly, you need to know that the \"id\" of each page is calculated as the following:\n\n\u003e SHA1(hostname + pathname).hex()\n\nIf you want to find out the ids of pages found on a particular host, you can use: \u003chttps://api.jamesjarvis.io/pages/jamesjarvis.io\u003e\n\nIf you want to find info of a page, along with the ids of pages linked *from* this page, use: \u003chttps://api.jamesjarvis.io/page/5bc63ce53c8aaede0889ee9e90276affbbba7573\u003e\n\nIf you want to find the links *to* a page (v useful for discovering backlinks), use: \u003chttps://api.jamesjarvis.io/linksTo/5bc63ce53c8aaede0889ee9e90276affbbba7573\u003e\n\n## To run\n\n```bash\ndocker-compose up --build -d \u0026\u0026 docker-compose logs -f link-processor\n```\n\nThen open \u003clocalhost:8080\u003e and enter your credentials 
from [Your database environment file](./database.env.example)\nNote, if running this on an rpi, stop the pgadmin service with `docker compose stop pgadmin` as it is not compiled for ARM.\n\nTo see the UI, open the `frontend/index.html` file in a browser.\n## DB Schema\n\n### Page\n\n| Page ID (PK) (generated as hash of host+path) | Host             | Path            | Url                                  |\n| --------------------------------------------- | ---------------- | --------------- | ------------------------------------ |\n| 1 (hash of host+path)                         | jamesjarvis.io   | /               | https://jamesjarvis.io/              |\n| 2 (hash of host+path)                         | en.wikipedia.com | /united-kingdom | https://wikipedia.com/united-kingdom |\n\n### Link\n\n| FromPageID (FK) | ToPageID (FK) | Link text        |\n| --------------- | ------------- | ---------------- |\n| 1               | 2             | I live in the UK |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesjarvis%2Fweb-graph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjamesjarvis%2Fweb-graph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesjarvis%2Fweb-graph/lists"}