{"id":36809540,"url":"https://github.com/notzree/wikigraph_server","last_synced_at":"2026-01-12T13:44:58.821Z","repository":{"id":226851084,"uuid":"769722882","full_name":"notzree/wikigraph_server","owner":"notzree","description":"API to traverse wikipedia graph","archived":false,"fork":false,"pushed_at":"2025-05-28T01:42:54.000Z","size":11887,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-28T02:43:52.350Z","etag":null,"topics":["golang","microservice","wikipedia-api","wikipedia-search"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/notzree.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-09T21:44:46.000Z","updated_at":"2025-05-28T01:42:58.000Z","dependencies_parsed_at":"2024-09-15T09:22:54.278Z","dependency_job_id":null,"html_url":"https://github.com/notzree/wikigraph_server","commit_stats":null,"previous_names":["notzree/wikigraph_server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/notzree/wikigraph_server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notzree%2Fwikigraph_server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notzree%2Fwikigraph_server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notzree%2Fwikigraph_server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notzree%2Fwikigraph_server/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/notzree","download_url":"https://codeload.github.com/notzree/wikigraph_server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notzree%2Fwikigraph_server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28339368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T12:22:26.515Z","status":"ssl_error","status_checked_at":"2026-01-12T12:22:10.856Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","microservice","wikipedia-api","wikipedia-search"],"created_at":"2026-01-12T13:44:58.695Z","updated_at":"2026-01-12T13:44:58.791Z","avatar_url":"https://github.com/notzree.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align = \"center\"\u003e\n\u003cpre\u003e\n__        _____ _  _____ ____ ____      _    ____  _   _\n\\ \\      / /_ _| |/ /_ _/ ___|  _ \\    / \\  |  _ \\| | | |\n \\ \\ /\\ / / | || ' / | | |  _| |_) |  / _ \\ | |_) | |_| |\n  \\ V  V /  | || . 
\\ | | |_| |  _ \u003c  / ___ \\|  __/|  _  |\n   \\_/\\_/  |___|_|\\_\\___\\____|_| \\_\\/_/   \\_\\_|   |_| |_|\n  -------------------------------------------------------\n  Golang API server to concurrently search wikipedia link graph\n\u003c/pre\u003e\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n\u003c/div\u003e\nTry it out: https://wikigraph-client.vercel.app/\n\nEver wanted to cheat in your Wikipedia speedruns?\nTry Wikigraph, an API that tells you the shortest distance between (almost) any 2 articles on Wikipedia.\nOutdated data? Create a fresh copy yourself using [wikigraph_script](https://github.com/notzree/wikigraph_script)\n\n# Skip to:\n\n- [performance](#Benchmarks)\n- [implementation detail](#Implementation-details)\n\n\n## Installation\nMake sure you have Docker installed and working. \\\nClone this repo:\n```sh\ngit clone https://github.com/notzree/wikigraph_server.git\n```\nDownload the binary graph or create it (see below).\nGoogle Drive link: [Graph Link](https://drive.google.com/file/d/1GDBSYfmq6aJpdc_6L5Q5RVJDWMi0vTiK/view?usp=sharing) \\\nBe sure to move it to the root of the git repo:\n```sh\nmv downloads/wikipedia_binary_graph.bin /path_to_git_repo\n```\nDownload the database dumps:\nGoogle Drive link: [Dump Link](https://drive.google.com/file/d/10kCHg-DeNeQ36ASptNBYBzsh90-opFnS/view?usp=drive_link)\n```sh\ncd wikigraph_server_repo_path\nmkdir database_infra\ncd database_infra \u0026\u0026 mkdir initdb\nmv downloads/wikigraph_dumps.sql /path_to_git_repo/database_infra/initdb\n```\nBuild with Docker. This sets up the database and runs the server on port 80.\n```sh\ndocker-compose build \u0026\u0026 docker-compose up\n```\n## Usage example\nOn other branches you may find working implementations of a gRPC service; for the sake of time, I've only included interactions with the REST API.\nBoth `from` and `to` must be valid Wikipedia article names in lowercase (more on this in the 
caveats section).\nInvalid or mistyped article names will cause a 500 error.\nThe graph and database dumps found on the drive were created in February 2024. If you need a newer version, you can use [wikigraph_script](https://github.com/notzree/wikigraph_script)\n### Find a path sequentially\n```sh\ncurl  -X POST \\\n  'http://localhost:8080/search/sequential' \\\n  --header 'Accept: */*' \\\n  --header 'User-Agent: Thunder Client (https://www.thunderclient.com)' \\\n  --header 'Content-Type: application/json' \\\n  --data-raw '{\n  \"from\": \"university of waterloo\",\n  \"to\": \"outer mongolia\"\n}'\n```\n### Find a path concurrently\n```sh\ncurl  -X POST \\\n  'http://localhost:8080/search/concurrent' \\\n  --header 'Accept: */*' \\\n  --header 'User-Agent: Thunder Client (https://www.thunderclient.com)' \\\n  --header 'Content-Type: application/json' \\\n  --data-raw '{\n  \"from\": \"university of waterloo\",\n  \"to\": \"outer mongolia\"\n}'\n```\n## Implementation details\nFor a deeper explanation of the graph format, check out [wikigraph_script](https://github.com/notzree/wikigraph_script) or [wikicrush](https://github.com/trishume/wikicrush). \\\nWhen the server starts, it loads the entire binary graph into memory as an array of int32 (around 2 GB). This represents over 61,000,000 articles!\nThe inputs are first converted into byte offsets using a Postgres database, which reduces runtime memory requirements.\n### Sequential search\nThe sequential search runs a simple BFS algorithm to construct a predecessor map of the byte offsets, then traverses the map to create the path. The algorithm takes advantage of the byte offsets stored as values to quickly traverse the graph.\n### Concurrent search\nThe concurrent search mode uses 2 goroutines to run a bidirectional BFS, starting from both the start node and the end node and working towards each other. This significantly reduces the time it takes to find a path. 
The basic principle is as follows:\n```\nIf we know A-\u003eF\nand Z-\u003eF\nThen we know A-\u003eZ = A-\u003eF + reverse(Z-\u003eF)\n```\nAs the 2 goroutines traverse and discover more nodes, they communicate these nodes through a buffered channel of size 1 to prevent them from blocking each other.\n```go\n\tgo func() {\n\t\t// Nodes reported so far by the search goroutines.\n\t\tdiscoveredNodes := make(map[int32]bool)\n\t\tfor node := range discovered {\n\t\t\t// A repeated node is treated as the meeting point of the two searches.\n\t\t\tif _, ok := discoveredNodes[node]; ok {\n\t\t\t\tclose(closeSignal)\n\t\t\t\tresultChan \u003c- node\n\t\t\t\treturn\n\t\t\t}\n\t\t\tdiscoveredNodes[node] = true\n\t\t}\n\t}()\n```\nA third goroutine keeps track of visited nodes. When it detects a middle node that both goroutines have discovered, it signals them to exit and proceeds to construct the path.\nWhile this approach is significantly faster, it does not guarantee the shortest path, nor does it guarantee consistency between runs. The path may be different each time, as the concurrent system is not necessarily deterministic. But if you're trying to win a speedrun, the trade-off may be acceptable.\n\n## Performance\nReal-life performance is often in favour of the sequential approach. This is because the `FindPathConcurrent` and `FindPathSequential` functions return an array of byte offsets, which we need to query against the database to convert back to words. Since the sequential approach guarantees the shortest path, it has fewer offsets to look up and often beats the concurrent approach by 0.5s. 
\\\n\nHowever, in scenarios where the 2 nodes are particularly far apart, such as with this API request:\n```sh\ncurl  -X POST \\\n  'http://localhost:8080/search/concurrent' \\\n  --header 'Accept: */*' \\\n  --header 'User-Agent: Thunder Client (https://www.thunderclient.com)' \\\n  --header 'Content-Type: application/json' \\\n  --data-raw '{\n  \"from\": \"deciduous\",\n  \"to\": \"anthropic\"\n}'\n```\nYou will notice that the concurrent approach gives you runtimes of ~3 seconds, whereas the sequential approach can take up to 15 seconds!\nBenchmarks can provide deeper insights into the performance gains of concurrency.\n\n### Benchmarks\n\nAn issue I ran into when benchmarking was finding balanced test cases, as the benefits of the concurrent bidirectional BFS fluctuate depending on the actual distance between the inputs provided. I believe the data below is good enough to at least indicate that there IS a performance gain to the concurrent approach; however, more benchmarking is needed to accurately quantify the benefit. 
One possible approach is to find pairs of nodes separated by an increasing number of links, and then graph the performance increase to quantify the percent gain per node.\n\u003cp float=\"left\"\u003e\n\u003cdiv\u003e\n\u003ch2\u003eConcurrent Favoured Benchmark | 189% Faster\u003c/h2\u003e\n\n ```sh\ngoos: darwin\ngoarch: arm64\npkg: github.com/notzree/wikigraph_server/benchmark\nBenchmarkSequentialSearch-8            1        1260669208 ns/op\nBenchmarkConcurrentSearch-8           42          35510332 ns/op\nPASS\nok      github.com/notzree/wikigraph_server/benchmark   9.449s\n ```\n\n\u003c/div\u003e\n\u003cdiv\u003e\n\u003ch2\u003eSequential Favoured Benchmark | 78% Faster\u003c/h2\u003e\n\n ```sh\ngoos: darwin\ngoarch: arm64\npkg: github.com/notzree/wikigraph_server/benchmark\nBenchmarkSequentialSearch-8            2         630494688 ns/op\nBenchmarkConcurrentSearch-8            4         275946031 ns/op\nPASS\nok      github.com/notzree/wikigraph_server/benchmark   11.795s\n ```\n\u003c/div\u003e\n\u003c/p\u003e\n\n## Caveats\nI was not able to successfully parse all links, so this graph is not 100% complete. I ran into an issue differentiating capitalized and lowercased pages. For example, the programming language ALGOL and the star Algol are differentiated only by casing. This works fine as long as links from other pages that reference these pages obey the same capitalization convention, but this wasn't the case. I kept running into casing issues that resulted in duplicate key errors or in entries not being found in the database. I do intend on polishing this in the future (maybe on another work term).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotzree%2Fwikigraph_server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnotzree%2Fwikigraph_server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotzree%2Fwikigraph_server/lists"}