{"id":13739686,"url":"https://github.com/cybercongress/crawler","last_synced_at":"2025-12-15T19:45:05.358Z","repository":{"id":57501466,"uuid":"165715404","full_name":"cybercongress/crawler","owner":"cybercongress","description":"A toolchain for bringing web2 to web3","archived":true,"fork":false,"pushed_at":"2019-11-11T14:23:19.000Z","size":56,"stargazers_count":13,"open_issues_count":18,"forks_count":3,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-15T09:43:54.136Z","etag":null,"topics":["cosmos-sdk","crawler","cyber","cyberd","ipfs","web3","wiki"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cybercongress.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-14T18:45:28.000Z","updated_at":"2024-09-26T17:05:33.000Z","dependencies_parsed_at":"2022-09-02T08:32:06.652Z","dependency_job_id":null,"html_url":"https://github.com/cybercongress/crawler","commit_stats":null,"previous_names":["cybercongress/cyberd-wiki-index"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cybercongress%2Fcrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cybercongress%2Fcrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cybercongress%2Fcrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cybercongress%2Fcrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cybercongress","download_url":"https://codeload.github.com/cybercongress/crawler/tar
.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253135528,"owners_count":21859662,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cosmos-sdk","crawler","cyber","cyberd","ipfs","web3","wiki"],"created_at":"2024-08-03T04:00:36.537Z","updated_at":"2025-12-15T19:45:00.297Z","avatar_url":"https://github.com/cybercongress.png","language":"Go","funding_links":[],"categories":["Crawlers"],"sub_categories":[],"readme":"\n[![version](https://img.shields.io/github/release/cybercongress/cyber-wiki-index.svg?style=flat-square)](https://github.com/cybercongress/cyber-wiki-index/releases/latest)\n[![CircleCI](https://img.shields.io/circleci/project/github/cybercongress/cyber-wiki-index.svg?style=flat-square)](https://circleci.com/gh/cybercongress/cyber-wiki-index/tree/master)\n[![license](https://img.shields.io/badge/License-Cyber-brightgreen.svg?style=flat-square)](https://github.com/cybercongress/cyber-wiki-index/blob/master/LICENSE)\n[![LoC](https://tokei.rs/b1/github/cybercongress/cyber-wiki-index)](https://github.com/cybercongress/cyber-wiki-index)\n[![contributors](https://img.shields.io/github/contributors/cybercongress/cyber-wiki-index.svg?style=flat-square)](https://github.com/cybercongress/cyber-wiki-index/graphs/contributors)\n[![discuss](https://img.shields.io/badge/Join%20Us%20On-Telegram-2599D2.svg?style=flat-square)](https://t.me/fuckgoogle)\n[![contribute](https://img.shields.io/badge/contributions-welcome-orange.svg?style=flat-square)](https://github.com/cybercongress/cyber-wiki-index/blob/master/CONTRIBUTING.
md)\n\n[://cyber](https://github.com/cybercongress/cyberd) wiki index\n==================\n\n  - [Installation](#installation)\n  - [Preparation](#preparation)\n  - [Usage](#usage)\n  - [Issues](#issues)\n  - [Contributing](#contributing)\n  - [Changelog](#changelog)\n\n## Installation\n\nNote: Requires Go 1.12+\n\n```\ngit clone https://github.com/cybercongress/crawler\ncd crawler\ngo build -o crawler\n```\n\n## Preparation\n\n1. Launch the IPFS daemon.\n2. Download enwiki-latest-all-titles to the crawler root directory:\n\n```\nipfs get QmddV5QP87BZGiSUCf9x9hsqM73b83rsPC6AYMNqkjKMGx -o enwiki-latest-all-titles\n```\n\n3. Add an account to cyberdcli:\n\n```\ndocker exec -ti cyberd cyberdcli keys add \u003cname\u003e --recover\n```\n\n## Usage\n\n### Submit links\nThe `crawler` tool provides two main functions.\nThe first parses wiki titles and submits links between keywords and wiki pages.\n```\n./crawler submit-links-to-cyber ./enwiki-latest-all-titles --home=\u003cpath-to-cyberdcli\u003e --address=\u003caccount\u003e --passphrase=\u003cpassphrase\u003e --chunk=100\n```\n\n\u003e Note: This command uses only the local cyberd node.\n\n\u003e Note: Submitting links does not add DURAs to IPFS.\n\n\u003e Note: `--chunk` sets how many link messages are added to one transaction.\n\n\u003e Note: Each command supports `--help`, for example:\n\n```\n./crawler submit-links-to-cyber --help\n```\n\nHere, **enwiki-latest-all-titles** is the titles file obtained from the\n [official Wiki dumps](https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz).  \n\n### Uploading DURAs to IPFS\n`crawler` also has a separate command, `upload-duras-to-ipfs`, that uploads files to the local IPFS node. 
\nAll DURAs are collected under a single root UnixFS directory.\n```\n./crawler upload-duras-to-ipfs enwiki-latest-all-titles\n```\n\n## Issues\n\nIf you have any problems with or questions about search, please contact us through a\n [GitHub issue](https://github.com/cybercongress/crawler/issues).\n\n## Contributing\n\nYou are invited to contribute new features, fixes, or updates, large or small; we are always thrilled to receive pull\n requests, and do our best to process them as fast as we can. You can find detailed information in our\n [contribution guide](./docs/contributing/contributing.md).\n\n\n## Changelog\n\nStay tuned with our [Changelog](./CHANGELOG.md).\n\n\u003cdiv align=\"center\"\u003e\n  \u003csub\u003eBuilt by\n  \u003ca href=\"https://twitter.com/cyber_devs\"\u003ecyber•Congress\u003c/a\u003e and\n  \u003ca href=\"https://github.com/cybercongress/crawler/graphs/contributors\"\u003econtributors\u003c/a\u003e\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcybercongress%2Fcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcybercongress%2Fcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcybercongress%2Fcrawler/lists"}