{"id":15158147,"url":"https://github.com/ethereum/node-crawler","last_synced_at":"2025-04-13T05:22:26.201Z","repository":{"id":42190793,"uuid":"385735422","full_name":"ethereum/node-crawler","owner":"ethereum","description":"Attempts to crawl the Ethereum network of valid Ethereum execution nodes and visualizes them in a nice web dashboard.","archived":false,"fork":false,"pushed_at":"2024-05-09T06:04:34.000Z","size":618,"stargazers_count":124,"open_issues_count":13,"forks_count":62,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-03-24T09:42:57.360Z","etag":null,"topics":["crawler","ethereum"],"latest_commit_sha":null,"homepage":"https://www.ethernets.io/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ethereum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-13T21:07:20.000Z","updated_at":"2025-02-20T15:53:26.000Z","dependencies_parsed_at":"2024-05-09T07:25:40.935Z","dependency_job_id":"1f0b03d0-1789-4f2e-a749-ee59b1de40e8","html_url":"https://github.com/ethereum/node-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethereum%2Fnode-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethereum%2Fnode-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethereum%2Fnode-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethereum%2Fnode-crawler/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ethereum","download_url":"https://codeload.github.com/ethereum/node-crawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248666922,"owners_count":21142344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","ethereum"],"created_at":"2024-09-26T20:40:44.305Z","updated_at":"2025-04-13T05:22:26.174Z","avatar_url":"https://github.com/ethereum.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# Ethereum Node Crawler\n\nCrawls the network and visualizes collected data. This repository includes backend, API and frontend for Ethereum network crawler.\n\n[Backend](./crawler) is based on [devp2p](https://github.com/ethereum/go-ethereum/tree/master/cmd/devp2p) tool. It tries to connect to discovered nodes, fetches info about them and creates a database. [API](./api) software reads raw node database, filters it, caches and serves as API. [Frontend](./frontend) is a web application which reads data from the API and visualizes them as a dashboard.\n\nFeatures:\n\n- Advanced filtering, allows you to add filters for a customized dashboard\n- Drilldown support, allows you to drill down the data to find interesting trends\n- Network upgrade readiness overview\n- Responsive mobile design\n\n## Contribute\n\nProject is still in an early stage, contribution and testing is welcomed. 
You can run each part of the software manually for development purposes or deploy the whole production-ready stack with Docker.\n\n### Frontend\n\n#### Development\n\nFor local development with debugging, remoting, etc:\n\n1. Copy `.env` into `.env.local` and replace the variables.\n1. Run `npm install`, then `npm start`.\n1. Run the tests with `npm test` to make sure the data processing works correctly.\n\n#### Production\n\nTo deploy this web app:\n\n1. Build the production bits with `npm install`, then `npm run build`. The output will be located in the `build` folder.\n1. Use your favorite web server; in this example we will be using nginx.\n1. The nginx config for the website should proxy the API to the endpoint `/v1`.\n   Review the `frontend/nginx.conf` file for an example.\n\n### Backend API\n\nThe API uses two databases: one holds the raw data from the crawler, and the other is the API database.\nData is moved from the crawler DB to the API DB regularly by this binary.\nMake sure to start the crawler before the API if you intend to run them together during development.\n\n#### Dependencies\n\n- golang\n- sqlite3\n\n#### Development\n\n```\ngo run ./cmd/crawler\n```\n\n#### Production\n\n1. Build the binary into `/usr/bin`\n   ```\n   go build -o /usr/bin/node-crawler ./cmd/crawler\n   ```\n1. Create a system user for running the application\n   ```\n   useradd --system --create-home --home-dir /var/lib/node-crawler node-crawler\n   ```\n1. Make sure the database is in `/var/lib/node-crawler/crawler.db`\n1. 
Create a systemd service in `/etc/systemd/system/node-crawler.service`:\n   ```\n   [Unit]\n   Description = eth node crawler api\n   Wants       = network-online.target\n   After       = network-online.target\n\n   [Service]\n   User       = node-crawler\n   ExecStart  = /usr/bin/node-crawler api --crawler-db /var/lib/node-crawler/crawler.db --api-db /var/lib/node-crawler/api.db\n   Restart    = on-failure\n   RestartSec = 3\n   TimeoutSec = 300\n\n   [Install]\n   WantedBy = multi-user.target\n   ```\n1. Then enable it and start it.\n   ```\n   systemctl enable node-crawler\n   systemctl start node-crawler\n   systemctl status node-crawler\n   ```\n\n### Crawler\n\n#### Dependencies\n\n- golang\n- sqlite3\n\n##### Country location\n\n- `GeoLite2-Country.mmdb` file from [https://dev.maxmind.com/geoip/geolite2-free-geolocation-data?lang=en](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data?lang=en)\n  - you will have to create an account to get access to this file\n\n#### Development\n\n```\ngo run ./cmd/crawler\n```\n\nRun the crawler using the `crawl` command.\n\n```\ngo run ./cmd/crawler crawl\n```\n\n#### Production\n\nBuild the crawler and place the binary in `/usr/bin`.\n\n```\ngo build -o /usr/bin/node-crawler ./cmd/crawler\n```\n\nCreate a systemd service similar to the API example above. In the executed command, override the default settings by pointing the crawler database to your chosen path and setting the period for writing crawled nodes.\nIf you want to get the country that a node is in, you have to specify the location of the GeoIP database as well.\n\n##### No GeoIP\n\n```\nnode-crawler crawl --timeout 10m --crawler-db /path/to/database\n```\n\n##### With GeoIP\n\n```\nnode-crawler crawl --timeout 10m --crawler-db /path/to/database --geoipdb GeoLite2-Country.mmdb\n```\n\n### Docker setup\n\nA production build of the preconfigured software stack can be deployed with Docker. 
To achieve this, clone this repository and enter the `docker` directory.\n\nMake sure you have the [Docker](https://github.com/docker/docker-ce/releases) and [docker-compose](https://github.com/docker/compose/releases) tools installed.\n\nThe Docker Compose setup uses a local `./data` directory to store the database and GeoIP file.\nIt's best to create this directory and add the GeoIP file before starting the system.\nYou can read the `./docker-compose.yml` file for more details.\n\n```\ndocker-compose up\n```\n\n## Developing with Nix\n\n[Nix](https://nixos.org/) is a package manager and system configuration tool\nand language for reproducible, declarative, and reliable systems.\n\nThe Nix [Flake](https://wiki.nixos.org/wiki/Flakes) in this repo contains all the\ndependencies needed to build the frontend and crawler.\n\nThe `flake.lock` file locks the commit which the package manager uses to build\nthe packages, essentially locking the dependencies in time, not in version.\n\nTo update the lock file, use `nix flake update --commit-lock-file`. This will\nupdate the git commits in the lock file, and commit the new lock file with a\nnice, standard commit message which shows the change in commit hashes for each\ninput.\n\nTo activate the development environment with all the packages available, you\ncan use the command `nix develop`. To automate this process, you can use\n[direnv](https://direnv.net/) with `use flake` in your `.envrc`. 
You can learn\nmore about Nix and direnv [here](https://github.com/direnv/direnv/wiki/Nix).\n\n## Deploying with NixOS\n\n[Nix](https://nixos.org/) is a package manager and system configuration tool\nand language for reproducible, declarative, and reliable systems.\n\nThe Nix [Flake](https://wiki.nixos.org/wiki/Flakes) in this repo also contains a\nNixOS module for configuring and deploying the node-crawler, API, and Nginx.\n\nThere is just a little bit of extra configuration which is needed to bring\neverything together.\n\nAn example production configuration:\n\nYour NixOS `flake.nix`:\n\n```nix\n{\n  inputs = {\n    nixpkgs.url = \"github:NixOS/nixpkgs/nixos-unstable\";\n    node-crawler.url = \"github:ethereum/node-crawler\";\n  };\n  outputs = {\n    nixpkgs,\n    node-crawler,\n    ...\n  }:\n  {\n    nixosConfigurations = {\n      crawlerHostName = nixpkgs.lib.nixosSystem {\n        specialArgs = {\n          inherit node-crawler;\n        };\n        modules = [\n          ./configuration.nix\n\n          node-crawler.nixosModules.nodeCrawler\n        ];\n      };\n    };\n  };\n}\n```\n\nYour example `configuration.nix`:\n\n```nix\n{ node-crawler, ... }:\n\n{\n  # Add the overlay from the node-crawler flake\n  # to get the added packages.\n  nixpkgs.overlays = [\n    node-crawler.overlays.default\n  ];\n\n  # It's a good idea to have your firewall\n  # enabled. Make sure you have SSH allowed\n  # so you don't lock yourself out. 
The openssh\n  # service should do this by default.\n  networking = {\n    firewall = {\n      enable = true;\n      allowedTCPPorts = [\n        80\n        443\n      ];\n    };\n  };\n\n  services = {\n    nodeCrawler = {\n      enable = true;\n      hostName = \"server hostname\";\n      nginx = {\n        forceSSL = true;\n        enableACME = true;\n      };\n    };\n\n    # Needed for the node crawler to get the country\n    # of the crawled IP address.\n    geoipupdate = {\n      enable = true;\n      settings = {\n        EditionIDs = [\n          \"GeoLite2-Country\"\n        ];\n        AccountID = account_id;\n        LicenseKey = \"location of licence key on server\";\n      };\n    };\n  };\n\n  # Needed to enable ACME for automatic SSL certificate\n  # creation for Nginx.\n  security.acme = {\n    acceptTerms = true;\n    defaults.email = \"admin+acme@example.com\";\n  };\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fethereum%2Fnode-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fethereum%2Fnode-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fethereum%2Fnode-crawler/lists"}