{"id":13803193,"url":"https://github.com/stanford-esrg/gps","last_synced_at":"2025-05-08T23:08:37.861Z","repository":{"id":41481358,"uuid":"426357814","full_name":"stanford-esrg/gps","owner":"stanford-esrg","description":"GPS is a scanning platform that learns and predicts the location of IPv4 services across all 65K ports.","archived":false,"fork":false,"pushed_at":"2023-02-07T17:51:30.000Z","size":69,"stargazers_count":69,"open_issues_count":0,"forks_count":11,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-08T23:08:29.607Z","etag":null,"topics":["bigquery","internet-wide-scanning","ipv4","network","port-scan","port-scanner","port-scanning","scanning","security","security-scanner","security-tools","zgrab","zmap"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stanford-esrg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-11-09T19:21:37.000Z","updated_at":"2025-05-06T08:28:30.000Z","dependencies_parsed_at":"2023-02-08T19:46:10.886Z","dependency_job_id":null,"html_url":"https://github.com/stanford-esrg/gps","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanford-esrg%2Fgps","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanford-esrg%2Fgps/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanford-esrg%2Fgps/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanford
-esrg%2Fgps/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stanford-esrg","download_url":"https://codeload.github.com/stanford-esrg/gps/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253160777,"owners_count":21863629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","internet-wide-scanning","ipv4","network","port-scan","port-scanner","port-scanning","scanning","security","security-scanner","security-tools","zgrab","zmap"],"created_at":"2024-08-04T01:00:24.914Z","updated_at":"2025-05-08T23:08:37.840Z","avatar_url":"https://github.com/stanford-esrg.png","language":"Python","funding_links":[],"categories":["Related Lists"],"sub_categories":[],"readme":"# GPS: Predicting IPv4 Services Across All Ports\n\nGPS is a scanning platform that learns and predicts the location of IPv4 services across all 65K ports.\nGPS uses application, transport, and network layer features to probabilistically model and predict service presence.\nGPS computes service predictions in 13 minutes. \nGPS can find 92.5\\% of all services across all ports with 131x less bandwidth, and 204x more precision, compared to exhaustive scanning. 
\n\nTo learn more about GPS' system and performance, check out the original [paper](https://lizizhikevich.github.io/assets/papers/gps.pdf) appearing at [Sigcomm '22](https://conferences.sigcomm.org/sigcomm/2022/).\n\n## GPS Computational Requirements\n\nTo run GPS, you need the following capabilities:\n- Python v3\n- Access to Google [BigQuery](http://bigquery.cloud.google.com) and the [Google Cloud command line](https://cloud.google.com/sdk/docs/install).\nUsers are responsible for their own billing. \nAs long as intermediate tables are not stored in Google BigQuery for longer than GPS' execution, the total cost of BigQuery should be less than \\$1. \n- Access to an Internet scanner (e.g., [LZR](https://github.com/stanford-esrg/lzr)) and Internet scanning infrastructure. Please make sure to adhere to [these](https://github.com/zmap/zmap/wiki/Scanning-Best-Practices) scanning best practices.\n- Access to a large disk (e.g., 1TB). The final list of service predictions is a file larger than half a terabyte. \n\n## Configuring GPS Parameters\n\nGPS uses a `config.ini` configuration file which expects users to specify:\n1. a BigQuery account\n2. an existing BigQuery dataset that GPS can store tables in\n3. the name of the table to which the seed scan was uploaded (see below)\n4. a local directory where GPS can store predictions\n5. other GPS parameters (e.g., minimum hitrate)\n\n\n## Seed Scan:\n\nGPS relies on an initial seed scan---a sub-sampled IPv4 scan across all 65K ports---to learn patterns from. \nA sample seed scan (1\\% IPv4 LZR scan across all 65K ports collected in April 2021) can be found [here](https://www.dropbox.com/s/rszznd5j1f1o430/lzr_seed_april2021_filt.json.zip?dl=0U).\nThe seed scan has been filtered for real services (i.e., services that send back real data) and hosts that respond on 10 or fewer ports (i.e., removing pseudo services). 
\nPlease see the [LZR paper](https://lizizhikevich.github.io/assets/papers/lzr.pdf) and the [GPS paper](https://lizizhikevich.github.io/assets/papers/gps.pdf) for more details behind this methodology. \n\nThe sample seed scan should be used only for testing purposes.\nUsing this data means that GPS will predict services given the state of the Internet from April 2021. \nTo make up-to-date predictions, please use an up-to-date seed scan. \n\nTo use the sample seed scan, upload it to BigQuery and update the seed BigQuery table name in `config.ini` (i.e., ``Seed_Table = lzr_seed_april2021_filt``).\nThe following `bq` command uploads the seed scan, `lzr_seed_april2021_filt.json`, to BigQuery:\n```\nbq load --source_format NEWLINE_DELIMITED_JSON --autodetect \\\n      BQ_RESOURCE_PROJECT.BQ_DATASET.SEED_TABLE lzr_seed_april2021_filt.json\n```\n\n### Using your own seed scan:\n\nThe GPS code base currently supports two formats of Internet scans to be used as the seed:\n1. The raw output of a [LZR](https://github.com/stanford-esrg/lzr) scan. \nGPS then re-formats it when ``Pre_Filt_Seed=False`` is set in the config.ini. \n2. A scan with the following schema:\n```\nip (string), p (port number- integer), asn (integer), data (string),\\\nfingerprint (protocol - string), w (tcp window size-integer).\n```\nFields can be added or removed, as long as ``src/data_features.py`` is appropriately updated. \nAt minimum, to compute predictions, the GPS algorithm expects an IP address, a port number, and some form of layer 7 data for each service.\n\n## Running GPS\n\nOnce the ``config.ini`` is properly initialized, and a valid seed scan has been uploaded to BigQuery, GPS is ready to predict services.\n\nGPS prediction works in two phases:\n\n1. GPS predicts at least one service across all IPv4 hosts. 
\nTo run GPS' first phase, simply run the following:\n``python gps.py first``\nGPS outputs and locally downloads a short list of sub-networks and ports for the user to scan.\n\n2. GPS predicts any remaining services on every host it has discovered in the first phase. \nTo run GPS' second phase, simply run the following:\n``python gps.py remaining``\nGPS saves a large list of individual services for the user to scan. \nDuring runtime, GPS provides user instructions for how to best download that large list.\n\nOnce GPS is done running, remember to delete any remaining BigQuery tables that you no longer need.\nGPS does not automatically clean up BigQuery tables, in case the user wants to use/explore the intermediate tables.\n\n## Debugging\n\nWhen adding functionality to GPS, the user may run into the following BigQuery error: \n\n```\n400 Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.\n```\n\nWhy it happened: This message means that the query has become too long/nested for BigQuery to process. \nThis can happen if you have added more features, or added more code that calls the defined sub-tables.\n\nSolution: Reduce the number of queries that are defined as sub-tables, or split the query in two and run the parts separately (saving to a destination table in the process). This will require some hacking on the GPS source code.  
\n\n\n\n## License and Copyright\n\nCopyright 2022 The Board of Trustees of The Leland Stanford Junior University\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstanford-esrg%2Fgps","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstanford-esrg%2Fgps","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstanford-esrg%2Fgps/lists"}