{"id":13426334,"url":"https://github.com/developmentseed/skynet-data","last_synced_at":"2025-06-23T21:41:02.469Z","repository":{"id":66924966,"uuid":"55659158","full_name":"developmentseed/skynet-data","owner":"developmentseed","description":"[DEPRECATED] Data pipeline for machine learning with OpenStreetMap","archived":false,"fork":false,"pushed_at":"2018-10-12T15:03:57.000Z","size":136,"stargazers_count":170,"open_issues_count":4,"forks_count":32,"subscribers_count":64,"default_branch":"master","last_synced_at":"2025-03-15T21:39:09.312Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/developmentseed.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-04-07T03:03:23.000Z","updated_at":"2024-11-18T15:24:32.000Z","dependencies_parsed_at":"2023-05-30T15:30:55.164Z","dependency_job_id":null,"html_url":"https://github.com/developmentseed/skynet-data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/developmentseed/skynet-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/developmentseed%2Fskynet-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/developmentseed%2Fskynet-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/developmentseed%2Fskynet-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/developmentseed%2Fskynet-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/developmentseed","download_url":"https://codeload.github.com/developmentseed/skynet-data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/developmentseed%2Fskynet-data/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261561185,"owners_count":23177545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T00:01:32.143Z","updated_at":"2025-06-23T21:41:02.442Z","avatar_url":"https://github.com/developmentseed.png","language":"JavaScript","readme":"# skynet-data\n\nA pipeline to simplify building a set of training data for aerial-imagery- and\nOpenStreetMap- based machine learning.  The idea is to use [OSM QA\nTiles](https://osmlab.github.io/osm-qa-tiles/) to generate \"ground truth\"\nimages where each color represents some category derived from OSM features.\nBeing map tiles, it's then pretty easy to match these up with the desired input\nimagery.\n\n - OSM QA tile data\n   [copyright OpenStreetMap contributors](http://www.openstreetmap.org/copyright)\n   and licensed under\n   [ODbL](http://opendatacommons.org/licenses/odbl/)\n - Mapbox Satellite data can be\n   [traced for noncommercial purposes](https://www.mapbox.com/tos/#[YmtMIywt]).\n   \nThis repository is no longer under active development. We recommend using [Label Maker](https://github.com/developmentseed/label-maker) to prepare data instead. 
That repo contains [utility scripts](https://github.com/developmentseed/label-maker/blob/master/examples/skynet-train-data-prep.md) which can be used to replicate the workflow needed to prepare data for [skynet-train](https://github.com/developmentseed/skynet-train). \n\n## Quick Start\n\n### Pre-built docker image\n\nThe easiest way to use this is via the\n[`developmentseed/skynet-data` docker image](https://hub.docker.com/r/developmentseed/skynet-data).\n\nFirst, create a `docker.env` file containing your MapboxAccessToken:\n\n```\nMapboxAccessToken=YOUR_TOKEN\n```\n\nThen run:\n\n```sh\ndocker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data download-osm-tiles\n\ndocker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data\n```\n\nThe first command downloads the OSM QA tiles to\n`/path/to/output/dir/osm/planet.mbtiles`.  If you've already got that\nfile on your machine, you can skip this.\n\nThe second builds a training set using the default options (Roads\nfeatures from OSM QA tiles, images from Mapbox Satellite).  To change\nthe data sources, training set size and other options, add the\nrelevant environment variables to the `docker.env` file, one per\nline.\n\n### Local docker image\n\nYou can also create the docker images yourself using\ndocker-compose. As in the quick start above, make sure your\n`docker.env` file has your MapboxAccessToken and any other environment\nvariables you want to set. Then run:\n\n```\ndocker-compose build\n```\n\nto build your local docker image, and \n\n```\ndocker-compose run data download-osm-tiles\ndocker-compose run data \n```\n\nto download the OSM QA tiles and run the data collection as specified\nin `docker.env`. 
By default the collected data will be saved into the\n`data` directory, but you can override it by using `-v\n/path/to/output/dir:/workdir/data` after `docker-compose run data`,\nsimilar to the pre-built instructions above.\n\n## Variables\n\nThe `make` commands below work off the following variables (with\ndefaults as listed):\n\n```\n# location of image files\nIMAGE_TILES ?= \"tilejson+https://a.tiles.mapbox.com/v4/mapbox.satellite.json?access_token=$(MapboxAccessToken)\"\n# which osm-qa tiles extract to download; e.g. united_states_of_america\nQA_TILES=planet\n# location of data tiles to use for rendering labels; defaults to osm-qa tiles extract specified by QA_TILES\nDATA_TILES ?= mbtiles://./data/osm/$(QA_TILES).mbtiles\n# filter to this bbox\nBBOX ?= '-180,-85,180,85'\n# number of images (tiles) to sample\nTRAIN_SIZE=1000\n# define label classes output\nCLASSES=classes/roads-buildings.json\n# Filter out tiles whose ratio of labeled to unlabeled pixels is less than or\n# equal to the given ratio.  Useful for excluding images that are all background, for example.\nLABEL_RATIO ?= 0\n# set this to a zoom higher than the data tiles' max zoom to get overzoomed label images\nZOOM_LEVEL ?= 17\n```\n\nYou can override any of these parameters in your `docker.env` and make\na full training set using the instructions above.\n\n## Details\n\n### Install\n\n - Install [NodeJS v4.6.2](https://nodejs.org/dist/v4.6.2/)\n - Install [tippecanoe](https://github.com/mapbox/tippecanoe)\n - Install [GNU Parallel](https://www.gnu.org/software/parallel/)\n - Install [shuf](https://www.gnu.org/software/coreutils/)\n - Clone this repo and run `npm install`.  (Note that this includes a\n   node-mapnik install, which sometimes has trouble building in bleeding-edge\n   versions of node.)\n\n### Sample available tiles\n\n`make data/sample.txt`\n\nThis just does a simple random sample of the available tiles in the given\n`mbtiles` set, using `tippecanoe-enumerate`. 
For more intelligent filtering,\nconsider using `tippecanoe-decode` to examine (geojson) contents of each tile.\n\n### Labels\n\nBuild label images: `make data/labels/color` or `make data/labels/grayscale`.\nUses the `CLASSES` json file to set up the rendering of OSM data to images that\nrepresent per-pixel category labels.  See `classes/water-roads-buildings.json`\nfor an example.  Rendering is with `mapnik`; see [the\ndocs](https://github.com/mapnik/mapnik/wiki/Filter) for more on `filter`\nsyntax.\n\n### Images\n\nDownload aerial images from a tiled source: `make data/images`\n\nHeads up: the default, Mapbox Satellite, will need you to set the\n`MapboxAccessToken` variable, and will cost you map views!\n\n### Preview\n\nPreview the generated data by opening up `preview.html?accessToken=\u003cmapbox\naccess token\u003e\u0026prefix=/path/to/data` in a local web server.\n","funding_links":[],"categories":["JavaScript","Projects"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevelopmentseed%2Fskynet-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevelopmentseed%2Fskynet-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevelopmentseed%2Fskynet-data/lists"}