{"id":18241157,"url":"https://github.com/anisotropi4/goldfinch","last_synced_at":"2026-05-06T11:38:14.345Z","repository":{"id":154181273,"uuid":"87712438","full_name":"anisotropi4/goldfinch","owner":"anisotropi4","description":"A set of scripts for working with postgres and arangodb databases based on extending Jeroen Janssens 'Data Science on the Command Line' https://github.com/jeroenjanssens/data-science-at-the-command-line  ","archived":false,"fork":false,"pushed_at":"2021-04-05T15:30:27.000Z","size":100898,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-14T13:26:23.379Z","etag":null,"topics":["arangodb","data-science","leaflet","openstreetmap","overpass-api","postgres","visualisation","wrapper-script"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anisotropi4.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-09T13:58:52.000Z","updated_at":"2023-01-14T20:23:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"14d27782-8b46-4cb9-b31b-96a50ed5a4bd","html_url":"https://github.com/anisotropi4/goldfinch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anisotropi4%2Fgoldfinch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anisotropi4%2Fgoldfinch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a
nisotropi4%2Fgoldfinch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anisotropi4%2Fgoldfinch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anisotropi4","download_url":"https://codeload.github.com/anisotropi4/goldfinch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247887629,"owners_count":21012983,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arangodb","data-science","leaflet","openstreetmap","overpass-api","postgres","visualisation","wrapper-script"],"created_at":"2024-11-05T05:21:59.449Z","updated_at":"2026-05-06T11:38:09.313Z","avatar_url":"https://github.com/anisotropi4.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# goldfinch\nA set of scripts for working with PostgreSQL and ArangoDB databases, based on extending Jeroen Janssens' 'Data Science at the Command Line' https://github.com/jeroenjanssens/data-science-at-the-command-line, plus helper and miscellaneous scripts\n\nNow with additional scripts for processing and converting large(ish) xml files to newline-delimited json (ndjson)\n\nMore information about ArangoDB and PostgreSQL can be found here:\n * ArangoDB: https://www.arangodb.com\n * PostgreSQL: https://www.postgresql.org\n \n## osmrailway\n\nA set of query shell-scripts that extract railway data from an OpenStreetMap Overpass API server. An example Docker build for an OpenStreetMap Overpass API server under a Debian-based Linux distribution can be found here: 
https://github.com/guidoeco/osm-overpass\n\nMore information about OpenStreetMap and Overpass API can be found here:\n  * OpenStreetMap: http://www.openstreetmap.org\n  * Overpass API: http://overpass-api.de\n\n## testrailway\n\nThe scripts in **testrailway** will create a **testrailway** ArangoDB database instance, import the railway data extracted from an OpenStreetMap Overpass API server in **osmrailway** and create a json report containing OSM node information, which can then be viewed in the 'visualisation' sub-directory using a d3/leaflet mashup http://bl.ocks.org/anisotropi4/3452a4d2d7e848511feafe8a6c1bfaee\n\nThe **testrailway** scripts were used on a smaller North Yorkshire dataset to prove the concept before moving to the British Isles, which is managed using the **fullrailway** scripts. The separate **fullrailway** scripts are needed because of issues with scaling the visualisation.\n\nThe ArangoDB server used is based on the ArangoDB server Docker build scripts in the arangodb directory here https://github.com/guidoeco/docker\n\nThe render uses a mashup of d3 (https://d3js.org) and leaflet (http://leafletjs.com).\n\n## fullrailway\n\nThe scripts in **fullrailway** will create a **fullrailway** ArangoDB database instance for a larger geographical area, import the railway data extracted from an OpenStreetMap Overpass API server in **osmrailway** and create a json report containing OSM node information, which can then be viewed in the 'visualisation' sub-directory using a d3/leaflet mashup. \n\nDue to the large size of the data associated with the British Isles (920k+ points), two approaches are used to render the information. The first is based on a random selection heuristic and can be seen here http://bl.ocks.org/anisotropi4/85107c0e617f382e8462b1f264998718 \n\n## overlapfilter\n\nThe second approach is a set of scripts that calculate a \"valid distance\" parameter for the data associated with the British Isles (920k+ points). 
The visualisation based on this overlap heuristic can be seen here: http://bl.ocks.org/anisotropi4/003ed4f355160a49f0c4b3e169191ac8\n\n## volpe\n\nThis contains a set of scripts to create an ArangoDB edge collection and a Foxx service that provides a shortest-path d3/leaflet mashup visualisation using a simple URL /startnode/endnode/ microservice\n\n## ogrrailway\n\nA set of query shell-scripts that extract British railway data using the [osmctools](https://gitlab.com/osm-c-tools/osmctools) toolset rather than the Overpass API under a Debian-based Linux distribution. The railway data is then processed using an [ArangoDB database](https://www.arangodb.com/) for visualisation in a d3/Leaflet javascript mashup\n\nMore information about OpenStreetMap can be found here:\n  * OpenStreetMap: http://www.openstreetmap.org\n\n## markdown\n\nA set of python and shell scripts to locally process and convert between [Markdown](https://daringfireball.net/projects/markdown) `.md` and `.yaml` format files, and output `.html`\n\n## xl2tsv\n\nA python script that dumps the content of xls(x) files to `[\u003csource-filename\u003e:]\u003ctabname\u003e.tsv` files in the (default) output directory.\n\n\n## pwdcheck\n\nA set of scripts to generate and search 128 ordered sha1sum hash files for passwords known to have been hacked. Thanks to Troy Hunt and https://haveibeenpwned.com/ for making this data available\n\n## 'bin' directory scripts  \n\nThe 'bin' directory 'aql' scripts are used extensively in the 'goldfinch' and other projects and should be installed in the user-account `${HOME}/bin` directory:\n\n### **create_table.py**\n\nBased on the column names in a tsv-format file, this python3 script creates a PostgreSQL import script. 
Run the script to create the table create/import script 'table_CORPUS.sql' that imports the file 'CORPUS.tsv':  \n`$ bin/create_table.py CORPUS.tsv`\n\n To then import 'CORPUS.tsv' into the table table_corpus (database user 'finch' and postgres server 'raven') run the following:  \n`$ \u003c table_CORPUS.sql psql -U finch -h raven` \n * The tablename is lowercase 'table_corpus'\n * All columns are varchar by default, but this can be changed in the import script ahead of the import  \n * csv is also supported by editing the create_table.py script\n\n### **aqls.sh**  \nA command-line wrapper script for arangodb that accepts a query either as quoted text on the command line or from an input file. Connection parameters are set in shell environment variables as follows:\n* username      ARUSR default root\n* password      ARPWD default lookup as key:value pair from $HOME/.aqlpass file\n* server-name   ARSVR default ar-server\n* database-name ARDBN default _system\n\n For example, select five elements from the collection 'fullnodes':  \n`$ aqlx.sh 'for i in fullnodes limit 5 return i'`  \n\n The same query using the script file 'test-script.aql':  \n`$ cat test-script.aql`  \n`for i in fullnodes`  \n`limit 5`  \n`return i`  \n`$ \u003c test-script.aql aql.sh`\n\nThe output is json, pretty-printed using the 'jq' command-line tool https://stedolan.github.io/jq\n\n### **aqlx.sh**  \nA command-line wrapper script for arangodb, identical to 'aqls.sh' but without the 'jq' pretty-print.  
\n\n### **ar-env.sh**  \nA wrapper script to set the following shell environment parameters used by the aqls.sh and aqlx.sh arangodb wrapper scripts\n* username      ARUSR default root  \n* password      ARPWD default lookup as key:value pair from $HOME/.aqlpass file  \n* server-name   ARSVR default ar-server  \n* database-name ARDBN default _system  \n\nIf the ARPWD password variable is not set, the script uses the 'jq' command-line tool https://stedolan.github.io/jq to look up the password for the user in the json-format file $HOME/.aqlpass  \n`$ cat ~/.aqlpass`  \n`{\"root\": \"dontbedaft\", \"nodeuser\": \"tryharder\"}`  \n\nNotes: The key element is the use of the quadtree function in the visiblenodes function to find nodes quickly. This is based on at least:  \n * The excellent work of Mike Bostock in developing d3 (\u003chttps://bost.ocks.org/mike\u003e)  \n * Scott Murray's 'Interactive Data Visualization for the Web' (\u003chttp://alignedleft.com/work/d3-book\u003e)  \n * The Sumbera implementation 'Many points with d3 and leaflet' here \u003chttp://bl.ocks.org/sumbera/10463358\u003e\n * OpenStreetMap data and maptiles (\u003chttps://www.openstreetmap.org\u003e)  \n * Leaflet javascript library (\u003chttp://leafletjs.com\u003e)  \n\n### **add-x-tag.sh**\nA wrapper script that applies a filter (default 'cat') and adds an arbitrary xml tag (default \"_wrapper\") to an xml-file for use in a shell script. 
This allows large xml files to be split and inserted into a pipeline for easier processing.\n\n### **rmxmlns.sh**\nA wrapper script that uses the `xsltproc` transformation `rmxmlns.xslt` to remove namespace information from an xml-file.\n\nAssumptions\n  * The xml transformation `xsltproc` utility is installed\n\nThe `rmxmlns.sh` xslt transformation is based on the answer by \"jasso\" in the Stack Overflow discussion \u003chttps://stackoverflow.com/questions/5268182/how-to-remove-namespaces-from-xml-using-xslt\u003e.\n\n   On a Debian-based Linux distribution run:\n\n  `$ sudo apt install xsltproc`\n\n  \n### **xml-to-ndjson.sh**\nA wrapper script that transforms xml to ndjson files in a shell pipeline. The transformation takes a pre-split temporary xml-file with an arbitrary wrapper xml-tag, applies the transformation using the `xml-to-json` script, and deletes the temporary file. \n\nAssumptions\n  * `jq` tool is installed (\u003chttps://stedolan.github.io/jq\u003e)\n  * `xml-to-json` utility is installed (\u003chttps://github.com/sinelaw/xml-to-json\u003e)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanisotropi4%2Fgoldfinch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanisotropi4%2Fgoldfinch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanisotropi4%2Fgoldfinch/lists"}