{"id":19310257,"url":"https://github.com/cedadev/ceda-di","last_synced_at":"2026-06-09T20:31:27.201Z","repository":{"id":18314742,"uuid":"21493591","full_name":"cedadev/ceda-di","owner":"cedadev","description":null,"archived":false,"fork":false,"pushed_at":"2020-08-19T09:57:05.000Z","size":4881,"stargazers_count":2,"open_issues_count":6,"forks_count":2,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-11-14T21:05:21.889Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cedadev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-07-04T10:18:26.000Z","updated_at":"2020-08-19T09:57:08.000Z","dependencies_parsed_at":"2022-07-16T04:00:39.394Z","dependency_job_id":null,"html_url":"https://github.com/cedadev/ceda-di","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/cedadev/ceda-di","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fceda-di","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fceda-di/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fceda-di/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fceda-di/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cedadev","download_url":"https://codeload.github.com/cedadev/ceda-di/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedadev%2Fceda-di/sbom","scorecard":null,"host":{"name
":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34125332,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T00:23:22.191Z","updated_at":"2026-06-09T20:31:27.180Z","avatar_url":"https://github.com/cedadev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"ceda-di\n=======\n\nThe ceda-di project is a suite of Python scripts and tools for extracting\nmetadata from various scientific data formats, including:\n\n* ENVI BIL/BSQ\n* NetCDF\n* GeoTIFF (EXIF metadata)\n* HDF\n\nThe Python backend is designed to run on a system with a large number of CPU\ncores. It extracts metadata from scientific data files and outputs it as\nplatform-independent JSON documents.\n\nThis JSON metadata can then be stored in a NoSQL data store such as\nElasticSearch. This repository contains some example applications of the\ntoolkit, including an Ansible playbook to set up and configure an ElasticSearch\ncluster. 
This repo also contains a sample web interface that allows for\nreal-time faceted search (including full-text, temporal, and geospatial facets)\nand live display of data files on a map.\n\n\nDocumentation Status\n====================\n\n[![Documentation Status](https://readthedocs.org/projects/ceda-di/badge/?version=latest)](https://readthedocs.org/projects/ceda-di/?badge=latest)\n\n\nGuide to This Repository\n========================\n\n* \"elasticsearch\"\n    * Ansible playbook for setting up an ElasticSearch cluster\n    * Schema for JSON metadata\n    * ElasticSearch mapping for JSON metadata\n    * Sample ElasticSearch queries\n* \"examples\"\n    * Simple Python script to plot the output of a request to ElasticSearch\n    * A Google Maps demonstration interface\n* \"lotus\"\n    * Helper scripts for running the Python suite on a cluster\n* \"python\"\n    * The main Python backend for metadata extraction\n\nSetting up the environment\n==========================\n\nDownload and run the install script, which fetches the ceda-di code from Git and installs it:\n\n```\n$ wget https://raw.githubusercontent.com/cedadev/ceda-di/master/python/src/scripts/install-ceda-di.sh\n$ . ./install-ceda-di.sh\n```\n\nThe script builds a local `virtualenv`, so your directory should then look like this:\n\n```\n$ ls\nceda-di  install-ceda-di.sh  venv-ceda-di\n```\n\nCreate a setup script\n---------------------\n\nAdjust `BASEDIR` to point at your own installation directory:\n\n```\n$ cat setup_env.sh\nexport BASEDIR=/group_workspaces/jasmin/cedaproc/$USER/ceda-di\nexport PYTHONPATH=$BASEDIR/ceda-di/python:$BASEDIR/ceda-di/python/src/ceda_di:$PYTHONPATH\nexport PATH=$PATH:$BASEDIR/ceda-di/python/src/scripts\n. 
venv-ceda-di/bin/activate\n```\n\nExample usage\n=============\n\nSet up the environment first, as described above.\n\nAdd a directory of files to an index:\n\n```\n$ cd ceda-di/python/src/\n$ python di.py extract --no-create-files --config /home/badc/software/datasets/ceda-eo-prod/ceda-di/python/config/ceda-di-ceda-eo.json --send-to-index /neodc/sentinel2a/data/L1C_MSI/2017/01/14/\n```\n\nAdd a single file to an index by listing it in a file passed to `--file-list-file`:\n\n```\n$ file_to_add=/neodc/sentinel2a/data/L1C_MSI/2017/01/14/S2A_MSIL1C_20170114T191501_N0204_R027_T01CDQ_20170114T191458.manifest\n$ echo \"$file_to_add\" \u003e to_index.txt\n$ python di.py extract --no-create-files --config /home/badc/software/datasets/ceda-eo-prod/ceda-di/python/config/ceda-di-ceda-eo.json --send-to-index --file-list-file to_index.txt\n```\n\nCheck which `.manifest` files in directory `/neodc/sentinel2a/data/L1C_MSI/2017/01/14` have been indexed in the `ceda-eo` index:\n\n```\n$ ./scripts/find_indexed_files.py -i ceda-eo -e manifest -d /neodc/sentinel2a/data/L1C_MSI/2017/01/14\n```\n\nThis writes the paths of any files missing from the index to `files_not_found.txt`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fceda-di","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcedadev%2Fceda-di","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedadev%2Fceda-di/lists"}