{"id":15904785,"url":"https://github.com/dmitryduev/archiver","last_synced_at":"2026-06-23T12:33:05.444Z","repository":{"id":105069835,"uuid":"95834050","full_name":"dmitryduev/archiver","owner":"dmitryduev","description":"customizable constructor for building house-keeping systems for small to midsize projects in astronomy","archived":false,"fork":false,"pushed_at":"2019-04-04T23:06:49.000Z","size":1200,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-10-07T12:42:08.271Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmitryduev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-30T01:11:49.000Z","updated_at":"2021-07-27T01:56:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"60e24570-fceb-44d5-a2ad-85848dc2fb8f","html_url":"https://github.com/dmitryduev/archiver","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dmitryduev/archiver","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmitryduev%2Farchiver","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmitryduev%2Farchiver/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmitryduev%2Farchiver/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmitryduev%2Farchiver/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/dmitryduev","download_url":"https://codeload.github.com/dmitryduev/archiver/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmitryduev%2Farchiver/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34688124,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T12:42:01.873Z","updated_at":"2026-06-23T12:33:05.427Z","avatar_url":"https://github.com/dmitryduev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Archiver\n\nThis repository contains code that is used for the [Robo-AO](http://roboao.caltech.edu) \nautomated data processing together with the (web-)tools to access the data.  \n\n\u003eRobo-AO is the first automated laser guide star system that is currently installed on the Kitt Peak National Observatory's 2.1 meter telescope in Arizona. \n\n**archiver.py** is the data processing engine.  \n**server\\_data\\_archive.py** is the web-server for data access.\n\nThe architecture is intended to be easily adaptable to the needs of a moderately sized (astronomy) project. 
\n\n## System overview\n- Distributed architecture \n    * (Mostly) written in OO python 3.6\n    * Master process utilizes dask.distributed python module\n    * Worker processes can be deployed on a single machine or a moderately-sized cluster\n- MongoDB NoSQL DB for house-keeping\n- Multiple data reduction pipelines\n    * Bright star pipeline\n    * High-contrast pipeline\n    * Faint star pipeline\n    * Extended object pipeline\n    * Astrometric solution\n    * Strehl ratio computation\n- Interactive data products (web) access\n    * Powered by Flask python module\n    * Previews for individual objects and (nightly) summaries\n    * Data streamed to dynamically rendered templates\n    * Production deployment using nginx/supervisord/gunicorn\n- Easily extendable and customizable\n    * Decision chains for different observations\n    * Subclasses + JSON config-files\n- [Open source!](https://github.com/dmitryduev/archiver)\n\n\nFor scientific and technical details please refer to \n[Jensen-Clem, Duev, Riddle+ 2017](https://arxiv.org/pdf/1703.08867.pdf) \n\n\n## How do I deploy the Archiver?\n\n### Prerequisites\n* python libraries (in addition to what comes with `anaconda`)\n  * `flask-login`\n  * `pymongo`\n  * `image_registration` (a forked version with a few tweaks)\n  * `VIP` (a forked version 0.7.5 ported to python 3.6 as of October 2017)\n  * `lacosmicx` \n  * `sewpy`\n  * `APLpy`\n\n- Install `fftw3`\nOn mac:\n```\nbrew install fftw\n```\nOn Fedora:\n```\nyum install fftw3\n```\n- Install `pyfftw` (also see their github page for details) (use the right `pip`! 
(the one from `anaconda`)):\n```\npip install pyfftw\n```\n- Clone `image_registration` repository from https://github.com/dmitryduev/image_registration.git\n I've made it use `pyfftw` by default, which is significantly faster than `numpy`'s fft,\n and somewhat faster (10-20%) than the `fftw3` wrapper used in `image_registration` by default:\n```\ngit clone https://github.com/dmitryduev/image_registration.git\n```\n- Install it:\n```\ncd image_registration\npython setup.py install --record files.txt\n```\nIf it fails in a python3 `conda` env, run the setup command again.\n\n- To remove:\n```\ncat files.txt | xargs rm -rf\n```\n\nClone the `lacosmicx` repository:\n```bash\ngit clone https://github.com/cmccully/lacosmicx.git\n```\nInstall in a manner similar to `image_registration`.\n\nClone the `VIP` repository:\n```bash\ngit clone https://github.com/dmitryduev/VIP.git\n```\nInstall in a manner similar to `image_registration`.\nInstall the `future` package to make it work in python 3.6:\n```bash\npip install future\n```\n\nInstall `APLpy`:\n```bash\npip install aplpy\n```\n\nCompile the bright star pipeline code:\n```bash\ncd archive/roboao\n# modify Makefile where necessary\nmake\n```\n\nInstall [SExtractor](https://www.astromatic.net/software/sextractor).\n\nClone the `sewpy` repository:\n```bash\ngit clone https://github.com/megalut/sewpy\n```\nInstall in a manner similar to `image_registration`.\n\nInstall `pymongo`:\n```bash\nconda install pymongo\n```\n\nInstall [lbzip2](http://lbzip2.org/).\n\nClone the `archiver` repository:\n```bash\ngit clone https://github.com/dmitryduev/archiver.git\n```\n\n---\n\n### Configuration file (settings and paths)\n\n* `config.json`\n    * Provided as an example\n    * Modify paths/settings as necessary\n---\n\n### Set up and use MongoDB with authentication\nInstall `MongoDB` 3.4\n(`yum` on Fedora; `homebrew` on MacOS)\nOn Mac OS use `homebrew`. 
No need to use root privileges.\n```\nbrew install mongodb\n```\nOn Fedora, you will likely need to run these steps as root (```su -```)\n Create a file ```/etc/yum.repos.d/mongodb.repo```, add the following:  \n```\n[mongodb]\nname=MongoDB Repository\nbaseurl=https://repo.mongodb.org/yum/redhat/7/mongodb-org/3.4/x86_64/ \ngpgcheck=0\nenabled=1\n```\n Install with `yum`:\n```\nyum install -y mongodb-org\n```\n\nEdit the config file. Config file location:  \n```bash\n/usr/local/etc/mongod.conf (Mac OS brewed)\n/etc/mongod.conf (Linux)\n```\n\nComment out:\n```bash\n#  bindIp: 127.0.0.1\n```\nAdd: _(this is actually unnecessary)_\n```bash\nsetParameter:\n    enableLocalhostAuthBypass: true\n```\n\nCreate (a new) folder to store the databases:\n```bash\nmkdir /Users/dmitryduev/web/mongodb/ \n```\nIn mongod.conf, replace the standard path with the custom one:\n```bash\ndbpath: /Users/dmitryduev/web/mongodb/\n```\n\n**On Mac (on Fedora, will start as a daemon on the next boot)**\nStart `mongod` without authorization requirement:\n```bash\nmongod --dbpath /Users/dmitryduev/web/mongodb/ \n```\n\nIf you're running `MongoDB` on a NUMA machine \n(connect with the `mongo` command and it will tell you if that's the case):\n```bash\nnumactl --interleave=all mongod -f /etc/mongod.conf\n```\n\n\nConnect to MongoDB with `mongo` and create a superuser (on Fedora, proceed as root):\n```bash\n# Create your superuser\n$ mongo\n\u003e use admin\n\u003e db.createUser(\n    {\n        user: \"admin\",\n        pwd: \"yoursecretpassword\", \n        roles: [{role: \"userAdminAnyDatabase\", db: \"admin\"}]})\n\u003e exit \n```\nConnect to MongoDB (root is no longer necessary):\n```bash\nmongo -u \"admin\" -p \"yoursecretpassword\" --authenticationDatabase \"admin\" \n```\nAdd user to your database:\n```bash\n$ mongo\n# This will create a database called 'roboao' if it is not there yet\n\u003e use roboao\n# Add user to your DB\n\u003e db.createUser(\n    {\n      user: \"roboao\",\n   
   pwd: \"yoursecretpassword\",\n      roles: [\"readWrite\"]\n    }\n)\n# Optionally create collections:\n\u003e db.createCollection(\"objects\")\n\u003e db.createCollection(\"aux\")\n\u003e db.createCollection(\"users\")\n# this will be later done from python anyways \n```\nIf you get locked out, start over (on Linux)\n```bash\nsudo service mongod stop\nsudo service mongod start\n```\nTo run the database manually (i.e. not as a service):\n```bash\nmongod --auth --dbpath /Users/dmitryduev/web/mongodb/\n```\nConnect to database from `pymongo`:\n```python\nfrom pymongo import MongoClient\nclient = MongoClient('ip_address_or_uri')\ndb = client.roboao\ndb.authenticate('roboao', 'yoursecretpassword')\n# Check it out (optional):\ndb['some_collection'].find_one()\n```\n#### Add admin user for data access on the website\n\nConnect to database from `pymongo` and do an insertion:\n```python\nfrom pymongo import MongoClient\nfrom werkzeug.security import generate_password_hash\nimport datetime\nclient = MongoClient('ip_address_or_uri')\n# select database 'roboao'\ndb = client.roboao\ndb.authenticate('roboao', 'yoursecretpassword')\ncoll = db['users']\nresult = coll.insert_one(\n        {'_id': 'admin',\n         'password': generate_password_hash('robopassword'),\n         'programs': 'all',\n         'last_modified': datetime.datetime.now()}\n)\n```\n\nRefer to this [tutorial](https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set/)\nto replicate the database.\n\n**Use [Robo 3T](https://robomongo.org) to display/edit DB data!! 
It's super handy!**  \nUseful tip: check [this](https://docs.mongodb.com/manual/tutorial/enable-authentication/) out.\n\n---\n\n### Start the Archiver\n\nStart MongoDB (if not running already):\n```bash\nmongod --auth --dbpath /Users/dmitryduev/web/mongodb/\n```\n\n**Run the Archiver!** (preferably, in a _tmux_ session)\n```bash\ntmux new -s archiver\npython archiver.py config.json\n# ctrl+b, then d -\u003e to detach\n# tmux a -t archiver  -\u003e to attach back \n```\n\n### Data access via the web-server\n\nMake sure to install python dependencies:\n```\ngit clone https://github.com/pyvirtobs/pyvo.git\ncd pyvo \u0026\u0026 /path/to/python setup.py install --record files.txt\nconda install flask-login\n```\n\nTest the data access web interface:\n```bash\n/path/to/python server_data_archive.py /path/to/config.json\n```\n\n#### Production deployment with supervisord/gunicorn\nThe procedure is explained in detail in \n[server_setup.md](https://github.com/dmitryduev/archiver/blob/master/doc/server_setup.md).\n\n#### A short tutorial on how to use the web interface\n    TODO\n---\n\n## How to work with the database from MongoDB client\n\nMark all observations as not distributed (this will force):\n```bash\ndb.getCollection('objects').update({}, \n    { $set: \n        {'distributed.status': false,\n         'distributed.last_modified': new Date()}\n    }, \n    {multi: true}\n)\n```\n\nForce faint_star pipeline on a target:\n```bash\ndb.getCollection('objects').updateOne({'_id': '4_351_Yrsa_VIC_lp600_o_20160925_110427.040912'}, \n    { $set: \n        {'pipelined.faint_star.status.force_redo': true,\n         'pipelined.faint_star.last_modified': new Date()}\n    }\n)\n```\n\nChange ownership (PI) of a program:\n```bash\ndb.getCollection('objects').update({'science_program.program_id':'4'}, \n    { $set: \n        {'science_program.program_PI': 'asteroids'}\n    }, \n    {multi: true}\n)\n```\n\nRemove psflib data from _aux_ collection in the database:\n```\n    
db.getCollection('aux').update({}, {$unset: {'psf_lib': ''}}, {multi: true})\n```\n\n---\n\n## Archive structure\nThe processed data are structured in the way described below. \nIt should be straightforward to restore the database in case of a 'database disaster' \nkeeping this structure in mind (in fact, **archiver.py** will take care of that automatically \nonce the database is up and running).\n\n##### Science observations + daily summary plots (seeing, Strehl, contrast curves)\nFile naming and descriptions. Files of greatest interest to users are shown in bold.\nPlease refer to Jensen-Clem, Duev, Riddle+ 2017 \\[1\\] and references therein for technical details.\n\u003cpre\u003e\u003ccode\u003e\n/path/to/archive/\n├──yyyymmdd/\n   ├──ID/                              \u003c= observation ID of the form programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS\n   │  ├──bright_star/                  \u003c= Results of bright star pipeline (BSP) [1, section 3.2] \n   │  │  ├──preview/                   \u003c= Automatically generated previews\n   │  │  │  ├──ID_full.png\n   │  │  │  └──ID_cropped.png\n   │  │  ├──strehl/\n   │  │  │  ├──ID_strehl.txt            \u003c= Strehl ratio + star image metrics\n   │  │  │  └──ID_box.fits              \u003c= Cut around star used in Strehl computation\n   │  │  ├──pca/\n   │  │  │  ├──\u003cb\u003eID.fits\u003c/b\u003e                  \u003c= Result of high contrast pipeline [1, section 3.3] \n   │  │  │  ├──\u003cb\u003eID_contrast_curve.png\u003c/b\u003e    \u003c= 5-sigma contrast curve plot [1, section 3.3]\n   │  │  │  ├──\u003cb\u003eID_contrast_curve.txt\u003c/b\u003e    \u003c= 5-sigma contrast curve [1, section 3.3]\n   │  │  │  └──ID_pca.png               \u003c= Automatically generated preview\n   │  │  ├──frames.txt                  \u003c= Individual raw frame quality, guide star flux, lock position, bias level\n   │  │  ├──\u003cb\u003e20p.fits\u003c/b\u003e                    \u003c= 20% best quality 
frames registered and stacked with BSP [1, section 3.2]\n   │  │  └──\u003cb\u003e100p.fits\u003c/b\u003e                   \u003c= 100% of frames registered and stacked with BSP [1, section 3.2]\n   │  ├──faint_star/                    \u003c= Results of faint star pipeline (FSP) [1, section 3.2]\n   │  │  ├──preview/                    \u003c= Automatically generated previews\n   │  │  │  └──...\n   │  │  ├──strehl/                     \u003c= Same as for BSP. Currently not computed\n   │  │  │  └──...\n   │  │  ├──pca/                        \u003c= Same as for BSP. Currently not run\n   │  │  │  └──...\n   │  │  ├──\u003cb\u003eID_summed.fits\u003c/b\u003e              \u003c= Result of FSP [1, section 3.2]\n   │  │  └──ID_simple_sum.fits          \u003c= Simple stack of all raw frames without registration\n   │  ├──extended_object/               \u003c= Results of extended object pipeline [1, section 3.4]\n   │  │  ├──preview/\n   │  │  │  └──...\n   │  │  └──ID_deconvolved.fits\n   │  └──ID.tar.bz2                     \u003c= Compressed contents of folder ID/\n   ├──.../\n   ├──summary/                          \u003c= Nightly summary data\n   │  ├──psflib/\n   │  │  ├──ID.png\n   │  │  ├──ID.fits\n   │  │  └──...\n   │  ├──seeing/\n   │  │  ├──yyyymmdd_hhmmss.png        \u003c= Individual seeing observations\n   │  │  ├──...\n   │  │  ├──seeing.yyyymmdd.txt        \u003c= Summary seeing data\n   │  │  └──seeing.yyyymmdd.png        \u003c= Summary seeing plot\n   │  ├──contrast_curve.yyyymmdd.png   \u003c= Summary of contrast curves\n   │  └──strehl.yyyymmdd.png           \u003c= Observation Strehl ratios as a function of time\n   └──calib/                           \u003c= Master calibration data\n      ├──flat_c.fits\n      ├──flat_lp600.fits\n      ├──flat_Sg.fits\n      ├──flat_Sr.fits\n      ├──flat_Si.fits\n      ├──flat_Sz.fits\n      ├──dark_0.fits\n      ├──dark_6.fits\n      ├──dark_7.fits\n      ├──dark_8.fits\n      ├──dark_9.fits\n      
└──dark_10.fits\n├──.../\n└──psf_library.fits                    \u003c= PSF library for BSP high-contrast pipeline\n\u003c/code\u003e\u003c/pre\u003e\n\n---\n\n## Flowcharts\nIf you're seeking to understand how things (should) work:\n\n![alt text](/doc/pipeline.png)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmitryduev%2Farchiver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmitryduev%2Farchiver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmitryduev%2Farchiver/lists"}