https://github.com/dmitryduev/roboao-archive

Robo-AO archive: data processing and access
Host: GitHub
URL: https://github.com/dmitryduev/roboao-archive
Owner: dmitryduev
Created: 2016-04-02T01:20:16.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2017-09-22T08:07:46.000Z (over 7 years ago)
Last Synced: 2024-10-07T12:42:24.779Z (8 months ago)
Language: Python
Homepage:
Size: 1.29 MB
Stars: 0
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Robo-AO data archive

This repository contains the code used for automated [Robo-AO](http://roboao.caltech.edu) data processing, together with the (web) tools to access the data. It includes the pipeline that processes faint target observations, estimates Strehl ratios, runs PSF subtraction, generates contrast curves, and produces preview images for individual objects. It also generates nightly summary plots with estimated seeing, 'joint' contrast curves, and Strehl ratios.

>Robo-AO is the first automated laser guide star system. It is currently installed on the Kitt Peak National Observatory's 2.1 meter telescope in Arizona.

**archive.py** is the data processing engine.  

**server\_data\_archive.py** is the web-server for data access.

--- 

## How do I deploy the archiving system?

### Prerequisites

* pm2 process manager

* python libraries

  * flask

  * huey (Dima's forked version with a few tweaks)

  * pymongo (provides `MongoClient`)

  * image_registration (Dima's forked version with a few tweaks)

  * vip 

  ...

- Install fftw3

On mac:

```

brew install fftw

```

On Fedora:

```

yum install fftw3

```

- Install pyfftw (see its GitHub page for details). Make sure to use the right pip, i.e. the one from Anaconda:

```

pip install pyfftw

```

- Clone the image_registration repository from https://github.com/dmitryduev/image_registration.git

 I've made it use pyfftw by default, which is significantly faster than numpy's FFT

 and 10-20% faster than the fftw3 wrapper used in image_registration by default:

```

git clone https://github.com/dmitryduev/image_registration.git

```

- Install it (run this from inside the cloned `image_registration` directory):

```

python setup.py install --record files.txt

```

- To remove:

```

cat files.txt | xargs rm -rf

```

Clone the repository:

```bash

git clone https://github.com/dmitryduev/roboao-archive.git

```

---

### Configuration file (settings and paths)

* config.ini
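
As a rough illustration of how settings and paths in an INI file can be read from Python, here is a minimal sketch using the standard `configparser` module. The section names (`Path`, `Database`) and keys below are hypothetical placeholders; the actual `config.ini` shipped with the repository defines its own layout.

```python
import configparser

# Hypothetical sample; the real config.ini defines its own sections and keys.
SAMPLE_CONFIG = """
[Path]
path_archive = /path/to/archive/

[Database]
host = 127.0.0.1
port = 27017
db = roboao
user = roboao
"""


def load_settings(text):
    """Parse INI-formatted text into a flat dictionary of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        'path_archive': parser.get('Path', 'path_archive'),
        'db_host': parser.get('Database', 'host'),
        'db_port': parser.getint('Database', 'port'),  # converted to int
        'db_name': parser.get('Database', 'db'),
        'db_user': parser.get('Database', 'user'),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings)
```

To read a file on disk instead of a string, `parser.read('config.ini')` can be used in place of `read_string`.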

---

### Set up and use MongoDB with authentication

Install MongoDB 3.4

(yum on Fedora; homebrew on MacOS)

On Mac OS use ```homebrew```. No need to use root privileges.

```

brew install mongodb

```

On Fedora, you will likely need to perform these steps as root (```su -```).

 Create a file ```/etc/yum.repos.d/mongodb.repo```, add the following:  

```

[mongodb]

name=MongoDB Repository

baseurl=https://repo.mongodb.org/yum/redhat/7/mongodb-org/3.4/x86_64/ 

gpgcheck=0

enabled=1

```

 Install with yum:

```

yum install -y mongodb-org

```

Edit the config file. Config file location:  

```bash

/usr/local/etc/mongod.conf (Mac OS brewed)

/etc/mongod.conf (Linux)

```

Comment out:

```bash

#  bindIp: 127.0.0.1

```

Add: _(this is actually unnecessary)_

```bash

setParameter:

    enableLocalhostAuthBypass: true

```

Create (a new) folder to store the databases:

```bash

mkdir /Users/dmitryduev/web/mongodb/ 

```

In mongod.conf, replace the standard path with the custom one:

```bash

storage:
  dbPath: /Users/dmitryduev/web/mongodb/

```

**On Mac (on Fedora, will start as a daemon on the next boot)**

Start mongod without requiring authentication:

```bash

mongod --dbpath /Users/dmitryduev/web/mongodb/ 

```

If you're running MongoDB on a NUMA machine

(connect with the ```mongo``` command and it will tell you if that's the case):

```bash

numactl --interleave=all mongod -f /etc/mongod.conf

```

Connect to mongodb with mongo and create superuser (on Fedora, proceed as root):

```bash

# Create your superuser

$ mongo

> use admin

> db.createUser(

    {

        user: "admin",

        pwd: "roboaokicksass", 

        roles: [{role: "userAdminAnyDatabase", db: "admin"}]})

> exit 

```

Connect to mongodb (root is no longer necessary):

```bash

mongo -u "admin" -p "roboaokicksass" --authenticationDatabase "admin" 

```

Add user to your database:

```bash

$ mongo

# This will create a database called 'roboao' if it is not there yet

> use roboao

# Add user to your DB

> db.createUser(

    {

      user: "roboao",

      pwd: "roboaokicksass",

      roles: ["readWrite"]

    }

)

# Optionally create collections:

> db.createCollection("objects")

> db.createCollection("aux")

> db.createCollection("users")

# this will be later done from python anyways 

```

If you get locked out, start over (on Linux)

```bash

sudo service mongod stop

sudo service mongod start

```

To run the database manually (i.e. not as a service):

```bash

mongod --auth --dbpath /Users/dmitryduev/web/mongodb/

```

Connect to database from pymongo:

```python

from pymongo import MongoClient

client = MongoClient('ip_address_or_uri')

db = client.roboao

db.authenticate('roboao', 'roboaokicksass')

```

Check it out (optional):

```python

db['some_collection'].find_one()

```
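
Note that `Database.authenticate` was removed in PyMongo 4.0; with newer versions the credentials are passed to `MongoClient` itself, for example through a connection URI. The sketch below only builds such a URI with the standard library. The function name `build_mongo_uri` and the sample password are illustrative, not part of the archive code.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port, database):
    """Build a MongoDB connection URI with percent-escaped credentials.

    The database in the URI path is also used as the authentication
    database unless an explicit authSource option is given.
    """
    return 'mongodb://{}:{}@{}:{}/{}'.format(
        quote_plus(user), quote_plus(password), host, port, database)


# Special characters in the password (here '@') must be escaped.
uri = build_mongo_uri('roboao', 'p@ss', '127.0.0.1', 27017, 'roboao')
print(uri)
```

The resulting string can then be handed to `MongoClient(uri)`, after which `client.roboao` is already authenticated.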

#### Add admin user for data access on the website

Connect to database from pymongo and do an insertion:

```python

from pymongo import MongoClient

from werkzeug.security import generate_password_hash

import datetime

client = MongoClient('ip_address_or_uri')

# select database 'roboao'

db = client.roboao

db.authenticate('roboao', 'roboaokicksass')

coll = db['users']

result = coll.insert_one(

        {'_id': 'admin',

         'password': generate_password_hash('robopassword'),

         'programs': 'all',

         'last_modified': datetime.datetime.now()}

)

```

Refer to this [tutorial](https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set/)

to replicate the database.

**Use [Robomongo](https://robomongo.org) to display/edit DB data!! It's super handy!**  

Useful tip: check [this](https://docs.mongodb.com/manual/tutorial/enable-authentication/) out.

---

### Set up a Redis-based task queue to consume and process archiving jobs

Install the _huey_ task queue with 2 patches from DAD (see utils.py and consumer.py):

```bash

git clone https://github.com/dmitryduev/huey.git

cd huey

python setup.py install --record files.txt

```

Install redis-server if necessary.

Start Redis server on the standard port 6379 with pm2:

```bash

pm2 start redis-server -- --port 6379

```

In archive.py, make sure the huey instance points at the running Redis server with the correct settings:

```python

from huey import RedisHuey

huey = RedisHuey(name='roboao.archive', host='127.0.0.1', port='6379', result_store=True)

```

(this should not raise any exceptions)

**With pm2, everything that's after '--' is passed to the script**

Start the task consumer with 4 parallel process-based workers (`-k process -w 4`) in quiet mode (`-q`), polling the queue every 10 seconds (`-d 10`), with periodic (crontab-like) tasks disabled (`-n`):

```bash

pm2 start huey_consumer.py --interpreter=/path/to/python -- /path/to/module.huey -k process -w 4 -d 10 -n -q

```

```

pm2 start huey_consumer.py --interpreter=/path/to/python -- /Users/dmitryduev/web/roboao-archive/archive.huey -k process -w 4 -d 10 -n -q

```

Check its status with ```pm2 status```. (saw errors a couple of times)

**It's a good idea to allocate ~half the number of the available cores**
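
A quick way to pick a value for the consumer's `-w` option following this rule of thumb is sketched below. The helper name `suggested_workers` is made up for illustration.

```python
import os


def suggested_workers(total_cores=None):
    """Return roughly half of the available cores, but at least one."""
    if total_cores is None:
        # os.cpu_count() may return None if the count cannot be determined
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // 2)


print(suggested_workers())
```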

_The Redis server and the task consumer are paused or stopped during daily nap time, which might be unnecessary_

```bash

pm2 stop redis-server

pm2 stop huey_consumer.py

```

Start MongoDB (if it is not running already):

```bash

mongod --auth --dbpath /Users/dmitryduev/web/mongodb/

```

**Run the archiver!**

```bash

python archive.py config.ini

```

### Data access via the web-server

Make sure to install python dependencies:

```

git clone https://github.com/pyvirtobs/pyvo.git

cd pyvo && /path/to/python setup.py install

pip install flask-login

```

Run the data access web-server using the pm2 process manager:

```bash

pm2 start server_data_archive.py --interpreter=/path/to/python -- path/to/config.ini

```

#### A short tutorial on how to use the web-site (once it's ready)

---

## Implementation details

* MongoDB noSQL database

* huey task queue + redis-server

* Flask back-end for the web tools

---

## How to work with the database

Mark all observations as not distributed (this will force them to be distributed again):

```python

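import datetime

# pymongo syntax; `db` is the authenticated database handle obtained above.
# datetime.datetime.utcnow() stands in for utc_now(), which is not defined in this snippet.
db['objects'].update_many(
    {},
    {'$set': {'distributed.status': False,
              'distributed.last_modified': datetime.datetime.utcnow()}}
)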

```

Force faint pipeline on a target:

```python

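import datetime

# pymongo syntax; `db` is the authenticated database handle obtained above.
# datetime.datetime.utcnow() stands in for utc_now(), which is not defined in this snippet.
db['objects'].update_one(
    {'_id': '4_351_Yrsa_VIC_lp600_o_20160925_110427.040912'},
    {'$set': {'pipelined.faint.status.force_redo': True,
              'pipelined.faint.last_modified': datetime.datetime.utcnow()}}
)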

```
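
The update documents used for such forced re-runs share one shape, so they can be generated by a small helper. This is only a sketch: the field names follow the `pipelined.faint` example above, and applying the same layout to other pipeline names is an assumption. The helper `force_redo_update` is hypothetical and not part of archive.py.

```python
from datetime import datetime, timezone


def force_redo_update(pipeline, now=None):
    """Build a MongoDB update document that flags a pipeline for re-processing."""
    if now is None:
        now = datetime.now(timezone.utc)  # timezone-aware UTC timestamp
    return {
        '$set': {
            'pipelined.{}.status.force_redo'.format(pipeline): True,
            'pipelined.{}.last_modified'.format(pipeline): now,
        }
    }


print(force_redo_update('faint'))
```

With pymongo, the result would be passed as the second argument of `update_one` or `update_many`.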

Change ownership (PI) of a program:

```python

# pymongo syntax; `db` is the authenticated database handle obtained above.
db['objects'].update_many(
    {'science_program.program_id': '4'},
    {'$set': {'science_program.program_PI': 'asteroids'}}
)

```

Remove psflib data from _aux_ collection in the database:

```

    db.getCollection('aux').update({}, {$unset: {'psf_lib': ''}}, {multi: true})

```

---

## Archive structure

The processed data are structured in the way described below. It should be straightforward to restore the database in case of a 'database disaster' keeping this structure in mind (in fact, **archive.py** will take care of that automatically once the database is up and running).

##### Science observations + daily summary plots (seeing, Strehl, contrast curves)

```

/path/to/archive/

├──yyyymmdd/

   ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS/

   │  ├──automated/

   │  │  ├──preview/

   │  │  │  ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_full.png

   │  │  │  └──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_cropped.png

   │  │  ├──strehl/

   │  │  │  ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_strehl.txt

   │  │  │  └──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_box.fits

   │  │  ├──pca/

   │  │  │  ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_pca.png

   │  │  │  ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_contrast_curve.png

   │  │  │  ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_contrast_curve.txt

   │  │  │  └──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_pca.fits

   │  │  └──_tentatively_put_lucky_output_here_?

   │  ├──faint/

   │  │  ├──preview/

   │  │  │  └──...

   │  │  ├──strehl/

   │  │  │  └──...

   │  │  ├──pca/

   │  │  │  └──...

   │  │  └──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_faint.fits

   │  ├──planetary/

   │  │  ├──preview/

   │  │  │  └──...

   │  │  └──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS_planetary.fits

   │  └──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS.tar.bz2

   ├──.../

   ├──summary/

   │  ├──psflib/

   │  │  ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.png

   │  │  ├──programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.fits

   │  ├──seeing/

   │  │  ├──yyyymmdd_hhmmss.png

   │  │  ├──...

   │  │  ├──seeing.yyyymmdd.txt

   │  │  └──seeing.yyyymmdd.png

   │  ├──contrast_curve.yyyymmdd.png

   │  └──strehl.yyyymmdd.png

   └──calib/?

├──.../

└──psf_library.fits

```
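
To make the naming scheme concrete, the sketch below splits an observation ID of the form `programID_objectName_camera_filter_mark_yyyymmdd_HHMMSS.SSSSSS` into its fields and assembles the path of a preview image. It assumes that only the object name may contain underscores, and that the `yyyymmdd/` folder matches the date embedded in the ID. The helper names are illustrative and do not come from archive.py.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_observation_id(obs_id):
    """Split an observation ID into its fields.

    The last five fields are taken from the right, so underscores in the
    object name (e.g. '351_Yrsa') do not break the parsing.
    """
    head, camera, filt, mark, date, time = obs_id.rsplit('_', 5)
    program_id, object_name = head.split('_', 1)
    return {
        'program_id': program_id,
        'object_name': object_name,
        'camera': camera,
        'filter': filt,
        'mark': mark,
        'date': date,
        'datetime': datetime.strptime(date + '_' + time, '%Y%m%d_%H%M%S.%f'),
    }


def preview_path(archive_root, obs_id, pipeline='automated', kind='full'):
    """Build the path of a preview image following the tree shown above."""
    date = parse_observation_id(obs_id)['date']
    return (PurePosixPath(archive_root) / date / obs_id / pipeline
            / 'preview' / '{}_{}.png'.format(obs_id, kind))


obs_id = '4_351_Yrsa_VIC_lp600_o_20160925_110427.040912'
print(parse_observation_id(obs_id))
print(preview_path('/path/to/archive', obs_id))
```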

---

## Processing flowcharts

If you're seeking to understand that

### PSF library management

![alt text](/doc/psflib1.png)

![alt text](/doc/psflib2.png)