{"id":14982741,"url":"https://github.com/azazellochg/mdcatch","last_synced_at":"2025-08-21T03:06:43.379Z","repository":{"id":37801044,"uuid":"201809857","full_name":"azazellochg/MDCatch","owner":"azazellochg","description":"Fetch metadata from EPU / SerialEM and launch on-the-fly pre-processing","archived":false,"fork":false,"pushed_at":"2023-07-04T15:04:17.000Z","size":622,"stargazers_count":9,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-10-12T01:21:36.662Z","etag":null,"topics":["cryo-em","epu","python","qt6","serialem"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/azazellochg.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.txt","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-11T20:05:11.000Z","updated_at":"2023-11-01T14:28:48.000Z","dependencies_parsed_at":"2024-09-28T08:00:36.151Z","dependency_job_id":null,"html_url":"https://github.com/azazellochg/MDCatch","commit_stats":{"total_commits":381,"total_committers":1,"mean_commits":381.0,"dds":0.0,"last_synced_commit":"bcd25862b44015e1ae93243e846a879f6f229d47"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/azazellochg%2FMDCatch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/azazellochg%2FMDCatch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/azazellochg%2FMDCatch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/azazellochg%2FMDCatch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/azazellochg","download_url":"https://codeload.github.com/azazellochg/MDCatch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219857515,"owners_count":16556062,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cryo-em","epu","python","qt6","serialem"],"created_at":"2024-09-24T14:05:56.575Z","updated_at":"2024-10-12T01:21:47.832Z","avatar_url":"https://github.com/azazellochg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"MDCatch\n=======\n\nA simple app to fetch acquisition metadata from a EPU session or SerialEM.\nIt parses the first found xml/mdoc/mrc/tif file (from EPU/SerialEM) associated with a\ndata collection session and launches Relion 4 or Scipion 3 pipeline.\n\nInstallation\n------------\n\nYou can install either using pip (recommended) or from sources.\n\n.. raw:: html\n\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eDependencies\u003c/a\u003e\u003c/summary\u003e\n\nDependencies are installed by pip automatically:\n\n * python\n * pyqt6 (GUI)\n * mrcfile (to parse MRC header)\n * tifffile (to parse TIF header)\n * emtable (STAR file parser)\n\n.. raw:: html\n\n   \u003c/details\u003e\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eInstall from pip\u003c/a\u003e\u003c/summary\u003e\n\n.. code-block:: python\n\n   pip install --user MDCatch\n\n.. 
raw:: html\n\n   \u003c/details\u003e\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eInstall from sources\u003c/a\u003e\u003c/summary\u003e\n\nCreate conda env (requires miniconda3 installed):\n\n.. code-block:: python\n\n    conda create -n mdcatch python=3\n    conda activate mdcatch\n    git clone https://github.com/azazellochg/MDCatch.git\n    cd MDCatch\n    pip install -e .\n\n.. raw:: html\n\n   \u003c/details\u003e\n\nScreenshots\n-----------\n\n.. image:: https://github.com/azazellochg/MDCatch/assets/6952870/621deb21-bec7-478e-8b89-61659765a383\n   :width: 640 px\n\n.. image:: https://github.com/azazellochg/MDCatch/assets/6952870/3d7475e5-7c34-42b9-b317-4662482c6c30\n   :width: 640 px\n\n\nRunning\n-------\n\nTo run simply type **mdcatch**.\n\n.. important:: Make sure the detected dose per frame is correct! The reported dose is obtained from an image (at the camera level), so it can differ due to sample thickness, obj. aperture and energy filtering. If you are collecting EER data, the reported dose is per EER frame! EER movies will be fractionated such that final frames will have 1 e/A\\ :sup:`2`.\n\nUser guide\n----------\n\nHere you can find information about how the app works and how to configure it for your setup.\n\n.. raw:: html\n\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eGeneral information\u003c/a\u003e\u003c/summary\u003e\n\nThe app is installed on a pre-processing server with GPU(s).\nThe server requires the following software installed:\n\n    - `RELION 4.0 \u003chttps://relion.readthedocs.io/en/release-4.0/\u003e`_ or/and `Scipion 3 \u003chttp://scipion.i2pc.es/\u003e`_\n    - `CTFFIND4 \u003chttps://grigoriefflab.umassmed.edu/ctffind4\u003e`_\n    - `Topaz \u003chttps://github.com/tbepler/topaz\u003e`_ or/and `crYOLO 1.9+ \u003chttps://cryolo.readthedocs.io/\u003e`_ (installed in a separate conda environment)\n\nRelion and/or Scipion should be available from your shell **PATH**. 
For Relion's schemes you also need to define the following variables:\n\n.. code-block:: bash\n\n    export RELION_SCRATCH_DIR=\"/ssd/$USER\"\n    export RELION_CTFFIND_EXECUTABLE=/home/gsharov/soft/ctffind\n    export RELION_TOPAZ_EXECUTABLE=/home/gsharov/soft/topaz\n    export RELION_PYTHON=/home/gsharov/soft/miniconda3/envs/topaz-0.2.4/bin/python  # is used by Relion's PyTorch for 2D cls sorting\n\n*/home/gsharov/soft/topaz* is a bash script like below, that activates topaz environment:\n\n.. code-block:: bash\n\n    #!/bin/bash\n    source /home/gsharov/soft/miniconda3/bin/activate topaz-0.2.4\n    topaz $@\n\nIf you are using crYOLO, you need to edit a few variables at the top of *external_job_cryolo.py* file. This script can also be used completely independently from MDCatch.\n\nAdditionally, this server needs access to both EPU session folder (with metadata files) and\nraw movies folder. In our case both storage systems are mounted via NFSv4.\n\n.. raw:: html\n\n   \u003c/details\u003e\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eConfiguration\u003c/a\u003e\u003c/summary\u003e\n\nMost of the configuration is done in **config.py**.\nFor the very first time it is useful to set **DEBUG=1** to see additional output and make sure it all works as expected.\n\nImportant points to mention:\n\n    * camera names in the SCOPE_DICT must match the names in EPU_MOVIES_DICT, GAIN_DICT and MTF_DICT\n    * since in EPU Falcon cameras are called \"BM-Falcon\" or \"EF-Falcon\" and Gatan cameras are called \"EF-CCD\", MOVIE_PATH_DICT keys should not be changed, only the values\n    * Relion schemes use two GPUs: 0-1\n\nBelow is an example of the folders setup on our server. Data points to movies storage, while Metadata is for EPU sessions.\n\n.. 
code-block:: bash\n\n    /mnt\n    ├── Data\n    │     ├── Krios1\n    │     │     ├── Falcon3\n    │     │     └── K3 (with DoseFractions folder inside)\n    │     ├── Krios2\n    │     │     ├── Falcon4\n    │     │     └── K2 (with DoseFractions folder inside)\n    │     ├── Krios3\n    │     │     ├── Falcon3\n    │     │     └── K3 (with DoseFractions folder inside)\n    │     ├── Krios4\n    │     │     └── Falcon4\n    │     └── Glacios\n    │           └── Falcon3\n    └── MetaData\n        ├── Krios1\n        ├── Krios2\n        ├── Krios3\n        └── Krios4\n\n.. raw:: html\n\n   \u003c/details\u003e\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eWorking principle\u003c/a\u003e\u003c/summary\u003e\n\n\nRunning steps\n#############\n\n1. find and parse the first metadata file, getting all acquisition metadata\n2. create a Relion/Scipion project folder ``username_microscope_date_time`` inside PROJECT_PATH (or inside Scipion default projects folder)\n3. create symlink for movies folder; copy gain reference, defects file, MTF into the project folder\n4. save found acquisition params in a text file (e.g. ``EPU_session_params``), save Relion params in ``relion_it_options.py``\n5. modify existing Relion Schemes/Scipion template, copy them to the project folder then launch Relion/Scipion on-the-fly processing\n\nMetadata formats\n################\n\nWhile EPU xml files are most rich in terms of needed metadata, other formats can be used as well. If you set PATTERN_EPU to mrc format, the app will try to parse MRC header of unaligned movie sums in the EPU session folder.\nHowever we cannot detect number of movie frames and super-resolution mode from such a header, so you would need to check and input correct pixel size and/or fluence per frame.\n\nIn case of SerialEM, mdoc file is expected to contain a microscope D-number (see example in *tests/testdata*). 
If you set PATTERN_SEM to tif, the TIF header of a movie will be parsed.\nUnfortunately SerialEM does not save much metadata in such header, so a lot of values will be missing. Default values will be used for microscope ID, detector, voltage and binning (see *utils/tiff.py*). So, parsing tif is not recommended.\nEER header parsing is also possible, but again, it's just a special kind of TIF format.\n\nEPU vs SerialEM\n###############\n\nWhen choosing EPU option, the user must browse to the EPU session folder (that contains Images-Disc folder) with the GUI.\nThe app will search and parse the first found xml or mrc file from that folder (depending on PATTERN_EPU).\nThe metadata folder name (EPU session name) matches the folder name with movies on a storage server.\n\nIn case of SerialEM, the movies and metadata (mdoc file) are expected to be in the same folder, so here user must select a folder with movies in the GUI.\n\nSPA vs Helical mode\n###################\n\nFrom MDCatch v2.2 onwards crYOLO picker can be run in helical mode (crYOLO v1.9+ required). Instead of a particle size, user provides the filament width. A pre-trained crYOLO model is also required.\nThe suggested parameters in this case are:\n\n    - tube diameter = 1.2 x filament width\n    - box size = 1.5 x tube diameter\n    - mask size = 0.9 x box size\n    - inter-box distance = 0.1 x box size\n\nWhen running standard SPA, the suggested parameters are:\n\n    - box size = 1.5 x particle size\n    - mask size = 1.1 x particle size\n\nMore details can be found in the code, see **calcBox()** inside *parser.py*\n\nRELION vs Scipion\n#################\n\nSo far RELION runs are more tested than Scipion. In the latter case, the app provides a single **template.json**,\nso irrespective of particle picker choice crYOLO will always be used.\nHave a look into the json file to see what pipeline will be launched.\n\nScipion project will be created in the default Scipion projects folder.\n\n.. 
raw:: html\n\n   \u003c/details\u003e\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eRelion schemes description\u003c/a\u003e\u003c/summary\u003e\n\nThere are two schemes: *prep* and *proc-cryolo* (or *proc-topaz*). The latter is available in 3 variants: cryolo, topaz and log. Both schemes are launched at the same time and run for 18 hours.\n\n1. The *prep* scheme includes 3 jobs that run in a loop, processing batches of 50 movies each time:\n\n    a) import movies\n    b) motion correction (relion motioncor)\n    c) ctffind4-4.1.14\n\n.. important:: The movie frames will be grouped if the dose per frame is \u003c 0.8 e/A\\ :sup:`2`. EER movies are fractionated such that final frames have 1 e/A\\ :sup:`2`.\n\n2. The *proc* scheme starts once ctffind results are available. Proc includes multiple jobs:\n\n    a) micrograph selection (CTF resolution \u003c 6A)\n    b) particle picking: Cryolo (proc-cryolo) or Topaz/Logpicker (proc-topaz)\n    c) binned particles extraction\n    d) 2D classification with 50 classes\n    e) auto-selection of good 2D classes (thr=0.35)\n    f) 3D initial model if number of good particles from previous step is \u003e 5000\n    g) 3D refinement\n\nThe last four steps are always executed as new jobs (not overwriting previous results).\n\n.. raw:: html\n\n   \u003c/details\u003e\n   \u003cdetails\u003e\n   \u003csummary\u003e\u003ca\u003eTesting installation\u003c/a\u003e\u003c/summary\u003e\n\nThe test only checks if the parsers are working correctly using files from *tests/testdata* folder.\n\n.. code-block:: python\n\n    python -m unittest mdcatch.tests\n\n.. raw:: html\n\n   \u003c/details\u003e\n\nExtras\n------\n\nThe MDCatch package provides extra command-line scripts to parse MRC, XML, MDOC or TIF file headers. 
Simply use one of the commands below followed by a filename:\n\n* parse-mrc filename.mrc\n* parse-xml filename.xml\n* parse-mdoc filename.mdoc\n* parse-tif filename.tiff\n\nHow to cite\n-----------\n\nKimanius D, Dong L, Sharov G, Nakane T, Scheres SHW. New tools for automated cryo-EM single-particle analysis in RELION-4.0. Biochem J. 2021, 478(24), p. 4169-4185. doi:10.1042/BCJ20210708\n\nFeedback\n--------\n\nPlease report bugs and suggestions for improvements as a `GitHub issue \u003chttps://github.com/azazellochg/MDCatch/issues/new\u003e`_.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazazellochg%2Fmdcatch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fazazellochg%2Fmdcatch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazazellochg%2Fmdcatch/lists"}