{"id":31320688,"url":"https://github.com/stsav012/pycatsearch","last_synced_at":"2025-10-30T09:50:02.826Z","repository":{"id":132570093,"uuid":"208059199","full_name":"StSav012/pycatsearch","owner":"StSav012","description":"Spectral lines catalogs search tool","archived":false,"fork":false,"pushed_at":"2025-09-04T17:27:42.000Z","size":1328,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-25T16:57:51.550Z","etag":null,"topics":["cdms","jpl","python","python3","spectroscopy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StSav012.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-09-12T13:35:36.000Z","updated_at":"2025-08-27T14:57:42.000Z","dependencies_parsed_at":"2023-05-25T22:15:11.300Z","dependency_job_id":"987d948f-133f-44a6-9219-cb0632d10567","html_url":"https://github.com/StSav012/pycatsearch","commit_stats":{"total_commits":392,"total_committers":1,"mean_commits":392.0,"dds":0.0,"last_synced_commit":"bc65fca20f3b02743aba98d78a3161653d84bc46"},"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"purl":"pkg:github/StSav012/pycatsearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StSav012%2Fpycatsearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StSav012%2Fpycatsearch/tags","releases_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StSav012%2Fpycatsearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StSav012%2Fpycatsearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StSav012","download_url":"https://codeload.github.com/StSav012/pycatsearch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StSav012%2Fpycatsearch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281785985,"owners_count":26561250,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdms","jpl","python","python3","spectroscopy"],"created_at":"2025-09-25T16:57:34.472Z","updated_at":"2025-10-30T09:50:02.819Z","avatar_url":"https://github.com/StSav012.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyCatSearch\n\nYet another implementation of offline search in the [JPL](https://spec.jpl.nasa.gov/) and [CDMS](https://cdms.astro.uni-koeln.de/)\nspectroscopy catalogs.\n\n## Requirements\n\nThe code is developed under `Python 3.13`.\n\nIt should work under `Python 3.8` but cannot be installed there due to changes in `setuptools`.\nStill, you can get the source files and try them under `Python 3.8`.\n\nThe non-GUI parts require an absolute minimum of non-standard modules.\nIf you want to download the catalog data faster, consider the `async_downloader` module;\nit requires `aiohttp`.\nOtherwise, only the standard library is used.\n\n## Installation\n\nThe package is available from PyPI:\n\n```commandline\npython3 -m pip install pycatsearch\n```\n\nFor slightly faster downloading of the catalog data, install `aiohttp`.\n\nFor faster catalog loading, install `orjson`.\n\n## Usage\n\n### `catalog`\n\n###### Sample usage:\n\nOn the command line:\n\n```commandline\npycatsearch-cli --min-frequency 118749 --max-frequency 118751 catalog.json.gz -n oxygen\n```\nor\n```commandline\npython3 -m pycatsearch --min-frequency 118749 --max-frequency 118751 catalog.json.gz -n oxygen\n```\n\nIn code:\n\n```python\nfrom pycatsearch.catalog import Catalog\n\nc = Catalog('catalog.json.gz')\nc.print(min_frequency=140141, max_frequency=140142)\n```\n\n###### Properties:\n\n- `catalog` is a list of the catalog entries loaded by `__init__`.\n- `frequency_limits` is a tuple of the minimal and the maximal frequencies of the lines\n  the loaded catalogs contain.\n- `is_empty` indicates whether nothing has been loaded by `__init__`.\n- `entries_count` is the number of substances loaded by `__init__`.\n- `sources` contains a list of files that have been loaded successfully by `__init__`.\n- `sources_info` returns a list of the files and the timestamps recorded in them (if any).\n- `min_frequency` and `max_frequency` are the extreme values of `frequency_limits`.\n\n###### Functions:\n\n- `__init__(self, *catalog_file_names: str)` accepts names of JSON or GZip/BZip2/LZMA-compressed JSON files.\n  It loads them into memory and merges them into one catalog.\n- `filter(self, *,\n  min_frequency: float = 0.0,\n  max_frequency: float = math.inf,\n  min_intensity: float = -math.inf,\n  max_intensity: float = math.inf,\n  temperature: float = -math.inf,\n  any_name: str = '',\n  any_formula: str = '',\n  any_name_or_formula: str = '',\n  anything: 
str = '',\n  species_tag: int = 0,\n  inchi: str = '',\n  trivial_name: str = '',\n  structural_formula: str = '',\n  name: str = '',\n  stoichiometric_formula: str = '',\n  isotopolog: str = '',\n  state: str = '',\n  degrees_of_freedom: int | None = None) -\u003e dict[int, dict[str, int | str | list[dict[str, float]]]]`\n  returns only the catalog entries that meet the criteria specified. The arguments are the following:\n    - `float min_frequency`: the lower frequency \\[MHz\\] to take.\n    - `float max_frequency`: the upper frequency \\[MHz\\] to take.\n    - `float min_intensity`: the minimal intensity \\[log10(nm²×MHz)\\] to take.\n    - `float max_intensity`: the maximal intensity \\[log10(nm²×MHz)\\] to take, use to avoid meta-stable substances.\n    - `float temperature`: the temperature to calculate the line intensity at,\n      use the catalog intensity if not set.\n    - `str any_name`: a string to match the “trivialname” or the “name” field.\n    - `str any_formula`: a string to match the “structuralformula,” “moleculesymbol,”\n      “stoichiometricformula,” or “isotopolog” field.\n    - `str any_name_or_formula`: a string to match any field used by `any_name` and `any_formula`.\n    - `str anything`: a string to match any field.\n    - `int species_tag`: a number to match the “speciestag” field.\n    - `str inchi`: a string to match the “inchikey” field.\n      See https://iupac.org/who-we-are/divisions/division-details/inchi/ for more.\n    - `str trivial_name`: a string to match the “trivialname” field.\n    - `str structural_formula`: a string to match the “structuralformula” field.\n    - `str name`: a string to match the “name” field.\n    - `str stoichiometric_formula`: a string to match the “stoichiometricformula” field.\n    - `str isotopolog`: a string to match the “isotopolog” field.\n    - `str state`: a string to match the “state” or the “state_html” field.\n    - `int degrees_of_freedom`: 0 for atoms, 2 for linear molecules, and 3 for 
nonlinear molecules.\n- `filter_by_species_tags(self, *,\n  species_tags: Iterable[int] | None = None,\n  min_frequency: float = 0.0,\n  max_frequency: float = math.inf,\n  min_intensity: float = -math.inf,\n  max_intensity: float = math.inf,\n  temperature: float = -math.inf,\n  ) -\u003e dict[int, dict[str, int | str | list[dict[str, float]]]]`\n  returns only the catalog entries that meet the criteria specified.\n  It is a faster version of the `filter` function because it makes fewer comparisons.\n  The arguments are the following:\n    - `Iterable[int] | None species_tags`: numbers to match the “speciestag” field;\n      all the species listed in the catalog are used if not set or set to `None`.\n    - `float min_frequency`: the lower frequency \\[MHz\\] to take.\n    - `float max_frequency`: the upper frequency \\[MHz\\] to take.\n    - `float min_intensity`: the minimal intensity \\[log10(nm²×MHz)\\] to take.\n    - `float max_intensity`: the maximal intensity \\[log10(nm²×MHz)\\] to take, use to avoid meta-stable substances.\n    - `float temperature`: the temperature to calculate the line intensity at,\n      use the catalog intensity if not set.\n- `print(**kwargs)` prints a table of the filtered catalog entries.\n  It accepts all the arguments valid for the `filter` function.\n\n### `downloader`\n\n###### Sample usage:\n\nOn the command line:\n\n```commandline\npycatsearch-downloader --min-frequency 115000 --max-frequency 178000 catalog.json.gz\n```\n\nIn code:\n\n```python\nfrom pycatsearch import downloader\n\ndownloader.save_catalog('catalog.json.gz', (115000, 178000))\n```\n\n###### Functions:\n\n- `get_catalog(frequency_limits: tuple[float, float] = (0.0, math.inf)) -\u003e\n  dict[int, dict[str, int | str | list[dict[str, float]]]]` downloads the spectral lines catalog data.\n  It returns a dictionary of the spectral lines catalog entries.\n  The parameter `frequency_limits` is the frequency range of the catalog entries to keep.\n  By default, there are no limits.\n- `save_catalog(filename: str, frequency_limits: tuple[float, float] = (0.0, math.inf)) -\u003e bool`\n  downloads and saves the spectral lines catalog data.\n  Internally, it calls the `get_catalog` function.\n  The function returns `True` if anything has been downloaded, `False` otherwise.\n  The function fails with an error if `get_catalog` raises an error,\n  or if the result cannot be stored in the specified file.\n  The parameters of `save_catalog` are the following:\n    - `str filename`: the name of the file to save the downloaded catalog to.\n      If it ends with an unknown suffix, `'.json.gz'` is appended to it.\n    - `tuple frequency_limits`: the tuple of the minimal and the maximal frequencies of the lines being stored.\n      All the lines outside the specified frequency range are omitted. By default, there are no limits.\n\n### `async_downloader`\n\nThis is like `downloader`, but much, much faster.\nThe download speed is limited by the remote servers.\nMost of the time, it takes no more than 90 seconds to load all the data.\n\nRequires `aiohttp`.\n\n###### Sample usage:\n\nOn the command line:\n\n```commandline\npycatsearch-async-downloader --min-frequency 115000 --max-frequency 178000 catalog.json.gz\n```\n\nIn code:\n\n```python\nfrom pycatsearch import async_downloader\n\nasync_downloader.save_catalog('catalog.json.gz', (115000, 178000))\n```\n\n###### Functions:\n\n- `get_catalog(frequency_limits: tuple[float, float] = (0.0, math.inf)) -\u003e\n  dict[int, dict[str, int | str | list[dict[str, float]]]]`\n- `save_catalog(filename: str, frequency_limits: tuple[float, float] = (0.0, math.inf)) -\u003e bool`\n\nThe functions behave _almost_ exactly like their namesakes from `downloader`.\n`get_catalog` prints its progress as two numbers:\n\n- the number of species for which the data has already been downloaded\n  and contains spectral lines within the specified frequency range, and\n- the number of species yet to be downloaded and processed.\n\n###### `Downloader` class\n\nAn instance of the `Downloader` class is created in the `get_catalog` function.\nThen, a separate thread takes care of the downloading.\nIf the thread fails, `get_catalog` returns an empty catalog, rarely raising an exception.\n\nThe class constructor accepts the frequency limits, like the `get_catalog` function.\n\nOne may also provide the constructor with a `multiprocessing.Queue[tuple[int, int]]`\nto monitor the download progress.\nThe first number of the tuple is the number of species\nfor which the data has already been downloaded\nand contains spectral lines within the specified frequency range.\nThe second one is the number of species yet to be downloaded and processed.\nThe numbers are the same as those the `get_catalog` function prints.\n\n## File Format\n\nFor the physical meaning of the values, check out [catdoc.pdf](https://spec.jpl.nasa.gov//ftp//pub/catalog/doc/catdoc.pdf).\n\n### A JSON file, optionally compressed\n\nThe JSON file contains a dictionary of substances called `catalog`.\nThe keys of the dictionary are the species tags.\nEach substance is described as follows:\n\n```json\n{\n  \"id\": 4,\n  \"molecule\": 3,\n  \"structuralformula\": \"H2\",\n  \"stoichiometricformula\": \"H2\",\n  \"moleculesymbol\": \"H\u003csub\u003e2\u003c/sub\u003e\",\n  \"speciestag\": 3501,\n  \"name\": \"HD,v=0,1\",\n  \"trivialname\": \"Hydrogen molecule\",\n  \"isotopolog\": \"HD\",\n  \"state\": \"$v=0,1$\",\n  \"state_html\": \"v=0,1\",\n  \"inchikey\": \"UFHFLCQGNIYNRP-OUBTZVSYSA-N\",\n  \"contributor\": \"H. S. P. M\\u00fcller\",\n  \"version\": \"2*\",\n  \"dateofentry\": \"2011-12-01\",\n  \"degreesoffreedom\": 2,\n  \"lines\": []\n}\n```\n\n`lines` is an array of the substance's absorption line records.\nFor now, it includes only the _frequency_ \\[MHz\\], the _intensity_ \\[log10(nm²×MHz)\\],\nand the _lower state energy_ relative to the ground state \\[1/cm\\] of a line:\n\n```json\n{\n  \"frequency\": 143285.9808,\n  \"intensity\": -6.4978,\n  \"lowerstateenergy\": 581.4862\n}\n```\n\nBesides `catalog`, the JSON file contains a `frequency` array that holds the frequency limits of the catalog,\nas well as the catalog build time in ISO format, just in case.\n\nThe compression might be [GZip](https://en.wikipedia.org/wiki/Gzip),\n[BZip2](https://en.wikipedia.org/wiki/Bzip2),\nor [LZMA2](https://en.wikipedia.org/wiki/LZMA#LZMA2_format).\n\nThe filename is expected to end with `.json`, `.json.gz`, `.json.bz2`, `.json.xz`, or `.json.lzma`.\n\n### A tar archive, optionally compressed\n\nThe tar format allows storing multiple files inside an archive.\nLoading multiple files might require more time but generally less memory.\n\nThe archive may contain files in the original format as provided by JPL and CDMS.\nThe files are named following the conventions of those organizations.\nBesides them, there should be a file named `species.json` describing the substance each data file is for.\nThe description is identical in form to what the unified JSON file contains for a substance;\nthe `lines` list is ignored and is better omitted.\nData for species tags missing from `species.json` are ignored.\nThe archive might also contain a file named `metadata.json`.\nCurrently, only the `build_time` field from the file is taken into account.\n\nAlternatively, the tar archive may contain files in the JSON format for every substance.\nThe files are exactly like the dictionary for a substance in the unified JSON file.\nIf all the files in the archive are like this, there is no need for a `species.json` file.\nIf such a file is nonetheless present, the data in the per-species files take precedence over `species.json`.\nThe archive might also contain a file named `metadata.json`.\nThere, the `frequency` field contains the frequency limits for the catalog data.\nThe `build_time` field stores the moment when the archive file was created;\nit is _not_ the time when the data for the substances were compiled.\n\nThe compression might be [GZip](https://en.wikipedia.org/wiki/Gzip),\n[BZip2](https://en.wikipedia.org/wiki/Bzip2),\nor [LZMA2](https://en.wikipedia.org/wiki/LZMA#LZMA2_format).\n\nThe filename is expected to end with `.tar`, `.tar.gz`, `.tar.bz2`, `.tar.xz`, `.tgz`, `.tbz2`, or `.txz`\nregardless of the files inside.\nAn archive may contain both files in the original CDMS/JPL format and per-substance JSON documents.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstsav012%2Fpycatsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstsav012%2Fpycatsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstsav012%2Fpycatsearch/lists"}