{"id":13717206,"url":"https://github.com/cytomining/cytominer-database","last_synced_at":"2025-06-11T23:02:06.013Z","repository":{"id":3849114,"uuid":"49161267","full_name":"cytomining/cytominer-database","owner":"cytomining","description":"[DEPRECATED] A package for storing morphological profiling data.","archived":false,"fork":false,"pushed_at":"2024-10-10T20:37:18.000Z","size":18132,"stargazers_count":10,"open_issues_count":14,"forks_count":11,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-03T08:41:53.510Z","etag":null,"topics":["database","microscopy","profiling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cytomining.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-01-06T21:00:04.000Z","updated_at":"2024-10-10T20:21:53.000Z","dependencies_parsed_at":"2024-11-14T05:31:15.890Z","dependency_job_id":null,"html_url":"https://github.com/cytomining/cytominer-database","commit_stats":{"total_commits":657,"total_committers":10,"mean_commits":65.7,"dds":0.4337899543378996,"last_synced_commit":"5aa00f58e4a31bbbd2a3779c87e7a3620b0030db"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/cytomining/cytominer-database","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cytomining%2Fcytominer-database","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cytomining%2Fcytominer-database/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/cytomining%2Fcytominer-database/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cytomining%2Fcytominer-database/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cytomining","download_url":"https://codeload.github.com/cytomining/cytominer-database/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cytomining%2Fcytominer-database/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259360655,"owners_count":22845813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","microscopy","profiling"],"created_at":"2024-08-03T00:01:19.273Z","updated_at":"2025-06-11T23:02:05.969Z","avatar_url":"https://github.com/cytomining.png","language":"Python","funding_links":[],"categories":["Other"],"sub_categories":[],"readme":"This package is deprecated and will no longer be supported. Please use at your own risk!\n\n==================\ncytominer-database\n==================\n\n.. image:: https://travis-ci.org/cytomining/cytominer-database.svg?branch=master\n    :target: https://travis-ci.org/cytomining/cytominer-database\n    :alt: Build Status\n\n.. 
image:: https://readthedocs.org/projects/cytominer-database/badge/?version=latest\n    :target: http://cytominer-database.readthedocs.io/en/latest/?badge=latest\n    :alt: Documentation Status\n\ncytominer-database provides command-line tools for organizing measurements extracted from images.\n\nSoftware tools such as CellProfiler can extract hundreds of measurements from millions of cells in a typical\nhigh-throughput imaging experiment. The measurements are stored across thousands of CSV files.\n\ncytominer-database helps you organize these data into a single database backend, such as SQLite.\n\nWhy cytominer-database?\n=======================\nWhile tools like CellProfiler can store measurements directly in databases, it is usually infeasible to create a\ncentralized database in which to store these measurements. A more scalable approach is to create a set of CSVs per\n“batch” of images, and then later merge these CSVs.\n\ncytominer-database ingest reads these CSVs, checks for errors, then ingests\nthem into a database backend. The default backend is `SQLite`.\n\n.. code-block:: sh\n\n\tcytominer-database ingest source_directory sqlite:///backend.sqlite -c ingest_config.ini\n\nwill ingest the CSV files nested under source_directory into a `SQLite` backend\nTo select the `Parquet` backend add a `--parquet` flag:\n\n.. code-block:: sh\n\n\tcytominer-database ingest source_directory sqlite:///backend.sqlite -c ingest_config.ini --parquet\n\nThe ingest_config.ini file then needs to be adjusted to contain the `Parquet` specifications.\n\nHow to use the configuration file\n=================================\nThe configuration file ingest_config.ini must be located in the source_directory and can be modified to specify the ingestion.\nThere are three different sections.\n\nThe [filenames] section\n-----------------------\n\n.. 
code-block::\n\n  [filenames]\n  image   = image.csv      #or: Image.csv\n  object  = object.csv     #or: Object.csv\n\ncytominer-database is currently limited to the following measurement file kinds:\nCells.csv, Cytoplasm.csv, Nuclei.csv, Image.csv, Object.csv.\nThe [filenames] section in the configuration file specifies the exact basenames of the existing measurement files.\nThis may be important in the case of inconsistent capitalization.\n\nThe [database_engine] section\n-----------------------------\n\n.. code-block::\n\n  [ingestion_engine]\n  engine = Parquet      #or: SQLite\n\nThe [database_engine] section specifies the backend.\nPossible key-value pairs are:\n**engine** = *SQLite* or **engine** = *Parquet*.\n\nThe [schema] section\n--------------------\n\n.. code-block::\n\n [schema]\n reference_option = sample         #or: path/to/reference/folder relative to source_directory\n ref_fraction     = 1              #or: any decimal value in [0, 1]\n type_conversion  = int2float      #or: all2string\n\nThe [schema] section specifies how to manage incompatibilities in the table schema of the files.\nA Parquet file is fixed to the schema with which it was first opened, i.e. the schema of the first file that is written (the reference file).\nTo append the data of all .csv files of that file kind, the reference file must satisfy certain compatibility requirements.\nFor example, make sure the reference file is not missing any columns and that all existing files can be automatically converted to the reference schema.\nNote: This section is used only if the files are ingested to Parquet format. It was developed to handle the special cases in which tables cannot be concatenated automatically.\n\nThere are two options for the key **reference_option**:\n\nThe first option is to create a designated folder containing one .csv reference file for every kind of file (\"Cytoplasm.csv\", \"Nuclei.csv\", ...) 
and save the folder path in the config file as **reference_option** = *path/to/reference/folder*, where the path is relative to the source_directory from the ingest command.\nEach reference file's schema determines the schema of the Parquet file into which all .csv files of its kind are ingested.\n\n\n**This option relies on manual selection, hence the chosen reference files must be checked explicitly: Make sure the .csv files are complete in the number of columns and contain no NaN values.**\n\nAlternatively, the reference files can be found automatically from a sampled subset of all existing files.\nThis is the case if **reference_option** = *sample* is set.\nA subset of all files is sampled uniformly at random and the first table with the maximum number of columns among all sampled .csv files is chosen as the reference table.\nIn this case, an additional key **ref_fraction** can be set, which specifies the fraction of files sampled among all files.\nThe default value is **ref_fraction** = *1*, for which all tables are compared in width.\nThis key is only used if \"reference_option=sample\".\n\nLastly, the key **type_conversion** determines how the schema types are handled in the case of disagreement.\nThe default value is *int2float*, for which all integer columns are converted to floats.\nThis has proven helpful for trivial columns (0-valued columns), which may be of \"int\" type and cannot be written into the same table as non-trivial files with non-zero float values.\nAutomatic type conversion can be avoided by converting all values to strings.\nThis can be done by setting **type_conversion** = *all2string*.\nHowever, the loss of type information might be a disadvantage in downstream 
tasks.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcytomining%2Fcytominer-database","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcytomining%2Fcytominer-database","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcytomining%2Fcytominer-database/lists"}