{"id":30110514,"url":"https://github.com/waikato-datamining/spectral-data-converter","last_synced_at":"2026-01-20T17:27:35.636Z","repository":{"id":298805615,"uuid":"999963205","full_name":"waikato-datamining/spectral-data-converter","owner":"waikato-datamining","description":"Python library for converting (and filtering) spectral data in various formats.","archived":false,"fork":false,"pushed_at":"2025-08-04T03:06:09.000Z","size":272,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-04T05:58:08.832Z","etag":null,"topics":["conversion","python3","spectral-data"],"latest_commit_sha":null,"homepage":"https://www.data-mining.co.nz/spectral-data-converter-examples/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/waikato-datamining.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-11T04:16:45.000Z","updated_at":"2025-08-04T03:06:12.000Z","dependencies_parsed_at":"2025-06-13T02:41:33.416Z","dependency_job_id":"725ae94d-aece-473c-8e0b-0191a5abe00f","html_url":"https://github.com/waikato-datamining/spectral-data-converter","commit_stats":null,"previous_names":["waikato-datamining/spectral-data-converter"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/waikato-datamining/spectral-data-converter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/waikato-datamining%2Fspectral-data-converter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/waikato-datamining%2Fspectral-data-converter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/waikato-datamining%2Fspectral-data-converter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/waikato-datamining%2Fspectral-data-converter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/waikato-datamining","download_url":"https://codeload.github.com/waikato-datamining/spectral-data-converter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/waikato-datamining%2Fspectral-data-converter/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269677556,"owners_count":24457861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-10T02:00:08.965Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversion","python3","spectral-data"],"created_at":"2025-08-10T04:47:17.660Z","updated_at":"2026-01-20T17:27:35.625Z","avatar_url":"https://github.com/waikato-datamining.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# spectral-data-converter\nPython library for converting (and filtering) spectral data in various formats.\n\nSome of the formats make use of the [wai_spectralio](https://github.com/waikato-datamining/wai-spectralio) \nlibrary and some of the filters use the 
[wai_ma](https://github.com/waikato-datamining/py-matrix-algorithms) \nlibrary for processing the data.\n\n## Installation\n\nVia PyPI:\n\n```bash\npip install spectral_data_converter\n```\n\nThe latest code straight from the repository:\n\n```bash\npip install git+https://github.com/waikato-datamining/spectral-data-converter.git\n```\n\n## Docker\n\nDocker images are available as well. Please see the following page for more information:\n\nhttps://github.com/waikato-datamining/spectral-data-converter-all/tree/main/docker\n\n\n## Dataset formats\n\nThe following dataset formats are supported:\n\n| Format                         | Read                          | Write                      | \n|:-------------------------------|:------------------------------|:---------------------------| \n| [ADAMS](formats/adams.md)      | [Y](plugins/from-adams.md)    | [Y](plugins/to-adams.md)   | \n| [ARFF](formats/arff.md)        | [Y](plugins/from-arff.md)     | [Y](plugins/to-arff.md)    | \n| [ASC](formats/asc.md)          | [Y](plugins/from-asc.md)      | [Y](plugins/to-asc.md)     | \n| [ASCII XY](formats/asciixy.md) | [Y](plugins/from-asciixy.md)  | [Y](plugins/to-asciixy.md) | \n| CAL (FOSS)                     | [Y](plugins/from-cal.md)      | [Y](plugins/to-cal.md)     | \n| CSV                            | [Y](plugins/from-csv.md)      | [Y](plugins/to-csv.md)     | \n| [DPT](formats/dpt.md)          | [Y](plugins/from-dpt.md)      | [Y](plugins/to-dpt.md)     | \n| MPS                            | [Y](plugins/from-mps.md)      | N                          | \n| NIR (FOSS)                     | [Y](plugins/from-nir.md)      | [Y](plugins/to-nir.md)     | \n| OPUS (Bruker)                  | [Y](plugins/from-opus.md)     | N                          | \n| OPUS Ext (Bruker)              | [Y](plugins/from-opus-ext.md) | N                          | \n| SPA (Thermo Scientific)        | [Y](plugins/from-spa.md)      | N                          | \n\n\nThe following sample 
data formats are supported:\n\n| Format                                  | Read                           | Write                        | \n|:----------------------------------------|:-------------------------------|:-----------------------------| \n| [ADAMS Report](formats/adams_report.md) | [Y](plugins/from-report-sd.md) | [Y](plugins/to-report-sd.md) | \n| CSV                                     | [Y](plugins/from-csv-sd.md)    | [Y](plugins/to-csv-sd.md)    | \n| JSON                                    | [Y](plugins/from-json-sd.md)   | [Y](plugins/to-json-sd.md)   | \n\n\n## Tools\n\n### Dataset conversion\n\n```\nusage: sdc-convert [-h] [--help-all] [--help-plugin NAME] [-u INTERVAL]\n                   [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [-b]\n                   [--placeholders FILE] [--load_pipeline FILE]\n                   [--dump_pipeline FILE]\n\nTool for converting between spectral data formats.\n\nreaders (25):\n   from-adams, from-arff, from-asc, from-asciixy, from-cal, from-csv, \n   from-csv-sd, from-dpt, from-json-sd, from-mps, from-multi, from-nir, \n   from-opus, from-opus-ext, from-pyfunc, from-report-sd, from-spa, \n   from-storage, from-text-file, from-zip, get-email, list-files, \n   poll-dir, start, watch-dir\nfilters (41):\n   add-sampledata, apply-cleaner, attach-metadata, block, center, \n   check-duplicate-filenames, copy-files, delete-storage, \n   discard-by-name, downsample, equi-distance, list-to-sequence, log, \n   max-records, metadata, metadata-from-name, metadata-to-placeholder, \n   move-files, passthrough, pca, pls1, pyfunc-filter, randomize-records, \n   record-window, rename, row-norm, sample, savitzky-golay, \n   savitzky-golay2, set-metadata, set-placeholder, set-storage, simpls, \n   spectrum-to-sampledata, split-records, standard-normal-variate*, \n   standardize, stop, sub-process, tee, trigger\nwriters (20):\n   console, delete-files, send-email, to-adams, to-arff, to-asc, \n   to-asciixy, to-cal, to-csv, 
to-csv-sd, to-dpt, to-json-sd, \n   to-metadata, to-multi, to-nir, to-pyfunc, to-report-sd, to-storage, \n   to-text-file, to-zip\n\noptions:\n  -h, --help           Show basic help message and exit.\n  --help-all           Show basic help message plus help on all plugins and exit.\n  --help-plugin NAME   Show help message for plugin NAME and exit.\n  -u, --update_interval INTERVAL\n                       Outputs the progress every INTERVAL records (default: 1000).\n  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n                       The logging level to use (default: WARN).\n  -b, --force_batch    Processes the data in batches.\n  --placeholders FILE  The file with custom placeholders to load (format: key=value).\n  --load_pipeline FILE The file to load the pipeline command from.\n  --dump_pipeline FILE The file to dump the pipeline command in.\n```\n\n### Executing pipeline multiple times\n\n```\nusage: sdc-exec [-h] --exec_generator GENERATOR [--exec_dry_run]\n                [--exec_prefix PREFIX] [--exec_placeholders FILE]\n                [--exec_format {cmdline,file}]\n                [--exec_logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]\n                ...\n\nTool for executing a pipeline multiple times, each time with a different set\nof variables expanded. A variable is surrounded by curly quotes (e.g.,\nvariable 'i' gets referenced with '{i}'). When supplying multiple generators,\nthen these get treated as nested executions. Available generators: csv-file,\ndirs, list, null, prompt, range, text-file\n\npositional arguments:\n  pipeline              The pipeline template with variables to expand and\n                        then execute; see '--exec_format' option.\n\noptions:\n  -h, --help            show this help message and exit\n  --exec_generator GENERATOR\n                        The generator plugin(s) to use, incl. 
their options.\n                        Flag needs to be specified for each generator.\n                        (default: None)\n  --exec_dry_run        Applies the generator to the pipeline template and\n                        only outputs it on stdout. (default: False)\n  --exec_prefix PREFIX  The string to prefix the pipeline with when in dry-run\n                        mode. (default: None)\n  --exec_placeholders FILE\n                        The file with custom placeholders to load (format:\n                        key=value). (default: None)\n  --exec_format {cmdline,file}\n                        The format that the pipeline is in. The format\n                        'cmdline' interprets the remaining arguments as the\n                        pipeline arguments to execute. The format 'file'\n                        expects a file to load the pipeline arguments from.\n                        This file format allows spreading the pipeline\n                        arguments over multiple lines: it simply joins all\n                        lines into a single command-line before splitting it\n                        into individual arguments for execution. (default:\n                        cmdline)\n  --exec_logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n                        The logging level to use. (default: WARN)\n```\n\n\n### Locating files\n\nReaders tend to support input via file lists. The `sdc-find` tool can generate\nthese.\n\n```\nusage: sdc-find [-h] -i DIR [DIR ...] 
[-r] -o FILE [-m [REGEXP ...]]\n                [-n [REGEXP ...]] [--split_ratios [SPLIT_RATIOS ...]]\n                [--split_names [SPLIT_NAMES ...]]\n                [--split_name_separator SPLIT_NAME_SEPARATOR]\n                [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]\n\nTool for locating files in directories that match certain patterns and store\nthem in files.\n\noptions:\n  -h, --help            show this help message and exit\n  -i DIR [DIR ...], --input DIR [DIR ...]\n                        The dir(s) to scan for files. (default: None)\n  -r, --recursive       Whether to search the directories recursively\n                        (default: False)\n  -o FILE, --output FILE\n                        The file to store the located file names in (default:\n                        None)\n  -m [REGEXP ...], --match [REGEXP ...]\n                        The regular expression that the (full) file names must\n                        match to be included (default: None)\n  -n [REGEXP ...], --not-match [REGEXP ...]\n                        The regular expression that the (full) file names must\n                        match to be excluded (default: None)\n  --split_ratios [SPLIT_RATIOS ...]\n                        The split ratios to use for generating the splits\n                        (int; must sum up to 100) (default: None)\n  --split_names [SPLIT_NAMES ...]\n                        The split names to use as filename suffixes for the\n                        generated splits (before .ext) (default: None)\n  --split_name_separator SPLIT_NAME_SEPARATOR\n                        The separator to use between file name and split name\n                        (default: -)\n  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n                        The logging level to use. 
(default: WARN)\n```\n\n\n### Generating help screens for plugins\n\n```\nusage: sdc-help [-h] [-c [PACKAGE ...]] [-e EXCLUDED_CLASS_LISTERS]\n                [-T {pipeline,generator,cleaner}] [-p NAME]\n                [-f {text,markdown}] [-L INT] [-o PATH] [-i FILE] [-t TITLE]\n                [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]\n\nTool for outputting help for plugins in various formats.\n\noptions:\n  -h, --help            show this help message and exit\n  -c [PACKAGE ...], --custom_class_listers [PACKAGE ...]\n                        The custom class listers to use, uses the default ones\n                        if not provided. (default: None)\n  -e EXCLUDED_CLASS_LISTERS, --excluded_class_listers EXCLUDED_CLASS_LISTERS\n                        The comma-separated list of class listers to exclude.\n                        (default: None)\n  -T {pipeline,generator,cleaner}, --plugin_type {pipeline,generator,cleaner}\n                        The types of plugins to generate the help for.\n                        (default: pipeline)\n  -p NAME, --plugin_name NAME\n                        The name of the plugin to generate the help for,\n                        generates it for all if not specified (default: None)\n  -f {text,markdown}, --help_format {text,markdown}\n                        The output format to generate (default: text)\n  -L INT, --heading_level INT\n                        The level to use for the heading (default: 1)\n  -o PATH, --output PATH\n                        The directory or file to store the help in; outputs it\n                        to stdout if not supplied; if pointing to a directory,\n                        automatically generates file name from plugin name and\n                        help format (default: None)\n  -i FILE, --index_file FILE\n                        The file in the output directory to generate with an\n                        overview of all plugins, grouped by type (in markdown\n                        
format, links them to the other generated files)\n                        (default: None)\n  -t TITLE, --index_title TITLE\n                        The title to use in the index file (default: spectral-\n                        data-converter plugins)\n  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n                        The logging level to use. (default: WARN)\n```\n\n\n### Plugin registry\n\n```\nusage: sdc-registry [-h] [-c CUSTOM_CLASS_LISTERS] [-e EXCLUDED_CLASS_LISTERS]\n                    [-l {plugins,pipeline,custom-class-listers,env-class-listers,readers,direct-readers,filters,writers,direct-writers,generators,cleaners}]\n\nFor inspecting/querying the registry.\n\noptions:\n  -h, --help            show this help message and exit\n  -c CUSTOM_CLASS_LISTERS, --custom_class_listers CUSTOM_CLASS_LISTERS\n                        The comma-separated list of custom class listers to\n                        use. (default: None)\n  -e EXCLUDED_CLASS_LISTERS, --excluded_class_listers EXCLUDED_CLASS_LISTERS\n                        The comma-separated list of class listers to exclude.\n                        (default: None)\n  -l {plugins,pipeline,custom-class-listers,env-class-listers,readers,direct-readers,filters,writers,direct-writers,generators,cleaners}, --list {plugins,pipeline,custom-class-listers,env-class-listers,readers,direct-readers,filters,writers,direct-writers,generators,cleaners}\n                        For outputting various lists on stdout. (default:\n                        None)\n```\n\n### Testing generators\n\n```\nusage: sdc-test-generator [-h] -g GENERATOR\n                          [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]\n\nTool for testing generators by outputting the generated variables and their\nassociated values. 
Available generators: dirs, list, null, range\n\noptions:\n  -h, --help            show this help message and exit\n  -g GENERATOR, --generator GENERATOR\n                        The generator plugin to use. (default: None)\n  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n                        The logging level to use. (default: WARN)\n```\n\n\n## Plugins\n\nYou can find help screens for the plugins here:\n\n* [Pipeline plugins](plugins/README.md) (readers/filters/writers)\n* [Generator plugins](generators/README.md) (used by `sdc-exec`)\n* [Cleaner plugins](cleaners/README.md) (used by `apply-cleaner` filter)\n\n\n## Class listers\n\nThe *spectral-data-converter* uses the *class lister registry* provided \nby the [seppl](https://github.com/waikato-datamining/seppl) library.\n\nEach module defines a function, typically called `list_classes`, that returns\na dictionary mapping superclass names to the list of modules that\nshould be scanned for derived classes. 
Here is an example:\n\n```python\nfrom typing import List, Dict\n\n\ndef list_classes() -\u003e Dict[str, List[str]]:\n    return {\n        \"seppl.io.Reader\": [\n            \"mod.ule1\",\n            \"mod.ule2\",\n        ],\n        \"seppl.io.Filter\": [\n            \"mod.ule3\",\n            \"mod.ule4\",\n        ],\n        \"seppl.io.Writer\": [\n            \"mod.ule5\",\n        ],\n    }\n```\n\nSuch a class lister gets referenced in the `entry_points` section of the `setup.py` file:\n\n```python\n    entry_points={\n        \"class_lister\": [\n            \"unique_string=module_name:function_name\",\n        ],\n    },\n```\n\n`:function_name` can be omitted if the function is called `list_classes`.\n\nThe following environment variables can be used to influence the class listers:\n\n* `SDC_CLASS_LISTERS`\n* `SDC_CLASS_LISTERS_EXCL`\n* `SDC_CLASS_LISTERS_IGNORED` - for class listers that provide ignored classes\n\nEach variable is a comma-separated list of `module_name:function_name`, defining the class listers.\n\n\n## Caching plugins\n\nIn order to speed up plugin discovery, the discovered plugins can be cached\non disk after the initial discovery. 
Installing additional plugins after\nthe cache has been initialized will not make them visible; the cache must\nbe reset first.\n\nThe cache can be managed through the following environment variable:\n\n```\nSDC_CLASS_CACHE\n```\n\nIt supports the following options:\n\n* `off`: disables the cache\n* `on`: enables the cache\n* `reset`: resets the cached plugins first and enables the cache\n\n## Additional libraries\n\n* [Scikit-learn](https://github.com/waikato-datamining/image-dataset-converter-sklearn)\n* [Visualizations](https://github.com/waikato-datamining/image-dataset-converter-vis)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaikato-datamining%2Fspectral-data-converter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwaikato-datamining%2Fspectral-data-converter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaikato-datamining%2Fspectral-data-converter/lists"}