{"id":19031417,"url":"https://github.com/plandes/datdesc","last_synced_at":"2025-09-04T12:33:50.506Z","repository":{"id":173454039,"uuid":"650795311","full_name":"plandes/datdesc","owner":"plandes","description":"Describe and optimize data","archived":false,"fork":false,"pushed_at":"2025-04-11T23:43:43.000Z","size":321,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-12T00:07:21.526Z","etag":null,"topics":["data","hyperparameter-optimization","hyperparameter-tuning","latex","table"],"latest_commit_sha":null,"homepage":"https://plandes.github.io/datdesc/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/plandes.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-07T20:32:06.000Z","updated_at":"2025-04-11T23:41:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"cb8553ab-3fca-44b9-aa2c-d9bd434cb4cb","html_url":"https://github.com/plandes/datdesc","commit_stats":null,"previous_names":["plandes/datdesc"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Fdatdesc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Fdatdesc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Fdatdesc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Fdatdesc/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/plandes","download_url":"https://codeload.github.com/plandes/datdesc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249341834,"owners_count":21254195,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","hyperparameter-optimization","hyperparameter-tuning","latex","table"],"created_at":"2024-11-08T21:23:19.347Z","updated_at":"2025-09-04T12:33:50.476Z","avatar_url":"https://github.com/plandes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Describe and optimize data\n\n[![PyPI][pypi-badge]][pypi-link]\n[![Python 3.11][python311-badge]][python311-link]\n[![Python 3.12][python312-badge]][python312-link]\n[![Build Status][build-badge]][build-link]\n\nIn this package, Pythonic objects are (un)serialized to create\nLaTeX tables, figures and Excel files.  The API and command-line program\ndescribe data in tables with metadata using YAML and CSV files and\nintegrate with [Pandas].  The paths to the CSV files to create tables from and\ntheir metadata are given in a YAML configuration file.\n\nFeatures:\n* Create LaTeX tables (with captions) and Excel files (with notes) of tabular\n  metadata from CSV files.\n* Create LaTeX-friendly Encapsulated Postscript (`.eps`) files from CSV files.\n* Data and metadata are viewable in a nice format with paging in a web browser\n  using the [Render program].\n* Usable as an API during data collection for research projects.\n\n\n\u003c!-- markdown-toc start - Don't edit this section. 
Run M-x markdown-toc-refresh-toc --\u003e\n## Table of Contents\n\n- [Documentation](#documentation)\n- [Obtaining](#obtaining)\n- [Usage](#usage)\n    - [Tables](#tables)\n    - [Figures](#figures)\n    - [Hyperparameters](#hyperparameters)\n- [Changelog](#changelog)\n- [Community](#community)\n- [License](#license)\n\n\u003c!-- markdown-toc end --\u003e\n\n\n## Documentation\n\nSee the [full documentation](https://plandes.github.io/datdesc/index.html).\nThe [API reference](https://plandes.github.io/datdesc/api.html) is also\navailable.\n\n\n## Obtaining\n\nThe library can be installed with pip from the [pypi] repository:\n```bash\npip3 install zensols.datdesc\n```\n\nBinaries are also available on [pypi].\n\n\n## Usage\n\nThe library can be used as a Python API to programmatically create tables and\nfigures, and/or represent tabular data.  It also has a robust\ncommand-line interface that is intended to be used by [GNU make].  The command line can\nbe used to create LaTeX `.sty` files on the fly, which are generated as commands,\nwhile figures are generated as Encapsulated Postscript (`.eps`) files.\n\nThe YAML file format is used to create both tables and figures.  Parameters are\nboth files or both directories.  When using directories, only files that match\n`*-table.yml` are considered on the command line.  In addition, the described\ndata can be hyperparameter metadata, which can be optimized with the\n[hyperparameter module](#hyperparameters).\n\n\n### Tables\n\nFirst create the table's configuration file.  
For example, to create a LaTeX\n`.sty` file from the CSV file `test-resources/section-id.csv` using the first\ncolumn as the index (which makes that column go away) with a variable size and\nplacement, use:\n```yaml\nintercodertab:\n  type: one_column\n  path: test-resources/section-id.csv\n  caption: \u003e-\n    Krippendorff’s ...\n  single_column: true\n  uses: zentable\n  read_params:\n    index_col: 0\n  tabulate_params:\n    disable_numparse: true\n  replace_nan: ' '\n  blank_columns: [0]\n  bold_cells: [[0, 0], [1, 0], [2, 0], [3, 0]]\n```\n\nSome of these fields include:\n\n* **index_col**: clears column 0\n* **bold_cells**: makes certain cells bold\n* **disable_numparse**: tells the `tabulate` module not to reformat numbers\n\nSee the [Table] class for a full listing of options.\n\n\n### Figures\n\nFigures can be generated in any format supported by [matplotlib] (namely\n`.eps`, `.svg`, and `.pdf`).  Figures are configured in a very similar fashion\nto [tables](#tables).  The configuration also points to a CSV file, but\ndescribes the plot.\n\nThe primary difference is that the YAML is parsed using the [Zensols parsing\nrules], so the string `path: target` will be given to a new [Plot] instance as a\n[pathlib.Path].\n\nA bar plot is configured below:\n```yaml\nirisFig:\n  image_dir: 'path: target'\n  seaborn:\n    style:\n      style: darkgrid\n      rc:\n        axes.facecolor: 'str: .9'\n    context:\n      context: 'paper'\n      font_scale: 1.3\n  plots:\n    - type: bar\n      data: 'dataframe: test-resources/fig/iris.csv'\n      title: 'Iris Splits'\n      x_column_name: ds_type\n      y_column_name: count\n      code: |\n        df = df.groupby('ds_type').agg({'ds_type': 'count'}).\\\n          rename(columns={'ds_type': 'count'}).reset_index()\n```\nThis configuration means:\n* The top level `irisFig` creates a [Figure] instance, and when used with the\n  command line, outputs this root level string as the name in the `image_dir`\n  directory.\n* The 
`image_dir` setting specifies where to write the image.  This should be left out when\n  invoking from the command line so that it can decide where to write the file.\n* The `seaborn` section configures the [seaborn] module.\n* The plots are a *list* of [Plot] instances that, like the [Figure] level, are\n  populated with all the values.\n* The `code` (optionally) allows the massaging of the [Pandas] dataframe\n  (pointed to by `data`).  This feature also exists for [Table].\n\nSee the [Figure] and [Plot] classes for a full listing of options.\n\n\n\n### Hyperparameters\n\nHyperparameter metadata is largely isomorphic to `datdesc` tables.  This\npackage was designed for the following purposes:\n\n* Provide a basic scaffolding to update model hyperparameters with tools such as\n  [hyperopt].\n* Generate LaTeX tables of the hyperparameters and their descriptions for\n  academic papers.\n\nAccess to the hyperparameters via the API is done by calling the *set* or\n*model* levels with a *dotted path notation* string.  For example, `svm.C`\nfirst navigates to model `svm`, then to the hyperparameter named `C`.\n\nCommand-line access to create LaTeX tables from the hyperparameter\ndefinitions is available with the `hyper` action.  An example of a\nhyperparameter set (a grouping of models that in turn have hyperparameters)\nfollows:\n```yaml\nsvm:\n  doc: 'support vector machine'\n  params:\n    kernel:\n      type: choice\n      choices: [radial, linear]\n      doc: 'maps the observations into some feature space'\n    C:\n      type: float\n      doc: 'regularization parameter'\n    max_iter:\n      type: int\n      doc: 'number of iterations'\n      value: 20\n      interval: [1, 30]\n```\nIn the example, the `svm` model has hyperparameters `kernel`, `C` and\n`max_iter`.  The `kernel` type is set as a choice, which is a string\nconstrained to match one of the strings in the list.  
The `C` hyperparameter is a\nfloating point number, and `max_iter` is an integer that must be between 1\nand 30.\n\nIn this next example, the `k_means` model uses the string `k-means` in human\nreadable documentation, which can be Python generated code in a `dataclass`.\n```yaml\nk_means:\n  desc: k-means\n  doc: 'k-means clustering'\n  params:\n    n_clusters:\n      type: int\n      doc: 'number of clusters'\n    copy_x:\n      type: bool\n      value: True\n      doc: 'When pre-computing distances it is more numerically accurate to center the data first'\n    strata:\n      type: list\n      doc: 'An array of stratified hyperparameters (made up for test cases).'\n      value: [1, 2]\n    kwargs:\n      type: dict\n      doc: 'Model keyword arguments (made up for test cases).'\n      value:\n        learning_rate: 0.01\n        epochs: 3\n```\n\n\n## Changelog\n\nAn extensive changelog is available [here](CHANGELOG.md).\n\n\n## Community\n\nPlease star this repository and let me know how and where you use this API.\nContributions as pull requests, feedback and any input are welcome.\n\n\n## License\n\n[MIT License](LICENSE.md)\n\nCopyright (c) 2023 - 2025 Paul Landes\n\n\n\u003c!-- links --\u003e\n[pypi]: https://pypi.org/project/zensols.datdesc/\n[pypi-link]: https://pypi.python.org/pypi/zensols.datdesc\n[pypi-badge]: https://img.shields.io/pypi/v/zensols.datdesc.svg\n[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg\n[python311-link]: https://www.python.org/downloads/release/python-3110\n[python312-badge]: https://img.shields.io/badge/python-3.12-blue.svg\n[python312-link]: https://www.python.org/downloads/release/python-3120\n[build-badge]: https://github.com/plandes/datdesc/workflows/CI/badge.svg\n[build-link]: https://github.com/plandes/datdesc/actions\n\n[GNU make]: https://www.gnu.org/software/make/\n[matplotlib]: https://matplotlib.org\n[seaborn]: http://seaborn.pydata.org\n[hyperopt]: 
http://hyperopt.github.io/hyperopt/\n[pathlib.Path]: https://docs.python.org/3/library/pathlib.html\n[Pandas]: https://pandas.pydata.org\n\n[Zensols parsing rules]: https://plandes.github.io/util/doc/config.html#parsing\n[Render program]: https://github.com/plandes/rend\n\n[Table]: api/zensols.datdesc.html#zensols.datdesc.table.Table\n[Figure]: api/zensols.datdesc.html#zensols.datdesc.figure.Figure\n[Plot]: api/zensols.datdesc.html#zensols.datdesc.figure.Plot\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplandes%2Fdatdesc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplandes%2Fdatdesc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplandes%2Fdatdesc/lists"}