{"id":13532962,"url":"https://github.com/polyaxon/traceml","last_synced_at":"2025-12-12T00:43:56.536Z","repository":{"id":37255359,"uuid":"54749857","full_name":"polyaxon/traceml","owner":"polyaxon","description":"Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.","archived":false,"fork":false,"pushed_at":"2025-04-21T20:29:35.000Z","size":123421,"stargazers_count":515,"open_issues_count":6,"forks_count":44,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-04-21T21:35:20.256Z","etag":null,"topics":["dask","data-exploration","data-profiling","data-quality","data-quality-checks","data-science","data-visualization","dataframes","dataops","explainable-ai","matplotlib","mlops","pandas","pandas-summary","plotly","pytorch","spark","statistics","tensorflow","tracking"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/polyaxon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-03-25T21:59:32.000Z","updated_at":"2025-04-21T20:29:34.000Z","dependencies_parsed_at":"2023-10-17T12:16:10.700Z","dependency_job_id":"21d5448e-bd1d-4549-9b4a-a02fb8fab4bc","html_url":"https://github.com/polyaxon/traceml","commit_stats":{"total_commits":9697,"total_committers":103,"mean_commits":94.14563106796116,"dds":"0.11127152727647727","last_synced_commit":"7eb446d713a2dafb664ff95b38ed0d0c1244ab82"},"previous_names":["mouradmourafiq/pandas-summary","polyaxon/datatile"],"tags_count
":58,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/polyaxon%2Ftraceml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/polyaxon%2Ftraceml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/polyaxon%2Ftraceml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/polyaxon%2Ftraceml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/polyaxon","download_url":"https://codeload.github.com/polyaxon/traceml/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254101557,"owners_count":22014908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dask","data-exploration","data-profiling","data-quality","data-quality-checks","data-science","data-visualization","dataframes","dataops","explainable-ai","matplotlib","mlops","pandas","pandas-summary","plotly","pytorch","spark","statistics","tensorflow","tracking"],"created_at":"2024-08-01T07:01:15.379Z","updated_at":"2025-12-12T00:43:56.504Z","avatar_url":"https://github.com/polyaxon.png","language":"Python","readme":"[![License: Apache 
2](https://img.shields.io/badge/License-apache2-green.svg)](LICENSE)\n[![TraceML](https://github.com/polyaxon/traceml/actions/workflows/traceml.yml/badge.svg)](https://github.com/polyaxon/traceml/actions/workflows/traceml.yml)\n[![Slack](https://img.shields.io/badge/chat-on%20slack-aadada.svg?logo=slack\u0026longCache=true)](https://polyaxon.com/slack/)\n[![Docs](https://img.shields.io/badge/docs-stable-brightgreen.svg?style=flat)](https://polyaxon.com/docs/)\n[![GitHub](https://img.shields.io/badge/issue_tracker-github-blue?logo=github)](https://github.com/polyaxon/polyaxon/issues)\n[![GitHub](https://img.shields.io/badge/roadmap-github-blue?logo=github)](https://github.com/polyaxon/polyaxon/milestones)\n\n\u003ca href=\"https://polyaxon.com\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/polyaxon/polyaxon/master/artifacts/packages/traceml.svg\" width=\"125\" height=\"125\" align=\"right\" /\u003e\u003c/a\u003e\n\n# TraceML\n\nEngine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.\n\n## Install\n\n```bash\npip install traceml\n```\n\nIf you would like to use the tracking features, you need to install `polyaxon` as well:\n\n```bash\npip install polyaxon traceml\n```\n\n## [WIP] Local sandbox\n\n\u003e Coming soon\n\n## Offline usage\n\nYou can enable offline mode to track runs without an API:\n\n```bash\nexport POLYAXON_OFFLINE=\"true\"\n```\n\nOr pass the offline flag:\n\n```python\nfrom traceml import tracking\n\ntracking.init(..., is_offline=True, ...)\n```\n\n## Simple usage in a Python script\n\n```python\nimport random\n\nfrom traceml import tracking\n\ntracking.init(\n    is_offline=True,\n    project='quick-start',\n    name=\"my-new-run\",\n    description=\"trying TraceML\",\n    tags=[\"examples\"],\n    artifacts_path=\"path/to/artifacts/repo\"\n)\n\n# Tracking some data refs\ntracking.log_data_ref(content=X_train, name='x_train')\ntracking.log_data_ref(content=y_train, 
name='y_train')\n\n# Tracking inputs\ntracking.log_inputs(\n    batch_size=64,\n    dropout=0.2,\n    learning_rate=0.001,\n    optimizer=\"Adam\"\n)\n\ndef get_loss(step):\n    result = 10 / (step + 1)\n    noise = (random.random() - 0.5) * 0.5 * result\n    return result + noise\n\n# Track metrics\nfor step in range(100):\n    loss = get_loss(step)\n    tracking.log_metrics(\n        loss=loss,\n        accuracy=(100 - loss) / 100.0,\n    )\n\n# Track some one-time results\ntracking.log_outputs(validation_score=0.66)\n\n# Optionally, stop the tracking process manually\ntracking.stop()\n```\n\n## Integration with deep learning and machine learning libraries and frameworks\n\n### Keras\n\nYou can use TraceML's callback to automatically save all metrics and collect outputs and models. You can also track additional information using the logging methods:\n\n```python\nfrom traceml import tracking\nfrom traceml.integrations.keras import Callback\n\ntracking.init(\n    is_offline=True,\n    project='tracking-project',\n    name=\"keras-run\",\n    description=\"trying TraceML \u0026 Keras\",\n    tags=[\"examples\"],\n    artifacts_path=\"path/to/artifacts/repo\"\n)\n\ntracking.log_inputs(\n    batch_size=64,\n    dropout=0.2,\n    learning_rate=0.001,\n    optimizer=\"Adam\"\n)\ntracking.log_data_ref(content=x_train, name='x_train')\ntracking.log_data_ref(content=y_train, name='y_train')\ntracking.log_data_ref(content=x_test, name='x_test')\ntracking.log_data_ref(content=y_test, name='y_test')\n\n# ...\n\nmodel.fit(\n    x_train,\n    y_train,\n    validation_data=(x_test, y_test),\n    epochs=epochs,\n    batch_size=100,\n    callbacks=[Callback()],\n)\n```\n\n### PyTorch\n\nYou can log metrics, inputs, and outputs of PyTorch experiments using the tracking module:\n\n```python\nimport torch\nimport torch.nn.functional as F\n\nfrom traceml import tracking\n\ntracking.init(\n    is_offline=True,\n    project='tracking-project',\n    name=\"pytorch-run\",\n    description=\"trying TraceML \u0026 PyTorch\",\n    
tags=[\"examples\"],\n    artifacts_path=\"path/to/artifacts/repo\"\n)\n\ntracking.log_inputs(\n    batch_size=64,\n    dropout=0.2,\n    learning_rate=0.001,\n    optimizer=\"Adam\"\n)\n\n# Metrics\nfor batch_idx, (data, target) in enumerate(train_loader):\n    optimizer.zero_grad()\n    output = model(data)\n    loss = F.nll_loss(output, target)\n    loss.backward()\n    optimizer.step()\n    tracking.log_metrics(loss=loss.item())\n\nasset_path = tracking.get_outputs_path('model.ckpt')\ntorch.save(model.state_dict(), asset_path)\n\n# Log the saved model as an artifact reference\ntracking.log_artifact_ref(asset_path, framework=\"pytorch\", ...)\n```\n\n### TensorFlow\n\nYou can log metrics, outputs, and models of TensorFlow experiments and distributed TensorFlow experiments using the tracking module:\n\n```python\nfrom traceml import tracking\nfrom traceml.integrations.tensorflow import Callback\n\ntracking.init(\n    is_offline=True,\n    project='tracking-project',\n    name=\"tf-run\",\n    description=\"trying TraceML \u0026 TensorFlow\",\n    tags=[\"examples\"],\n    artifacts_path=\"path/to/artifacts/repo\"\n)\n\ntracking.log_inputs(\n    batch_size=64,\n    dropout=0.2,\n    learning_rate=0.001,\n    optimizer=\"Adam\"\n)\n\n# Attach the TraceML callback to log images, histograms, and tensors during training\nestimator.train(hooks=[Callback(log_image=True, log_histo=True, log_tensor=True)])\n```\n\n### Fastai\n\nYou can log metrics, outputs, and models of Fastai experiments using the tracking module:\n\n```python\nfrom traceml import tracking\nfrom traceml.integrations.fastai import Callback\n\ntracking.init(\n    is_offline=True,\n    project='tracking-project',\n    name=\"fastai-run\",\n    description=\"trying TraceML \u0026 Fastai\",\n    tags=[\"examples\"],\n    artifacts_path=\"path/to/artifacts/repo\"\n)\n\n# Log model metrics\nlearn.fit(..., cbs=[Callback()])\n```\n\n### PyTorch Lightning\n\nYou can log metrics, outputs, and models of PyTorch Lightning experiments using the tracking module:\n\n```python\nfrom traceml import tracking\nfrom traceml.integrations.pytorch_lightning import 
Callback\n\ntracking.init(\n    is_offline=True,\n    project='tracking-project',\n    name=\"pytorch-lightning-run\",\n    description=\"trying TraceML \u0026 Lightning\",\n    tags=[\"examples\"],\n    artifacts_path=\"path/to/artifacts/repo\"\n)\n\n...\ntrainer = pl.Trainer(\n    gpus=0,\n    progress_bar_refresh_rate=20,\n    max_epochs=2,\n    logger=Callback(),\n)\n```\n\n### HuggingFace\n\nYou can log metrics, outputs, and models of HuggingFace experiments using the tracking module:\n\n```python\nfrom traceml import tracking\nfrom traceml.integrations.hugging_face import Callback\n\ntracking.init(\n    is_offline=True,\n    project='tracking-project',\n    name=\"hg-run\",\n    description=\"trying TraceML \u0026 HuggingFace\",\n    tags=[\"examples\"],\n    artifacts_path=\"path/to/artifacts/repo\"\n)\n\n...\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=train_dataset if training_args.do_train else None,\n    eval_dataset=eval_dataset if training_args.do_eval else None,\n    callbacks=[Callback],\n    # ...\n)\n```\n\n## Tracking artifacts\n\nYou can log Matplotlib figures as well as Bokeh, Altair, and Plotly charts:\n\n```python\nimport altair as alt\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport plotly.express as px\nfrom bokeh.plotting import figure\nfrom vega_datasets import data\n\nfrom traceml import tracking\n\n\ndef plot_mpl_figure(step):\n    np.random.seed(19680801)\n    # Use local names that do not shadow the `data` and `figure` imports\n    values = np.random.randn(2, 100)\n\n    fig, axs = plt.subplots(2, 2, figsize=(5, 5))\n    axs[0, 0].hist(values[0])\n    axs[1, 0].scatter(values[0], values[1])\n    axs[0, 1].plot(values[0], values[1])\n    axs[1, 1].hist2d(values[0], values[1])\n\n    tracking.log_mpl_image(fig, 'mpl_image', step=step)\n\n\ndef log_bokeh(step):\n    factors = [\"a\", \"b\", \"c\", \"d\", \"e\", \"f\", \"g\", \"h\"]\n    x = [50, 40, 65, 10, 25, 37, 80, 60]\n\n    dot = figure(title=\"Categorical Dot Plot\", tools=\"\", toolbar_location=None,\n                 y_range=factors, x_range=[0, 100])\n\n    dot.segment(0, factors, x, 
factors, line_width=2, line_color=\"green\", )\n    dot.circle(x, factors, size=15, fill_color=\"orange\", line_color=\"green\", line_width=3, )\n\n    factors = [\"foo 123\", \"bar:0.2\", \"baz-10\"]\n    x = [\"foo 123\", \"foo 123\", \"foo 123\", \"bar:0.2\", \"bar:0.2\", \"bar:0.2\", \"baz-10\", \"baz-10\",\n         \"baz-10\"]\n    y = [\"foo 123\", \"bar:0.2\", \"baz-10\", \"foo 123\", \"bar:0.2\", \"baz-10\", \"foo 123\", \"bar:0.2\",\n         \"baz-10\"]\n    colors = [\n        \"#0B486B\", \"#79BD9A\", \"#CFF09E\",\n        \"#79BD9A\", \"#0B486B\", \"#79BD9A\",\n        \"#CFF09E\", \"#79BD9A\", \"#0B486B\"\n    ]\n\n    hm = figure(title=\"Categorical Heatmap\", tools=\"hover\", toolbar_location=None,\n                x_range=factors, y_range=factors)\n\n    hm.rect(x, y, color=colors, width=1, height=1)\n\n    tracking.log_bokeh_chart(name='confusion-bokeh', figure=hm, step=step)\n\n\ndef log_altair(step):\n    source = data.cars()\n\n    brush = alt.selection_interval()\n\n    points = alt.Chart(source).mark_point().encode(\n        x='Horsepower:Q',\n        y='Miles_per_Gallon:Q',\n        color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))\n    ).add_selection(\n        brush\n    )\n\n    bars = alt.Chart(source).mark_bar().encode(\n        y='Origin:N',\n        color='Origin:N',\n        x='count(Origin):Q'\n    ).transform_filter(\n        brush\n    )\n\n    chart = points \u0026 bars\n\n    tracking.log_altair_chart(name='altair_chart', figure=chart, step=step)\n\n\ndef log_plotly(step):\n    df = px.data.tips()\n\n    fig = px.density_heatmap(df, x=\"total_bill\", y=\"tip\", facet_row=\"sex\", facet_col=\"smoker\")\n    tracking.log_plotly_chart(name=\"2d-hist\", figure=fig, step=step)\n\n\nplot_mpl_figure(100)\nlog_bokeh(100)\nlog_altair(100)\nlog_plotly(100)\n```\n\n## Tracking DataFrames\n\n### Summary\n\nAn extension to the [pandas](http://pandas.pydata.org/) DataFrame `describe()` function.\n\nThe module contains 
the `DataFrameSummary` object, which extends `describe()` with:\n\n- **properties**\n  - dfs.columns_stats: counts, uniques, missing, missing_perc, and type per column\n  - dfs.columns_types: a count of the types of columns\n  - dfs[column]: a more in-depth summary of the column\n- **method**\n  - summary(): extends the output of `describe()` with the values from `columns_stats`\n\n`DataFrameSummary` expects a pandas `DataFrame` to summarise.\n\n```python\nfrom traceml.summary.df import DataFrameSummary\n\ndfs = DataFrameSummary(df)\n```\n\nGetting the column types\n\n```python\ndfs.columns_types\n\n\nnumeric     9\nbool        3\ncategorical 2\nunique      1\ndate        1\nconstant    1\ndtype: int64\n```\n\nGetting the column stats\n\n```python\ndfs.columns_stats\n\n\n                      A            B        C              D              E\ncounts             5802         5794     5781           5781           4617\nuniques            5802            3     5771            128            121\nmissing               0            8       21             21           1185\nmissing_perc         0%        0.14%    0.36%          0.36%         20.42%\ntypes            unique  categorical  numeric        numeric        numeric\n```\n\nGetting a single column summary, e.g. for a 
numerical column\n\n```python\n# we can also access the column using numbers A[1]\ndfs['A']\n\nstd                                                                 0.2827146\nmax                                                                  1.072792\nmin                                                                         0\nvariance                                                           0.07992753\nmean                                                                0.5548516\n5%                                                                  0.1603367\n25%                                                                 0.3199776\n50%                                                                 0.4968588\n75%                                                                 0.8274732\n95%                                                                  1.011255\niqr                                                                 0.5074956\nkurtosis                                                            -1.208469\nskewness                                                            0.2679559\nsum                                                                  3207.597\nmad                                                                 0.2459508\ncv                                                                  0.5095319\nzeros_num                                                                  11\nzeros_perc                                                               0,1%\ndeviating_of_mean                                                          21\ndeviating_of_mean_perc                                                  0.36%\ndeviating_of_median                                                        21\ndeviating_of_median_perc                                                0.36%\ntop_correlations                         {u'D': 0.702240243124, u'E': -0.663}\ncounts                                                                   5781\nuniques  
                                                                5771\nmissing                                                                    21\nmissing_perc                                                            0.36%\ntypes                                                                 numeric\nName: A, dtype: object\n```\n\n### [WIP] Summaries\n\n * [ ] Add summary analysis between columns, i.e. `dfs[[1, 2]]`\n\n### [WIP] Visualizations\n\n * [ ] Add summary visualization with matplotlib.\n * [ ] Add summary visualization with plotly.\n * [ ] Add summary visualization with altair.\n * [ ] Add predefined profiling.\n\n\n### [WIP] Catalog and Versions\n\n * [ ] Add possibility to persist summary and link to a specific version.\n * [ ] Integrate with quality libraries.\n","funding_links":[],"categories":["Python","其他_机器学习与深度学习","Data Containers \u0026 Dataframes"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpolyaxon%2Ftraceml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpolyaxon%2Ftraceml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpolyaxon%2Ftraceml/lists"}