<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# spacy-loggers: Logging utilities for spaCy

[![PyPi Version](https://img.shields.io/pypi/v/spacy-loggers.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.python.org/pypi/spacy-loggers)

Starting with spaCy v3.2, alternate loggers have been moved into a separate package
so that they can be added and updated independently of the core spaCy
library.

`spacy-loggers` currently provides loggers for:

- [Weights & Biases](https://www.wandb.com)
- [MLflow](https://www.mlflow.org/)
- [ClearML](https://www.clear.ml/)
- [PyTorch](https://pytorch.org/)
- [CuPy](https://github.com/cupy/cupy)

`spacy-loggers` also provides additional utility loggers to facilitate interoperation
between individual loggers.

If you'd like to add a new logger or logging option, please submit a PR to this
repo!

## Setup and installation

`spacy-loggers` should be installed automatically with spaCy v3.2+, so you
usually don't need to install it separately. You can install it with `pip` or
from the conda channel `conda-forge`:

```bash
pip install spacy-loggers
```

```bash
conda install -c conda-forge spacy-loggers
```

# Loggers

## WandbLogger

### Installation

This logger requires `wandb` to be installed and configured:

```bash
pip install wandb
wandb login
```

### Usage

`spacy.WandbLogger.v5` is a logger that sends the results of each training step
to the dashboard of the [Weights & Biases](https://www.wandb.com/) tool. To use
this logger, `wandb` must be installed and you must be logged in.
The logger sends the full config file to W&B, as well as various system
information such as memory utilization, network traffic, disk IO and GPU
statistics. This also includes information such as your hostname and
operating system, as well as the location of your Python executable.

`spacy.WandbLogger.v4` and earlier versions automatically call the [default console logger](https://spacy.io/api/top-level#ConsoleLogger).
Starting with `spacy.WandbLogger.v5`, console logging must be enabled explicitly
by combining this logger with a console logger through the [ChainLogger](#chainlogger).
This lets you configure the console logger's parameters according to your preferences.

**Note** that by default, the full (interpolated)
[training config](https://spacy.io/usage/training#config) is sent to the
W&B dashboard. If you prefer to **exclude certain information** such as path
names, you can list those fields in "dot notation" in the
`remove_config_values` parameter. These fields will then be removed from the
config before uploading, but will otherwise remain in the config file stored
on your local system.

### Example config

```ini
[training.logger]
@loggers = "spacy.WandbLogger.v5"
project_name = "monitor_spacy_training"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
log_dataset_dir = "corpus"
model_log_interval = 1000
```

| Name | Type | Description |
| --- | --- | --- |
| `project_name` | `str` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. |
| `remove_config_values` | `List[str]` | A list of values to exclude from the config before it is uploaded to W&B (default: `[]`). |
| `model_log_interval` | `Optional[int]` | Steps to wait between logging model checkpoints to the W&B dashboard (default: `None`). Added in `spacy.WandbLogger.v2`. |
| `log_dataset_dir` | `Optional[str]` | Directory containing the dataset to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v2`. |
| `entity` | `Optional[str]` | The username or team name to which runs are sent. If you don't specify an entity, runs are sent to your default entity, which is usually your username (default: `None`). Added in `spacy.WandbLogger.v3`. |
| `run_name` | `Optional[str]` | The name of the run. If you don't specify a run name, the `wandb` library generates one (default: `None`). Added in `spacy.WandbLogger.v3`. |
| `log_best_dir` | `Optional[str]` | Directory containing the best trained model as saved by spaCy (by default in `training/model-best`), to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v4`. |
| `log_latest_dir` | `Optional[str]` | Directory containing the latest trained model as saved by spaCy (by default in `training/model-last`), to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v4`. |
| `log_custom_stats` | `Optional[List[str]]` | A list of regular expressions that will be applied to the info dictionary passed to the logger (default: `None`). Statistics and metrics that match these regexps will be logged automatically. Added in `spacy.WandbLogger.v5`. |

## MLflowLogger

### Installation

This logger requires `mlflow` to be installed and configured:

```bash
pip install mlflow
```

### Usage

`spacy.MLflowLogger.v2` is a logger that tracks the results of each training step
using the [MLflow](https://www.mlflow.org/) tool. To use
this logger, MLflow must be installed. At the beginning of each model training
operation, the logger initializes a new MLflow run (or resumes the existing run given
by `run_id`) and sets it as the active run under which metrics and parameters will be
logged. The logger then logs the entire config file as parameters of the active run.
After each training step, the following actions are performed:

- The final score is logged under the metric `score`.
- Individual component scores are logged under their default names.
- Loss values of different components are logged with the `loss_` prefix.
- If the final score is higher than the previous best score (for the current run),
  the model artifact is additionally uploaded to MLflow. This only happens if an
  output path was provided for the training run (for example with `spacy train --output`).

By default, the tracking API writes data into files in a local `./mlruns` directory.

`spacy.MLflowLogger.v1` and earlier versions automatically call the [default console logger](https://spacy.io/api/top-level#ConsoleLogger).
Starting with `spacy.MLflowLogger.v2`, console logging must be enabled explicitly
by combining this logger with a console logger through the [ChainLogger](#chainlogger).
This lets you configure the console logger's parameters according to your preferences.

**Note** that by default, the full (interpolated)
[training config](https://spacy.io/usage/training#config) is sent to
MLflow. If you prefer to **exclude certain information** such as path
names, you can list those fields in "dot notation" in the
`remove_config_values` parameter. These fields will then be removed from the
config before uploading, but will otherwise remain in the config file stored
on your local system.

### Example config

```ini
[training.logger]
@loggers = "spacy.MLflowLogger.v2"
experiment_id = "1"
run_name = "with_fast_alignments"
nested = False
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
```

| Name | Type | Description |
| --- | --- | --- |
| `run_id` | `Optional[str]` | Unique ID of an existing MLflow run to which parameters and metrics are logged. Can be omitted, in which case a new run is created using `experiment_id` and `run_name` (default: `None`). |
| `experiment_id` | `Optional[str]` | ID of an existing experiment under which to create the current run. Only applicable when `run_id` is `None` (default: `None`). |
| `run_name` | `Optional[str]` | Name of the new run. Only applicable when `run_id` is `None` (default: `None`). |
| `nested` | `bool` | Controls whether the run is nested in a parent run. `True` creates a nested run (default: `False`). |
| `tags` | `Optional[Dict[str, Any]]` | A dictionary of string keys and values to set as tags on the run. If a run is being resumed, these tags are set on the resumed run; if a new run is being created, they are set on the new run (default: `None`). |
| `remove_config_values` | `List[str]` | A list of values to exclude from the config before it is uploaded to MLflow (default: `[]`). |
| `log_custom_stats` | `Optional[List[str]]` | A list of regular expressions that will be applied to the info dictionary passed to the logger (default: `None`). Statistics and metrics that match these regexps will be logged automatically. Added in `spacy.MLflowLogger.v2`. |

## ClearMLLogger

### Installation

This logger requires `clearml` to be installed and configured:

```bash
pip install clearml
clearml-init
```

### Usage

`spacy.ClearMLLogger.v2` is a logger that tracks the results of each training step
using the [ClearML](https://www.clear.ml/) tool. To use
this logger, ClearML must be installed and initialized (with `clearml-init`, as shown above).
The logger sends all the gathered information to your ClearML server, either [the hosted free tier](https://app.clear.ml)
or the open source [self-hosted server](https://github.com/allegroai/clearml-server). This logger captures the following information, all of which is visible in the ClearML web UI:

- The full spaCy config file contents.
- Code information such as the git repository, commit ID and uncommitted changes.
- Full console output.
- Miscellaneous info such as time, Python version and hardware information.
- Output scalars:
  - The final score is logged under the scalar `score`.
  - Individual component scores are grouped together on one scalar plot (filterable using the web UI).
  - Loss values of different components are logged with the `loss_` prefix.

In addition to the above, the following artifacts can optionally be captured:

- Best model directory (zipped).
- Latest model directory (zipped).
- Dataset used for training.
  - Versioned using ClearML Data and linked under Configuration -> User Properties in the web UI.

`spacy.ClearMLLogger.v1` and earlier versions automatically call the [default console logger](https://spacy.io/api/top-level#ConsoleLogger).
Starting with `spacy.ClearMLLogger.v2`, console logging must be enabled explicitly
by combining this logger with a console logger through the [ChainLogger](#chainlogger).
This lets you configure the console logger's parameters according to your preferences.

**Note** that by default, the full (interpolated)
[training config](https://spacy.io/usage/training#config) is sent to
ClearML. If you prefer to **exclude certain information** such as path
names, you can list those fields in "dot notation" in the
`remove_config_values` parameter. These fields will then be removed from the
config before uploading, but will otherwise remain in the config file stored
on your local system.

### Example config

```ini
[training.logger]
@loggers = "spacy.ClearMLLogger.v2"
project_name = "Hello ClearML!"
task_name = "My spaCy Task"
model_log_interval = 1000
log_best_dir = "training/model-best"
log_latest_dir = "training/model-last"
log_dataset_dir = "corpus"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
```

| Name | Type | Description |
| --- | --- | --- |
| `project_name` | `str` | The name of the project in the ClearML interface. The project will be created automatically if it doesn't exist yet. |
| `task_name` | `str` | The name of the ClearML task. A task is an experiment that lives inside a project. Task names don't need to be unique. |
| `remove_config_values` | `List[str]` | A list of values to exclude from the config before it is uploaded to ClearML (default: `[]`). |
| `model_log_interval` | `Optional[int]` | Steps to wait between logging model checkpoints to the ClearML dashboard (default: `None`). Has no effect unless `log_best_dir` or `log_latest_dir` is also set. |
| `log_dataset_dir` | `Optional[str]` | Directory containing the dataset to be logged and versioned as a [ClearML Dataset](https://clear.ml/docs/latest/docs/clearml_data/clearml_data/) (default: `None`). |
| `log_best_dir` | `Optional[str]` | Directory containing the best trained model as saved by spaCy (by default in `training/model-best`), to be logged and versioned as a ClearML artifact (default: `None`). |
| `log_latest_dir` | `Optional[str]` | Directory containing the latest trained model as saved by spaCy (by default in `training/model-last`), to be logged and versioned as a ClearML artifact (default: `None`). |
| `log_custom_stats` | `Optional[List[str]]` | A list of regular expressions that will be applied to the info dictionary passed to the logger (default: `None`). Statistics and metrics that match these regexps will be logged automatically. Added in `spacy.ClearMLLogger.v2`. |

## PyTorchLogger

### Installation

This logger requires `torch` to be installed:

```bash
pip install torch
```

### Usage

`spacy.PyTorchLogger.v1` differs from the other loggers above in that it does not act as a bridge between spaCy and
an external framework. Instead, it queries PyTorch-specific metrics and makes them available to other loggers.
It is therefore primarily intended to be used with the [ChainLogger](#chainlogger).

Whenever a logging checkpoint is reached, it queries statistics from the PyTorch backend and stores them in
the dictionary passed to it. Downstream loggers can then look up these statistics and log them to their
preferred framework.

The following PyTorch statistics are currently supported:

- [CUDA memory statistics](https://pytorch.org/docs/stable/generated/torch.cuda.memory_stats.html#torch.cuda.memory_stats)

### Example config

```ini
[training.logger]
@loggers = "spacy.ChainLogger.v1"
logger1 = {"@loggers": "spacy.PyTorchLogger.v1", "prefix": "pytorch", "device": 0, "cuda_mem_metric": "current"}
# Alternatively, you can use any other logger that provides the `log_custom_stats` parameter.
logger2 = {"@loggers": "spacy.LookupLogger.v1", "patterns": ["pytorch"]}
```

| Name | Type | Description |
| --- | --- | --- |
| `prefix` | `str` | All metric names are prefixed with this string using dot notation, e.g. `<prefix>.<metric>` (default: `"pytorch"`). |
| `device` | `int` | The identifier of the CUDA device (default: `0`). |
| `cuda_mem_pool` | `str` | One of the memory pool values specified in the PyTorch docs: `all`, `large_pool`, `small_pool` (default: `all`). |
| `cuda_mem_metric` | `str` | One of the memory metric values specified in the PyTorch docs: `current`, `peak`, `allocated`, `freed`. To log all metrics, use `all` instead (default: `all`). |

## CupyLogger

### Installation

This logger requires `cupy` to be installed:

```bash
pip install cupy
```

### Usage

Like `PyTorchLogger`, `spacy.CupyLogger.v1` does not act as a bridge between spaCy and an external framework.
Instead, it is used with the [ChainLogger](#chainlogger) to pass metrics on to other loggers.
The `CupyLogger` queries statistics from the CuPy backend and stores them in the info dictionary passed to it. Downstream
loggers can then look up these statistics and log them to their preferred framework.

The following CuPy statistics are currently supported:

- [CUDA memory pool statistics](https://docs.cupy.dev/en/stable/user_guide/memory.html)

### Example config

```ini
[training.logger]
@loggers = "spacy.ChainLogger.v1"
logger1 = {"@loggers": "spacy.CupyLogger.v1", "prefix": "cupy"}
# Alternatively, you can use any other logger that provides the `log_custom_stats` parameter.
logger2 = {"@loggers": "spacy.LookupLogger.v1", "patterns": ["cupy"]}
```

| Name | Type | Description |
| --- | --- | --- |
| `prefix` | `str` | All metric names are prefixed with this string using dot notation, e.g. `<prefix>.<metric>` (default: `"cupy"`). |

# Utility Loggers

## ChainLogger

### Usage

This logger can be used to daisy-chain multiple loggers and execute them in order. Loggers that are executed earlier in the chain
can pass information to those that come later by adding it to the dictionary that is passed to them.

Currently, up to 10 loggers can be chained together.

### Example config

```ini
[training.logger]
@loggers = "spacy.ChainLogger.v1"
logger1 = {"@loggers": "spacy.PyTorchLogger.v1"}
logger2 = {"@loggers": "spacy.ConsoleLogger.v1", "progress_bar": true}
```

| Name | Type | Description |
| --- | --- | --- |
| `logger1` | `Optional[Callable]` | The first logger in the chain (default: `None`). |
| `logger2` | `Optional[Callable]` | The second logger in the chain (default: `None`). |
| `logger3` | `Optional[Callable]` | The third logger in the chain (default: `None`). |
| `logger4` | `Optional[Callable]` | The fourth logger in the chain (default: `None`). |
| `logger5` | `Optional[Callable]` | The fifth logger in the chain (default: `None`). |
| `logger6` | `Optional[Callable]` | The sixth logger in the chain (default: `None`). |
| `logger7` | `Optional[Callable]` | The seventh logger in the chain (default: `None`). |
| `logger8` | `Optional[Callable]` | The eighth logger in the chain (default: `None`). |
| `logger9` | `Optional[Callable]` | The ninth logger in the chain (default: `None`). |
| `logger10` | `Optional[Callable]` | The tenth logger in the chain (default: `None`). |

## LookupLogger

### Usage

This logger can be used to look up statistics in the info dictionary and print them to `stdout`. It is primarily
intended as a tool for developing new loggers.

### Example config

```ini
[training.logger]
@loggers = "spacy.ChainLogger.v1"
logger1 = {"@loggers": "spacy.PyTorchLogger.v1", "prefix": "pytorch"}
logger2 = {"@loggers": "spacy.LookupLogger.v1", "patterns": ["^[pP]ytorch"]}
```

| Name | Type | Description |
| --- | --- | --- |
| `patterns` | `List[str]` | A list of regular expressions. If a statistic's name matches one of these, it's printed to `stdout`. |

## Bug reports and other issues

Please use [spaCy's issue tracker](https://github.com/explosion/spaCy/issues) to report a bug, or open a new thread on the
[discussion board](https://github.com/explosion/spaCy/discussions)
for any other issue.
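## How chained loggers share statistics

The ChainLogger and LookupLogger descriptions above depend on every logger in a chain receiving the same info dictionary, one after another. The sketch below shows that data flow in plain Python. It assumes the logger shape documented for spaCy v3 custom loggers: a setup function takes `nlp`, `stdout` and `stderr` and returns a `(log_step, finalize)` pair, and `log_step` may receive `None` on some steps. The names `stats_producer`, `jsonl_consumer` and `chain`, and the `demo.` prefix, are hypothetical. They are not part of the `spacy-loggers` API. Real loggers are registered with `spacy.registry.loggers` and set up through the config system. This sketch leaves both out so it runs without spaCy installed.

```python
import io
import json
import sys
from typing import IO, Any, Callable, Dict, Optional, Tuple

# Hypothetical, framework-free sketch; NOT the spacy-loggers implementation.
# Assumed logger shape (spaCy v3 custom loggers): setup(nlp, stdout, stderr)
# returns (log_step, finalize); log_step may be called with None.
Info = Optional[Dict[str, Any]]
LogStep = Callable[[Info], None]
Finalize = Callable[[], None]
SetupLogger = Callable[..., Tuple[LogStep, Finalize]]


def stats_producer(prefix: str = "demo") -> SetupLogger:
    """Hypothetical producer: like PyTorchLogger/CupyLogger, it only adds
    prefixed entries to the shared info dict and logs nothing itself."""

    def setup_logger(nlp: Any, stdout: IO = sys.stdout, stderr: IO = sys.stderr):
        def log_step(info: Info) -> None:
            if info is not None:
                # Example statistic: the sum of all component losses.
                info[f"{prefix}.loss_total"] = sum(info.get("losses", {}).values())

        return log_step, lambda: None

    return setup_logger


def jsonl_consumer(stream: IO, prefix: str = "demo") -> SetupLogger:
    """Hypothetical consumer: writes step, score and any entries under
    `prefix` as one JSON line per logged step."""

    def setup_logger(nlp: Any, stdout: IO = sys.stdout, stderr: IO = sys.stderr):
        def log_step(info: Info) -> None:
            if info is None:  # no evaluation on this step: nothing to write
                return
            record = {"step": info["step"], "score": info["score"]}
            record.update({k: v for k, v in info.items() if k.startswith(prefix + ".")})
            stream.write(json.dumps(record) + "\n")

        return log_step, stream.flush

    return setup_logger


def chain(*setups: SetupLogger) -> SetupLogger:
    """Run several loggers in order on the same info dict, in the spirit of
    ChainLogger: entries added by earlier loggers are visible to later ones."""

    def setup_logger(nlp: Any, stdout: IO = sys.stdout, stderr: IO = sys.stderr):
        pairs = [setup(nlp, stdout, stderr) for setup in setups]

        def log_step(info: Info) -> None:
            for step_fn, _ in pairs:
                step_fn(info)

        def finalize() -> None:
            for _, finalize_fn in pairs:
                finalize_fn()

        return log_step, finalize

    return setup_logger


if __name__ == "__main__":
    out = io.StringIO()
    log_step, finalize = chain(stats_producer(), jsonl_consumer(out))(nlp=None)
    log_step({"epoch": 0, "step": 200, "score": 0.71, "losses": {"tok2vec": 1.5, "ner": 10.5}})
    log_step(None)  # a step without evaluation is skipped by the consumer
    finalize()
    print(out.getvalue(), end="")  # prints {"step": 200, "score": 0.71, "demo.loss_total": 12.0}
```

If the order inside `chain(...)` were reversed, the consumer would run before the producer and would not see the `demo.*` entry. The example configs above follow the same rule: producers such as `PyTorchLogger` are listed as `logger1`, before `LookupLogger`.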