{"id":26056014,"url":"https://github.com/ncar/bridgescaler","last_synced_at":"2026-05-05T01:01:39.398Z","repository":{"id":64344397,"uuid":"573508481","full_name":"NCAR/bridgescaler","owner":"NCAR","description":"Bridge your scikit-learn scaler parameters between Python sessions and users. Distribute your scaling across multiple processes and data subsets.","archived":false,"fork":false,"pushed_at":"2026-03-02T19:14:57.000Z","size":3994,"stargazers_count":11,"open_issues_count":4,"forks_count":3,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-03-02T22:19:34.248Z","etag":null,"topics":["ai","machine-learning","scikit-learn"],"latest_commit_sha":null,"homepage":"https://bridgescaler.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NCAR.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-12-02T16:20:54.000Z","updated_at":"2026-03-02T19:11:15.000Z","dependencies_parsed_at":"2023-11-18T00:13:21.353Z","dependency_job_id":"8656eec5-9dcc-468a-bf04-b4995d01cd09","html_url":"https://github.com/NCAR/bridgescaler","commit_stats":{"total_commits":25,"total_committers":1,"mean_commits":25.0,"dds":0.0,"last_synced_commit":"86adeb321206b5966970022734df9c85d3329413"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/NCAR/bridgescaler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fbridgescaler","tags_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fbridgescaler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fbridgescaler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fbridgescaler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NCAR","download_url":"https://codeload.github.com/NCAR/bridgescaler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fbridgescaler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32631058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-04T10:08:07.713Z","status":"ssl_error","status_checked_at":"2026-05-04T10:08:02.005Z","response_time":58,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","machine-learning","scikit-learn"],"created_at":"2025-03-08T10:26:46.289Z","updated_at":"2026-05-05T01:01:39.375Z","avatar_url":"https://github.com/NCAR.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bridgescaler\nBridge your scikit-learn-style scaler parameters between Python sessions and users.\nBridgescaler allows you to save the properties of a scikit-learn-style scaler object\nto a json file, and then repopulate a new scaler object with the same properties.\n\n\n## Dependencies\n* scikit-learn\n* 
numpy\n* pandas\n* xarray\n* pytdigest\n\n## Installation\nFor a stable version of bridgescaler, you can install from PyPI.\n```bash\npip install bridgescaler\n```\n\nFor the latest version of bridgescaler, install from github.\n```bash\ngit clone https://github.com/NCAR/bridgescaler.git\ncd bridgescaler\npip install .\n```\n\n## Usage\nbridgescaler supports all the common scikit-learn scaler classes:\n* StandardScaler\n* RobustScaler\n* MinMaxScaler\n* MaxAbsScaler\n* QuantileTransformer\n* PowerTransformer\n* SplineTransformer\n\nFirst, create some synthetic data to transform.\n```python\nimport numpy as np\nimport pandas as pd\n\n# specify distribution parameters for each variable\nlocs = np.array([0, 5, -2, 350.5], dtype=np.float32)\nscales = np.array([1.0, 10, 0.1, 5000.0])\nnames = [\"A\", \"B\", \"C\", \"D\"]\nnum_examples = 205\nx_data_dict = {}\nfor l in range(locs.shape[0]):\n    # sample from random normal with different parameters\n    x_data_dict[names[l]] = np.random.normal(loc=locs[l], scale=scales[l], size=num_examples)\nx_data = pd.DataFrame(x_data_dict)\n```\n\nNow, let's fit and transform the data with StandardScaler.\n```python\nfrom sklearn.preprocessing import StandardScaler\nfrom bridgescaler import save_scaler, load_scaler\n\nscaler = StandardScaler()\nscaler.fit_transform(x_data)\nfilename = \"x_standard_scaler.json\"\n# save to json file\nsave_scaler(scaler, filename)\n\n# create new StandardScaler from json file information.\nnew_scaler = load_scaler(filename) # new_scaler is a StandardScaler object\n```\n### Distributed Scaler\nThe distributed scalers allow you to calculate scaling\nparameters on different subsets of a dataset and then combine the scaling factors\ntogether to get representative scaling values for the full dataset. 
Distributed\nStandard Scalers, MinMax Scalers, and Quantile Transformers have been implemented and work with both tabular\nand multi-dimensional patch data in numpy, pandas DataFrame, and xarray DataArray formats.\nBy default, the scaler assumes your channel/variable dimension is the last\ndimension, but if `channels_last=False` is set in the `__init__`, `transform`,\nor `inverse_transform` methods, then the 2nd dimension is assumed to be the variable\ndimension. It is possible to fit data with one ordering and then\ntransform it with a different one. \n\nFor large datasets, it may be expensive to refit the scalers if you want to use a \nsubset or different ordering of variables. However, in bridgescaler, the \nDistributed Scalers all support arbitrary ordering and subsets of variables for transforms if \nthe input data are in an xarray DataArray or pandas DataFrame with variable\nnames that match the original data. \n\nExample:\n```python\nfrom bridgescaler.distributed import DStandardScaler\nimport numpy as np\n\nx_1 = np.random.normal(0, 2.2, (20, 5, 4, 8))\nx_2 = np.random.normal(1, 3.5, (25, 4, 8, 5))\n\ndss_1 = DStandardScaler(channels_last=False)\ndss_2 = DStandardScaler(channels_last=True)\ndss_1.fit(x_1)\ndss_2.fit(x_2)\ndss_combined = np.sum([dss_1, dss_2])\n\ndss_combined.transform(x_1, channels_last=False)\n```\n\n### Group Scaler\nThe group scalers use the same scaling parameters for a group of similar\nvariables rather than scaling each column independently. 
This is useful for situations where variables are related, \nsuch as temperatures at different height levels.\n\nGroups are specified as a list of column ids, which can be column names for pandas DataFrames or column indices\nfor numpy arrays.\n\nFor example:\n```python\nfrom bridgescaler.group import GroupStandardScaler\nimport pandas as pd\nimport numpy as np\nx_rand = np.random.random(size=(100, 5))\ndata = pd.DataFrame(data=x_rand, \n                    columns=[\"a\", \"b\", \"c\", \"d\", \"e\"])\ngroups = [[\"a\", \"b\"], [\"c\", \"d\"], \"e\"]\ngroup_scaler = GroupStandardScaler()\nx_transformed = group_scaler.fit_transform(data, groups=groups)\n```\n\n\"a\" and \"b\" form a single group, so all values from both columns are included when calculating the mean and standard \ndeviation for that group.\n\n### Deep Scaler\nThe deep scalers are designed to scale 2- or 3-dimensional fields input into a \ndeep learning model such as a convolutional neural network. The scalers assume\nthat the last dimension is the channel/variable dimension and scale the values accordingly.\nThe scalers can support 2D or 3D patches with no change in code structure. 
Support is provided for\nDeepStandardScaler and DeepQuantileTransformer.\n\nExample:\n```python\nfrom bridgescaler.deep import DeepStandardScaler\nimport numpy as np\nnp.random.seed(352680)\nn_ex = 5000\nn_channels = 4\ndim = 32\nmeans = np.array([1, 5, -4, 2.5], dtype=np.float32)\nsds = np.array([10, 2, 43.4, 32.], dtype=np.float32)\nx = np.zeros((n_ex, dim, dim, n_channels), dtype=np.float32)\nfor chan in range(n_channels):\n    x[..., chan] = np.random.normal(means[chan], sds[chan], (n_ex, dim, dim))\ndss = DeepStandardScaler()\ndss.fit(x)\nx_transformed = dss.transform(x)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncar%2Fbridgescaler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fncar%2Fbridgescaler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncar%2Fbridgescaler/lists"}