{"id":20466093,"url":"https://github.com/withprecedent/borges","last_synced_at":"2026-04-19T00:04:25.087Z","repository":{"id":77163372,"uuid":"562264745","full_name":"WithPrecedent/borges","owner":"WithPrecedent","description":null,"archived":false,"fork":false,"pushed_at":"2022-11-05T19:59:33.000Z","size":141,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-16T01:17:03.723Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WithPrecedent.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-05T19:57:00.000Z","updated_at":"2022-11-05T19:59:38.000Z","dependencies_parsed_at":"2023-09-04T09:18:01.860Z","dependency_job_id":null,"html_url":"https://github.com/WithPrecedent/borges","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WithPrecedent%2Fborges","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WithPrecedent%2Fborges/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WithPrecedent%2Fborges/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WithPrecedent%2Fborges/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WithPrecedent","download_url":"https://codeload.github.com/WithPrecedent/borges/tar.gz/refs/heads/master","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242027572,"owners_count":20060120,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T13:21:35.710Z","updated_at":"2026-04-19T00:04:25.033Z","avatar_url":"https://github.com/WithPrecedent.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Documentation Status](https://readthedocs.org/projects/borges/badge/?version=latest)](http://borges.readthedocs.io/?badge=latest)\n\n\"... In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. 
In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.\" - Jorge Luis Borges, The Exactitude of Science\n\nborges offers tools to make data science more accessible, with a particular emphasis on its use in academic research. To that end, the package avoids programming jargon (when possible) and implements a unified code architecture for all phases of a data science project. So, classes and methods for data\nscraping, parsing, munging, merging, preprocessing, modelling, analyzing, and visualizing use the same vocabulary so that borges can be easily used and\nextended.\n\nborges includes a high-level set of tools that allows users to mix and match various preprocessing methods and statistical models. It provides some unique custom methods and integrates classes and methods from packages such as scikit-learn, category-encoders, imblearn, xgboost, seaborn, and shap.\n\nThe borges package uses an extended metaphor, which is familiar in computer\nprogramming, as the basis for its overall structure: food preparation. 
Words\nlike 'recipe' and 'cookbook' appear with regularity in discussing computer code.\nborges extends this metaphor a bit further in the creation of its four core\npackages:\n\n1) Wrangler: harvests data from a variety of sources, cleans it, and prepares it for consumption.\n2) Analyst: using a cookbook of recipes derived from user selections, the Analyst applies machine learning and preprocessing methods to data.\n3) Critic: evaluates the results of recipes, offering appropriately-matched comparisons, summaries, and metrics.\n4) Artist: aiding the Critic, the Artist creates visualizations of the data, models, and model evaluation.\n\n## Why borges?\n\nAlthough scikit-learn has gone a long way toward unifying interfaces with many common machine learning methods, it is still quite clunky in many situations. Present shortcomings include:\n1) It lacks tools for data that isn't [tidy](https://vita.had.co.nz/papers/tidy-data.pdf).\n2) There is [a needlessly convoluted process](https://github.com/scikit-learn-contrib/sklearn-pandas#transformation-mapping) for implementing transformers on a subset of columns. 
Whereas many packages include a \"cols\" argument, [scikit-learn does not](https://medium.com/vickdata/easier-machine-learning-with-the-new-column-transformer-from-scikit-learn-c2268ea9564c).\n3) fit methods do not work with certain preprocessing algorithms (e.g., [target encoding in category-encoders](https://github.com/scikit-learn-contrib/categorical-encoding/issues/104)) because scikit-learn does not allow the label data to be passed to a fit method.\n4) Pipeline and FeatureUnion [lack a mix-and-match grid-search type system](https://buildmedia.implementthedocs.org/media/pdf/scikit-learn-enhancement-proposals/latest/scikit-learn-enhancement-proposals.pdf) for preprocessing, only for hyperparameter searches.\n5) It doesn't directly use pandas dataframes despite various attempts to bridge the gap (e.g., [sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas)). This can cause confusion and difficulty in keeping feature names attached to columns of data because numpy arrays do not incorporate string names of columns. This is why, for example, [default feature_importances graphs do not include the actual feature names](https://stackoverflow.com/questions/44511636/matplotlib-plot-feature-importance-with-feature-names).\n6) The structuring of scikit-learn compatible preprocessing algorithms to comply with the rigid .fit and .transform methods makes their use sometimes unintuitive.\n7) The process for implementing different transformers on different groups of data (test, train, full, validation, etc.) 
within a Pipeline is [often messy and difficult](https://towardsdatascience.com/preprocessing-with-sklearn-a-complete-and-comprehensive-guide-670cb98fcfb9).\n8) Scikit-learn has [no plans to offer GPU support](https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support).\n9) Scikit-learn does not offer clear guidance to new users about how to sequence and combine [its many methods into a preprocessing and machine learning workflow](https://scikit-learn.org/stable/modules/classes.html).\n10) Many great tools for machine learning, particularly in the category of \"deep\nlearning\", simply are not designed to be compatible with scikit-learn.\n\nborges provides a cleaner, universal set of tools to access the many useful methods from scikit-learn and other Python packages. The goal is to make machine learning more accessible to a wider user base. borges also adds numerous unique methods and functions for common machine learning and feature engineering tasks. In addition to those custom scripts, borges incorporates and provides a universal API for methods and classes from the following packages:\n\n* [scikit-learn](https://github.com/scikit-learn/scikit-learn)\n* [xgboost](https://github.com/dmlc/xgboost)\n* [tensorflow](https://github.com/tensorflow/tensorflow)\n* [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn/tree/master/imblearn)\n* [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding)\n* [scikit-optimize](https://github.com/scikit-optimize/scikit-optimize/tree/master/skopt)\n* [seaborn](https://github.com/mwaskom/seaborn)\n* [shap](https://github.com/slundberg/shap)\n* [matplotlib](https://github.com/matplotlib/matplotlib)\n* [eli5](https://github.com/TeamHG-Memex/eli5)\n* [scikitplot](https://github.com/reiinakano/scikit-plot)\n\n## The borges Plan\n\nTo understand a typical use-case for borges, let's examine a project that omits the Wrangler Phase and proceeds directly to preprocessing and modeling. 
At the end of this discussion of the general process, an example using the Wisconsin breast cancer data is included.\n\n### borges Analyst\n\nAs an example of borges's functionality, let's review the Analyst subpackage. It allows users to create a cookbook of dynamic recipes that mix-and-match feature engineering and modeling steps based upon a common, simple interface. It then analyzes the results using selected, appropriate metrics and exports tables, charts, and graphs compatible with the models and data types.\n\nBy default, the Analyst divides the feature engineering and modeling process into eight major steps that can be sequenced in different orders (or supplemented with\ncustom steps):\n\n* Scale: converts numerical features into a common scale, using scikit-learn methods.\n* Split: divides data into train, test, and/or validation sets once or iteratively through k-folds cross-validation.\n* Encode: converts categorical features into numerical ones, using category-encoders methods.\n* Mix: converts selected features into new polynomial features, using PolynomialEncoder from category-encoders or other mathematical combinations.\n* Cleave: creates different subgroups of features to allow for easy comparison between them. This Phase is of particular importance to academic research and has\nlargely been omitted from existing efforts to simplify machine learning.\n* Sample: synthetically resamples training data for imbalanced data, using imblearn methods, for use with models that struggle with imbalanced data.\n* Reduce: selects features recursively or as one-shot based upon user criteria, using scikit-learn and prince methods.\n* Model: implements machine learning algorithms. The user can opt to either test different hyperparameters for the models selected or a single set of hyperparameters. 
Hyperparameter search methods currently include RandomizedSearchCV, GridSearchCV, and Bayesian optimization through skopt.\n\n### borges Critic\n\nAs part of any machine learning workflow, assessment of prepared models is an essential step. The Critic subpackage divides the evaluation process into four major Phases:\n* Summarize: building beyond the pandas describe method, this step includes a wide range of summary statistics for the user data, appropriately calculated based upon the data type of a particular variable.\n* Score: automatically determining the compatibility of various scikit-learn and/or user-provided metrics, results for each recipe are calculated.\n* Evaluate: using explainers from shap, skater, and eli5, the various recipes are evaluated, feature importances calculated, and cumulative comparisons are made.\n* Report: the above Phases are compiled into appropriate reports which are exported to disk or, in some cases, outputted to the terminal.\n\n### borges Artist\n\nBased upon the user selections and analysis done by the Critic, a set of visualizations is created for each recipe and as comparisons between recipes. Currently, this subpackage utilizes matplotlib, seaborn, shap, and a few other packages to make the visualization process easy using a common interface.\n\n## borges in Action - an Example\n\nPerhaps the easiest, but not only, way to input user selections into the borges package is by creating a simple text file (using the 'ini' format). 
This allows borges to be used by beginner and advanced Python programmers equally.\n\nFor example, using the settings file, a user could create a cookbook of recipes simply by listing the strings mapped to different methods:\n\n    [cookbook]\n    data_to_use = train_test\n    model_type = classifier\n    label = target\n    calculate_hyperparameters = True\n    naming_classes = model, cleaver\n    export_all_recipes = True\n    cookbook_steps = scaler, splitter, encoder, mixer, cleaver, sampler, reducer, model\n    scaler = normalizer, minmax\n    splitter = train_test\n    encoder = target\n    mixer = polynomial\n    cleaver = none\n    sampler = smote, adasyn\n    reducer = none\n    model = xgboost, logit\n\nWith the above settings, all possible recipes are automatically created using either default or user-specified parameters. In total, there are eight recipes in the cookbook (2 x 2 x 2) because two options are selected for each of the scaler, sampler, and model. Simply listing multiple choices separated by a comma is all that is needed for borges to include and test different options.\n\nborges can also import hyperparameters from the text file, as illustrated below for the xgboost model:\n\n    [xgboost]\n    booster = gbtree\n    objective = binary:logistic\n    eval_metric = aucpr\n    silent = True\n    n_estimators = 50, 1000\n    max_depth = 5, 15\n    learning_rate = 0.001, 0.1\n    subsample = 0.3\n    colsample_bytree = 0.3\n    colsample_bylevel = 0.3\n    min_child_weight = 0.7, 1.0\n    gamma = 0.0, 0.2\n    alpha = 0.0, 0.2\n\nIn the above case, anywhere two values are listed separated by a comma, borges automatically implements a hyperparameter search between those values (using the search method specified elsewhere in settings). If just one value is listed for a hyperparameter, it stays fixed throughout the tests. The hyperparameters are automatically linked to the 'xgboost' model by using that model name as the section name in the settings file. 
Further, if the 'gpu' setting is set to True (in the 'general' section of the settings file), the additional parameters needed to make xgboost use the local NVIDIA GPU will automatically be added.\n\nThe examples folder, from which the above settings are taken, currently shows how borges works in analyzing the Wisconsin breast cancer database. The code for the analysis is relatively straightforward and simple:\n\n    import os\n\n    import pandas as pd\n    import numpy as np\n    from sklearn.datasets import load_breast_cancer\n\n    from borges import Idea, clerk, Dataset\n    from borges.analyst import Cookbook\n\n    # Loads cancer data and converts from numpy arrays to pandas dataframe.\n    cancer = load_breast_cancer()\n    df = pd.DataFrame(np.c_[cancer['data'], cancer['target']],\n                    columns = np.append(cancer['feature_names'], ['target']))\n    # Initializes core borges classes.\n    idea = Idea(configuration = os.path.join(os.getcwd(), 'examples',\n                                            'cancer_settings.ini'))\n    clerk = clerk(root_folder = os.path.join('..', '..'))\n    dataset = Dataset(df = df)\n    # Converts label to boolean type - conversion from numpy arrays leaves all\n    # columns as float type.\n    dataset.change_datatype(columns = 'target', datatype = 'boolean')\n    # Fills missing data with appropriate default values based on column\n    # datatype.\n    dataset.smart_fill()\n    # Creates instance of Cookbook which, by default, will automatically create\n    # all recipes from the settings file.\n    cookbook = Cookbook(dataset = dataset)\n    # Iterates through every recipe and exports plots, explainers, and other\n    # metrics from each recipe.\n    cookbook.implement()\n    # Saves the recipes, results, and cookbook.\n    cookbook.save_everything()\n    # Outputs information about the best recipe to the terminal.\n    cookbook.print_best()\n    # Saves dataset file with predictions or predicted 
probabilities added\n    # (based on options from the settings file).\n    cookbook.dataset.save(file_name = 'cancer_df')\n\nThat's it. From that, all possible recipes are created. Each recipe gets its own folder within the results folder with relevant plots, a confusion matrix, and a classification report. A complete results file (review.csv) and summary statistics from the data (data_summary.csv) are stored in the results folder. Pickled cookbooks and recipes are also included if the user selects that option. In the above example, these are some of the plots automatically created for one of the recipes:\n\n![](visuals/confusion_matrix.png.png?raw=true)\n![](visuals/pr_curve.png.png?raw=true)\n![](visuals/roc_curve.png.png?raw=true)\n![](visuals/shap_heat_map.png.png?raw=true)\n![](visuals/shap_summary.png.png?raw=true)\n![](visuals/shap_interactions.png.png?raw=true)\n\nNew examples will be added showing different models and the Wrangler subpackage in the near future.\n\nborges's framework supports a wide range of coding styles. You can create complex multiple inheritance structures with mixins galore or simpler, compositional objects. Even though the data structures are necessarily object-oriented, all of the tools to modify them are also available as functions, for those who prefer a more functional approach to programming.\n\nThe project is also thoroughly documented so that users and developers can make borges work with their projects. It is designed for Python coders at all levels. Beginners should be able to follow the readable code and internal documentation to understand how it works. More advanced users should find complex and tricky problems addressed through efficient code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwithprecedent%2Fborges","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwithprecedent%2Fborges","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwithprecedent%2Fborges/lists"}