{"id":16344112,"url":"https://github.com/aaronjanse/stick-bug-ml","last_synced_at":"2026-01-04T05:09:08.257Z","repository":{"id":57471458,"uuid":"97158691","full_name":"aaronjanse/stick-bug-ml","owner":"aaronjanse","description":"Framework for supervised machine learning systems","archived":false,"fork":false,"pushed_at":"2017-07-13T20:45:06.000Z","size":46,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-30T02:58:41.212Z","etag":null,"topics":["decorators","framework","kaggle","machine-learning","organization","python","python3"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaronjanse.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-13T19:38:43.000Z","updated_at":"2021-06-09T11:58:59.000Z","dependencies_parsed_at":"2022-09-26T17:40:25.813Z","dependency_job_id":null,"html_url":"https://github.com/aaronjanse/stick-bug-ml","commit_stats":null,"previous_names":["aaronduino/stick-bug-ml"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjanse%2Fstick-bug-ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjanse%2Fstick-bug-ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjanse%2Fstick-bug-ml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjanse%2Fstick-bug-ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaronjanse","download_url":
"https://codeload.github.com/aaronjanse/stick-bug-ml/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238235648,"owners_count":19438725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decorators","framework","kaggle","machine-learning","organization","python","python3"],"created_at":"2024-10-11T00:26:54.866Z","updated_at":"2025-10-26T00:31:19.873Z","avatar_url":"https://github.com/aaronjanse.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# stick-bug-ml\n\nA framework that eases the burden of organizing the code of a supervised machine learning system.\n\nIt provides decorators that manage data \u0026 pass it between common steps in building a machine learning system, such as:\n- loading the dataset\n- preprocessing\n- feature generation\n- model definition\n\nWhile doing this, it keeps the global namespace free of the clutter that an endless chain of features and models would otherwise create.\n\nIn addition, it makes it easy to put new, real-life data through the exact same process that the training data goes through.\n\n## Installation\nInstall via `pip` (Python 3):\n\n```bash\n$ pip install stickbugml\n```\nDependencies:\n- Python 3\n- sklearn\n- pandas\n- numpy\n\n## Documentation\nThe documentation can be found at [docs/README.md](https://github.com/Aaronduino/stick-bug-ml/blob/master/docs/README.md)\n\n## Example\nNote: there is also a great [example for use in Jupyter Notebooks](demo.ipynb)\n\nFirst, import this library:\n\n```python\nimport stickbugml\nfrom 
stickbugml.decorators import dataset, preprocess, feature, model # preprocess is used below; assumed to be exported by the same module\n```\n\nLoad your dataset:\n\n```python\nimport seaborn as sns # seaborn.apionly has been removed from recent seaborn releases\nimport pandas as pd\n\n@dataset(train_valid_test=(0.6, 0.2, 0.2)) # define your train/validation/test data splits\ndef raw_dataset():\n    titanic_dataset = sns.load_dataset('titanic')\n\n    # Drop NaN rows for simplicity\n    titanic_dataset.dropna(inplace=True)\n\n    # Extract X and y\n    X = titanic_dataset.drop('survived', axis=1)\n    y = titanic_dataset['survived']\n    return X, y\n\nprint(raw_dataset.head()) # yes, this does work! raw_dataset is now a pandas DataFrame\n```\n\n(Optionally) do some pre-processing:\n\n```python\n@preprocess\ndef preprocessed_dataset(X):\n    # Encode categorical columns\n    categorical_column_names = [\n            'sex', 'embarked', 'class',\n            'who', 'adult_male', 'deck',\n            'embark_town', 'alive', 'alone']\n\n    X = pd.get_dummies(X,\n                       columns=categorical_column_names,\n                       prefix=categorical_column_names)\n\n    return X\n\nprint(preprocessed_dataset.head()) # See the dataset code block for an explanation\n```\n\nGenerate some features:\n\n```python\nfrom sklearn import decomposition\nimport numpy as np\n\n@feature('pca')\ndef pca_feature(X):\n    pca = decomposition.PCA(n_components=3)\n    pca.fit(X)\n    pca_out = pca.transform(X)\n\n    pca_out = np.transpose(pca_out, (1, 0))\n    return pd.DataFrame(pca_out)\n\n# let's preview\nprint(pca_feature.head()) # See the dataset code block for an explanation\n\n# you can add more features, btw\n```\n\nAnd define your (machine learning) model(s):\n\n```python\nimport xgboost as xgb\n\n@model('xgboost')\ndef xgboost_model():\n    def define(num_columns):\n        return None # xgboost models aren't pre-defined\n\n\n    def train(model, params, train, validation):\n        params['objective'] = 'binary:logistic' # Static parameters can be defined here\n        params['eval_metric'] = 'logloss'\n\n   
     d_train = xgb.DMatrix(train['X'], label=train['y'])\n        d_valid = xgb.DMatrix(validation['X'], label=validation['y'])\n\n        watchlist = [(d_train, 'train'), (d_valid, 'valid')]\n\n        trained_model = xgb.train(params, d_train, 2000, watchlist, early_stopping_rounds=50, verbose_eval=10)\n\n        return trained_model\n\n    def predict(model, X):\n        return model.predict(xgb.DMatrix(X))\n\n    return define, train, predict\n```\n\nNow you can train your model, trying out different parameters if you want:\n\n```python\nstickbugml.train('xgboost', {\n    'max_depth': 7,\n    'eta': 0.01\n})\n```\n\nThe library keeps the test data's ground truth values locked away so your models won't train on it.\nAfter you train your model, have the framework evaluate it for you:\n\n```python\nlogloss_score = stickbugml.evaluate('xgboost')\nprint(logloss_score)\n```\n\nYou can add more models and features if so desired.\n\nSince this library is built with reality in mind, you can easily get predictions for new/real-life data:\n\n```python\nraw_X = pd.read_csv('2018_titanic_manifesto.csv') # It will probably sink, but we don't know who will survive\nprocessed_X = stickbugml.process(raw_X) # Process the data\ndel raw_X # Gotta keep that namespace clean, right?\n\ny = stickbugml.predict('xgboost', processed_X) # Make predictions\n\nprint(y)\n```\n\n## Contributing \u0026 Feedback\nIf you have any problems, or would like a new feature, submit an Issue.\n\nIf you want to help out, feel free to submit a Pull Request.\n\n## License\nThis project uses the Apache 2.0 License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronjanse%2Fstick-bug-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaronjanse%2Fstick-bug-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronjanse%2Fstick-bug-ml/lists"}