{"id":41698216,"url":"https://github.com/grip-on-software/prediction","last_synced_at":"2026-01-24T20:55:06.715Z","repository":{"id":171609873,"uuid":"648160364","full_name":"grip-on-software/prediction","owner":"grip-on-software","description":"Algorithms to predict, classify and analyze features and labels of Scrum data","archived":false,"fork":false,"pushed_at":"2024-07-26T12:58:24.000Z","size":166,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-05T00:30:24.917Z","etag":null,"topics":["effort-estimation","machine-learning","pattern-recognition"],"latest_commit_sha":null,"homepage":"https://gros.liacs.nl","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grip-on-software.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-06-01T10:43:44.000Z","updated_at":"2024-07-26T12:57:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"6f6febb2-3d9a-4493-9b64-b154ff67d11a","html_url":"https://github.com/grip-on-software/prediction","commit_stats":null,"previous_names":["grip-on-software/prediction"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/grip-on-software/prediction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grip-on-software%2Fprediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grip-on-software%2Fprediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grip-on-softw
are%2Fprediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grip-on-software%2Fprediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grip-on-software","download_url":"https://codeload.github.com/grip-on-software/prediction/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grip-on-software%2Fprediction/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28736791,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T19:23:36.361Z","status":"ssl_error","status_checked_at":"2026-01-24T19:23:28.966Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["effort-estimation","machine-learning","pattern-recognition"],"created_at":"2026-01-24T20:55:06.241Z","updated_at":"2026-01-24T20:55:06.701Z","avatar_url":"https://github.com/grip-on-software.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Prediction\n\nThis repository contains models and runners for building and using pattern \nrecognition models (based on machine learning and effort estimation) for \nclassification and estimation of features that describe Scrum sprints.\n\nThe runners and contexts for the classification/estimation models ensure that \nthe data set is provided in a certain format, select features, and 
split the \ndata into train/test/validation sets, which allows runs on combinations of \nfeatures or time-travel through the data set (providing different training \nslices split temporally).\n\nThe prediction models are based on TensorFlow and Keras. Some models are \nhand-made, based on equations described for those models (such as analogy-based \neffort estimation, ABE) and some are based on existing TF or Keras models (such \nas MLP and DNN).\n\n## Installation\n\nThe predictions run using Python 3.6 or 3.7. The Python libraries required to \nrun the predictions are listed in `requirements.txt`. To install these, run \n`pip3 install tensorflow==$TENSORFLOW_VERSION` and then \n`pip3 install -r requirements.txt`, replacing the `$TENSORFLOW_VERSION` \nvariable with a supported [TensorFlow version](#tensorflow-version). This \nassumes that `pip3` is the Pip executable for your Python 3 environment; if you \ndo not have Pip, then install it through your package manager or see [Pip \ninstallation](https://pip.pypa.io/en/stable/installation/) for other ways to \nobtain it. If you do not have permission to install system-wide \nlibraries, then either add `--user` after the `install` command, or use \na [virtual \nenvironment](https://packaging.python.org/en/latest/tutorials/installing-packages/#creating-and-using-virtual-environments).\n\nThe predictions can also be run using Docker. This repository contains \na `Dockerfile` which can be used to build a Docker image. If you have Docker \ninstalled, then run `docker build -t gros-prediction .` to build this image. 
If \nyou want to use a different TensorFlow version, then add `--build-arg \nTENSORFLOW_VERSION=$TENSORFLOW_VERSION`, where you replace the \n`$TENSORFLOW_VERSION` variable with an [appropriate \ntag](https://www.tensorflow.org/install/docker#download_a_tensorflow_docker_image) \nof the upstream [tensorflow](https://hub.docker.com/r/tensorflow/tensorflow) \nimage for a supported [TensorFlow version](#tensorflow-version). If you add \n`-gpu` at the end of the tag, then you can select a GPU device to pin data and \nmodels to, but in this case you must have \n[nvidia-docker](https://github.com/NVIDIA/nvidia-docker), the CUDA toolkit and \nan NVIDIA driver for your GPU installed. Details for setting this up are outside \nthe scope of this repository, but some documentation for GPU-enabled Jenkins \nagents may be found elsewhere in the Grip on Software documentation.\n\nTo run a Docker image for a prediction, use the following command:\n```\ndocker run -v /path/to/data:/data -v $PWD:$PWD -w $PWD -u $(id -u):$(id -g) \\\n    gros-prediction python tensor.py --filename /data/sprint_features.arff \\\n    --results /data/sprint_results.json \\\n    --label num_not_done_points+num_removed_points+num_added_points --binary \\\n    --roll-sprints --roll-validation --roll-labels --replace-na --model dnn \\\n    --test-interval 200 --num-epochs 1000 --log INFO\n```\nAdjust the `-v` arguments so that the volume specifications use valid paths, \nand adjust the remaining arguments to configure the runner and model; the \nexample here is for a DNN run. For a GPU-enabled image, \nyou must instead use `docker run --runtime=nvidia ...` and also select an \nappropriate GPU device by adding `--device /gpu:0` at the end of the command. \nThe number in this argument selects the GPU to pin to if there are multiple, \nstarting from index 0. 
In case of multiple GPUs, it is also recommended to set the \nenvironment variable `CUDA_VISIBLE_DEVICES` to this index (for example \n`CUDA_VISIBLE_DEVICES=0`), so that other GPUs \nare not used by TensorFlow to reserve memory. This allows multiple GPUs to be \nused concurrently (for example on a Jenkins agent with multiple executors, or \njust to keep the other GPU available to other users or a graphical desktop).\n\n### TensorFlow version\n\nFor both the Docker-based and direct (pip/virtualenv) installation, we have to \ninstall a specific version of TensorFlow to work with the current models. This \nmeans that the most recent version will not function. The code has been tested \nwith TensorFlow versions 1.12.0 and 1.13.2, but may work with later TensorFlow \nversions before version 2. Note that these versions are dated by now, so they \ndo not support recent Python versions, and the PyPI registry may not provide \npackages for pip to install for your Python version. \nThis means that we currently require Python 3.7 in order to install TensorFlow \nproperly. If this hinders the direct installation because your package manager \nno longer provides Python 3.7, then consider using Docker instead.\n\nOlder versions of Python and TensorFlow are typically not supported. However, \nearlier versions of the code have worked with Python 2.7 and 3.6 and with \nTensorFlow 1.3+.\n\n## Configuration\n\nThe configuration of the models and runners takes place through command line \narguments provided to the `tensor.py` script. The `--help` argument indicates \nthe available options.\n\nWhen the `--store` argument is set to `owncloud`, then additional configuration \nis loaded from a `config.yml` file (the file path can be changed with the \n`--config` argument). 
This YAML file must have an object with an `owncloud` key \nwhere the value is an object with the following keys:\n\n- `url`: The URL to the ownCloud instance from which to load files.\n- `verify`: A boolean that indicates whether to verify SSL certificates when \n  connecting to the ownCloud instance.\n- `username`: The username to log in as.\n- `password`: The password for the username to log in as. If the `keyring` \n  configuration item is true, then this value is ignored.\n- `keyring`: Whether to obtain the password from a keyring (e.g. GNOME). The \n  keyring must have a section called `owncloud` and the password is obtained \n  from the username determined by the `username` configuration item.\n\n## Data and running\n\nThe prediction runner expects an input dataset to be provided in the form of an \nARFF file. The first two columns of the ARFF file must be `project_id` and \n`sprint_id`, respectively, and if there is a column named `organization`, then \nit must be the last column and be a nominal attribute of quoted strings.\n\nAll other columns (attributes) are considered to be numeric. Each row \n(instance) in the data set describes features of a single sprint. Except for \nthe three metadata attribute fields, the attributes can have missing values \n(question marks) and may be selected by the runner.\n\nA time attribute, if provided and indicated by the `--time` argument, may be \nused to split the dataset temporally for time-travel, and should be a number \nbased on a sprint's start date at a high enough resolution (e.g. the number of \ndays since an origin) to make a realistic split (further binning to reduce the \nnumber of splits, or combine them, is controlled by the `--time-size` and \n`--time-bin` arguments).\n\nSelection by the runner is controlled by the `--index`, `--remove` and \n`--label` arguments. Like `--time`, these should indicate indexes to add, \nremove or consider as a label to predict. 
For `--index` and `--remove`, \nmultiple indexes can be provided using commas or spaces as separators. Indexes \nmay be either a non-negative number (from 0 up to the number of columns, \nexclusive), a negative number (from -1 for the last column until the additive \ninverse of the number of columns, inclusive), or the name of the column.\n\nAdditional columns can be generated using `--assign`, where the argument must be \na Python assignment expression. The expression can refer to the names of other \ncolumns as variables, and a limited number of functions is available. Note that \nit may be preferable to perform this calculation beforehand, which may allow \ntracking more metadata outside of this repository. Note that the `--label` \nargument may also be an expression, but without an assignment.\n\nThe ARFF file can be provided by the `data-analysis` repository, using the \n`features.r` script to collect, analyze and output features. This repository \ncontains a `Jenkinsfile` with appropriate steps for a Jenkins CI deployment, \nwith example arguments to the scripts.\n\nAfter a prediction has finished, the `tensor.py` runner outputs a JSON file to \n`sprint_results.json` (or another path determined by the `--results` argument) \nwhich contains predictions for a validation set, model metrics, metadata and \nconfiguration. This format can be read by the `sprint_results.r` script from the \n`data-analysis` repository in order to combine the prediction model results \nwith other sprint data so that it can be used in an API for the \n`prediction-site` visualization.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrip-on-software%2Fprediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrip-on-software%2Fprediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrip-on-software%2Fprediction/lists"}