# BBC Datalab ML Training

## Objectives
The goals of this training are to:
- Get you excited about Data Science
- Give a quick introduction to some of the Python libraries available: Pandas (data wrangling), Scikit-learn (ML), Matplotlib (visualisation)
- Give a quick overview of an approach to tackling Data Science problems

It will not:
- Make you an expert Data Scientist
- Go into detail (or do the maths) on the techniques/algorithms we will use
- Properly cover deep learning/neural networks

## Setting Up Your Environment
This course is delivered using Jupyter Notebooks. If you're not familiar with them, [What is the Jupyter Notebook?](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html) and [Notebook Basics](http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html) are helpful introductions.
The notebooks contain Python code that you will run during the exercises. To run a cell, select it and then click `Run` in Jupyter (or press `Shift + Enter`). Run the cells in order, and wait for each cell to finish before running the next one.

## Dependencies
This training requires the following libraries, which can be installed with, for example, `pip3`:
- [Jupyter](http://jupyter.org/) - An interactive programming environment that runs in the browser.
- [scikit-learn](http://scikit-learn.org/) - Powerful and easy-to-use machine learning algorithms.
- [pandas](https://pandas.pydata.org/) - A powerful way of handling DataFrames, which are two-dimensional tabular data structures with labelled axes.
- [numpy](http://www.numpy.org/) - Scientific computing support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.
- [scipy](https://www.scipy.org/) - Builds on NumPy and gives you access to key mathematical modules for optimization, linear algebra, integration, interpolation and more.
- [matplotlib](https://matplotlib.org/) - A plotting library for Python that works with NumPy arrays.

### Jupyter with Virtualenv
This training uses Python 3 and a number of Python libraries, so before starting the training you will need to:
- Install Python 3 by following the [Python Beginners Guide](https://wiki.python.org/moin/BeginnersGuide/Download). If you use Mac OS X, you might also find [Installing Python 3 on Mac OS X](http://docs.python-guide.org/en/latest/starting/install3/osx/) useful.
- Install [Virtualenv](https://virtualenv.pypa.io) using the [Installation](https://virtualenv.pypa.io/en/latest/installation.html) documentation.
- In your project directory, create a new virtual environment by running `virtualenv -p python3.6 env`.
- Activate your virtual environment by running `source env/bin/activate`.
- Install the [dependencies](#dependencies) by running `pip3 install -r requirements.txt`.
- Finally, start Jupyter by typing `jupyter notebook` in your terminal. This should automatically open a tab in your browser; if it doesn't, visit [localhost:8888](http://localhost:8888/). To shut down Jupyter, press `Ctrl + C` in the terminal.
- When you have finished the training, run `deactivate` to deactivate your virtualenv.

### Jupyter with Docker
If you are familiar with Docker, you can use the Jupyter [`datascience-notebook`](https://hub.docker.com/r/jupyter/datascience-notebook/) image to spin up everything you need for the course. As a starting point, the following command creates a passwordless instance of Jupyter at http://localhost:8888/, mapped to your current working directory:

```bash
docker run \
    -d --rm -p 127.0.0.1:8888:8888 \
    --name=datascience-notebook \
    --mount type=bind,source="$(pwd)",target=/home/jovyan \
    jupyter/datascience-notebook \
    start-notebook.sh --NotebookApp.token=''
```

### Jupyter with Anaconda
Alternatively, you can install [Anaconda](https://www.anaconda.com/download/#macos), which aims to simplify package management.

## Agenda
The training is split into 4 courses:
- [One: Data Exploration](iPlayerForecast_course1.ipynb)
- [Two: Data Preparation](iPlayerForecast_course2.ipynb)
- [Three: Build a Classifier](iPlayerForecast_course3.ipynb)
- [Four: Build a Regressor](iPlayerForecast_course4.ipynb)

This training is still a work in progress. Please send any feedback to `datalab @ bbc.co.uk` to help us improve it!

**And if you found this training easy and had fun doing it, why not join us? https://findouthow.datalab.rocks/**
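Before opening the first notebook, it can help to confirm that the libraries listed under Dependencies are importable from your activated environment. The script below is a minimal sketch, not part of the course material. It assumes the import names `sklearn` for scikit-learn and `notebook` for Jupyter (the classic Notebook server started by `jupyter notebook`). `check_environment` and `REQUIRED` are hypothetical names used only for illustration. Save it as, say, `check_env.py` and run `python3 check_env.py`.

```python
import importlib
import importlib.util

# Package name (as listed under Dependencies) -> module name used in `import`.
# Assumption: Jupyter is checked via the "notebook" module, scikit-learn via "sklearn".
REQUIRED = {
    "jupyter": "notebook",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
}


def check_environment(required=REQUIRED):
    """Return {package: version string, or None if the module cannot be found}."""
    report = {}
    for package, module_name in required.items():
        # find_spec returns None when the top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            report[package] = None
            continue
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            # Installed but broken (e.g. a missing compiled dependency).
            report[package] = f"import failed: {exc}"
            continue
        # Most scientific Python packages expose __version__; fall back if not.
        report[package] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for package, version in check_environment().items():
        status = version if version is not None else "MISSING (run pip3 install -r requirements.txt)"
        print(f"{package:<13} {status}")
```

A `MISSING` line usually means the virtualenv is not activated or `pip3 install -r requirements.txt` has not been run in it.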