Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/skaftenicki/ku_devops
https://github.com/skaftenicki/ku_devops
Last synced: 3 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/skaftenicki/ku_devops
- Owner: SkafteNicki
- Created: 2023-01-15T19:01:41.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-08T13:24:30.000Z (12 months ago)
- Last Synced: 2024-10-04T19:46:02.719Z (4 months ago)
- Language: Python
- Size: 3.92 MB
- Stars: 8
- Watchers: 4
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
---
hide:
- navigation
- toc
---
# DevOps for data scientists
Webpage: .
This repository contains a small introduction to *developer operations* (DevOps) for students in the course
[Grundlæggende Data Science (GDS)](https://kurser.ku.dk/course/ndab23000u/2023-2024) at Copenhagen University.
The four core topics covered are:* [Virtual environments](https://github.com/SkafteNicki/ku_devops/tree/main/virtual_environments)
* [Version control](https://github.com/SkafteNicki/ku_devops/tree/main/version_control)
* [Experiment tracking](https://github.com/SkafteNicki/ku_devops/tree/main/experiment_tracking)
* [Code testing](https://github.com/SkafteNicki/ku_devops/tree/main/code_testing)You are supposed to do them in the order listed. When doing the exercises, to maximize your DevOps experience you
should prioritize:1. Make yourself familiar with running commands in the terminal. The terminal can be a scary place, but it is an
essential skill to be able to run commands without relying on a graphical interface. If you want a good introduction
to using the shell, I highly recommend the first two lectures from [this MIT course](https://missing.csail.mit.edu/).2. Only use scripts e.g. no notebooks for these exercises. Notebooks have their benefits but the fact is that developing
software in the *real world* is done in scripts. Therefore make sure that whenever you are writing code for the
exercises do this in `.py` scripts. If you feel like you miss the interactiveness of notebooks when working with the
script I can highly recommend giving [ipython](https://ipython.org/) a spin.3. Get a good code editor, and try using it. If you do not have one, I can highly recommend
[Visual Studio Code](https://code.visualstudio.com/) that are a lightweight editor, but through extensions can become
powerful. Otherwise, I also recommend [PyCharm](https://www.jetbrains.com/pycharm/).Why should a data scientist care about DevOps? Because DevOps provides processes and tools for creating *reproducible*
experiments at scale when working with any kind of computer science/data science. Being able to ensure that your
experiments are reproducible is important in the context of the scientific method:Without reproducibility, the method breaks at the experimental stage, as non-reproducible experiments will most likely
lead to different results and thereby different conclusions on the initial hypothesis.For a much more complete set of material on this topic, see [this course](https://skaftenicki.github.io/dtu_mlops/)
which goes over the nearly complete pipeline of designing, modeling and deploying machine learning applications.