https://github.com/linogaliana/python-datascientist
Repository associated with the course Python for data scientists (ENSAE, 2nd year)
- Host: GitHub
- URL: https://github.com/linogaliana/python-datascientist
- Owner: linogaliana
- License: other
- Created: 2020-07-16T13:29:53.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2025-03-25T08:16:10.000Z (about 1 month ago)
- Last Synced: 2025-03-28T16:04:59.447Z (about 1 month ago)
- Topics: data-science, jupyter, jupyter-notebook, machine-learning, opendata, python, teaching
- Language: Python
- Homepage: https://pythonds.linogaliana.fr/
- Size: 760 MB
- Stars: 124
- Watchers: 1
- Forks: 46
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Citation: CITATION.cff
README
# Data science with Python
[DOI](https://zenodo.org/badge/latestdoi/280161677) [Build status](https://github.com/linogaliana/python-datascientist/actions/workflows/prod.yml)

> [!TIP]
> **Accessing content using Jupyter Notebooks:**
>
> `Pandas` tutorial example (English version)

> [!NOTE]
> This is the English 🇬🇧🇺🇸 version of the `README`. If you want to see the French 🇫🇷 version, you can click on the link below:
>
> [French version of the README](https://github.com/linogaliana/python-datascientist/blob/main/doc/README-fr.md)

This GitHub repository stores the source files used to build the site [pythonds.linogaliana.fr](https://pythonds.linogaliana.fr/). It contains the entire course *Python for Data Science* that I teach in the second year (Master 1) at [ENSAE](https://www.ensae.fr/).

> [!NOTE]
> A guide to assist potential contributors is available at the link below:
>
> [Contributing guide](https://github.com/linogaliana/python-datascientist/blob/main/doc/CONTRIBUTING-fr.md)

## Syllabus

The syllabus is available [on the ENSAE website](https://www.ensae.fr/courses/1425-python-pour-le-data-scientist) and on the [course website](https://pythonds.linogaliana.fr/).
Overall, the course covers a broad range of content, suited both to beginners in data science and to those looking for more advanced material:

1. __Data Manipulation__: standard data manipulation (`Pandas`), geographical data (`Geopandas`), data retrieval (web scraping, API)...
1. __Data Visualization__: classic visualizations (`Matplotlib`, `Seaborn`), cartography, interactive visualizations (`Plotly`, `Folium`)
1. __Modeling__: machine learning (`Scikit-Learn`), econometrics
1. __Text Data Processing (NLP)__: introduction to tokenization with `NLTK` and `SpaCy`, modeling...
1. __Introduction to Modern Data Science__: cloud computing, `ElasticSearch`, continuous integration...

The content of this site is based on open data, whether French data (mainly from the central platform [`data.gouv`](https://www.data.gouv.fr) or the website of [Insee](https://www.insee.fr)) or American data.
A good complement to the website's content is the course I teach with Romain Avouac ([@avouacr](https://github.com/avouacr)) in the final year at ENSAE, which focuses more on putting data science projects into production: [https://ensae-reproductibilite.github.io/website/](https://ensae-reproductibilite.github.io/website/)

## Testing Python examples

You can use a personal installation of `Python` or shared servers. On the website, a series of buttons is available to easily test the examples in `Jupyter` notebooks, in the configuration that suits you best.
For example, here are the buttons for the `Numpy` tutorial: