Deployment of Data-Science Projects
Teacher
ECTS:
3
Course Hours:
18
Tutorials Hours:
0
Language:
French
Examination Modality:
Report (mémoire)
Objective
This course is taught by Lino Galiana and Romain Avouac.
This course covers the most important aspects of deploying data-science projects. Throughout the course, we will follow the example of an API used to serve a machine-learning model.
The evaluation of the course is twofold:
- first, in groups of 2 or 3, students will choose a personal project, bring it in line with best practices, choose a format, and publish it on a production infrastructure
- second, each student will act as a data scientist reviewing the quality of contributions to a project. Working individually, they will evaluate another group's project (peer review) and discuss the technical choices and practices used
Planning
Prerequisite
This course follows the Infrastructure and Software System course of the first semester. It is highly recommended to have taken that course or to know the following:
It is also assumed that you have taken Python for data-science (second year). If not, you can browse the content of that course on the course website.
Part 1: Best development practices
Code quality standards
Project architecture
Working collaboratively with Git and GitHub
Part 2: Towards deployment
Maximize reproducibility and portability of projects
Virtual environments: venv and conda
Containers: Docker
Deploy and showcase a data-science project
Production environment
Continuous integration (CI) and continuous deployment (CD)
Principles of Kubernetes
Formats for showcasing results
Introduction to MLOps
References
Course website: https://ensae-reproductibilite.github.io/website/