https://github.com/teradata/ml_end_to_end
https://github.com/teradata/ml_end_to_end
Last synced: about 1 year ago
JSON representation
- Host: GitHub
- URL: https://github.com/teradata/ml_end_to_end
- Owner: Teradata
- Created: 2023-07-19T08:29:23.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-26T15:54:13.000Z (about 2 years ago)
- Last Synced: 2025-04-13T09:14:15.859Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 81.1 KB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
README
# Introduction
The robust cloud native offerings of Teradata, specifically Teradata Vantage TM, offer considerable value for organizations looking to reap the benefits of generative AI. Teradata Vantage streamlines each step of the Machine Learning lifecycle, from understanding and defining the problem, collecting, cleaning, and preprocessing data, to training models, deploying them, and ensuring their operation and ongoing evaluation. Teradata Vantage integrates multiple tools that empower Data Engineering, Data Science, and Machine Learning Operations Teams to work more efficiently, effectively, and rapidly.
The analytical capabilities of Teradata Vantage, such as the Teradata Machine Learning library, `teradataml`, streamline the process of data analysis, which is crucial for accurately defining a problem in the context of Machine Learning. These comprehensive analytical features also prove instrumental in cleaning and preparing data for machine learning model training. With the integration of Native Object Storage processing and cloud-native deployments, it is very easy to plug the data to any training infrastructure, especially when that infrastructure is cloud-based.
Additionally, tools such as Build Your Own Model and Model Operations simplify the processes of model deployment, and monitoring. A guiding principle of machine learning model operationalization in Teradata Vantage is that the model should be deployed where the data is housed, within the data warehouse/lake. This principle facilitates the integration of Machine Learning insights to the broader business context, all of this with reduced overhead and enhanced performance.
In this article we explore a hypothetical end to end generative AI pipeline to illustrate the usage of different tools offered by Teradata Vantage to easily, define a problem, collect, clean and preprocess data, integrate training data to cloud native machine learning tools, and operationalize the trained model.
The initial steps, data exploration and problem definition, and data cleaning and preparation, are covered with relevant code snippets that can be replicated. Starting from the training step, the process is covered in a descriptive fashion enabling the reader to follow up conceptually, without having to deploy extensive computational resources. For this reason, as well, the required Teradata Vantage instance is provided through Clear Scape Analytics Experience, a lightweight, web-based platform that provides fully functional Teradata Vantage environments packed with analytic capabilities.