{"id":18419687,"url":"https://github.com/teradata/teddy_ingestion","last_synced_at":"2026-03-19T04:17:35.167Z","repository":{"id":185019996,"uuid":"672853174","full_name":"Teradata/teddy_ingestion","owner":"Teradata","description":null,"archived":false,"fork":false,"pushed_at":"2023-08-11T20:37:18.000Z","size":726,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-13T11:50:40.419Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Teradata.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-31T10:09:37.000Z","updated_at":"2024-08-07T07:52:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"c9effaba-1fa1-45fa-b9f5-e955681f6f04","html_url":"https://github.com/Teradata/teddy_ingestion","commit_stats":null,"previous_names":["teradata/teddy_ingestion"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Teradata/teddy_ingestion","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Fteddy_ingestion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Fteddy_ingestion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Fteddy_ingestion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Fteddy_ingestion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Teradata","download_url":"http
s://codeload.github.com/Teradata/teddy_ingestion/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Fteddy_ingestion/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259299116,"owners_count":22836476,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T04:17:55.389Z","updated_at":"2026-01-29T21:39:19.263Z","avatar_url":"https://github.com/Teradata.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Teddy Retailers Ingestion Demo\n- Teddy Retailers operates both online and brick-and-mortar stores where it sells commonly used household products.\n- Teddy Retailers is planning to perform an in-depth analysis of activity in both its online and physical stores.\n\n## Data\nThe relevant data has been exported as flat CSV files as follows:\n* Local Server: Four separate CSV files with identical structure contain the data from the online store.\n    - These files can be found in this repository in the `./data` directory.\n* Google Cloud Storage: Two CSV files have been extracted from the transactional system. 
These files contain the information about customer visits.\n    - These files can be found in Google Cloud Storage at the following URLs:\n        - https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_TVUG_TPT_NOS/visits.csv\n        - https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_TVUG_TPT_NOS/visit_products.csv\n\n## Business Requirements \nData from both physical and online stores shall be ingested into Teddy's Teradata Vantage Database in the most efficient manner following these requirements:\n* Online store \n    - All data from the Online Store shall be ingested. \n    - The data from each of the local files shall be ingested into the same table.\n* Physical Stores\n    - Only the data corresponding to customers that also purchased online shall be ingested.\n    - The data shall be ingested into two different tables conserving the schema currently reflected in the CSV files.\n\n## Prerequisites\n* Access to a Teradata Vantage Instance. You can provision one for free at [ClearScape Analytics Experience.](https://clearscape.teradata.com/sign-in?utm_source=github\u0026utm_medium=readme\u0026utm_campaign=TPT_NOS)\n* Access to Teradata Parallel Transporter (TPT). You can download TPT at [Teradata Tools and Utilities (TTU).](https://downloads.teradata.com/download/database/teradata-tools-and-utilities-13-10) \n* Your favorite database client.\n\n## Steps\n* Follow the steps of the installation wizard for Teradata Tools and Utilities.\n* Follow the steps for creating a Teradata Vantage environment on ClearScape Analytics Experience.\n    - While creating a Teradata Vantage environment take note of the hostname, database user, and password. 
You will need these parameters to create a connection in your favorite database client.\n* In your favorite database client, run the command for creating the database. This script can be found in the `create_db` folder of this repository.\n```\nCREATE DATABASE teddy_ingestion\nAS PERMANENT = 110e6;\n```\n\n### Loading Data Stored Locally:\n* For loading the data that is stored locally, we are going to use Teradata Parallel Transporter (TPT). TPT is a powerful data-loading tool that is client-based, thus very efficient for loading data from a local server. It also allows parallelization, which can make the process more efficient.\n* TPT is highly configurable and robust, so it is worth taking a look at its full documentation and reference guide for advanced use cases. [TPT Documentation can be found on Teradata's website.](https://docs.teradata.com/r/k_KCYzsgJJ_t2du~c~wK_Q/OkTPh7PBa4ICuyi45jJsgQ)\n* TPT operations are based on highly configurable operators that allow the parallelization of Reading Operations (Producer Operators), Writing Operations (Consumer Operators), Loading, and Filtering Operations (Filter Operators). The TPT package comes with preconfigured operators for loading data, writing to files, etc. You can find those under the `./templates` directory located inside the directory where TPT was installed.\n* TPT jobs are configured through TPT scripts. 
In our case, the `tpt-jobs` folder contains the script that defines our corresponding job.\n* It is a best practice to define environment variables used by our TPT jobs in a `jobvars.txt` file.\n  - The environment variables define things such as the parameters of our database connections.\n  - For example, the host, user, and password of our ClearScape Analytics Environment in our case.\n  - The configuration of our inputs, like the location of the files that will be ingested, their format, delimiters, etc.\n  - The tables that we will be using to load the data and record logs.\n* Our TPT job script consists of the following elements:\n  - A description.\n  - The schema of the files that are being loaded.\n  - Data Definition commands to create the tables that will be used.\n  - The file reading operation.\n  - The loading operation.\n* To run the TPT job, we execute the following command from the terminal. Because the job uses relative paths, the command must be run from the directory that contains the job definition file:\n```\ntbuild -f ingest_teddy.tpt -v jobvars.txt -j file_load\n```\n* In this command, `-f` stands for the job definition file, `-v` for the file that contains the environment variables of the job, and `-j` for the job name.\n* It is possible to add several instances of an operator as follows, where `n` stands for the number of instances of the operator:\n  - `TO OPERATOR ($LOAD)[n]`\n  - `$FILE_READER(TEDDY_SCHEMA)[n]`\n* The addition of several instances allows for the parallelization of the operations.\n* The alternative file `ingest_teddy_full.tpt` is included as a reference for a TPT job that doesn't leverage included templates.\n\n### Loading Data Stored in the Cloud:\n* For loading data from cloud object storage, we will use Native Object Storage (NOS). 
NOS is optimized for ingesting object storage data in the cloud.\n* NOS allows executing SQL statements against object storage sources.\n* The files in the `nos-scripts` folder contain scripts to bring the data from Google Cloud Storage fulfilling the business requirements.\n  - The `ingest_visits` script performs an inner join of the physical store visits data with the online store data on `customer_id`. This fulfills the requirement of retrieving data from the physical store that corresponds to customers who bought online.\n  - The `ingest_visit_products` script ingests the `visit_products` data, also performing an inner join with the already ingested visits data. This also fulfills the business requirement of retrieving data from the physical store that corresponds to customers who bought online.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteradata%2Fteddy_ingestion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteradata%2Fteddy_ingestion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteradata%2Fteddy_ingestion/lists"}