{"id":18419638,"url":"https://github.com/teradata/lake-demos","last_synced_at":"2025-04-07T13:31:44.515Z","repository":{"id":146378101,"uuid":"608706208","full_name":"Teradata/lake-demos","owner":"Teradata","description":"Demos for VantageCloud Lake","archived":false,"fork":false,"pushed_at":"2025-04-01T12:18:37.000Z","size":25016,"stargazers_count":6,"open_issues_count":2,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-01T13:27:23.390Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Teradata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-02T15:15:45.000Z","updated_at":"2025-04-01T12:18:40.000Z","dependencies_parsed_at":"2024-06-26T19:04:46.686Z","dependency_job_id":"704d2d20-8d43-4b13-933e-f900846d9097","html_url":"https://github.com/Teradata/lake-demos","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Flake-demos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Flake-demos/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Flake-demos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teradata%2Flake-demos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Teradata","download_url":"https://codeload.github.com/Terada
ta/lake-demos/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247661756,"owners_count":20975111,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T04:17:42.821Z","updated_at":"2025-04-07T13:31:39.505Z","avatar_url":"https://github.com/Teradata.png","language":"Jupyter Notebook","readme":"# Lake Demos\n\nVantageCloud Lake Demos Public Repository.\n\nPurpose is to store all the public lake demos here in a single project where the community can collaborate. \n\n\n\u003cb style = 'font-size:24px;font-family:Arial;color:#00233C'\u003eAvailable Demos List\u003c/b\u003e\n\n\n\n### 1. Environment Setup Automation (Demo_Environment_Setup_Automation.ipynb) ###\n**Python Notebook**\n\n**Three files required:**\n1. Environment variables file vars.json\n2. 0_Demo_Environment_Setup_Automation.ipynb\n3. 1_Load_Base_Demo_Data.ipynb\n\n**Alternate - use Apache Airflow**\n1. Upload Demo_Setup_Airflow_Python.py to Airflow\n2. Edit vars.json and upload as \"Variables\".\n3. Execute the DAG\n4. Run 1_Load_Base_Demo_Data.ipynb\n\n### Environment Setup Checklist ###\n\n**To initiate the configuration of the environment used for these demos, perform the steps in \"Environment Setup Automation\"; either by running the Jupyter Notebook or the Airflow DAG.  Prior to running these scripts, perform the following:**\n1. Edit vars.json to reflect the target environment\n2. Validate other environment and hierarchy settings in vars.json\n3. Clusters are set up to be active during nominal business hours **USA TIME**.  
Adjust as necessary in the notebook or DAG\n4. If using Airflow, upload the new vars.json to Variables in Airflow Admin Screen\n5. When the setup is complete, use the Admin notebook to check cluster status, suspend/resume as needed\n\n#### This notebook will create the Lake environment hierarchy design; ####\n- Takes some environmental declarations (users, databases, etc.) from the json file\n- Uses **US BUSINESS HOURS** for Clusters active time.  Adjust if needed.\n- GRANTs to retail_sample_data for the DEMO_AUTH_NOS to all objects\n- Creates a Repositories.PubAuth Authorization Object for accessing open object stores.\n- Creates two databases; \"demo\" and \"demo_ofs\" each with default NDS and OFS storage respectively.\n\u003cbr\u003e\n\nPer the design, **SYSDBA** is the account DBA, **CGADMIN** is Compute Group Administrator, users are in the **Business Users** Profile.\n\n\u003chr\u003e\n\n## 2. Base Data Loading (1_Demo_Setup_Base_Data.ipynb) ##\n**Python Notebook**\n\u003cbr\u003e\n**Purpose is to load minimal data to the local Lake system to run the base demo notebooks**\n1. Log in as SYSDBA\n2. Loads two dimension tables to BFS storage from S3\n    - demo.Customer_BFS\n    - demo.Accounts_Mapping_BFS\n3. Loads one fact table to OFS Storage from S3\n    - demo_OFS.Txn_History\n    \n\n\n## 3. Environment Administration (Demo_Admin.ipynb) ##\n**Vantage SQL Kernel**\n1. Log in as CGADMIN/password\n2. Compute Group Status\n3. RESUME/SUSPEND/DROP\n4. DBC login in case one needs DBC\n\n\u003chr\u003e\n\n## 4. Data Engineering (Data_Engineering_Exploration.ipynb) ##\n**Vantage SQL Kernel**\n1. Create OFS Table from S3 \"CashApp\" transactions\n2. Create Foreign Table from S3 \"Banking History\"\n3. Review Tables - Dimensions in BFS, CashApp in OFS, Banking History in S3\n4. Execute Joins and Analytics:\n    - Identify Customers who have experienced Fraud\n    - Show the victim's full behavioral path through their Banking relationship\n5. 
Execute Joins across the Query Fabric (QueryGrid)\n\n\u003chr\u003e\n\n## 5. Open Analytics Framework (Data_Science_OAF.ipynb) ##\n**Python Notebook**\n(Python 3.8)\n1. Credentials and UES URI inherited from vars.json\n2. Create custom container - install libraries and versions\n3. Upload model and scoring script\n4. Execute Feature Engineering - pass it to scoring\n5. Evaluate Model\n\n\u003chr\u003e\n\n## 6. Data Science Process - Python (Data_Science_OAF.ipynb) ##\n**Appendix Section - Create the model**\n1. OneHotEncode\n2. Test/Train Split\n3. Train Model\n4. Test Model\n5. Confusion Matrix\n\n\u003chr\u003e\n\n\u003cb style = 'font-size:20px;font-family:Arial;color:#00233C'\u003eVantageCloud Lake Fundamentals\u003c/b\u003e\n\u003cp style = 'font-size:18px;font-family:Arial;color:#00233C'\u003eNotebooks illustrating the feature/function basics\u003c/p\u003e\n\u003cp style = 'font-size:18px;font-family:Arial;color:#00233C'\u003eSee \u003ca href = 'Fundamentals/README.md'\u003eREADME\u003c/a\u003e for more details\u003c/p\u003e\n\n### 1. Native Object Store ###\n\u003ca href = 'Fundamentals/Native-Object-Store/NOS_Fundamentals_SQL.ipynb'\u003eFundamentals/Native-Object-Store/NOS_Fundamentals_SQL.ipynb\u003c/a\u003e\n\n\u003chr\u003e\n\n\u003cb style = 'font-size:20px;font-family:Arial;color:#00233C'\u003eDemos in UseCases Folder\u003c/b\u003e\n\u003cp style = 'font-size:18px;font-family:Arial;color:#00233C'\u003eEach Use Case has its own data loading notebook.  Typically, the data is loaded from an S3 bucket; the bucket name and any credentials are inherited from the vars.json file.\u003c/p\u003e\n\u003cp style = 'font-size:18px;font-family:Arial;color:#00233C'\u003eSee \u003ca href = 'UseCases/README.md'\u003eREADME\u003c/a\u003e for more details\u003c/p\u003e\n\n\n### 1. Native KMeans Clustering ###\n\u003ca href = 'UseCases/Native-KMeans/KMeans_Clustering_Python.ipynb'\u003eUseCases/Native-KMeans/KMeans_Clustering_Python.ipynb\u003c/a\u003e\n\n### 2. 
Native GLM Numeric Regression ###\n\u003ca href = 'UseCases/Native-GLM-Regression/Regression_Python.ipynb'\u003eUseCases/Native-GLM-Regression/Regression_Python.ipynb\u003c/a\u003e\n\n### 3. Sentiment Analysis using Native functions ###\n\u003ca href = 'UseCases/Native-Sentiment-Analysis/Sentiment_Analysis_Python.ipynb'\u003eUseCases/Native-Sentiment-Analysis/Sentiment_Analysis_Python.ipynb\u003c/a\u003e\n\n### 4. Churn Prediction using Native Data Prep, VAL, model training XGBOOST, scoring with BYOM OR OAF ###\n\u003ca href = 'UseCases/Churn-Prediction-OAF/Churn-Prediction-OAF.ipynb'\u003eUseCases/Churn-Prediction-OAF/Churn-Prediction-OAF.ipynb\u003c/a\u003e\n\n\n### 5. System Scaling and Monitoring ###\n\u003ca href = 'UseCases/Scaling/Demo 1 - Generate Workload.ipynb'\u003eUseCases/Scaling/Demo 1 - Generate Workload.ipynb\u003c/a\u003e\n\u003cbr\u003e\n\u003ca href = 'UseCases/Scaling/Demo 2 - Real-Time Monitoring.ipynb'\u003eUseCases/Scaling/Demo 2 - Real-Time Monitoring.ipynb\u003c/a\u003e\n\u003cbr\u003e\n\u003ca href = 'UseCases/Scaling/Demo 3 - System Monitoring Queries.ipynb'\u003eUseCases/Scaling/Demo 3 - System Monitoring Queries.ipynb\u003c/a\u003e\n\n### 6. Proximity to Climate Risk/Geospatial Analysis ###\n\u003ca href = 'UseCases/Proximity-To-Climate-Risk/Proximity_To_Climate_Risk.ipynb'\u003eUseCases/Proximity-To-Climate-Risk/Proximity_To_Climate_Risk.ipynb\u003c/a\u003e\n\n### 7. 
Vector Embeddings for Customer Segmentation ###\n\u003ca href = 'UseCases/Vector-Embeddings-Segmentation/Segmentation_With_Vector_Embedding.ipynb'\u003eUseCases/Vector-Embeddings-Segmentation/Segmentation_With_Vector_Embedding.ipynb\u003c/a\u003e","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteradata%2Flake-demos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteradata%2Flake-demos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteradata%2Flake-demos/lists"}