{"id":49225438,"url":"https://github.com/tonicai/textual_sagemaker","last_synced_at":"2026-04-24T07:01:56.010Z","repository":{"id":316660272,"uuid":"1063695113","full_name":"TonicAI/textual_sagemaker","owner":"TonicAI","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-25T22:36:19.000Z","size":61,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-26T00:27:30.062Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TonicAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-25T01:55:50.000Z","updated_at":"2025-09-25T22:36:23.000Z","dependencies_parsed_at":"2025-09-26T00:27:32.533Z","dependency_job_id":"8504e08c-862e-4082-bbdd-50d1f97dde81","html_url":"https://github.com/TonicAI/textual_sagemaker","commit_stats":null,"previous_names":["tonicai/textual_sagemaker"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/TonicAI/textual_sagemaker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual_sagemaker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual_sagemaker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual_sagemaker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual_sagemaker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TonicAI","download_url":"https://codeload.github.com/TonicAI/textual_sagemaker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual_sagemaker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32212808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T03:15:14.334Z","status":"ssl_error","status_checked_at":"2026-04-24T03:15:11.608Z","response_time":64,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-24T07:01:54.968Z","updated_at":"2026-04-24T07:01:56.004Z","avatar_url":"https://github.com/TonicAI.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# De-identifying Data and Fine-Tuning LLMs in Amazon SageMaker with Tonic Textual\n\nThis repository contains the Jupyter notebooks and sample data for the blog post: [Turn Sensitive Data into Safe AI Assets with Tonic Textual in Amazon SageMaker Unified Studio](https://textual.tonic.ai).\n\nThis project demonstrates a complete, end-to-end workflow for transforming raw, sensitive, unstructured text data into a safe, de-identified asset that can be used to fine-tune a Large Language Model (LLM) within the AWS ecosystem. By leveraging Tonic Textual and Amazon SageMaker, you can build powerful, customized AI models on your most valuable data without compromising security or compliance.\n\n(To view the diagram, please download the repository and open the fine-tune-architecture-diagram.drawio.html file in your browser.)\n\n## About the Notebooks\n\nThis repository includes two core notebooks that walk through the entire process:\n1. 01-Deidentify-Data-with-Textual.ipynb: (Originally textual_demo_redaction-edited.ipynb)\nThis notebook guides you through the process of de-identifying sensitive data. You will:\n    1. Install the tonic-textual SDK.\n    2. Connect to the Tonic Textual API.\n    3. Load a sample dataset (call_transcripts.csv) containing PII.\n    4. Use Tonic Textual to redact sensitive entities like names, credit card numbers, and more.\n    5. Save the de-identified data and upload it to an S3 bucket, preparing it for the next stage.\n\n2. 02-Finetune-and-Deploy-LLM.ipynb: (Originally textual_fine_tune_demo_sagemaker.ipynb)\nThis notebook demonstrates how to use your newly de-identified data to create a custom LLM. You will:\n    1. Use Amazon SageMaker JumpStart to select a foundation model (Llama 2).\n    2. Configure and launch a fine-tuning job using the safe data stored in your S3 bucket.\n    3. Deploy the fine-tuned model to a real-time SageMaker endpoint.\n    4. Test the deployed model with sample prompts.\n    5. Clean up the AWS resources to avoid unnecessary costs.\n\n## Getting Started\n\nTo run these notebooks, you will need to have the following prerequisites in place.\n\n### Prerequisites\n- An AWS Account with access to Amazon SageMaker Unified Studio and Amazon S3.\n- A Tonic Textual Account and API key. You can create a free account at [textual.tonic.ai/signup](https://textual.tonic.ai/signup).\n- The contents of this GitHub repository cloned or downloaded.\n\n### Setup Instructions\n1. **Launch SageMaker Studio:** Log in to your AWS account and launch your Amazon SageMaker Unified Studio environment.\n2. **Upload Files:** Upload the two notebooks (01-Deidentify-Data-with-Textual.ipynb, 02-Finetune-and-Deploy-LLM.ipynb) and the call_transcripts.csv dataset to your SageMaker Studio file browser.\n3. **Open the First Notebook:** Start by opening 01-Deidentify-Data-with-Textual.ipynb.\n4. **Configure API Key:** In the first notebook, locate the cell for initializing the Textual client and replace the placeholder with your Tonic Textual API key:\n```python\n# Initialize the Textual client with your API key\ntextual = Textual(api_key=\"YOUR_TONIC_TEXTUAL_API_KEY\")\n```\n5. **Configure S3 Bucket:** In both notebooks, you will need to replace the placeholder your-s3-bucket-name with the name of an S3 bucket you have access to.\n\nYou are now ready to run the cells in the notebooks sequentially.\n\n## Accompanying Blog Post\nFor a detailed walkthrough with full explanations of each step, business context, and screenshots, please read our complete guide:\n[Turn Sensitive Data into Safe AI Assets with Tonic Textual in Amazon SageMaker Unified Studio](https://tonic.ai/textual)\n\n## License\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonicai%2Ftextual_sagemaker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftonicai%2Ftextual_sagemaker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonicai%2Ftextual_sagemaker/lists"}