{"id":18369592,"url":"https://github.com/unstructured-io/aws-blog-post-example","last_synced_at":"2025-04-10T19:43:45.008Z","repository":{"id":258359657,"uuid":"873606505","full_name":"Unstructured-IO/aws-blog-post-example","owner":"Unstructured-IO","description":"Script to accompany the AWS blog post on unstructured data ETL with Unstructured Ingest library","archived":false,"fork":false,"pushed_at":"2024-10-16T16:57:32.000Z","size":9,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-15T20:56:31.097Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Unstructured-IO.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-16T12:58:12.000Z","updated_at":"2024-10-16T16:57:35.000Z","dependencies_parsed_at":"2024-10-18T16:36:28.727Z","dependency_job_id":"f651fee5-9921-47d3-8f2e-b96043a19d21","html_url":"https://github.com/Unstructured-IO/aws-blog-post-example","commit_stats":null,"previous_names":["unstructured-io/aws-blog-post-example"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Faws-blog-post-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Faws-blog-post-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Faws-blog-post-example/releases","manifests_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Faws-blog-post-example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Unstructured-IO","download_url":"https://codeload.github.com/Unstructured-IO/aws-blog-post-example/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248281424,"owners_count":21077423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T23:29:54.112Z","updated_at":"2025-04-10T19:43:44.984Z","avatar_url":"https://github.com/Unstructured-IO.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# aws-blog-post-example\n\nThis repository contains a script to accompany the Unstructured.io blog post in collaboration with AWS.\n\n_Link to the blog post is coming soon._ \n\nThe blog post illustrates how Unstructured.io's Serverless API can transform unstructured data into a \nstructured JSON format that can be used by RAG systems on AWS. It provides a step-by-step guide on how \nto use the Unstructured API, detailing each stage of the data transformation process including ingestion, \npartitioning, extraction, chunking, embedding with Bedrock, and syncing with OpenSearch. 
\n\nTo use this example:\n1) Download and install Python version 3.9.0 or later.\n2) Clone the repo and create a virtual environment.\n3) In the new virtual environment, install the required dependencies:\n   * Open your terminal in the root directory of the cloned repo.\n   * Run either `pip install \"unstructured-ingest[s3, opensearch, pdf, bedrock]\"` to install the latest library versions, or `pip install -r requirements.txt` to use the specific versions defined in the `requirements.txt` file.\n4) Open `run_pipeline.py` and add your values for the environment variables required to authenticate with the Unstructured Serverless API, Amazon OpenSearch, S3, and Bedrock.\n\nYou can now run the script from your terminal by executing:\n\n```bash\npython run_pipeline.py\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Faws-blog-post-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funstructured-io%2Faws-blog-post-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Faws-blog-post-example/lists"}