{"id":19639493,"url":"https://github.com/ganeshsivakumar/langchain-beam","last_synced_at":"2025-10-20T11:25:10.359Z","repository":{"id":262181302,"uuid":"883711546","full_name":"Ganeshsivakumar/langchain-beam","owner":"Ganeshsivakumar","description":"Integrates LLMs as PTransform in Apache Beam pipelines using LangChain","archived":false,"fork":false,"pushed_at":"2025-02-22T10:21:29.000Z","size":1415,"stargazers_count":17,"open_issues_count":4,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-30T11:51:13.572Z","etag":null,"topics":["apache-beam","data-engineering","dataflow","etl","langchain","langchian-beam","rag"],"latest_commit_sha":null,"homepage":"https://ganeshsivakumar.github.io/langchain-beam/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ganeshsivakumar.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-05T12:51:29.000Z","updated_at":"2025-03-08T10:51:01.000Z","dependencies_parsed_at":"2025-02-08T14:33:46.680Z","dependency_job_id":null,"html_url":"https://github.com/Ganeshsivakumar/langchain-beam","commit_stats":null,"previous_names":["ganeshsivakumar/langchain-beam"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ganeshsivakumar%2Flangchain-beam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ganeshsivakumar%2Flangchain-beam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ganeshsivakumar%2Flangchain-beam/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ganeshsivakumar%2Flangchain-beam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ganeshsivakumar","download_url":"https://codeload.github.com/Ganeshsivakumar/langchain-beam/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251372296,"owners_count":21578958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","data-engineering","dataflow","etl","langchain","langchian-beam","rag"],"created_at":"2024-11-11T13:01:08.129Z","updated_at":"2025-10-20T11:25:10.222Z","avatar_url":"https://github.com/Ganeshsivakumar.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Langchain-Beam\n\nLangchain-Beam integrates Large Language Models as PTransforms in Apache Beam pipelines using LangChain.\nThis library lets you use language model capabilities directly in your Beam workflows for data processing and transformations.\n\n## Motivation\n\nApache Beam is a programming model for building batch and streaming data processing pipelines. It provides an abstraction for expressing data processing as transforms, where each transform is a stage in the pipeline. 
The Langchain-Beam library provides transforms that integrate LLMs into Beam pipelines.\n\nThe goal is to combine Apache Beam's abstraction with the capabilities of Large Language Models, such as generation, completion, classification, and reasoning, to process data. It does this through LangChain, which provides a unified interface for connecting to various LLM providers, retrievers, and tools.\n\n![Pipeline Diagram](docs/langchainbeam.png)\n\n## Getting Started\n\nInclude **Langchain-Beam** as a dependency in `pom.xml`.\nAdditionally, ensure your project has the required Apache Beam dependencies.\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.github.ganeshsivakumar\u003c/groupId\u003e\n    \u003cartifactId\u003elangchain-beam\u003c/artifactId\u003e\n    \u003cversion\u003e0.3.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nImport the required modules:\n\n```java\nimport com.langchainbeam.LangchainBeam;\nimport com.langchainbeam.LangchainModelHandler;\n\n// Import the model options class for the model provider\n// you want to use. Additional providers will be integrated in future releases.\nimport com.langchainbeam.model.openai.OpenAiModelOptions;\n```\n\n## Steps to Use Langchain-Beam\n\n1. **Create Model Options**  \n   Define the `modelOptions` based on the model provider you’re using, configuring parameters such as temperature, max tokens, and other relevant settings.\n\n2. **Define the Instruction Prompt**  \n   Create an `instructionPrompt` that tells the model how to process each `PCollection` input element.\n\n3. **Apply the LangchainBeam PTransform**  \n   Pass the `modelOptions` and `instructionPrompt` to the `LangchainModelHandler`. 
Pass this handler to the `LangchainBeam` PTransform, which can then be applied in the pipeline with `LangchainBeam.run(modelHandler)`.\n\nExample:\n\n```java\n// Define the instruction prompt for processing each element\nString prompt = \"Categorize the product review as Positive or Negative and output your response in this JSON format: {review : {input_element}, feedback: {positive or negative}}\";\n\n// Create model options with the model and its parameters\nOpenAiModelOptions modelOptions = OpenAiModelOptions.builder()\n        .modelName(\"gpt-4o-mini\")\n        .apiKey(OPENAI_API_KEY)\n        .build();\n\n// Initialize the LangchainModelHandler with model options and prompt\nLangchainModelHandler handler = new LangchainModelHandler(modelOptions, prompt);\n\n// Create the pipeline\nPipeline p = Pipeline.create();\n\n// Apply transformations in the pipeline\np.apply(TextIO.read().from(\"/home/ganesh/Downloads/product_reviews.csv\")) // Load data\n        .apply(LangchainBeam.run(handler))  // Run the model handler using LangchainBeam\n        // The DoFn input type must match the @Element type, LangchainBeamOutput\n        .apply(ParDo.of(new DoFn\u003cLangchainBeamOutput, Void\u003e() {\n            @ProcessElement\n            public void processElement(@Element LangchainBeamOutput out) {\n                System.out.println(\"Model Output: \" + out.getOutput());  // Print model output\n            }\n        }));\n\np.run();  // Execute the pipeline\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fganeshsivakumar%2Flangchain-beam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fganeshsivakumar%2Flangchain-beam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fganeshsivakumar%2Flangchain-beam/lists"}