{"id":49325535,"url":"https://github.com/cupybara/java-langchains","last_synced_at":"2026-05-29T21:00:53.920Z","repository":{"id":170758365,"uuid":"645900128","full_name":"cupybara/java-langchains","owner":"cupybara","description":"A Java 8+ LangChain implementation. Build powerful LLM based applications in an (enterprise) Java context.","archived":false,"fork":false,"pushed_at":"2023-07-29T08:20:38.000Z","size":823,"stargazers_count":27,"open_issues_count":2,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2023-07-29T09:36:48.252Z","etag":null,"topics":["azure-openai","java","langchain","langchain-java","large-language-models","llm","openai","qa"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cupybara.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-05-26T17:52:54.000Z","updated_at":"2023-07-23T21:27:30.000Z","dependencies_parsed_at":"2023-07-18T18:50:04.653Z","dependency_job_id":null,"html_url":"https://github.com/cupybara/java-langchains","commit_stats":null,"previous_names":["hakenadu/java-langchains","cupybara/java-langchains"],"tags_count":13,"template":null,"template_full_name":null,"purl":"pkg:github/cupybara/java-langchains","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cupybara%2Fjava-langchains","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cupybara%2Fjava-langchains/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cupybara%2Fjava-langchains/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/cupybara%2Fjava-langchains/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cupybara","download_url":"https://codeload.github.com/cupybara/java-langchains/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cupybara%2Fjava-langchains/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33670211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-openai","java","langchain","langchain-java","large-language-models","llm","openai","qa"],"created_at":"2026-04-26T20:00:31.588Z","updated_at":"2026-05-29T21:00:53.910Z","avatar_url":"https://github.com/cupybara.png","language":"Java","funding_links":[],"categories":["人工智能"],"sub_categories":["LLM框架"],"readme":"# ![](misc/logo.svg)\n\nThis repository aims to provide a Java alternative to [hwchase17/langchain](https://github.com/hwchase17/langchain).\nIt was born from the need to create an enterprise QA application.\n\n- [Dependency](#dependency)\n- [Chains](#chains)\n    - [Base](#base)\n        - [Logging](#logging)\n    - [Data](#data)\n        - [Reader](#reader)\n            - [Read Documents from In Memory PDF](#read-documents-from-in-memory-pdf)\n            - [Read Documents from 
PDF](#read-documents-from-pdf)\n        - [Retrieval](#retrieval)\n            - [Retrieve Documents from Elasticsearch Index](#retrieve-documents-from-elasticsearch-index)\n            - [Retrieve Documents from Lucene Directory](#retrieve-documents-from-lucene-directory)\n            - [Retrieve Documents from RDBMS](#retrieve-documents-from-rdbms)\n        - [Writer](#writer)\n            - [Write Documents to Elasticsearch Index](#write-documents-to-elasticsearch-index)\n            - [Write Documents to Lucene Directory](#write-documents-to-lucene-directory)\n    - [LLM](#llm)\n        - [Azure](#azure)\n            - [Azure Chat](#azure-chat)\n            - [Azure Completions](#azure-completions)\n        - [OpenAI](#openai)\n            - [OpenAI Chat](#openai-chat)\n            - [OpenAI Completions](#openai-completions)\n    - [QA](#qa)\n        - [Modify Documents](#modify-documents)\n        - [Combine Documents](#combine-documents)\n        - [Map LLM results to answers with sources](#map-llm-results-to-answers-with-sources)\n        - [Split Documents](#split-documents)\n- [Usage behind a corporate proxy](#usage-behind-a-corporate-proxy)\n- [Use Cases](#use-cases)\n    - [Document Comparison](#document-comparison)\n    - [Retrieval Question-Answering Chain](#retrieval-question-answering-chain)\n\n## Dependency\njava-langchains requires Java 8 or higher.\n\nTo group this repository with related repositories in the future, we recently transferred it to the newly created organization [cupybara](https://github.com/cupybara).\nAs a result, we changed the package names from *com.github.hakenadu* to *io.github.cupybara* and also changed the groupId.\nThe latest artifact is available with the following dependency:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.github.cupybara\u003c/groupId\u003e\n    \u003cartifactId\u003ejava-langchains\u003c/artifactId\u003e\n    
\u003cversion\u003e0.6.3\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### Deprecated older dependency\n\nPackages up to version 0.5.0 are available using the groupId com.github.hakenadu.\nThese artifacts are no longer updated, so we don't recommend using them.\nPlease switch to **io.github.cupybara**.\n\n\u003cdetails\u003e\n  \u003csummary\u003eold dependency\u003c/summary\u003e\n  \n  ```xml\n  \u003cdependency\u003e\n      \u003cgroupId\u003ecom.github.hakenadu\u003c/groupId\u003e\n      \u003cartifactId\u003ejava-langchains\u003c/artifactId\u003e\n      \u003cversion\u003e0.5.0\u003c/version\u003e\n  \u003c/dependency\u003e\n  ```\n\u003c/details\u003e\n\n\n## Chains\nModular components implement the [Chain](src/main/java/io/github/cupybara/javalangchains/chains/Chain.java) interface.\nThis makes it easy to modularize an application and to reuse its components across various use cases.\n\nThis section describes the usage of all chains that are currently available.\n\n### Base\n\n#### Logging\nThe [LoggingChain](src/main/java/io/github/cupybara/javalangchains/chains/base/logging/LoggingChain.java) can be used to log the previous chain's output.\nSee the [RetrievalQaTest](src/test/java/io/github/cupybara/javalangchains/usecases/RetrievalQaTest.java) for example usages (the logging chains are indented for readability):\n\n```java\nfinal Chain\u003cString, AnswerWithSources\u003e qaChain = retrievalChain\n\t\t.chain(summarizeDocumentsChain)\n\t\t\t.chain(new ApplyToStreamInputChain\u003c\u003e(new LoggingChain\u003c\u003e(LoggingChain.defaultLogPrefix(\"SUMMARIZED_DOCUMENT\"))))\n\t\t.chain(combineDocumentsChain)\n\t\t\t.chain(new LoggingChain\u003c\u003e(LoggingChain.defaultLogPrefix(\"COMBINED_DOCUMENT\")))\n\t\t.chain(openAiChatChain)\n\t\t\t.chain(new LoggingChain\u003c\u003e(LoggingChain.defaultLogPrefix(\"LLM_RESULT\")))\n\t\t.chain(mapAnswerWithSourcesChain);\n```\n\nThe summarizeDocumentsChain in this example provides a 
Stream as an output. To log each item of the Stream the LoggingChain can be wrapped in an \n[ApplyToStreamInputChain](src/main/java/io/github/cupybara/javalangchains/chains/base/ApplyToStreamInputChain.java).\n\nThis example provides the following log output running the RetrievalQaTest:\n\n```\n========================================================================================================================================================\nSUMMARIZED_DOCUMENT\n========================================================================================================================================================\n{source=book-of-john-1.pdf, question=who is john doe?, content=John Doe is a highly skilled and experienced software engineer with a passion for problem-solving and creating innovative solutions. He has been working in the technology industry for over 15 years and has gained a reputation for his exceptional programming abilities and attention to detail.}\n\n========================================================================================================================================================\nSUMMARIZED_DOCUMENT\n========================================================================================================================================================\n{source=book-of-john-3.pdf, question=who is john doe?, content=John Doe is described as someone with a diverse range of hobbies and interests. Some of his notable hobbies include music production, culinary adventures, photography and travel, fitness and outdoor activities, and being a book club enthusiast. 
He is also involved in volunteering and community service, language learning, gardening, DIY projects, and astronomy.}\n\n========================================================================================================================================================\nCOMBINED_DOCUMENT\n========================================================================================================================================================\n{question=who is john doe?, content=Content: John Doe is described as someone with a diverse range of hobbies and interests. Some of his notable hobbies include music production, culinary adventures, photography and travel, fitness and outdoor activities, and being a book club enthusiast. He is also involved in volunteering and community service, language learning, gardening, DIY projects, and astronomy.\nSource: book-of-john-3.pdf\n\nContent: John Doe is a highly skilled and experienced software engineer with a passion for problem-solving and creating innovative solutions. He has been working in the technology industry for over 15 years and has gained a reputation for his exceptional programming abilities and attention to detail.\nSource: book-of-john-1.pdf}\n\n========================================================================================================================================================\nLLM_RESULT\n========================================================================================================================================================\nJohn Doe is described as someone with a diverse range of hobbies and interests, including music production, culinary adventures, photography, travel, fitness, outdoor activities, being a book club enthusiast, volunteering, community service, language learning, gardening, DIY projects, and astronomy. Additionally, John Doe is a highly skilled and experienced software engineer with a passion for problem-solving and creating innovative solutions. 
He has been working in the technology industry for over 15 years and is known for his exceptional programming abilities and attention to detail.\nSOURCES: book-of-john-3.pdf, book-of-john-1.pdf\n```\n\n### Data\n\n#### Reader\n\n##### Read Documents from In Memory PDF\nSee [ReadDocumentsFromInMemoryPdfChainTest](src/test/java/io/github/cupybara/javalangchains/chains/data/read/ReadDocumentsFromInMemoryPdfChainTest.java)\n\nRead the in-memory PDF into a single document:\n\n```java\nInMemoryPdf inMemoryPdf = new InMemoryPdf(\n\tIOUtils.toByteArray(ReadDocumentsFromInMemoryPdfChainTest.class.getResourceAsStream(\"/pdf/qa/book-of-john-3.pdf\")),\n\t\"my-in-memory.pdf\");\n\nStream\u003cMap\u003cString, String\u003e\u003e readDocuments = new ReadDocumentsFromInMemoryPdfChain().run(inMemoryPdf);\n\n// readDocuments contains a single (content, source) pair whose source is \"my-in-memory.pdf\"\n```\n\nRead one document per page of the in-memory PDF:\n\n```java\nInMemoryPdf inMemoryPdf = new InMemoryPdf(\n\tIOUtils.toByteArray(ReadDocumentsFromInMemoryPdfChainTest.class.getResourceAsStream(\"/pdf/qa/book-of-john-3.pdf\")),\n\t\"my-in-memory.pdf\");\n\nStream\u003cMap\u003cString, String\u003e\u003e readDocuments = new ReadDocumentsFromInMemoryPdfChain(PdfReadMode.PAGES).run(inMemoryPdf);\n\n// readDocuments contains (content, source) pairs for all read pdf pages (source is \"my-in-memory.pdf\" + the pdf page number)\n```\n\n##### Read Documents from PDF\nSee [ReadDocumentsFromPdfChainTest](src/test/java/io/github/cupybara/javalangchains/chains/data/read/ReadDocumentsFromPdfChainTest.java)\n\nRead each PDF in the given directory into a single document:\n\n```java\nStream\u003cMap\u003cString, String\u003e\u003e readDocuments = new ReadDocumentsFromPdfChain()\n\t.run(Paths.get(\"path/to/my/pdf/folder\"));\n\n// readDocuments contains (content, source) pairs for all read pdfs (source is the pdf filename)\n```\n\nRead each page of each PDF in the given directory into its own 
document:\n\n```java\nStream\u003cMap\u003cString, String\u003e\u003e readDocuments = new ReadDocumentsFromPdfChain(PdfReadMode.PAGES)\n\t.run(Paths.get(\"path/to/my/pdf/folder\"));\n\n// readDocuments contains (content, source) pairs for all read pdf pages (source is the pdf filename + the pdf page number)\n```\n\n#### Retrieval\n\n##### Retrieve Documents from Elasticsearch Index\nSee [ElasticsearchRetrievalChainIT](src/test/java/io/github/cupybara/javalangchains/chains/data/retrieval/ElasticsearchRetrievalChainIT.java)\n\n```java\nRestClientBuilder restClientBuilder = RestClient.builder(new HttpHost(\"localhost\", 9200));\n\nChain\u003cPath, Void\u003e createElasticsearchIndexChain = new ReadDocumentsFromPdfChain()\n\t.chain(new WriteDocumentsToElasticsearchIndexChain(\"my-index\", restClientBuilder));\n\nPath pdfDirectoryPath = Paths.get(ElasticsearchRetrievalChainIT.class.getResource(\"/pdf/qa\").toURI());\n\n// create and fill elasticsearch index with read pdfs (source, content)-pairs\ncreateElasticsearchIndexChain.run(pdfDirectoryPath);\n\n// retrieve documents relevant to a specific question\ntry (RestClient restClient = restClientBuilder.build();\n\t\tElasticsearchRetrievalChain retrievalChain = new ElasticsearchRetrievalChain(\"my-index\", restClient, 1)) {\n\n\t// retrieve the most relevant documents for the passed question\n\tList\u003cMap\u003cString, String\u003e\u003e retrievedDocuments = retrievalChain.run(\"who is john doe?\").collect(Collectors.toList());\n\n\t// ...\n}\n```\n\n##### Retrieve Documents from Lucene Directory\nSee [LuceneRetrievalChainTest](src/test/java/io/github/cupybara/javalangchains/chains/data/retrieval/LuceneRetrievalChainTest.java)\n\n```java\n// create lucene index\nDirectory directory = new MMapDirectory(Files.createTempDirectory(\"myTempDir\"));\n\n// fill lucene index\ntry (IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(new StandardAnalyzer()))) {\n\tList\u003cString\u003e 
documents = Arrays.asList(\"My first document\", \"My second document\", \"My third document\");\n\n\tfor (String content : documents) {\n\t\tDocument doc = new Document();\n\t\tdoc.add(new TextField(PromptConstants.CONTENT, content, Field.Store.YES));\n\t\tdoc.add(new StringField(PromptConstants.SOURCE, String.valueOf(documents.indexOf(content) + 1), Field.Store.YES));\n\t\tindexWriter.addDocument(doc);\n\t}\n\n\tindexWriter.commit();\n}\n\n// create retrieval chain\nRetrievalChain retrievalChain = new LuceneRetrievalChain(directory, 2 /* max count of retrieved documents */);\n\n// retrieve the most relevant documents for the passed question\nStream\u003cMap\u003cString, String\u003e\u003e retrievedDocuments = retrievalChain.run(\"my question?\");\n```\n\n\n##### Retrieve Documents from RDBMS\nSee [JdbcRetrievalChainIT](src/test/java/io/github/cupybara/javalangchains/chains/data/retrieval/JdbcRetrievalChainIT.java)\n\n```java\nSupplier\u003cConnection\u003e connectionSupplier = () -\u003e {\n\ttry {\n\t\treturn DriverManager.getConnection(connectionString, username, password);\n\t} catch (SQLException e) {\n\t\tthrow new IllegalStateException(\"error creating database connection\", e);\n\t}\n};\n\nRetrievalChain retrievalChain = new JdbcRetrievalChain(connectionSupplier, 2 /* max count of retrieved documents */);\n\nStream\u003cMap\u003cString, String\u003e\u003e retrievedDocuments = retrievalChain.run(\"my question?\");\n```\n\n#### Writer\n\n##### Write Documents to Elasticsearch Index\n```java\nRestClientBuilder restClientBuilder = RestClient.builder(new HttpHost(\"localhost\", 9200));\n\n// this chain reads documents from a folder of pdfs and writes them to an elasticsearch index\nChain\u003cPath, Void\u003e fillElasticsearchIndexChain = new ReadDocumentsFromPdfChain()\n  .chain(new WriteDocumentsToElasticsearchIndexChain(\"my-index\", restClientBuilder));\n\nPath pdfDirectoryPath = 
Paths.get(getClass().getResource(\"/pdf/qa\").toURI());\n\nfillElasticsearchIndexChain.run(pdfDirectoryPath);\n```\n\n##### Write Documents to Lucene Directory\n```java\nPath tempIndexPath = Files.createTempDirectory(\"lucene\");\n\n// this chain reads documents from a folder of pdfs and writes them to an index directory\nChain\u003cPath, Directory\u003e createLuceneIndexChain = new ReadDocumentsFromPdfChain()\n\t.chain(new WriteDocumentsToLuceneDirectoryChain(tempIndexPath));\n\nPath pdfDirectoryPath = Paths.get(getClass().getResource(\"/pdf/qa\").toURI());\n\nDirectory directory = createLuceneIndexChain.run(pdfDirectoryPath);\n```\n\n### LLM\n\n#### Azure\n\n##### Azure Chat\nSee [AzureOpenAiChatCompletionsChainIT](src/test/java/io/github/cupybara/javalangchains/chains/llm/azure/chat/AzureOpenAiChatCompletionsChainIT.java)\n\n```java\nAzureOpenAiChatCompletionsChain chain = new AzureOpenAiChatCompletionsChain(\n\t\"my-azure-resource-name\",\n\t\"gpt-35-turbo\", // deployment name\n\t\"2023-05-15\", // api version\n\t\"Hello, this is ${name}\",\n\tnew OpenAiChatCompletionsParameters().temperature(0D), // further parameters can be set here\n\tSystem.getenv(\"OPENAI_API_KEY\"),\n\t\"You are a helpful assistant who answers questions to ${name}\" // optional systemTemplate\n);\n\nString result = chain.run(Collections.singletonMap(\"name\", \"Manuel\"));\n// result is something like: \"Hello Manuel, how are you\"\n```\n\n##### Azure Completions\n```java\nAzureOpenAiCompletionsChain chain = new AzureOpenAiCompletionsChain(\n\t\"my-azure-resource-name\",\n\t\"text-davinci-003\", // deployment name\n\t\"2023-05-15\", // api version\n\t\"Hello, this is ${name}\",\n\tnew OpenAiCompletionsParameters().temperature(0D), // further parameters can be set here\n\tSystem.getenv(\"OPENAI_API_KEY\"),\n\t\"You are a helpful assistant who answers questions to ${name}\" // optional systemTemplate\n);\n\nString result = chain.run(Collections.singletonMap(\"name\", 
\"Manuel\"));\n// result is something like: \"Hello Manuel, how are you\"\n```\n\n#### OpenAI\n\n##### OpenAI Chat\nSee [OpenAiChatCompletionsChainIT](src/test/java/io/github/cupybara/javalangchains/chains/llm/openai/chat/OpenAiChatCompletionsChainIT.java)\n\n```java\nOpenAiChatCompletionsChain chain = new OpenAiChatCompletionsChain(\n\t\"Hello, this is ${name}\",\n\tnew OpenAiChatCompletionsParameters().model(\"gpt-3.5-turbo\").temperature(0D), // further parameters can be set here\n\tSystem.getenv(\"OPENAI_API_KEY\"),\n\t\"You are a helpful assistant who answers questions to ${name}\" // optional systemTemplate\n);\n\nString result = chain.run(Collections.singletonMap(\"name\", \"Manuel\"));\n// result is something like: \"Hello Manuel, how are you\"\n```\n\n##### OpenAI Completions\n```java\nOpenAiCompletionsChain chain = new OpenAiCompletionsChain(\n\t\"Hello, this is ${name}\",\n\tnew OpenAiCompletionsParameters().model(\"text-davinci-003\").temperature(0D), // further parameters can be set here\n\tSystem.getenv(\"OPENAI_API_KEY\"),\n\t\"You are a helpful assistant who answers questions to ${name}\" // optional systemTemplate\n);\n\nString result = chain.run(Collections.singletonMap(\"name\", \"Manuel\"));\n// result is something like: \"Hello Manuel, how are you\"\n```\n\n### QA\n\n#### Modify Documents\nThe ModifyDocumentsContentChain can be used, for example, for document summarization.\n\n```java\n// create the llm chain which is used for summarization\nLargeLanguageModelChain llmChain = new OpenAiChatCompletionsChain(\n\t\tPromptTemplates.QA_SUMMARIZE,\n\t\tnew OpenAiChatCompletionsParameters().temperature(0D).model(\"gpt-3.5-turbo\"),\n\t\tSystem.getenv(\"OPENAI_API_KEY\"));\n\n// create the ModifyDocumentsContentChain which is used to apply the llm chain to each passed document\nModifyDocumentsContentChain summarizeDocumentsChain = new ModifyDocumentsContentChain(llmChain);\n\n// create some example 
documents\nMap\u003cString, String\u003e myFirstDocument = new HashMap\u003cString, String\u003e();\nmyFirstDocument.put(PromptConstants.CONTENT, \"this is my first document content\");\nmyFirstDocument.put(PromptConstants.SOURCE, \"this is my first document source\");\n// the default summarize prompt PromptTemplates.QA_SUMMARIZE also expects the question used for retrieval in the document\nmyFirstDocument.put(PromptConstants.QUESTION, \"who is John Doe?\");\n\nMap\u003cString, String\u003e mySecondDocument = new HashMap\u003cString, String\u003e();\nmySecondDocument.put(PromptConstants.CONTENT, \"this is my second document content\");\nmySecondDocument.put(PromptConstants.SOURCE, \"this is my second document source\");\nmySecondDocument.put(PromptConstants.QUESTION, \"how old is John Doe?\"); // see comment above\n\n// input for the summarize chain is a stream of documents\nStream\u003cMap\u003cString, String\u003e\u003e documents = Stream.of(myFirstDocument, mySecondDocument);\n\n// the output contains the passed documents with their content values summarized\nStream\u003cMap\u003cString, String\u003e\u003e summarizedDocuments = summarizeDocumentsChain.run(documents);\n```\n\n#### Combine Documents\n```java\nCombineDocumentsChain combineDocumentsChain = new CombineDocumentsChain();\n\nMap\u003cString, String\u003e myFirstDocument = new HashMap\u003cString, String\u003e();\nmyFirstDocument.put(PromptConstants.CONTENT, \"this is my first document content\");\nmyFirstDocument.put(PromptConstants.SOURCE, \"this is my first document source\");\n\nMap\u003cString, String\u003e mySecondDocument = new HashMap\u003cString, String\u003e();\nmySecondDocument.put(PromptConstants.CONTENT, \"this is my second document content\");\nmySecondDocument.put(PromptConstants.SOURCE, \"this is my second document source\");\n\nStream\u003cMap\u003cString, String\u003e\u003e documents = Stream.of(myFirstDocument, mySecondDocument);\n\nMap\u003cString, String\u003e combinedDocument = 
combineDocumentsChain.run(documents);\n/* \n * Content: this is my first document content\n * Source: this is my first document source\n *\n * Content: this is my second document content\n * Source: this is my second document source\n * \n * (stored with key \"content\" inside the map)\n */\n```\n\n#### Map LLM results to answers with sources\n```java\nMapAnswerWithSourcesChain mapAnswerWithSourcesChain = new MapAnswerWithSourcesChain();\n\nAnswerWithSources answerWithSources = mapAnswerWithSourcesChain.run(\"The answer is bla bla bla.\\nSOURCES: page 1 book xy, page 2 book ab\");\n\nSystem.out.println(answerWithSources.getAnswer());  // The answer is bla bla bla.\nSystem.out.println(answerWithSources.getSources()); // [page 1 book xy, page 2 book ab]\n\n```\n\n#### Split Documents\nSee [SplitDocumentsChainTest](src/test/java/io/github/cupybara/javalangchains/chains/qa/split/SplitDocumentsChainTest.java)\n\n```java\n\n// 1. Create Documents\n\nList\u003cMap\u003cString, String\u003e\u003e documents = new LinkedList\u003c\u003e();\n\nMap\u003cString, String\u003e firstDocument = new LinkedHashMap\u003c\u003e();\nfirstDocument.put(PromptConstants.SOURCE, \"book of john\");\nfirstDocument.put(PromptConstants.CONTENT, \"This is a short text. This is another short text.\");\ndocuments.add(firstDocument);\n\nMap\u003cString, String\u003e secondDocument = new LinkedHashMap\u003c\u003e();\nsecondDocument.put(PromptConstants.SOURCE, \"book of jane\");\nsecondDocument.put(PromptConstants.CONTENT, \"This is a short text.\");\ndocuments.add(secondDocument);\n\n// 2. Split Documents\n\n/*\n * We create a TextSplitter that splits a text into partitions using a JTokkit\n * Encoding. 
We use the cl100k_base encoding, which is the encoding used by\n * gpt-3.5-turbo.\n */\nTextSplitter textSplitter = new JtokkitTextSplitter(\n\t\tEncodings.newDefaultEncodingRegistry().getEncoding(EncodingType.CL100K_BASE), 10);\n\n/*\n * We now instantiate the SplitDocumentsChain, which splits our documents\n * on the \"content\" field using the TextSplitter created above.\n */\nSplitDocumentsChain splitDocumentsChain = new SplitDocumentsChain(textSplitter);\n\nList\u003cMap\u003cString, String\u003e\u003e splitDocuments = splitDocumentsChain.run(documents.stream())\n\t\t.collect(Collectors.toList());\n\n// splitDocuments: [\n//   {content=This is a short text. , source=book of john},\n//   {content=This is another short text., source=book of john},\n//   {content=This is a short text., source=book of jane}\n// ]\n```\n\n## Usage behind a corporate proxy\nIf a chain needs to access an external service, it provides a constructor parameter for passing the HTTP client.\nThe [WebClient](https://docs.spring.io/spring-framework/reference/web/webflux-webclient.html) is used for the following chains:\n* [AzureOpenAiChatCompletionsChain](src/main/java/io/github/cupybara/javalangchains/chains/llm/azure/chat/AzureOpenAiChatCompletionsChain.java)\n* [AzureOpenAiCompletionsChain](src/main/java/io/github/cupybara/javalangchains/chains/llm/azure/completions/AzureOpenAiCompletionsChain.java)\n* [OpenAiChatCompletionsChain](src/main/java/io/github/cupybara/javalangchains/chains/llm/openai/chat/OpenAiChatCompletionsChain.java)\n* [OpenAiCompletionsChain](src/main/java/io/github/cupybara/javalangchains/chains/llm/openai/completions/OpenAiCompletionsChain.java)\n\nThere is plenty of public documentation on how to configure an HTTP proxy for these clients.\nOne example is [this one from Baeldung](https://www.baeldung.com/spring-webflux-timeout).\n\nTo access an Elasticsearch cluster, the [Elasticsearch Low Level 
Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/java-rest-low.html) is used.\nThe [official documentation](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/8.8/java-rest-low-usage-initialization.html) shows how to use a proxy in this case.\n\n## Use Cases\nMultiple chains can be combined to build more powerful chains for complex use cases.\n\n### Document Comparison\nThe [following unit test](src/test/java/io/github/cupybara/javalangchains/usecases/DocumentComparisonTest.java) shows how the existing chains can be used to compare two or more documents.\nMore abstraction would be useful here. I plan to add it in an upcoming release and to include example code in this README at that point.\n\nThe following diagram shows how the implementation for this use case works:\n\n![Document comparison diagram](misc/drawio/docment-comparison.svg)\n\n\n### Retrieval Question-Answering Chain\nThe [following unit test](src/test/java/io/github/cupybara/javalangchains/usecases/RetrievalQaTest.java) implements an information retrieval and summarization pipeline that aims to produce concise, informative and relevant answers from a large set of documents. 
It combines multiple processes into a Question-Answering (QA) chain, each responsible for a specific task.\n\n```java\n/*\n * take a look at src/test/resources/pdf/qa of this repository:\n * this directory contains three documents about a fictional person named john doe,\n * which we want to query using our retrieval-based QA-with-sources chain\n */\nPath pdfDirectoryPath = Paths.get(RetrievalQaTest.class.getResource(\"/pdf/qa\").toURI());\n\n/*\n * We create and run an initializing chain which reads documents from our pdf folder\n * and writes them to a lucene index directory\n */\nDirectory directory = new ReadDocumentsFromPdfChain().chain(new WriteDocumentsToLuceneDirectoryChain()).run(pdfDirectoryPath);\n\n// we use multiple OpenAI LLM chains, so we define their shared parameters first\nOpenAiChatCompletionsParameters openAiChatParameters = new OpenAiChatCompletionsParameters()\n\t\t.temperature(0D)\n\t\t.model(\"gpt-3.5-turbo\");\n\n/*\n * Chain 1: The retrievalChain is used to retrieve relevant documents from an\n * index using BM25 similarity\n */\ntry (LuceneRetrievalChain retrievalChain = new LuceneRetrievalChain(directory /* implies a filled lucene directory */, 2)) {\n\n\t/*\n\t * Chain 2: The summarizeDocumentsChain is used to summarize documents so that they only\n\t * contain the most relevant information. 
This is achieved using an OpenAI LLM\n\t * (gpt-3.5-turbo in this case)\n\t */\n\tModifyDocumentsContentChain summarizeDocumentsChain = new ModifyDocumentsContentChain(new OpenAiChatCompletionsChain(\n\t\t\tPromptTemplates.QA_SUMMARIZE, openAiChatParameters, System.getenv(\"OPENAI_API_KEY\")));\n\n\t/*\n\t * Chain 3: The combineDocumentsChain is used to combine the summarized documents\n\t * into a single prompt\n\t */\n\tCombineDocumentsChain combineDocumentsChain = new CombineDocumentsChain();\n\n\t/*\n\t * Chain 4: The openAiChatChain is used to process the combined prompt using an\n\t * OpenAI LLM (gpt-3.5-turbo in this case)\n\t */\n\tOpenAiChatCompletionsChain openAiChatChain = new OpenAiChatCompletionsChain(PromptTemplates.QA_COMBINE,\n\t\t\topenAiChatParameters, System.getenv(\"OPENAI_API_KEY\"));\n\n\t/*\n\t * Chain 5: The mapAnswerWithSourcesChain is used to map the llm string output\n\t * to a complex object using a regular expression which splits the sources and\n\t * the answer.\n\t */\n\tMapAnswerWithSourcesChain mapAnswerWithSourcesChain = new MapAnswerWithSourcesChain();\n\n\t// we combine all chain links into a self-contained QA chain\n\tChain\u003cString, AnswerWithSources\u003e qaChain = retrievalChain\n\t\t.chain(summarizeDocumentsChain)\n\t\t.chain(combineDocumentsChain)\n\t\t.chain(openAiChatChain)\n\t\t.chain(mapAnswerWithSourcesChain);\n\n\t// the QA chain can now be called with a question and returns an answer with its sources\n\tAnswerWithSources answerWithSources = qaChain.run(\"who is john doe?\");\n\n\t/*\n\t * answerWithSources.getAnswer() provides the answer to the question based on the retrieved documents\n\t * answerWithSources.getSources() provides a list of source strings for the retrieved documents\n\t */\n}\n```\n\nThe QA chain performs the following tasks:\n\n1. **Document Retrieval**: This step is responsible for retrieving the most relevant documents related to a given query from a large collection. 
It uses an index-based search algorithm to find documents containing information related to the input query. Any `RetrievalChain` implementation can perform this step; `LuceneRetrievalChain`, which uses BM25 similarity, is simply the one used in the test case.\n\n2. **Document Summarization**: Once relevant documents are retrieved, they are summarized to extract the most essential information. The `ModifyDocumentsContentChain` (named `summarizeDocumentsChain` in the example) applies an instance of `LargeLanguageModelChain` to each document for this task. In the provided example, OpenAI's GPT-3.5-turbo model via `OpenAiChatCompletionsChain` is used to reduce each document to its most relevant content.\n\n3. **Document Combination**: The `CombineDocumentsChain` combines the summarized documents into a single prompt, which forms the input to the next stage.\n\n4. **Answer Generation**: The `OpenAiChatCompletionsChain` uses the combined prompt to generate a response. Any instance of `LargeLanguageModelChain` can be used for this step. In the given example, OpenAI's GPT-3.5-turbo model is used.\n\n5. **Mapping and Answer Extraction**: Finally, the `MapAnswerWithSourcesChain` maps the string output to a complex object using a regular expression, which splits the answer from the sources of information. This provides a structured output that includes both the answer to the query and the sources from which the answer was derived.\n\nIn summary, the QA chain answers questions over a document collection and, besides the most relevant answer, also cites the sources from which the information was retrieved. 
This chain is particularly useful in contexts where understanding the origin of information is as crucial as the answer itself.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcupybara%2Fjava-langchains","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcupybara%2Fjava-langchains","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcupybara%2Fjava-langchains/lists"}