{"id":15609303,"url":"https://github.com/jacoblee93/auto-docstrings","last_synced_at":"2025-07-29T21:15:55.013Z","repository":{"id":189648052,"uuid":"681003453","full_name":"jacoblee93/auto-docstrings","owner":"jacoblee93","description":"Use LangChain.js to automatically generate docstrings for LangChain","archived":false,"fork":false,"pushed_at":"2023-11-10T17:02:51.000Z","size":3514,"stargazers_count":42,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-13T19:13:13.492Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacoblee93.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-21T03:54:07.000Z","updated_at":"2024-07-25T21:10:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"167ab98f-a7b9-4cdc-a0fd-c0e132300671","html_url":"https://github.com/jacoblee93/auto-docstrings","commit_stats":null,"previous_names":["jacoblee93/auto-docstrings"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacoblee93%2Fauto-docstrings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacoblee93%2Fauto-docstrings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacoblee93%2Fauto-docstrings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacoblee93%2Fauto-docstrings/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/jacoblee93","download_url":"https://codeload.github.com/jacoblee93/auto-docstrings/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231927147,"owners_count":18447174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T05:42:43.782Z","updated_at":"2025-01-01T15:39:09.244Z","avatar_url":"https://github.com/jacoblee93.png","language":"TypeScript","readme":"# ✨ LangChain.js Auto TSDoc Comment Creator\n\nThis repo uses [LangChain.js](https://js.langchain.com/) to automatically generate annotated TSDoc\ncomments for declared methods, classes, interfaces, types, and functions in the [LangChain.js repo](https://github.com/hwchase17/langchainjs).\n\n![Sample TSDoc comment shown in Intellisense in VSCode](/public/images/intellisense_tooltip.png)\n\n![An extremely funny and creative meme](/public/images/langchain-create-meme.jpeg)\n\nTSDoc comments are invaluable in helping developers understand code at the interface level, giving insight into\nintricacies, parameters, and return types without needing to check the actual implementation. 
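For example, a minimal TSDoc comment of the kind this project generates (an illustrative snippet, not taken from the repo) looks like:

```typescript
/**
 * Adds two numbers together.
 * @param a The first addend.
 * @param b The second addend.
 * @returns The sum of `a` and `b`.
 */
export function add(a: number, b: number): number {
  return a + b;
}
```

Editors like VSCode render the description and the `@param`/`@returns` tags in the hover tooltip for `add`, and API reference generators pick them up automatically.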
The comments show\nup in LangChain's autogenerated API reference pages as well as inline Intellisense tooltips, and the provided context\nhelps developers save time and avoid frustrating bugs.\n\n## 🔨 Usage\n\nInstall packages, then create a `.env` file with the keys specified in `.env.example`.\n\nThen run `yarn start path/to/folder` to recursively run the script on all non-test `.ts` files in the folder.\n\n## 🔎 How it works\n\nBecause LangChain.js is a popular open-source project that includes many hyper-specific technical concepts and integrations more recent than the knowledge\ncutoff dates for popular models, auto-generating these comments presented some unique issues. After some\nexperimentation, I settled on the following high-level flow for each file:\n\n1. Pass the file contents into a [retrieval-focused agent](https://js.langchain.com/docs/use_cases/question_answering/conversational_retrieval_agents) prompted specifically to define the LangChain-specific terms mentioned in the code for a technical writer. The agent had leeway to look up multiple terms that it deemed necessary to its understanding of the input code. The agent used an existing [Weaviate](https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/weaviate) index we previously created over our Python documentation, which should mostly be a superset of the JS integrations and abstractions.\n2. Use an [OpenAI Functions chain](https://js.langchain.com/docs/modules/chains/popular/structured_output) to take the generated context, as well as the original code, and output suggested comments, the name of the declaration they apply to, and metadata in a structured format.\n3. For each generated comment, use TypeScript's abstract syntax tree (AST) parsing APIs to identify the position of the declaration in the code.\n4. Create the actual comment string using the metadata from step 2.\n5. 
Splice it into the file and write it back to the filesystem.\n\nAs for prompting, giving the agent an example workflow and few-shotting the prompts with examples of good outputs seemed to help get\nresults into a good state.\n\nAll steps use OpenAI's GPT-4.\n\nThe actual implementation requires a few [RunnableMaps](https://js.langchain.com/docs/guides/expression_language/cookbook) to pass\ninformation through properly. Here's a basic idea of what it looks like:\n\n```typescript\nconst chain = RunnableSequence.from([\n  {\n    original_input: new RunnablePassthrough(),\n    retrievalResult: researchAgentExecutor,\n  },\n  {\n    input: ({ original_input }) =\u003e original_input.input,\n    context: ({ retrievalResult }) =\u003e retrievalResult.output,\n  },\n  prompt,\n  functionCallingModel,\n  new JsonOutputFunctionsParser(),\n]);\n```\n\nHere's a link to an example LangSmith trace demonstrating this sequence:\n\nhttps://smith.langchain.com/public/24456a63-4af0-48b6-9e1a-edd2dd0087f4/r\n\n## 📋 Results\n\n[The resulting PR](https://github.com/hwchase17/langchainjs/pull/2341) from running this script on the repo encompassed 280 files!\n\nSome changes were required after the initial pass. Roughly 10 of the generated `@returns` annotations resulted in TSDoc comments incompatible with\nour automatically generated Docusaurus API reference docs. 
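As a rough illustration of steps 2–5 of the flow above, the structured items and the AST-based splicing could look something like the following. This is a minimal sketch using TypeScript's compiler API; the `GeneratedComment` shape and the `spliceComment` helper are assumptions for illustration, not the repo's actual code:

```typescript
import * as ts from "typescript";

// Assumed shape of one item from the structured-output chain (illustrative).
interface GeneratedComment {
  declarationName: string; // e.g. "createExtractionChain"
  comment: string; // full TSDoc block, e.g. "/** ... */"
}

// Finds the named declaration via the TypeScript AST and splices the
// generated comment in directly above it.
function spliceComment(source: string, item: GeneratedComment): string {
  const file = ts.createSourceFile(
    "input.ts",
    source,
    ts.ScriptTarget.Latest,
    true
  );
  let insertPos = -1;
  const visit = (node: ts.Node): void => {
    if (
      (ts.isFunctionDeclaration(node) ||
        ts.isClassDeclaration(node) ||
        ts.isInterfaceDeclaration(node)) &&
      node.name?.text === item.declarationName
    ) {
      // getStart() skips leading trivia, so existing handwritten
      // comments above the declaration are left intact.
      insertPos = node.getStart(file);
    }
    ts.forEachChild(node, visit);
  };
  visit(file);
  if (insertPos === -1) return source; // declaration not found; leave file as-is
  return (
    source.slice(0, insertPos) + item.comment + "\n" + source.slice(insertPos)
  );
}
```

Anchoring the splice to an AST position keyed by declaration name, rather than to a model-reported line number, is what sidesteps the hallucination issues discussed in the considerations section later on.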
Also, a few of the docs for very technical, algorithm-heavy features\n(generative agents, parent document retriever) were vague, but overall quality seemed high.\n\nHere are a few I was most impressed by:\n\nA good explanation with parameters of the convenience method to create an OpenAI Functions-powered extraction chain:\n\n```typescript\n/**\n * Function that creates an extraction chain using the provided JSON schema.\n * It sets up the necessary components, such as the prompt, output parser, and tags.\n * @param schema JSON schema of the function parameters.\n * @param llm Must be a ChatOpenAI model that supports function calling.\n * @returns A LLMChain instance configured to return data matching the schema.\n */\nexport function createExtractionChain(\n  schema: FunctionParameters,\n  llm: ChatOpenAI\n) {\n  ...\n}\n```\n\nA nice explanation of a highly-specific and technical retriever:\n\n```typescript\n/**\n * A class for retrieving relevant documents based on a given query. It\n * extends the VectorStoreRetriever class and uses a BaseLanguageModel to\n * generate a hypothetical answer to the query, which is then used to\n * retrieve relevant documents.\n */\nexport class HydeRetriever\u003c\n  V extends VectorStore = VectorStore\n\u003e extends VectorStoreRetriever\u003cV\u003e {\n  ...\n}\n```\n\nAdding caveats to the popular `BufferMemory` class:\n\n```typescript\n/**\n * The `BufferMemory` class is a type of memory component used for storing\n * and managing previous chat messages. It is a wrapper around\n * `ChatMessageHistory` that extracts the messages into an input variable.\n * This class is particularly useful in applications like chatbots where\n * it is essential to remember previous interactions. Note: The memory\n * instance represents the history of a single conversation. Therefore, it\n * is not recommended to share the same history or memory instance between\n * two different chains. 
If you deploy your LangChain app on a serverless\n * environment, do not store memory instances in a variable, as your\n * hosting provider may reset it by the next time the function is called.\n */\nexport class BufferMemory extends BaseChatMemory implements BufferMemoryInput {\n  ...\n}\n```\n\nA nice warning on an unused Vectara vector store method (they provide their own embeddings, and don't support adding vectors directly):\n\n```typescript\n  /**\n   * Throws an error, as this method is not implemented. Use addDocuments\n   * instead.\n   * @param _vectors Not used.\n   * @param _documents Not used.\n   * @returns Does not return a value.\n   */\n  async addVectors(\n    _vectors: number[][],\n    _documents: Document[]\n  ): Promise\u003cvoid\u003e {\n    throw new Error(\n      \"Method not implemented. Please call addDocuments instead.\"\n    );\n  }\n```\n\n## 🧪 Considerations and Tradeoffs\n\n### Retrieval Agent vs. Retrieval Chain\n\nI considered using a [simpler retrieval chain](https://js.langchain.com/docs/modules/chains/popular/vector_db_qa) rather than a more\nadvanced retrieval agent as I generally knew that the vector store should retrieve information on the main class mentioned in the\ncode.\n\nHowever, because many classes in LangChain.js extend base classes and import other concepts, I chose to try to prompt\nthe agent to look up information on the main class from the retriever, then have the flexibility to decide if it needed\nmore information and make further queries.\n\n### Structured Output vs. Unstructured\n\nThe first naive approach I took was to try to just have the model take input code and rewrite it with TSDoc comments.\nHowever, I ran into issues where the model would overwrite handwritten existing TSDoc comments.\n\nI then thought to structure output and splice the comments in as a final step. 
I initially tried to simply ask the model to output a line\nnumber where the comment should be spliced, but again ran into hallucination issues.\n\nI then decided to make each item in the returned array also include the name of the function and the type of declaration, and then make\na final pass that finds the correct place for the comment in the code using the AST approach described above.\n\n### Call Per Desired TSDoc Comment vs. One Pass\n\nFor large files, first extracting applicable declarations for TSDoc comments, then writing each comment in a separate call, might have improved quality.\n\nHowever, it would have been more token-intensive and slower, and structuring my OpenAI schema to be an array gave satisfactory results.\n\n### Tokens\n\nRunning this repo over all the code in LangChain.js took millions of tokens.\n\nDuring development, I added some filtering steps to try to only perform the LLM-powered steps on non-test TypeScript files that\nhave at least one uncommented declaration that would be a good candidate for a TSDoc comment.\n\n### Document ingestion/splitting\n\nTo get started quickly, I reused an existing vector store optimized for [ChatLangChain](https://github.com/langchain-ai/chat-langchain).\nExperimenting and optimizing the ingestion and splitting steps (and ingesting the JS-specific docs rather than the Python ones!)\nwould almost certainly improve performance.\n\n## 🙏 Thank you!\n\nThanks for reading! 
I hope you can extract (haha) some of the techniques mentioned here and apply them to your own projects.\n\nFor more, follow me on X (formerly Twitter) [@Hacubu](https://x.com/hacubu) as well as the official [LangChain account](https://x.com/langchainai).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacoblee93%2Fauto-docstrings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacoblee93%2Fauto-docstrings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacoblee93%2Fauto-docstrings/lists"}