{"id":13911103,"url":"https://github.com/conneroisu/Text-Dataset-Aid-Plugin","last_synced_at":"2025-07-18T10:32:24.184Z","repository":{"id":63574846,"uuid":"568286367","full_name":"conneroisu/Text-Dataset-Aid-Plugin","owner":"conneroisu","description":"This is a obsidian plugin to help with the creation of personal jsonl datasets for text generation models.","archived":false,"fork":false,"pushed_at":"2024-01-29T16:45:06.000Z","size":161,"stargazers_count":38,"open_issues_count":4,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-10T18:29:05.418Z","etag":null,"topics":["fine-tuning","finetuning","language-model","obsidian","obsidian-md","obsidian-plugin","plugin"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/conneroisu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-20T03:03:07.000Z","updated_at":"2025-06-03T07:45:22.000Z","dependencies_parsed_at":"2024-11-20T06:15:14.752Z","dependency_job_id":null,"html_url":"https://github.com/conneroisu/Text-Dataset-Aid-Plugin","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":"obsidianmd/obsidian-sample-plugin","purl":"pkg:github/conneroisu/Text-Dataset-Aid-Plugin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conneroisu%2FText-Dataset-Aid-Plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conneroisu%2FText-Dataset-Aid-Plugin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/conneroisu%2FText-Dataset-Aid-Plugin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conneroisu%2FText-Dataset-Aid-Plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/conneroisu","download_url":"https://codeload.github.com/conneroisu/Text-Dataset-Aid-Plugin/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conneroisu%2FText-Dataset-Aid-Plugin/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265742430,"owners_count":23820839,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fine-tuning","finetuning","language-model","obsidian","obsidian-md","obsidian-plugin","plugin"],"created_at":"2024-08-07T00:01:57.781Z","updated_at":"2025-07-18T10:32:23.897Z","avatar_url":"https://github.com/conneroisu.png","language":"TypeScript","funding_links":[],"categories":["TypeScript"],"sub_categories":[],"readme":"\n![image](https://user-images.githubusercontent.com/88785126/203184536-9199f720-a03b-423b-9bf6-81a68c7fbd28.png)\n![Obsidian Downloads](https://img.shields.io/badge/dynamic/json?logo=obsidian\u0026color=%23483699\u0026label=downloads\u0026query=%24%5B%22obsidian-dataset-aid%22%5D.downloads\u0026url=https%3A%2F%2Fraw.githubusercontent.com%2Fobsidianmd%2Fobsidian-releases%2Fmaster%2Fcommunity-plugin-stats.json)\n\n\n## Personalize your Second Brain Buddy(Text Generation Model)\n\n[![Build obsidian 
plugin](https://github.com/conneroisu/Text-Dataset-Aid-Plugin/actions/workflows/release.yml/badge.svg)](https://github.com/conneroisu/Text-Dataset-Aid-Plugin/actions/workflows/release.yml)\n\nUse a txt file to house your dataset. A feature to export your txt to a jsonl file will be added soon.\n\n# Context\n## Condition: Fully Working\nThe creation of NLP and text generation datasets is highly impactful and has the potential to allow researchers to train models that can automatically generate text. However, creating custom datasets is a tedious and slow process.\n\nThe text dataset aid plugin is a helpful tool for creating finetuning datasets for text generation models like GPT-3 by hand. Finetuning on such a dataset can make the text your model generates more personalized, detailed, or better formatted. Say no to dealing with menus by using hotkey configurations!\n\nThis plugin can be used to quickly generate training data for NLP and text generation models. This would speed up research in these areas and make it easier for practitioners to train these models.\n\n## Context within your second brain\nUpdating your own text generation model on your collected dataset whilst working in your second brain allows your model to better fit your second brain's needs. This plugin fits into any creation or editing workflow because of the nature of commands within Obsidian. I hope that you use this plugin as much as I do!\n\n# Advantages of Finetuning\nFinetuning your text generation model allows for the creation of text that is more natural and expressive. \n1. Increased accuracy in text prediction/generation \n2. 
Increased fluency and coherence in text generation\n3. Greater control over the style and content of generated text\n4. More control over the types of outputs the model produces\n5. Greater flexibility in the types of inputs the model can accept\n6. The ability to produce more human-like outputs\n7. Increased accuracy in the prediction of certain types of outputs\n\nA great resource on fine-tuning principles is available from [Microsoft](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/prepare-dataset).\n\n# Usage\nThe core function of this plugin is made easier through the use of vim mode, but it should work either way. \nThere are currently two commands, each with an accompanying hotkey that is configurable in the hotkey settings:\n\n- Send the selection to your dataset file as a prompt\n- Send the selection to your dataset file as a completion\n\nWhen you send a prompt to the dataset and a prompt is already there, the plugin does nothing. \n\nWhen you send a completion to the dataset and a prompt is already there, the text selection is sent to the dataset as a completion to that prompt.\n\n## Open Ended Generation Support!\nWhen you send a completion to the dataset and there is no prompt, the text selection is inserted into the dataset with an empty prompt prepended to it.\n\nAn example of this:\n```json\n{\"prompt\":\"\", \"completion\":\"Hello can I help you?\"}\n```\nAnother example:\n```json\n{\"prompt\":\"\", \"completion\":\"Hi, How can I help you today\"}\n```\n\nExample of a finetuning dataset:\n```json\n{\"prompt\":\"Company: BHFF insurance\\nProduct: allround insurance\\nAd:One stop shop for all your insurance needs!\\nSupported:\", \"completion\":\" yes\"}\n{\"prompt\":\"Company: Loft conversion specialists\\nProduct: -\\nAd:Straight teeth in weeks!\\nSupported:\", \"completion\":\" no\"}\n```\n\n# Installation\n## Installing from the community plugins page in Obsidian\n-   Open Settings 
\u003e Third-party plugin\n-   Make sure Safe mode is **off**\n-   Click Browse community plugins\n-   Search for \"Dataset Finetuning Aid Plugin\"\n-   Click Install\n-   Once installed, close the community plugins window and activate the newly installed plugin\n## Manually Installing from GitHub \n-   Download the latest release from the Releases section of the GitHub repository (if you can't find it, it should be on the right while you're viewing this page)\n-   Extract the plugin folder from the zip to your vault's plugins folder: `\u003cvault\u003e/.obsidian/plugins/`  \n    Note: On some machines the `.obsidian` folder may be hidden. On macOS you should be able to press `Command+Shift+Dot` to show the folder in Finder.\n-   Reload Obsidian\n\n# Settings\nThere are four main settings, configurable in the plugin's settings panel. The default values are set up for jsonl, a popular dataset format for text generation models.\n\n| Setting Name          | Description                                                                     | Default       |\n| --------------------- | ------------------------------------------------------------------------------- | ------------- |\n| Prefix for Prompts    | This is the string that is prepended to the prompt when sent to the dataset     | `{\"prompt\":`    |\n| Suffix for Prompts    | This is the string that is appended to the prompt when sent to the dataset      | `,`             |\n| Prefix for Completion | This is the string that is prepended to the completion when sent to the dataset | `\"completion\":` |\n| Suffix for Completion | This is the string that is appended to the completion when sent to the dataset  | `}\\n`              |\n\n\n[Help within development](https://github.com/TfTHacker/obsidian42-text-transporter/blob/main/src/features/transporterFunctions.ts)\n\n## Development \n\nCreating a new version:\n\n```bash\ngit tag -a 1.0.1 -m \"1.0.1\"\ngit push origin 1.0.1\n```\n# 
Inspiration\n\nInspired by the efficiency and appeal of fine-tuning your own language model, this plugin lets you build datasets from your notes in the form of prompts and responses. It automatically formats the text to the specification of [OpenAI](https://openai.com/) for finetuning models like GPT-3.\n\nThis plugin shares similarities with the textTransporter plugin made by [TfTHacker](https://github.com/TfTHacker/obsidian42-text-transporter/)\n\n\n\nMade with ❤️ by Conner Ohnesorge\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconneroisu%2FText-Dataset-Aid-Plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconneroisu%2FText-Dataset-Aid-Plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconneroisu%2FText-Dataset-Aid-Plugin/lists"}