{"id":34624118,"url":"https://github.com/engisalor/lmf","last_synced_at":"2026-05-27T10:39:31.032Z","repository":{"id":325056993,"uuid":"1003602058","full_name":"engisalor/lmf","owner":"engisalor","description":"LMF-CLI: run LLM tasks with LangChain","archived":false,"fork":false,"pushed_at":"2025-11-19T09:26:23.000Z","size":163,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T10:38:38.461Z","etag":null,"topics":["applied-linguistics","langchain-python","language-model"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/engisalor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-17T11:48:54.000Z","updated_at":"2025-11-19T09:26:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/engisalor/lmf","commit_stats":null,"previous_names":["engisalor/lmf"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/engisalor/lmf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/engisalor%2Flmf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/engisalor%2Flmf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/engisalor%2Flmf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/engisalor%2Flmf/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/engisalor","download_url":"https://codeload.github.com/engisalor/lmf/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/engisalor%2Flmf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33562772,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-27T02:00:06.184Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["applied-linguistics","langchain-python","language-model"],"created_at":"2025-12-24T15:43:56.206Z","updated_at":"2026-05-27T10:39:31.025Z","avatar_url":"https://github.com/engisalor.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LMF: a CLI for generative language model experiments with LangChain\n\nThis repo is for designing, organizing and running LLM experiments with Python and [LangChain](https://docs.langchain.com/oss/python) (a language model framework, LMF). It has a modular structure for building just about any type of chatbot or generative LLM task supported by LangChain.\n\nWe use LMF for doing applied linguistics research. 
See our conference article for [eLex 2025](https://elex.link/elex2025/wp-content/uploads/eLex2025-50-Isaacs_etal.pdf).\n\n## Introduction\n\nThe repo is managed with a [Makefile](./Makefile) and the [uv Python package manager](https://docs.astral.sh/uv/). The Makefile has a few commands for defining dependencies, running tests, and getting started with an example project and configurations.\n\nSo far, support is implemented for [Ollama](https://ollama.com/) and [HuggingFace](https://huggingface.co/) (run locally) and [OpenAI](https://openai.com/) (paid API).\n\n### CLI basics\n\nThe main command `lmf` is available once the virtual environment `.venv` is activated and the repo's dependencies are installed.\n\nSee `lmf --help` for overall usage and `lmf \u003ccommand\u003e --help` for individual commands.\n\n`lmf` has a few primary commands:\n\n- `prepare` to prepare final prompts: from no-frills system prompts to more advanced techniques like few-shot semantic similarity example selection with a separate embeddings model and vector store\n- `query` to send prompts to models: configurable to allow for multiple runs, hyperparameters, chat model types, and other options\n- `clear` to delete data generated by `prepare` and `query`\n\n### Design and stability\n\nWe designed LMF to conduct applied linguistics research, with all its specific needs and quirks. Hopefully it's easy to use and modify, but it **should not be considered a stable dependency**. Forking the repo and reviewing new commits would be prudent.\n\n## Understanding projects\n\nAny LLM experiment/job/set of input data is referred to as a project. 
Projects are located in the `project/` directory and generally require three files:\n\n- `examples.yml` with examples for compiling few-shot tasks (left empty if none)\n- `inputs.yml` with human prompt(s) to send to the LLM\n- `system.yml` with a system prompt\n\nProjects are independent from other configurations to allow for easily swapping LLMs, changing configurations, and testing performance. Each project directory is a self-contained set of data.\n\n## Example project\n\nHere is what the `wizard-of-math` project looks like. It's adapted from LangChain's dynamic example selector documentation.\n\nThis project is intended for use with a semantic similarity selector, with the goal of showing an LLM how to do math with the `+` symbol replaced with a bird emoji and to respond to a question about horses.\n\nAn embeddings model (separate from the chat model) is used to provide the best examples to each input. With the `lmf prepare` command, the final prompts are compiled, including the system prompt and dynamically selected examples for each input. Then `lmf query` is executed, sending final prompts to the desired chat model.\n\n### Initial project data\n\n```yml\n# examples.yml\n- input: 2 🦜 2\n  output: \"4\"\n- input: 2 🦜 3\n  output: \"5\"\n- input: 2 🦜 4\n  output: \"6\"\n- input: What did the cow say to the moon?\n  output: Nothing at all.\n- input: Write me a poem about the moon.\n  output: One for the moon, and one for me, who are we to talk about the moon?\n- input: Tell me about horses.\n  output: Horses are mammals.\n\n# inputs.yml\n- input: About horses...\n- input: What's 3 🦜 3?\n\n# system.yml\nYou are a wondrous wizard of math.\n```\n\n### Task execution\n\nTo run a task, first download the required models. 
LMF's default embeddings model is from HuggingFace and the chat model is from Ollama.\n\n```bash\nollama pull qwen3:1.7b\nhuggingface-cli download Qwen/Qwen3-Embedding-4B\n```\n\nThe task can then be executed in a single line:\n\n```bash\nlmf -r 3 -p wizard-of-math -f temperature-0 prepare query --temperature 0.0\n```\n\nConfiguration:\n\n- `-r 3` defines how many runs (repeated executions) should be completed\n- `-p wizard-of-math` defines the current project directory\n- `-f temperature-0` sets a filename prefix for the current command\n- `prepare` generates the final prompts\n- `query --temperature 0.0` runs the task with the default model with a temperature of 0\n\nModified versions of the task using different models or other parameters can also be run. Outputs are saved to `/project/wizard-of-math/output/`. For example:\n\n```bash\nlmf -p wizard-of-math -f gemma3-temp0.5 prepare query --model gemma3:12b --temperature 0.5\n```\n\nEach run of each version of the executed task is saved separately: just make sure filenames are set to be unique, as existing files get overwritten.\n\nTo do a more systematic evaluation of how LLMs complete a task, run a series of commands, where each command tests one configuration. For example, these commands generate final prompts with `lmf prepare` using a number of LLMs. We can inspect the generated prompts to determine which embeddings model achieves the best dynamic example selector results.\n\n```bash\nlmf -p wizard-of-math clear\nlmf -p wizard-of-math -f 1-qwen3-e-0.6B prepare\nlmf -p wizard-of-math -f 2-nomic-embed-text prepare --embeddings Ollama --model nomic-embed-text:latest\nlmf -p wizard-of-math -f 3-ollama-qwen3-1.7b prepare --embeddings Ollama --model qwen3:1.7b\n```\n\n## Components and recipes\n\nThe example project gets us started and doesn't require writing any code or changing underlying components of LMF. For more in-depth modifications, run `lmf COMMAND --help` to see what can be defined by each command. 
A few components are available as-is, but adding new ones to the Python modules is straightforward.\n\nFor example, `query` accepts different chat model providers (`Ollama`, `OpenAI`), which must be set to access the models each provider has. Also, default outputs are unstructured (a typical chatbot conversation), but structured outputs can be set to return data as Python objects/JSON data. For example, the `SemanticRelationTriple` structured output could be used for entity-relation extraction tasks.\n\nMore likely, you'll need to define your own component classes. New components can be added to the respective Python module, such as [schema.py](./src/lmf/schema.py) for structured outputs. Append your own modifications above the line `### add new classes above this line ###`, using the default classes as a reference, and your new component will automatically be available in the CLI, e.g., by executing `lmf ... query --output-structure MyNewStructuredOutput`.\n\n### `query` arguments and underlying components\n\n```bash\nUsage: lmf query [OPTIONS]\n\n  Executes LLM final prompts with a model, model provider and output\n  structure.\n\nOptions:\n  -m, --model TEXT                Name of model (download models beforehand)\n                                  [default: qwen3:1.7b]\n  --chat-model CHAT_MODEL.PY      A chat model chat model class from chat.py\n                                  [default: Ollama]\n  --chat-model-param TEXT         A parameter to pass to the chat model in the\n                                  format 'key=value'\n  -o, --output-structure SCHEMA.PY\n                                  A structured output class from schema.py\n                                  [default: Unstructured]\n  --sample INTEGER                Sample size (run first N prompts in a file;\n                                  0 == all)  [default: 0]\n  --random / --no-random          Toggle sample randomization  [default: no-\n                                  random]\n  
--temperature FLOAT RANGE       Model temperature (0.0 = more deterministic\n                                  / 1.0 = more variable)  [default: 0.0;\n                                  0.0\u003c=x\u003c=1.0]\n  --timeout INTEGER               Response timeout (for cloud providers)\n                                  [default: 300]\n  --max-tokens INTEGER            Model maximum tokens per response  [default:\n                                  10000]\n  --think / --no-think            Toggle model thinking  [default: no-think]\n  --rate-limiter RATE_LIMITER.PY  A rate limiter class from rate_limiter.py\n                                  [default: NoRateLimiter]\n  --help                          Show this message and exit.\n\n  RECIPES *case insensitive*\n  Chat_models:\n  - Ollama\n  - OpenAI\n  Output_structures:\n  - Unstructured\n  - UnstructuredThink\n  - Hypernym\n  - Entity\n  - EntityList\n  - SemanticRelationTriple\n  - EntityRelationExtractor\n  Rate_limiters:\n  - NoRateLimiter\n  - Memory\n```\n\n## Environment variables\n\nSetting environment variables may be necessary, as in the example below.\n\n```bash\n# use huggingface offline\nHF_HUB_OFFLINE=1\n# pytorch settings\nPYTORCH_CUDA_ALLOC_CONF=expandable_segments:True\n# API keys for external providers\nOPENAI_API_KEY=\n```\n\n## Citing\n\nPlease cite this paper:\n\n```bibtex\n@inproceedings{\n\taddress = {Bled, Slovenia},\n\ttitle = {Inductive {Categorization} for {Conceptual} {Analysis} with {LLMs}: {A} {Case} {Study} from the {Humanitarian} {Encyclopedia}},\n\turl = {https://elex.link/elex2025/wp-content/uploads/eLex2025-50-Isaacs_etal.pdf},\n\tbooktitle = {Electronic lexicography in the 21st century ({eLex} 2025): {Intelligent} lexicography. 
{Proceedings} of the {eLex} 2025 conference},\n\tpublisher = {Lexical Computing},\n\tauthor = {Isaacs, Loryn and Chambó, Santiago and León-Araúz, Pilar},\n\teditor = {Kosem, Iztok and Jakubíček, Miloš and Medveď, Marek and Zgaga, Karolina and Arhar Holdt, Špela and Munda, Tina and Salgado, Ana},\n\tyear = {2025},\n\tpages = {866--887},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fengisalor%2Flmf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fengisalor%2Flmf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fengisalor%2Flmf/lists"}