{"id":31753705,"url":"https://github.com/servicenow/sygra","last_synced_at":"2025-10-09T17:53:43.451Z","repository":{"id":313366661,"uuid":"1039877163","full_name":"ServiceNow/SyGra","owner":"ServiceNow","description":"SyGra - Graph-oriented Synthetic data generation Pipeline","archived":false,"fork":false,"pushed_at":"2025-09-30T10:08:40.000Z","size":7015,"stargazers_count":22,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-10-04T11:56:53.514Z","etag":null,"topics":["ai","dpo","image-datasets","llm-datasets","llm-framework","llm-training-data","low-code-no-code","multimodality","open-source","python","sft-data","synthetic-data","synthetic-dataset-generation"],"latest_commit_sha":null,"homepage":"https://servicenow.github.io/SyGra/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ServiceNow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-18T05:53:57.000Z","updated_at":"2025-10-04T06:57:27.000Z","dependencies_parsed_at":"2025-09-18T15:57:24.404Z","dependency_job_id":"33914ef8-93ff-4797-841a-aa219b2de8b5","html_url":"https://github.com/ServiceNow/SyGra","commit_stats":null,"previous_names":["servicenow/grasp","servicenow/sygra"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ServiceNow/SyGra","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/ServiceNow%2FSyGra","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FSyGra/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FSyGra/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FSyGra/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ServiceNow","download_url":"https://codeload.github.com/ServiceNow/SyGra/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FSyGra/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001804,"owners_count":26083197,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","dpo","image-datasets","llm-datasets","llm-framework","llm-training-data","low-code-no-code","multimodality","open-source","python","sft-data","synthetic-data","synthetic-dataset-generation"],"created_at":"2025-10-09T17:53:39.876Z","updated_at":"2025-10-09T17:53:43.438Z","avatar_url":"https://github.com/ServiceNow.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg width=30% 
src=\"https://raw.githubusercontent.com/ServiceNow/SyGra/refs/heads/main/docs/resources/images/sygra_logo.png\"\u003e\n\n  \u003ch1\u003eSyGra: Graph-oriented Synthetic data generation Pipeline\u003c/h1\u003e\n\n\u003ca href=\"https://pypi.org/project/sygra/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/sygra.svg?logo=pypi\u0026color=orange\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/ServiceNow/SyGra/actions/workflows/ci.yml\"\u003e\n    \u003cimg alt=\"CI\" src=\"https://github.com/ServiceNow/SyGra/actions/workflows/ci.yml/badge.svg\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/ServiceNow/SyGra/releases\"\u003e\n    \u003cimg alt=\"Releases\" src=\"https://img.shields.io/github/v/release/ServiceNow/SyGra?logo=bookstack\u0026logoColor=white\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://servicenow.github.io/SyGra\"\u003e\n    \u003cimg alt=\"Documentation\" src=\"https://img.shields.io/badge/MkDocs-Documentation-green.svg\"/\u003e\u003c/a\u003e\n\u003ca href=\"http://arxiv.org/abs/2508.15432\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/arXiv-2508.15432-B31B1B.svg\" alt=\"arXiv\"\u003e\u003c/a\u003e\n\u003ca href=\"LICENSE\"\u003e\n    \u003cimg alt=\"Licence\" src=\"https://img.shields.io/badge/License-Apache%202.0-blue.svg\"/\u003e\u003c/a\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\u003c/div\u003e\n\n\nFramework to easily generate complex synthetic data pipelines by visualizing and configuring the pipeline as a\ncomputational graph. [LangGraph](https://python.langchain.com/docs/langgraph/) is used as the underlying graph\nconfiguration/execution library. Refer\nto [LangGraph examples](https://github.com/langchain-ai/langgraph/tree/main/examples) to get a sense of the different\nkinds of computational graph which can be configured.\n\u003cbr\u003e\n\u003cbr\u003e\n\n## Introduction\n\nSyGra Framework is created to generate synthetic data. 
Because defining the flow is a complex process, this design simplifies synthetic data generation. The SyGra platform supports the following:\n- Defining the seed data configuration\n- Defining a task, which involves graph node configuration, the flow between nodes, and the conditions between nodes\n- Defining the output location where the generated data is written\n\nSeed data can be pulled from either Hugging Face or the file system. Once the seed data is loaded, the SyGra platform lets data generation users write any data processing using the data transformation module. When the data is ready, users can define the data flow with various types of nodes. A node can also be a subgraph defined in another YAML file.\n\nEach node can be defined with preprocessing, postprocessing, and an LLM prompt with model parameters. Prompts can use seed data as Python template keys.  \nEdges define the flow between nodes, which can be conditional or non-conditional, with support for parallel and one-to-many flows.\n\nAt the end, the generated data for each record is collected in the graph state and processed further to produce the final dictionary, which is written to the configured data sink.\n\n![SygraFramework](https://raw.githubusercontent.com/ServiceNow/SyGra/refs/heads/main/docs/resources/images/sygra_architecture.png)\n\n---\n\n# Installation\n\nPick how you want to use **SyGra**:\n\n\u003cdiv align=\"center\"\u003e\n\n\u003ca href=\"https://servicenow.github.io/SyGra/installation/\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Use%20as-Framework-4F46E5?style=for-the-badge\" alt=\"Install as Framework\"\u003e\n\u003c/a\u003e\n\u0026nbsp;\u0026nbsp;\n\u003ca href=\"https://servicenow.github.io/SyGra/sygra_library/\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Use%20as-Library-10B981?style=for-the-badge\" alt=\"Install as Library\"\u003e\n\u003c/a\u003e\n\n\u003c/div\u003e\n\n### Which one should I choose?\n- **Framework** → Run end-to-end pipelines from YAML graphs + CLI tooling and 
project scaffolding.\n  (Start here: **[`Installation`](https://servicenow.github.io/SyGra/installation/)**)\n\n- **Library** → Import SyGra in your own Python app/notebook; call APIs directly.\n  (Start here: **[`SyGra Library`](https://servicenow.github.io/SyGra/sygra_library/)**)\n\n![Note](https://img.shields.io/badge/Note-important-yellow)  \n\u003e Before running the commands below, make sure to add your model configuration in `config/models.yaml` and set environment variables for credentials and chat templates as described in the [Model Configuration](https://servicenow.github.io/SyGra/getting_started/model_configuration/) docs.\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cstrong\u003eTL;DR – Framework Setup\u003c/strong\u003e\u003c/summary\u003e\n\nSee full steps in \u003ca href=\"https://servicenow.github.io/SyGra/installation/\"\u003eInstallation\u003c/a\u003e.\n\n```bash\ngit clone git@github.com:ServiceNow/SyGra.git\n\ncd SyGra\npoetry run python main.py --task examples.glaive_code_assistant --num_records=1\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cstrong\u003eTL;DR – Library Setup\u003c/strong\u003e\u003c/summary\u003e\n\nSee full steps in \u003ca href=\"https://servicenow.github.io/SyGra/sygra_library/\"\u003eSygra Library\u003c/a\u003e.\n\n```bash\npip install sygra   \n```\n\n```python\nimport sygra\n\nworkflow = sygra.Workflow(\"tasks/examples/glaive_code_assistant\")\nworkflow.run(num_records=1)\n```\n\u003c/details\u003e\n\n### Quick Start\n\u003e To get started with SyGra, please refer to some **[Example Tasks](https://github.com/ServiceNow/SyGra/tree/main/tasks/examples)** or **[SyGra Documentation](https://servicenow.github.io/SyGra/)**\n\n---\n\n\n## Components\nThe SyGra architecture is composed of multiple components. The following diagrams illustrate the four primary components and their associated modules.\n\n### Data Handler\nData handler is used for reading and writing the data. 
Currently, it supports a file handler for various file types and a Hugging Face handler.\nWhen reading from Hugging Face, it can either load the whole dataset and process it, or stream the data in chunks.\n\n\u003ckbd\u003e ![DataHandler](https://raw.githubusercontent.com/ServiceNow/SyGra/refs/heads/main/docs/resources/images/component_data_handler.png) \u003c/kbd\u003e\n\n### Graph Node Module\nThis module is responsible for building various kinds of nodes, such as the LLM node, Multi-LLM node, Lambda node, and Agent node.\nEach node type serves a specific purpose; for example, the multi-LLM node load-balances the data across multiple inference endpoints running the same model.\n\n\u003ckbd\u003e ![Nodes](https://raw.githubusercontent.com/ServiceNow/SyGra/refs/heads/main/docs/resources/images/component_nodes.png) \u003c/kbd\u003e\n\n### Graph Edge Connection\nOnce the nodes are built, we can connect them with simple edges or conditional edges.\nA conditional edge uses Python code to decide the path. Conditional edges help implement if-else flows as well as loops in the graph.\n\n\u003ckbd\u003e ![Edges](https://raw.githubusercontent.com/ServiceNow/SyGra/refs/heads/main/docs/resources/images/component_edges.png) \u003c/kbd\u003e\n\n### Model clients\nSyGra doesn't run inference within the framework, but it supports various clients that connect to different kinds of servers.\nFor example, the OpenAI client works with Hugging Face TGI, vLLM servers and Azure services. However, the client cannot be changed through the model configuration; it can be changed in the model code.\n\n\u003ckbd\u003e ![ModelClient](https://raw.githubusercontent.com/ServiceNow/SyGra/refs/heads/main/docs/resources/images/component_model_client.png) \u003c/kbd\u003e\n\n## Task Components\n\nSyGra supports extensibility and ease of implementation: most tasks are defined as graph configuration YAML files. 
Each task consists of two major components: a graph configuration and Python code to define conditions and processors.\nThe YAML file contains the following parts:\n\n- **Data configuration** : Configure a file or Hugging Face dataset as the source and sink for the task.\n- **Data transformation** : Configuration to transform the data into a format that can be used in the graph.\n- **Node configuration** : Configure nodes and their properties, preprocessors and postprocessors.\n- **Edge configuration** : Connect the nodes configured above with or without conditions. \n- **Output configuration** : Configuration for data transformation before writing the data to the sink.\n\nA node is defined by the node module, supporting types like LLM call, multiple LLM call, lambda node, and sampler node.  \n\nLLM-based nodes require a model configured in `models.yaml` and runtime parameters. Sampler nodes pick random samples from static YAML lists. For custom node types, you can implement new nodes in the platform.\n\nAs of now, LLM inference is supported for TGI, vLLM, OpenAI, Azure, Azure OpenAI, Ollama and Triton compatible servers. Model deployment is external and configured in `models.yaml`.\n\n\u003c!-- ![SygraComponents](https://raw.githubusercontent.com/ServiceNow/SyGra/refs/heads/main/docs/resources/images/sygra_usecase2framework.png) --\u003e\n\n\n## Contact\n\nTo contact us, please send us an [email](mailto:sygra_team@servicenow.com)!\n\n## License\n\nThe package is licensed by ServiceNow, Inc. under the Apache 2.0 license. See [LICENSE](LICENSE) for more details.\n\n---\n\n**Questions?**  \nOpen an [issue](https://github.com/ServiceNow/SyGra/issues) or start a [discussion](https://github.com/ServiceNow/SyGra/discussions)! 
Contributions are welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fservicenow%2Fsygra","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fservicenow%2Fsygra","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fservicenow%2Fsygra/lists"}