{"id":13562536,"url":"https://github.com/bytedance/bitsail","last_synced_at":"2025-05-15T11:06:38.678Z","repository":{"id":62143124,"uuid":"543010228","full_name":"bytedance/bitsail","owner":"bytedance","description":"BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.","archived":false,"fork":false,"pushed_at":"2024-01-01T15:59:00.000Z","size":27729,"stargazers_count":1654,"open_issues_count":111,"forks_count":331,"subscribers_count":62,"default_branch":"master","last_synced_at":"2025-04-14T19:57:02.721Z","etag":null,"topics":["big-data","data-integration","data-lake","data-pipeline","data-synchronization","flink","high-performance","real-time"],"latest_commit_sha":null,"homepage":"https://bytedance.github.io/bitsail/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytedance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-09-29T08:39:36.000Z","updated_at":"2025-04-11T04:29:43.000Z","dependencies_parsed_at":"2024-01-01T16:44:40.821Z","dependency_job_id":null,"html_url":"https://github.com/bytedance/bitsail","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2Fbitsail","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2Fbitsail/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2F
bitsail/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2Fbitsail/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytedance","download_url":"https://codeload.github.com/bytedance/bitsail/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254328385,"owners_count":22052632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-integration","data-lake","data-pipeline","data-synchronization","flink","high-performance","real-time"],"created_at":"2024-08-01T13:01:09.689Z","updated_at":"2025-05-15T11:06:38.659Z","avatar_url":"https://github.com/bytedance.png","language":"Java","readme":"\u003c!--\n\nCopyright 2022-2023 Bytedance Ltd. 
and/or its affiliates.\n         \nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n--\u003e\n\n![logo](website/images/bitsail_logo.png)\n\nEnglish | [简体中文](README_zh.md)\n\n[![Build](https://github.com/bytedance/bitsail/actions/workflows/cicd.yml/badge.svg)](https://github.com/bytedance/bitsail/actions/workflows/cicd.yml)\n[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)\n[![Join Slack](https://img.shields.io/badge/slack-%23BitSail-72eff8?logo=slack\u0026color=5DADE2\u0026label=Join%20Slack)](https://join.slack.com/t/bitsailworkspace/shared_invite/zt-1l1vgcnlj-gPSWqggOeRHrSO5l7na2WQ)\n[![Website](https://img.shields.io/badge/Website-%23BitSail-blue)](https://bytedance.github.io/bitsail/)\n## Introduction\nBitSail is ByteDance's open-source data integration engine, built on a distributed architecture and designed for high performance. It supports data synchronization between multiple heterogeneous data sources, and provides global data integration solutions in batch, streaming, and incremental scenarios. At present, it serves almost all business lines in ByteDance, such as Douyin, Toutiao, etc., and synchronizes hundreds of trillions of data every day.\n\nOfficial website of BitSail: https://bytedance.github.io/bitsail/\n\n## Why Do We Use BitSail\nBitSail has been widely used and supports traffic at the scale of hundreds of trillions. 
At the same time, it has been verified in various scenarios, such as the cloud-native environment of Volcano Engine and on-premises private cloud environments.\n\nWe have accumulated a lot of experience and made a number of optimizations to improve data integration functionality:\n\n- Global Data Integration, covering batch, streaming and incremental scenarios\n\n- Distributed and cloud-native architecture, supporting horizontal scaling\n\n- High maturity in terms of accuracy, stability and performance\n\n- Rich basic functions, such as type conversion, dirty data processing, flow control, data lake integration, automatic parallelism calculation, etc.\n\n- Task running status monitoring, such as traffic, QPS, dirty data, latency, etc.\n\n## BitSail Use Scenarios\n- Mass data synchronization between heterogeneous data sources\n\n- Integrated streaming and batch data processing\n\n- Integrated data lake and warehouse data processing\n\n- High-performance, high-reliability data synchronization\n\n- Distributed, cloud-native data integration engine\n\n## Features of BitSail\n\n- Low start-up cost and high flexibility\n\n- Stream-batch integration and data lake-warehouse integration architecture; one framework covers almost all data synchronization scenarios\n\n- High-performance, massive data processing capabilities\n\n- Automatic DDL synchronization\n\n- Type system for conversion between different data source types\n\n- Engine-independent reading and writing interfaces, with low development cost\n\n- Real-time display of task progress (under development)\n\n- Real-time monitoring of task status\n\n## Architecture of BitSail\n ![](website/images/bitsail_arch.png)\n\n ```\n Source[Input Sources] -\u003e Framework[Data Transmission] -\u003e Sink[Output Sinks]\n ```\nThe data processing pipeline is as follows. 
First, pull the source data through Input Sources, then process it through the intermediate framework layer, and finally write the data to the target through Output Sinks.\n\nAt the framework layer, we provide rich functions that take effect for all synchronization scenarios, such as dirty data collection, automatic parallelism calculation, task monitoring, etc.\n\nIn terms of data synchronization scenarios, it covers batch, streaming, and incremental data synchronization.\n\nAt the runtime layer, it supports multiple execution modes, such as YARN and local; Kubernetes (k8s) support is under development.\n\n## Supported Connectors\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eDataSource\u003c/th\u003e\n    \u003cth\u003eSub Modules\u003c/th\u003e\n    \u003cth\u003eReader\u003c/th\u003e\n    \u003cth\u003eWriter\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAssert\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eClickHouse\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eDoris\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eDruid\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eElasticsearch\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eFake\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    
\u003ctd\u003eFTP/SFTP\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eHadoop\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eHBase\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eHive\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eHudi\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eLocalFileSystem\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd rowspan=\"4\"\u003eJDBC\u003c/td\u003e\n    \u003ctd\u003eMySQL\u003c/td\u003e\n    \u003ctd rowspan=\"4\"\u003e✅\u003c/td\u003e\n    \u003ctd rowspan=\"4\"\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eOracle\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePostgreSQL\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eSqlServer\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eKafka\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eKudu\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    
\u003ctd\u003eLarkSheet\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMongoDB\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePrint\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eRedis\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eRocketMQ\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eSelectDB\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e \u003c/td\u003e\n    \u003ctd\u003e✅\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nDocumentation for [Connectors](website/en/documents/connectors/README.md).\n\n## Community Support\n### Slack\nJoin the BitSail Slack channel via this [link](https://join.slack.com/t/bitsailworkspace/shared_invite/zt-1l1vgcnlj-gPSWqggOeRHrSO5l7na2WQ).\n\n### Mailing List\nCurrently, the BitSail community uses Google Groups as the mailing list provider.\nYou need to subscribe to the mailing list before starting a conversation.\n\nSubscribe: Send an email to `bitsail+subscribe@googlegroups.com`\n\nStart a conversation: Send an email to `bitsail@googlegroups.com`\n\nUnsubscribe: Send an email to `bitsail+unsubscribe@googlegroups.com`\n\n### WeChat Group\nScan this QR code to join the WeChat group chat.\n\n\u003cimg src=\"website/images/wechat_QR.png\" alt=\"qr\" width=\"100\"/\u003e\n\n## Environment Setup\nLink to [Environment 
Setup](website/en/documents/start/env_setup.md).\n\n## Deployment Guide\nLink to [Deployment Guide](website/en/documents/start/deployment.md).\n\n## BitSail Configuration\nLink to [Configuration Guide](website/en/documents/start/config.md).\n\n## Contributing Guide\nLink to [Contributing Guide](website/en/community/contribute.md).\n\n## Contributors\n**Thanks all contributors**\u003cbr\u003e\n\n\u003ca href=\"https://github.com/bytedance/bitsail/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=bytedance/bitsail\" /\u003e\n\u003c/a\u003e\n\n## License\n[Apache 2.0 License](LICENSE).\n\n","funding_links":[],"categories":["Java","大数据","\u003ca name=\"Java\"\u003e\u003c/a\u003eJava"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fbitsail","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytedance%2Fbitsail","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fbitsail/lists"}