{"id":13576007,"url":"https://github.com/pinterest/singer","last_synced_at":"2025-04-05T05:30:33.981Z","repository":{"id":36398609,"uuid":"190267909","full_name":"pinterest/singer","owner":"pinterest","description":"A high-performance, reliable and extensible logging agent for uploading data to Kafka, Pulsar, etc.","archived":false,"fork":false,"pushed_at":"2024-08-27T18:31:35.000Z","size":1194,"stargazers_count":180,"open_issues_count":256,"forks_count":35,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-08-27T20:32:15.159Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pinterest.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-04T19:39:34.000Z","updated_at":"2024-08-27T18:31:29.000Z","dependencies_parsed_at":"2024-01-09T23:25:35.019Z","dependency_job_id":"3c2d5166-e2b5-4e09-bd78-7c3208596a68","html_url":"https://github.com/pinterest/singer","commit_stats":null,"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fsinger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fsinger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fsinger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fsinger/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/pinterest","download_url":"https://codeload.github.com/pinterest/singer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294011,"owners_count":20915329,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:01:06.246Z","updated_at":"2025-04-05T05:30:32.393Z","avatar_url":"https://github.com/pinterest.png","language":"Java","funding_links":[],"categories":["Java","日志库","Logging"],"sub_categories":[],"readme":"# \u003cimg src=\"docs/icons/icon-singer-sk-small.png\" alt=\"Singer logo\" width=\"22\" align=\"bottom\"\u003e \u0026nbsp; Singer\n\n## High-performance, reliable and extensible logging agent\nSinger is a high-performance logging agent for uploading logs to Kafka. \nIt can also be extended to support writing to other message transporters or storage systems. \n\nSinger runs as a standalone process on the service boxes. It monitors the log directories \nby listening to file system events, and uploads data as soon as it detects new data.\nSinger guarantees at-least-once delivery of log messages.\n\n### Key Features: \n\n- **Support thrift log format and text log format out of the box**: \nThrift log format provides better throughput and efficiency. We highly recommend you use thrift log format \nif your logs will not be consumed directly by humans. To facilitate thrift log format usage, we \nprovide a set of client libraries in Python, Java, and Go for converting text \nlog messages into JSON and thrift formats. 
\n\n- **At-least-once message delivery to Kafka**: Singer will retry when it fails to upload a batch of messages.\nFor each log stream, Singer uses a watermark file to track its progress. When Singer restarts, \nit processes messages from the watermark position.\n\n- **Support logging in Kubernetes as a side-car service**:\nRunning as a daemonset, Singer can monitor and upload logs from the log directories of multiple Kubernetes pods.\n\n- **High throughput writes to Kafka**:\nSinger uses multiple layers of thread pools to achieve maximum parallelism. \nUsing thrift log format, Singer can achieve \u003e100MB/second writing throughput to Kafka from one host.\nSinger can process text logs at 20MB/second. \n\n- **Low latency logging**: \nSinger supports configurable processing latency and batch sizes, and can achieve \u003c5ms log uploading latency. \n\n- **Flexible partitioning**:\nSinger provides multiple partitioners for writing data to Kafka, including locality-aware partitioners\nthat can avoid producer traffic across availability zones and reduce data transfer costs.\nSinger also supports custom partitioners. \n\n- **Heartbeat**:\nSinger supports sending heartbeats to a Kafka topic periodically based on the configuration.\nThis allows users to set up central monitoring of Singer instances across fleets. \n\n- **Write auditing**:\nSinger can write an audit message to another topic for each batch of messages that it writes\nto Kafka. This allows users to audit Singer's Kafka writes. \n\n- **Extensible design**: \nSinger can be easily extended to support data uploading to custom destinations. 
\n\n### Detailed design\n\nPlease see [docs/DESIGN.md](docs/DESIGN.md) for the Singer design.\n\n\n## Build\n\n#### Get Singer code\n\n```bash\ngit clone [git-repo-url] singer\ncd singer\n```\n\n#### Build Singer binary\n\n```bash\nmvn clean package -pl singer -am -DskipTests\n```\n\nAs there is no native support in the JDK for file system event monitoring on macOS, \nsome tests that run fine in the Linux environment may fail intermittently on macOS. \nPlease use the `-DskipTests` flag if you want to build Singer on macOS. \n\n#### Build thrift-logger client library\n\n```bash\nmvn clean package -pl thrift-logger -am\n```\n\n#### Testing\n\nSinger has a set of unit tests that can be run through ```mvn test package -pl singer -am```.\n\nAn end-to-end integration test can be run through:\n\n```bash\nmvn clean package -pl singer -am \nsinger/src/main/scripts/run_singer_tests.sh\n```\n\n## Quick Start\n\nThe [tutorial](tutorial) directory contains a demo that shows how to run Singer. Please see [tutorial/README.md](tutorial/README.md) for details.\n\n## Usage\n\n#### Use Singer client library to log data to local disk \n\nSinger uses `file inode + offset` as the watermark position to track its progress, \nand writes the watermark info to disk after it writes a batch of messages to Kafka.\nIt resumes from the last watermark position after restarting. \nBecause of this, Singer requires that a log stream is a sequence of append-only log files \nthat uses **file renaming** for log rotation.\n\n**Singer does not handle log streams that use file copy and truncation for log rotation**,\nbecause Singer cannot use `file inode + offset` to uniquely identify log messages\nwhen a log file is copied and truncated.  
\n\n\nFor example, we have before rotation:\n\n ```\n ls -li \n   1001    service.log      # service.log with inode 1001\n ```\n\nAfter rotation:\n\n```\n ls -li \n \n   1001   service.log.2018-11-30   # service.log.2018-11-30 with inode 1001 (was renamed from the old service.log)\n   1002   service.log              # (this was newly generated service.log)\n```\n\nFor data logged in plaintext format, you can configure Singer to upload those logs directly. \nSinger also supports high throughput logging using the thrift format. \nYou can write data to local disk using the `thrift-logger` library that Singer provides.\nCurrently Singer has thrift_logger libraries in Python, Java, Go, and C++. \n\nSamples on using thrift_logger libraries: \n - Java : [thrift_logger java sample](singer/src/test/java/com/pinterest/singer/e2e/LogWriter.java) \n - Python : [thrift_logger_python_sample](thrift-logger-python/tests/thrift_logger/test_thrift_logger_wrapper.py)\n\n#### Configure Singer to upload data from local disk to Kafka\n\nSinger uploads data based on configuration settings. \nSinger configuration is composed of two parts: 1) `singer.properties` that configures\nglobal Singer settings, e.g. size of thread pools, daily restart settings, \nheartbeat settings, etc. 2) log stream configuration: for each set of log streams, \nSinger needs one log stream configuration to define log stream related settings. \n\nPlease see [tutorial/etc/singer](tutorial/etc/singer) for sample Singer configurations. \n[docs/configuration_samples/sample_kubernetes](docs/configuration_samples/sample_kubernetes) has an example\nof a Singer configuration for Kubernetes. 
\n\n\n#### Run Singer\n\n```bash\njava -server  -cp $singer_home:$singer_home/lib/*:$singer_home/singer-$version.jar  \\\n     -Dlog4j.configuration=log4j.prod.properties -Dsinger.config.dir=$config_dir \\\n     com.pinterest.singer.SingerMain\n```\n\n#### Package Singer as a Debian package \n\n```bash\ntar xzvf singer-${VERSION}-bin.tar.gz --directory $SINGER_DIR\ncd $BUILD_DIR\n\nfpm -s dir -t deb -n singer -v $VERSION --deb-upstart ../singer.upstart  \\\n    --deb-default ../singer.default -- .\n```\n\n#### Singer Metrics\n\nSinger exposes metrics using the [Twitter Ostrich](https://github.com/twitter/ostrich) framework. \nSinger stats can be checked using the following command. Here `2047` is the Ostrich port that \nyou define in the `singer.ostrichPort` configuration.\n\n```bash\ncurl -s localhost:2047/stats.txt\n```\n\n## License\n\nSinger is distributed under [Apache License, Version 2.0](LICENSE).\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpinterest%2Fsinger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpinterest%2Fsinger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpinterest%2Fsinger/lists"}