{"id":20643885,"url":"https://github.com/adjust/kafka_fdw","last_synced_at":"2025-08-15T14:32:01.900Z","repository":{"id":26555818,"uuid":"107690710","full_name":"adjust/kafka_fdw","owner":"adjust","description":"kafka foreign database wrapper for postresql","archived":false,"fork":false,"pushed_at":"2024-06-17T13:20:28.000Z","size":946,"stargazers_count":104,"open_issues_count":6,"forks_count":19,"subscribers_count":67,"default_branch":"master","last_synced_at":"2024-12-08T04:22:21.977Z","etag":null,"topics":["adjust-pg-extension"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"postgresql","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adjust.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-20T14:51:35.000Z","updated_at":"2024-11-28T16:33:36.000Z","dependencies_parsed_at":"2023-11-22T13:39:12.665Z","dependency_job_id":"42585557-9e38-4af1-ac15-e412aeda3502","html_url":"https://github.com/adjust/kafka_fdw","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adjust%2Fkafka_fdw","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adjust%2Fkafka_fdw/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adjust%2Fkafka_fdw/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adjust%2Fkafka_fdw/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adjust","download_url":"https://codeload.github.com/adjust/ka
fka_fdw/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229920918,"owners_count":18144863,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adjust-pg-extension"],"created_at":"2024-11-16T16:14:11.933Z","updated_at":"2024-12-16T07:07:28.170Z","avatar_url":"https://github.com/adjust.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kafka Foreign Data Wrapper for PostgreSQL\n\n[![build](https://github.com/adjust/kafka_fdw/actions/workflows/ci_dockerfile.yml/badge.svg)](https://github.com/adjust/kafka_fdw/actions/workflows/ci_dockerfile.yml)\n\nAt this point the project is not yet production ready.\nUse with care. Pull requests welcome\n\n\nA simple  foreign data wrapper for Kafka which allows it to be treated as\na table.\n\nCurrently kafka_fdw allows message parsing in csv and json format.\nMore might come in a future release.\n\n\n## Build\n\nThe FDW uses the librdkafka C client library. 
https://github.com/edenhill/librdkafka\nTo build against an installed librdkafka and PostgreSQL, run\n`make \u0026\u0026 make install`\n\nTo run the tests:\n\n`make installcheck`\n\nNote that this runs an integration test against a Kafka instance assumed to be running\non localhost:9092, with ZooKeeper on localhost:2181;\nsee `test/init_kafka.sh`.\n\n\n## Usage\n\nCREATE SERVER must specify a broker list using the option `brokers`:\n```SQL\nCREATE SERVER kafka_server\nFOREIGN DATA WRAPPER kafka_fdw\nOPTIONS (brokers 'localhost:9092');\n```\n\nCREATE USER MAPPING\n```SQL\nCREATE USER MAPPING FOR PUBLIC SERVER kafka_server;\n```\n\nCREATE FOREIGN TABLE\nmust specify the two meta columns for partition and offset.\nThese can be named arbitrarily; which column plays which role is specified using column options.\nNote that offset is a reserved SQL keyword, so a column named `offset` must be quoted\nwhen used.\nThe remaining columns must match the expected CSV message format.\nFor more usage options see test/expected.\n\n```\nCREATE FOREIGN TABLE kafka_test (\n    part int OPTIONS (partition 'true'),\n    offs bigint OPTIONS (offset 'true'),\n    some_int int,\n    some_text text,\n    some_date date,\n    some_time timestamp\n)\nSERVER kafka_server OPTIONS\n    (format 'csv', topic 'contrib_regress', batch_size '30', buffer_delay '100');\n```\n\nThe offset and partition columns are special.  Due to the way Kafka works, we _should_\nspecify these on all queries.\n\n\n## Notes on Supported Formats\n\n### CSV\n\nCSV, like a PostgreSQL relation, represents data as a series of tuples.  In this respect\nthe mapping is fairly straightforward.  We use position to map to columns.  What CSV lacks,\nhowever, is any sort of schema enforcement between rows to ensure that all values of a\nparticular column have the same data type, along with the other schema checks we expect from a relational\ndatabase.  For this reason, it is important to ask how much one trusts the schema enforcement\nof the writers.  
If the schema enforcement is trusted then you can assume that bad data should\nthrow an error.  But if it is not, then the error handling options documented here should be\nused to enforce schema on read and to skip, but flag, malformed rows.\n\nOn the one hand, you can use `strict 'true'` if the format will never change and you fully trust\nthe writer to properly enforce schemas.  If you trust the writer to always be correct but want to allow\nnew columns to be added on to the end, however, you should leave this setting off.\n\nIf you do not trust the writer and wish to enforce schema on read only, then set a column with\nthe option `junk 'true'` and another with the option `junk_error 'true'`.\n\n### JSON\n\nJSON has many of the same schema validation issues that CSV does, but there are tools and standards\nto validate and check JSON documents against schema specifications.  Thus the same error handling\nrecommendations that apply to CSV above apply here.\n\nMapping JSON fields to the relation fields is somewhat less straightforward than it is with CSV.  JSON\nobjects represent key/value mappings in an arbitrary order.  For JSON we apply a mapping of the\ntuple attribute name to the JSON object key name.  
For JSON tables one uses the `json` column option to specify\nthe JSON property the column is mapped to.\n\nThe example in our test script is:\n\n```\nCREATE FOREIGN TABLE kafka_test_json (\n    part int OPTIONS (partition 'true'),\n    offs bigint OPTIONS (offset 'true'),\n    some_int int OPTIONS (json 'int_val'),\n    some_text text OPTIONS (json 'text_val'),\n    some_date date OPTIONS (json 'date_val'),\n    some_time timestamp OPTIONS (json 'time_val')\n)\n\nSERVER kafka_server OPTIONS\n    (format 'json', topic 'contrib_regress_json', batch_size '30', buffer_delay '100');\n```\n\nHere you can see that a message on partition 2, with an offset of 53, containing the document:\n\n```\n{\n   \"text_val\": \"Some arbitrary text, apparently\",\n   \"date_val\": \"2011-05-04\",\n   \"int_val\": 3,\n   \"time_val\": \"2011-04-14 22:22:22\"\n}\n```\n\nwould be turned into\n\n(2, 53, 3, \"Some arbitrary text, apparently\", 2011-05-04, \"2011-04-14 22:22:22\")\n\nas a row in the above table.\n\nCurrently the Kafka FDW does not support a series of JSON arrays, only JSON objects.  JSON arrays\nin objects can be presented as text or JSON/JSONB fields, however.\n\n\n## Querying\n\nWith the defined meta columns you can query like so:\n\n```\nSELECT * FROM kafka_test WHERE part = 0 AND offs \u003e 1000 LIMIT 60;\n```\n\nHere `offs` is the offset column. If no offset is specified, it defaults to the beginning of the partition.\nIf no partition is specified, all partitions will be scanned.\n\nQuerying across partitions is possible as well.\n\n```\nSELECT * FROM kafka_test WHERE (part = 0 AND offs \u003e 100) OR (part = 1 AND offs \u003e 300) OR (part = 3 AND offs \u003e 700);\n```\n\n## Error handling\n\nBy default, consuming Kafka data is not very strict, i.e. too few columns\nwill be assumed to be NULL and too many will be ignored.\nIf you don't like this behaviour you can enable strictness via the table option\n`strict 'true'`. Then any message with too few or too many columns will cause the query to fail.\nHowever, invalid or unparsable data e.g. 
text for numeric data or an invalid date\nor such will still error out by default. To ignore such data you can pass\n`ignore_junk 'true'` as a table option and these columns will be set to NULL.\nAlternatively you can add table columns with the attributes\n`junk 'true'` and / or `junk_error 'true'`. While fetching data kafka_fdw\nwill then put the whole payload into the junk column and / or the error message(s)\ninto the junk_error column.\nSee test/sql/junk_test.sql for a usage example.\n\n\n## Producing\n\nInserting data into Kafka works with INSERT statements. If you provide the partition\nas a value, that partition will be used; otherwise Kafka's built-in partitioner will select the partition.\n\n\nAdd the partition as a value:\n\n```\nINSERT INTO kafka_test(part, some_int, some_text)\nVALUES\n    (0, 5464565, 'some text goes into partition 0'),\n    (1, 5464565, 'some text goes into partition 1'),\n    (0, 5464565, 'some text goes into partition 0'),\n    (3, 5464565, 'some text goes into partition 3'),\n    (NULL, 5464565, 'some text goes into partition selected by kafka');\n```\n\nUse the built-in partitioner:\n\n```\nINSERT INTO kafka_test(some_int, some_text)\nVALUES\n    (5464565, 'some text goes into partition selected by kafka');\n```\n\n### Testing\n\nTesting is currently broken: I can't manage to get a properly repeatable topic setup.\n\n### Development\n\nAlthough it works when used properly, we need far more error handling.\nBasically, more tests are needed for inappropriate usage such as\nno topic specified, topic doesn't exist, no partition and offset column defined,\nwrong format specification, and whatever else might come up.\n\n### Future\n\nThe idea is to make the FDW more flexible in usage:\n\n* specify other formats like protobuf or binary\n\n* specify encoding\n\n* optimize performance with check_selective_binary_conversion\n    i.e. 
when just a single column is projected, like\n        SELECT one_col FROM foreign_table WHERE ...\n    we won't need to take the effort to convert all columns\n\n* better cost and row estimates\n\n* some analyze options would be nice\n\n* parallelism\n    with multiple partitions we could theoretically consume them\n    in parallel\n....\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadjust%2Fkafka_fdw","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadjust%2Fkafka_fdw","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadjust%2Fkafka_fdw/lists"}