{"id":29055715,"url":"https://github.com/crunchydata/pg_incremental","last_synced_at":"2026-04-09T09:25:15.407Z","repository":{"id":269030970,"uuid":"866063093","full_name":"CrunchyData/pg_incremental","owner":"CrunchyData","description":"Incremental Data Processing in PostgreSQL","archived":false,"fork":false,"pushed_at":"2025-12-12T09:52:01.000Z","size":67,"stargazers_count":223,"open_issues_count":3,"forks_count":9,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-03-27T22:57:12.342Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.crunchydata.com/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"postgresql","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CrunchyData.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-10-01T15:28:42.000Z","updated_at":"2026-03-16T10:50:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"a3b2e66d-c4d5-433e-8e1c-b2377712a3f2","html_url":"https://github.com/CrunchyData/pg_incremental","commit_stats":null,"previous_names":["crunchydata/pg_incremental"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/CrunchyData/pg_incremental","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrunchyData%2Fpg_incremental","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrunchyData%2Fpg_incremental/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrunchyData%2Fpg_incremental/releases",
"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrunchyData%2Fpg_incremental/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CrunchyData","download_url":"https://codeload.github.com/CrunchyData/pg_incremental/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrunchyData%2Fpg_incremental/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31290944,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-27T04:30:26.368Z","updated_at":"2026-04-09T09:25:15.384Z","avatar_url":"https://github.com/CrunchyData.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pg\\_incremental: Incremental Data Processing in PostgreSQL\n\npg\\_incremental is a simple extension that helps you do fast, reliable, incremental batch processing in PostgreSQL.\n\nWhen storing an append-only stream of event data in PostgreSQL (e.g. IoT, time series), a common challenge is to process only the new data. For instance, you might want to create one or more summary tables containing pre-aggregated data, and insert or update aggregates as new data arrives. 
However, you cannot really know the data that is still being inserted by concurrent transactions, and immediately aggregating data when inserting (e.g. via triggers) is certain to create a concurrency bottleneck. You also want to make sure that all new events are processed successfully exactly once, even when queries fail.\n\nA similar challenge exists with data coming in from cloud storage systems. New files show up in object storage, and they need to get processed or loaded into a table, exactly once.\n\nWith pg\\_incremental, you define a pipeline with a parameterized query. The pipeline is executed for all existing data when created, and then periodically executed. If there is new data, the query is executed with parameter values that correspond to the new data. Depending on the type of pipeline, the parameters could reflect a new range of sequence values, a new time range, or a new file.\n\n```sql\n-- Periodically aggregate rows inserted into the events table into an events_agg table\nselect incremental.create_sequence_pipeline('event-aggregation', 'events', $$\n  insert into events_agg\n  select date_trunc('day', event_time), count(*)\n  from events\n  where event_id between $1 and $2\n  group by 1\n  on conflict (day) do update set event_count = events_agg.event_count + excluded.event_count;\n$$);\n```\n\nThe internal progress tracking is done in the same transaction as the command, which ensures exactly once delivery.\n\nWhile there are much more sophisticated approaches to this problem like incremental materialized views or logical decoding-based solutions, they come with many limitations and a lack of flexibility. 
We felt the need for a simple, fire-and-forget tool that gets the job done without a lot of boilerplate.\n\n## Build and install\n\npg\\_incremental depends on [pg\\_cron](https://github.com/citusdata/pg_cron), which needs to be installed first.\n\nTo build and install pg\\_incremental from source:\n\n```bash\ngit clone https://github.com/crunchydata/pg_incremental.git\ncd pg_incremental\n# Ensure pg_config is in your path, e.g.\nexport PATH=/usr/pgsql-17/bin:$PATH\nmake \u0026\u0026 sudo PATH=$PATH make install\n```\n\nOnce it is installed, you can create the extension in PostgreSQL:\n```sql\ncreate extension pg_incremental cascade;\n\n/* user needs pg_cron permission to create pipelines */\ngrant usage on schema cron to application;\n```\n\nYou can only create pg\\_incremental in the database that has pg\\_cron.\n\n## Running tests in Docker\n\nYou can run the full SQL regression suite (`installcheck`: `sequence`, `time_interval`, and `file_list`) without installing PostgreSQL or pg\\_cron on your machine. The repo includes a Docker Compose setup with one image per major PostgreSQL version: services `postgres-17` and `postgres-18`, each with pg\\_cron preloaded and the right `cron.database_name` for the test database.\n\n**Requirements:** [Docker](https://docs.docker.com/get-docker/) with Compose v2 (`docker compose`). The first run needs network access to pull base images and build dependencies.\n\nFrom the repository root:\n\n```bash\n./docker/run-tests.sh\n```\n\nBy default this runs `installcheck` against **PostgreSQL 17 and 18** in sequence (each build uses its own data volume). Set `PG_VERSIONS` to restrict or reorder majors, for example `PG_VERSIONS=18 ./docker/run-tests.sh` or `PG_VERSIONS=\"17 18\" ./docker/run-tests.sh`.\n\nThe test image builds [pg\\_cron](https://github.com/citusdata/pg_cron) from source (see `PG_CRON_REF` in `docker/Dockerfile`); the pinned tag is new enough to compile against PostgreSQL 18 as well as 17. 
The official PostgreSQL **18** Docker image uses a different data volume layout than 17; `docker/docker-compose.yml` mounts accordingly.\n\nFor each selected version, the script builds the image if needed, starts one container, copies the source tree to `/tmp` inside the container (the repo bind-mount is read-only, so your working tree is not modified), builds and installs the extension, runs `make installcheck`, then tears down that container and its data volume. Success ends with per-version `installcheck: OK` and a final line listing all versions that passed.\n\nTo exercise the same flow manually, see comments in `docker/docker-compose.yml` (the default mount is `:ro`, so prefer `run-tests.sh` over running `make` directly on `/work` in the container).\n\n## Creating incremental processing pipelines\n\nThere are 3 types of pipelines in pg\\_incremental:\n\n- **Sequence pipelines** - The pipeline query is executed for a range of sequence values, with a mechanism to ensure that no more new sequence values will fall in the range. These pipelines are most suitable for incrementally building summary tables.\n- **Time interval pipelines** - The pipeline query is executed for a time interval or range of time intervals, after the time interval has passed. These pipelines can be used for incrementally building summary tables or periodically exporting new data.\n- **File list pipelines** - The pipeline query is executed for a new file obtained from a file list function. These pipelines can be used to import new data.\n\nEach pipeline has a command with 1 or 2 parameters. The pipelines run periodically using [pg\\_cron](https://github.com/citusdata/pg_cron) (every minute, by default) and execute the command only if there is new data to process. 
However, each pipeline execution will appear in `cron.job_run_details` regardless of whether there is new data.\n\nWe describe each type of pipeline below.\n\n### Creating a sequence pipeline\n\nYou can define a sequence pipeline with the `incremental.create_sequence_pipeline` function by specifying a generic pipeline name, the name of a source table with a sequence (or an explicit sequence name), and a command. The command you pass will be executed in a context where `$1` and `$2` are set to the lowest and highest value of a range of sequence values that can be safely aggregated (bigint).\n\nExample:\n```sql\n-- create a source table\ncreate table events (\n  event_id bigint generated always as identity,\n  event_time timestamptz default now(),\n  client_id bigint,\n  path text,\n  response_time double precision\n);\n\n-- BRIN indexes are highly effective in selecting new ranges\ncreate index on events using brin (event_id);\n\n-- generate some random inserts\ninsert into events (client_id, path, response_time)\nselect s % 100, '/page-' || (s % 3), random() from generate_series(1,1000000) s;\n\n-- create a summary table to pre-aggregate the number of events per day\ncreate table events_agg (\n  day timestamptz,\n  event_count bigint,\n  primary key (day)\n);\n\n-- create a pipeline to aggregate new inserts from a postgres table using a sequence\n-- $1 and $2 will be set to the lowest and highest (inclusive) sequence values that can be aggregated\n\nselect incremental.create_sequence_pipeline('event-aggregation', 'events', $$\n  insert into events_agg\n  select date_trunc('day', event_time), count(*)\n  from events\n  where event_id between $1 and $2\n  group by 1\n  on conflict (day) do update set event_count = events_agg.event_count + excluded.event_count;\n$$);\n```\n\nWhen creating the pipeline, the command is executed immediately for all sequence values starting from 0. 
Immediate execution can be disabled by passing `execute_immediately := false`, in which case the first execution will happen as part of periodic job scheduling.\n\nThe pipeline execution ensures that the range of sequence values is known to be safe, meaning that there are no more transactions that might produce sequence values that are within the range. This is ensured by waiting for concurrent write transactions before proceeding with the command. The size of the range is effectively the number of inserts since the last time the pipeline was executed up to the moment that the new pipeline execution started. This technique was first introduced on the [Citus Data blog](https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/) by the author of this extension.\n\n#### Limiting batch size\n\nBy default, sequence pipelines process all available sequence values in a single execution. For scenarios where large batches of data are uploaded at once (e.g., daily bulk imports), you can use the `max_batch_size` parameter to limit how many sequence IDs are processed per execution:\n\n```sql\n-- Process at most 10,000 events per execution to avoid long-running transactions\nselect incremental.create_sequence_pipeline(\n  'event-aggregation',\n  'events',\n  $$\n    insert into events_agg\n    select date_trunc('day', event_time), count(*)\n    from events\n    where event_id between $1 and $2\n    group by 1\n    on conflict (day) do update set event_count = events_agg.event_count + excluded.event_count;\n  $$,\n  schedule := '* * * * *',     -- Run every minute\n  max_batch_size := 10000      -- Process max 10k rows per run\n);\n```\n\nWith `max_batch_size` set, if 100,000 new events arrive, the pipeline will process them in chunks of 10,000 over multiple executions rather than all at once. 
This helps to:\n- Avoid long-running transactions\n- Provide more predictable resource usage\n- Allow incremental progress on large data uploads\n\nThe benefit of sequence pipelines is that they can process the data in small incremental steps and are agnostic to where the timestamps used in aggregations came from (i.e. late data is fine). The downside is that you almost always have to merge aggregates using an ON CONFLICT clause, and there are situations where that is not possible (e.g. exact distinct counts).\n\nArguments of the `incremental.create_sequence_pipeline` function:\n\n| Argument name         | Type     | Description                                        | Default                      |\n| --------------------- | -------- | -------------------------------------------------- | ---------------------------- |\n| `pipeline_name`       | text     | Name of the pipeline that acts as an identifier    | Required                     |\n| `sequence_name`       | regclass | Name of a sequence or table with a sequence        | Required                     |\n| `command`             | text     | Pipeline command with $1 and $2 parameters         | Required                     |\n| `schedule`            | text     | pg\\_cron schedule for periodic execution (or NULL) | `* * * * *` (every minute)   |\n| `max_batch_size`      | bigint   | Maximum number of sequence IDs to process per run  | NULL (unlimited)             |\n| `execute_immediately` | bool     | Execute command immediately for existing data      | `true`                       |\n\n### Creating a time interval pipeline\n\nYou can define a time interval pipeline with the `incremental.create_time_interval_pipeline` function by specifying a generic pipeline name, an interval, and a command. 
The command will be executed in a context where `$1` and `$2` are set to the start and end (exclusive) of a range of time intervals that has passed (timestamptz).\n\nExample:\n```sql\n-- continuing with the data model from the previous section, but with a time range pipeline\n\n-- BRIN indexes are highly effective in selecting new ranges\ncreate index on events using brin (event_time);\n\n-- create a pipeline to aggregate new inserts using a 1 day interval\n-- $1 and $2 will be set to the start and end (exclusive) of a range of time intervals that can be aggregated\nselect incremental.create_time_interval_pipeline('event-aggregation', '1 day', $$\n  insert into events_agg\n  select event_time::date, count(distinct event_id)\n  from events\n  where event_time \u003e= $1 and event_time \u003c $2\n  group by 1\n$$);\n```\n\nWhen creating the pipeline, the command is executed immediately for the time starting from 2000-01-01 00:00:00 (configurable using the `start_time` argument). Immediate execution can be disabled by passing `execute_immediately := false`, in which case the first execution will happen as part of periodic job scheduling.\n\nThe command is executed after a time interval has passed. If the interval is 1 day, then the data for the previous day is typically processed at 00:01:00 (delay is configurable). If the query fails multiple times, the range may expand to cover multiple unprocessed intervals, except when using `batched := false`.\n\nWhen using `batched := false`, the command is executed separately for each time interval. This can be useful to periodically export a specific time interval. 
It's important to pick a `start_time` that's close to the lowest timestamp in the data to avoid executing the command many times redundantly for intervals that have passed but have no data.\n\n```sql\n-- define an export function that wraps a COPY command\ncreate function export_events(start_time timestamptz, end_time timestamptz)\nreturns void language plpgsql as $function$\ndeclare\n  path text := format('/tmp/export/%s.csv', start_time::date);\nbegin\n  -- COPY cannot reference PL/pgSQL variables, so the bounds are inlined as quoted literals\n  execute format($$copy (select * from events where event_time \u003e= %L and event_time \u003c %L) to %L$$, start_time, end_time, path);\nend;\n$function$;\n\n-- Export events daily to a CSV file, starting from 2024-11-01\n-- The command is executed separately for each interval\nselect incremental.create_time_interval_pipeline('event-export',\n  time_interval := '1 day',\n  batched := false,\n  start_time := '2024-11-01',\n  command := $$ select export_events($1, $2) $$\n);\n```\n\nThe pipeline execution logic can also ensure that the range of time intervals is safe, _if the timestamp is generated by the database using now() and assuming no large clock jumps_ (usually safe in cloud environments). In that case, the caller should set the `source_table_name` argument to the name of the source table. The pipeline execution will then wait for concurrent writers to finish before executing the command.\n\n```sql\n-- create a pipeline to aggregate new inserts using a 1 day interval\n-- also ensure that there are no uncommitted event_time values in the range by specifying source_table_name\nselect incremental.create_time_interval_pipeline('event-aggregation',\n  time_interval := '1 day',\n  source_table_name := 'events',\n  command := $$\n    ...\n  $$);\n```\n\nThe benefit of time interval pipelines is that they are easier to define and can do more complex processing, such as exact distinct counts. They are also more suitable for exporting data, because the command always processes exact time ranges. 
The downside is that you need to wait until after a time interval passes to see results, and inserting old timestamps may cause data to be skipped. Sequence pipelines are more reliable in that sense because the values are always generated by the database.\n\nArguments of the `incremental.create_time_interval_pipeline` function:\n\n| Argument name         | Type        | Description                                        | Default                    |\n| --------------------- | ----------- | -------------------------------------------------- | -------------------------- |\n| `pipeline_name`       | text        | User-defined name of the pipeline                  | Required                   |\n| `time_interval`       | interval    | At which interval to execute the pipeline          | Required                   |\n| `command`             | text        | Pipeline command with $1 and $2 parameters         | Required                   |\n| `batched`             | bool        | Whether to run the command for multiple intervals  | `true`                     |\n| `start_time`          | timestamptz | Time from which the intervals start                | `2000-01-01 00:00:00`      |\n| `source_table_name`   | regclass    | Wait for lockers of this table before aggregation  | NULL (no waiting)          |\n| `schedule`            | text        | pg\\_cron schedule for periodic execution (or NULL) | `* * * * *` (every minute) |\n| `min_delay`           | interval    | How long to wait to process a past interval        | `30 seconds`               |\n| `execute_immediately` | bool        | Execute command immediately for existing data      | `true`                     |\n\n### Creating a file list pipeline\n\nUpgrading from extension version **1.4** to **1.5** runs `pg_incremental--1.4--1.5.sql`: it refreshes `_drop_extension_trigger` (including the `pg_cron` guard for `DROP EXTENSION`) and adds **`max_batches_per_run`** to `incremental.file_list_pipelines` and 
`incremental.create_file_list_pipeline`. Use `ALTER EXTENSION pg_incremental UPDATE TO '1.5';`.\n\nYou can define a file list pipeline with the `incremental.create_file_list_pipeline` function by specifying a generic pipeline name, a file pattern, and a command. When the pipeline is not batched, the command runs with `$1` set to the path of a file (`text`). When batched, `$1` is a `text[]` of paths. Each call to `incremental.execute_pipeline` (or each pg\\_cron run) lists unprocessed paths from your list function and runs the command up to **`max_batches_per_run`** times in that invocation. A value of `-1` (default) means no limit: every file (every batch when batched) is processed in that run. A positive integer caps how many batch iterations run, where each iteration is one file when not batched, or one array batch when batched. Remaining paths wait for the next run.\n\nExample:\n```sql\n-- define an import function that wraps a COPY command\ncreate function import_events(path text)\nreturns void language plpgsql as $function$\nbegin\n\texecute format($$copy events from %L$$, path);\nend;\n$function$;\n\n-- create a pipeline to import new files into a table, one by one.\n-- $1 will be set to the path of a new file\nselect incremental.create_file_list_pipeline('event-import', 's3://mybucket/events/inbox/*.csv', $$\n   select import_events($1)\n$$);\n```\n\nThe API of the file list pipeline is still subject to change. It currently defaults to using the [`crunchy_lake.list_files` function](https://docs.crunchybridge.com/warehouse/data-lake#explore-your-object-store-files) in [Crunchy Data Warehouse](https://www.crunchydata.com/products/warehouse). 
You can set the list function to another set-returning function that returns a `path` value as `text`.\n\nArguments of the `incremental.create_file_list_pipeline` function:\n\n| Argument name         | Type        | Description                                         | Default                            |\n| --------------------- | ----------- | --------------------------------------------------- | ---------------------------------- |\n| `pipeline_name`       | text        | User-defined name of the pipeline                   | Required                           |\n| `file_pattern`        | text        | File pattern to pass to the list function           | Required                           |\n| `command`             | text        | Pipeline command; `$1` is file path (`text`) or path array (`text[]`) when batched | Required                           |\n| `list_function`       | text        | Name of the function used to list files             | `crunchy_lake.list_files`          |\n| `batched`             | bool        | Whether to pass in a batch of files as an array     | `false`                            |\n| `max_batch_size`      | int         | If batched, maximum length of the array             | 100                                |\n| `schedule`            | text        | pg\\_cron schedule for periodic execution (or NULL)  | `*/15 * * * *` (every 15 minutes)  |\n| `execute_immediately` | bool        | Execute command immediately for existing data       | `true`                             |\n| `max_batches_per_run` | int         | Max batch iterations per `execute_pipeline` call: `-1` = no limit (process full backlog in that run); a positive integer caps how many files (non-batched) or array batches (batched) run in that call | `-1`                               |\n\nInstead of using the argument, you can also change the default list function via the `incremental.default_file_list_function` setting:\n\n```sql\n-- change the default file list function 
(note: this function name is an example and not included in pg_incremental)\nset incremental.default_file_list_function to 'public.list_local_files';\n```\n\nIf you have a faulty file, you can skip it by running the `incremental.skip_file` function. It will be treated as already-processed and therefore skipped in future runs.\n```sql\n-- skip a file that contains errors\nselect incremental.skip_file('event-import', 's3://mybucket/events/inbox/00048.csv');\n```\n\n## Monitoring pipelines\n\nThere are two ways to monitor pipelines: \n\n1) via tables corresponding to each pipeline type: `incremental.sequence_pipelines`, `incremental.time_interval_pipelines`, and `incremental.processed_files`\n2) via `cron.job_run_details` to check for errors\n\nSee the last processed sequence number in a sequence pipeline:\n\n```sql\nselect * from incremental.sequence_pipelines ;\n┌─────────────────────┬────────────────────────────┬────────────────────────────────┬────────────────┐\n│    pipeline_name    │       sequence_name        │ last_processed_sequence_number │ max_batch_size │\n├─────────────────────┼────────────────────────────┼────────────────────────────────┼────────────────┤\n│ view-count-pipeline │ public.events_event_id_seq │                        3000000 │                │\n│ event-aggregation   │ events_event_id_seq        │                        1000000 │          10000 │\n└─────────────────────┴────────────────────────────┴────────────────────────────────┴────────────────┘\n```\n\nSee the last processed time interval in a time interval pipeline:\n\n```sql\nselect * from incremental.time_interval_pipelines;\n┌───────────────┬───────────────┬─────────┬───────────┬────────────────────────┐\n│ pipeline_name │ time_interval │ batched │ min_delay │  last_processed_time   │\n├───────────────┼───────────────┼─────────┼───────────┼────────────────────────┤\n│ export-events │ 1 day         │ f       │ 00:00:30  │ 2024-12-17 00:00:00+01 
│\n└───────────────┴───────────────┴─────────┴───────────┴────────────────────────┘\n```\n\nSee the processed files in a file list pipeline:\n```sql\nselect * from incremental.file_list_pipelines ;\n┌───────────────┬─────────────────────────────────────┬─────────┬─────────────────────────┬────────────────┬──────────────────────────┐\n│ pipeline_name │            file_pattern             │ batched │      list_function      │ max_batch_size │ max_batches_per_run │\n├───────────────┼─────────────────────────────────────┼─────────┼─────────────────────────┼────────────────┼─────────────────────┤\n│ event-import  │ s3://marco-crunchy-data/inbox/*.csv │ f       │ crunchy_lake.list_files │                │                  -1 │\n└───────────────┴─────────────────────────────────────┴─────────┴─────────────────────────┴────────────────┴─────────────────────┘\n\nselect * from incremental.processed_files ;\n┌───────────────┬────────────────────────────────────────────┐\n│ pipeline_name │                    path                    │\n├───────────────┼────────────────────────────────────────────┤\n│ event-import  │ s3://marco-crunchy-data/inbox/20241215.csv │\n│ event-import  │ s3://marco-crunchy-data/inbox/20241215.csv │\n└───────────────┴────────────────────────────────────────────┘\n```\n\nFor all pipelines, you can check the outcome of the underlying [pg_cron](https://github.com/citusdata/pg_cron) job and any error messages.\n```sql\nselect jobname, start_time, status, return_message\nfrom cron.job_run_details join cron.job using (jobid)\nwhere jobname like 'pipeline:event-import%' order by 1 desc limit 3;\n┌───────────────────────┬───────────────────────────────┬───────────┬────────────────┐\n│        jobname        │          start_time           │  status   │ return_message │\n├───────────────────────┼───────────────────────────────┼───────────┼────────────────┤\n│ pipeline:event-import │ 2024-12-17 13:27:00.090057+01 │ succeeded │ CALL           │\n│ 
pipeline:event-import │ 2024-12-17 13:26:00.055813+01 │ succeeded │ CALL           │\n│ pipeline:event-import │ 2024-12-17 13:25:00.086688+01 │ succeeded │ CALL           │\n└───────────────────────┴───────────────────────────────┴───────────┴────────────────┘\n```\n\nNote that the jobs run more frequently than the pipeline command is executed. The job will simply be a noop if there is no new work to do.\n\n## Manually executing a pipeline\n\nYou can also execute a pipeline manually using the `incremental.execute_pipeline` procedure, though it will only run the command if there is new data to process.\n\n```sql\n-- call the incremental.execute_pipeline procedure using the CALL syntax\ncall incremental.execute_pipeline('event-aggregation');\n```\n\nWhen you create the pipeline, you can pass `schedule := NULL` to disable periodic scheduling, such that you can perform all executions manually.\n\nArguments of the `incremental.execute_pipeline` procedure:\n\n| Argument name         | Type        | Description                                       | Default                     |\n| --------------------- | ----------- | ------------------------------------------------- | --------------------------- |\n| `pipeline_name`       | text        | User-defined name of the pipeline                 | Required                    |\n\n\n## Resetting an incremental processing pipeline\n\nIf you need to rebuild an aggregation, you can reset a pipeline to the beginning using the `incremental.reset_pipeline` function.\n```sql\n-- Clear the summary table and reset the pipeline to rebuild it\nbegin;\ndelete from events_agg;\nselect incremental.reset_pipeline('event-aggregation');\ncommit;\n```\nThe pipeline will be executed from the start. 
If execution fails, the pipeline is not reset.\n\nArguments of the `incremental.reset_pipeline` function:\n\n| Argument name         | Type        | Description                                       | Default                     |\n| --------------------- | ----------- | ------------------------------------------------- | --------------------------- |\n| `pipeline_name`       | text        | User-defined name of the pipeline                 | Required                    |\n| `execute_immediately` | bool        | Execute command immediately for existing data     | `true`                      |\n\n## Dropping an incremental processing pipeline\n\nWhen you are done with a pipeline, you can drop it using `incremental.drop_pipeline(..)`:\n```sql\n-- Drop the pipeline\nselect incremental.drop_pipeline('event-aggregation');\n```\n\nArguments of the `incremental.drop_pipeline` function:\n\n| Argument name         | Type        | Description                                       | Default                     |\n| --------------------- | ----------- | ------------------------------------------------- | --------------------------- |\n| `pipeline_name`       | text        | User-defined name of the pipeline                 | Required                    |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrunchydata%2Fpg_incremental","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrunchydata%2Fpg_incremental","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrunchydata%2Fpg_incremental/lists"}