Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pier-oliviert/reimagined-octo-journey
https://github.com/pier-oliviert/reimagined-octo-journey
Last synced: about 21 hours ago
JSON representation
- Host: GitHub
- URL: https://github.com/pier-oliviert/reimagined-octo-journey
- Owner: pier-oliviert
- Created: 2022-04-13T09:04:46.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-04-13T09:04:47.000Z (almost 3 years ago)
- Last Synced: 2024-11-19T19:00:59.742Z (2 months ago)
- Language: Shell
- Size: 2.93 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Flow Project Template (Word Count)
This repository is an example and template for new Flow projects.
Try it in your browser by selecting **<> Code > Codespaces > New codespace**,
using [GitHub Codespaces](https://github.com/features/codespaces).
Not part of an organization with access to Codespaces? No problem. Follow the instructions
to [set up a local devcontainer](https://docs.estuary.dev/getting-started/installation#using-visual-studio-code-devcontainers)
and proceed in the same manner. Note, however, that [Mac M1](https://support.apple.com/en-us/HT211814) is not currently supported for local development.Wondering what Flow is?
See ๐ [Documentation](https://docs.estuary.dev/)
or :octocat: [github.com/estuary/flow](https://github.com/estuary/flow).
**tl;dr**: it makes it easy to continuously move and transform your data
across all the places you want it.Through the magic of
[VSCode devcontainers](https://code.visualstudio.com/docs/remote/containers) โจ,
this repository comes with a PostgreSQL
database that you interact with.## Continuous Materializations in PostgreSQL
PostgreSQL is super great, and supports materialized views,
but it doesn't offer _continuous_ materialized views.
So... let's build one with Flow?How many times have you managed text documents in PostgreSQL and thought to yourself:
> "Gee-whiz, self, I wish I had a table of word counts that was always up-to-date!"... basically never, right? Well, let's do it anyway! This example composes:
* Change data capture (CDC) from a `documents` table.
* A derivation which computes word and document frequency updates.
* A materialization back into a `word_counts` table.## ๐งช Verify Tests
All cutting-edge word count projects should have tests.
Let's make sure the words are, um, counted:
```console
$ flowctl test --source word-counts.flow.yaml
... snip ...
Running 1 tests...
โ๏ธ word-counts.flow.yaml :: acmeCo/tests/word-counts-from-documentsRan 1 tests, 1 passed, 0 failed
```## ๐ข Run It Locally
Start a local, temporary Flow data plane:
```console
$ flowctl temp-data-plane
export BROKER_ADDRESS=http://localhost:8080
export CONSUMER_ADDRESS=http://localhost:9000
```A data plane is a long-lived, multi-tenant, scale-out component that
usually runs in a data center.
Fortunately it also shrinks down to your laptop.
Note the exported addresses above; you'll need those.โ๏ธ Deploy the catalog to your data plane:
```console
$ export BROKER_ADDRESS=http://localhost:8080
$ export CONSUMER_ADDRESS=http://localhost:9000
$ flowctl deploy --wait-and-cleanup --source flow.yaml
... snip ...
Deployment done. Waiting for Ctrl-C to clean up and exit.
```๐ Flow is now watching the `documents` table, and materializing to `word_counts`:
```console
$ psql --host localhost
psql (13.5 (Debian 13.5-0+deb11u1), server 13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.flow=# insert into documents (body) values ('The cat in the hat.'), ('hat Cat CAT!');
INSERT 0 2
flow=# select word, count, doc_count from word_counts;
word | count | doc_count
------+-------+-----------
cat | 3 | 2
hat | 2 | 2
in | 1 | 1
the | 2 | 1
(4 rows)
flow=# update documents set body = 'cat Cat CAT!' where id = 2;
UPDATE 1
flow=# select word, count, doc_count from word_counts order by word;
word | count | doc_count
------+-------+-----------
cat | 4 | 2
hat | 1 | 1
in | 1 | 1
the | 2 | 1
(4 rows)
flow=# delete from documents ;
DELETE 2
flow=# select word, count, doc_count from word_counts order by word;
word | count | doc_count
------+-------+-----------
cat | 0 | 0
hat | 0 | 0
in | 0 | 0
the | 0 | 0
(4 rows)
```