{"id":13776713,"url":"https://github.com/CrowdHailer/pachyderm","last_synced_at":"2025-05-11T10:31:11.905Z","repository":{"id":62430251,"uuid":"55302063","full_name":"CrowdHailer/pachyderm","owner":"CrowdHailer","description":"Virtual actors for elixir","archived":false,"fork":false,"pushed_at":"2019-09-10T13:50:22.000Z","size":323,"stargazers_count":104,"open_issues_count":2,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-10T10:51:54.458Z","etag":null,"topics":["actors","elixir","entities","virtual-actors"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CrowdHailer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-02T15:36:17.000Z","updated_at":"2024-12-12T08:43:56.000Z","dependencies_parsed_at":"2022-11-01T20:21:41.747Z","dependency_job_id":null,"html_url":"https://github.com/CrowdHailer/pachyderm","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrowdHailer%2Fpachyderm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrowdHailer%2Fpachyderm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrowdHailer%2Fpachyderm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CrowdHailer%2Fpachyderm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CrowdHailer","download_url":"https://codeload.github.com/CrowdHailer/pachyderm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://githu
b.com","kind":"github","repositories_count":253551658,"owners_count":21926330,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actors","elixir","entities","virtual-actors"],"created_at":"2024-08-03T18:00:31.951Z","updated_at":"2025-05-11T10:31:11.557Z","avatar_url":"https://github.com/CrowdHailer.png","language":"Elixir","funding_links":[],"categories":["Libraries"],"sub_categories":[],"readme":"# Pachyderm - an elephant never forgets\n\n**A virtual/immortal/durable/resilient/global actor \"always exists\" and \"never fails\".**\n\n[![Build Status](https://img.shields.io/travis/com/CrowdHailer/pachyderm/master)](https://travis-ci.com/CrowdHailer/pachyderm)\n\nProgram with actors that are durable and globally unique \"in effect\".\nPachyderm calls an actor with these properties an entity.\n\nEntities are useful where there are strong consistency requirements.\nThey also mitigate several of the [problems with Single Global Processes](https://keathley.io/blog/sgp.html).\n\nThis idea was loosely inspired by projects like [Microsoft Orleans](https://dotnet.github.io/orleans/).\n\nFurther explanation can be found in the [Design notes](#design-notes).\n\n## Usage\n\n### Defining an Entity\n\n```elixir\ndefmodule MyApp.Counter do\n  @behaviour Pachyderm.Entity\n\n  alias MyApp.Counter.{Increment, ...}\n  alias MyApp.Counter.{Increased, ...}\n\n  def init() do\n    %{count: 0}\n  end\n\n  def handle(%Increment{}, _state) do\n    events = [%Increased{amount: 1}]\n    {:ok, events}\n  end\n\n  def update(%Increased{amount: amount}, state = %{count: current}) do\n    %{state | count: 
current + amount}\n  end\nend\n```\n\n*In event sourcing execute/apply would be the equivalent terms to handle/update.*\n\nBoth the `handle/2` and `update/2` callbacks MUST NOT create any side effects; see [Entity side effects](#entity-side-effects) for how to create side effects.\n\n### Sending messages to an Entity\n\n```elixir\ntype = MyApp.Counter\nid = UUID.uuid4()\nreference = {type, id}\n\n{:ok, state} = Pachyderm.call(reference, %Increment{})\n# =\u003e {:ok, %{count: 1}}\n```\n\n*The id of an entity MUST be a UUID that is unique across all entities, regardless of type.*\n\n### Entity side effects\n\nAn entity creates side effects by optionally returning a list of effects in addition to the list of events.\nPachyderm dispatches effects once the events have been committed to storage.\n\n```elixir\ndefmodule MyApp.Counter do\n  def handle(%Increment{}, %{count: count}) do\n    events = [%Increased{amount: 1}]\n\n    if count == 9 do\n      effects = [{MyApp.AdminMailer, %{threshold: 10}}]\n      {:ok, {events, effects}}\n    else\n      {:ok, events}\n    end\n  end\nend\n```\n\nSide effects have at-most-once semantics. 
This is because the events are committed before dispatching effects and it is always possible for the dispatch to fail/crash.\n\n*A future feature should allow persisting effects to a task queue in the same transaction as events are committed.*\n\n```elixir\ndefmodule MyApp.AdminMailer do\n  @behaviour Pachyderm.Effect\n\n  @admin_email \"admin@myapp.example\"\n\n  def dispatch(%{threshold: threshold}, _config) do\n    body = \"The threshold was reached at a count of #{threshold}\"\n\n    EmailProvider.send(@admin_email, body)\n  end\nend\n```\n\n*The config value can be passed as a third argument to `Pachyderm.send`.*\n\n\n## Testing\n\n```\ndocker-compose up\nmix do event_store.drop, event_store.create, event_store.init\nmix test\n```\n\n## Design notes\n\n### Entities vs Processes\n\nThe core computational unit in Pachyderm is an Entity.\n\nEntities, like processes are actors, i.e. they are a primitive of concurrent computation.\n- All messages handled by an entity see the latest state of that entity.\n- The state history of an entity has a single, well defined order.\n\nAn entity differs from a process because it can be restarted and moved between machines.\n\n### Events as state primitive\n\nThe underlying storage required by Pachyderm is an append only log.\nFor this reason an event based API is exposed, rather than one based on the current state.\n\nIt is possible to use this model for a state based system by having all events be replace state events.\nFor the counter example this could look like.\n\n```elixir\ndef handle(%Increment{}, %{count: current}) do\n  events = [%NewCount{value: current + 1}]\n  {:ok, events}\nend\n\ndef update(%NewCount{value: new_count}, _state) do\n  %{count: new_count}\nend\n```\n\nThe library chooses to use actor terminology over event sourcing. e.g. 
handle vs execute.\n\n### Globally unique events, NOT processes.\n\nThere may be more than one worker process alive for an entity at any given time.\nThis does not break any guarantees because a message is not considered handled by an entity until the events are committed to storage.\n**All storage backends must expose an optimistic concurrency control mechanism.**\n\nProcessing messages for a given entity will be handled by running workers where possible.\nWorkers are registered using `:global`.\n\nWorker registration is only to save starting processes; all the guarantees are handled at the storage layer.\nThis also means the library should work just as well in an unclustered environment.\n\nNote that in an unclustered setup, it is possible that a worker for an entity gets started on every machine.\nIn such a case scaling the number of machines wouldn't reduce load.\n\n### Deferred side effects\n\nAll side effects from handling a message must happen after events are committed.\n\nFor example.\n- Two messages (message A) and (message B) are processed concurrently, potentially on two nodes that cannot communicate.\n- The events from one message (message A) are committed to storage successfully.\n- Events from the other message (message B) cannot be saved, as these events were calculated from a stale entity state.\n- **Side effects from handling message B must not exist, only the effects from handling message A**\n- Message B is considered lost; if reliable delivery is required then retries and message acknowledgement can be layered on top\n\nThe Pachyderm effects API exists to allow Entities to interact with other parts of the system in a safe manner.\n\nIt is up to the developer to make sure no side effects happen in the `handle` function.\nElixir/Erlang cannot enforce this.\n\n##### Question\n\nI don't believe there is any harm in having a side cause in the handle function,\nsuch as generating a random number or getting the current date.\nIt may be easier to work with 
only pure functions, but I am not sure it is necessary (Needs further thought)\n\n##### Message vs Event Based\n\nI consider all effects as a message to be sent somewhere, hence why the function on Mailer is called dispatch rather than run/execute.\n\nThere are discussions of event vs message based systems online.\nThis is a message based approach; the event based approach would be to have side effects derived from following the event log.\n\nBoth approaches have their advantages.\n- Message based moves more logic into the entity (it would have existed in subscribers in an event system)\n  This allows more of an application to be tested at a pure level inside the entity functions.\n- Message based is more aligned with the Erlang process model for familiarity\n- Event based subscriptions have the added complexity of requiring a durable cursor for progress they have made through the event log\n- Event based writes everything to storage, a problem if the event should trigger sending an email with a one time code that can't be saved in the DB\n- Message based is more likely to be at most once, event based at least once. Messages can be lost, vs subscription cursor failing to be updated. 
This is probably not a hard and fast separation (see SideEffect guarantees), and the subscription cursor could be updated before processing the effect.\n\nIt is easier to build at least once delivery on top of at most once delivery.\nIt also might be possible to have both by adding the ability to subscribe to an event log in addition to the effects API described here.\nAn effect could also be to write to an \"all\" stream and have no default ability to follow an entity.\n\n### SideEffect guarantees\n\nThere is no guarantee that a side effect will run successfully, the real world does that.\nIf side effects are considered as messages out then it is always possible they can be lost.\n\nThis is also fine; the actor model makes no guarantees about message delivery.\nRetries, timeouts and acknowledgement can all be layered on top.\n\nIt might be required to have a reliable timeout mechanism. (maybe not, needs further thought)\nSo when an entity is restarted any existing timers should be checked.\nProcess\n\nWhen writing to a database all events will be written in a single transaction.\nThat transaction could be left running until all the side effect handlers have run;\nif these were to write to a task queue in the same transaction, then side effects would be reliably retryable.\n\n### Use Entity references as side effects\n\nIt would be easy to return `{reference, message}` from a handle function.\nThe assumption here is that the dispatch action should be to send the message to the referred to entity.\n\nThis has not been done yet. 
I am unsure if there is a sensible default for retrying to send the message/task durability.\nPerhaps it would work if there could be exactly once semantics by marking the task as done in the same transaction as the receiving entity receives events.\n\nTasks that crash should be marked as crashed for a specific version of the module; if it changes they should automatically be retried.\n\n### Sync Snapshots for Entity lookup\n\nThere should be a way of committing snapshot/query module in the same transaction.\nCurrently this is entity_state but could be working_state.\n\n### Entity references\n\nAll entities can be addressed by their reference, this is a combination of type and id.\n\nThis was the most pragmatic approach; when starting out it is intuitive to ascribe types to entities.\nOne of the problems with entity types is that entities last forever and so the concept of type might evolve over time.\n\nIt is possible to have a system with only one type of entity and have the event history fully describe the state of an entity, including its type.\nThis is, however, unwieldy: the behaviour for all entity types ends up in a single callback module.\n\nConsideration of this issue is why entities are uniquely identified by their id only.\nIt allows systems to evolve.\nEntities that were created from one module can be, in the system's future, handled by multiple modules.\nFor example the `User` module could evolve to `LegacyUser` and `NewUser` depending on which API endpoints are used to interact with the system.\n\nPerformance also improves by limiting entity ids to `uuid4` only.\n\n### Return/Reply values\n\nThere are two options for this.\n\n#### Result return values\n\n```\n{:ok, [event]}\n{:ok, {[event], [effect]}}\n{:error, reason}\n```\n- Limits options, potentially a good thing.\n- Makes it clear that returning an error value to a caller means no events were created.\n\n#### GenServer inspired reply values\n\n```\n{:reply, {:ok, anything}, [event]}\n{:reply, {:ok, 
anything}, [event], [effect]}\n{:reply, {:error, reason}, [], []}\n```\n- Can have error response and no events. Good/Bad?\n- Reply often based on the state, state not calculated until after update function called, often end up working things out twice.\n- Add another tuple argument for continuations/timeout. Might be very ugly in OK case\n\n#### Sending full state as part of reply value\n\nThe simplest API is to have the new state returned when sending a message to an entity.\n\nSending the full state back is wasteful if it is large, is not needed, or is transferred between machines.\nAn explicit reply can be set in `handle` but what if clients sending the same update want different views?\n\nTo reduce the amount sent there could be a Query API where an anonymous function is sent and only that result returned.\nThis separates logic from the entity and so a :query callback might be better. Clients just send a simple/expected query and the result is generated from that.\n\nIf on separate nodes you might not want to send message then query, requiring some kind of message then query interface.\nIf sending only a reduced value back the new cursor (stream_version) is probably the most useful. 
It allows a client to listen for all events.\n\nTo reduce messages between nodes there could be a cache process on every node: queries only go to the local cache, and commands are sent via the local cache, which waits for the event before running the query and returning to the caller.\nA follower on every node messes up scaling; adding more nodes doesn't free up memory.\nAlso it doesn't really match a dist erl environment.\nMy assumption is extra nodes are added for more memory; latency of sending messages between nodes is not important.\nProbably if latency is a problem, the best option is sticky sessions so normal lookup from Pachyderm results in intra-node communication in most cases.\n\nI think we should stick with the simple approach for a while; most of the issues are for high performance cases.\n\n### Can Worker inactivity timeouts be a global setting?\n\nA system where all entities can be active could have no timeouts, with entities only restarted on deploy.\nIn reality I think an entity is likely to know when it is no longer going to be activated. 
However even these cases might have the end state queried for some time.\n\nThere should be a Stop event that caches the final snapshot.\n\n### Should it be possible to have effects without events?\n\nI can't think of any good thing that will come out of this; it basically just skips all checks.\n\n### Should calculated state be one of the arguments to effect dispatch?\n\nThis is another place where state can be worked out twice, in dispatch and update.\n\n### Non global address space using network_id\n\nIt would be good to start more than one `EntitySupervisor` and have separate interacting environments (ecosystems) of entities.\nOne way to handle this would be to have a network_id column in EventStore and have all interaction with the DB scoped to a specific network_id.\nDifferent network identifiers should be able to use different pools/db connections.\n\nIn a global network of entities, creating a reference could take the environment as an argument so giving separate ids.\nThis is rather reliant on the developer doing the right thing repeatedly.\n\n## TODO\n\n- If waiting on specific promises, entity MUST terminate if nothing to await on.\n\n- Single global process, some discussion on this, is it a safe way to have single global processes?\n\nhttps://yiming.dev/blog/2019/08/16/use-return-value-to-defer-decisions/\n\nOld stuff https://github.com/CrowdHailer/pachyderm/commit/bd852b376e58c318183a60f1b8ddf18ada1fe6cc\n\n- Counter using Protocols\n  - Linked all events created to the command that created them. command id being the transaction and idempotency id is a possibility\n  - Can do protocols with a Global entity module. 
Implement protocol on null struct to handle create messages\n  - Implement protocol on others for each state.\n  - Most things don't change that much so it's a lot of struct typing for little benefit, shows that everything can be typed but has no checking of those types\n- Pachyderm/Pachyderm\n  - Trying to implement set/unset adjustments, maybe makes it easier to query but too much overhead, reimplementing Datomic\n- Top level\n  - Pessimistic lock by taking DB lock, lock can be lost while processing continues. see forum discussion. https://elixirforum.com/t/an-experimental-implementation-of-actors-that-do-not-die/14608/11\n  - Ecosystem seems passable name for grouping of entities though.\n  - Check ecosystems exhaust function for walk through, lots of notes\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCrowdHailer%2Fpachyderm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCrowdHailer%2Fpachyderm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCrowdHailer%2Fpachyderm/lists"}