{"id":19011915,"url":"https://github.com/3squared/smoulder","last_synced_at":"2025-02-21T15:46:30.732Z","repository":{"id":68491610,"uuid":"195847050","full_name":"3Squared/Smoulder","owner":"3Squared","description":"Smoulder is a really good data pipe","archived":false,"fork":false,"pushed_at":"2020-05-05T08:27:52.000Z","size":1566,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"develop","last_synced_at":"2025-01-01T21:47:10.355Z","etag":null,"topics":["composition","data","facade-pattern","forge-framework","object-oriented"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/3Squared.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-08T16:11:03.000Z","updated_at":"2019-10-16T22:12:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"e55fdc98-263a-488f-bc86-7f84e666c3a7","html_url":"https://github.com/3Squared/Smoulder","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3Squared%2FSmoulder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3Squared%2FSmoulder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3Squared%2FSmoulder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3Squared%2FSmoulder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/3Squared","download_
url":"https://codeload.github.com/3Squared/Smoulder/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240043139,"owners_count":19739022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["composition","data","facade-pattern","forge-framework","object-oriented"],"created_at":"2024-11-08T19:16:06.526Z","updated_at":"2025-02-21T15:46:30.687Z","avatar_url":"https://github.com/3Squared.png","language":"C#","readme":"![Forge Smoulder](ForgeSmoulder.png \"Forge Smoulder\")\n\n# Smoulder\n\n## Introduction\nSmoulder meets the need for low-profile, slow burn processing of data in a non-time critical environment. Intended uses include aggregating statistics over a constant data stream and creating arbitrarily complex reports on large data volumes with the ability to take snapshots of the current resultant states without waiting for completion or interrupting the data processing. This is achieved by separating data preparation, processing and aggregation of results into separate loosely-coupled processes, linked together by defined data packets through internally accessible message queues.\n\nThe entire system can be implemented by creating new concrete implementations of the base abstract interfaces, allowing for real flexibility in the applications the system can be used for. Each of the three parts of the system is built to be run in a separate thread, allowing performance to scale up or down depending on the hardware the system is hosted on and decoupling each process from the other. 
They communicate with each other over two concurrent queues, a thread-safe collection type available in C#.\n\nSmoulder is part of the [Forge Framework](#Forge-Framework).\n## System Description\n### Loader\nThis component is responsible for retrieving data and converting it into usable data packets. Data can then be bundled into a data packet containing multiple data points, or simply applied as a stream of single data points using a customisable data object. This stream of sanitised data is then made available to the Processor by means of an inter-thread message queue called the ProcessorQueue.\n### Processor\nThis component is responsible for computing the results from the provided data packet, retaining necessary information about previous data packets if required. This could be keeping track of a cumulative number (e.g. number of data objects meeting a certain criterion) or calculated statistics (e.g. number of peaks in a continuous data stream).\nThe results produced are bundled into a results object that is then made available to the Distributor by means of a message queue.\n### Distributor\nThis component is responsible for calculating the up-to-date status of the tracked statistics and providing them to external components that are polling for the results. This decouples the reporting and processing elements, ensuring that polling for results doesn’t affect processor performance and that large data volumes won’t slow down the returning of results. The distributor could also be configured to publish data to an external service or database while still allowing the processor to remain agnostic. The action the distributor takes is deliberately open-ended, allowing developers to tailor the result to each individual need.\n\n## Setup\n### Data Objects\nDecide the form of the data objects. One will become `TProcessData`, the other will become `TDistributeData` when passed to the generic `Build\u003cTProcessData,TDistributeData\u003e()` method. 
This will produce a `Smoulder\u003cTProcessData,TDistributeData\u003e`, giving type safety to the `Enqueue`/`Dequeue` calls.\n\n### Worker units\nCreate `Loader`, `Processor` and `Distributor` classes that implement `ILoader`, `IProcessor` and `IDistributor`. The `Loader` has access to the `ProcessorQueue`, the `Processor` has access to the `ProcessorQueue` and `DistributorQueue`, and the `Distributor` has access to the `DistributorQueue`. The queues are hooked up in the `Smoulder.Start()` method, so they are ready by the time the `Action()` method is called for the first time.\n\nThe implementation of the worker units should:\n- Implement the relevant interface\n- Extend the relevant workerUnitBase (e.g. `public class MyProcessor: ProcessorBase\u003cMyProcessData,MyDistributeData\u003e, IProcessor`)\n- Override the `Action()` method\n\nOptionally, the implementation can override:\n- the `Startup()` method, which is called when the `Smoulder.Start()` method is called. This allows you to initialise any variables that can't be initialised in the constructor, or that you want to initialise every time the Smoulder object is started, not just at object creation.\n- the `Finalise()` method, which is called when the `Smoulder.Stop()` method is called. This could be used to ensure all the remaining data is processed before the smoulder shuts down, to close any open connections, etc.\n- the `OnEmptyQueue()` method, which is called if there was nothing on the queue for `Timeout` milliseconds\n- the `OnError()` method, which is called if there is an uncaught error in `Action()`, `OnEmptyQueue()` or the method inside of smoulder that contains the dequeuing logic.\n\n#### Action method\n#### Action(TData item)\nThe `Action(TData item)` method on a workerUnit is the main payload and is the only one that must be implemented for the Smoulder object to be valid. This is what will be called continuously until the `Smoulder.Stop()` method is called. 
An example format for the action method for a processor would be:\n\n    public override TDistributeData Action(TProcessData item, CancellationToken cancellationToken)\n    {\n        //Archive the incoming data\n        _someRepository.Save(item);\n\n        //Transform the data into another form\n        TDistributeData outgoingData = Do.Something(item);\n\n        //Pass to the distributor\n        return outgoingData;\n    }\n\nThe returned `TDistributeData` object will be enqueued onto the distributor queue ready for the distributor to process. Returning null means nothing is enqueued.\n\nThe signatures for the Action methods are as follows:\n##### Loader\n`public TProcessData Action(CancellationToken cancellationToken)`\n##### Processor\n`public TDistributeData Action(TProcessData item, CancellationToken cancellationToken)`\n##### Distributor\n`public void Action(TDistributeData item, CancellationToken cancellationToken)`\n\n#### Available methods\n##### Loader\n- `Enqueue(TData itemToEnqueue)` - enqueues the item onto the `ProcessorQueue` - Not recommended to do this directly in most cases, use the return value.\n- `int GetProcessorQueueCount()` - Returns the number of items on the `ProcessorQueue`\n\n##### Processor\n- `Enqueue(TDistributeData itemToEnqueue)` - enqueues the item onto the `DistributorQueue` - Not recommended to do this directly in most cases, use the return value.\n- `bool Dequeue(out TProcessData item)` - does a `TryTake(item, Timeout)` on the `ProcessorQueue`. You usually don't have to call this as it is called for you and the result passed to `Action()`, but it's available for completeness. 
For example, you could use:\n\n        public override void Finalise()\n        {\n            while (Dequeue(out var incomingData))\n            {\n                var processedData = ProcessData(incomingData);\n                Enqueue(processedData);\n            }\n        }\n\nto cycle through all the remaining items on the queue before finally closing down.\n\n- `bool Peek(out TProcessData item)` - Allows for peeking the `ProcessorQueue`. This is implemented here as BlockingCollections don't allow it normally due to multi-threading concerns. There is only one producer and consumer for each queue, so Peek is assumed to be safe.\n- `int GetDistributorQueueCount()`\n- `int GetProcessorQueueCount()`\n\n##### Distributor\n- `bool Dequeue(out TDistributeData item)` - dequeues an item from the `DistributorQueue` - See processor\n- `bool Peek(out TDistributeData item)` - Allows for peeking the `DistributorQueue` - See processor\n- `int GetDistributorQueueCount()`\n\n#### Startup()\nStartup is deliberately distinct from the constructor so that a Smoulder can be restarted after being stopped. The constructor will only run when the workerUnits are instantiated, but the startup method is called every time the `Smoulder.Start()` method is called. An example use for this is opening a connection to a message queue in the `Startup()` method and disconnecting in the `Finalise()` method.\n\n### Instantiation\nOnce classes for the workerUnits have been created, a Smoulder object can be instantiated. 
This is achieved with the following two lines:\n\n    var smoulderFactory = new SmoulderFactory();\n    var smoulder = smoulderFactory.Build(new Loader(), new Processor(), new Distributor());\n    \nConcrete instances of the workerUnits can be created in any way, such as through an IOC container.\nAfter instantiating the Smoulder object, starting it is as simple as:\n\n    smoulder.Start();\n    \nWhile stopping can be achieved with:\n\n    smoulder.Stop();\n    \nWhen the `Start()` and `Stop()` methods are called is left entirely up to the implementing developer. For example, if the smoulder is being used inside a Windows service the `OnStart()` and `OnStop()` methods are obvious candidates.\n\n# Advanced Features\n\n## QueueBounding\nThe maximum number of items allowed in the queues can be set when `SmoulderFactory.Build()` is called.\nThe following line sets the maximum number of ProcessData objects to 50 and the number of DistributeData objects to 100.\n\n    var smoulder = smoulderFactory.Build(_movementLoader, _movementProcessor, _movementDistributor,50,100);\n    \nWhen a queue is full, the `Enqueue()` method on `Loader`s and `Processor`s will block until an item is removed.\n\n## Dequeue timeout\nBy default smoulder will wait `1000` milliseconds before giving up waiting for an item on the queue and calling `OnEmptyQueue()` instead. This can be changed by setting the `Timeout` attribute on the `Processor` and `Distributor` objects.\nIf the timeout is set to `-1`, it will wait forever or until an item arrives on the queue.\n\n## Multiple Smoulders with IOC\nSay you want two smoulder objects in the same application and you're using an IOC container. You have two different implementations of `IProcessor`. The best way to split these up so your IOC container knows which is which is to create two interfaces, say `IProcessorA` and `IProcessorB`, that both extend the `Smoulder.Interfaces.IProcessor` interface. 
Then you can hook your IOC container up using these two interfaces and everything is peachy.\n\n# Functional/Compositional methodology\n\nThere are methods on the Smoulder object that allow delegates to be passed that replace the default functionality. This allows a basic Smoulder to be set up without having to directly extend, override, instantiate and pass in worker unit instances. At its most basic, a Smoulder can be created with:\n\n    var secondSmoulder = smoulderFactory.Build\u003cProcessDataObject, DistributeDataObject\u003e()\n    .SetLoaderAction(token =\u003e\n    {\n        var data = GetData();\n        return data;\n    })\n    .SetProcessorAction((incomingData, token) =\u003e\n    {\n        fakeRepository.SaveData(incomingData);\n        var processedData = ProcessData(incomingData);\n\n        return processedData;\n    })\n    .SetDistributorAction((incomingData, token) =\u003e\n    {\n        SendData(incomingData);\n    });\n\nHowever, due to this more functional style there are restrictions on the way methods have side effects, and they are unable to call the ancillary methods on the Smoulder object like `Enqueue` and `Peek`. This could be improved upon if a copy of the Smoulder object was passed into each of the methods, but that is only being considered for future work at this time.\n\n# Worked Example - Random Number Pipe\nThis simply generates random numbers and passes them through, doing some nominal work on them to show that data can get from one end to the other. In doing so it attempts to show off some of the different configurations worker units can be created with and acts as a quick check that everything works nicely together. A successful build should be able to run this console application without any errors.\n\nIn the worked example, a console app creates a Smoulder object using the SmoulderFactory and sets it running. Regular reports are printed while it is running to show progress to the user. 
The Smoulder could be left running indefinitely, but is instead stopped. The object is then started again to showcase the ability to stop and start the Smoulder object.\n\nReading through the worked example will be a good introduction to the different ways Smoulder can be used; it is commented to guide a user through each of the worker units. The Processor uses the most features, so start there if speed is valued over complete comprehension.\n\nImagine while reading this that the Loader is hooked up to some external data source, the processor is saving the incoming messages to an archive and the distributor is building up some aggregate data for a report.\n\n# Forge Framework\n\nThe Forge Framework is a collection of open-source frameworks created by the team at [3Squared](https://github.com/3squared).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F3squared%2Fsmoulder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F3squared%2Fsmoulder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F3squared%2Fsmoulder/lists"}