# Rurel

[![crates.io](https://img.shields.io/crates/v/rurel.svg)](https://crates.io/crates/rurel)

Rurel is a flexible, reusable reinforcement learning (Q-learning) implementation in Rust.

* [Release documentation](https://docs.rs/rurel)
* [Source repository](https://github.com/milanboers/rurel) (MPL-2.0 license)

Add it to `Cargo.toml`:
```toml
[dependencies]
rurel = "0.6.0"
```

The repository includes an example that teaches an agent on a 21x21 grid to reach the cell (10, 10) using four actions (go left, go up, go right, go down):
```console
cargo run --example eucdist
```

## Getting started
There are two main traits you need to implement: `rurel::mdp::State` and `rurel::mdp::Agent`.

A `State` defines the `Vec` of actions that can be taken from it and the reward associated with being in it. A `State` must also declare its corresponding action type `A`.

An `Agent` holds a current state and, given an action, takes that action and moves to the resulting next state.

### Example

Let's implement the example run by `cargo run --example eucdist`. We want to teach an agent to arrive at (10, 10) on a 21x21 grid.

First, define a `State`, which represents a position on the 21x21 grid, and the corresponding action, which moves one cell up, down, left, or right.

```rust
use rurel::mdp::State;

#[derive(PartialEq, Eq, Hash, Clone)]
struct MyState { x: i32, y: i32 }
#[derive(PartialEq, Eq, Hash, Clone)]
struct MyAction { dx: i32, dy: i32 }

impl State for MyState {
	type A = MyAction;
	fn reward(&self) -> f64 {
		// Negative Euclidean distance to (10, 10)
		-((((10 - self.x).pow(2) + (10 - self.y).pow(2)) as f64).sqrt())
	}
	fn actions(&self) -> Vec<MyAction> {
		vec![MyAction { dx: 0, dy: -1 },	// up
			 MyAction { dx: 0, dy: 1 },	// down
			 MyAction { dx: -1, dy: 0 },	// left
			 MyAction { dx: 1, dy: 0 },	// right
		]
	}
}
```

Then define the agent. Moves wrap around, so stepping off one edge of the grid re-enters on the opposite edge:

```rust, ignore
use rurel::mdp::Agent;

struct MyAgent { state: MyState }
impl Agent<MyState> for MyAgent {
	fn current_state(&self) -> &MyState {
		&self.state
	}
	fn take_action(&mut self, action: &MyAction) {
		let &MyAction { dx, dy } = action;
		self.state = MyState {
			x: (self.state.x + dx).rem_euclid(21), // (x + dx) mod 21, always in 0..21
			y: (self.state.y + dy).rem_euclid(21), // (y + dy) mod 21, always in 0..21
		};
	}
}
```

That's all. Now create a trainer and train the agent with Q-learning, using a learning rate of 0.2, a discount factor of 0.01, and an initial Q value of 2.0. The trainer runs for 100,000 iterations, exploring by taking random actions.

```rust, ignore
use rurel::AgentTrainer;
use rurel::strategy::learn::QLearning;
use rurel::strategy::explore::RandomExploration;
use rurel::strategy::terminate::FixedIterations;

let mut trainer = AgentTrainer::new();
let mut agent = MyAgent { state: MyState { x: 0, y: 0 }};
trainer.train(&mut agent,
              &QLearning::new(0.2, 0.01, 2.),    // learning rate, discount factor, initial Q
              &mut FixedIterations::new(100000), // stop after 100,000 iterations
              &RandomExploration::new());        // explore by taking random actions
```

Afterwards, you can query the learned value (Q) of a given action in a given state:

```rust, ignore
trainer.expected_value(&state, &action) // : Option<f64>
```

## Development
* Run `cargo fmt --all` to format the code.
* Run `cargo clippy --all-targets --all-features -- -Dwarnings` to lint the code.
* Run `cargo test` to test the code.