# Reinforcement Learning Personalization Challenge
In this challenge your goal is to train an RL agent to solve a personalization task that is simulated as a synthetic contextual bandit.

In the given environment the state space is continuous and is represented by a `100`-dimensional hypercube, and the action space is discrete and consists of `100` fixed `100`-dimensional vectors. The reward signal is somewhat convoluted by design, intended to mimic human behavioral preferences (*and we can discuss what it actually means or how it is designed in more detail*).

The rationale behind such an environment is the following: the set of available actions represents the *possible recommendations*, and the observed states are parameterized representations of the *persons* to whom the recommendations are provided. For each state-action pair, the received reward value represents the *suitability* of the provided recommendation for the given person, with `1` being the best recommendation and `-1` being the worst.
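
For orientation, here is a rough sketch of what these spaces look like in OpenAI Gym terms. The exact bounds, dtypes, and how the action vectors are generated are assumptions here; `environment.py` is the source of truth.

```python
import numpy as np
from gym import spaces

# Hypothetical illustration of the environment's spaces (bounds assumed, see environment.py).
STATE_DIM, NUM_ACTIONS = 100, 100

# states: points in a 100-dimensional hypercube ("persons")
observation_space = spaces.Box(low=-1.0, high=1.0, shape=(STATE_DIM,), dtype=np.float32)
# actions: indices into a fixed set of 100 vectors, each 100-dimensional ("recommendations")
action_space = spaces.Discrete(NUM_ACTIONS)
```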

## Setup
Install the dependencies with `pip install -r requirements.txt`, then run with `python main.py`.

## Files
Essentially, this simple repository consists of the following files:
* `environment.py` --- contains the class `SampleEnv` that creates an OpenAI Gym contextual bandit environment
* `main.py` --- trains a policy gradient agent, serving as a basic baseline --- **modify this file to implement and train your agent**

## Environment
The generated `SampleEnv` environment inherits from `gym.Env` and, as such, has the following methods:
* `reset()` --- observe a new state
* `step(action)` --- take an action and return the result

The above methods are technically sufficient to solve the environment.
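A minimal interaction sketch, assuming `SampleEnv()` can be constructed without arguments and that `step()` follows the usual Gym signature of `(observation, reward, done, info)`; since the environment is a contextual bandit, each sampled state is handled in a single step.

```python
from environment import SampleEnv

env = SampleEnv()
state = env.reset()                        # observe a new state (a "person")
action = env.action_space.sample()         # replace with your agent's choice of action index
_, reward, done, info = env.step(action)   # one step per state in a contextual bandit
```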
Other useful methods include the following (see the usage sketch after the list):
* `evaluate_agent(agent)` --- compute the *deterministic* performance of the agent's policy on the environment
* `restart()` --- fully recreate the environment; should be called between the training of different agents for reproducibility
* `observe(num=1)` --- observe new states; identical to `reset` but can sample multiple states (`num`) simultaneously
* `compute_reward(s,a_ind)` --- compute the *normalized* reward for a state `s` and an action index `a_ind`
* `compute_reward_raw(s,a)` --- compute the *un-normalized* reward value of a state-action pair `(s,a)`
* `print_action_histogram()` --- print the histogram of the optimal actions; ideally a trained agent's action distribution should resemble this histogram
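
A hedged sketch of how these methods could fit into an experiment; the `agent` object, its training loop, and the batch/iteration counts are placeholders, and the interface expected by `evaluate_agent` is defined in `environment.py`.

```python
from environment import SampleEnv

def run_experiment(agent, num_batches=100, batch_size=32):
    """Sketch of a training/evaluation loop using the auxiliary methods."""
    env = SampleEnv()
    env.restart()                             # fresh environment for reproducibility
    for _ in range(num_batches):
        states = env.observe(num=batch_size)  # batch of states ("persons")
        # ... choose action indices with the agent, collect rewards, update the agent ...
    score = env.evaluate_agent(agent)         # deterministic performance of the trained policy
    env.print_action_histogram()              # compare with the optimal-action histogram
    return score
```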

## Results
By default the reward values returned by the environment are *normalized*, i.e. the optimal reward for any state `s` is `1` and the average reward is `0`.
Hence any sensible agent should achieve a positive return, and the optimal agent has a return of `1`.
For example, the current baseline agent achieves a performance score of `0.2318`.

The intended outcome is to train an agent that demonstrates *good* performance, e.g. `> 0.8` or so.
If you manage to obtain such an agent, please let me know!