{"id":29654557,"url":"https://github.com/taylor-eos/cell-relation","last_synced_at":"2025-07-22T07:34:57.419Z","repository":{"id":295984770,"uuid":"991906722","full_name":"Taylor-eOS/cell-relation","owner":"Taylor-eOS","description":"A learning project that gets a transformer to handle a 2D environment through cell-to-cell relation embeddings","archived":false,"fork":false,"pushed_at":"2025-06-13T09:15:50.000Z","size":169,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-13T10:27:52.954Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Taylor-eOS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-28T10:24:31.000Z","updated_at":"2025-06-13T09:15:53.000Z","dependencies_parsed_at":"2025-06-13T10:34:10.175Z","dependency_job_id":null,"html_url":"https://github.com/Taylor-eOS/cell-relation","commit_stats":null,"previous_names":["taylor-eos/cell-relation-transformer","taylor-eos/cell-relation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Taylor-eOS/cell-relation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Fcell-relation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Fcell-relation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Fcell-relation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/Taylor-eOS%2Fcell-relation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Taylor-eOS","download_url":"https://codeload.github.com/Taylor-eOS/cell-relation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Taylor-eOS%2Fcell-relation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266448574,"owners_count":23930244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-22T07:34:56.756Z","updated_at":"2025-07-22T07:34:57.396Z","avatar_url":"https://github.com/Taylor-eOS.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![gridworld](https://github.com/user-attachments/assets/7f19209a-9252-441d-b09c-6297b31b821d)\n\nThis is a learning project built around a deliberate constraint: to test whether a transformer can learn 2D spatial navigation without being given shortcuts. There are no intermediate rewards, no temporal embeddings, and no architectural tricks to inject memory or sequence into the policy. 
The goal isn't to solve the navigation problem in the most efficient way; it is to see whether the pattern-recognition machinery of a transformer can bootstrap intelligent behavior purely from end-goal feedback.\u003cbr\u003e\nThe idea was to test whether a general-purpose architecture could handle a domain it wasn’t made for, not because it should be good at it, but precisely because it shouldn’t. Watching where it breaks was the point. The spatial domain was picked as a simple, controlled substrate in which intelligence could be observed as emergent generalization over structured patterns, like \"walls block\" or \"go around\". The goal was conceptual abstraction: can a transformer build internal representations that navigate a decision space in which, unlike in LLMs, sequence order is not the structure?\u003cbr\u003e\nThe project constructs a controlled curriculum to gently introduce the transformer-based agent to navigation tasks, beginning with a world-building pretraining stage, followed by trivial guaranteed-success cases that increase in complexity. This scaffolding is necessary to circumvent the sparse-reward problem in RL without modifying the reward function itself. Rather than diluting the learning process with reward shaping or hand-crafted feedback, the project simply trains the agent in scenarios where success is likely enough for it to stumble onto the reward. As the training progresses through structured stages and finally through memorized obstacle configurations, the agent experiences a consistent stream of positive reinforcement associated with correct spatial understanding. The curriculum carefully maintains a solvable distribution of environments so that learning is always grounded in something achievable, while slowly nudging the agent into harder generalization.\u003cbr\u003e\nThe agent doesn’t model time: each decision is made independently, based only on the current grid observation. 
The encoder processes a 3-channel snapshot of agent, goal and walls, and a transformer encodes the spatial relationships between grid cells, including the spatial relation of the agent to every other cell. The agent is timeless by design, both during training and inference.\u003cbr\u003e\nThis sidesteps temporal credit assignment. When an episode is successful, every step in that trajectory is rewarded. The model learns which individual actions worked because they are part of sequences that were rewarded.\u003cbr\u003e\nWhat emerges is a kind of implicit planning behavior. The agent seems to figure out how to get around walls and toward goals because it has learned to encode spatial configurations in a way that makes good actions salient in the present. The result is a trained policy that looks like it's doing something clever, but is actually just reacting well in a rich spatial embedding space. It's a trained reaction function over 2D arrangements. The project isn't about building a great navigation agent; it's about probing the transformer's capacity for pattern induction.\u003cbr\u003e\nWhat’s being trained is a statistical system for recognizing structural correlates. The agent learns to associate abstract configurations of local perception with higher expected returns, based on the patterns that emerge across many trajectories. Rather than isolating which action caused success, it internalizes an inductive bias toward action-context pairs that, in aggregate, tend to lead to reward. This is an emergent sensitivity to latent regularities in the environment. 
Over time, the transformer builds a representation space where certain trajectories align with reward-attracting configurations because they’re statistically downstream of good outcomes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaylor-eos%2Fcell-relation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaylor-eos%2Fcell-relation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaylor-eos%2Fcell-relation/lists"}