{"id":50938208,"url":"https://github.com/simula/soccer-rag","last_synced_at":"2026-06-17T11:04:16.485Z","repository":{"id":233102672,"uuid":"783765018","full_name":"simula/soccer-rag","owner":"simula","description":null,"archived":false,"fork":false,"pushed_at":"2024-07-16T12:18:37.000Z","size":301,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-07-17T12:40:15.718Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simula.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-08T14:24:37.000Z","updated_at":"2024-07-17T12:40:15.719Z","dependencies_parsed_at":"2024-04-17T13:51:07.324Z","dependency_job_id":"586178fd-efcc-4ddb-b2cb-2d0126ba22fc","html_url":"https://github.com/simula/soccer-rag","commit_stats":null,"previous_names":["simula/soccer-rag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/simula/soccer-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fsoccer-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fsoccer-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fsoccer-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fsoccer-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simula","download_url":"https://codeload.github.com/simula/soccer
-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2Fsoccer-rag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34445187,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-17T11:04:15.793Z","updated_at":"2026-06-17T11:04:16.480Z","avatar_url":"https://github.com/simula.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\nsdk: docker\n---\n\u003c!---WARNING!! The snippet above is required for Huggingface Space in https://huggingface.co/spaces/SimulaMet-HOST/SoccerRAG, so don't remove or move this.\nYou need to manually update games.db in space in ./data as space doesn't allow pushing file more than 10MB.\nSushant usually force updates that space repo with Github's version and then uploads the db file manually at https://huggingface.co/spaces/SimulaMet-HOST/SoccerRAG/tree/main/data\n---\u003e\n\n# SoccerRAG: Multimodal Soccer Information Retrieval via Natural Queries\n\n## Abstract\nThe rapid evolution of digital sports media necessitates sophisticated information retrieval systems that can efficiently parse extensive multimodal datasets. 
This work introduces SoccerRAG, an innovative framework designed to harness the power of Retrieval Augmented Generation (RAG) and Large Language Models (LLMs) to extract soccer-related information through natural language queries. By leveraging a multimodal dataset, SoccerRAG supports dynamic querying and automatic data validation, enhancing user interaction and accessibility to sports archives. Our evaluations indicate that SoccerRAG effectively handles complex queries, offering significant improvements over traditional retrieval systems in terms of accuracy and user engagement. The results underscore the potential of using RAG and LLMs in sports analytics, paving the way for future advancements in the accessibility and real-time processing of sports data.\n\n## Environment setup\nThe framework requires Python 3.12.\n````bash\npip install -r requirements.txt\n````\nRename .env_demo to .env and fill in the required fields.\n\n## Setting up the database\n\nBy running \n````bash\npython setup.py\n````\nfrom the project root, all files will be downloaded, and the database will be set up.\nBefore running the setup, make sure to fill in the required fields in the .env file, and run \n````bash\npip install soccernet\n````\nas this package is not in the requirements.txt file.\nExpected setup time is around 10 minutes.\n\nIf you want to download the data and set up the database manually, you can do so by following the instructions below.\n\n### Downloading the data manually\n\nThe data required to run the code is not included in this repository. 
\nThe data can be downloaded from [SoccerNet](https://www.soccer-net.org/data).\nFiles needed are:\n* Labels-v2.json [link](https://www.soccer-net.org/data#h.5klq86rmgt96)\n* Labels-captions.json [link](https://www.soccer-net.org/data#h.ccybjenq8od4)\n\nOne can use the soccernet package to download the data:\n````bash\npip install soccernet\n````\n\n````python\nfrom SoccerNet.Downloader import SoccerNetDownloader\nmySoccerNetDownloader = SoccerNetDownloader(LocalDirectory=\"data/dataset/SoccerNet\")\nmySoccerNetDownloader.downloadGames(files=[\"Labels-caption.json\"], split=[\"train\", \"valid\", \"test\"]) \nmySoccerNetDownloader.downloadGames(files=[\"Labels-v2.json\"], split=[\"train\", \"valid\", \"test\"]) \n````\n\nThe data should be placed in the ./data/Dataset/SoccerNet/ directory.\nFor each league, create a new folder with the name of the league.\nFor each season, create a new folder with the name of the season (YYYY-YYYY).\nFor each game, create a new folder with the name of the game (YYYY-MM-DD - HomeTeam Score - Score AwayTeam).\nIn each game folder, place the Labels-v2.json and Labels-captions.json files.\n\nFor a full guide on how to download the data, please refer to the [SoccerNet package website](https://pypi.org/project/SoccerNet/).\n\n\n### Setting up and populating the database\nTo set up the database, execute the following command:\n````bash\npython src/database.py\n````\nAdjust the path to the data in the database.py file as needed.\n\n## Running the code from the command line\nTo run the code, execute the following command:\n````bash\npython main.py\n````\nThe code will prompt you to enter a natural language query.\n\nYou can also call main_cli.py with a query as an argument:\n````bash\npython main_cli.py -q \"How many goals has Messi scored each season?\"\n````\n\n## Running the code in ChainLit (GUI)\nTo run the code in ChainLit, execute the following command:\n````bash\nchainlit run app.py\n````\nThis will open up a browser window with the GUI. 
\n![ChainLit](media/chainlit.png)\n\n### Example query\n````angular2html\nEnter a query: How many goals has Messi scored each season?\nLionel Messi has scored the following number of goals each season:\n- 2014-2015: 13 goals\n- 2015-2016: 3 goals\n- 2016-2017: 31 goals\n````\n\n\n## Results\n\nSample questions (Q1-Q20) and corresponding results can be found below.\n\n- **Question 1:** Is Manchester United in the database?\n- **Question 2:** Give me the total home goals for Bayern M in the 2014-15 season.\n- **Question 3:** Calculate home advantage for Real Madrid in the 2015-16 season\n- **Question 4:** How many goals did Messi score in the 15-16 season?\n- **Question 5:** How many yellow-cards did Enzo Perez get in the 15-2016 season?\n- **Question 6:** List all teams that played a game against Napoli in 2016-17 season in seriea? Do not limit the number of results\n- **Question 7:** Give all the teams in the league ucl in the 2015-2016 season?\n- **Question 8:** Give me all games in epl with yellow cards in the first half in the 2015-2016 season\n- **Question 9:** What teams and leagues has Adnan Januzaj play in?\n- **Question 10:** List ALL players that started a game for Las Palmas in the 2016-2017 season? Do NOT limit the number of results .\n- **Question 11:** Did Ajax or Manchester United win the most games in the 2014-15 season?\n- **Question 12:** How many yellow and red cards were given in the UEFA Champions League in the 2015-2016 season?\n- **Question 13:** Are Messi and C. Ronaldo in the database?\n- **Question 14:** How many goals did E. Hazard score in the game between Bournemouth and Chelsea in the 2015-2016 season?\n- **Question 15:** How many yellow cards were given in the game between Bayern Munich and Shakhtar Donetsk in the 2014-15 UEFA Champions League, and did anyone receive a red card?\n- **Question 16:** Make a list of when corners happened in the English Premier League (EPL) 2015-2016 season. 
Aggregate by a period of 15 minutes.\n- **Question 17:** What league is Manchester United, Arsenal, Bournemouth, Real Madrid, Chelsea, and Liverpool in?\n- **Question 18:** How many players have \"Aleksandar\" as their first name in the database, and how many goals have they scored in total?\n- **Question 19:** What did the commentary say about the game between Arsenal and Southampton in the 2016-17 season?\n- **Question 20:** Have Mesut Ozil, Pablo Insua, or Alex Pike played for West Ham or Barcelona?\n        \n![result-table.png](media%2Fresult-table.png)\n\n## Acknowledgements\nThis research was partly funded by the Research Council of Norway, project number 346671 ([AI-Storyteller](https://prosjektbanken.forskningsradet.no/project/FORISS/346671)). \n\n## Citation\n```\n@misc{strand2024soccerragmultimodalsoccerinformation,\n      title={SoccerRAG: Multimodal Soccer Information Retrieval via Natural Queries}, \n      author={Aleksander Theo Strand and Sushant Gautam and Cise Midoglu and Pål Halvorsen},\n      year={2024},\n      eprint={2406.01273},\n      archivePrefix={arXiv},\n      primaryClass={cs.IR},\n      url={https://arxiv.org/abs/2406.01273}, \n}\n```\n```\n@misc{strand2024demosoccerinformationretrieval,\n      title={Demo: Soccer Information Retrieval via Natural Queries using SoccerRAG}, \n      author={Aleksander Theo Strand and Sushant Gautam and Cise Midoglu and Pål Halvorsen},\n      year={2024},\n      eprint={2406.01280},\n      archivePrefix={arXiv},\n      primaryClass={cs.IR},\n      url={https://arxiv.org/abs/2406.01280}, \n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fsoccer-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimula%2Fsoccer-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fsoccer-rag/lists"}