{"id":22322441,"url":"https://github.com/danielrosehill/eco-ninja-3","last_synced_at":"2026-02-22T00:02:56.992Z","repository":{"id":266133830,"uuid":"897485483","full_name":"danielrosehill/Eco-Ninja-3","owner":"danielrosehill","description":"Configuration for an LLM assistant that performs analysis on sustainability data","archived":false,"fork":false,"pushed_at":"2024-12-04T23:18:12.000Z","size":2514,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-14T16:05:56.369Z","etag":null,"topics":["data-visualization","prompt-engineering","prompting","sustainability"],"latest_commit_sha":null,"homepage":"https://docs.bydanielrosehill.com","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danielrosehill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-02T17:59:34.000Z","updated_at":"2024-12-04T23:18:16.000Z","dependencies_parsed_at":"2024-12-02T19:31:04.731Z","dependency_job_id":"4fedad3e-4345-489f-be9e-36fa25ce555c","html_url":"https://github.com/danielrosehill/Eco-Ninja-3","commit_stats":null,"previous_names":["danielrosehill/eco-ninja-3"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/danielrosehill/Eco-Ninja-3","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielrosehill%2FEco-Ninja-3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielrosehill%2FEco-Ninja-3/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielrosehill%2FEco-Ninja-3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielrosehill%2FEco-Ninja-3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danielrosehill","download_url":"https://codeload.github.com/danielrosehill/Eco-Ninja-3/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielrosehill%2FEco-Ninja-3/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29699340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T23:35:04.139Z","status":"ssl_error","status_checked_at":"2026-02-21T23:35:03.832Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","prompt-engineering","prompting","sustainability"],"created_at":"2024-12-04T01:07:25.041Z","updated_at":"2026-02-22T00:02:56.958Z","avatar_url":"https://github.com/danielrosehill.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Eco Ninja 3: Using LLMs To Compare Sustainability \u0026 Performance Data For Publicly Listed Companies\n\n![alt text](images/sloth-calculating-emissions.webp)\n\n## How far can LLMs be stretched for data retrieval and analysis?\n\n[![Try 
on Hugging Face](https://img.shields.io/badge/Try%20on-Hugging%20Face-orange)](https://hf.co/chat/assistant/674dfb83203be059afb0da43)\n\n*02-Dec-24*\n\nMy decision to give this a whimsical name and share this on the internet was motivated primarily by the fact that after spending so much time working on these prompts and configurations it felt out of character *not* to want to pass it along.\n\nMy \"Eco Genie\" prompt has already gone through far more than three rounds of iteration but parking it here seems like a comfortable place for version control: it's had a few fits and starts and requires much more tinkering. \n\n\"Eco Ninja\" is *not* a prompting strategy that I expected to work out of the box. I've played around with various ways to run this including LangChain, as an \"assistant\" (I think the most logical implementation) and - for testing, etc - casual prompting. \n\nSo far, I've been pleasantly surprised by the accuracy, which has held up over runs of a couple of hundred inputs (for LangChain, I used an input file called `companies.txt` to provide a list to iterate through). I like testing this agent and prompt config because it represents, for me, something of a moonshot idea in seeing how far chain-of-thought prompting can stretch way beyond the realm of conversational instruction into more exploratory territory. \n\nAsking for only one humble variable from the user, the config throws an *awful* lot of wants at the kitchen sink of whatever large language model is unlucky enough to run into it (I experimented with a more elaborate version but thought it not worth including):\n\n- Retrieve the right data!  \n- Parse those long sustainability PDFs!  \n- Retrieve the exact correct metric from a long table of lookalikes!  \n- Perform some calculations!  \n- Summarise and format the output\n\nTo my mind, it represents a good challenge to see what LLMs enriched with RAG and computational augments *are* already capable of. 
Its accuracy is imperfect and verification and review of its findings are a must, but it has nevertheless consistently surprised me.\n\n## Test URLs\n\nChatGPT - not public\n\n## Validation parameter\n\nExxon Mobil Corporation (XOM:NYSE) has been chosen as a validation benchmark because of its sizeable scope 3 emissions.\n\nThe validated data for Exxon is recorded in the `validation` folder.\n\nSimilar validation documents can be used for accuracy assessment / QA / evaluation.\n\n## Config For Agent\n\nNested under:\n\n`agent-config`\n\nTry on:\n\n- Hugging Face  \n- OpenAI Platform  \n- Etc\n\n## Agent --\u003e Prompt\n\nTo \"convert\" this to a prompt template, rewrite it (or ask an LLM to!). This works nicely with LangChain.\n\n\n## Versions\n\nCore prompt body:\n\nV3 - about 356 words\n\n488 tokens (GPT-4o)\n\n## Model Requirements\n\n- Choose an instructional model  \n- Training data cutoff must be after Y/E 2023 *or* a RAG pipeline must supply data beyond that  \n- OpenAI `omni` series is recommended.\n\n## Prompting Strategies\n\n- The thumbnail request was added as I was working on creating a visualisation for this in Streamlit\n- The random ID would be better scripted but as a second best I added it into the config. The idea is that it could facilitate correlating similar reports after the first-pass data generation. \n\n## Output Formatting Methods\n\nTo ingest data in a standardised format, the strategy I've been experimenting with is to provide a `CSV` header row with the instruction to adhere exactly to this format. 
Something very like:\n\n```csv\ncompany,company name,sector,country,reporting year,scope 1 emissions,scope 2 emissions,scope 3 emissions,scope 1 emissions monetized,scope 2 emissions monetized,scope 3 emissions monetized,scope 1 units,scope 2 units,scope 3 units,units text,report URL,report title,report publication date,all scopes,all scopes monetized,EBITDA,EBITDA source,EBITDA URL,EBITDA reporting year,monetized emissions over EBITDA ratio,monetization rate,scope 3 reporting type\n```\n\n## Experiment: Hey LLM, What Would YOU Add?!\n\nI *love* the fact that working with prompts is such fertile ground for creative techies. \n\nI like seeing if an LLM can ideate any variables that it may wish to scoop into its output (yes, I'm a polite prompter):\n\n\u003ePlease generate one iteration on this prompt template. Its purpose is to retrieve information to analyze the potential correlation in different industrial sectors between sustainability performance and financial profitability. You may add up to five more values and you have total liberty to think what those might be. Just make sure to add them into the main body of the template and also add them as values in the CSV row to ensure that the data is returned in the same format. Here is the current template. {template}","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielrosehill%2Feco-ninja-3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanielrosehill%2Feco-ninja-3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielrosehill%2Feco-ninja-3/lists"}