https://github.com/ashish-soni08/frugal-ai-challenge

co-located with the 2025 AI Action Summit

Host: GitHub
URL: https://github.com/ashish-soni08/frugal-ai-challenge
Owner: Ashish-Soni08
License: mit
Created: 2025-01-17T21:34:37.000Z
Default Branch: main
Last Pushed: 2025-01-31T18:48:27.000Z
Last Synced: 2025-01-31T19:30:30.923Z
Topics: ai-deployment, climate-challenges, efficiency-and-performance
Language: Jupyter Notebook
Homepage: https://frugalaichallenge.org/
Size: 69.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
pretty_name: Frugal AI Challenge 2025 - Text - Climate Disinformation
size_categories:
- 1K<n<10K
---

```bash
# create a virtual environment named frugal-ai
python -m venv frugal-ai
```

```bash
# activate the environment
source frugal-ai/bin/activate
```

```bash
# deactivate the virtual environment
deactivate
```

```bash
# install Jupyter and ipykernel (needed to register the environment as a notebook kernel)
pip install jupyter ipykernel
```

```bash
# add the virtual environment as a kernel for the jupyter notebook
python -m ipykernel install --user --name=frugal-ai --display-name="Py3.12-frugal-ai"
```

```bash
# verify kernel installation
jupyter kernelspec list
```

```bash
# if needed, remove the kernel again
jupyter kernelspec uninstall frugal-ai
```
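After you switch a notebook to the `Py3.12-frugal-ai` kernel, a quick standard-library check can confirm that the cells actually run on the virtual environment's interpreter and not on the system Python. This is a minimal sketch. The helper names `running_in_venv` and `venv_name` are illustrative. The check relies on the documented `venv` behavior: inside an environment, `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base installation.

```python
import sys
from pathlib import Path
from typing import Optional


def running_in_venv() -> bool:
    """True when the interpreter belongs to a virtual environment.

    In a venv, sys.prefix points at the environment directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def venv_name() -> Optional[str]:
    """Directory name of the active venv (e.g. 'frugal-ai'), or None outside a venv."""
    return Path(sys.prefix).name if running_in_venv() else None


# Run in a notebook cell; on the frugal-ai kernel this should report True and 'frugal-ai'
print("interpreter:", sys.executable)
print("in venv:", running_in_venv(), "| env:", venv_name())
```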

```bash
# codecarbon: a Python package that estimates the hardware energy consumption (CPU + GPU + RAM) of your program.

pip install codecarbon
```
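codecarbon does the measuring for you. For example, its `EmissionsTracker` has `start()` and `stop()` methods, and `stop()` returns the emissions in kg CO2eq. The sketch below does not reproduce codecarbon's implementation. It only spells out the arithmetic behind the number: energy is average power multiplied by time, and emissions are energy multiplied by the grid's carbon intensity. The power draws and the carbon intensity are made-up values, and the helper names are hypothetical.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


@dataclass
class PowerDraw:
    """Average power draw per component, in watts (values below are hypothetical)."""
    cpu_w: float
    gpu_w: float
    ram_w: float

    @property
    def total_w(self) -> float:
        return self.cpu_w + self.gpu_w + self.ram_w


def energy_kwh(draw: PowerDraw, duration_s: float) -> float:
    """Energy (kWh) = average power (W) x duration (s), converted from joules."""
    return draw.total_w * duration_s / JOULES_PER_KWH


def emissions_kg(energy: float, carbon_intensity: float) -> float:
    """Emissions (kg CO2eq) = energy (kWh) x grid carbon intensity (kg CO2eq/kWh)."""
    return energy * carbon_intensity


# Hypothetical 1-hour run: 45 W CPU + 250 W GPU + 5 W RAM = 300 W
draw = PowerDraw(cpu_w=45.0, gpu_w=250.0, ram_w=5.0)
e = energy_kwh(draw, duration_s=3600)
co2 = emissions_kg(e, carbon_intensity=0.4)  # hypothetical grid intensity
print(f"{e:.3f} kWh -> {co2:.3f} kg CO2eq")  # 0.300 kWh -> 0.120 kg CO2eq
```

The same duration can therefore produce very different emissions depending on the hardware's power draw and on where the electricity comes from.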

## CHOSEN TASK: 📝 Detecting climate disinformation 📝: based on text from news articles

A major problem is the rise of climate-related disinformation. A recent [scientific article](https://www.nature.com/articles/s41598-023-50591-6) estimated that **`14.8%`** of Americans do not believe in anthropogenic climate change, and much of this is probably due to disinformation. Because climate change is one of the central threats to the wellbeing of our societies, this problem must be addressed urgently.

Therefore, the task is to **detect climate disinformation in news articles.**

## Dataset

### Dataset Summary

A comprehensive collection of **~6000** `climate-related quotes and statements`, specifically `focused on identifying and categorizing climate disinformation narratives`. The dataset combines quotes and statements from various media sources, including television, radio, and online platforms, to help train models that can identify different types of climate disinformation claims. The labels come from a simplified version of the [CARDS taxonomy](https://cardsclimate.com/) that keeps only its **7** main disinformation categories. A `0_not_relevant` label is added for statements that make none of these claims, giving 8 labels in total.

## Dataset Creation

### Curation and Annotation

This dataset was compiled to help identify and understand common climate disinformation narratives in media and public discourse. **`It serves as a tool for training models that can automatically detect and categorize climate disinformation claims.`**

The dataset combines data from two main sources curated by the QuotaClimat & Data For Good team.

1. [DeSmog](https://www.desmog.com/climate-disinformation-database/) climate disinformation database: quotes were extracted and annotated with GPT-4o-mini, then manually validated.

2. [FLICC dataset](https://huggingface.co/datasets/fzanartu/FLICCdataset) from the paper ["Detecting Fallacies in Climate Misinformation: A Technocognitive Approach to Identifying Misleading Argumentation"](https://arxiv.org/abs/2405.08254) by Francisco Zanartu, John Cook, Markus Wagner, and Julian Garcia: quotes were re-annotated with GPT-4o-mini, then manually validated.

### Data Dictionary

| Field Name | Data Type | Description | Possible Values/Information | Example Value |
|------------|-----------|-------------|-----------------------------|---------------|
| `quote` | string | The actual quote or statement related to climate change. This field contains the textual data that is labeled for climate disinformation. | 6091 unique values (matches the number of rows in the dataset). These quotes are drawn from various media sources, including television, radio, and online platforms. | "There is clear, compelling evidence that many of the major conclusions of the IPCC, your new religions constantly-changing Holy Book, are based on evidence that has been fabricated. The hockey stick graph that purported to abolish the mediaeval warm period is just one example." |
| `label` | string | The category of the climate-related claim. This field represents the type of climate disinformation or related claim present in the `quote`. The labels are based on a simplified version of the CARDS taxonomy. | **Possible Values:** `0_not_relevant`, `1_not_happening`, `2_not_human`, `3_not_bad`, `4_solutions_harmful_unnecessary`, `5_science_unreliable`, `6_proponents_biased`, `7_fossil_fuels_needed`. The numeric prefix of each label is its category number (0-7), as described in the label table below. | "5_science_unreliable" |
| `source` | string | The source of the quote or claim. This indicates the origin of the quote within the dataset. | **Possible Values:** `FLICC`, `Desmog`.<br>**FLICC:** Quotes sourced from the FLICC dataset, which focuses on detecting fallacies in climate misinformation. This data was re-annotated for this challenge.<br>**Desmog:** Quotes sourced from the DeSmog climate disinformation database, which were extracted and annotated for this challenge. | "FLICC" |
| `url` | string | The URL associated with the quote, pointing to the original source or a relevant page where the quote was found. | 780 unique values. These URLs represent a variety of sources, including news articles, websites, and potentially social media platforms. Note that external links may become inactive over time. | "https://huggingface.co/datasets/fzanartu/FLICCdataset" |
| `language` | string | The language of the quote. | **Possible Value:** `"en"` (English). All quotes in this dataset are in English. | "en" |
| `subsource` | string | A more specific sub-source or category within the main `source`. This can indicate the specific dataset or split from which the quote originated or a detail about the annotation process. | **Possible Values:** `CARDS`, `hamburg_test1`, `hamburg_test3`, `jindev`, `jintrain`, `hamburg_test2`, `Alhindi_train`, `jintest`, `Alhindi_dev`, `Alhindi_test`, `None`. These values likely correspond to different subsets or annotation stages from the original data sources or specific test sets created for the challenge. | "CARDS" |
| `id` | null | An identifier for the quote. | This field is consistently `null`, indicating that unique identifiers for each quote were not included in this version of the dataset. | `None` |
| `__index_level_0__` | int64 | The original index or row number in the dataset. This likely represents the row number in the original data file before any processing or splitting. | 6091 unique values (matches the number of rows in the dataset). This field likely serves as an internal index or identifier within the dataset structure. | 0 |
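When loading the data, each row can be checked against the data dictionary above. The sketch below is illustrative only. `validate_row` is a hypothetical helper, and its allowed values are copied from the table. `subsource` is not checked because the table does not say whether its `None` value is a string or a null.

```python
from typing import Any

# Allowed values copied from the data dictionary above
LABELS = {
    "0_not_relevant", "1_not_happening", "2_not_human", "3_not_bad",
    "4_solutions_harmful_unnecessary", "5_science_unreliable",
    "6_proponents_biased", "7_fossil_fuels_needed",
}
SOURCES = {"FLICC", "Desmog"}
STRING_FIELDS = ("quote", "label", "source", "url", "language")


def validate_row(row: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return the problems found in one row (empty list = valid)."""
    problems = []
    for field in STRING_FIELDS:
        if not isinstance(row.get(field), str):
            problems.append(f"{field}: expected a string")
    if row.get("label") not in LABELS:
        problems.append(f"label: unknown value {row.get('label')!r}")
    if row.get("source") not in SOURCES:
        problems.append(f"source: unknown value {row.get('source')!r}")
    if row.get("language") != "en":
        problems.append(f"language: expected 'en', got {row.get('language')!r}")
    if row.get("id") is not None:
        problems.append("id: expected null")
    # `subsource` is deliberately not checked (string vs. null is unspecified)
    return problems


# Example row built from the data dictionary's example values
row = {
    "quote": "The hockey stick graph that purported to abolish the mediaeval warm period is just one example.",
    "label": "5_science_unreliable",
    "source": "FLICC",
    "url": "https://huggingface.co/datasets/fzanartu/FLICCdataset",
    "language": "en",
    "subsource": "CARDS",
    "id": None,
    "__index_level_0__": 0,
}
print(validate_row(row))  # []
```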

#### Labels and Their Descriptions

| Label | Description | Example Keywords/Concepts |
|-------|-------------|---------------------------|
| **0_not_relevant** | No relevant claim detected or claims that don't fit other categories. | General discussion, unrelated topics, acknowledgments, procedural statements. |
| **1_not_happening** | Claims denying the occurrence of global warming and its effects. | Global warming is not happening, climate change is a hoax, no warming, it's getting colder, no melting ice, no sea level rise, extreme weather is normal, cold weather proves no warming. |
| **2_not_human** | Claims denying human responsibility in climate change. | Greenhouse gases from humans are not causing climate change, natural cycles, solar activity, volcanoes are the cause, it's a natural phenomenon. |
| **3_not_bad** | Claims minimizing or denying negative impacts of climate change. | The impacts of climate change will not be bad, might even be beneficial, it's not a threat, it's exaggerated, plants will benefit from more CO2, warmer temperatures are good. |
| **4_solutions_harmful_unnecessary** | Claims against climate solutions. | Climate solutions are harmful, unnecessary, expensive, ineffective, renewable energy won't work, electric vehicles are bad, carbon taxes are a scam, the Green New Deal is a disaster, it's too late to do anything. |
| **5_science_unreliable** | Claims questioning climate science validity. | Climate science is uncertain, unsound, unreliable, biased, flawed models, scientists are manipulating data, it's just a theory, the data is wrong, historical records don't support it, consensus is not science. |
| **6_proponents_biased** | Claims attacking climate scientists and activists. | Climate scientists are alarmist, biased, wrong, hypocritical, corrupt, politically motivated, they are in it for the money, activists are exaggerating, it's a political agenda. |
| **7_fossil_fuels_needed** | Claims promoting fossil fuel necessity. | We need fossil fuels for economic growth, prosperity, to maintain our standard of living, renewable energy is not reliable, fossil fuels are essential, they provide cheap energy, transitioning away from fossil fuels will hurt the economy. |
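Every label string starts with its category number, so you can read the integer ids 0-7 directly from the prefix. This is a minimal sketch for cases where you need `label2id` / `id2label` dictionaries. Those are the names Hugging Face `transformers` uses in a model config. Nothing here is taken from the repository's notebook.

```python
LABELS = [
    "0_not_relevant",
    "1_not_happening",
    "2_not_human",
    "3_not_bad",
    "4_solutions_harmful_unnecessary",
    "5_science_unreliable",
    "6_proponents_biased",
    "7_fossil_fuels_needed",
]


def label_to_id(label: str) -> int:
    """Parse the numeric prefix before the first underscore, e.g. '3_not_bad' -> 3."""
    prefix, _, _ = label.partition("_")
    return int(prefix)


label2id = {label: label_to_id(label) for label in LABELS}
id2label = {idx: label for label, idx in label2id.items()}

print(label2id["5_science_unreliable"])  # 5
print(id2label[7])  # 7_fossil_fuels_needed
```

Parsing the prefix, rather than relying on list order, keeps the mapping correct even if the labels arrive shuffled, as they do in the data dictionary's raw value list.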

## Training Data

The models use the `QuotaClimat/frugalaichallenge-text-train` dataset:

- Size: ~6000 examples
- Split: 80% train, 20% test
- 8 labels: 7 climate disinformation categories plus `0_not_relevant`
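The README gives an 80/20 split but does not say how it was drawn. One common approach is to stratify by label so that each category keeps roughly the same share in both parts. This sketch assumes that approach; it is not the repository's confirmed procedure. `scikit-learn`'s `train_test_split(..., test_size=0.2, stratify=labels)` does the same job. The standard-library version below uses an arbitrary fixed seed for reproducibility and hypothetical toy labels.

```python
import random
from collections import defaultdict
from typing import Sequence


def stratified_split(
    labels: Sequence[str], test_fraction: float = 0.2, seed: int = 42
) -> tuple[list[int], list[int]]:
    """Return (train_idx, test_idx), holding out ~test_fraction of each label."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy labels: 50 + 30 + 20 = 100 examples
labels = ["0_not_relevant"] * 50 + ["1_not_happening"] * 30 + ["3_not_bad"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```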

## Personal and Sensitive Information

The dataset contains publicly available statements and quotes. Care has been taken to focus on the claims themselves rather than personal information about individuals.
