https://github.com/yashksaini-coder/learning-agency-lab

The main objective is to train a model that can score the student essays. It can reduce the high expense and time required to hand grade these essays.

Host: GitHub
URL: https://github.com/yashksaini-coder/learning-agency-lab
Owner: yashksaini-coder
License: gpl-3.0
Created: 2024-04-15T06:04:41.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-05-15T19:01:13.000Z (about 1 year ago)
Last Synced: 2024-05-17T06:21:27.012Z (about 1 year ago)
Language: HTML
Size: 12.1 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Learning Agency Lab - Automated Essay Scoring 2.0

**Learning Agency Lab - Automated Essay Scoring 2.0:** The goal is to build a model that accurately predicts the score an essay deserves based solely on its text. The competition aims to improve student learning outcomes by providing timely and reliable feedback to overburdened educators.

## Problem Statement

Essay writing is a crucial method to evaluate student learning and performance, but it is time-consuming for educators to grade manually.

- **Automated Writing Evaluation (AWE)** systems can assist in scoring essays, providing students with regular and timely feedback. However, many advancements in AWE are not widely accessible due to cost barriers. Open-source solutions are needed to make AWE technology available to every community, especially underserved ones.

## Competition Objective

The objective of this competition is to train a model that scores student essays accurately. Such a model would reduce the high cost and time required for manual grading, making it feasible to introduce essays into testing, a key indicator of student learning.

## Dataset
The competition dataset comprises about 24,000 student-written argumentative essays. Each essay was scored on a scale of 1 to 6 (see the Holistic Scoring Rubric). The goal is to predict the score an essay received from its text.
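Because every score is a whole number from 1 to 6, a model that produces continuous outputs (for example, a regression model; the README does not say which kind of model this project uses) needs its predictions mapped back onto valid scores. The sketch below shows one simple option, rounding and then clipping to the valid range; the function name `to_valid_score` is hypothetical.

```python
def to_valid_score(raw: float, low: int = 1, high: int = 6) -> int:
    """Map a continuous prediction onto an integer score in [low, high] (hypothetical helper)."""
    # round() with no ndigits returns an int; exact .5 ties go to the nearest even number
    rounded = round(raw)
    # Clip so that predictions outside the scale still yield a valid score
    return min(high, max(low, rounded))


predictions = [0.4, 2.5, 3.7, 6.9]
print([to_valid_score(p) for p in predictions])  # [1, 2, 4, 6]
```

Note that Python's `round()` sends 2.5 to 2 rather than 3. Tuning the cut points between scores instead of using plain rounding is another possible design, but that goes beyond what the README describes.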

***File and Field Information***

| File Name | Description | Fields |
|-----------|-------------|--------|
| `train.csv` | Essays and scores to be used as training data | `essay_id`, `full_text`, `score` |
| `test.csv` | Essays to be used as test data | `essay_id`, `full_text` |
| `sample_submission.csv` | A submission file in the correct format | `essay_id`, `score` |

Each file contains specific fields:

- `train.csv`: Contains essays along with their unique ID (`essay_id`), the full text of the essay (`full_text`), and the holistic score of the essay on a 1-6 scale (`score`).
- `test.csv`: Contains essays to be used as test data, including their unique ID (`essay_id`) and the full text of the essay (`full_text`). This file does not include the `score` field.
- `sample_submission.csv`: A submission file template with the correct format for submission. It includes the unique ID of each essay (`essay_id`) and a placeholder for the predicted holistic score of the essay on a 1-6 scale (`score`).
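A minimal sketch of reading these files and checking the fields listed above, using only the standard library `csv` module. The helper `read_essays`, the constants `TRAIN_FIELDS` and `TEST_FIELDS`, and the sample row are hypothetical; the sample is made up and only stands in for `train.csv`.

```python
import csv
import io

TRAIN_FIELDS = {"essay_id", "full_text", "score"}
TEST_FIELDS = {"essay_id", "full_text"}


def read_essays(handle, required_fields):
    """Read an essay CSV into a list of dicts, checking columns and (if present) score range."""
    reader = csv.DictReader(handle)
    missing = required_fields - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        if "score" in required_fields:
            row["score"] = int(row["score"])
            # train.csv scores are holistic ratings on a 1-6 scale
            if not 1 <= row["score"] <= 6:
                raise ValueError(f"score out of range for {row['essay_id']}: {row['score']}")
        rows.append(row)
    return rows


# Made-up, in-memory stand-in for train.csv; quoting keeps commas inside full_text intact
sample = io.StringIO('essay_id,full_text,score\nabc1234,"Cars are useful, but costly.",4\n')
rows = read_essays(sample, TRAIN_FIELDS)
print(rows[0]["essay_id"], rows[0]["score"])  # abc1234 4
```

With the real data, open the file with `newline=""` so that line breaks inside quoted essays are parsed correctly, for example `with open("train.csv", newline="", encoding="utf-8") as f: train = read_essays(f, TRAIN_FIELDS)` (the path and encoding are assumptions). `pandas.read_csv` is a common alternative; the standard-library version simply avoids the dependency.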


## Evaluation

Submissions are scored with the quadratic weighted kappa, which measures the agreement between two ratings (here, the actual and the predicted scores). The metric typically ranges from 0 (random agreement) to 1 (complete agreement); if there is less agreement than expected by chance, it can fall below 0.

The quadratic weighted kappa is calculated as follows. First, an N-by-N histogram matrix $O$ is constructed, such that $O_{i,j}$ corresponds to the number of essays (`essay_id`s) with actual score $i$ that received a predicted score $j$. An N-by-N matrix of weights, $w$, is calculated based on the difference between actual and predicted values:

$$w_{i,j} = \frac{(i-j)^2}{(N-1)^2}$$

An N-by-N histogram matrix of expected outcomes, $E$, is calculated assuming that there is no correlation between values. It is the outer product of the histogram vector of actual scores and the histogram vector of predicted scores, normalized so that $E$ and $O$ have the same sum.

From these three matrices, the quadratic weighted kappa is calculated as:

$$\kappa = 1 - \frac{\sum_{i,j} w_{i,j}\, O_{i,j}}{\sum_{i,j} w_{i,j}\, E_{i,j}}$$
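The three matrices described above translate directly into code. The following is a minimal, dependency-free sketch, not the competition's official scorer, assuming integer scores on the 1 to 6 scale (so N = 6). Internally the indices are shifted to start at 0, which does not change the result because the weights depend only on the difference $i - j$. The function name `quadratic_weighted_kappa` is hypothetical.

```python
def quadratic_weighted_kappa(actual, predicted, min_score=1, max_score=6):
    """Quadratic weighted kappa between two equal-length lists of integer scores."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    n = max_score - min_score + 1  # N: number of possible scores

    # O[i][j]: number of essays with actual score i that received predicted score j
    observed = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        if not (min_score <= a <= max_score and min_score <= p <= max_score):
            raise ValueError(f"score out of range: actual={a}, predicted={p}")
        observed[a - min_score][p - min_score] += 1

    # Histogram vectors of actual scores (row sums) and predicted scores (column sums)
    hist_actual = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    total = len(actual)

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            # E[i][j]: outer product of the histograms, divided by total so sum(E) == sum(O)
            expected = hist_actual[i] * hist_pred[j] / total
            numerator += w * observed[i][j]
            denominator += w * expected

    if denominator == 0:
        raise ValueError("kappa is undefined when the expected weighted disagreement is zero")
    return 1.0 - numerator / denominator


print(quadratic_weighted_kappa([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]))  # 1.0
print(quadratic_weighted_kappa([1, 2, 3], [1, 2, 4]))  # 6/7, about 0.857
```

In the second call, only one essay is off by one point, so the weighted observed disagreement is small relative to the chance-expected disagreement, and the kappa stays close to 1. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(actual, predicted, labels=[1, 2, 3, 4, 5, 6], weights="quadratic")` is a convenient cross-check; passing `labels` explicitly matters, because otherwise scikit-learn indexes only the scores that actually appear, which changes the weights when some scores are missing.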

## Submission File

For each essay_id in the test set, participants must predict the corresponding score. The submission file should contain a header and have the following format:

```
essay_id,score
000d118,3
000fe60,3
001ab80,4
```
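One way to produce a file in this format is sketched below with the standard library `csv` module. The helper name `write_submission` is hypothetical, and the example IDs and scores are simply the ones from the sample above.

```python
import csv
import io


def write_submission(handle, essay_ids, scores):
    """Write an essay_id,score CSV (with header) in the format shown above."""
    if len(essay_ids) != len(scores):
        raise ValueError("need exactly one score per essay_id")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["essay_id", "score"])  # the header row is required
    for essay_id, score in zip(essay_ids, scores):
        # int() truncates, so round continuous predictions before calling this
        writer.writerow([essay_id, int(score)])


# Write to an in-memory buffer to show the resulting text
buffer = io.StringIO()
write_submission(buffer, ["000d118", "000fe60", "001ab80"], [3, 3, 4])
print(buffer.getvalue(), end="")
```

To write an actual file, pass a handle opened with `open("submission.csv", "w", newline="")`; the file name `submission.csv` is an assumption here, so check the competition's submission instructions.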

---
For detailed instructions, guidelines, and access to the dataset, please visit the competition page on Kaggle: [Learning Agency Lab - Automated Essay Scoring 2.0](https://www.kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2)
