https://github.com/vidhi1290/scienceqa-insights-exploring-with-llms
Predictive Text Analysis project! This repository contains code for predicting answers to science exam questions using advanced natural language processing techniques. Check out the code and results!
- Host: GitHub
- URL: https://github.com/vidhi1290/scienceqa-insights-exploring-with-llms
- Owner: Vidhi1290
- Created: 2023-10-12T12:32:47.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-16T07:57:13.000Z (almost 2 years ago)
- Last Synced: 2025-02-02T18:33:25.503Z (8 months ago)
- Topics: interactive-visualizations, kaggle, kaggle-competition, machine-learning, multi-class-classification, nlp, nlp-machine-learning, predictive-text-analysis, random-forest-classifier, text-analysis, text-vectorization
- Language: Jupyter Notebook
- Homepage:
- Size: 5.29 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🚀 Predictive Text Analysis for Science Exams 🚀
Welcome to our Predictive Text Analysis project! This repository contains code for predicting answers to science exam questions using advanced natural language processing techniques.
## 📚 Dataset Used
We used a dataset of science exam questions. Each record contains a question (`prompt`) and five answer choices (`A`, `B`, `C`, `D`, `E`). The dataset was curated to cover diverse and meaningful questions for analysis.
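As a rough illustration of this structure, the sketch below builds a tiny stand-in table and computes prompt lengths and word frequencies. The rows are invented for the example, and the `answer` column name and the `top_words` helper are assumptions, not taken from the notebooks.

```python
from collections import Counter

import pandas as pd

# Hypothetical rows; the real dataset is only assumed to share these columns.
df = pd.DataFrame({
    "prompt": [
        "Which gas do plants absorb during photosynthesis?",
        "Which planet is closest to the Sun?",
        "Which particle carries a negative charge?",
    ],
    "A": ["Oxygen", "Mercury", "Proton"],
    "B": ["Carbon dioxide", "Venus", "Neutron"],
    "C": ["Nitrogen", "Earth", "Electron"],
    "D": ["Helium", "Mars", "Photon"],
    "E": ["Argon", "Jupiter", "Quark"],
    "answer": ["B", "A", "C"],  # assumed name for the correct-choice column
})

# Prompt length measured in whitespace-separated words.
df["prompt_word_count"] = df["prompt"].str.split().str.len()


def top_words(texts, n=5):
    """Return the n most common lowercase words across all texts."""
    counter = Counter()
    for text in texts:
        counter.update(word.strip("?.,!").lower() for word in text.split())
    return counter.most_common(n)


# Number of questions per correct answer letter.
class_distribution = df["answer"].value_counts()

print(df[["prompt", "prompt_word_count"]])
print(top_words(df["prompt"], n=3))
print(class_distribution)
```

The same counts could feed the bar charts and word clouds, although the exact plotting code in this repository may differ.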
## 🔍 Features
- **Prompt Analysis**: We performed in-depth analysis on question prompts, exploring word frequencies, lengths, and semantic patterns.
- **Text Vectorization**: Utilized TF-IDF vectorization to convert textual data into numerical features for machine learning model training.
- **Machine Learning Model**: Implemented a Random Forest Classifier for answer prediction, achieving over 90% accuracy on the test set (see Results).
## 🧠 Model Architecture
Our machine learning model is a Random Forest Classifier, an ensemble of decision trees that handles multi-class classification tasks well. It takes TF-IDF vectorized features as input, which lets it learn patterns in the textual data.
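A minimal sketch of this architecture using scikit-learn is shown below. The training texts are invented, joining each prompt with its answer choices into one string is an assumed encoding, and the hyperparameters are placeholders rather than the values used in this project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical examples: prompt joined with its choices (assumed encoding).
texts = [
    "Which gas do plants absorb? A: oxygen B: carbon dioxide C: nitrogen",
    "Which planet is closest to the Sun? A: Mercury B: Venus C: Earth",
    "Which particle is negatively charged? A: proton B: neutron C: electron",
    "Which gas do humans exhale? A: oxygen B: carbon dioxide C: helium",
    "Which planet is the smallest? A: Mercury B: Saturn C: Jupiter",
    "Which particle orbits the nucleus? A: proton B: neutron C: electron",
]
labels = ["B", "A", "C", "B", "A", "C"]  # correct answer letter per example

# TF-IDF turns each text into a sparse numeric vector; the random forest
# then learns to map those vectors to one of the answer letters.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(texts, labels)

new_question = ["Which gas do animals exhale? A: helium B: carbon dioxide C: argon"]
predicted = model.predict(new_question)
print(predicted[0])
```

Wrapping both steps in one pipeline keeps the vectorizer fitted on training data only, which avoids leaking test vocabulary into the features.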
## 🌟 Visualization Features
- **Interactive Visualizations**: Explore interactive charts and visualizations, including bar charts representing class distributions and dynamic word clouds showcasing frequently occurring words in questions.
- **3D Scatter Plots**: Dive into 3D scatter plots to uncover correlations between question difficulty, length, and correct answer frequencies.
- **Confusion Matrix**: Visualize the model's performance through an intuitive confusion matrix, providing insights into prediction accuracy.
## 🚀 Usage
1. **Data Preprocessing**: Explore Jupyter Notebooks for in-depth data preprocessing and exploratory data analysis.
2. **Model Training**: Utilize the provided Python scripts to train the Random Forest Classifier and obtain predictions.
3. **Interactive Visualizations**: Run interactive Python scripts for dynamic visualizations of the dataset and model performance.
## 🛠️ Dependencies
- Python 3.7+
- Pandas
- NumPy
- Scikit-Learn
- Matplotlib
- Seaborn
- Plotly
- WordCloud
## 📊 Results
Our trained model achieved an accuracy of over 90% on the test dataset, demonstrating its effectiveness in predicting correct answers to science exam questions.
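For readers who want to reproduce this kind of evaluation, the sketch below shows how accuracy and a confusion matrix can be computed with scikit-learn. The `y_true` and `y_pred` lists are made-up placeholders, not this project's predictions, so the numbers they produce say nothing about the reported result.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels purely for illustration.
y_true = ["A", "B", "C", "A", "B", "E", "D", "C"]
y_pred = ["A", "B", "C", "B", "B", "E", "D", "A"]
labels = ["A", "B", "C", "D", "E"]

# Fraction of questions where the predicted letter matches the true letter.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true letters, columns are predicted letters, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy = {accuracy:.2f}")
print(cm)
```

Off-diagonal cells show which answer letters get confused with each other. The matrix could then be rendered as a heatmap, for example with `seaborn.heatmap(cm, annot=True)`.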
## 🌐 Connect with Me
Let's connect and collaborate! Feel free to reach out to me on:
- **LinkedIn**: [Vidhi Waghela](https://www.linkedin.com/in/vidhi-waghela/)
- **Kaggle**: [Vidhi Kishor Waghela](https://www.kaggle.com/vidhikishorwaghela)
- **GitHub**: [Vidhi1290](https://github.com/Vidhi1290)

I'm always open to discussions, collaborations, and learning new things together. Don't hesitate to drop me a message or explore my other projects on GitHub.
Feel free to dive into the code, experiment with the features, and explore the nuances of predicting answers to science exam questions!
Happy coding! 🚀