https://github.com/mukuta-manit-d/shadowfox

# AI/ML Developer Internship - ShadowFox

This repository showcases the work done during my AI/ML internship at **ShadowFox**, where I contributed to machine learning projects involving **Boston House Price Prediction** and **Car Price Prediction**, both using **Linear Regression**, as well as **GPT-2 Text Generation** and **NLP Exploration**.

## Projects

### 1. Boston House Price Prediction using Linear Regression

In this project, I used the famous **Boston Housing dataset** to predict house prices in Boston using **Linear Regression**. The goal was to build a model that could predict the price of a house based on features such as the number of rooms, location, crime rate, and other socioeconomic factors.

#### Key Steps:
- Data preprocessing: Cleaned the dataset, handled missing values, and encoded categorical variables.
- Model training: Trained a Linear Regression model on the processed dataset.
- Model evaluation: Assessed performance using **Mean Squared Error (MSE)** and **R² score**.
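
As a rough illustration of the training and evaluation steps above, the sketch below fits a one-feature least-squares line and computes MSE and R² by hand. The data are made-up toy values, not the Boston Housing dataset, and the function names are hypothetical. The notebook itself presumably relies on Scikit-learn, which generalizes the same idea to many features.

```python
# Toy data only: these numbers are made up and are NOT the Boston Housing dataset.
rooms = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical single feature
prices = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical target values


def fit_simple_linear_regression(x, y):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_simple_linear_regression(rooms, prices)
predictions = [slope * x + intercept for x in rooms]
mse = mean_squared_error(prices, predictions)
r2 = r2_score(prices, predictions)
print(f"slope={slope:.3f} intercept={intercept:.3f} MSE={mse:.4f} R2={r2:.4f}")
```

For brevity this sketch scores the model on the same points it was fitted to. A real evaluation would hold out a test split and report the metrics on that.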

#### Results:
- **Model**: Linear Regression
- **R² Score**: 0.85
- **MSE**: 23.78

### 2. Car Price Prediction using Linear Regression

This project involved predicting the selling price of cars using features such as fuel type, kilometers driven, number of previous owners, and more. A **Linear Regression** model was used to estimate the price of a car based on these attributes.

#### Key Steps:
- Data preprocessing: Cleaned and prepared the car dataset by encoding categorical variables and scaling numerical features.
- Model training: Trained a Linear Regression model to predict car prices.
- Model evaluation: Evaluated the model’s performance using **Mean Squared Error (MSE)** and **R² score**.
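
The preprocessing step above (encoding categorical variables and scaling numerical features) can be sketched in plain Python as follows. The rows, the column names `fuel_type` and `kms_driven`, and the helper functions are illustrative assumptions, not the actual dataset or notebook code.

```python
from statistics import mean, pstdev

# Hypothetical rows; the column names and values are illustrative only.
cars = [
    {"fuel_type": "Petrol", "kms_driven": 27000},
    {"fuel_type": "Diesel", "kms_driven": 43000},
    {"fuel_type": "CNG", "kms_driven": 35000},
    {"fuel_type": "Petrol", "kms_driven": 15000},
]


def one_hot(values):
    """Map each category to a 0/1 vector, one position per sorted category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation).

    Assumes the values are not all identical, otherwise sigma would be zero.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


categories, fuel_encoded = one_hot([row["fuel_type"] for row in cars])
kms_scaled = standardize([row["kms_driven"] for row in cars])

# Each row becomes: one-hot fuel columns followed by the scaled mileage.
features = [enc + [km] for enc, km in zip(fuel_encoded, kms_scaled)]
print(categories)
print(features[0])
```

Scikit-learn's `OneHotEncoder` and `StandardScaler` do the same job on whole tables. `StandardScaler` also divides by the population standard deviation, which is why `pstdev` is used here.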

#### Results:
- **Model**: Linear Regression
- **R² Score**: 0.85
- **MSE**: 3.48

## Key Technologies Used
- **Programming Languages**: Python
- **Libraries**: Scikit-learn, Pandas, Matplotlib, Seaborn
- **Tools**: Jupyter Notebooks, Google Colab, VS Code, GitHub

## GPT-2 Text Generation and NLP Exploration

This repository contains work carried out during my internship at ShadowFox, focusing on the exploration of text generation using OpenAI's GPT-2 model and related NLP (Natural Language Processing) concepts. The project aimed to understand the nuances of language modeling, implement GPT-2-based solutions, and explore real-world applications of text generation.

### Project Goals

1. **Understand GPT-2 Architecture**: Study the architecture, training techniques, and functionality of GPT-2.
2. **Implement Text Generation**: Build pipelines to generate coherent and contextually relevant text using GPT-2.
3. **Explore NLP Techniques**: Dive into preprocessing, tokenization, and fine-tuning for custom text generation tasks.
4. **Real-World Applications**: Investigate applications in automated content creation, summarization, and chatbot responses.
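
A minimal sketch of the text generation goal, assuming the Hugging Face `transformers` package and a backend such as PyTorch are installed. The helper name `generate_text` and the sampling settings (`top_k=50`, 40 new tokens, seed 42) are illustrative choices, not values taken from the notebook.

```python
def generate_text(prompt, max_new_tokens=40, seed=42):
    """Return one sampled GPT-2 continuation of `prompt` (hypothetical helper)."""
    # Imported lazily because loading transformers is slow.
    # Assumes `pip install transformers torch` has been run.
    from transformers import pipeline, set_seed

    set_seed(seed)  # make the sampling reproducible
    # Downloads the pretrained "gpt2" weights on first use.
    generator = pipeline("text-generation", model="gpt2")
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,          # sample instead of greedy decoding
        top_k=50,                # restrict sampling to the 50 most likely tokens
        num_return_sequences=1,
    )
    # The pipeline returns a list of dicts with a "generated_text" key.
    return outputs[0]["generated_text"]
```

Calling `print(generate_text("Machine learning is"))` should print the prompt followed by a sampled continuation. The exact text depends on the seed and the library version.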

### Key Features

- **Text Generation**: Seamlessly generate text based on given prompts.
- **Fine-Tuning**: Customize GPT-2 on domain-specific datasets for improved relevance.
- **Tokenization**: Efficiently preprocess text data for model compatibility.
- **Exploratory Data Analysis (EDA)**: Analyze and visualize text datasets to identify trends and patterns.
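
For the exploratory analysis feature, a simple starting point is counting token frequencies and document lengths. The sketch below uses a made-up three-sentence corpus and a naive regex tokenizer. Both are assumptions for illustration, and the tokenizer is not GPT-2's byte-level BPE tokenizer.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for a real text dataset.
corpus = [
    "The model generates text from a prompt.",
    "A longer prompt gives the model more context.",
    "Text generation quality depends on the prompt.",
]


def tokenize(text):
    """Lowercase the text and extract runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Frequency of every token across the whole corpus.
token_counts = Counter(token for doc in corpus for token in tokenize(doc))
# Number of tokens in each document.
doc_lengths = [len(tokenize(doc)) for doc in corpus]

print(token_counts.most_common(5))
print("tokens per document:", doc_lengths)
```

The resulting counts can be passed to Matplotlib or Seaborn for bar charts or length histograms.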

### Tools & Technologies

- **Programming Language**: Python
- **Libraries**:
  - Transformers (Hugging Face)
  - PyTorch/TensorFlow
  - NLTK and SpaCy (for preprocessing)
  - Matplotlib/Seaborn (for visualizations)
- **Data**: Domain-specific datasets for fine-tuning.

### Challenges Faced

1. Ensuring coherence in longer text generations.
2. Managing computational resources during fine-tuning on large datasets.
3. Handling ambiguous or contextually vague prompts.

### Applications Explored

- **Content Creation**: Automated article generation.
- **Summarization**: Generating concise summaries from large text inputs.
- **Chatbots**: Enhancing conversational AI capabilities.
- **Storytelling**: Creating engaging and creative narratives.

### Results and Insights

- Achieved **highly coherent text generation** with the fine-tuned GPT-2 model.
- Identified best practices for **prompt engineering** and **model optimization**.
- Developed a reusable pipeline for **NLP experimentation and fine-tuning**.

### Future Scope

1. Experiment with newer transformer models like GPT-3 and beyond.
2. Integrate multilingual support for text generation.
3. Optimize the pipeline for real-time text generation tasks.

## Installation and Setup

1. Clone the repository:
```bash
git clone https://github.com/mukuta-manit-d/shadowfox.git
cd shadowfox
```

2. Run the .ipynb files by launching Jupyter and opening each notebook:
```bash
jupyter notebook
# Then open, in the browser tab that appears:
#   Boston_Price_Prediction.ipynb
#   Car_price_prediction.ipynb
#   Language_Model.ipynb
```