https://github.com/mukuta-manit-d/shadowfox
- Host: GitHub
- URL: https://github.com/mukuta-manit-d/shadowfox
- Owner: Mukuta-Manit-D
- License: MIT
- Created: 2025-01-19T16:42:22.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-01-20T17:36:22.000Z (10 months ago)
- Last Synced: 2025-01-20T17:41:15.861Z (10 months ago)
- Topics: jupyter-notebook, linear-regression, machine-learning, price-prediction, vscode
- Language: Jupyter Notebook
- Size: 154 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AI/ML Developer Internship - ShadowFox
This repository showcases the work done during my AI/ML internship at **ShadowFox**, where I contributed to machine learning projects: **Boston House Price Prediction** and **Car Price Prediction** (both using **Linear Regression**), as well as **GPT-2 Text Generation** and **NLP Exploration**.
## Projects
### 1. Boston House Price Prediction using Linear Regression
In this project, I used the well-known **Boston Housing dataset** to predict house prices with **Linear Regression**. The goal was to build a model that predicts the price of a house from features such as the number of rooms, location, crime rate, and other socioeconomic factors.
#### Key Steps:
- Data preprocessing: Cleaned the dataset, handled missing values, and encoded categorical variables.
- Model training: Trained a Linear Regression model on the processed dataset.
- Model evaluation: Assessed performance using **Mean Squared Error (MSE)** and **R² score**.
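The train/evaluate flow above can be sketched as follows. Since the Boston dataset was removed from recent scikit-learn releases, this minimal example substitutes synthetic data with the same shape (506 rows, 13 features); the notebook itself presumably loads the real data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data: 506 rows, 13 features, like the original
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=42)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out split with MSE and R^2
y_pred = model.predict(X_test)
print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"R^2: {r2_score(y_test, y_pred):.2f}")
```

Swapping the synthetic arrays for the real feature matrix and target column reproduces the workflow described above.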
#### Results:
- **Model**: Linear Regression
- **R² Score**: 0.85
- **MSE**: 23.78
### 2. Car Price Prediction using Linear Regression
This project involved predicting the selling price of cars using features such as fuel type, kilometers driven, number of previous owners, and more. A **Linear Regression** model was used to estimate the price of a car based on these attributes.
#### Key Steps:
- Data preprocessing: Cleaned and prepared the car dataset by encoding categorical variables and scaling numerical features.
- Model training: Trained a Linear Regression model to predict car prices.
- Model evaluation: Evaluated the model’s performance using **Mean Squared Error (MSE)** and **R² score**.
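One common way to combine the encoding and scaling steps above with the model is a scikit-learn `Pipeline`. The column names below (`fuel_type`, `kms_driven`, `owners`) are illustrative placeholders, not the repo's actual schema:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative dataset; column names and values are hypothetical
cars = pd.DataFrame({
    "fuel_type": ["Petrol", "Diesel", "Petrol", "CNG", "Diesel", "Petrol"],
    "kms_driven": [27000, 43000, 6900, 5200, 42450, 87000],
    "owners": [0, 0, 0, 1, 0, 1],
    "selling_price": [3.35, 4.75, 7.25, 2.85, 4.60, 3.10],
})

X = cars.drop(columns="selling_price")
y = cars["selling_price"]

# One-hot encode the categorical column, scale the numeric ones
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel_type"]),
    ("num", StandardScaler(), ["kms_driven", "owners"]),
])

pipe = Pipeline([("prep", preprocess), ("model", LinearRegression())])
pipe.fit(X, y)

# Predict the price of an unseen car
new_car = pd.DataFrame({"fuel_type": ["Diesel"], "kms_driven": [30000], "owners": [0]})
pred = pipe.predict(new_car)
print(f"Predicted price: {pred[0]:.2f}")
```

Bundling preprocessing into the pipeline ensures the same encoding and scaling are applied at prediction time as during training.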
#### Results:
- **Model**: Linear Regression
- **R² Score**: 0.85
- **MSE**: 3.48
## Key Technologies Used
- **Programming Languages**: Python
- **Libraries**: Scikit-learn, Pandas, Matplotlib, Seaborn
- **Tools**: Jupyter Notebooks, Google Colab, VS Code, GitHub
## GPT-2 Text Generation and NLP Exploration
This repository contains work carried out during my internship at ShadowFox, focusing on the exploration of text generation using OpenAI's GPT-2 model and related NLP (Natural Language Processing) concepts. The project aimed to understand the nuances of language modeling, implement GPT-2-based solutions, and explore real-world applications of text generation.
### Project Goals
1. **Understand GPT-2 Architecture**: Study the architecture, training techniques, and functionality of GPT-2.
2. **Implement Text Generation**: Build pipelines to generate coherent and contextually relevant text using GPT-2.
3. **Explore NLP Techniques**: Dive into preprocessing, tokenization, and fine-tuning for custom text generation tasks.
4. **Real-World Applications**: Investigate applications in automated content creation, summarization, and chatbot responses.
### Key Features
- **Text Generation**: Seamlessly generate text based on given prompts.
- **Fine-Tuning**: Customize GPT-2 on domain-specific datasets for improved relevance.
- **Tokenization**: Efficiently preprocess text data for model compatibility.
- **Exploratory Data Analysis (EDA)**: Analyze and visualize text datasets to identify trends and patterns.
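The text-generation feature above boils down to a few lines with the Hugging Face `pipeline` API. The sketch below uses `sshleifer/tiny-gpt2`, a tiny randomly initialized GPT-2 checkpoint, so it runs without the full ~500 MB download; swap in `"gpt2"` for meaningful output:

```python
from transformers import pipeline

# Tiny random-weight GPT-2 checkpoint, handy for smoke-testing the pipeline;
# use model="gpt2" for actual coherent generations
generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")

result = generator("Once upon a time", max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```

The pipeline returns the prompt followed by the generated continuation; sampling parameters such as `temperature` and `top_p` control how creative the continuation is.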
### Tools & Technologies
- **Programming Language**: Python
- **Libraries**:
- Transformers (Hugging Face)
- PyTorch/TensorFlow
- NLTK and SpaCy (for preprocessing)
- Matplotlib/Seaborn (for visualizations)
- **Data**: Domain-specific datasets for fine-tuning.
### Challenges Faced
1. Ensuring coherence in longer text generations.
2. Managing computational resources during fine-tuning on large datasets.
3. Handling ambiguous or contextually vague prompts.
### Applications Explored
- **Content Creation**: Automated article generation.
- **Summarization**: Generating concise summaries from large text inputs.
- **Chatbots**: Enhancing conversational AI capabilities.
- **Storytelling**: Creating engaging and creative narratives.
### Results and Insights
- Achieved **highly coherent text generation** with the fine-tuned GPT-2 model.
- Identified best practices for **prompt engineering** and **model optimization**.
- Developed a reusable pipeline for **NLP experimentation and fine-tuning**.
### Future Scope
1. Experiment with newer transformer models like GPT-3 and beyond.
2. Integrate multilingual support for text generation.
3. Optimize the pipeline for real-time text generation tasks.
## Installation and Setup
1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/shadowfox-ai-ml-internship.git
   cd ShadowFox
   ```
2. Open and run the notebooks in Jupyter:
   ```bash
   jupyter notebook Boston_Price_Prediction.ipynb
   jupyter notebook Car_price_prediction.ipynb
   jupyter notebook Language_Model.ipynb
   ```