https://github.com/rayyan9477/dep
- Host: GitHub
- URL: https://github.com/rayyan9477/dep
- Owner: Rayyan9477
- License: gpl-3.0
- Created: 2024-07-29T19:06:22.000Z
- Default Branch: main
- Last Pushed: 2024-07-29T19:28:57.000Z
- Last Synced: 2025-01-10T06:07:26.823Z
- Topics: data, data-science, machine-learning, python, visualization, web-scraping
- Language: Jupyter Notebook
- Size: 3.23 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Digital Empowerment Network
## Task 1: Data Extraction and Initial Analysis (`DEP_Task1.ipynb`)
**Objective**: Extract data from given sources and perform initial analysis.
### Steps:
1. **Data Collection**:
   - Extracted data from various sources using web scraping techniques.
   - Libraries used: `BeautifulSoup`, `requests`.
2. **Data Cleaning**:
   - Processed raw data to remove noise and irrelevant information.
   - Handled missing values and standardized data formats.
3. **Initial Analysis**:
   - Conducted exploratory data analysis (EDA) to understand data distribution and identify key patterns.
   - Visualized data using `matplotlib` and `seaborn`.
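
The collection and cleaning steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the HTML snippet, the `product` and `price` column names, and the helpers `table_to_frame` and `clean` are all hypothetical. A real run would first fetch the page, for example with `requests.get(url, timeout=10).text`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this string would come from
# requests.get(url, timeout=10).text
SAMPLE_HTML = """
<table id="items">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td> Widget </td><td>$1,200</td></tr>
  <tr><td>Gadget</td><td></td></tr>
  <tr><td>Gizmo</td><td>$85</td></tr>
</table>
"""


def table_to_frame(html: str) -> pd.DataFrame:
    """Parse the first HTML table into a DataFrame of raw strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    header = [th.get_text(strip=True).lower() for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=header)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize formats and handle missing values."""
    out = df.copy()
    # Strip currency symbols and thousands separators, then convert to
    # numbers; empty strings become NaN.
    out["price"] = pd.to_numeric(
        out["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )
    # One simple policy for missing values: fill with the column median.
    out["price"] = out["price"].fillna(out["price"].median())
    return out


df = clean(table_to_frame(SAMPLE_HTML))
print(df)
```

Median imputation is only one of several reasonable policies; dropping incomplete rows is another, and the right choice depends on the dataset.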
### Key Achievements:
- Successfully extracted data from multiple sources.
- Cleaned and preprocessed data for further analysis.
- Identified initial insights and patterns in the data.
## Task 2: Text Analysis and NLP (`DEP_TASK_2.ipynb`)
**Objective**: Perform textual analysis and compute various NLP metrics.
### Steps:
1. **Text Extraction**:
   - Extracted article text from URLs provided in an Excel file.
   - Ensured extraction of only relevant content (title and body).
2. **Text Analysis**:
   - Computed various NLP metrics such as positive score, negative score, polarity score, subjectivity score, etc.
   - Libraries used: `TextBlob`, `nltk`.
3. **Output Generation**:
   - Structured the computed metrics as per the provided output format.
   - Saved the results in an Excel file.
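
As a rough illustration of how scores of this kind are commonly defined, the sketch below counts matches against positive and negative word lists and derives polarity and subjectivity from those counts. The word lists, the regex tokenizer, and the formulas are assumptions made for this example, not the notebook's confirmed method. The notebook itself lists `TextBlob` and `nltk`, and `TextBlob(text).sentiment` exposes `polarity` and `subjectivity` values directly.

```python
import re

# Hypothetical, tiny word lists; a real analysis would load full lexicons.
POSITIVE = {"good", "great", "efficient", "growth"}
NEGATIVE = {"bad", "poor", "loss", "risk"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_metrics(text: str) -> dict[str, float]:
    """Compute dictionary-based sentiment scores (assumed formulas)."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    eps = 1e-6  # avoids division by zero on empty or neutral text
    return {
        "positive_score": pos,
        "negative_score": neg,
        # Ranges from -1 (all negative) to +1 (all positive).
        "polarity_score": (pos - neg) / (pos + neg + eps),
        # Share of words that carry sentiment, from 0 to 1.
        "subjectivity_score": (pos + neg) / (len(words) + eps),
    }


metrics = sentiment_metrics("Great growth this quarter, but the risk is bad debt.")
print(metrics)
```

One dictionary per article can be collected into a `pandas` DataFrame and written out with `DataFrame.to_excel`, which matches the Excel output described above.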
### Key Achievements:
- Efficiently extracted and processed textual data.
- Computed and analyzed various NLP metrics.
- Generated structured output for further use.
## Task 3: Advanced Data Processing (`Task_3.ipynb`)
**Objective**: Advanced data processing and analysis using Python.
### Steps:
1. **Advanced Data Cleaning**:
   - Applied advanced data cleaning techniques to handle complex datasets.
   - Used regular expressions and custom functions for specific cleaning tasks.
2. **Feature Engineering**:
   - Created new features to enhance data analysis.
   - Techniques used: aggregation, normalization, and transformation.
3. **Data Analysis and Visualization**:
   - Conducted in-depth analysis using advanced statistical methods.
   - Visualized complex relationships and trends using `plotly` and `seaborn`.
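
The cleaning and feature-engineering steps above might look like the following on a toy table. The column names (`region`, `amount`), the sample values, and the helper `parse_amount` are invented for illustration; the notebook's actual data and functions are not shown in this README.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical raw data with inconsistent formatting.
raw = pd.DataFrame(
    {
        "region": [" north", "North ", "SOUTH", "south"],
        "amount": ["$1,000", "2000 USD", "usd 500", "1,500"],
    }
)

# Matches the first number in a string, allowing thousands separators.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_amount(value: str) -> float:
    """Custom cleaner: pull the first number out of a messy string."""
    match = NUMBER.search(value)
    if match is None:
        return float("nan")
    return float(match.group().replace(",", ""))


df = raw.assign(
    region=raw["region"].str.strip().str.title(),
    amount=raw["amount"].map(parse_amount),
)

# Transformation: log1p compresses large values.
df["log_amount"] = np.log1p(df["amount"])

# Normalization: min-max scale to the range [0, 1].
span = df["amount"].max() - df["amount"].min()
df["amount_scaled"] = (df["amount"] - df["amount"].min()) / span

# Aggregation: per-region totals and means.
summary = df.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```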
### Key Achievements:
- Applied advanced data cleaning and feature engineering techniques.
- Conducted comprehensive data analysis.
- Created interactive and insightful visualizations.
## Task 4: Anomaly Detection in Network Traffic (`Task_4.ipynb`)
**Objective**: Detect anomalies in network traffic data using machine learning techniques.
### Steps:
1. **Data Loading and Preprocessing**:
   - Loaded network traffic data from a CSV file.
   - Converted time columns to datetime format and calculated the duration of network sessions.
2. **Feature Extraction and Normalization**:
   - Extracted relevant features such as `bytes_in`, `bytes_out`, and `duration`.
   - Normalized the features using `StandardScaler` for consistent scaling.
3. **Anomaly Detection**:
   - Applied the Isolation Forest algorithm to detect anomalies in the data.
   - Identified anomalies based on the algorithm's predictions.
4. **Evaluation and Visualization**:
   - Evaluated the performance of the anomaly detection using a confusion matrix.
   - Visualized the results using `plotly` and `seaborn` for better understanding and presentation.
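
The pipeline above can be sketched end to end with scikit-learn. The synthetic data, the `start_time`, `end_time`, and `is_attack` column names, and the `contamination` value are assumptions for this example; only `bytes_in`, `bytes_out`, `duration`, `StandardScaler`, and Isolation Forest come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the CSV: 19 ordinary sessions plus one extreme one.
rng = np.random.default_rng(0)
n = 20
start = pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(n), unit="min")
end = start + pd.to_timedelta(rng.integers(30, 90, n), unit="s")
df = pd.DataFrame(
    {
        "start_time": start.astype(str),  # strings, as they would be in a CSV
        "end_time": end.astype(str),
        "bytes_in": rng.integers(1_000, 5_000, n),
        "bytes_out": rng.integers(1_000, 5_000, n),
        "is_attack": 0,  # hypothetical ground-truth label
    }
)
df.loc[n - 1, ["bytes_in", "bytes_out", "is_attack"]] = [9_000_000, 8_000_000, 1]

# Convert time columns to datetime and derive session duration in seconds.
df["start_time"] = pd.to_datetime(df["start_time"])
df["end_time"] = pd.to_datetime(df["end_time"])
df["duration"] = (df["end_time"] - df["start_time"]).dt.total_seconds()

# Scale the selected features to zero mean and unit variance.
features = df[["bytes_in", "bytes_out", "duration"]]
X = StandardScaler().fit_transform(features)

# contamination is the assumed share of anomalies; 0.05 is a guess here.
model = IsolationForest(contamination=0.05, random_state=42)
# fit_predict returns -1 for anomalies and 1 for normal points; map to 1/0.
df["anomaly"] = (model.fit_predict(X) == -1).astype(int)

# Rows are true labels, columns are predictions.
cm = confusion_matrix(df["is_attack"], df["anomaly"], labels=[0, 1])
print(cm)
```

A confusion matrix needs ground-truth labels, which is why the sketch includes the hypothetical `is_attack` column; without labels, Isolation Forest output can only be inspected, not scored.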
### Key Achievements:
- Successfully detected anomalies in network traffic data using machine learning.
- Evaluated and validated the anomaly detection results.
- Visualized anomalies effectively to highlight patterns and insights.
## Conclusion
Throughout the internship, I gained hands-on experience in data extraction, cleaning, and analysis using Python. I successfully completed tasks involving web scraping, NLP, advanced data processing, and anomaly detection, enhancing my skills in data science and analytics.