Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/christs8920/data-science-py
A collection of data science projects made in python.
- Host: GitHub
- URL: https://github.com/christs8920/data-science-py
- Owner: ChrisTs8920
- License: MIT
- Created: 2023-04-30T19:18:56.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-03T10:00:00.000Z (5 months ago)
- Last Synced: 2024-08-03T11:22:01.009Z (5 months ago)
- Topics: data-science, data-visualization, machine-learning, matplotlib, nltk, numpy, pandas, python, sklearn, svm-classifier, visualization
- Language: Python
- Size: 30.3 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Data Science with Python
## Description
This repository contains a collection of data science projects written in Python.
Libraries used:
- Pandas
- NumPy
- sklearn
- NLTK
## Heart Disease Classification
This project uses a machine learning classification algorithm, a Support Vector Machine (SVM), to predict whether a patient has an increased chance of a heart attack. It then reports the accuracy of the classifier and plots the different parameters of the data set.
Data was provided by [kaggle.com](https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset/data).
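
To illustrate the idea of training an SVM and then measuring its accuracy, here is a minimal, standard-library-only sketch of a linear SVM fitted by subgradient descent on the hinge loss. It is not the project's code: the repository lists sklearn, where a class such as `sklearn.svm.SVC` would be the usual choice. The function names, hyperparameters, and toy points below are all assumptions made up for this example.

```python
def train_linear_svm(samples, labels, lr=0.1, reg=0.01, epochs=50):
    """Fit weights and bias by subgradient descent on the regularised hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            if margin < 1:
                # Inside the margin or misclassified: step towards the sample.
                weights = [w - lr * (reg * w - y * xi) for w, xi in zip(weights, x)]
                bias += lr * y
            else:
                # Correct with enough margin: apply only the regularisation shrink.
                weights = [w - lr * reg * w for w in weights]
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the separating line."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up, linearly separable toy points; NOT rows from the Kaggle data set.
train_x = [(2.0, 3.0), (3.0, 3.5), (2.5, 2.0), (-2.0, -3.0), (-3.0, -2.5), (-2.5, -2.0)]
train_y = [1, 1, 1, -1, -1, -1]
test_x = [(3.5, 4.0), (-3.5, -4.0)]
test_y = [1, -1]

weights, bias = train_linear_svm(train_x, train_y)
predictions = [predict(weights, bias, x) for x in test_x]
print(f"accuracy: {accuracy(test_y, predictions):.2f}")
```

Accuracy here is simply the share of held-out samples whose predicted label matches the true one, which is the kind of figure the project reports after classification.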
>*This project was an assignment made during my Big Data course at university.*
## Corpus analysis
This project plots statistics for some of the texts that ship with [NLTK](https://www.nltk.org/):
- Lexical richness and percentage of text taken up by various words.
- Stemming vs Lemmatization.
- `str.split()` vs NLTK tokenization (`nltk.tokenize`).
- Frequency distributions.
>*This project was an assignment made during my Information Retrieval course at university.*
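
Some of the statistics listed above can be sketched with the standard library alone. The example below is only an approximation of what the project computes with NLTK: `regex_tokenize` is a hypothetical, rough stand-in for an NLTK tokenizer, `collections.Counter` stands in for a frequency distribution, and the sample sentence is made up. Stemming and lemmatization are left out because they need NLTK itself.

```python
import re
from collections import Counter


def regex_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def lexical_richness(tokens):
    """Ratio of distinct tokens to total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def word_percentage(tokens, word):
    """Percentage of the token list taken up by `word`."""
    return 100 * tokens.count(word) / len(tokens) if tokens else 0.0


# Made-up sample sentence, not one of the NLTK texts.
sample = "The cat sat. The cat ran!"

split_tokens = sample.split()          # punctuation stays glued to words
regex_tokens = regex_tokenize(sample)  # punctuation becomes separate tokens

# Keep lowercase alphabetic tokens only for the word statistics.
words = [t.lower() for t in regex_tokens if t.isalpha()]
frequencies = Counter(words)

print(split_tokens)
print(regex_tokens)
print(frequencies.most_common(2))
print(f"lexical richness: {lexical_richness(words):.2f}")
print(f"share of 'the': {word_percentage(words, 'the'):.1f}%")
```

The difference between the two token lists shows why a tokenizer is usually preferred over `str.split()`: splitting on whitespace leaves `sat.` and `ran!` as single tokens, which distorts the frequency counts.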
## Salary Survey
This project plots some statistics for programmers in Greece for the year 2022:
- Education Level
- Most used programming languages
- Remote work (both, remote, on-site)
- Median wage
>Data was provided by [SocialNerds](https://www.youtube.com/@SocialNerdsGR).
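
The statistics listed above come down to counting categories and taking a median. Here is a minimal standard-library sketch of that idea. The project itself lists Pandas, which would be the more natural tool for a real survey file. The field names and every row below are made-up placeholders, not values from the SocialNerds survey.

```python
from collections import Counter
from statistics import median

# Made-up placeholder rows with hypothetical field names; NOT survey data.
responses = [
    {"education": "BSc", "language": "Python", "work_mode": "remote", "wage": 1500},
    {"education": "MSc", "language": "Java", "work_mode": "on-site", "wage": 1800},
    {"education": "BSc", "language": "Python", "work_mode": "both", "wage": 1300},
    {"education": "BSc", "language": "JavaScript", "work_mode": "remote", "wage": 2100},
]


def count_by(rows, field):
    """Count how many rows fall into each category of `field`."""
    return Counter(row[field] for row in rows)


education_counts = count_by(responses, "education")
language_counts = count_by(responses, "language")
work_mode_counts = count_by(responses, "work_mode")
median_wage = median(row["wage"] for row in responses)

print(education_counts.most_common())
print(language_counts.most_common(1))
print(work_mode_counts.most_common())
print(f"median wage: {median_wage}")
```

The median is less sensitive than the mean to a few very high or very low salaries, which is a common reason to prefer it for wage data.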
## How to run
1. Place the data file in the same directory as the script file.
2. Run the project's `.py` script.
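
One possible way for a script to find a data file that sits next to it, regardless of the directory it is launched from, is to build the path from the script's own location. This is only a sketch: `data_path` is a hypothetical helper, not something the repository is known to contain, and `data.csv` is a placeholder file name.

```python
from pathlib import Path


def data_path(script_file, data_name):
    """Return the path of `data_name` located next to `script_file`."""
    return Path(script_file).resolve().parent / data_name


# Inside a script this would typically be called as:
#   csv_file = data_path(__file__, "data.csv")   # "data.csv" is a placeholder
example = data_path("project/script.py", "data.csv")
print(example.name)
```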