https://github.com/saeidemadi/scrapeformankan
This is a student project for a data mining course and is a simple exercise
https://github.com/saeidemadi/scrapeformankan
activity crawling csv dataset food linear-regression randomforestregressor spider spiderman webscraping
Last synced: about 1 year ago
JSON representation
This is a student project for a data mining course and is a simple exercise
- Host: GitHub
- URL: https://github.com/saeidemadi/scrapeformankan
- Owner: saeidEmadi
- License: gpl-3.0
- Created: 2024-06-09T03:48:51.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-17T13:08:29.000Z (almost 2 years ago)
- Last Synced: 2025-01-26T19:46:43.274Z (about 1 year ago)
- Topics: activity, crawling, csv, dataset, food, linear-regression, randomforestregressor, spider, spiderman, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 1.84 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# scrape For Mankan
This is a student project for a data mining course and is a simple exercise
In this project, we have tried to extract data from a site using ``` web scraping ``` and ``` crawling ``` methods and create a data set.
And after cleaning and preparing the data set, we first analyze it and then use the obtained results to predict and guide users.
*You can easily calculate the amount of calories needed based on the amount of daily activity :*
- **ridingBike**
- **running**
- **walking**
- **cleaningUp**
*and foods that provide the same amount of calories to the body.*
## DataSet 
This dataset contains useful information such as the amount of ```calories, protein, etc```. about foods and edibles
**This dataset has 1821 records and 11 columns**
| siteId | name | calory | carbo | protein | fat | fiber | activity1 | activity2 | activity3 | activity4 |
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
The unit of columns is as :
- ` siteId ` : *Integer*
- ` name ` : *String*
- ` calory ` : *Kcal[kilocalorie]*
- ` carbo ` : *g[Gram]*
- ` protein ` : *g[Gram]*
- ` fat ` : *g[Gram]*
- ` fiber ` : *g[Gram]*
- `activity1 = ridingBike` : *m[Minute]*
- `activity2 = running` : *m[Minute]*
- `activity3 = walking` : *m[Minute]*
- `activity4 = cleaningUp` : *m[Minute]*
In this project, we use the following two models with the specified accuracy:
- `Linear regression`: *0.84*
- `RandomForestRegressor`: *0.87*
we have used **RandomForestRegressor** for prediction because is very accurate.
**Dataset Reference :** 
> [!TIP]
> Thanks to the Mankan site