https://github.com/AlessandroGianfelici/italian_reviews_dataset
A dataset of Italian reviews to train sentiment classification models
- Host: GitHub
- URL: https://github.com/AlessandroGianfelici/italian_reviews_dataset
- Owner: AlessandroGianfelici
- Created: 2021-05-20T12:19:44.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-09-07T08:29:14.000Z (over 3 years ago)
- Last Synced: 2024-11-01T15:37:25.505Z (6 months ago)
- Size: 75.1 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-italian - Italian review dataset - Trustpilot-crawled dataset with 146,910 reviews. (Corpora / Sentiment Analysis)
README
# Italian reviews dataset
A dataset of Italian reviews for training sentiment classification models. The dataset was collected from the internet using web scraping techniques. For further information, take a look at the code:
```text
https://github.com/AlessandroGianfelici/trustpilot_spider.git
```

## Data
For each data point, the dataset contains the company name (hashed for privacy reasons), the title of the review, the text of the review and the number of stars (from 1 to 5).
As far as I know, this is the largest freely available sentiment classification dataset for the Italian language.
## Usage
The data are stored as a txt file with comma-separated fields. For example, if you're using Python, you can load it with pandas:
```python
import pandas as pd

data = pd.read_csv('raw_data.txt')
```
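
Since each review carries a star rating from 1 to 5, a common next step is to turn the stars into sentiment labels. The following is a minimal sketch, not part of the original repository: the column names (`company`, `title`, `text`, `stars`), the helper functions, and the thresholds (1-2 negative, 3 neutral, 4-5 positive) are all assumptions, and the three sample rows are made up for illustration. Check the actual header of `raw_data.txt` before reusing it.

```python
import io

import pandas as pd

# Hypothetical column name: the README does not list the file header,
# so verify it against the real file before relying on it.
STARS_COL = "stars"


def stars_to_sentiment(stars: int) -> str:
    """Map a 1-5 star rating to a coarse label (assumed thresholds)."""
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


def add_sentiment(df: pd.DataFrame, stars_col: str = STARS_COL) -> pd.DataFrame:
    """Return a copy of df with an extra 'sentiment' column."""
    out = df.copy()
    out["sentiment"] = out[stars_col].astype(int).map(stars_to_sentiment)
    return out


# Tiny made-up sample standing in for raw_data.txt (not real dataset rows).
sample = io.StringIO(
    "company,title,text,stars\n"
    "a1b2,Ottimo,Servizio veloce e preciso,5\n"
    "c3d4,Deludente,Pacco mai arrivato,1\n"
    "e5f6,Nella media,Niente di speciale,3\n"
)
labelled = add_sentiment(pd.read_csv(sample))
print(labelled[["stars", "sentiment"]])
```

With the real file, the same helper could be applied as `add_sentiment(pd.read_csv('raw_data.txt'), stars_col=...)`, passing whatever name the star column actually has.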