https://github.com/luciarevaliente/shell_script_data_cleaning
This project focuses on cleaning and processing datasets using Shell scripts. It is part of the Fundamentals of Informatics course (2022-23) and involves handling movie and show data to create cleaned and filtered datasets for further analysis.
- Host: GitHub
- URL: https://github.com/luciarevaliente/shell_script_data_cleaning
- Owner: luciarevaliente
- Created: 2022-10-10T08:46:16.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-12-08T20:07:03.000Z (over 1 year ago)
- Last Synced: 2025-05-21T18:33:48.947Z (about 1 year ago)
- Topics: data, data-cleaning, shell-script
- Language: Shell
- Homepage:
- Size: 3.82 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fundamentals of Informatics: Cleaning a Dataset
This repository contains the first practical assignment of the Fundamentals of Informatics course (2022-23): cleaning a dataset.
## Project Description
The goal of this assignment is to learn how to handle and clean datasets with Shell scripts. The course provides several CSV files with movie and show data; the scripts in this repository filter and clean that data and generate final files that are more manageable and useful for further analysis.
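As a rough illustration of what filtering and cleaning a CSV file in the shell can look like, the sketch below drops blank lines and rows with missing fields. It is not taken from `practica1.sh`: the sample file, its three columns, and the `workdir` variable are assumptions made for the example, and the comma split assumes no field contains a quoted comma.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample input; the real Movies.csv has more columns.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
id,title,year
1,Alpha,2001

2,,1999
3,Gamma,2010
EOF

# Keep the header plus every row that has exactly three non-empty fields.
# Blank lines (NF == 0) and rows with an empty field are dropped.
awk -F',' 'NR == 1 || (NF == 3 && $1 != "" && $2 != "" && $3 != "")' \
    "$workdir/sample.csv" > "$workdir/sample_clean.csv"

cat "$workdir/sample_clean.csv"
```

On this sample, only the header and the rows for `Alpha` and `Gamma` survive. Real data with quoted fields would need a CSV-aware tool rather than a plain comma split.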
## Repository Contents
- **Movies.csv**: Original file with movie data.
- **Movies_columna12.csv** to **Movies_columna16.csv**: Files that each hold specific columns extracted from the original movie dataset.
- **Movies_f.csv** and **Movies_net.csv**: Files with filtered and cleaned movie data.
- **Shows.csv**: Original file with show data.
- **Shows_columna12.csv** to **Shows_columna15.csv**: Files that each hold specific columns extracted from the original show dataset.
- **Shows_f.csv** and **Shows_net.csv**: Files with filtered and cleaned show data.
- **practica1.sh**: Script used for data cleaning and processing.
- **prova.txt** and **prova_script_pas4**: Test files used during the development of the practice.
- **titles.cvs**: File with titles of movies and shows.
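The per-column files listed above could be produced with a loop around `cut`. This is a minimal sketch under stated assumptions, not the logic of `practica1.sh`: the three-column stand-in for `Movies.csv`, the column numbers, and the `tmpdir` variable are hypothetical, and `cut` splits on every comma, so it only suits data without quoted commas.

```bash
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)

# Hypothetical three-column stand-in for Movies.csv.
printf '%s\n' 'id,title,year' '1,Alpha,2001' '2,Beta,1999' > "$tmpdir/Movies.csv"

# Write each chosen column to its own file, mirroring the
# Movies_columnaN.csv naming used in this repository.
for n in 2 3; do
    cut -d',' -f"$n" "$tmpdir/Movies.csv" > "$tmpdir/Movies_columna$n.csv"
done

cat "$tmpdir/Movies_columna2.csv"
```

Keeping one file per column makes it easy to inspect a single field, for example by piping it through `sort | uniq -c` to spot inconsistent values before filtering.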
## Instructions
1. **Clone the repository**:
```bash
git clone https://github.com/luciarevaliente/shell_script_data_cleaning.git
cd shell_script_data_cleaning
```
2. **Run the cleaning script**:
```bash
chmod +x practica1.sh   # only needed if the script is not already executable
./practica1.sh
```
## Contributions
This project is part of an academic course and does not accept external contributions.
## License
This project has no specific license and is intended for educational purposes only.