https://github.com/miferreiro/cdap-map-reduce
Map/Reduce exercises for the course "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo (2020)
- Host: GitHub
- URL: https://github.com/miferreiro/cdap-map-reduce
- Owner: miferreiro
- Created: 2020-06-22T16:12:56.000Z
- Default Branch: master
- Last Pushed: 2020-06-22T16:19:27.000Z
- Last Synced: 2024-12-27T20:36:21.282Z
- Topics: map-reduce, python
- Language: Python
- Size: 31.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Getting started with Map/Reduce
These three exercises were completed for the course "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo, in 2020.
### Exercise 1
This exercise uses a series of files with audience data for tracks broadcast on radio stations:
- The join_cad?.txt files list music tracks and, for each track, the radio station that broadcast it.
- The join_num?.txt files also list tracks and, for each track, the number of listeners it had.
The goal of this exercise is to implement a map/reduce task that answers the following question:
*What is the total number of listeners (across all radio stations) of the tracks broadcast by RNE1?*
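Before writing the map/reduce job, a plain in-memory version of the same join can serve as a reference to compare results against. The sketch below assumes each line has the form `<track>,<value>` (a station name in the `join_cad?.txt` files, a listener count in the `join_num?.txt` files) and UTF-8 encoding; that layout and the function names are assumptions, not part of the exercise statement.

```python
#!/usr/bin/env python3
"""Exercise 1 answered directly in memory, as a reference for the map/reduce job.

ASSUMPTION: each line is "<track>,<value>"; <value> is a station name in the
join_cad?.txt files and a listener count in the join_num?.txt files.
"""
import glob
from collections import defaultdict


def read_lines(pattern):
    """Yield the stripped, non-blank lines of every file matching `pattern`."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield line.strip()


def listeners_of_station(cad_lines, num_lines, station="RNE1"):
    """Total listeners of the tracks that `station` broadcast at least once."""
    # rsplit keeps any commas inside the track name intact.
    aired = {track for track, value in (line.rsplit(",", 1) for line in cad_lines)
             if value == station}
    listeners = defaultdict(int)
    for line in num_lines:
        track, value = line.rsplit(",", 1)
        listeners[track] += int(value)  # sum counts across all lines and files
    return sum(listeners[track] for track in aired)


if __name__ == "__main__":
    print(listeners_of_station(read_lines("join_cad*.txt"),
                               read_lines("join_num*.txt")))
```

A track aired several times by RNE1 is counted once, since the listener figures are per track rather than per broadcast.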
NOTE 1: the mapper for this task is simple. Once implemented, its operation can be checked in the terminal:
`$ cat join_*.txt | ./join_mapper.py | sort`
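A minimal sketch of what `join_mapper.py` might look like, under the same assumed `<track>,<value>` line format. Because both kinds of file arrive through the same `cat`, the mapper cannot tell which file a line came from; it simply re-keys every line by track, using a tab as the key/value separator (the Hadoop Streaming default), and leaves it to the reducer to tell counts from station names. The helper name `map_line` is hypothetical.

```python
#!/usr/bin/env python3
"""join_mapper.py: a minimal Hadoop-Streaming-style mapper sketch.

ASSUMPTION: every input line is "<track>,<value>" (a station name in
join_cad?.txt, a listener count in join_num?.txt). Each line is re-emitted
as "<track>\t<value>" so that sorting groups all records of a track together.
"""
import sys


def map_line(line):
    """Return the key/value record for one input line, or None to skip it."""
    line = line.strip()
    if "," not in line:
        return None  # skip blank or malformed lines
    track, value = line.rsplit(",", 1)
    return f"{track}\t{value}"


def main():
    for line in sys.stdin:
        record = map_line(line)
        if record is not None:
            print(record)


if __name__ == "__main__":
    main()
```

The script needs execute permission (`chmod +x join_mapper.py`) for `./join_mapper.py` to run. Dropping station lines other than RNE1 already in the mapper would shrink the data sent to the reducer; it is left out here to keep the mapper generic.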
NOTE 2: the reducer is somewhat more complex, but keep in mind that its input arrives sorted alphabetically.
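One possible reducer, sketched under the same assumptions: it relies only on the records of each track being contiguous after sorting. It accumulates the listener counts of the current track and adds them to the total when the track changes, provided an `RNE1` line was seen for it, so the order of counts and station names inside a track does not matter. The file name `join_reducer.py` and the `isdigit` test used to tell counts from station names are illustrative choices.

```python
#!/usr/bin/env python3
"""join_reducer.py (hypothetical name): a minimal reducer sketch for Exercise 1.

ASSUMPTION: input lines are "<track>\t<value>" records from the mapper, sorted
so that all records of one track are contiguous.
"""
import sys


def reduce_lines(lines, station="RNE1"):
    """Return the total listeners of the tracks broadcast by `station`."""
    total = 0
    current = None      # track whose records are being grouped
    listeners = 0       # listeners of `current`, summed over its count records
    on_station = False  # whether `station` broadcast `current`
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue  # skip blank or malformed records
        track, value = line.split("\t", 1)
        if track != current:
            # Key changed: close the previous group before starting a new one.
            if on_station:
                total += listeners
            current, listeners, on_station = track, 0, False
        if value.isdigit():
            listeners += int(value)  # count record (from a join_num file)
        elif value == station:
            on_station = True        # station record (from a join_cad file)
    if on_station:  # close the last group
        total += listeners
    return total


if __name__ == "__main__":
    print(reduce_lines(sys.stdin))
```

A local run might then look like `$ cat join_*.txt | ./join_mapper.py | LC_ALL=C sort | ./join_reducer.py`. Forcing byte-order sorting with `LC_ALL=C` is a precaution: in some locales `sort` may ignore punctuation and whitespace when collating, which can interleave the records of tracks whose names share a prefix, whereas Hadoop groups records by exact key.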
### Exercise 2
This exercise starts from a file with the sales made by a chain of department stores in January 2012. Each line of purchases.txt contains the following fields: date, time, city, section, amount, means of payment.
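A small parsing helper that a mapper could use, sketched under the assumption that the fields are tab-separated in the order listed above. The separator, the `Purchase` type and the choice to skip malformed lines rather than fail are assumptions to check against the real file.

```python
"""Parsing one line of purchases.txt (a sketch).

ASSUMPTION: fields are tab-separated, in the order given in the exercise:
date, time, city, section, amount, means of payment.
"""
from typing import NamedTuple, Optional


class Purchase(NamedTuple):
    date: str
    time: str
    city: str
    section: str
    amount: float
    payment: str


def parse_purchase(line: str) -> Optional[Purchase]:
    """Return a Purchase, or None if the line does not hold six valid fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        return None  # malformed line: skip it instead of crashing the mapper
    date, time, city, section, amount, payment = fields
    try:
        value = float(amount)
    except ValueError:
        return None  # non-numeric amount
    return Purchase(date, time, city, section, value, payment)
```

Skipping bad lines keeps a long job from failing on a single corrupt record; counting the skipped lines would be a natural addition if the report needs to discuss data quality.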
Implement map/reduce programs that answer the following questions:
- What is the most widely used payment method for the purchase of computers?
- For each means of payment, which section makes the most sales?
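Both questions can be sketched as pure mapper and reducer functions plus a tiny local driver that mimics the shuffle by sorting on the key. Assumptions, to be checked and justified in the report: tab-separated fields; a computer section labelled `Computers` (hypothetical, so use the label that actually appears in `purchases.txt`); question 1 counts purchases; question 2 ranks sections by total amount sold rather than by number of transactions. All function names are illustrative.

```python
"""Map/reduce sketches for the two questions of Exercise 2.

ASSUMPTIONS: tab-separated fields (date, time, city, section, amount, payment);
the computer section is labelled "Computers"; Q1 counts purchases; Q2 ranks
sections by total amount sold.
"""
from itertools import groupby


def _fields(line):
    """Split a line into its six fields, or return None if it is malformed."""
    fields = line.rstrip("\r\n").split("\t")
    return fields if len(fields) == 6 else None


def map_q1(line, section="Computers"):
    """Q1 mapper: emit (payment, 1) for every purchase in `section`."""
    f = _fields(line)
    if f and f[3] == section:
        yield f[5], 1


def reduce_q1(pairs):
    """Q1 reducer: count purchases per payment method, return the most used."""
    counts = {key: sum(v for _, v in group)
              for key, group in groupby(pairs, key=lambda kv: kv[0])}
    return max(counts.items(), key=lambda kv: kv[1], default=None)


def map_q2(line):
    """Q2 mapper: emit (payment, (section, amount)) for every valid purchase."""
    f = _fields(line)
    if f:
        try:
            yield f[5], (f[3], float(f[4]))
        except ValueError:
            pass  # skip lines whose amount is not a number


def reduce_q2(pairs):
    """Q2 reducer: for each payment method, the section with the largest total."""
    result = {}
    for payment, group in groupby(pairs, key=lambda kv: kv[0]):
        totals = {}
        for _, (section, amount) in group:
            totals[section] = totals.get(section, 0.0) + amount
        result[payment] = max(totals.items(), key=lambda kv: kv[1])
    return result


def run(lines, mapper, reducer):
    """Local stand-in for the framework: map, sort by key (shuffle), reduce."""
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=lambda kv: kv[0])
    return reducer(pairs)
```

For example, `run(open("purchases.txt"), map_q1, reduce_q1)` returns a `(payment, count)` pair. `reduce_q1` picks a global maximum, which assumes a single reducer; with several reducers each one only sees part of the payment methods, and an extra step would be needed to compare their results. Switching question 2 to transaction counts only requires emitting `1` instead of the amount.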
Attach a short PDF document that justifies the decisions made about the content of the fields and briefly explains the implementation and the results.