https://github.com/davelester/mrjob-chnm-million-syllabi
MapReduce jobs to run on a corpus of a million course syllabi.
- Host: GitHub
- URL: https://github.com/davelester/mrjob-chnm-million-syllabi
- Owner: davelester
- Created: 2012-03-27T06:48:23.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2012-12-24T07:20:04.000Z (almost 12 years ago)
- Last Synced: 2024-10-17T05:35:10.448Z (29 days ago)
- Language: Python
- Homepage:
- Size: 94.5 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# MapReduce Jobs for CHNM's Million Syllabi Database
## Description
This repository contains a series of MapReduce jobs that run on a sample of ~~50,000 syllabi~~ 100 syllabi from [CHNM's million syllabi database](http://www.dancohen.org/2011/03/30/a-million-syllabi/). The jobs can also be run on the entire million+ dataset; however, only a subset of the data has been cleaned and reformatted at this time. The jobs are written in Python using [MRJob](https://github.com/Yelp/mrjob).
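As a rough illustration of the shape these jobs take, here is a minimal sketch in plain Python that imitates the map, group, and reduce phases in memory. It is not the repository's code and does not use MRJob itself: `run_job`, the mapper and reducer names, and the sample records are all hypothetical, and the record layout (tab-separated, with the syllabus text in the last field) is an assumption, not the documented format of the CHNM data.

```python
from collections import defaultdict


def run_job(mapper, reducer, lines):
    """Tiny in-memory stand-in for a MapReduce run: map, group by key, reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


def count_mapper(_, line):
    # Emit a 1 for every non-empty record.
    if line.strip():
        yield "syllabi", 1


def count_reducer(key, values):
    # Sum the 1s to get the number of records.
    yield key, sum(values)


def words_mapper(_, line):
    # Assumption: the syllabus text is the last tab-separated field.
    if line.strip():
        text = line.rstrip("\n").split("\t")[-1]
        yield "words_per_syllabus", len(text.split())


def average_reducer(key, values):
    # Average the per-syllabus word counts.
    values = list(values)
    yield key, sum(values) / len(values)


# Hypothetical sample records, not taken from the real dataset.
sample = [
    "1\thttp://example.edu/a\tIntroduction to American history",
    "2\thttp://example.edu/b\tReadings in digital humanities and text mining",
    "",
]

print(run_job(count_mapper, count_reducer, sample))
print(run_job(words_mapper, average_reducer, sample))
```

In an MRJob job, functions like these would typically become the `mapper` and `reducer` methods of an `MRJob` subclass, and the framework would handle the grouping step that `run_job` fakes here.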
## Contents
* /data/ - Includes syllabi_sample.tsv, which holds the first 100 records from the CHNM syllabi database.
* average_words_per_syllabus.py - Calculate the average number of words per syllabus text.
* count_syllabi.py - Count the number of syllabi in the dataset. This is the most basic example of MapReduce and MRJob I could write.

## Licence
MIT