https://github.com/mitodl/ocw-data-parser
A parsing script for MIT OpenCourseWare course data
- Host: GitHub
- URL: https://github.com/mitodl/ocw-data-parser
- Owner: mitodl
- Created: 2018-11-14T02:10:04.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-03-20T16:57:31.000Z (9 months ago)
- Last Synced: 2024-04-14T05:23:47.335Z (8 months ago)
- Language: Python
- Homepage:
- Size: 132 MB
- Stars: 0
- Watchers: 20
- Forks: 0
- Open Issues: 15
Metadata Files:
- Readme: README.md
# ocw-data-parser
A parsing script for OCW course data. You can also use the parsed output to generate a static HTML site.
## Installation
This is a Python 3.3+ project. Install the `ocw-data-parser` library with pip:
```bash
pip install ocw-data-parser
```

## Usage
Each OCW course exported from Plone usually has a single folder named "0" under the course directory. This directory structure must be maintained for the parser to work correctly. In this README, `course_dir` refers to the directory that contains this "0" directory.

To parse a single OCW course:
```python
from ocw_data_parser import OCWParser

your_parser = OCWParser("path/to/course_dir/", "path/to/output/destination/")

# Extract the media files and parsed JSON locally inside the output directory
# for each course directory in course_dir
your_parser.extract_media_locally()
# Extract media files hosted on the Akamai cloud
your_parser.extract_foreign_media_locally()

# To upload all media to your S3 bucket:
# first, make sure your AWS credentials are set up in your local environment;
# second, set up your S3 info
your_parser.setup_s3_uploading("your_bucket_name", "optional_containing_folder")
# Then upload all media to S3
your_parser.upload_all_media_to_s3()
# To upload only the course image thumbnail
your_parser.upload_course_image()
```

## Local Workflow
To download a list of courses, create `private/courses.json` based on `example_courses.json`, then run:
```python
from ocw_data_parser import OCWDownloader

downloader = OCWDownloader("private/courses.json", "PROD", "private/raw_courses", "ocw-content-storage")
downloader.download_courses()
```
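A pre-flight check before calling `download_courses()` can catch a missing `aws` executable early. The following is a minimal sketch and is not part of `ocw-data-parser`: `missing_tools` is a hypothetical helper, and it only checks that the executable is on `PATH`, not that credentials or bucket access are configured.

```python
import shutil


def missing_tools(tools=("aws",), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


# Example use before starting a download:
#     missing = missing_tools()
#     if missing:
#         raise SystemExit("Missing required tools: " + ", ".join(missing))
```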
For the download to work, you need `awscli` installed on your machine and configured for access to the bucket that you specify.

To parse a folder of course folders (like the ones downloaded above) and export only parsed JSON with S3 links:
```python
from ocw_data_parser import parse_all

parse_all(
    courses_dir="private/raw_courses",
    destination_dir="../ocw-to-hugo/private/courses",
    upload_parsed_json=False,
    s3_bucket="open-learning-course-data-ci",
    s3_links=True,
    overwrite=True,
    beautify_parsed_json=True,
)
```

To upload the parsed JSON to S3, set `upload_parsed_json` to `True`.
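Because the parser expects each course directory to contain a "0" folder, it can help to check a folder of courses before running `parse_all`. The sketch below is an illustration only and not part of `ocw-data-parser`: `find_courses_missing_zero_dir` is a hypothetical helper, and it assumes every immediate subfolder of `courses_dir` is a course directory.

```python
from pathlib import Path


def find_courses_missing_zero_dir(courses_dir):
    """Return the names of course folders that lack a "0" subfolder.

    Assumes each immediate subdirectory of `courses_dir` is one course.
    """
    root = Path(courses_dir)
    missing = []
    for course in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (course / "0").is_dir():
            missing.append(course.name)
    return missing


# Example use before parsing:
#     bad = find_courses_missing_zero_dir("private/raw_courses")
#     if bad:
#         print("Courses without a '0' folder:", ", ".join(bad))
```

An empty result means every course folder has the expected layout; any names returned are worth inspecting before parsing.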