https://github.com/mosheberman/lightbulb
A system for scraping Brooklyn College classes that I wrote in sophomore year.
academia college php
- Host: GitHub
- URL: https://github.com/mosheberman/lightbulb
- Owner: MosheBerman
- Created: 2017-07-26T03:12:46.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-26T17:25:58.000Z (almost 9 years ago)
- Last Synced: 2025-01-20T10:47:47.280Z (over 1 year ago)
- Topics: academia, college, php
- Language: PHP
- Homepage:
- Size: 613 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
lightbulb
=========
This webapp alerts users when information for a CUNY class changes. Class info is pulled from the CUNY website. (**Note:** The course information used to be hosted on CUNY's website, [here][0], before the university system migrated to CUNYFirst.)
# History
Lightbulb was a project I wrote during sophomore year at Brooklyn College.
It had several parts:
1. A bot that scraped the Brooklyn College course system and regularly alerted me of changes to course listings. The bot is written in PHP, and most of what's visible in this repository is the bot.
2. A cron job that ran the scraping and diffing code regularly. The cron configuration has unfortunately been lost to time.
3. A student-facing webapp was planned but never finished. As a part of this, I had planned to build out an API, which could eventually be used for an app as well.
I stopped working on this project when CUNY switched all of the colleges over to CUNYFirst, because my scraper was incompatible with the new website's layout and interaction model. In 2014, I did a partial rewrite in Python for CUNYFirst as my independent study coursework, but I never used it to send actual alerts.
# Time Complexity
This project posed several technical challenges, among them time complexity issues.
One performance gain was found in [`scraper.php`][5]. Instead of iterating over the whole list, I repeatedly popped the first element until nothing was left. At the time I reasoned that this was effectively `O(1)` (constant time), because `O(n)` where `n` is `1` is `O(1)`. Strictly speaking, that describes a single step: draining a list of `n` elements still takes `n` pops, so the loop as a whole remains at least `O(n)` (linear). I researched and documented the changes made to the scraper [on Stack Overflow][3].
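The two loop shapes can be compared with a minimal sketch. It assumes the pop was done with PHP's `array_shift`, which this README does not confirm, and the function names and sample queue are hypothetical rather than taken from `scraper.php`. Both versions visit every element exactly once.

```php
<?php
// Hypothetical helpers for illustration; not the code from scraper.php.

// Drain the queue by repeatedly removing the first element.
function drainWithShift(array $queue): array
{
    $visited = [];
    while (count($queue) > 0) {
        // array_shift removes and returns the first element.
        $visited[] = array_shift($queue);
    }
    return $visited;
}

// Visit the same elements with a plain foreach.
function drainWithForeach(array $queue): array
{
    $visited = [];
    foreach ($queue as $item) {
        $visited[] = $item;
    }
    return $visited;
}

// Made-up work items standing in for pages to scrape.
$pages = ['page-a', 'page-b', 'page-c'];
$viaShift = drainWithShift($pages);
$viaForeach = drainWithForeach($pages);
```

Note that the PHP manual documents `array_shift` as renumbering numeric keys after each call, so a single shift may cost more than constant time on a large array.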
In [`differ.php`][4], I was iterating courses and sections to compare contents. The initial loop had a complexity of `O(courses^2) * O(sections^2)`. I changed the comparison to use PHP's `array_key_exists`, which itself has a time complexity that's ["really close to `O(1)`"][2]. This theoretically brought my iteration down to `O(courses) * O(sections)`.
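A keyed comparison of this kind might look like the sketch below. The snapshot shape (`course code => section code => info string`), the function name, and the change messages are all assumptions for illustration; they are not taken from `differ.php`. Each course and section in the new snapshot is looked up directly by key, so no inner scan over the old snapshot is needed.

```php
<?php
// Hypothetical snapshot shape: [courseCode => [sectionCode => infoString]].
// Not the actual structure used by differ.php.
function diffSnapshots(array $old, array $new): array
{
    $changes = [];
    foreach ($new as $courseCode => $sections) {
        // Hash lookup instead of scanning every old course.
        if (!array_key_exists($courseCode, $old)) {
            $changes[] = "course added: $courseCode";
            continue;
        }
        foreach ($sections as $sectionCode => $info) {
            // Hash lookup instead of scanning every old section.
            if (!array_key_exists($sectionCode, $old[$courseCode])) {
                $changes[] = "section added: $courseCode/$sectionCode";
            } elseif ($old[$courseCode][$sectionCode] !== $info) {
                $changes[] = "section changed: $courseCode/$sectionCode";
            }
        }
    }
    return $changes;
}
```

This sketch only reports additions and changes. Detecting removed courses or sections would take a second pass over the old snapshot with the same key lookups.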
Between these two fixes, the bot ran in a reasonable amount of time, and actually helped people get into classes they needed to graduate.
# License
Copyright 2012 Moshe Berman.
[0]: http://student.cuny.edu/cgi-bin/SectionMeeting/SectMeetColleges.pl?COLLEGECODE=05
[1]: https://github.com/MosheBerman/lightbulb/commit/da0f8a4ee5366b28eed9521a78e5cd268a152fa5#diff-712c420d940d1a22c672127f9cbb2e8a
[2]: https://stackoverflow.com/a/2484455/224988
[3]: https://stackoverflow.com/a/13931470/224988
[4]: https://github.com/MosheBerman/lightbulb/blob/master/system/differ.php
[5]: https://github.com/MosheBerman/lightbulb/blob/master/system/scraper.php