{"id":21690303,"url":"https://github.com/vic-lsh/imdb-analytics","last_synced_at":"2026-05-11T03:44:10.653Z","repository":{"id":40981125,"uuid":"190964725","full_name":"vic-lsh/imdb-analytics","owner":"vic-lsh","description":"Distributed website scraping and analysis system","archived":false,"fork":false,"pushed_at":"2022-12-08T05:56:26.000Z","size":10075,"stargazers_count":1,"open_issues_count":15,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-01-25T13:07:34.398Z","etag":null,"topics":["distributed-system","microservices","scraping"],"latest_commit_sha":null,"homepage":"https://imdb-analytics.azurewebsites.net/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vic-lsh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-09T04:41:16.000Z","updated_at":"2023-12-12T15:18:07.000Z","dependencies_parsed_at":"2023-01-24T14:45:51.483Z","dependency_job_id":null,"html_url":"https://github.com/vic-lsh/imdb-analytics","commit_stats":null,"previous_names":["vicshli/imdb-analytics","vic-lsh/imdb-analytics"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vic-lsh%2Fimdb-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vic-lsh%2Fimdb-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vic-lsh%2Fimdb-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vic-lsh%2Fimdb-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vic-lsh","download_url":"ht
tps://codeload.github.com/vic-lsh/imdb-analytics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244618386,"owners_count":20482316,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-system","microservices","scraping"],"created_at":"2024-11-25T17:29:58.225Z","updated_at":"2026-05-11T03:44:05.623Z","avatar_url":"https://github.com/vic-lsh.png","language":"Python","funding_links":[],"categories":["📦 Legacy \u0026 Inactive Projects"],"sub_categories":[],"readme":"# IMDb Analytics\n\nThis project offers several ways to help you explore IMDb in a visually\nfun way. Users can see trends in a TV series' episode ratings (so they\ncan easily identify flops such as Game of Thrones), see our predictions on\nwhether a TV show / movie will succeed commercially, explore our recommendations\nbased on your search history, and more.\n\nI use this project to explore microservices architecture, the inner workings of\na recommendation engine, predictions through data mining, how a resilient\ninternet scraper is built, and CI/CD best practices.\n\n## Screenshots\n\n| Web App                             | CLI                                      |\n| ----------------------------------- | ---------------------------------------- |\n| ![stdui](docs/static/sample_ui.png) | ![stdout](docs/static/sample_stdout.png) |\n\n## Motivations\n\nI visit IMDb almost daily. I use it to find new movies, look up\ncritic reviews for TV episodes, and check out fun trivia for TV shows that I\nwatch. 
While I love the service for how comprehensive and relevant it is, I\nwanted to do something to improve the site's UI. Hence this project, an attempt\nto provide a better interface for dealing with IMDb's data.\n\n## Architecture\n\nThe application is organized as a series of microservices that communicate with\neach other using RPC or RESTful APIs.\n\n### Services manifest\n\n| Service | Written In         | API | Description                                                                       |\n| ------- | ------------------ | --- | --------------------------------------------------------------------------------- |\n| [Web app](./src/web-app) | TypeScript (React) | N/A | The front end of the application. Users use it to interact with the various services. |\n| [Database service](./src/db-service) | Python | REST | Provides an interface for interacting with MongoDB, where IMDb data is stored. |\n| [Extractor service](./src/extractor-service) | Python | RPC | Extracts queried information from IMDb. |\n| [Job service](./src/job-service) | Go | RPC | Schedules data extraction jobs, controls extraction frequency, etc. |\n| Recommendation service | Python | TBD | _To be implemented_. Recommends IMDb entries based on search history (stored locally). |\n| Prediction service | Python | TBD | _To be implemented_. Predicts whether an unreleased movie / TV season would succeed commercially and/or critically. |\n| User service | TBD | TBD | _Proposed, not confirmed_. Allows users to log in using SSO (Google, Facebook, etc.), or to create an account. |\n| Favorites service | TBD | TBD | _Proposed, not confirmed_. Gives users the ability to save items into lists (like Pinterest). |\n\n## Learning goals\n\nOriginally intended to be a lightweight IMDb scraper, this project has evolved\ninto a microservices learning experience. 
In this project, I hope to learn more\nabout:\n\n- Microservices architectural design\n  - Designation of responsibilities\n  - Well-defined data flows (Istio)\n  - Abstraction of common infrastructure\n  - etc.\n- Docker containerization\n- Inter-container communication (RESTful vs. RPC)\n- Container orchestration (Kubernetes)\n- Golang and its advantages in concurrency\n- CI / CD\n\n## Trying out the application locally\n\nThis project is currently under development. To test the application on your\nhost machine, run the commands below. Note that some of the\nfollowing instructions are Unix-specific.\n\nBefore testing, please ensure the following dependencies are installed:\n\n- [docker](https://docs.docker.com/v17.12/docker-for-mac/install/#download-docker-for-mac)\n- [brew](https://brew.sh)\n- [node](https://treehouse.github.io/installation-guides/mac/node-mac.html)\n\nTo start up the environment, go to the repository's root directory and run\n\n```bash\nmake run-demo   # starts up the environment and loads sample data\n# or\nmake run        # starts up the environment, without loading sample data\n```\n\nYou should now be able to access the web app on port 3001\n(http://localhost:3001).\n\nIn a local environment, each service runs in a dedicated container. If you're\ninterested in testing RESTful calls to a service (provided RESTful APIs are\navailable), use the following command to see the mapping between container\nports and host ports:\n\n```bash\ndocker container ls\n```\n\nSee the [docker](https://docs.docker.com/engine/reference/commandline/cli/) and\n[docker-compose](https://docs.docker.com/compose/) documentation for more\ncommands to work with the containers.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvic-lsh%2Fimdb-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvic-lsh%2Fimdb-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvic-lsh%2Fimdb-analytics/lists"}