{"id":25194961,"url":"https://github.com/its-kunal/recommendation-systems","last_synced_at":"2026-04-28T11:02:57.463Z","repository":{"id":276509233,"uuid":"929491051","full_name":"its-kunal/recommendation-systems","owner":"its-kunal","description":"This project demonstrates how to build a recommendation system using Apache Spark for distributed data processing, Python for machine learning, and common data science libraries. It uses the Alternating Least Squares (ALS) algorithm for collaborative filtering.","archived":false,"fork":false,"pushed_at":"2025-02-08T17:19:09.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-04T14:43:54.704Z","etag":null,"topics":["apache-spark","distributed-ml","machine-learning","pyspark","python","spark"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/its-kunal.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-08T17:10:33.000Z","updated_at":"2025-02-08T17:22:05.000Z","dependencies_parsed_at":"2025-02-08T18:25:46.391Z","dependency_job_id":"ed657ad5-d6d7-48a1-9eb4-1449e4a17956","html_url":"https://github.com/its-kunal/recommendation-systems","commit_stats":null,"previous_names":["its-kunal/recommendation-systems"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/its-kunal/recommendation-systems","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/its-kunal%2Frecommendation-systems","tags_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/its-kunal%2Frecommendation-systems/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/its-kunal%2Frecommendation-systems/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/its-kunal%2Frecommendation-systems/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/its-kunal","download_url":"https://codeload.github.com/its-kunal/recommendation-systems/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/its-kunal%2Frecommendation-systems/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32377599,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T09:24:15.638Z","status":"ssl_error","status_checked_at":"2026-04-28T09:24:15.071Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","distributed-ml","machine-learning","pyspark","python","spark"],"created_at":"2025-02-10T00:29:34.882Z","updated_at":"2026-04-28T11:02:57.444Z","avatar_url":"https://github.com/its-kunal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Recommendation System with Apache Spark, Python, and Machine Learning\n\nThis project demonstrates how to build a 
recommendation system using Apache Spark for distributed data processing, Python for machine learning, and common data science libraries.  It uses the Alternating Least Squares (ALS) algorithm for collaborative filtering.\n\n## Project Overview\n\nThe recommendation system analyzes user interaction data (e.g., ratings, purchases, views) to predict user preferences and generate personalized recommendations.  It leverages Spark's distributed computing capabilities to handle large datasets efficiently.\n\n## Data Files\n\nThe project uses three CSV files:\n\n* **`users.csv`:** Contains user information (e.g., `user_id`, `age`, `gender`, `location`).\n* **`interactions.csv`:** Records user interactions with items (e.g., `user_id`, `item_id`, `rating`, `timestamp`).\n* **`items.csv`:** Contains item metadata (e.g., `item_id`, `name`, `category`, `price`).\n\nExample data files are provided in the repository.  You can replace these with your own datasets.\n\n## Code Files\n\n* **`main.py`:** The main Python script that implements the recommendation system. It loads the data, preprocesses it, trains the ALS model, evaluates the model, generates recommendations, and saves the results to a CSV file.\n\n## Libraries\n\n* `pyspark`\n* `pandas`\n* `numpy`\n* `scikit-learn` (for evaluation metrics)\n\n## Running the Project (Local)\n\n1. **Prerequisites:**\n    * Install Python 3.x.\n    * Install Apache Spark.  You can download a pre-built version from the Apache Spark website or use a package manager like `brew` (on macOS). Make sure `SPARK_HOME` is set.\n    * Install the required Python libraries:\n\n    ```bash\n    pip install pyspark pandas numpy scikit-learn\n    ```\n\n2. **Data:** Place the `users.csv`, `interactions.csv`, and `items.csv` files in the same directory as `main.py`.\n\n3. 
**Run the Spark Application:** Use `spark-submit` to run the `main.py` script:\n\n    ```bash\n    spark-submit main.py\n    ```\n\n    This starts a local Spark session, loads the data, trains the model, evaluates it, generates recommendations, and saves the results to `recommendations.csv` in the same directory.\n\n## Running the Project (Docker - Spark Container Separate)\n\n1. **Pull the Spark Docker Image (if you don't have one):**\n\n    ```bash\n    docker pull apache/spark:latest  # Or a specific version\n    ```\n\n2. **Run the Spark Container:**\n\n    ```bash\n    docker run -it -p 8080:8080 -p 7077:7077 -v $(pwd):/home/jovyan apache/spark:latest\n    ```\n\n3. **Find the Spark Master URL:** Look in the Docker logs for the Spark master URL (e.g., `spark://\u003ccontainer_id\u003e:7077` or `spark://localhost:7077`).\n\n4. **Set `SPARK_MASTER_URL` and run `main.py`:**\n\n    ```bash\n    export SPARK_MASTER_URL=spark://localhost:7077  # Replace with actual URL\n    spark-submit main.py # or python main.py\n    ```\n\n    Make sure your data files (`users.csv`, `interactions.csv`, `items.csv`) are in the directory you mounted with the `-v` flag (your current directory).\n\n## Running the Project (Docker - Spark and Application in One Container)\n\n1. **Build the Docker Image:** Create a Dockerfile in the same directory as your `main.py` and CSV files. 
Example:\n\n    ```Dockerfile\n    FROM jupyter/pyspark-notebook:latest\n\n    USER root\n\n    RUN pip install scikit-learn\n\n    COPY main.py /home/jovyan/\n    COPY users.csv /home/jovyan/\n    COPY interactions.csv /home/jovyan/\n    COPY items.csv /home/jovyan/\n\n    WORKDIR /home/jovyan\n\n    USER $NB_UID\n\n    # Alternatively: CMD [\"python\", \"main.py\"]\n    CMD [\"spark-submit\", \"main.py\"]\n    ```\n\n    Build the image:\n\n    ```bash\n    docker build -t my-spark-app .\n    ```\n\n    Run the Docker container:\n\n    ```bash\n    docker run -v $(pwd):/home/jovyan -p 8888:8888 my-spark-app  # -p 8888:8888 is only needed if you want to use Jupyter\n    ```\n\n## Output\n\nThe recommendations are saved in a CSV file named `recommendations.csv` in the directory where you run the script.\n\n## Further Enhancements\n\n- Parameter Tuning: Experiment with different ALS parameters (e.g., `maxIter`, `regParam`) to improve model performance.\n- More Metrics: Evaluate the model using additional metrics (e.g., precision@k, recall@k).\n- Implicit Feedback: Adapt the code to handle implicit feedback data.\n- Real-time Recommendations: Explore using Spark Streaming or Structured Streaming for real-time recommendation generation.\n- Production Deployment: Consider how to deploy the model and generate recommendations in a production environment.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fits-kunal%2Frecommendation-systems","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fits-kunal%2Frecommendation-systems","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fits-kunal%2Frecommendation-systems/lists"}