{"id":26152771,"url":"https://github.com/mensenvau/pipezone_org","last_synced_at":"2026-05-08T06:41:23.598Z","repository":{"id":281542387,"uuid":"925796842","full_name":"mensenvau/pipezone_org","owner":"mensenvau","description":"🚀 Pipezone: Scalable \u0026 Containerized Apache Spark with Jupyter","archived":false,"fork":false,"pushed_at":"2025-03-09T20:12:30.000Z","size":0,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-09T20:25:37.188Z","etag":null,"topics":["apache-spark","docker-compose","jupyter-notebook","pipezone","pyspark"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mensenvau.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-01T19:12:09.000Z","updated_at":"2025-03-09T20:20:12.000Z","dependencies_parsed_at":"2025-03-09T20:36:16.029Z","dependency_job_id":null,"html_url":"https://github.com/mensenvau/pipezone_org","commit_stats":null,"previous_names":["mensenvau/pipezone_org"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mensenvau%2Fpipezone_org","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mensenvau%2Fpipezone_org/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mensenvau%2Fpipezone_org/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mensenvau%2Fpipezone_org/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mensenvau","download_url":"https://codeload.github.com/mensenvau/pipezone_org/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242989228,"owners_count":20217746,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","docker-compose","jupyter-notebook","pipezone","pyspark"],"created_at":"2025-03-11T07:21:41.510Z","updated_at":"2025-10-20T07:43:45.336Z","avatar_url":"https://github.com/mensenvau.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## 🚀 Pipezone\n\n**Pipezone** is a lightweight, scalable, and fully containerized data processing environment. 
It provides an easy-to-use Apache Spark setup with dynamic worker scaling, Jupyter Notebook for interactive development, and shared storage for seamless data access.\n\n### 📌 Features\n\n- **Apache Spark**: Master-worker architecture with dynamic worker scaling\n- **Jupyter Notebook**: Pre-configured for PySpark and SQL magic\n- **Shared Workspace**: Easily share code and data between services\n- **SQL Magic**: Run SQL queries directly within Jupyter (`%sql` support)\n- **One-Command Setup**: Deploy everything using `docker-compose`\n\n### 📺 System Architecture\n\n![Docker Setup](images/Docker.png)\n\n### 📦 Installation\n\n#### Prerequisites\n\nEnsure you have the following installed:\n\n- [Docker](https://www.docker.com/get-started)\n- [Docker Compose](https://docs.docker.com/compose/install/)\n\n#### Setup\n\nClone the repository and navigate into the project directory:\n\n```sh\ngit clone https://github.com/mensenvau/pipezone_org.git\ncd pipezone_org\n```\n\nStart all services in the background:\n\n```sh\ndocker-compose up -d\n```\n\n### 🔗 Access Services\n\n| Service      | URL                                     |\n| ------------ | --------------------------------------- |\n| **Spark UI** | [127.0.0.1:8080](http://127.0.0.1:8080) |\n| **Jupyter**  | [127.0.0.1:8888](http://127.0.0.1:8888) |\n\n\u003e **Note:** By default, Jupyter runs without authentication. 
If it's not accessible, check the Docker logs for the URL that contains the authentication token.\n\n### 📂 Shared Workspace\n\n![Folder Structure](images/Folder.png)\n\n### 📝 Usage\n\n#### Running a Simple Spark Job\n\nInside Jupyter, open a new Python notebook and run:\n\n```python\n# Set up Spark environment\nfrom spark_utils import get_spark\n\nspark = get_spark()\ndf = spark.read.csv(\"/home/jovyan/shared/data.csv\", header=True)\ndf.toPandas().head(4)  # You can also use df.show() without converting\n```\n\n#### Running SQL Queries\n\n```python\n%load_ext sql\n%sql spark\n\n%sql SELECT * FROM df\n```\n\n### 📓 Jupyter Notebook Example\n\n![Notebook Example](images/Notebook.png)\n\n### 🚫 Stopping and Removing Containers\n\nTo stop and remove all containers:\n\n```sh\ndocker-compose down\n```\n\nTo also remove the associated volumes:\n\n```sh\ndocker-compose down -v\n```\n\n### 👨‍💻 Contributing\n\n1. Fork the repository\n2. Create a new branch (`git checkout -b feature-branch`)\n3. Commit your changes (`git commit -m 'Add new feature'`)\n4. Push to the branch (`git push origin feature-branch`)\n5. Open a Pull Request\n\n### 📝 License\n\nThis project is licensed under the MIT License. See `LICENSE` for details.\n\n---\n\n🚀 **Pipezone** - Simplifying Big Data Processing!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmensenvau%2Fpipezone_org","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmensenvau%2Fpipezone_org","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmensenvau%2Fpipezone_org/lists"}