{"id":17506350,"url":"https://github.com/kenriortega/data_engineer_learning_path","last_synced_at":"2025-03-28T21:13:41.071Z","repository":{"id":236296490,"uuid":"792323748","full_name":"kenriortega/data_engineer_learning_path","owner":"kenriortega","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-26T12:47:03.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-03T06:52:58.417Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kenriortega.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-26T12:40:30.000Z","updated_at":"2024-04-26T12:47:06.000Z","dependencies_parsed_at":"2024-04-26T13:48:42.730Z","dependency_job_id":"5e99eacf-52f9-4507-b8cb-02e2413f50a8","html_url":"https://github.com/kenriortega/data_engineer_learning_path","commit_stats":null,"previous_names":["kenriortega/data_engineer_learning_path"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenriortega%2Fdata_engineer_learning_path","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenriortega%2Fdata_engineer_learning_path/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenriortega%2Fdata_engineer_learning_path/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenriortega%2Fdata_engineer_learning_path/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kenriortega","download_url":"https://codeload.github.com/kenriortega/data_engineer_learning_path/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246100583,"owners_count":20723479,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-20T03:37:26.894Z","updated_at":"2025-03-28T21:13:41.052Z","avatar_url":"https://github.com/kenriortega.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Documentation\n\n\n## Dependencies\n\n```sh\n\npip install \"pyiceberg[s3fs,hive,sql-sqlite,duckdb,pyarrow]\"\n```\n\n```yaml\n\nversion: \"3.7\"\nname: etl\nnetworks:\n  data_lakehouse:\n    driver: bridge\nvolumes:\n  redpanda-0: null\nservices:\n  postgres:\n    image: 'postgres:latest'\n    container_name: postgres\n    ports:\n      - \"5432:5432\"\n    environment:\n      POSTGRES_USER: postgres\n      POSTGRES_PASSWORD: postgres\n      POSTGRES_DB: ikoko\n    healthcheck:\n      test: [ \"CMD-SHELL\", \"pg_isready\" ]\n      interval: 10s\n      timeout: 5s\n      retries: 5\n    networks:\n      - data_lakehouse\n  minio:\n    image: minio/minio\n    container_name: minio\n    environment:\n      - MINIO_ROOT_USER=admin\n      - MINIO_ROOT_PASSWORD=password\n    networks:\n      - data_lakehouse\n    ports:\n      - 9001:9001\n      - 9000:9000\n    command: [\"server\", \"/data\", \"--console-address\", \":9001\"]\n\n  mc:\n    depends_on:\n      - minio\n    image: minio/mc\n    
networks:\n      - data_lakehouse\n    container_name: mc\n    entrypoint: \u003e\n      /bin/sh -c \"\n      until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' \u0026\u0026 sleep 1; done;\n      /usr/bin/mc rm -r --force minio/warehouse;\n      /usr/bin/mc mb minio/warehouse;\n      tail -f /dev/null\n      \" \n  # Nessie Catalog Server Using In-Memory Store\n  nessie:\n    image: projectnessie/nessie:latest\n    container_name: nessie\n    networks:\n      - data_lakehouse\n    ports:\n      - 19120:19120\n  rest:\n    image: tabulario/iceberg-rest\n    container_name: iceberg-rest\n    networks:\n      - data_lakehouse\n    ports:\n      - 8181:8181\n    environment:\n      - AWS_ACCESS_KEY_ID=admin\n      - AWS_SECRET_ACCESS_KEY=password\n      - AWS_REGION=us-east-1\n      - CATALOG_WAREHOUSE=s3://warehouse/\n      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO\n      - CATALOG_S3_ENDPOINT=http://minio:9000\n\n```\n\n```python\nfrom pyiceberg.catalog.sql import SqlCatalog\nfrom pyiceberg.catalog.rest import RestCatalog\n\n# Option 1: local catalog backed by SQLite; table data is stored under ./warehouse.\n# The ./warehouse directory must already exist.\nwarehouse_path = \"./warehouse\"\ncatalog = SqlCatalog(\n    \"sqlite\",\n    **{\n        \"uri\": f\"sqlite:///{warehouse_path}/pyiceberg_catalog.db\",\n        \"warehouse\": f\"file://{warehouse_path}\",\n    },\n)\n\n# Option 2: REST catalog (the iceberg-rest service above) with table data in MinIO.\n# This reassigns `catalog`, so keep only the option you need.\n# Replace 192.168.1.105 with the address of your Docker host.\ncatalog = RestCatalog(\n    \"docs\",\n    **{\n        \"uri\": \"http://192.168.1.105:8181\",\n        \"s3.endpoint\": \"http://192.168.1.105:9000\",\n        \"py-io-impl\": \"pyiceberg.io.pyarrow.PyArrowFileIO\",\n        \"s3.access-key-id\": \"admin\",\n        \"s3.secret-access-key\": \"password\"\n    },\n)\n```\n\n## Resources\n- [https://py.iceberg.apache.org/api/#create-a-table](https://py.iceberg.apache.org/api/#create-a-table)\n- [https://py.iceberg.apache.org/#installation](https://py.iceberg.apache.org/#installation)\n- 
[https://www.kaggle.com/datasets/gauthamp10/google-playstore-apps?resource=download](https://www.kaggle.com/datasets/gauthamp10/google-playstore-apps?resource=download)\n- [https://www.kaggle.com/datasets/gauthamp10/apple-appstore-apps](https://www.kaggle.com/datasets/gauthamp10/apple-appstore-apps)\n- [https://jira.readthedocs.io/](https://jira.readthedocs.io/)\n- [https://documenter.getpostman.com/view/8765260/TzzHnDGw#00479c80-ae16-4bcd-90e9-96a9649b68d6](https://documenter.getpostman.com/view/8765260/TzzHnDGw#00479c80-ae16-4bcd-90e9-96a9649b68d6)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenriortega%2Fdata_engineer_learning_path","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkenriortega%2Fdata_engineer_learning_path","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenriortega%2Fdata_engineer_learning_path/lists"}