{"id":20392745,"url":"https://github.com/baptvit/big_data","last_synced_at":"2026-04-29T20:35:04.896Z","repository":{"id":130241066,"uuid":"177421088","full_name":"baptvit/Big_Data","owner":"baptvit","description":"My courses and activities in Big Data","archived":false,"fork":false,"pushed_at":"2022-08-11T17:53:17.000Z","size":7599,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-25T03:34:36.611Z","etag":null,"topics":["big-data","hadoop","hbase","hive","kafka","mapreduce","oozie","pig","python3","scala","spark","zookeeper"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/baptvit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-24T13:50:01.000Z","updated_at":"2022-08-11T17:53:20.000Z","dependencies_parsed_at":"2023-05-03T03:15:33.549Z","dependency_job_id":null,"html_url":"https://github.com/baptvit/Big_Data","commit_stats":null,"previous_names":["baptvit/big_data"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/baptvit/Big_Data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baptvit%2FBig_Data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baptvit%2FBig_Data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baptvit%2FBig_Data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baptvit%2FBig_Data/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/baptvit","download_url":"https://codeload.github.com/baptvit/Big_Data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baptvit%2FBig_Data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32443564,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T20:22:27.477Z","status":"ssl_error","status_checked_at":"2026-04-29T20:22:26.507Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","hadoop","hbase","hive","kafka","mapreduce","oozie","pig","python3","scala","spark","zookeeper"],"created_at":"2024-11-15T03:45:23.221Z","updated_at":"2026-04-29T20:35:04.878Z","avatar_url":"https://github.com/baptvit.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learning Big Data\n# Resource attributes\n\nSince resources across the internet vary in terms of their pre-requisites and general accessibility, it is useful to\ngive attributes to them so that it is easy to understand where a resource fits into the wider machine learning scope. 
Below are a few suggested attributes (please extend):\n \n - :blue_book: = Doing\n - :heavy_check_mark: = Completed\n - :rainbow: = creative\n - :bowtie: = beginner\n - :sweat_smile: = intermediate, some pre-requisites\n - :godmode: = advanced, many pre-requisites\n\n\n#### Tools Used\nHadoop, Hive, HBase, ZooKeeper, Oozie, Solr, Kafka, Pig, MapReduce, YARN, Spark, Scala and Python.\n\n### Accelerated Learning Techniques\n- Watch videos at 2x or 3x speed using a browser extension\n- Handwrite notes as you watch to aid memory retention\n- Immerse yourself in the [community](https://medium.com/@exastax/top-20-data-science-blogs-and-websites-for-data-scientists-d88b7d99740)\n\n# Real-World Tools\n\n## Big Data Fundamentals\n- #### [Big Data Fundamentals](https://cognitiveclass.ai/learn/big-data/) [RESULTS](https://github.com/helpthx/Big_Data/tree/master/Big%20Data%20Fundamentals) :heavy_check_mark:\n\t- [Big Data 101](https://cognitiveclass.ai/courses/what-is-big-data/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Big%20Data%20Fundamentals/Cognitive%20Class%20BD0101EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Hadoop 101](https://cognitiveclass.ai/courses/introduction-to-hadoop/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Big%20Data%20Fundamentals/BigDataUniversity%20BD0111EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Spark Fundamentals I](https://cognitiveclass.ai/courses/what-is-spark/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Big%20Data%20Fundamentals/Cognitive%20Class%20BD0211EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\n- Big Data Fundamentos 2.0 from Data Science Academy [RESULTS](https://github.com/helpthx/Big_Data/blob/master/certificate-big-data-fundamentos-20.pdf) :heavy_check_mark:\n\n## Hadoop\n- #### [Hadoop Fundamentals](https://cognitiveclass.ai/learn/hadoop/) [RESULTS](https://github.com/helpthx/Big_Data/tree/master/Hadoop%20Fundamentals) 
:heavy_check_mark:\n \t- [Hadoop 101](https://cognitiveclass.ai/courses/introduction-to-hadoop/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Big%20Data%20Fundamentals/BigDataUniversity%20BD0111EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [MapReduce and YARN](https://cognitiveclass.ai/courses/mapreduce-and-yarn/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Fundamentals/Big%20Data%20University%20BD0115EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Moving Data into Hadoop](https://cognitiveclass.ai/courses/flume-sqoop-moving-data-into-hadoop/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Fundamentals/Big%20Data%20University%20BD0131EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Accessing Hadoop Data Using Hive](https://cognitiveclass.ai/courses/hadoop-hive/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Fundamentals/Big%20Data%20University%20BD0141EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n- #### [Hadoop Programming](https://cognitiveclass.ai/learn/big-data-hadoop-programming/) [RESULTS](https://github.com/helpthx/Big_Data/tree/master/Hadoop%20Programming) :heavy_check_mark:\n\t-  [MapReduce and YARN](https://cognitiveclass.ai/courses/mapreduce-and-yarn/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Fundamentals/Big%20Data%20University%20BD0115EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Apache Pig 101](https://cognitiveclass.ai/courses/introduction-to-pig/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Programming/Big%20Data%20University%20BD0121EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Simplifying Data Pipelines with Apache Kafka](https://cognitiveclass.ai/courses/simplifyingdatapipelines/) 
[RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Programming/Big%20Data%20University%20BD0123EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\n- #### [Hadoop Administration](https://cognitiveclass.ai/learn/hadoop-administration/) [RESULTS](https://github.com/helpthx/Big_Data/tree/master/Hadoop%20Administration) :heavy_check_mark:\n\t- [Moving Data into Hadoop](https://cognitiveclass.ai/courses/flume-sqoop-moving-data-into-hadoop/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Fundamentals/Big%20Data%20University%20BD0131EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Controlling Hadoop Jobs Using Oozie](https://cognitiveclass.ai/courses/controlling-hadoop-jobs-using-oozie/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Administration/Big%20Data%20University%20BD0133EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Developing Distributed Applications Using ZooKeeper](https://cognitiveclass.ai/courses/developing-distributed-applications-using-zookeeper/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Administration/Big%20Data%20University%20BD0135EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Solr 101](https://cognitiveclass.ai/courses/introduction-to-solr/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Administration/Big%20Data%20University%20BD0137EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\n- #### [Hadoop Data Access](https://cognitiveclass.ai/learn/big-data-storage-and-retrieval/) [RESULTS](https://github.com/helpthx/Big_Data/tree/master/Hadoop%20Data%20Access) :heavy_check_mark:\n\t- [Accessing Hadoop Data Using Hive](https://cognitiveclass.ai/courses/hadoop-hive/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Fundamentals/Big%20Data%20University%20BD0141EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Using HBase 
for Real-time Access to your Big Data](https://cognitiveclass.ai/courses/using-hbase-for-real-time-access-to-your-big-data/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Data%20Access/Big%20Data%20University%20BD0143EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [SQL Access for Hadoop](https://cognitiveclass.ai/courses/sql-access-for-hadoop/) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Hadoop%20Data%20Access/Big%20Data%20University%20BD0145EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\n- #### [Intro to Hadoop and MapReduce](https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617)\n\n## Scala\n- #### [Scala Programming for Data Science](https://cognitiveclass.ai/learn/scala)\n\t- [Scala 101](https://courses.cognitiveclass.ai/courses/course-v1:BigDataUniversity+SC0101EN+2016/info) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Scala%20Programming%20for%20Data%20Science/Scala%20101/Module%201:%20Introduction/Big%20Data%20University%20SC0101EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Spark Overview for Scala Analytics](https://courses.cognitiveclass.ai/courses/course-v1:BigDataUniversity+SC0103EN+2016/info) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Scala%20Programming%20for%20Data%20Science/Spark%20Overview%20for%20Scala%20Analytics/Big%20Data%20University%20SC0103EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\t- [Data Science with Scala](https://cognitiveclass.ai/courses/data-science-scala) [RESULTS](https://github.com/helpthx/Big_Data/blob/master/Scala%20Programming%20for%20Data%20Science/Data%20Science%20with%20Scala/Lightbend%20SC0105EN%20Certificate%20_%20Cognitive%20Class.pdf) :heavy_check_mark:\n\n## Data Storytelling\n- edX: https://www.edx.org/course/analytics-storytelling-impact-1\n\n## Spark\n- [Spark Workshop PDF](https://stanford.edu/~rezab/sparkclass/slides/itas_workshop.pdf
)\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaptvit%2Fbig_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbaptvit%2Fbig_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaptvit%2Fbig_data/lists"}