{"id":18675362,"url":"https://github.com/manuparra/hadoop-statistics","last_synced_at":"2025-09-15T00:07:34.338Z","repository":{"id":79159261,"uuid":"69913071","full_name":"manuparra/hadoop-statistics","owner":"manuparra","description":"Calculate statistical measures of one column in big data Datasets with these simply Hadoop Application ","archived":false,"fork":false,"pushed_at":"2017-02-24T22:48:46.000Z","size":43,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-18T12:08:03.142Z","etag":null,"topics":["avg","bigdata","hadoop","java","massive-datasets","max","min","standardeviation"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/manuparra.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-10-03T21:37:49.000Z","updated_at":"2017-06-27T22:08:45.000Z","dependencies_parsed_at":"2023-02-25T13:00:48.562Z","dependency_job_id":null,"html_url":"https://github.com/manuparra/hadoop-statistics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/manuparra/hadoop-statistics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manuparra%2Fhadoop-statistics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manuparra%2Fhadoop-statistics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manuparra%2Fhadoop-statistics/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/manuparra%2Fhadoop-statistics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/manuparra","download_url":"https://codeload.github.com/manuparra/hadoop-statistics/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manuparra%2Fhadoop-statistics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275185399,"owners_count":25419919,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avg","bigdata","hadoop","java","massive-datasets","max","min","standardeviation"],"created_at":"2024-11-07T09:24:38.718Z","updated_at":"2025-09-15T00:07:34.312Z","avatar_url":"https://github.com/manuparra.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Basic statistical measures of massive datasets with APACHE HADOOP\n\nCalculate the MIN, MAX, AVG and STDDEV of one column in big data datasets with these Hadoop applications.\n\nThese applications are the solutions to the practical exercises of the Big Data course in the Master in Data Science at the University of Granada (March 2016).\n\nThe practical master class was given by Francisco J. Baldán and Manuel J. 
Parra.\n\n\nIndex of measures:\n\n- [MIN](#min-of-a-column)\n- [MAX](#max-of-a-column)\n- [AVG](#avg-of-a-column)\n- [STD DEV](#std-dev-of-a-column)\n\n\n\n\n## MIN of a column\n\nCode: [HADOOP MIN](./hadoop-min/)\n\n### How to compile and execute:\n\n```\n# Compile\njavac -cp /usr/lib/hadoop/*:/usr/lib/hadoop-0.20-mapreduce/* -d min_classes Min.java MinMapper.java MinReducer.java\n\n# Create JAR\n\njar -cvf min.jar -C min_classes/ .\n\n# Execute\nhadoop jar min.jar oldapi.Min ./sample1.txt ./min/output_2/\n```\n\n\n## MAX of a column\n\nCode: [HADOOP MAX](./hadoop-max/)\n\n### How to compile and execute:\n\n```\n# Compile\njavac -cp /usr/lib/hadoop/*:/usr/lib/hadoop-0.20-mapreduce/* -d max_classes Max.java MaxMapper.java MaxReducer.java\n\n# Create JAR\n\njar -cvf max.jar -C max_classes/ .\n\n# Execute\nhadoop jar max.jar oldapi.Max ./sample1.txt ./max/output_2/\n```\n\n## AVG of a column\n\nCode: [HADOOP AVG](./hadoop-avg/)\n\n### How to compile and execute:\n\n```\n# Compile\njavac -cp /usr/lib/hadoop/*:/usr/lib/hadoop-0.20-mapreduce/* -d avg_classes Avg.java AvgMapper.java AvgReducer.java\n\n# Create JAR\n\njar -cvf avg.jar -C avg_classes/ .\n\n# Execute\nhadoop jar avg.jar oldapi.Avg ./sample1.txt ./avg/output_2/\n```\n\n\n\n## STD DEV of a column\n\nCode: [HADOOP STD DEV](./hadoop-stddev/)\n\n### How to compile and execute:\n\n```\n# Compile\njavac -cp /usr/lib/hadoop/*:/usr/lib/hadoop-0.20-mapreduce/* -d desv_classes Desv.java DesvMapper.java DesvReducer.java\n\n# Create JAR\n\njar -cvf desv.jar -C desv_classes/ .\n\n# Execute\nhadoop jar desv.jar oldapi.Desv ./sample1.txt ./desv/output_2/\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanuparra%2Fhadoop-statistics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanuparra%2Fhadoop-statistics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanuparra%2Fhadoop-statistics/lists"}