{"id":14066713,"url":"https://github.com/tuanle618/AEDA","last_synced_at":"2025-07-29T23:32:05.768Z","repository":{"id":129611239,"uuid":"114379124","full_name":"tuanle618/AEDA","owner":"tuanle618","description":"AEDA - Automated Data Exploratory Analysis in R","archived":false,"fork":false,"pushed_at":"2018-04-13T14:06:48.000Z","size":3301,"stargazers_count":11,"open_issues_count":3,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-04T06:36:44.889Z","etag":null,"topics":["data-science","eda","eda-report","exploratory-data-analysis","r"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tuanle618.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-12-15T14:27:46.000Z","updated_at":"2024-11-13T03:15:20.000Z","dependencies_parsed_at":"2023-04-19T13:31:46.150Z","dependency_job_id":null,"html_url":"https://github.com/tuanle618/AEDA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tuanle618/AEDA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuanle618%2FAEDA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuanle618%2FAEDA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuanle618%2FAEDA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuanle618%2FAEDA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tuanle618","download_url":"https://codeload.github.com/tuanle618/AEDA/tar.gz/refs/heads/master","sbo
m_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuanle618%2FAEDA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267780018,"owners_count":24143201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","eda","eda-report","exploratory-data-analysis","r"],"created_at":"2024-08-13T07:05:13.709Z","updated_at":"2025-07-29T23:32:04.902Z","avatar_url":"https://github.com/tuanle618.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"# ![AEDA](https://github.com/ptl93/AEDA/blob/master/man/images/AEDA_logo.png) Automated Exploratory Data Analysis in R\n***\n\n[![Build Status](https://travis-ci.org/ptl93/AEDA.svg?branch=master)](https://travis-ci.org/ptl93/AEDA)\n## Description\n\nWriting exploratory data analysis (EDA) scripts helps extract valuable information from data, but it can be very time-consuming. People often produce the same tables and figures again and again, a task that EDA scripts can automate. \u003cbr\u003e\n\u003cbr\u003e\nThis package helps automate the creation of an EDA report by providing functions that one would otherwise script by hand for each data type in a dataset. To that end, we provide the following functionalities:\n\n1. **Basic Data Summary**\n2. 
**Categorical Data Summary**\n3. **Numeric Data Summary**\n4. **Correlation Analysis**\n5. **Cluster Analysis**\n6. **Principal Component Analysis**\n7. **Multidimensional Scaling Analysis**\n8. **Exploratory Factor Analysis**\n\n## Installation\n```R\n#Install the development version from GitHub without our vignette:\ndevtools::install_github(\"ptl93/AEDA\")\n\n# Or install the development version from GitHub with our vignette:\ndevtools::install_github(\"jimhester/lintr\") #this is needed to compile the vignette\ndevtools::install_github(\"ptl93/AEDA\", build_vignettes = TRUE)\n```\n\n## Examples\nThe following two subsections show how to conduct an exploratory data analysis.  \nIn general, we provide the functionalities/subreports 1-8 listed above, which the \"fast\" version calls with the default parameter values we provide.  \nThe second version gives you, as a user of the package, the freedom to choose between different methods. For example, in cluster analysis you might prefer hierarchical clustering over the default k-means algorithm for your data set.\n\n### Create a fast report\n\n#### `fastReport()`\nWith the `fastReport()` function you can create a full EDA report for a data set stored in your current R environment with two lines of code.  
\n```r \n#load library\nlibrary(AEDA)\ndata(\"survey\", package = \"MASS\")\nfastReport(data = survey, target = \"Exer\")\n```\nAfter executing the last line, you should see a **MainReport.rmd** file in your current directory and a subdirectory **Data_Report/** that contains all subreport rmd files and the analysis results stored as .rds files:\n\u003cimg src=\"https://github.com/ptl93/AEDA/blob/master/man/tutorial/0_childRMDs.PNG\" width=\"500\" height=\"400\" /\u003e\n\nTo render the final EDA HTML report, run `rmarkdown::render(\"MainReport.rmd\")` or, if you use RStudio, open the MainReport.rmd file and click the Knit button.\n\n#### `openMLReport()`\nWith the function `openMLReport()` you can create a full EDA report for a data set stored in the [OpenML database](https://www.openml.org/search?type=data). The approach is similar to the `fastReport()` call above.\n```r \nopenMLReport(data.id = 61L) \n#data.id = 61L is the iris data set in the OpenML database\nrmarkdown::render(\"MainReport.rmd\")\n```\n\n### Create a customized `AEDA`-Report\nTo let you select different methods for each report, we offer a choice between several methods and algorithms. 
In general, running an `AEDA`-Pipeline requires three function calls for each step (except for the basic data summary):\n\n* `my.task = make*Task()`\n* `my.analysis = make*Analysis()`\n* `my.report = makeReport(my.analysis)`\n\nSince these three calls add little when you keep the default parameters for a report, there is a shortcut that produces the report directly:\n* `my.report = create*Report()`\n\n\nThe following code chunk shows how to customize the automated exploratory data analysis:\n\n### Create a customized report using `AEDA`-Pipeline\n\n``` r\n#load library\nlibrary(AEDA)\ndata(\"survey\", package = \"MASS\")\n\n#get data types\ndata.types = getDataType(data = survey, target = \"Exer\")\nprint(data.types)\n#there are integer, numerical and categorical columns in the dataset.\n\n######### AEDA Pipeline: long version #########\n##### Use this pipeline if you want to modify parameters for the analysis reports\n\n###1 - Basic Report\n#create task\nbasic.report.task = makeBasicReportTask(id = \"students.survey\", data = survey, target = \"Exer\")\n#create report\nbasic.report = makeReport(basic.report.task)\n\n\n###2 - Categorical Data Summary\n#create task\ncat.sum.task = makeCatSumTask(id = \"students.survey\", data = survey, target = \"Exer\",\n  position = \"stack\")\n#compute analysis\ncat.sum = makeCatSum(cat.sum.task)\n#create report\ncat.sum.report = makeReport(cat.sum)\n\n\n###3 - Numeric Data Summary\n#create task\nnum.sum.task = makeNumSumTask(id = \"students.survey\", data = survey, target = \"Exer\",\n  geom.hist.args = list(bins = 20L, alpha = 0.8))\n#compute analysis\nnum.sum = makeNumSum(num.sum.task)\n#create report\nnum.sum.report = makeReport(num.sum)\n\n\n###4 - Correlation Analysis\n#Since we do not modify the default parameters we will call the shortcut version:\ncorr.report = createCorrReport(data = survey)\n\n\n###5 - Cluster 
Analysis\n#create task\ncluster.task = makeClusterTask(id = \"students.survey\", data = survey,\n  method = \"cluster.h\", par.vals = list(method = \"average\"))\n#compute analysis\ncluster.analysis = makeClusterAnalysis(cluster.task)\n#create the report\ncluster.report = makeReport(cluster.analysis)\n\n\n###6 - Principal Component Analysis\n#create task\npca.task = makePCATask(id = \"students.survey\", data = survey, target = \"Exer\",\n  center = TRUE)\n#compute analysis\npca.result = makePCA(pca.task)\n#create report\npca.report = makeReport(pca.result)\n\n\n###7 - Multidimensional Scaling Analysis\n#create task\nmds.task = makeMDSTask(id = \"students.survey\", data = survey,\n  method = \"isoMDS\", par.vals = list(maxit = 100L))\n#compute analysis\nmds.result = makeMDSAnalysis(mds.task)\n#create report\nmds.report = makeReport(mds.result)\n\n\n###8 - Exploratory Factor Analysis\n#create task\nfa.task = makeFATask(id = \"students.survey\", data = survey,\n  rotate = \"varimax\", par.vals = list(max.iter = 20L))\n#compute analysis\nfa.result = makeFA(fa.task)\n#create report\nfa.report = makeReport(fa.result)\n\n\n###9 - create the HTML-report\nfinishReport(basic.report, cat.sum.report, num.sum.report, corr.report,\n  cluster.report, pca.report, mds.report, fa.report)\n  \n  \n###10 - render the final HTML-report\nrmarkdown::render(\"MainReport.rmd\")\n``` \n\nFor more information, you can check out our [Wiki](https://github.com/ptl93/AEDA/wiki).\n\n## Contributing\nWe welcome feedback and contributions that help improve this package.\n\n### Issue\nIf you believe that our package lacks analysis steps or should support more methods/algorithms for a report, do not hesitate to let us know by opening a [new issue](https://github.com/ptl93/AEDA/issues).\n\n### Contribute\nIf you want to contribute, please follow our style guide, which is the style guide from [mlr](https://github.com/rdatsci/PackagesInfo/wiki/R-Style-Guide). 
In general, we follow the \"fork-and-pull\" Git workflow.\n\n1. **Fork** the repo on GitHub\n2. **Clone** the project to your own machine\n3. **Commit** changes to your own branch\n4. **Push** your work back up to your fork\n5. **Submit** a Pull request so that we can review your changes\n\nNOTE: Be sure to merge the latest from \"upstream\" before making a pull request!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuanle618%2FAEDA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftuanle618%2FAEDA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuanle618%2FAEDA/lists"}