{"id":21455460,"url":"https://github.com/alvertogit/datascience","last_synced_at":"2025-07-05T01:33:39.783Z","repository":{"id":38386157,"uuid":"138083498","full_name":"alvertogit/datascience","owner":"alvertogit","description":"Data Science AI Artificial Intelligence Machine Deep Learning Classification Regression Python Keras TensorFlow TensorFlow2 TPOT XGBoost Matplotlib NumPy Pandas scikit-learn Folium Seaborn Jupyter Lab Notebook","archived":false,"fork":false,"pushed_at":"2024-11-17T09:52:04.000Z","size":4170,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-17T10:32:44.585Z","etag":null,"topics":["artificial-intelligence","data-science","deep-learning","jupyter-lab","jupyter-notebook","keras","machine-learning","matplotlib","pandas","python","scikit-learn","tensorflow","tensorflow2"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alvertogit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-20T20:39:23.000Z","updated_at":"2024-11-17T09:52:08.000Z","dependencies_parsed_at":"2024-03-17T08:31:43.765Z","dependency_job_id":"2cc06857-5c28-40bb-a069-168c87494e21","html_url":"https://github.com/alvertogit/datascience","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvertogit%2Fdatascience","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/alvertogit%2Fdatascience/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvertogit%2Fdatascience/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvertogit%2Fdatascience/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alvertogit","download_url":"https://codeload.github.com/alvertogit/datascience/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226002997,"owners_count":17558157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","data-science","deep-learning","jupyter-lab","jupyter-notebook","keras","machine-learning","matplotlib","pandas","python","scikit-learn","tensorflow","tensorflow2"],"created_at":"2024-11-23T05:12:14.034Z","updated_at":"2025-07-05T01:33:39.775Z","avatar_url":"https://github.com/alvertogit.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DATA SCIENCE\n\nThis repository stores [Jupyter Notebooks] to demonstrate skills in [Data Science], [Artificial Intelligence], classification and regression problems with [Python], [Keras], [scikit-learn], [Matplotlib], [NumPy], [Pandas], [TPOT], [XGBoost], [Folium], [Seaborn] among others.\n\n## DEPENDENCIES\n\nThe code has been tested using:\n\n- [Python] (3.12): an interpreted high-level programming language for general-purpose programming.\n- [Jupyter Lab] (4.4): a web-based interactive development environment for [Jupyter Notebooks], code, and data.\n- [Keras] ([TensorFlow] 
built-in): a high-level neural networks [API], written in [Python] and capable of running on top of [TensorFlow], CNTK, or Theano.\n- [TensorFlow] (2.19): an open source [Deep Learning] library for high-performance numerical computation using data flow graphs.\n- [Matplotlib] (3.10): a plotting library for [Python] and its numerical mathematics extension [NumPy].\n- [NumPy] (2.1): a library for [Python], adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.\n- [Pandas] (2.3): an open source library providing high-performance, easy-to-use data structures and data analysis tools for [Python].\n- [scikit-learn] (1.6): a [machine learning] library for [Python]. It features various classification, regression and clustering algorithms including support vector machines, [random forest], [gradient boosting], k-means and DBSCAN.\n- [scikit-image] (0.25): a collection of algorithms for image processing with [Python].\n- [TPOT] (0.12): a [Python] Automated [Machine Learning] tool that optimizes [machine learning] pipelines using genetic programming.\n- [XGBoost] (3.0): an optimized distributed [gradient boosting] library designed to be highly efficient, flexible and portable.\n- [Folium] (0.20): an open source library to visualize data that has been manipulated in [Python] on an interactive [Leaflet.js] map.\n- [ipyleaflet] (0.20): a [Jupyter] / [Leaflet.js] bridge enabling interactive maps in [Jupyter Notebook].\n- [Seaborn] (0.13): a [Python] visualization library based on [Matplotlib]. It provides a high-level interface for drawing attractive statistical graphics.\n- [imbalanced-learn] (0.13): a [Python] package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. 
It is compatible with [scikit-learn] and supports [SMOTE (Synthetic Minority Over-sampling Technique)].\n- [joblib] (1.5): a set of tools to provide lightweight pipelining in [Python].\n- [findspark] (2.0): a package to make [Spark] Context available in [Jupyter Notebook].\n\nA virtual environment (\u003cenv_name\u003e=**.venv**) can be generated with the **requirements.txt** file found in the main folder.\n\nCommands to configure the virtual environment with [venv]:\n\n```bash\n~/datascience$ python3 -m venv .venv\n~/datascience$ source .venv/bin/activate\n(.venv)~/datascience$ python3 -m pip install pip==25.1.1\n(.venv)~/datascience$ python3 -m pip install setuptools==80.9.0\n(.venv)~/datascience$ python3 -m pip install -r requirements.txt\n(.venv)~/datascience$ pre-commit install\n```\n\n### HOW TO RUN NOTEBOOKS\n\nA good way to explore the [Jupyter Notebooks] is through [Jupyter Lab]. To run any of them, use the command shown below:\n\n```bash\n(.venv)~/datascience$ jupyter lab \u003cnotebook_name\u003e.ipynb\n```\n\nIt might also be necessary to install [Graphviz] locally to render graph images, using the command:\n\n```bash\n~/datascience$ sudo apt-get install graphviz\n```\n\nAn example graph image of a decision tree is shown below.\n\n![Graph image example of a decision tree](images/tree_top3.png)\n\n## CREDITS\n\nauthor: alvertogit\ncopyright: 2018-2025\n\n[Data Science]: https://en.wikipedia.org/wiki/Data_science\n[Artificial Intelligence]: https://en.wikipedia.org/wiki/Artificial_intelligence\n[Python]: https://www.python.org/\n[Machine Learning]: https://en.wikipedia.org/wiki/Machine_learning\n[Deep Learning]: https://en.wikipedia.org/wiki/Deep_learning\n[Random Forest]: https://en.wikipedia.org/wiki/Random_forest\n[Gradient Boosting]: https://en.wikipedia.org/wiki/Gradient_boosting\n[API]: https://en.wikipedia.org/wiki/Application_programming_interface\n[Keras]: https://keras.io/\n[TensorFlow]: https://www.tensorflow.org/\n[Matplotlib]: 
https://matplotlib.org/\n[NumPy]: https://numpy.org/\n[Pandas]: https://pandas.pydata.org/\n[scikit-learn]: https://scikit-learn.org/stable/\n[scikit-image]: https://scikit-image.org/\n[TPOT]: https://github.com/EpistasisLab/tpot\n[XGBoost]: https://github.com/dmlc/xgboost\n[Folium]: https://github.com/python-visualization/folium\n[Leaflet.js]: https://leafletjs.com/\n[ipyleaflet]: https://github.com/jupyter-widgets/ipyleaflet\n[Seaborn]: http://seaborn.pydata.org/\n[imbalanced-learn]: https://github.com/scikit-learn-contrib/imbalanced-learn\n[SMOTE (Synthetic Minority Over-sampling Technique)]: https://jair.org/index.php/jair/article/view/10302\n[joblib]: https://pypi.org/project/joblib/\n[Jupyter]: https://jupyter.org/\n[Jupyter Lab]: https://jupyter.org/\n[Jupyter Notebook]: https://jupyter.org/\n[Jupyter Notebooks]: https://jupyter.org/\n[findspark]: https://github.com/minrk/findspark\n[Spark]: https://spark.apache.org/\n[venv]: https://docs.python.org/3/library/venv.html\n[Graphviz]: https://www.graphviz.org/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falvertogit%2Fdatascience","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falvertogit%2Fdatascience","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falvertogit%2Fdatascience/lists"}