# Data Analysis List

Some blogs and problems on data science and machine learning that I have faced in practice.

[TOC]

## Data Preprocessing

### Data Imputation

1. [Python Package: fancyimpute](https://pypi.org/project/fancyimpute/)
2. [Multiple Imputation by Chained Equations: What is it and how does it work?](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/)
3. [Imputing Missing Data with R: MICE package](https://datascienceplus.com/imputing-missing-data-with-r-mice-package/)

### Feature Scaling & Normalization

1. [About Feature Scaling and Normalization: and the effect of standardization for machine learning algorithms](http://sebastianraschka.com/Articles/2014_about_feature_scaling.html)

### Feature Engineering

1. [Discover Feature Engineering, How to Engineer Features and How to Get Good at It](https://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/)
2. [Feature Engineering on a Single Machine with Sklearn](https://www.cnblogs.com/jasonfreak/p/5448385.html) (in Chinese)

### Data Visualization

1. [How to Use t-SNE Effectively](https://distill.pub/2016/misread-tsne/)
2. [Clustering on the output of t-SNE](https://stats.stackexchange.com/questions/263539/clustering-on-the-output-of-t-sne)

## Classification Model

### Hyperparameter Tuning

1. [Hyperparameters in Deep Learning](https://towardsdatascience.com/hyperparameters-in-deep-learning-927f7b2084dd)
2. [k-Fold Cross-Validating Neural Networks](https://chrisalbon.com/deep_learning/keras/k-fold_cross-validating_neural_networks/) (TensorFlow)
3. [Build a validation set with Keras](https://www.youtube.com/watch?v=dzoh8cfnvnI&feature=youtu.be) (video)
4. [A guide to an efficient way to build neural network architectures - Part I: Hyper-parameter selection and tuning for Dense Networks using Hyperas on Fashion-MNIST](https://towardsdatascience.com/a-guide-to-an-efficient-way-to-build-neural-network-architectures-part-i-hyper-parameter-8129009f131b)
5. [Neural Network Tuning with TensorFlow](https://medium.com/computer-car/neural-network-tuning-with-tensorflow-cc14a23f132c)
6. [How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras](https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/)
7. [SVM Parameter Tuning in Scikit Learn using GridSearchCV](https://medium.com/@aneesha/svm-parameter-tuning-in-scikit-learn-using-gridsearchcv-2413c02125a0)
8. [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)

### Feature Importance

1. [Explaining Feature Importance by example of a Random Forest](https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e)

## Anomaly Detection

1. [5 Ways to Detect Outliers/Anomalies That Every Data Scientist Should Know (Python Code)](https://towardsdatascience.com/5-ways-to-detect-outliers-that-every-data-scientist-should-know-python-code-70a54335a623)
2. [Python Package: PyOD](https://pyod.readthedocs.io/en/latest/)

## Framework

1. [Keras vs PyTorch for Deep Learning](https://towardsdatascience.com/keras-vs-pytorch-for-deep-learning-a013cb63870d)

## Anaconda & Jupyter Notebook

### Manage conda environments

Assume we are currently in the default conda environment, ``base``, and want to create an environment named ``ticketextract`` with Python 3.6. Run the following command in Terminal (or Anaconda Prompt on Windows):

```shell
conda create -n ticketextract python=3.6 -y
```

Once the installation finishes, activate ``ticketextract``:

```shell
conda activate ticketextract
```

We are now inside the ``ticketextract`` environment.

To deactivate ``ticketextract``:

```shell
conda deactivate
```

This returns us to the ``base`` environment.

### Add a conda ``env`` to the Jupyter Notebook kernels

Again assume we are in ``base``, and we want to register other environments as Jupyter Notebook kernels so that notebooks can run in them.

In ``base``, install ``nb_conda_kernels``:

```shell
(base)$ conda install nb_conda_kernels -y
```

Optionally, also install ``jupyter_contrib_nbextensions`` in ``base`` to get Jupyter Notebook extensions (it is not required for the kernel setup):

```shell
(base)$ conda install -c conda-forge jupyter_contrib_nbextensions -y
```

Next, activate the environment you want to add as a kernel (here ``ticketextract``) and install ``ipykernel`` in it:

```shell
(ticketextract)$ conda install ipykernel -y
```

Then run ``conda deactivate`` to return to ``base``, the environment from which you will launch Jupyter Notebook, and start it from Terminal:

```shell
(base)$ jupyter notebook
```

The new kernel now appears in the kernel list:

![kernel](./kernel.png)

Note that the kernel for ``base`` is listed as ``root``.

> To summarize: install ``ipykernel`` in every environment you want to use as a Jupyter kernel, and install ``nb_conda_kernels`` only in the environment from which you launch Jupyter Notebook (usually ``base``).

## Others

1. [Here's how you can get some free speed on your Python code with Numba](https://towardsdatascience.com/heres-how-you-can-get-some-free-speed-on-your-python-code-with-numba-89fdc8249ef3)
2. [Make your Pandas apply functions faster using Parallel Processing](https://towardsdatascience.com/make-your-own-super-pandas-using-multiproc-1c04f41944a1)
3. [GPU DataFrames: cuDF, faster than pandas](https://github.com/rapidsai/cudf)
4. [Debugging and Profiling](https://missing.csail.mit.edu/2020/debugging-profiling/): measuring the time and memory consumption of Python scripts
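For the time and memory measurement mentioned in the Debugging and Profiling item, here is a minimal sketch that uses only the Python standard library (`time.perf_counter`, `tracemalloc`, `cProfile`, `pstats`). It is not the workflow from the linked lecture. `build_squares` is a hypothetical workload, and `time_call`, `peak_memory` and `profile` are illustrative helper names, not part of any library.

```python
import cProfile
import io
import pstats
import time
import tracemalloc


def build_squares(n):
    """Hypothetical workload used only for illustration."""
    return [i * i for i in range(n)]


def time_call(func, *args):
    """Return (result, wall-clock seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def peak_memory(func, *args):
    """Return the peak size (bytes) of memory blocks traced by tracemalloc while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak


def profile(func, *args, top=5):
    """Run func under cProfile and return the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    squares, seconds = time_call(build_squares, 1_000_000)
    print(f"built {len(squares)} items in {seconds:.3f} s")
    print(f"peak traced memory: {peak_memory(build_squares, 1_000_000) / 1e6:.1f} MB")
    print(profile(build_squares, 1_000_000))
```

Timing and memory are measured in separate calls on purpose: `tracemalloc` slows down the code it traces, so timing a call while tracing is enabled would overstate its run time.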