{"id":18869350,"url":"https://github.com/codersales/machine-learning-project","last_synced_at":"2026-03-02T23:32:34.394Z","repository":{"id":137039970,"uuid":"590837436","full_name":"CoderSales/machine-learning-project","owner":"CoderSales","description":"Machine Learning Project","archived":false,"fork":false,"pushed_at":"2024-11-24T22:31:39.000Z","size":9068,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-30T23:33:00.588Z","etag":null,"topics":["code","data-science","filtered-4","jupyter","last-quarter","machine-learning","not-this-month","numpy","over-100-commits","pandas","pseudo-current","python","repository-6","scikit-learn","sub-critical","supervised","supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CoderSales.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-19T10:31:02.000Z","updated_at":"2024-11-24T22:31:42.000Z","dependencies_parsed_at":"2024-11-24T06:27:51.551Z","dependency_job_id":null,"html_url":"https://github.com/CoderSales/machine-learning-project","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderSales%2Fmachine-learning-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderSales%2Fmachine-learning-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/CoderSales%2Fmachine-learning-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderSales%2Fmachine-learning-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CoderSales","download_url":"https://codeload.github.com/CoderSales/machine-learning-project/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239816510,"owners_count":19701753,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code","data-science","filtered-4","jupyter","last-quarter","machine-learning","not-this-month","numpy","over-100-commits","pandas","pseudo-current","python","repository-6","scikit-learn","sub-critical","supervised","supervised-learning"],"created_at":"2024-11-08T05:16:31.305Z","updated_at":"2026-02-14T06:30:17.728Z","avatar_url":"https://github.com/CoderSales.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# machine-learning-project\n\n# primary source for this README: jupyter-6-Supervised-Learning\nRepository for running jupyter notebooks and keeping relevant files in one place\n\n# notes made for previous plan to remove null values\ncheck how to remove null values from dataframe\n\n## notes\npandas \n.iloc[] - locate by row, col indices\n.loc[] - locate by row index and col NAME\n\n## Data Cleaning\n### 2.13 Lecture\ndf.drop('Column name', axis=1)\n    - where axis = 0 for rows, 1 for columns\n    - drops referenced column from data frame\n    - inplace=True argument to ensure column stays 
dropped.\ndf.drop(1,axis=0).reset_index()\n    - new col with old indices\ndf.drop(1,axis=0).reset_index(drop=True,inplace=True)\n\ndf.copy\n\n### 4.1 Lecture Data Sanity Checks - Part 1\ndf['columnname'].apply(type).value_counts()\n    - this looks at and notes the values by type and then counts them\n\ndf['colname'] = df['colname'].replace(['missing','inf'],np.nan)\n    - replaces our specified strings 'missing' and 'inf' \n    -  with np.nan\n\ndf['colname'] = df['colname'].astype(float)\n    - convert values to float\n\nReview note: when we substitute np.nan in for strings the resulting data type is (if all the other entries are say float) float.\n\ndf.info()\n    - rerunning this after data cleaning may result in cleaned columns type changing to, say, float.\n\nCheck length of each column \nColumns shorter than max col length means missing values as empty cells\n\n#### Alternative approach - clean while loading:\n##### using na_values to tell python which values it should consider as NaN\ndata_new = pd.read_csv('/content/drive/MyDrive/Python Course/Melbourne_Housing.csv',na_values=['missing','inf'])\n- on load, above line automatically converts all missing and inf to nan so, running:\ndata_new['BuildingArea'].dtype\n- gives \ndtype('float64')\nas only float (and nan which seems to be treated as whatever the rest of the data types are)\n\n#### Review note\ndata['BuildingArea'].unique()\n- above line run before cleaning gives unique values in column as a numpy array\n- so can inspect to find out which strings to remove.\n# setup steps\npython3 -m venv .venv\n    - in bash\n    - and on Windows\nsource .venv/bin/activate\n    - in bash\nsource .venv/Scripts/activate\n    - on Windows\n/workspace/jupyter-6/.venv/bin/python -m pip install --upgrade pip\n    - in GitPod\npython3 -m pip install --upgrade pip\n    - on Windows\n\npip install --upgrade pip\npip install jupyter notebook\npip install matplotlib\npip install pandas\npip install seaborn\npip install 
numpy\npip install scipy\npip install statsmodels\npip install -U scikit-learn\npip install ipykernel\npip install nb-black\n\n\nCtrl Shift P\nCreate New Jupyter Notebook\nSave and name notebook\nPaste in necessary code\n\nCtrl Shift P\nPython: Select Interpreter\nuse Python version in ./.venv/bin/python\n\npip freeze \u003e requirements.txt\n\npip install -r requirements.txt\n\n## Add required files\nauto-mpg.csv\n## Extensions\nExtension: Excel Viewer\n    - for  viewing csv files in VSCode\n\n## Debug\n### prelim\nper above\nPython:Select Interpreter\n3.10.9 (.venv)\n### ipykernel bug\nafter running\npip install ipykernel\non running LinearRegression_HandsOn-1.ipynb\nmessage appears saying:\nit is necessary to install ipykernel\nOK\ninstalling ipykernel\nRerun\nLinearRegression_HandsOn-1.ipynb\n\n### pandas bug\nafter running\npip install pandas \npandas not found\n\n### Fix for previous 2 bugs\ncreate new jupyter notebook using \nCtrl Shift P\nCreate New Jupyter Notebook\n# References\n## previous repositories\njupyter-test\njupyter-repo-2\njupyter-3\n- [Coder731/jupyter-5](https://github.com/Coder731/jupyter-5)\n- [Coder731/jupyter-6-Supervised-Learning](https://github.com/Coder731/jupyter-6-Supervised-Learning)\n\n\n# References Part2 / (MyGreatLearning, Colab, modules)\n#### MyGreatLearning\n##### pre scikit-learn\n- [LMS - Hands_on_Notebook_Week3.ipynb](https://www.mygreatlearning.com/)\n- [LMS - ENews_Express_Learner_Notebook%5BLow_Code_Version%5D.ipynb](https://www.mygreatlearning.com/)\n- [LMS - abtest.csv](https://www.mygreatlearning.com/)\n- [2.13 Pandas - Accessing and Modifying DataFrames (condition-based indexing)](https://www.mygreatlearning.com/)\n#### scikit-learn\n- [Supervised Learning - Foundations / Week 1 - Lecture Video Materials](https://www.mygreatlearning.com/)\n    - [auto-mpg.csv used in 1.9 Linear Regression Hands-on](https://www.mygreatlearning.com/)\n\n#### Colab\n- Google Colab [mount 
drive](https://colab.research.google.com/notebooks/io.ipynb#scrollTo=RWSJpsyKqHjH)\n\n#### modules\n##### matplotlib\n###### matplotlib figure dimensions\n- [Set plot dimensions matplotlib](https://stackoverflow.com/questions/332289/how-do-i-change-the-size-of-figures-drawn-with-matplotlib)\n\n##### scipy\n- [scipy - check version](https://blog.finxter.com/how-to-check-scipy-package-version-in-python/)\n\n\n# References Part3 / (StackOverflow, Git, Tutorials and Repositories)\n## StackOverflow\nhttps://stackoverflow.com/questions/46419607/how-to-automatically-install-required-packages-from-a-python-script-as-necessary\n\n## Git\n### Gitpod\n- [Gitpod docs prebuilds](https://www.gitpod.io/docs/configure/projects/prebuilds)\n- [Gitpod docs workspaces](https://www.gitpod.io/docs/configure/workspaces/tasks)\n- [Gitpod Prebuild](https://youtu.be/ZtlJ0PakUHQ?t=54)\n### Git in VSCode\n- [Git source control in VS Code](https://code.visualstudio.com/docs/sourcecontrol/overview)\n\n## Tutorials and Repositories\n\n# References Part4 / (environments, Packages, and Statistics)\n## environments\n### local\n- [Getting Full Directory Path in Python](https://www.youtube.com/watch?v=DQRSvg54bhM\u0026ab_channel=Analyst%27sCorner)\n\nWindows\nAnaconda\nconda create --name .cenv\ny\nconda activate .cenv\n\npython3\n\nnot installed so Windows store opens\ninstall Python 3.10\n\n#### conda\n##### virtual environment\n- [conda.io](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html)\n\n#### python environment\n`python3 -m venv .venv`\ncommand was slow at first but self-resolved\n- search string: stuck on $ python3 -m venv .venv [setting up environment in virtaulenv using python3 stuck on ...](https://discuss.dizzycoding.com/setting-up-environment-in-virtaulenv-using-python3-stuck-on-setuptools-pip-wheel/)\n- search string: installing collected packages stuck [why is the pip install process stuck on ''Installing collected packages\" 
step?](https://stackoverflow.com/questions/54699197/why-is-the-pip-install-process-stuck-on-installing-collected-packages-step)\n\n## Packages\n### NumPy\n### Pandas\n### matplotlib\n### scipy\n### scipy.stats\n### statsmodels\n- [statsmodels.stats.proportion.proportions_ztest](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_ztest.html)\n### scikit-learn\n#### Documentation\n- [search string: sklearn](https://www.google.com/search?q=sklearn\u0026oq=sklearn\u0026gs_lcrp=EgZjaHJvbWUqBggAEEUYOzIGCAAQRRg7MgYIARAjGCcyBggCEAAYQzIGCAMQABhDMgYIBBAAGEMyBggFEAAYQzIGCAYQRRg8MgYIBxBFGDzSAQc3MzVqMGo3qAIAsAIA\u0026sourceid=chrome\u0026ie=UTF-8)\n- [scikit-learn | Machine Learning in Python](https://scikit-learn.org/stable/)\n- [Getting Started -- scikit-learn](https://scikit-learn.org/stable/getting_started.html)\n- [Citing scikit-learn](https://scikit-learn.org/stable/about.html#citing-scikit-learn)\n- [User Guide](https://scikit-learn.org/stable/user_guide.html#user-guide)\n- [Installing scikit-learn](https://scikit-learn.org/stable/install.html)\n- Scikit-learn: Machine Learning in Python [Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 
2825-2830, 2011.](https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html)\n- redirects to https://scikit-learn.org/stable/ (link 2 in this section, above) [Source code, binaries, and documentation](http://scikit-learn.sourceforge.net)\n### ipykernel\n- [search string: ipykernel](https://www.google.com/search?q=ipykernel\u0026oq=ipykernel\u0026gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQABiABDIHCAIQABiABDIHCAMQABiABDIHCAQQABiABDIHCAUQABiABDIHCAYQABiABDIHCAcQABiABDIMCAgQABgUGIcCGIAEMgcICRAAGIAE0gEHNDUzajBqN6gCALACAA\u0026sourceid=chrome\u0026ie=UTF-8)\n- pip install ipykernel [ipykernel 6.19.2](https://pypi.org/project/ipykernel/)\n## Statistics\n\n## pandas print statement\n- [turn off automatic pandas data type output on print statement](https://stackoverflow.com/questions/29645153/remove-name-dtype-from-pandas-output-of-dataframe-or-series)\n\n## storing variables\n### naming arbitrary number of variables\n- [used for first attempt at naming arbitrary number of variables](https://stackoverflow.com/questions/48372808/create-an-unknown-number-of-programmatically-defined-variables)\n- [second attempt at naming arbitrary number of variables](https://pythonprinciples.com/ask/how-do-you-create-a-variable-number-of-variables/)\n\n### append\n- [.append()](https://realpython.com/python-append/#:~:text=Python%20provides%20a%20method%20called,list%20using%20a%20for%20loop.)\n\n## pass multiple variables into string\n- [pass multiple variables into string](https://stackoverflow.com/questions/10112614/how-do-i-create-a-multiline-python-string-with-inline-variables)\n\n## multiline string python\n- [Python Multiline Strings](https://www.w3schools.com/python/gloss_python_multi_line_strings.asp)\n\n## How do you add value to a key in Python?\n- 'a':'0' [How do you add value to a key in Python?](https://www.mygreatlearning.com/blog/python-dictionary-append/#:~:text=How%20do%20you%20add%20value,new%20values%20to%20the%20keys.)\n\n## pass variable into string variable \n- [pass variable 
into string variable](https://matthew-brett.github.io/teaching/string_formatting.html)\n\n## turn off pandas index output\n- [Remove name, dtype from pandas output of dataframe or series](https://stackoverflow.com/questions/29645153/remove-name-dtype-from-pandas-output-of-dataframe-or-series)\n- [2ndary source for turning off index on pandas dataframe print out](https://stackoverflow.com/questions/24644656/how-to-print-pandas-dataframe-without-index)\n\n## concatenate\n- [concatenate with +](https://www.digitalocean.com/community/tutorials/python-string-concatenation)\n\n## String into variable\n- [String Into Variable Name in Python Using the vars() Function](https://www.pythonforbeginners.com/basics/convert-string-to-variable-name-in-python#:~:text=is%20pythonforbeginners.com-,String%20Into%20Variable%20Name%20in%20Python%20Using%20the%20vars(),like%20the%20globals()%20function.)\n- [Convert string to variable name in python [duplicate]](https://stackoverflow.com/questions/19122345/convert-string-to-variable-name-in-python)\n\n- option used [Python Template String Formatting Method](https://towardsdatascience.com/python-template-string-formatting-method-df282510a87a)\n\n## .update() a dictionary\n- [Python dictionary append: How to add Key-value Pair?](https://www.mygreatlearning.com/blog/python-dictionary-append/)\n- [Python Dictionary update() Method](https://www.w3schools.com/python/ref_dictionary_update.asp)\n\n## print separate with no spaces\n- [Print without space in python 3](https://stackoverflow.com/questions/12700558/print-without-space-in-python-3)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodersales%2Fmachine-learning-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodersales%2Fmachine-learning-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodersales%2Fmachine-learning-project/lists"}