{"id":15028052,"url":"https://github.com/spandan-madan/deeplearningproject","last_synced_at":"2025-05-14T22:08:38.972Z","repository":{"id":39916255,"uuid":"97027254","full_name":"Spandan-Madan/DeepLearningProject","owner":"Spandan-Madan","description":"An in-depth machine learning tutorial introducing readers to a whole machine learning pipeline from scratch.","archived":false,"fork":false,"pushed_at":"2023-01-15T06:31:21.000Z","size":2519,"stargazers_count":4760,"open_issues_count":6,"forks_count":635,"subscribers_count":199,"default_branch":"master","last_synced_at":"2025-05-14T22:08:31.489Z","etag":null,"topics":["deep-learning","machine-learning","neural-networks","tutorial"],"latest_commit_sha":null,"homepage":"https://spandan-madan.github.io/DeepLearningProject/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Spandan-Madan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-12T16:03:24.000Z","updated_at":"2025-05-09T13:34:42.000Z","dependencies_parsed_at":"2023-02-09T21:30:44.989Z","dependency_job_id":null,"html_url":"https://github.com/Spandan-Madan/DeepLearningProject","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spandan-Madan%2FDeepLearningProject","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spandan-Madan%2FDeepLearningProject/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spandan-Madan%2FDeepLearningProject/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spandan-Mad
an%2FDeepLearningProject/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Spandan-Madan","download_url":"https://codeload.github.com/Spandan-Madan/DeepLearningProject/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254235700,"owners_count":22036964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning","neural-networks","tutorial"],"created_at":"2024-09-24T20:07:32.490Z","updated_at":"2025-05-14T22:08:33.960Z","avatar_url":"https://github.com/Spandan-Madan.png","language":"HTML","readme":"![harvard-logo](http://logonoid.com/images/harvard-logo.png)\n# An end to end tutorial of a machine learning pipeline\n\nThis tutorial tries to do what most machine learning tutorials available online do not. It is not a 30 minute tutorial which teaches you how to \"Train your own neural network\" or \"Learn deep learning in under 30 minutes\". It covers the full pipeline you would need to work through if you actually work with machine learning, introducing you to all the parts and to all the implementation decisions and details that need to be made. The dataset is not one of the standard sets like MNIST or CIFAR; you will make your very own dataset. Then you will go through a couple of conventional machine learning algorithms, before finally getting to deep learning!\n\nIn the fall of 2016, I was a Teaching Fellow (Harvard's version of TA) for the graduate class on \"Advanced Topics in Data Science (CS209/109)\" at Harvard University. 
I was in charge of designing the class project given to the students, and this tutorial has been built on top of the project I designed for the class.\n\n# UPDATE 24th October 2018\nThe tutorial has now been re-written in PyTorch thanks to Anshul Basia (https://github.com/AnshulBasia)\n\nYou can access the HTML here: https://spandan-madan.github.io/DeepLearningProject/PyTorch_version/Deep_Learning_Project-Pytorch.html\nand the IPython Notebook with the code in PyTorch here: https://github.com/Spandan-Madan/DeepLearningProject/blob/master/PyTorch_version/Deep_Learning_Project-Pytorch.ipynb\n\n\n\n# Citing if you use the work here\nIf you would like to use this work, please cite it using the DOI:\n[![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.830003.svg)](http://dx.doi.org/10.5281/zenodo.830003)\n\n# Reading/Viewing the Tutorial\nTo view the project as an HTML file, visit https://spandan-madan.github.io/DeepLearningProject/\n\n# The Code\nIf you would like to access the code, please go through the IPython notebook `Deep_Learning_Project.ipynb`\n\n# SETUP\n\nPython\n- We will be using Python 2.7. The primary reason is that TensorFlow is not compatible with Python \u003e 3.5, and some other libraries are not compatible with Python 3.\n\nTo make setup easy, we are going to use conda.\n- Please install conda 3 from https://www.continuum.io/downloads\n- The repository has a conda config file which will make setting up super easy. It's the file `deeplearningproject_environment.yml`\n- Then create a new conda environment using the command `conda env create -f deeplearningproject_environment.yml`\n- Now, you can activate the environment with: `source activate deeplearningproject`  \n- Then start Jupyter with `jupyter notebook`\n\nIf all the installations go through, you are good to go! 
If not, here is a list of packages that need to be installed: `requests imDbPy wget tmdbsimple seaborn sklearn Pillow keras tensorflow h5py gensim nltk stop_words`\n\nPlease install imdbpy using 'pip install imdbpy==6.6' since earlier versions are broken.\n\n### Setting up conda environment in jupyter notebook\nTo be able to run the environment you just created in a Jupyter notebook, first check that you have the Python package `ipykernel` installed. If you don't, install it using:\n\n```bash\npip install ipykernel\n```\n\nNow, add this to your jupyter notebook using the command:\n\n```bash\npython -m ipykernel install --user --name deeplearningproject --display-name \"deeplearningproject\"\n```\n\nNeedless to say, remove all single quotes before running commands.\n\nGo to the repository directory, start Jupyter by running \"jupyter notebook\", and open the respective notebook in your browser.\nTo install the TMDB wrapper: pip install tmdbsimple\nThen use \"import tmdbsimple as tmdb\"\n\n\n### Setting up a docker container with docker-compose\n\n#### Prerequisites\n* Docker https://docs.docker.com/install/\n* Docker Compose https://docs.docker.com/compose/install/\n\n#### Run docker-compose\nTo work in an isolated environment and be able to run it on many systems without trouble, you can run this docker-compose command:\n```bash\ndocker-compose up\n```\nIt will build the `deeplearningproject` image according to the Dockerfile and then run the Docker container via docker-compose. See the Docker and docker-compose docs for more information:\n* https://docs.docker.com/\n* https://docs.docker.com/compose/\n\nThen access notebooks through your web browser at http://localhost:8888\n\nNote that the notebooks have been copied from the repository root to the `notebooks` folder so that they can be mounted into the container via a bind volume. 
Any changes you make will be saved on the host (in the `notebooks` directory).\n\n#### Add packages\nYou can add conda or pip packages to the image (and thus the container) by updating the `deeplearningproject_environment.yml` file and then running:\n```bash\ndocker-compose build\n```\nIt will build a new `deeplearningproject` image with the new conda/pip packages installed. Stop your running container (`CTRL-C`) and then run `docker-compose up` to start a fresh container.\n\n\n# Known common bugs\nI will keep updating this as issues pop up on this repository.\n\n- One known bug arises because Keras 2.0 is not compatible with some Keras 1.2 functionality. You may run into errors when importing `VGG16`. If so, just update keras using the following command:\n```bash\nsudo pip install git+git://github.com/fchollet/keras.git --upgrade\n```\n\n- OS Error: Too Many Open Files\nRefer to: https://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files\nor shut down the notebook and execute the following in the same terminal:\n```bash\nulimit -Sn 10000\n```\n\nThen restart the jupyter notebook.\n\nHope this repo helps introduce you to a full machine learning pipeline! If you spot an error, please create an issue to help out others using this resource!\n\nTo prevent problems with installation and setting up, this repository comes with a conda environment profile. The only thing you will need is to install the newest version of conda, and use this profile to create a new environment and it will come set up with all the libraries you will need for the tutorial.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspandan-madan%2Fdeeplearningproject","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspandan-madan%2Fdeeplearningproject","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspandan-madan%2Fdeeplearningproject/lists"}