{"id":15031697,"url":"https://github.com/rubenszimbres/repo-2017","last_synced_at":"2025-05-16T04:06:20.354Z","repository":{"id":40444089,"uuid":"76725259","full_name":"RubensZimbres/Repo-2017","owner":"RubensZimbres","description":"My first Python repo with codes in Machine Learning, NLP and Deep Learning with Keras and Theano","archived":false,"fork":false,"pushed_at":"2021-12-06T22:14:24.000Z","size":42382,"stargazers_count":1192,"open_issues_count":1,"forks_count":677,"subscribers_count":125,"default_branch":"master","last_synced_at":"2024-10-30T00:32:17.129Z","etag":null,"topics":["anomaly-detection","autoencoder","deep-learning","denoising-autoencoders","generative-adversarial-network","glove","keras","keras-layer","keras-models","latent-dirichlet-allocation","natural-language-processing","nlp","opencv","resnet-50","segnet","sentiment-analysis","svm-classifier","t-sne","variational-autoencoder","word2vec"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RubensZimbres.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-12-17T13:21:20.000Z","updated_at":"2024-10-15T06:20:53.000Z","dependencies_parsed_at":"2022-07-12T18:02:19.776Z","dependency_job_id":null,"html_url":"https://github.com/RubensZimbres/Repo-2017","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubensZimbres%2FRepo-2017","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubensZimbres%2FRepo-2017/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/RubensZimbres%2FRepo-2017/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubensZimbres%2FRepo-2017/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RubensZimbres","download_url":"https://codeload.github.com/RubensZimbres/Repo-2017/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247856544,"owners_count":21007621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","autoencoder","deep-learning","denoising-autoencoders","generative-adversarial-network","glove","keras","keras-layer","keras-models","latent-dirichlet-allocation","natural-language-processing","nlp","opencv","resnet-50","segnet","sentiment-analysis","svm-classifier","t-sne","variational-autoencoder","word2vec"],"created_at":"2024-09-24T20:16:21.080Z","updated_at":"2025-04-08T14:11:19.731Z","avatar_url":"https://github.com/RubensZimbres.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python Codes in Data Science\n\nCodes in NLP, Deep Learning, Reinforcement Learning and Artificial Intelligence\n\n\u003cb\u003e Welcome to my GitHub repo. \u003c/b\u003e\n\nI am a Data Scientist and I code in R, Python and Wolfram Mathematica. 
Here you will find some Machine Learning, Deep Learning, Natural Language Processing and Artificial Intelligence models I developed.\n\n---------------\nKeras version used in models: keras==1.1.0\n\n\u003cb\u003e Autoencoder for Audio  \u003c/b\u003e is a model where I compressed an audio file and used Autoencoder to reconstruct the audio file, for use in phoneme classification.\n\n\u003cb\u003e Collaborative Filtering  \u003c/b\u003e is a Recommender System where the algorithm predicts a movie review based on genre of movie and similarity among people who watched the same movie.\n\n\u003cb\u003e Convolutional NN Lasagne  \u003c/b\u003e is a Convolutional Neural Network model in Lasagne to solve the MNIST task.\n\n\u003cb\u003e Ensembled Machine Learning \u003c/b\u003e is a .py file where 7 Machine Learning algorithms are used in a classification task with 3 classes and all possible hyperparameters of each algorithm are adjusted. Iris dataset of scikit-learn.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=https://github.com/RubensZimbres/Repo-2017/raw/master/Pictures%20-%20Formulas/Ensembled.MachineLearning.png?raw=true\u003e\n\u003c/p\u003e\n\n\u003cb\u003e GAN Generative Adversarial  \u003c/b\u003e are models of Generative Adversarial Neural Networks.\n\n\u003cb\u003e Hyperparameter Tuning RL  \u003c/b\u003e is a model where hyperparameters of Neural Networks are adjusted via Reinforcement Learning. According to a reward, hyperparameter tuning (environment) is changed through a policy (mechanization of knowledge) using the Boston Dataset. 
Hyperparameters tuned are: learning rate, epochs, decay, momentum, number of hidden layers and nodes and initial weights.\n\n\u003cb\u003e Keras Regularization L2  \u003c/b\u003e is a Neural Network model for regression made with Keras where L2 regularization was applied to prevent overfitting.\n\n\u003cb\u003e Lasagne Neural Nets Regression  \u003c/b\u003e is a Neural Network model based on Theano and Lasagne that performs a linear regression with a continuous target variable and reaches 99.4% accuracy. It uses the DadosTeseLogit.csv sample file.\n\n\u003cb\u003e Lasagne Neural Nets + Weights  \u003c/b\u003e is a Neural Network model based on Theano and Lasagne, where it is possible to visualize the weights from X1 and X2 to the hidden layer. It can also be adapted to visualize the weights between the hidden layer and the output. It uses the DadosTeseLogit.csv sample file.\n\n\u003cb\u003e Multinomial Regression  \u003c/b\u003e is a regression model where the target variable has 3 classes.\n\n\u003cb\u003e Neural Networks for Regression  \u003c/b\u003e shows multiple solutions for a regression problem, solved with sklearn, Keras, Theano and Lasagne. It uses the Boston dataset sample file from sklearn and reaches more than 98% accuracy.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=https://github.com/RubensZimbres/Repo-2017/raw/master/Pictures%20-%20Formulas/HiddenLayers.jpg?raw=true\u003e\n\u003c/p\u003e\n\n\u003cb\u003e NLP + Naive Bayes Classifier  \u003c/b\u003e is a model where movie reviews were labeled as positive and negative and the algorithm then classifies a totally new set of reviews using Logistic Regression, Decision Trees and Naive Bayes, reaching an accuracy of 92%.\n\n\u003cb\u003e NLP Anger Analysis  \u003c/b\u003e is a Doc2Vec model combined with a Word2Vec model to analyze the level of anger, using synonyms, in consumer complaints posted on Facebook about a U.S. retailer.\n\n\u003cb\u003e NLP Consumer Complaint  \u003c/b\u003e is a model where Facebook posts of a U.S. 
computer retailer were scraped, tokenized, lemmatized and processed with Word2Vec. After that, t-SNE and Latent Dirichlet Allocation were applied in order to classify the arguments and weights of each keyword used by a consumer in a complaint. The code also analyzes the frequency of words in 100 posts.\n\n\u003cb\u003e NLP Convolutional Neural Network \u003c/b\u003e is a Convolutional Neural Network for text, used to classify movie reviews.\n\n\u003cb\u003e NLP Doc2Vec  \u003c/b\u003e is a Natural Language Processing file where cosine similarity among phrases is measured through Doc2Vec.\n\n\u003cb\u003e NLP Document Classification  \u003c/b\u003e is code for Document Classification according to Latent Dirichlet Allocation.\n\n\u003cb\u003e NLP Facebook Analysis  \u003c/b\u003e analyzes Facebook posts regarding Word Frequency and Topic Modelling using LDA.\n\n\u003cb\u003e NLP Facebook Scrap  \u003c/b\u003e is Python code for scraping data from Facebook.\n\n\u003cb\u003e NLP - Latent Dirichlet Allocation  \u003c/b\u003e is a Natural Language Processing model where a Wikipedia page on Statistical Inference is classified regarding topics, using Latent Dirichlet Allocation with Gensim, NLTK, t-SNE and K-Means.\n\n\u003cb\u003e NLP Probabilistic ANN  \u003c/b\u003e is a Natural Language Processing model where sentences are vectorized by Gensim and a probabilistic Neural Network model is developed using Gensim, for sentiment analysis.\n\n\u003cb\u003e NLP Semantic Doc2Vec + Neural Network  \u003c/b\u003e is a model where positive and negative movie reviews were extracted and semantically classified with NLTK and BeautifulSoup, then labeled as positive or negative. Text was then used as an input for the Neural Network model training. After training, new sentences are entered in the Keras Neural Network model and then classified. 
It uses the zip file.\n\n\u003cb\u003e NLP Sentiment Positive  \u003c/b\u003e is a model that identifies website content as positive, neutral or negative using the BeautifulSoup and NLTK libraries, plotting the results.\n\n\u003cb\u003e NLP Twitter Analysis ID #  \u003c/b\u003e is a model that extracts posts from Twitter based on user ID or hashtag.\n\n\u003cb\u003e NLP Twitter Scrap  \u003c/b\u003e is a model that scrapes Twitter data and shows the cleaned text as output.\n\n\u003cb\u003e NLP Twitter Streaming  \u003c/b\u003e is a model for the analysis of real-time data from Twitter (under development).\n\n\u003cb\u003e NLP Twitter Streaming Mood  \u003c/b\u003e is a model where the evolution of mood in Twitter posts is measured over a period of time.\n\n\u003cb\u003e NLP Wikipedia Summarization  \u003c/b\u003e is Python code that summarizes any given page in a few sentences.\n\n\u003cb\u003e NLP Word Frequency  \u003c/b\u003e is a model that calculates the frequency of nouns, verbs and words in Facebook posts.\n\n\u003cb\u003e Probabilistic Neural Network  \u003c/b\u003e is a Probabilistic Neural Network for Time Series Prediction.\n\n\u003cb\u003e REAL-TIME Twitter Analysis  \u003c/b\u003e is a model where a Twitter stream is extracted, words and sentences are tokenized, word embeddings are created, and topics are modeled and clustered using K-Means. Then, NLTK SentimentAnalyzer is used to classify each sentence of the stream as positive, neutral or negative. 
A cumulative sum is used to generate the plot and the code loops every second, collecting new tweets.\n\n\u003cb\u003e RESNET-2  \u003c/b\u003e is a Deep Residual Neural Network.\n\n\u003cb\u003e ROC Curve Multiclass  \u003c/b\u003e is a .py file where Naive Bayes was used to solve the IRIS Dataset task and the ROC curves of the different classes are plotted.\n\n\u003cb\u003e SQUEEZENET  \u003c/b\u003e is a simplified version of AlexNet.\n\n\u003cb\u003e Stacked Machine Learning  \u003c/b\u003e is a .py notebook where t-SNE, Principal Components Analysis and Factor Analysis were applied to reduce the dimensionality of the data. Classification performance was measured after applying K-Means.\n\n\u003cb\u003e Support Vector Regression  \u003c/b\u003e is an SVM model for nonlinear regression on an artificial dataset.\n\n\u003cb\u003e Text-to-Speech  \u003c/b\u003e is a .py file where Python speaks any given text and saves it as an audio .wav file.\n\n\u003cb\u003e Time Series ARIMA \u003c/b\u003e  is an ARIMA model to forecast time series, with an error margin of 0.2%.\n\n\u003cb\u003e Time Series Prediction with Neural Networks - Keras \u003c/b\u003e  is a Neural Network model to forecast time series, using Keras with an adaptive learning rate that depends on the derivative of the loss.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=https://github.com/RubensZimbres/Repo-2017/blob/master/Pictures%20-%20Formulas/ARIMA.10Period.png?raw=true\u003e \n\u003c/p\u003e\n\n\u003cb\u003e Variational Autoencoder  \u003c/b\u003e is a VAE made with Keras.\n\n\u003cb\u003e Web Crawler  \u003c/b\u003e is code that scrapes data from different URLs of a hotel website.\n\n\u003cb\u003e t-SNE Dimensionality Reduction  \u003c/b\u003e is a t-SNE model for dimensionality reduction which is compared to Principal Components Analysis regarding its discriminatory power.\n\n\u003cb\u003e t-SNE PCA + Neural Networks  \u003c/b\u003e is a model that compares the performance of Neural Networks built after t-SNE, PCA and 
K-Means.\n\n\u003cb\u003e t-SNE PCA LDA embeddings \u003c/b\u003e is a model where t-SNE, Principal Components Analysis, Linear Discriminant Analysis and Random Forest embeddings are compared in a task to classify clusters of similar digits.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=https://github.com/RubensZimbres/Repo-2017/raw/master/Pictures%20-%20Formulas/Doc2Vec.png?raw=true\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=https://github.com/RubensZimbres/Repo-2017/raw/master/Pictures%20-%20Formulas/t_SNE_Lk.png?raw=true\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=https://github.com/RubensZimbres/Repo-2017/blob/master/Pictures%20-%20Formulas/RESNET_Me.jpg?raw=true\u003e\n\u003c/p\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frubenszimbres%2Frepo-2017","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frubenszimbres%2Frepo-2017","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frubenszimbres%2Frepo-2017/lists"}