{"id":20620041,"url":"https://github.com/shwetajoshi601/yeast-multilabel-classifier","last_synced_at":"2025-04-15T12:13:06.421Z","repository":{"id":201631925,"uuid":"256830670","full_name":"shwetajoshi601/yeast-multilabel-classifier","owner":"shwetajoshi601","description":"Multi-label classification approaches on the Yeast dataset","archived":false,"fork":false,"pushed_at":"2020-04-18T19:30:39.000Z","size":941,"stargazers_count":11,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-15T12:12:32.288Z","etag":null,"topics":["binary-relevance","classifier-chains","ensemble","machine-learning-algorithms","multi-label-classification","under-sampling","yeast-dataset"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shwetajoshi601.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-04-18T18:59:41.000Z","updated_at":"2024-04-24T13:50:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"808ff9ed-564b-451f-a5d0-24345d6a559f","html_url":"https://github.com/shwetajoshi601/yeast-multilabel-classifier","commit_stats":null,"previous_names":["shwetajoshi601/yeast-multilabel-classifier"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shwetajoshi601%2Fyeast-multilabel-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shwetajoshi601%2Fyeast-multilabel-classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shwetajoshi601%2Fyeast-multilabel
-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shwetajoshi601%2Fyeast-multilabel-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shwetajoshi601","download_url":"https://codeload.github.com/shwetajoshi601/yeast-multilabel-classifier/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249067779,"owners_count":21207396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-relevance","classifier-chains","ensemble","machine-learning-algorithms","multi-label-classification","under-sampling","yeast-dataset"],"created_at":"2024-11-16T12:13:11.748Z","updated_at":"2025-04-15T12:13:06.402Z","avatar_url":"https://github.com/shwetajoshi601.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Yeast Multilabel Classifier\n\n## Introduction\n\nIn contrast to multi-class classification, in which instances can only belong to a single class, in multi-label classification problems instances can belong to\nmore than one class at a time. For example, an image might be classified as containing all of mountains, sky, and a house, or a piece of music might be classified as belonging to both the rock and jazz genres. A selection of multi-label classification approaches based on ensemble methods exists in the literature.\nIn this project, we have implemented a selection of multi-label classification approaches. scikit-learn base estimator implementations (e.g. 
decision trees, logistic regression, or support vector machine models) have been used; however, the ensemble methods have been implemented from scratch.\n\n## Dataset\n\nThe Yeast dataset is formed by micro-array expression data and phylogenetic profiles, with 2,417 genes included. There are 103 descriptive features per gene. Each gene is associated with a set of functional classes. In this version of the dataset there are 14 functional classes, and a gene can be associated with any number of these.\n\nThe folder \"data\" in this repository contains the dataset in the form of a CSV file.\n\n## Binary Relevance Classifier\n\nA Binary Relevance Classifier has been implemented in which an independent base classifier is trained for each label.\nThis uses a one-vs-all approach to generate the training set for each base classifier.\n\n## Binary Relevance Classifier with Under-Sampling\n\nOne of the issues with the one-vs-all approach to generating training datasets used in the binary relevance algorithm is that the training\ndatasets for base classifiers can be very imbalanced.\nThe Binary Relevance Classifier is implemented with under-sampling to overcome imbalance in the training data.\n\n## Classifier Chains\n\nOne of the criticisms of the simple binary relevance classifier approach is that it does not take advantage of associations between labels in a multi-label classification scenario. For example, the presence of sea in an image increases the likelihood of a boat also being present, but decreases the likelihood of a giraffe being present.\nThe classifier chains algorithm is an effective multi-label classification algorithm that takes advantage of label associations. A classifier chain model generates a chain of binary classifiers, each of which predicts the presence or absence of a specific label. 
The input to each classifier in the chain, however, includes the original descriptive features plus the outputs of the classifiers so far in the chain.[1]\nThis allows label associations to be taken into account.\n\n## Reflection\n\nConsider BR=Binary Relevance Classifier, BRUS=Binary Relevance Classifier with Under-Sampling, CC=Classifier Chains.\n\nFrom the above experiment we can conclude that:\n\nBR achieves much better accuracy than BRUS. This is because with under-sampling, relevant training samples may be lost, which hurts accuracy.\nBRUS gives better F1 scores than BR for all the base models. A lower F1 score indicates that the data is highly imbalanced, which leads to high precision and low recall. So when the data becomes more balanced after under-sampling, we achieve better F1 scores.\nIn principle, CC should work better than BR and BRUS because it takes label dependency into consideration while making predictions. BR, on the other hand, fits and predicts each label independently of the others. To my surprise, in this experiment we can observe that CC outperforms BR only when RandomForest is the base model.\nThe performance of CC depends on the label order because, while making the current prediction, we consider the results and labels of previous predictions. To experiment with label ordering, 20 random label orders were used in grid search. In my observation, the results across different runs are inconsistent with respect to label order. (plots included in the notebook)\nIn terms of complexity, BR is simpler and faster than CC. In terms of performance, CC is expected to do better than BR for the reasons mentioned above.\nOn the Yeast dataset, CC works only slightly better than BR, which is a surprise. So, when dealing with the Yeast dataset, we can use BR as an adequate model if complexity and speed are a concern.\n\n**Links:** [Dataset]() | [Jupyter Notebook]()\n\n## References\n\n[1] Read, Jesse, et al. 
\"Classifier chains for multi-label classification.\" Available At: https://link.springer.com/content/pdf/10.1007/s10994-011-5256-5.pdf\n\nNOTE: The project is WIP","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshwetajoshi601%2Fyeast-multilabel-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshwetajoshi601%2Fyeast-multilabel-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshwetajoshi601%2Fyeast-multilabel-classifier/lists"}