https://github.com/jieyuz2/awesome-weak-supervision

A curated list of programmatic weak supervision papers and resources

# Awesome-Weak-Supervision

[![Awesome](https://awesome.re/badge.svg)](https://awesome.re) ![visitors](https://visitor-badge.glitch.me/badge?page_id=JieyuZ2/Awesome-Weak-Supervision) ![GitHub stars](https://img.shields.io/github/stars/JieyuZ2/Awesome-Weak-Supervision.svg?color=green) ![GitHub forks](https://img.shields.io/github/forks/JieyuZ2/Awesome-Weak-Supervision?color=9cf)



- A curated list of programmatic/rule-based weak supervision papers and resources.
- A bib file for most of the collected papers

![An overview of weak supervision](https://github.com/JieyuZ2/Awesome-Programmatic-Weak-Supervision/blob/main/ws.png)

## Blogs
[An Overview of Weak Supervision](https://www.snorkel.org/blog/weak-supervision)

[Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision](https://medium.com/sculpt/a-technique-for-building-nlp-classifiers-efficiently-with-transfer-learning-and-weak-supervision-a8e2f21ca9c8)

## Videos
[Theory & Systems for Weak Supervision](https://www.youtube.com/watch?v=CR1g2-ZqswE) | [Chinese Version](https://www.bilibili.com/video/BV1wV411H7AS)

## Lecture Notes
[Lecture Notes on Weak Supervision](http://narimanfarsad.com/cps803/docs/weak_supervision_notes.pdf)

## Workshops

[DCAI@NeurIPS 2021](https://datacentricai.org/)

[DBAI@NeurIPS 2021](https://dbai-workshop.github.io/)

[ML4data@ICML 2021](https://sites.google.com/view/ml4data)

[WeaSuL@ICLR 2021](https://weasul.github.io/)

## Survey
[A Survey on Programmatic Weak Supervision](https://arxiv.org/abs/2202.05433). Jieyu Zhang

## Dataset and Benchmark
[WRENCH: A Comprehensive Benchmark for Weak Supervision](https://arxiv.org/abs/2109.11377). Jieyu Zhang ```NeurIPS 2021```
- [codebase](https://github.com/JieyuZ2/wrench) (for both classification and sequence tagging tasks)

[WALNUT: A Benchmark on Semi-weakly Supervised Learning for Natural Language Understanding](https://aclanthology.org/2022.naacl-main.64/). Guoqing Zheng ```NAACL 2022```
- [codebase](https://github.com/microsoft/WALNUT)

[AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels](https://openreview.net/forum?id=nQZHEunntbJ). Nicholas Roberts ```NeurIPS 2022```
- [codebase](https://github.com/Kaylee0501/AutoWS-Bench-101)

[SPEAR : Semi-supervised Data Programming in Python](https://arxiv.org/abs/2108.00373). Ayush Maheshwari ```EMNLP 2022```
- [codebase](https://github.com/decile-team/spear)
- [Documentation](https://spear-decile.readthedocs.io/)

## Algorithm
[Data Programming: Creating Large Training Sets, Quickly](https://arxiv.org/abs/1605.07723). Alex Ratner ```NeurIPS 2016```

[Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data](https://arxiv.org/abs/1610.08123). Paroma Varma ```FILM-NeurIPS 2016```

[Training Complex Models with Multi-Task Weak Supervision](https://arxiv.org/abs/1810.02840). Alex Ratner ```AAAI 2019```

[Data Programming using Continuous and Quality-Guided Labeling Functions](https://arxiv.org/abs/1911.09860). Oishik Chatterjee ```AAAI 2020```

[Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods](https://arxiv.org/abs/2002.11955). Dan Fu ```ICML 2020```

[KnowMAN: Weakly Supervised Multinomial Adversarial Network](https://arxiv.org/pdf/2109.07994.pdf). Luisa März ```EMNLP 2021```

[End-to-End Weak Supervision](https://arxiv.org/abs/2107.02233). Salva Rühling Cachay ```NeurIPS 2021```

[Creating Training Sets via Weak Indirect Supervision](https://openreview.net/forum?id=m8uJvVgwRci). Jieyu Zhang ```ICLR 2022```

[Universalizing Weak Supervision](https://openreview.net/forum?id=YpPiNigTzMT). Changho Shin ```ICLR 2022```

[Learning from Multiple Noisy Partial Labelers](https://arxiv.org/abs/2106.04530). Peilin Yu ```AISTATS 2022```

[Firebolt: Weak Supervision Under Weaker Assumptions](https://proceedings.mlr.press/v151/kuang22a/kuang22a.pdf). Zhaobin Kuang ```AISTATS 2022```

[Learning the Structure of Generative Models without Labeled Data](https://arxiv.org/abs/1703.00854). Stephen H. Bach ```ICML 2017```

[Inferring Generative Model Structure with Static Analysis](https://papers.nips.cc/paper/2017/file/cedebb6e872f539bef8c3f919874e9d7-Paper.pdf). Paroma Varma ```NeurIPS 2017```

[Learning Dependency Structures for Weak Supervision Models](https://arxiv.org/abs/1903.05844). Paroma Varma ```ICML 2019```

[Dependency Structure Misspecification in Multi-Source Weak Supervision Models](https://arxiv.org/pdf/2106.10302.pdf). Salva Rühling Cachay ```ICLR WeaSuL 2021```

[Pairwise Feedback for Data Programming](https://arxiv.org/abs/1912.07685). Benedikt Boecking ```NeurIPS 2019 workshop on Learning with Rich Experience: Integration of Learning Paradigms```

[Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision](https://arxiv.org/abs/2203.13270). Mayee F. Chen ```UAI 2022```

[Binary Classification with Positive Labeling Sources](https://aps.arxiv.org/abs/2208.01704). Jieyu Zhang ```CIKM 2022```

[Understanding Programmatic Weak Supervision via Source-aware Influence Function](https://arxiv.org/abs/2205.12879). Jieyu Zhang ```NeurIPS 2022```

[Training Subset Selection for Weak Supervision](https://arxiv.org/abs/2206.02914). Hunter Lang ```NeurIPS 2022```

[Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision](https://arxiv.org/abs/2210.02724). Jieyu Zhang and Linxin Song ```AISTATS 2023```

[Learning Hyper Label Model for Programmatic Weak Supervision](https://openreview.net/forum?id=aCQt_BrkSjC). Renzhi Wu ```ICLR 2023```

## System
[Snorkel: Rapid Training Data Creation with Weak Supervision](https://arxiv.org/abs/1711.10160). Alex Ratner ```VLDB 2018```

[Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale](https://arxiv.org/abs/1812.00417). Stephen H. Bach ```SIGMOD (Industrial) 2019```

[Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design](http://cidrdb.org/cidr2020/papers/p31-sheng-cidr20.pdf). Ying Sheng ```CIDR 2020```

[Overton: A Data System for Monitoring and Improving Machine-Learned Products](https://arxiv.org/abs/1909.05372). Christopher Ré ```CIDR 2020```

[Ruler: Data Programming by Demonstration for Document Labeling](https://www.aclweb.org/anthology/2020.findings-emnlp.181/). Sara Evensen ```EMNLP 2020 Findings```

[skweak: Weak Supervision Made Easy for NLP](https://arxiv.org/abs/2104.09683). Pierre Lison ```2021```

[TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration](https://dl.acm.org/doi/abs/10.1145/3442442.3458602). Dongjin Choi ```WWW 2021```

[Demonstration of Panda: A Weakly Supervised Entity Matching System](https://arxiv.org/pdf/2106.10821.pdf). Renzhi Wu ```VLDB Demo 2021```

[Asterisk: Generating Large Training Datasets with Automatic Active Supervision](https://dl.acm.org/doi/10.1145/3385188). Mona Nashaat ```ACM/IMS Transactions on Data Science 2020```

[Inspector Gadget: A Data Programming-based Labeling System for Industrial Images](https://arxiv.org/abs/2004.03264). Geon Heo ```VLDB 2021```

## Weak Supervision with Labeled Data

[Learning from Rules Generalizing Labeled Exemplars](https://arxiv.org/abs/2004.06025). Abhijeet Awasthi ```ICLR 2020```

[Self-Training with Weak Supervision](https://arxiv.org/abs/2104.05514). Giannis Karamanolakis ```NAACL 2021```

[Semi-Supervised Aggregation of Dependent Weak Supervision Sources with Performance Guarantees](http://cs.brown.edu/people/sbach/files/mazzetto-aistats21.pdf). Alessio Mazzetto ```AISTATS 2021```

[Adversarial Multiclass Learning under Weak Supervision with Performance Guarantees](http://cs.brown.edu/people/sbach/files/mazzetto-icml21.pdf). Alessio Mazzetto ```ICML 2021```

[Semi-Supervised Data Programming with Subset Selection](https://arxiv.org/pdf/2008.09887.pdf). Ayush Maheshwari ```ACL 2021```

[Active WeaSuL: Improving Weak Supervision with Active Learning](https://arxiv.org/abs/2104.14847). Samantha Biegel ```ICLR WeaSuL 2021```

[DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples](https://arxiv.org/pdf/2110.13740.pdf). Yi Xu ```NeurIPS 2021```

[Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming](https://arxiv.org/abs/2109.11410). Ayush Maheshwari ```ACL 2022 Findings```

## Weak Supervision Sources Generation

[Snuba: Automating Weak Supervision to Label Training Data](http://www.vldb.org/pvldb/vol12/p223-varma.pdf). Paroma Varma ```VLDB 2019```

[Interactive Programmatic Labeling for Weak Supervision](https://bencw99.github.io/files/kdd2019_dcclworkshop.pdf). Benjamin Cohen-Wang ```KDD Workshop 2019```

[Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling](https://arxiv.org/abs/2012.06046). Benedikt Boecking ```ICLR 2021```

[Adaptive Rule Discovery for Labeling Text Data](https://arxiv.org/abs/2005.06133). Sainyam Galhotra ```VLDB 2019```

[Weakly Supervised Named Entity Tagging with Learnable Logical Rules](https://arxiv.org/pdf/2107.02282.pdf). Jiacheng Li ```ACL 2021```

[GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition](https://aclanthology.org/2021.eacl-main.318/). Xinyan Zhao ```EACL 2021```

[Classifying Unstructured Clinical Notes via Automatic Weak Supervision](https://arxiv.org/abs/2206.12088). Chufan Gao and Mononito Goswami ```MLHC 2022```

[Witan: Unsupervised Labelling Function Generation for Assisted Data Programming](https://www.vldb.org/pvldb/vol15/p2334-denham.pdf). Benjamin Denham ```VLDB 2022```

[Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming](https://arxiv.org/abs/2203.01382). Cheng-Yu Hsieh ```VLDB 2023```

## Weak Supervision for Active Learning

[Iterative Data Programming for Expanding Text Classification Corpora](https://ojs.aaai.org/index.php/AAAI/article/view/7045). Neil Mallinar ```AAAI/IAAI 20 Technical Tracks ```

[Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets](https://ieeexplore.ieee.org/document/8622459). Mona Nashaat ```Big Data 2018```

## Application

### CV
[Scene Graph Prediction with Limited Labels](https://arxiv.org/pdf/1904.11622.pdf). Vincent Chen ```ICCV 2019```

[Multi-Resolution Weak Supervision for Sequential Data](https://arxiv.org/pdf/1910.09505.pdf). Paroma Varma ```NeurIPS 2019```

[Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels](https://arxiv.org/abs/1910.02993). Daniel Y. Fu ```SOSP 2019```

[GOGGLES: Automatic Image Labeling with Affinity Coding](https://arxiv.org/abs/1903.04552). Nilaksh Das ```SIGMOD 2020```

[Cut out the annotator, keep the cutout: better segmentation with weak supervision](https://openreview.net/pdf?id=bjkX6Kzb5H). Sarah Hooper ```ICLR 2021```

[Task Programming: Learning Data Efficient Behavior Representations](https://arxiv.org/abs/2011.13917). Jennifer J. Sun ```CVPR 2021```

### NLP
[Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach](https://arxiv.org/abs/1707.00166). Liyuan Liu ```EMNLP 2017```

[Training Classifiers with Natural Language Explanations](https://arxiv.org/abs/1805.03818). Braden Hancock ```ACL 2018```

[Deep Text Mining of Instagram Data without Strong Supervision](https://ieeexplore.ieee.org/abstract/document/8609589/authors#authors). Kim Hammar ```ICWI 2018```

[Bootstrapping Conversational Agents With Weak Supervision](https://arxiv.org/pdf/1812.06176.pdf). Neil Mallinar ```AAAI 2019```

[Weakly Supervised Sequence Tagging from Noisy Rules](http://cs.brown.edu/people/sbach/files/safranchik-aaai20.pdf). Esteban Safranchik ```AAAI 2020```

[NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction](https://arxiv.org/abs/1909.02177). Wenxuan Zhou ```WWW 2020```

[Named Entity Recognition without Labelled Data: A Weak Supervision Approach](https://arxiv.org/abs/2004.14723). Pierre Lison ```ACL 2020```

[Leveraging Multi-Source Weak Social Supervision for Early Detection of Fake News](https://arxiv.org/pdf/2004.01732.pdf). Kai Shu ```ECML-PKDD 2020```

[Learning with Weak Supervision for Email Intent Detection](https://www.microsoft.com/en-us/research/uploads/prod/2020/05/SIGIR_2020_Learning_with_Weak_Supervision_from_User_Interactions-5ecf17ccc607f.pdf). Kai Shu ```SIGIR 2020```

[Understanding the Dynamics between Vaping and Cannabis Legalization Using Twitter Opinions](https://arxiv.org/pdf/2106.11029.pdf). Shishir Adhikari ```AAAI-ICWSM 2021```

[Denoising Multi-Source Weak Supervision for Neural Text Classification](https://www.aclweb.org/anthology/2020.findings-emnlp.334/). Wendi Ren ```EMNLP 2020 Findings```

[Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach](https://arxiv.org/abs/2010.07835). Yue Yu ```NAACL 2021```

[Heterogeneous Graph Neural Networks for Concept Prerequisite Relation Learning in Educational Data](https://www.aclweb.org/anthology/2021.naacl-main.164/). Chenghao Jia ```NAACL 2021```

[Goodwill Hunting: Analyzing and Repurposing Off-the-Shelf Named Entity Linking Systems](https://www.aclweb.org/anthology/2021.naacl-industry.26/). Karan Goel ```NAACL 2021```

[Bootstrapping a Music Voice Assistant with Weak Supervision](https://www.aclweb.org/anthology/2021.naacl-industry.7/). Sergio Oramas ```NAACL 2021 Industry```

[BERTifying Hidden Markov Models for Multi-Source Weakly Supervised Named Entity Recognition](https://arxiv.org/abs/2105.12848). Yinghao Li ```ACL 2021```

[HERALD: An Annotation Efficient Method to Train User Engagement Predictors in Dialogs](https://arxiv.org/pdf/2106.00162.pdf). Weixin Liang ```ACL 2021```

[Controllable Abstractive Dialogue Summarization with Sketch Supervision](https://arxiv.org/pdf/2105.14064.pdf). Chien-Sheng Wu ```ACL 2021 Findings```

[Named Entity Recognition through Deep Representation Learning and Weak Supervision](https://aclanthology.org/2021.findings-acl.335.pdf). Jerrod Parker ```ACL 2021 Findings```

[Weakly supervised discourse segmentation for multiparty oral conversations](https://aclanthology.org/2021.emnlp-main.104/). Lila Gravellier ```EMNLP 2021```

[Adaptive Ranking-based Data Selection for Weakly supervised Class-imbalanced Text Classification](https://arxiv.org/abs/2210.03092). Linxin Song ```Findings of EMNLP 2022```

### RL
[Generating Multi-Agent Trajectories using Programmatic Weak Supervision](https://openreview.net/forum?id=rkxw-hAcFQ). Eric Zhan ```ICLR 2019```

### Software Engineering
[Search4Code: Code Search Intent Classification Using Weak Supervision](https://arxiv.org/abs/2011.11950). Nikitha Rao ```MSR 2021```

### Others
[Generating Training Labels for Cardiac Phase-Contrast MRI Images](https://www.paroma.xyz/). Vincent Chen ```MED-NeurIPS 2017```

[Osprey: Weak Supervision of Imbalanced Extraction Problems without Code](https://ajratner.github.io/assets/papers/Osprey_DEEM.pdf). Eran Bringer ```SIGMOD DEEM Workshop 2019```

[Weakly Supervised Classification of Rare Aortic Valve Malformations Using Unlabeled Cardiac MRI Sequences](https://jdunnmon.github.io/mri_biorxiv.pdf). Jason Fries ```Nature Communications 2019```

[Doubly Weak Supervision of Deep Learning Models for Head CT](https://link.springer.com/chapter/10.1007/978-3-030-32248-9_90). Khaled Saab ```MICCAI 2019```

[A clinical text classification paradigm using weak supervision and deep representation](https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-018-0723-6). Yanshan Wang ```BMC MIDM 2019```

[A machine-compiled database of genome-wide association studies](https://www.nature.com/articles/s41467-019-11026-x). Volodymyr Kuleshov ```Nature Communications 2019```

[Weak Supervision as an Efficient Approach for Automated Seizure Detection in Electroencephalography](https://jdunnmon.github.io/ndm-final.pdf). Khaled Saab ```NPJ Digital Medicine 2020```

[Extracting Chemical Reactions From Text Using Snorkel](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03542-1). Emily Mallory ```BMC Bioinformatics 2020```

[Cross-Modal Data Programming Enables Rapid Medical Machine Learning](https://arxiv.org/abs/1903.11101). Jared A. Dunnmon ```Patterns 2020```

[SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data](https://arxiv.org/abs/1704.06360). Jason Fries

[Ontology-driven weak supervision for clinical entity classification in electronic health records](https://arxiv.org/abs/2008.01972). Jason Fries ```Nature Communications 2021```

[Utilizing Weak Supervision to Infer Complex Objects and Situations in Autonomous Driving Data](https://ieeexplore.ieee.org/abstract/document/8814147). Zhenzhen Weng ```IV 2019```

[Multi-frame Weak Supervision to Label Wearable Sensor Data](http://roseyu.com/time-series-workshop/submissions/2019/timeseries-ICML19_paper_44.pdf). Saelig Khattar ```ICML Time Series Workshop 2019```

[Applying Weak Supervision to Mobile Sensor Data: Experiences with Transport Mode Detection](https://aiotworkshop.github.io/2020/published/AIoT20-ModeDetect.pdf). Jonathan Furst ```AAAI Workshop on Artificial Intelligence of Things 2020```

[Exploring Inspiration Sets in a Data Programming Pipeline for Product Moderation](https://aclanthology.org/2021.ecnlp-1.16.pdf). Justine Winkler ```ACL 2021 ECNLP 4```

[Detecting Hashtag Hijacking for Hashtag Activism](https://aclanthology.org/2021.nlp4posimpact-1.9.pdf). Pooneh Mousavi ```ACL 2021 NLP for Positive Impact```

[CHECKER: Detecting Clickbait Thumbnails with Weak Supervision and Co-Teaching](https://2021.ecmlpkdd.org/wp-content/uploads/2021/07/sub_147.pdf). Tianyi Xie ```ECML-PKDD 2021```

[DeFraudNet: An End-to-End Weak Supervision Framework to Detect Fraud in Online Food Delivery](https://2021.ecmlpkdd.org/wp-content/uploads/2021/07/sub_10-1.pdf). Jose Mathew ```ECML-PKDD 2021```

[Weak Supervision for Affordable Modeling of Electrocardiogram Data](https://arxiv.org/abs/2201.02936). Mononito Goswami ```AMIA 2021 Annual Symposium```

[Fraud Detection under Multi-Sourced Extremely Noisy Annotations](https://gcatnjust.github.io/ChenGong/paper/zhang_cikm21.pdf). Chuang Zhang ```CIKM 2021```

[Multi-Source Domain Adaptation with Weak Supervision for Early Fake News Detection](http://www.cs.iit.edu/~kshu/files/BigDataMDAWS.pdf). Yichuan Li ```BigData 2021```

[Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact](https://arxiv.org/abs/2206.09074). Arnab Dey ```AMIA 2022 Annual Symposium```

## Thesis
[Accelerating Machine Learning with Training Data Management](https://ajratner.github.io/assets/papers/thesis.pdf). Alex Ratner

[Weak Supervision From High-Level Abstractions](https://stacks.stanford.edu/file/druid:ns523jd4552/hancock_dissertation_vsubmission-augmented.pdf). Braden Jay Hancock

## Other Weak Supervision Paradigm

### Label-name Only Supervision

[Weakly-Supervised Neural Text Classification](https://arxiv.org/abs/1809.01478). Yu Meng ```CIKM 2018```

[Weakly-Supervised Hierarchical Text Classification](https://arxiv.org/abs/1812.11270). Yu Meng ```AAAI 2019```

[Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding](https://arxiv.org/abs/2010.06705). Jiaxin Huang ```EMNLP 2020```

[Text Classification Using Label Names Only: A Language Model Self-Training Approach](https://arxiv.org/abs/2010.07245). Yu Meng ```EMNLP 2020```

[Hierarchical Metadata-Aware Document Categorization under Weak Supervision](https://arxiv.org/abs/2010.13556). Yu Zhang ```WSDM 2021```

[Contextualized weak supervision for text classification](https://aclanthology.org/2020.acl-main.30/). Dheeraj Mekala ```ACL 2020```

[Meta: Metadata-empowered weak supervision for text classification](https://aclanthology.org/2020.emnlp-main.670/). Dheeraj Mekala ```EMNLP 2020```

[X-class: Text classification with extremely weak supervision](https://arxiv.org/abs/2010.12794). Zihan Wang ```NAACL 2021```

[Coarse2Fine: Fine-grained text classification on coarsely-grained annotated data](https://arxiv.org/abs/2109.10856). Dheeraj Mekala ```EMNLP 2021```

### Improving Weak Supervision

[LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification](https://arxiv.org/abs/2205.12528). Dheeraj Mekala ```EMNLP 2022 Findings```