Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/thammegowda/thammegowda

README
https://github.com/thammegowda/thammegowda

Last synced: about 1 month ago
JSON representation

README

Awesome Lists containing this project

README

        

[cols="2a,2a"]
|===

image::https://stackexchange.com/users/flair/1632148.png[float="right",align="center", link="https://stackexchange.com/users/1632148/thamme-gowda?tab=accounts"]

- šŸ”­ Iā€™m currently working on neural machine translation, imbalanced learning
- šŸŒ± Iā€™m currently learning ...
- šŸ‘Æ Iā€™m looking to collaborate on ...
- šŸ¤” Iā€™m looking for help with ...
- šŸ’¬ Ask me about ... anything
- šŸ“« How to reach me: https://twitter.com[@thammegowda^]
- šŸ˜„ Pronouns: he/him/his
- āš” Fun fact: ...
|

=== Tools

- link:https://github.com/thammegowda/mtdata[mtdata^], datasets downloader
- link:https://github.com/marian-nmt/sotastream[sotastream^], streaming approach to training data
- link:https://github.com/isi-nlp/rtg[RTG^], NMT toolkit
- link:https://github.com/isi-nlp/boteval[BotEval^], chatbot eval
- link:https://github.com/isi-nlp/nlcodec[nlcodec^], vocabulary manager
- link:https://github.com/thammegowda/awkg[awkg^], pythonic awk
- link:https://github.com/USCDataScience/sparkler[Sparkler^], a webcrawler on Apache Spark
- Parser-Indexer: https://github.com/USCDataScience/parser-indexer[Java^], https://github.com/USCDataScience/parser-indexer-py[Python^]
- Web apps (I wasn't trying to be a web dev):
* link:https://github.com/USCDataScience/supervising-ui[Supervising UI^], for image labeling
* link:https://github.com/thammegowda/nllb-serve[NLLB Serve^], an MT webapp to serve huggingface models

|===

== Research

https://scholar.google.com/citations?hl=en&user=7Ed3-tMAAAAJ&view_op=list_works&sortby=pubdate[__See my latest publications on Google Scholar__^]

[columns="m,"]
|===
| Repo | Description | Status | Note

| https://github.com/marian-nmt/marian-dev/tree/master/src/python[PyMarian^]
| Python bindings to Marian C++; `pip install pymarian`
| Completeāœ…
| https://arxiv.org/abs/2408.11853[Paper^]; https://pypi.org/project/pymarian[PyPI^]

| https://github.com/isi-nlp/boteval[BotEval^]
| Facilitating human evaluation of chatbots; `pip install boteval`
| Completeāœ…
| https://aclanthology.org/2024.acl-demos.11/[Paper @ ACL2024 Demos^]; https://justin-cho.com/boteval[Demos^]; https://pypi.org/project/boteval[PyPI^]

| https://github.com/marian-nmt/wmt23-metrics[Cometoid^]
| Distilling strong reference based metrics into stronger reference-less metrics
| Completeāœ…
| https://aclanthology.org/2023.wmt-1.62/[Paper @ WMT2023^]; https://huggingface.co/collections/marian-nmt/cometoid-wmt23-metrics-66903bb137eadb9c5768d5f2[Models on Huggingface^]

| https://github.com/marian-nmt/sotastream[sotastream^]
| A streaming approach to machine translation training. `pip install sotastream`
| Completeāœ…
| https://aclanthology.org/2023.nlposs-1.13/[Paper @ NLP OSS 2023^] ; https://pypi.org/project/sotastream[PyPI^]

| https://github.com/thammegowda/016-many-eng-v2[016-many-eng-v2^]
| Many-to-English (v2)
| Completeāœ…
|

| https://github.com/thammegowda/015-nmt-ablation[015-nmt-ablation^]
| Transformer ablation, showing that model can work without encoder.
| Completeāœ…
|

| https://github.com/thammegowda/014-udhr-dataset[014-udhr-dataset^]
| Parallel sentence alignment from Universal Declaration of Human Rights corpus
| WIP/Incompleteā—’
|

| https://github.com/thammegowda/013-nmt-codeswitching[013-nmt-codeswitching^]
| Done
| Completeāœ…
| https://arxiv.org/abs/2210.05096[Paper^]

| https://github.com/thammegowda/012-macrobert[012-macrobert^]
| Macro sampling in BERT
| Didn't workāŒ
| Maybe we should revisit

| https://github.com/thammegowda/011-imb-learn[011-imb-learn^]
| Imbalanced machine learning: case studies in image recognition, text classification, and machine translation
| Incompleteā—’
| https://gowda.ai/011-imb-learn/[Docs^]

| https://github.com/thammegowda/010-hyperparam-theory[010-hyperparam-theory^]
| A theory on hyperparameter
| Incompleteā—’
| Book idea! Needs more time. šŸ•™

| https://github.com/thammegowda/009-nmt-toolkits[009-nmt-toolkits^]
| A survey of NMT toolkits
| Incompleteā—’
| _Lost interest_

| https://github.com/thammegowda/008-asr-eval-macro[008-asr-eval-macro^]
| Macro-averaged evaluation for automatic speech recognition
| Incompleteā—’
| (Some positive results, but needs more evidence)

| https://github.com/thammegowda/007-mt-eval-macro[007-mt-eval-macro^]
| Macro Average: Rare Types are Important Too
| Completeāœ…
| https://aclanthology.org/2021.naacl-main.90/[NAACL 2021^]

| https://github.com/thammegowda/006-many-to-eng[006-many-to-eng]
| Many-to-English machine translation tools, data, and pretrained models
| Completeāœ…
| https://aclanthology.org/2021.acl-demo.37/[ACL 2021 Demos^]. http://rtg.isi.edu/many-eng/[Demo page^]

| https://github.com/thammegowda/005-nmt-imbalance[005-nmt-imbalance^]
|Finding the optimal vocabulary size for neural machine translation
| Complete āœ…
| https://aclanthology.org/2020.findings-emnlp.352/[EMNLP 2020 Findings^]

| https://github.com/thammegowda/005-nmt-imbalance-old[005-nmt-imbalance-old^]
| Neural machine translation with imbalanced classes
| Complete āœ…
| Rejected from *ACL; https://arxiv.org/abs/2004.02334v1[Arxiv link^]

| https://github.com/thammegowda/004-nmt-learning-curve[004-nmt-learning-curve^]
| NMT learning curve revisited.
| Complete āœ…
| Not published

| https://github.com/thammegowda/image-forensics-MFSec17[image-forensics-MFSec17^]
| An Approach for Automatic and Large Scale Image Forensics
| Complete āœ…
| https://dl.acm.org/doi/abs/10.1145/3078897.3080536[MFSec 2017^]

| https://github.com/uscdataScience/autoextractor[autoextractor^]
| Clustering webpages based on structure and style similarity
| Completeāœ…
| https://ieeexplore.ieee.org/abstract/document/7785739[IEEE IRI 2016^]

|===