https://github.com/thammegowda/thammegowda

Host: GitHub
URL: https://github.com/thammegowda/thammegowda
Owner: thammegowda
Created: 2021-08-18T17:51:12.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-09-06T20:56:35.000Z (2 months ago)
Last Synced: 2024-09-07T00:09:07.891Z (2 months ago)
Size: 37.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.adoc

README

        
[cols="2a,2a"]

|===

image::https://stackexchange.com/users/flair/1632148.png[float="right",align="center", link="https://stackexchange.com/users/1632148/thamme-gowda?tab=accounts"]

- 🔭 I’m currently working on neural machine translation and imbalanced learning

- 🌱 I’m currently learning ... 

- 👯 I’m looking to collaborate on ...

- 🤔 I’m looking for help with ...

- 💬 Ask me about ... anything

- 📫 How to reach me: https://twitter.com[@thammegowda^]

- 😄 Pronouns: he/him/his

- ⚡ Fun fact: ...

|

=== Tools

- link:https://github.com/thammegowda/mtdata[mtdata^], datasets downloader

- link:https://github.com/marian-nmt/sotastream[sotastream^], streaming approach to training data

- link:https://github.com/isi-nlp/rtg[RTG^], NMT toolkit

- link:https://github.com/isi-nlp/boteval[BotEval^], chatbot eval

- link:https://github.com/isi-nlp/nlcodec[nlcodec^], vocabulary manager

- link:https://github.com/thammegowda/awkg[awkg^], pythonic awk

- link:https://github.com/USCDataScience/sparkler[Sparkler^], a webcrawler on Apache Spark

- Parser-Indexer: https://github.com/USCDataScience/parser-indexer[Java^], https://github.com/USCDataScience/parser-indexer-py[Python^]

- Web apps (I wasn't trying to be a web dev):

  * link:https://github.com/USCDataScience/supervising-ui[Supervising UI^], for image labeling

  * link:https://github.com/thammegowda/nllb-serve[NLLB Serve^], an MT webapp to serve huggingface models

|=== 

== Research 

https://scholar.google.com/citations?hl=en&user=7Ed3-tMAAAAJ&view_op=list_works&sortby=pubdate[__See my latest publications on Google Scholar__^]

[columns="m,"]

|===

| Repo | Description | Status | Note 

| https://github.com/marian-nmt/marian-dev/tree/master/src/python[PyMarian^]

| Python bindings to Marian C++; `pip install pymarian`

| Complete✅

| https://arxiv.org/abs/2408.11853[Paper^]; https://pypi.org/project/pymarian[PyPI^]

| https://github.com/isi-nlp/boteval[BotEval^]

| Facilitating human evaluation of chatbots; `pip install boteval`

| Complete✅

| https://aclanthology.org/2024.acl-demos.11/[Paper @ ACL2024 Demos^];  https://justin-cho.com/boteval[Demos^]; https://pypi.org/project/boteval[PyPI^]

| https://github.com/marian-nmt/wmt23-metrics[Cometoid^]

| Distilling strong reference-based metrics into stronger reference-less metrics

| Complete✅

| https://aclanthology.org/2023.wmt-1.62/[Paper @ WMT2023^]; https://huggingface.co/collections/marian-nmt/cometoid-wmt23-metrics-66903bb137eadb9c5768d5f2[Models on Huggingface^] 

| https://github.com/marian-nmt/sotastream[sotastream^]

| A streaming approach to machine translation training; `pip install sotastream`

| Complete✅

| https://aclanthology.org/2023.nlposs-1.13/[Paper @ NLP OSS 2023^] ; https://pypi.org/project/sotastream[PyPI^]

| https://github.com/thammegowda/016-many-eng-v2[016-many-eng-v2^]

| Many-to-English (v2) 

| Complete✅

|

| https://github.com/thammegowda/015-nmt-ablation[015-nmt-ablation^] 

| Transformer ablation showing that the model can work without an encoder

| Complete✅

| 

| https://github.com/thammegowda/014-udhr-dataset[014-udhr-dataset^]

| Parallel sentence alignment from Universal Declaration of Human Rights corpus

| WIP/Incomplete◒

| 

| https://github.com/thammegowda/013-nmt-codeswitching[013-nmt-codeswitching^]

| Code-switching in neural machine translation

| Complete✅

| https://arxiv.org/abs/2210.05096[Paper^]

 

| https://github.com/thammegowda/012-macrobert[012-macrobert^]

| Macro sampling in BERT

| Didn't work❌

| Maybe we should revisit

| https://github.com/thammegowda/011-imb-learn[011-imb-learn^]

| Imbalanced machine learning: case studies in image recognition, text classification, and machine translation

| Incomplete◒ 

|  https://gowda.ai/011-imb-learn/[Docs^]

| https://github.com/thammegowda/010-hyperparam-theory[010-hyperparam-theory^]

| A theory of hyperparameters

| Incomplete◒

| Book idea! Needs more time. 🕙

| https://github.com/thammegowda/009-nmt-toolkits[009-nmt-toolkits^] 

| A survey of NMT toolkits

| Incomplete◒  

| _Lost interest_

| https://github.com/thammegowda/008-asr-eval-macro[008-asr-eval-macro^] 

| Macro-averaged evaluation for automatic speech recognition

|  Incomplete◒

| (Some positive results, but needs more evidence)

| https://github.com/thammegowda/007-mt-eval-macro[007-mt-eval-macro^]

| Macro Average: Rare Types are Important Too

| Complete✅

| https://aclanthology.org/2021.naacl-main.90/[NAACL 2021^]

| https://github.com/thammegowda/006-many-to-eng[006-many-to-eng^]

| Many-to-English machine translation tools, data, and pretrained models

| Complete✅

| https://aclanthology.org/2021.acl-demo.37/[ACL 2021 Demos^]; http://rtg.isi.edu/many-eng/[Demo page^]

| https://github.com/thammegowda/005-nmt-imbalance[005-nmt-imbalance^] 

| Finding the optimal vocabulary size for neural machine translation

| Complete✅

| https://aclanthology.org/2020.findings-emnlp.352/[EMNLP 2020 Findings^]

| https://github.com/thammegowda/005-nmt-imbalance-old[005-nmt-imbalance-old^] 

| Neural machine translation with imbalanced classes 

| Complete✅

| Rejected from *ACL; https://arxiv.org/abs/2004.02334v1[arXiv link^]

| https://github.com/thammegowda/004-nmt-learning-curve[004-nmt-learning-curve^]

| NMT learning curve revisited

| Complete✅

| Not published 

| https://github.com/thammegowda/image-forensics-MFSec17[image-forensics-MFSec17^]

| An Approach for Automatic and Large Scale Image Forensics

| Complete✅

| https://dl.acm.org/doi/abs/10.1145/3078897.3080536[MFSec 2017^]

| https://github.com/uscdataScience/autoextractor[autoextractor^]

| Clustering webpages based on structure and style similarity

| Complete✅

| https://ieeexplore.ieee.org/abstract/document/7785739[IEEE IRI 2016^]

|===