Ecosyste.ms: Awesome


https://github.com/stephantul/stephantul


Host: GitHub
URL: https://github.com/stephantul/stephantul
Owner: stephantul
Created: 2022-11-08T08:25:58.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-01-07T10:24:20.000Z (about 1 month ago)
Last Synced: 2025-01-07T11:35:51.818Z (about 1 month ago)
Size: 25.4 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

### Hi there 👋

I'm Stéphan Tulkens! I'm a computational linguistics/AI person. I am currently working as a machine learning engineer at [Ecosia](https://ecosia.org). I am one of the two founding members of [Minish](https://github.com/MinishLab).

I got my PhD at [CLiPS](https://www.uantwerpen.be/en/research-groups/clips/) at the University of Antwerp under the watchful eyes of Walter Daelemans (Computational Linguistics) and Dominiek Sandra (Psycholinguistics). My PhD was about how people process orthography during reading. You can find a copy [here](https://scholar.google.com/scholar?oi=bibs&hl=en&cluster=11519863597548395702).

Before that, I studied computational linguistics (MA), philosophy (BA), and software engineering (BA).

My goal is always to make things as fast and small as possible. I like it when simple models work well, and I love it when simple models get close in accuracy to big models. I do not believe absolute accuracy is a metric to be chased, and I think we should always be mindful of what a model computes or learns from the data.

#### I’m currently working on 🏃‍♂️:

* [vicinity](https://github.com/MinishLab/vicinity): an interface library for approximate nearest neighbor (ANN) and k-nearest neighbor (kNN) search.

* [model2vec](https://github.com/MinishLab/model2vec): a library for creating extremely fast sentence-transformers through distillation.

* [semhash](https://github.com/MinishLab/semhash): a library for data deduplication and other dataset work.

* [reach](https://github.com/stephantul/reach): a library for loading and working with word embeddings.
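The projects above mostly deal with static embeddings and nearest-neighbor search. As a rough, hypothetical sketch of the underlying idea (this is not the API of model2vec, vicinity, or reach), a static model looks up one fixed vector per token, a text vector is the mean of its token vectors, and exact kNN search ranks candidates by cosine similarity. The toy `EMBEDDINGS` table and the function names `embed`, `cosine`, and `knn` are assumptions made for illustration.

```python
import math

# Hypothetical toy static embedding table: one fixed vector per token.
EMBEDDINGS: dict[str, list[float]] = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
    "truck": [0.0, 0.9, 0.1],
    "tree": [0.0, 0.0, 1.0],
}


def embed(text: str) -> list[float]:
    """Mean-pool the static vectors of known tokens; unknown tokens are skipped."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known tokens in input")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query: str, items: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbor search by cosine similarity."""
    q = embed(query)
    scored = [(item, cosine(q, embed(item))) for item in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(knn("dog", ["cat", "car", "tree"], k=2))
```

Brute-force search like this scales linearly with the number of items. ANN backends trade a little recall for much faster lookups on large collections, which is the kind of backend choice an interface library can hide behind one calling convention.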

#### Other stuff I made (most of it from my PhD) 🐕:

* [wordkit](https://github.com/clips/wordkit): a library for working with orthography.

* [old20](https://github.com/stephantul/old20): calculates the Orthographic Levenshtein Distance 20 (OLD20) metric.

* [metameric](https://github.com/clips/metameric): fast interactive activation networks in NumPy.

* [humumls](https://github.com/clips/humumls): loads the UMLS database into a MongoDB instance. Fast!

* [dutchembeddings](https://github.com/clips/dutchembeddings): word embeddings for Dutch (back when this was a cool thing to do).
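OLD20 is usually defined as the mean Levenshtein distance between a word and its 20 closest orthographic neighbors in a lexicon. The sketch below implements that definition directly with a textbook dynamic-programming edit distance. It is an illustration, not the old20 package's implementation, and the names `levenshtein` and `old_n` are hypothetical. A parameter `n` is exposed so the example also works on a tiny lexicon.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic DP edit distance: insertions, deletions, and substitutions all cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest other words in `lexicon`."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    if len(distances) < n:
        raise ValueError(f"lexicon needs at least {n} words other than {word!r}")
    return sum(distances[:n]) / n


print(old_n("cat", ["cat", "bat", "hat", "cot", "dog"], n=3))
```

Here `old_n("cat", ["cat", "bat", "hat", "cot", "dog"], n=3)` returns `1.0`, because the three closest neighbors (`bat`, `hat`, `cot`) are each one substitution away.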

#### My research interests 🤖:

* Tokenizers, specifically subword tokenizers.

* Embeddings, specifically _static_ embeddings (so old-fashioned! 💀), and how to combine these in meaningful ways.

* String similarity, and how to compute it without using dynamic programming.
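One common way to compare strings without a dynamic-programming table is to compare their sets of character n-grams. The hypothetical sketch below (not necessarily the approach the author works on) computes the Jaccard similarity of padded character bigrams, which takes time roughly linear in the string lengths. The `#` boundary padding and the function names `char_ngrams` and `ngram_jaccard` are assumptions.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Set of overlapping character n-grams, with '#' marking the word boundaries."""
    padded = f"#{text}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two strings' n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # both sets empty: treat as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(ngram_jaccard("night", "nacht"))
```

For example, `night` and `nacht` share `#n`, `ht`, and `t#` out of nine distinct bigrams, giving a similarity of 1/3. Set overlap is only a proxy for edit distance: it ignores the order and count of n-grams, so some quite different strings can still score high.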

#### Contact:

* [My website](https://stephantul.github.io)

* [Google Scholar](https://scholar.google.com/citations?user=pvoqmHQAAAAJ)

* [Twitter](https://twitter.com/tulkenss)