Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/arian-askari/ChatGPT-RetrievalQA

A dataset for training and evaluating question answering retrieval models on ChatGPT responses, with the option of training and evaluating on real human responses.

ai chatgpt chatgpt-information-retrieval chatgpt-ir data-augmentation dataset deep-learning gpt-3 gpt2 gpt3 information-retrieval information-retrieval-chatgpt ir ir-chatgpt machine-learning nlp openai python sequence-to-sequence text-retrieval

Last synced: 03 Jul 2024

https://github.com/Ryanglambert/3d_model_retriever

Experimenting with a newly published deep learning paper and how it can be used for content-based 3D model retrieval. (info retrieval for CAD)

cad-models capsnets deep-learning information-retrieval neural-network python

Last synced: 01 Jul 2024

https://github.com/ucasir/NPRF

NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval

information-retrieval neural-network pseudo-relevance-feedback

Last synced: 01 Jul 2024

https://github.com/ict-bigdatalab/awesome-pretrained-models-for-information-retrieval

A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).

bert-for-ir dense-retrieval information-retrieval pretrain-for-search pretrained-language-models pretraining-for-ir reranking web-search

Last synced: 25 Jun 2024

https://github.com/huangtinglin/MixGCF

MixGCF: An Improved Training Method for Graph Neural Network-based Recommender Systems, KDD2021

graph-neural-network information-retrieval negative-sampling network-embedding pytorch recommender-system

Last synced: 23 Jun 2024

https://github.com/jingtaozhan/DRhard

SIGIR'21: Optimizing dense retrieval (DR) with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.

information-retrieval pytorch web-search

Last synced: 23 Jun 2024

https://github.com/princeton-nlp/DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624

information-retrieval knowledge-base nlp open-domain-qa passage-retrieval slot-filling

Last synced: 22 Jun 2024

http://yomguithereal.github.io/talisman/

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage

Last synced: 22 Jun 2024

https://github.com/AdeDZY/K-NRM

K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling

deep-learning information-retrieval neural-network

Last synced: 16 Jun 2024

https://github.com/thunlp/OpenMatch

An Open-Source Package for Information Retrieval.

information-retrieval open-domain-question-answering

Last synced: 16 Jun 2024

https://github.com/capreolus-ir/capreolus

A toolkit for end-to-end neural ad hoc retrieval

deep-learning information-retrieval

Last synced: 16 Jun 2024

https://github.com/allenai/ir_datasets

Provides a common interface to many IR ranking datasets.

dataset information-retrieval ir

Last synced: 16 Jun 2024

https://github.com/hical/HiCAL

HiCAL is a system for efficient high-recall retrieval with an adaptable assessing interface.

active-learning cal document-assessment high-recall information-retrieval machine-learning search-engine test-collection

Last synced: 16 Jun 2024

https://github.com/naver/splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)

bert information-retrieval nlp passage-retrieval sparse splade

Last synced: 16 Jun 2024

https://github.com/PaddlePaddle/RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

dense-retrieval information-retrieval nlp question-answering

Last synced: 16 Jun 2024

https://github.com/texttron/tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

dense-retrieval dpr flax information-retrieval jax pytorch question-answering transformer

Last synced: 16 Jun 2024

https://github.com/allenai/aspire

Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.

document-similarity information-retrieval machine-learning natural-language-processing

Last synced: 16 Jun 2024

https://github.com/shrebox/Personified-Chatbot

A personified chatbot responding to a query based on the answering pattern of Dr. APJ Abdul Kalam using Information Retrieval, Natural Language Processing, and Deep Learning techniques.

apj-abdul-kalam chatbot deep-learning information-retrieval lstm natural-language-processing nlp ranking-algorithm seq2seq-chatbot seq2seq-model summarization word2vec

Last synced: 16 Jun 2024

https://github.com/project-miracl/miracl

A large-scale multilingual dataset for Information Retrieval. Thorough human annotations across 18 diverse languages.

benchmark dataset information-retrieval multilingual

Last synced: 16 Jun 2024

https://github.com/castorini/anserini

Anserini is a Lucene toolkit for reproducible information retrieval research

information-retrieval lucene

Last synced: 16 Jun 2024

https://github.com/terrier-org/terrier-core

Terrier IR Platform

information-retrieval java terrier

Last synced: 16 Jun 2024

https://github.com/mathetake/intergo

A package for interleaving / multileaving ranking generation in Go

ab-testing go golang information-retrieval interleaving multileaving ranking ranking-algorithm recommendation-system

Last synced: 16 Jun 2024

https://github.com/YadaYuki/omochi

Full-text search engine written from scratch in Go ʕ◔ϖ◔ʔ (just a toy) 😊

ddd ent go golang information-retrieval search search-engine

Last synced: 16 Jun 2024

https://github.com/cvangysel/pytrec_eval

pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.

evaluation information-retrieval

Last synced: 16 Jun 2024

https://github.com/sunnweiwei/rankgpt

[EMNLP 2023 Outstanding Paper Award] Is ChatGPT Good at Search? LLMs as Re-Ranking Agent

chatgpt information-retrieval large-language-models reranking

Last synced: 14 Jun 2024

https://github.com/UKPLab/gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

bert domain-adaptation information-retrieval nlp transformers vector-search

Last synced: 09 Jun 2024

https://github.com/pisa-engine/pisa

PISA: Performant Indexes and Search for Academia

information-retrieval inverted-index search search-engine

Last synced: 09 Jun 2024

https://github.com/rapidsai/raft

RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high-performance applications.

anns building-blocks clustering cuda distance gpu information-retrieval linear-algebra llm machine-learning nearest-neighbors neighborhood-methods primitives random-sampling solvers sparse statistics vector-search vector-similarity vector-store

Last synced: 09 Jun 2024

https://github.com/victordibia/neuralqa

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

bert-model deep-learning elastic-search information-retrieval natural-language-processing

Last synced: 07 Jun 2024

https://github.com/Yomguithereal/talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage

Last synced: 03 Jun 2024

https://github.com/AnonCatalyst/Coeus-Framework

Coeus 🌐 is an OSINT framework empowering users with tools for effective intelligence gathering from open sources. From social media monitoring 📱 to data analysis 📊, it offers a centralized platform for seamless OSINT investigations.

data-science data-visualization database forensic-analysis forensics forensics-tools framework information-retrieval infosec osint osint-framework osint-python osint-resources osint-tool osint-toolkit people-search reconnaissance

Last synced: 02 Jun 2024

https://github.com/apache/lucene

Apache Lucene open-source search software

backend information-retrieval java lucene nosql search search-engine

Last synced: 01 Jun 2024

https://github.com/SciPhi-AI/agent-search

AgentSearch is a framework for powering search agents and enabling customizable local search.

artificial-intelligence information-retrieval llms rag retrieval-augmented-generation search search-engine

Last synced: 01 Jun 2024

https://github.com/mindflowai/mindflow

🧠 AI-powered CLI git wrapper, boilerplate code generator, chat history manager, and code search engine to streamline your dev workflow 🌊

chat-gpt cli code-generation command-line-interface dev-tools git git-wrapper information-retrieval large-language-models llm machine-learning modern-dev-tools nlp openai openai-api python search search-engine

Last synced: 31 May 2024

https://github.com/danswer-ai/danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

ai-chat chatgpt enterprise-search gen-ai information-retrieval nextjs python rag

Last synced: 31 May 2024

https://github.com/dheeraj7596/SCDV

Text classification with Sparse Composite Document Vectors.

document-vector emnlp emnlp2017 information-retrieval natural-language-processing text-classification

Last synced: 19 May 2024

https://github.com/arosh/BM25Transformer

(Python) transform a document-term matrix to an Okapi/BM25 representation

information-retrieval machine-learning natural-language-processing python scikit-learn

Last synced: 19 May 2024

https://github.com/castorini/pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

information-retrieval

Last synced: 18 May 2024

https://github.com/weaviate/weaviate

Weaviate is an open-source vector database that stores both objects and vectors, allowing vector search to be combined with structured filtering, with the fault tolerance and scalability of a cloud-native database.

approximate-nearest-neighbor-search generative-search grpc hnsw hybrid-search image-search information-retrieval mlops nearest-neighbor-search neural-search recommender-system search-engine semantic-search semantic-search-engine similarity-search vector-database vector-search vector-search-engine vectors weaviate

Last synced: 16 May 2024

https://github.com/NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 16 May 2024

https://github.com/rajkumardusad/IP-Tracer

Track any IP address with IP-Tracer. IP-Tracer is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracer.

gnuroot-debian hacking-tool hacking-tools information-gathering information-retrieval ip-geolocation ip-location ip-tracer linux linux-tools termux termux-hacking termux-tool

Last synced: 15 May 2024

https://github.com/apache/lucene-solr

Apache Lucene and Solr open-source search software

backend information-retrieval java lucene nosql search search-engine solr

Last synced: 15 May 2024

https://github.com/felladrin/MiniSearch

Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space

ai artificial-intelligence generative-ai gpu-accelerated information-retrieval llm llm-inference machine-learning nlp question-answering retrieval-augmented-generation search search-engine typescript

Last synced: 14 May 2024

https://github.com/Agrover112/awesome-semantic-search

A curated list of awesome resources related to Semantic Search 🔎 and Semantic Similarity tasks.

awesome awesome-list hacktoberfest information-retrieval information-retrival nlp ranking semantic-search semantic-similarity sentence-embeddings

Last synced: 14 May 2024

https://github.com/uutils/platform-info

A cross-platform way to get information about your machine

cross-platform information-retrieval rust uname

Last synced: 14 May 2024

https://github.com/kuutsav/information-retrieval

Neural information retrieval / semantic-search / Bi-Encoders

information-retrieval machine-learning nlp semantic-search

Last synced: 13 May 2024

https://github.com/beir-cellar/beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

ance benchmark bert colbert dataset deep-learning dpr elasticsearch information-retrieval nlp passage-retrieval pytorch question-generation retrieval retrieval-models sbert sentence-transformers use-qa zero-shot-retrieval

Last synced: 13 May 2024

https://github.com/dorianbrown/rank_bm25

A Collection of BM25 Algorithms in Python

algorithm bm25 information-retrieval ranking

Last synced: 13 May 2024

https://github.com/microsoft/rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of conducting experiments and evaluations using Azure Cognitive Search and the RAG pattern.

acs azure chunking dense embedding evaluation experiment genai indexing information-retrieval llm openai rag sparse vectors

Last synced: 11 May 2024

https://github.com/JuliaText/WordTokenizers.jl

High-performance tokenizers for natural language processing and other related tasks

data-mining information-retrieval lexer nlp tokenization

Last synced: 11 May 2024

https://github.com/MustafaSalih1993/fsi

FSI (Fetch System Info) CLI tool written in Rust 🦀

information-retrieval linux neofetch rust rust-lang screenfetch

Last synced: 10 May 2024

https://github.com/momegas/megabots

🤖 State-of-the-art, production-ready LLM apps made mega-easy, so you don't have to build them from scratch 🤯 Create a bot, now 🫵

chatbot faiss fastapi gpt-35-turbo gpt-4 information-retrieval langchain llama natural-language-processing nlp pinecone prompt-engineering python question-answering s3

Last synced: 10 May 2024

https://github.com/deepset-ai/haystack

🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

ai bert chatgpt generative-ai gpt-3 information-retrieval language-model large-language-models machine-learning nlp python pytorch question-answering semantic-search squad summarization transformers

Last synced: 07 May 2024

https://github.com/KittyKatt/screenFetch

Fetches system/theme information in terminal for Linux desktop screenshots.

bash desktop information-retrieval shell

Last synced: 05 May 2024

https://github.com/JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition

Last synced: 05 May 2024

https://github.com/gaoisbest/NLP-Projects

word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (e.g., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding

dialogue-systems information-extraction information-retrieval knowledge-graph machine-reading-comprehension network-embedding pretrained-language-model sentence2vec sequence-labeling text-classification text-generation word2vec

Last synced: 04 May 2024

https://github.com/ashvardanian/StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖

beautifulsoup common-crawl csv dataset html information-retrieval json laion ndjson parser pattern-recognition simd sorting-algorithms string string-manipulation string-matching string-parsing string-search substring

Last synced: 02 May 2024

https://github.com/Anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 01 May 2024

https://github.com/Phate6660/nixinfo

A Rust lib crate for gathering system info such as CPU, distro, environment, kernel, etc.

information-retrieval lib linux rust

Last synced: 29 Apr 2024