Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with information-retrieval

A curated list of projects in awesome lists tagged with information-retrieval.

https://github.com/jaidedai/easyocr

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition

Last synced: 29 Sep 2024

https://github.com/JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition

Last synced: 30 Jul 2024

https://github.com/deepset-ai/haystack

🔍 AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

ai bert chatgpt generative-ai gpt-3 information-retrieval language-model large-language-models llm machine-learning nlp python pytorch question-answering rag retrieval-augmented-generation semantic-search squad summarization transformers

Last synced: 29 Sep 2024

https://github.com/danswer-ai/danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

ai-chat chatgpt enterprise-search gen-ai information-retrieval nextjs python rag

Last synced: 26 Sep 2024

https://github.com/semi-technologies/weaviate

Weaviate is an open-source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database.

approximate-nearest-neighbor-search generative-search grpc hnsw hybrid-search image-search information-retrieval mlops nearest-neighbor-search neural-search recommender-system search-engine semantic-search semantic-search-engine similarity-search vector-database vector-search vector-search-engine vectors weaviate

Last synced: 06 Aug 2024

https://github.com/weaviate/weaviate

Weaviate is an open-source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database.

approximate-nearest-neighbor-search generative-search grpc hnsw hybrid-search image-search information-retrieval mlops nearest-neighbor-search neural-search recommender-system search-engine semantic-search semantic-search-engine similarity-search vector-database vector-search vector-search-engine vectors weaviate

Last synced: 29 Sep 2024

https://github.com/apache/lucene-solr

Apache Lucene and Solr open-source search software

backend information-retrieval java lucene nosql search search-engine solr

Last synced: 30 Sep 2024

https://github.com/kittykatt/screenfetch

Fetches system/theme information in terminal for Linux desktop screenshots.

bash desktop information-retrieval shell

Last synced: 30 Sep 2024

https://github.com/KittyKatt/screenFetch

Fetches system/theme information in terminal for Linux desktop screenshots.

bash desktop information-retrieval shell

Last synced: 31 Jul 2024

https://github.com/apache/lucene

Apache Lucene open-source search software

backend information-retrieval java lucene nosql search search-engine

Last synced: 01 Oct 2024

https://github.com/rajkumardusad/ip-tracer

Track any IP address with IP-Tracer. IP-Tracer is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracer.

gnuroot-debian hacking-tool hacking-tools information-gathering information-retrieval ip-geolocation ip-location ip-tracer linux linux-tools termux termux-hacking termux-tool

Last synced: 30 Sep 2024

https://github.com/rajkumardusad/IP-Tracer

Track any IP address with IP-Tracer. IP-Tracer is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracer.

gnuroot-debian hacking-tool hacking-tools information-gathering information-retrieval ip-geolocation ip-location ip-tracer linux linux-tools termux termux-hacking termux-tool

Last synced: 01 Aug 2024

https://github.com/ashvardanian/StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc. 🦖

beautifulsoup common-crawl csv dataset html information-retrieval json laion ndjson parser pattern-recognition simd sorting-algorithms string string-manipulation string-matching string-parsing string-search substring

Last synced: 31 Jul 2024

https://github.com/castorini/pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

information-retrieval

Last synced: 01 Aug 2024

https://github.com/UKPLab/beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.

ance benchmark bert colbert dataset deep-learning dpr elasticsearch information-retrieval nlp passage-retrieval pytorch question-generation retrieval retrieval-models sbert sentence-transformers use-qa zero-shot-retrieval

Last synced: 04 Aug 2024

https://github.com/beir-cellar/beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.

ance benchmark bert colbert dataset deep-learning dpr elasticsearch information-retrieval nlp passage-retrieval pytorch question-generation retrieval retrieval-models sbert sentence-transformers use-qa zero-shot-retrieval

Last synced: 30 Sep 2024

https://github.com/apache/solr

Apache Solr open-source search software

backend information-retrieval java lucene nosql search search-engine solr

Last synced: 30 Sep 2024

https://github.com/castorini/anserini

Anserini is a Lucene toolkit for reproducible information retrieval research

information-retrieval lucene

Last synced: 02 Aug 2024

https://github.com/dorianbrown/rank_bm25

A Collection of BM25 Algorithms in Python

algorithm bm25 information-retrieval ranking

Last synced: 01 Aug 2024

https://github.com/pisa-engine/pisa

PISA: Performant Indexes and Search for Academia

information-retrieval inverted-index search search-engine

Last synced: 31 Jul 2024

https://github.com/allegro/allRank

allRank is a framework for training learning-to-rank neural models based on PyTorch.

click-model deep-learning information-retrieval learning-to-rank machine-learning ndcg python pytorch ranking transformer

Last synced: 02 Aug 2024

https://github.com/PaddlePaddle/RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

dense-retrieval information-retrieval nlp question-answering

Last synced: 02 Aug 2024

https://github.com/ashvardanian/simsimd

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐

arm-neon arm-sve assembly avx2 avx512 blas blas-libraries distance-calculation distance-measures float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search

Last synced: 30 Sep 2024

https://github.com/ashvardanian/SimSIMD

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐

arm-neon arm-sve assembly avx2 avx512 blas blas-libraries distance-calculation distance-measures float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search

Last synced: 31 Jul 2024

https://github.com/Yomguithereal/talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage

Last synced: 31 Jul 2024

https://github.com/naver/splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)

bert information-retrieval nlp passage-retrieval sparse splade

Last synced: 02 Aug 2024

https://github.com/rapidsai/raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

anns building-blocks clustering cuda distance gpu information-retrieval linear-algebra llm machine-learning nearest-neighbors neighborhood-methods primitives random-sampling solvers sparse statistics vector-search vector-similarity vector-store

Last synced: 01 Aug 2024

https://github.com/princeton-nlp/DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624

information-retrieval knowledge-base nlp open-domain-qa passage-retrieval slot-filling

Last synced: 01 Aug 2024

https://github.com/kreeben/resin

Vector space index based search engine that's available as an HTTP service or as an embedded library.

information-retrieval language-model machine-learning nlu nlu-engine resin search search-algorithms search-engine vector-space vector-space-model

Last synced: 31 Jul 2024

https://github.com/airalcorn2/Deep-Semantic-Similarity-Model

My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.

deep-learning information-retrieval keras natural-language-processing nlp

Last synced: 07 Aug 2024

https://github.com/gaoisbest/NLP-Projects

word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding

dialogue-systems information-extraction information-retrieval knowledge-graph machine-reading-comprehension network-embedding pretrained-language-model sentence2vec sequence-labeling text-classification text-generation word2vec

Last synced: 01 Aug 2024

https://github.com/eBay/Sequence-Semantic-Embedding

Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and recall fetching, cross-lingual information retrieval, question answering, etc.

classification-task information-retrieval nlp-tasks sse-embeddings text-classification

Last synced: 07 Aug 2024

https://github.com/sunnweiwei/rankgpt

Is ChatGPT Good at Search? LLMs as Re-Ranking Agent [EMNLP 2023 Outstanding Paper Award]

chatgpt information-retrieval large-language-models reranking

Last synced: 02 Aug 2024

https://github.com/thunlp/OpenMatch

An Open-Source Package for Information Retrieval.

information-retrieval open-domain-question-answering

Last synced: 02 Aug 2024

https://github.com/USCDataScience/sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

big-data distributed-systems information-retrieval nutch search search-engine solr spark tika web-crawler

Last synced: 31 Jul 2024

https://github.com/IntelLabs/RAGFoundry

Framework for enhancing LLMs for RAG tasks using fine-tuning.

evaluation fine-tuning information-retrieval llm nlp question-answering rag semantic-search

Last synced: 21 Aug 2024

https://github.com/SciPhi-AI/agent-search

AgentSearch is a framework for powering search agents and enabling customizable local search.

artificial-intelligence information-retrieval llms rag retrieval-augmented-generation search search-engine

Last synced: 31 Jul 2024

https://github.com/texttron/tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

dense-retrieval dpr flax information-retrieval jax pytorch question-answering transformer

Last synced: 02 Aug 2024

https://github.com/RUC-NLPIR/LLM4IR-Survey

This is the repo for the survey of LLM4IR.

information-retrieval large-language-models survey

Last synced: 03 Aug 2024

https://github.com/momegas/megabots

🤖 State-of-the-art, production-ready LLM apps made mega-easy, so you don't have to build them from scratch 🤯 Create a bot, now 🫵

chatbot faiss fastapi gpt-35-turbo gpt-4 information-retrieval langchain llama natural-language-processing nlp pinecone prompt-engineering python question-answering s3

Last synced: 01 Oct 2024

https://github.com/Anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 01 Aug 2024

https://github.com/UKPLab/gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

bert domain-adaptation information-retrieval nlp transformers vector-search

Last synced: 05 Aug 2024

https://github.com/allenai/ir_datasets

Provides a common interface to many IR ranking datasets.

dataset information-retrieval ir

Last synced: 02 Aug 2024

https://github.com/cvangysel/pytrec_eval

pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.

evaluation information-retrieval

Last synced: 02 Aug 2024

https://github.com/aryn-ai/sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.

ai dataprep etl information-retrieval llm ml nlp opensearch search semantic-search

Last synced: 18 Aug 2024

https://github.com/victordibia/neuralqa

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

bert-model deep-learning elastic-search information-retrieval natural-language-processing

Last synced: 31 Jul 2024

https://github.com/P3GLEG/PwnBack

Burp Extender plugin that generates a sitemap of a website using Wayback Machine

burp burp-extensions burpsuite information-retrieval osint security-tools

Last synced: 02 Aug 2024

https://github.com/mindflowai/mindflow

🧠 AI-powered CLI git wrapper, boilerplate code generator, chat history manager, and code search engine to streamline your dev workflow 🌊

chat-gpt cli code-generation command-line-interface dev-tools git git-wrapper information-retrieval large-language-models llm machine-learning modern-dev-tools nlp openai openai-api python search search-engine

Last synced: 31 Jul 2024

https://github.com/AdeDZY/K-NRM

K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling

deep-learning information-retrieval neural-network

Last synced: 02 Aug 2024