Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with information-retrieval
A curated list of projects in awesome lists tagged with information-retrieval.
https://github.com/jaidedai/easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition
Last synced: 29 Sep 2024
https://github.com/JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition
Last synced: 30 Jul 2024
https://github.com/deepset-ai/haystack
🔍 AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
ai bert chatgpt generative-ai gpt-3 information-retrieval language-model large-language-models llm machine-learning nlp python pytorch question-answering rag retrieval-augmented-generation semantic-search squad summarization transformers
Last synced: 29 Sep 2024
https://github.com/RaRe-Technologies/gensim
Topic Modelling for Humans
data-mining data-science document-similarity fasttext gensim information-retrieval machine-learning natural-language-processing neural-network nlp python topic-modeling word-embeddings word-similarity word2vec
Last synced: 04 Aug 2024
https://github.com/rare-technologies/gensim
Topic Modelling for Humans
data-mining data-science document-similarity fasttext gensim information-retrieval machine-learning natural-language-processing neural-network nlp python topic-modeling word-embeddings word-similarity word2vec
Last synced: 07 Aug 2024
https://github.com/piskvorky/gensim
Topic Modelling for Humans
data-mining data-science document-similarity fasttext gensim information-retrieval machine-learning natural-language-processing neural-network nlp python topic-modeling word-embeddings word-similarity word2vec
Last synced: 29 Sep 2024
https://github.com/danswer-ai/danswer
Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
ai-chat chatgpt enterprise-search gen-ai information-retrieval nextjs python rag
Last synced: 26 Sep 2024
https://github.com/semi-technologies/weaviate
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
approximate-nearest-neighbor-search generative-search grpc hnsw hybrid-search image-search information-retrieval mlops nearest-neighbor-search neural-search recommender-system search-engine semantic-search semantic-search-engine similarity-search vector-database vector-search vector-search-engine vectors weaviate
Last synced: 06 Aug 2024
https://github.com/weaviate/weaviate
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
approximate-nearest-neighbor-search generative-search grpc hnsw hybrid-search image-search information-retrieval mlops nearest-neighbor-search neural-search recommender-system search-engine semantic-search semantic-search-engine similarity-search vector-database vector-search vector-search-engine vectors weaviate
Last synced: 29 Sep 2024
https://github.com/infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
chatbot data-pipelines deep-learning document-parser document-understanding information-retrieval llm llmops machine-learning nlp ocr orchestration pdf-to-text preprocessing rag retrieval-augmented-generation table-structure-recognition
Last synced: 25 Sep 2024
https://github.com/neuml/txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
embeddings information-retrieval language-model large-language-models llm machine-learning neural-search nlp python rag retrieval-augmented-generation search search-engine semantic-search sentence-embeddings transformers txtai vector-database vector-search vector-search-engine
Last synced: 29 Sep 2024
https://neuml.github.io/txtai/
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
embeddings information-retrieval language-model large-language-models llm machine-learning neural-search nlp python rag retrieval-augmented-generation search search-engine semantic-search sentence-embeddings transformers txtai vector-database vector-search vector-search-engine
Last synced: 24 Sep 2024
https://github.com/flagopen/flagembedding
Retrieval and Retrieval-augmented LLMs
embeddings information-retrieval llm retrieval-augmented-generation sentence-embeddings text-semantic-similarity
Last synced: 29 Sep 2024
https://github.com/unstructured-io/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 01 Oct 2024
https://github.com/Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 31 Jul 2024
https://github.com/FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
embeddings information-retrieval llm retrieval-augmented-generation sentence-embeddings text-semantic-similarity
Last synced: 31 Jul 2024
https://github.com/apache/lucene-solr
Apache Lucene and Solr open-source search software
backend information-retrieval java lucene nosql search search-engine solr
Last synced: 30 Sep 2024
https://github.com/marqo-ai/marqo
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
chatgpt clip deep-learning gpt hacktoberfest hnsw information-retrieval knn large-language-models machine-learning machinelearning multi-modal natural-language-processing search-engine semantic-search tensor-search transformers vector-search vision-language visual-search
Last synced: 01 Oct 2024
https://github.com/kittykatt/screenfetch
Fetches system/theme information in terminal for Linux desktop screenshots.
bash desktop information-retrieval shell
Last synced: 30 Sep 2024
https://github.com/KittyKatt/screenFetch
Fetches system/theme information in terminal for Linux desktop screenshots.
bash desktop information-retrieval shell
Last synced: 31 Jul 2024
https://github.com/catalyst-team/catalyst
Accelerated deep learning R&D
computer-vision deep-learning distributed-computing image-classification image-processing image-segmentation information-retrieval infrastructure machine-learning metric-learning natural-language-processing object-detection python pytorch recommender-system reinforcement-learning reproducibility research text-classification text-segmentation
Last synced: 01 Oct 2024
https://github.com/llmware-ai/llmware
Provides an enterprise-grade LLM-based development framework, tools, and fine-tuned models.
ai bert embedding-vectors embeddings faiss generative-ai information-retrieval large-language-models machine-learning milvus nlp parsing python pytorch question-answering rag retrieval-augmented-generation semantic-search transformers
Last synced: 30 Sep 2024
https://llmware-ai.github.io/llmware/
Provides an enterprise-grade LLM-based development framework, tools, and fine-tuned models.
ai bert embedding-vectors embeddings faiss generative-ai information-retrieval large-language-models machine-learning milvus nlp parsing python pytorch question-answering rag retrieval-augmented-generation semantic-search transformers
Last synced: 24 Sep 2024
https://github.com/tensorflow/ranking
Learning to Rank in TensorFlow
deep-learning information-retrieval learning-to-rank machine-learning ranking recommender-systems
Last synced: 29 Sep 2024
https://github.com/naivehobo/invoicenet
Deep neural network to extract intelligent information from invoice documents.
billing classification deep-learning deep-neural-networks deeplearning information-extraction information-retrieval invoice invoice-insight invoice-management invoice-parser invoice-pdf invoice-software invoices keras keras-neural-networks keras-tensorflow
Last synced: 26 Sep 2024
https://github.com/naiveHobo/InvoiceNet
Deep neural network to extract intelligent information from invoice documents.
billing classification deep-learning deep-neural-networks deeplearning information-extraction information-retrieval invoice invoice-insight invoice-management invoice-parser invoice-pdf invoice-software invoices keras keras-neural-networks keras-tensorflow
Last synced: 31 Jul 2024
https://github.com/apache/lucene
Apache Lucene open-source search software
backend information-retrieval java lucene nosql search search-engine
Last synced: 01 Oct 2024
https://langroid.github.io/langroid/
Harness LLMs with Multi-Agent Programming
agents ai chatgpt function-calling gpt gpt-4 gpt4 information-retrieval language-model llama llm llm-agent llm-framework local-llm multi-agent-systems openai-api rag retrieval-augmented-generation
Last synced: 24 Sep 2024
https://github.com/langroid/langroid
Harness LLMs with Multi-Agent Programming
agents ai chatgpt function-calling gpt gpt-4 gpt4 information-retrieval language-model llama llm llm-agent llm-framework local-llm multi-agent-systems openai-api rag retrieval-augmented-generation
Last synced: 27 Sep 2024
https://github.com/rajkumardusad/ip-tracer
Track any IP address with IP-Tracer. IP-Tracer is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracer.
gnuroot-debian hacking-tool hacking-tools information-gathering information-retrieval ip-geolocation ip-location ip-tracer linux linux-tools termux termux-hacking termux-tool
Last synced: 30 Sep 2024
https://github.com/rajkumardusad/IP-Tracer
Track any IP address with IP-Tracer. IP-Tracer is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracer.
gnuroot-debian hacking-tool hacking-tools information-gathering information-retrieval ip-geolocation ip-location ip-tracer linux linux-tools termux termux-hacking termux-tool
Last synced: 01 Aug 2024
https://github.com/xlang-ai/instructor-embedding
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
embeddings information-retrieval language-model prompt-retrieval text-classification text-clustering text-embedding text-evaluation text-reranking text-semantic-similarity
Last synced: 01 Oct 2024
https://github.com/ashvardanian/StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
beautifulsoup common-crawl csv dataset html information-retrieval json laion ndjson parser pattern-recognition simd sorting-algorithms string string-manipulation string-matching string-parsing string-search substring
Last synced: 31 Jul 2024
https://github.com/boudinfl/pke
Python Keyphrase Extraction module
computational-linguistics information-retrieval keyphrase keyphrase-extraction keyword keyword-extraction natural-language-processing python
Last synced: 30 Sep 2024
https://github.com/castorini/pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
Last synced: 01 Aug 2024
https://github.com/embeddings-benchmark/mteb
MTEB: Massive Text Embedding Benchmark
benchmark bitext-mining clustering information-retrieval multilingual-nlp neural-search reranking retrieval sbert semantic-search sentence-transformers sgpt sts text-classification text-embedding
Last synced: 01 Oct 2024
https://github.com/sylphai-inc/adalflow
AdalFlow: The library to build & auto-optimize any LLM tasks.
agent ai bm25 chatbot faiss framework generative-ai information-retrieval llm machine-learning nlp optimizer python question-answering rag reranker retriever summarization trainer
Last synced: 01 Oct 2024
https://github.com/th3unkn0n/TeleGram-Scraper
Telegram group scraper tool. Fetches all information about group members.
information-gathering information-retrieval linux promotion python3 smsbomber telegram telegram-scraper telegram-scraper-bot termux termux-tool
Last synced: 04 Aug 2024
https://github.com/UKPLab/beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
ance benchmark bert colbert dataset deep-learning dpr elasticsearch information-retrieval nlp passage-retrieval pytorch question-generation retrieval retrieval-models sbert sentence-transformers use-qa zero-shot-retrieval
Last synced: 04 Aug 2024
https://github.com/beir-cellar/beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
ance benchmark bert colbert dataset deep-learning dpr elasticsearch information-retrieval nlp passage-retrieval pytorch question-generation retrieval retrieval-models sbert sentence-transformers use-qa zero-shot-retrieval
Last synced: 30 Sep 2024
https://github.com/th3unkn0n/osi.ig
Information gathering for Instagram.
information-gathering information-retrieval instagram instagram-scraper linux osint python python3 scraper termux termux-tool
Last synced: 30 Sep 2024
https://github.com/apache/solr
Apache Solr open-source search software
backend information-retrieval java lucene nosql search search-engine solr
Last synced: 30 Sep 2024
https://github.com/castorini/anserini
Anserini is a Lucene toolkit for reproducible information retrieval research
Last synced: 02 Aug 2024
https://github.com/IntelLabs/fastRAG
Efficient Retrieval Augmentation and Generation Framework
benchmark colbert diffusion generative-ai information-retrieval knowledge-graph llm multi-modal nlp question-answering semantic-search sentence-transformers summarization transformers
Last synced: 31 Jul 2024
https://github.com/brylevkirill/notes
Learn about Machine Learning and Artificial Intelligence
artificial-intelligence bayesian-inference causal-inference deep-learning information-retrieval knowledge-graph knowledge-representation machine-learning natural-language-processing probabilistic-programming question-answering reasoning recommender-systems reinforcement-learning
Last synced: 31 Jul 2024
https://github.com/dorianbrown/rank_bm25
A Collection of BM25 Algorithms in Python
algorithm bm25 information-retrieval ranking
Last synced: 01 Aug 2024
https://github.com/tsinghua-fib-lab/GNN-Recommender-Systems
An index of recommendation algorithms that are based on Graph Neural Networks. (TORS)
gcn gnn graph-convolutional-networks graph-neural-networks graph-representation-learning information-retrieval recommendation recommendation-algorithms recommendation-system recommender-system
Last synced: 02 Aug 2024
https://github.com/edoardottt/scilla
Information Gathering tool - DNS / Subdomains / Ports / Directories enumeration
bugbounty directories-enumeration dns-enumeration enumeration hacking hacking-tool hacktoberfest information-gathering information-retrieval network penetration-testing pentesting port-enumeration portscanner recon reconnaissance security security-tools subdomain-scanner subdomains-enumeration
Last synced: 01 Aug 2024
https://github.com/pisa-engine/pisa
PISA: Performant Indexes and Search for Academia
information-retrieval inverted-index search search-engine
Last synced: 31 Jul 2024
https://github.com/allegro/allRank
allRank is a framework for training learning-to-rank neural models based on PyTorch.
click-model deep-learning information-retrieval learning-to-rank machine-learning ndcg python pytorch ranking transformer
Last synced: 02 Aug 2024
https://github.com/Muennighoff/sgpt
SGPT: GPT Sentence Embeddings for Semantic Search
gpt information-retrieval language-model large-language-models neural-search retrieval semantic-search sentence-embeddings sgpt text-embedding
Last synced: 03 Aug 2024
https://github.com/muennighoff/sgpt
SGPT: GPT Sentence Embeddings for Semantic Search
gpt information-retrieval language-model large-language-models neural-search retrieval semantic-search sentence-embeddings sgpt text-embedding
Last synced: 02 Aug 2024
https://github.com/PaddlePaddle/RocketQA
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.
dense-retrieval information-retrieval nlp question-answering
Last synced: 02 Aug 2024
https://github.com/ashvardanian/simsimd
Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
arm-neon arm-sve assembly avx2 avx512 blas blas-libraries distance-calculation distance-measures float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search
Last synced: 30 Sep 2024
https://github.com/ashvardanian/SimSIMD
Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
arm-neon arm-sve assembly avx2 avx512 blas blas-libraries distance-calculation distance-measures float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search
Last synced: 31 Jul 2024
https://github.com/SylphAI-Inc/AdalFlow
AdalFlow: The Library for LLM Applications
agent ai bm25 chatbot faiss framework generative-ai information-retrieval llm machine-learning nlp python question-answering rag reranker retriever summarization
Last synced: 02 Aug 2024
https://github.com/Yomguithereal/talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage
Last synced: 31 Jul 2024
https://github.com/naver/splade
SPLADE: sparse neural search (SIGIR21, SIGIR22)
bert information-retrieval nlp passage-retrieval sparse splade
Last synced: 02 Aug 2024
https://github.com/cdqa-suite/cdQA
⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
artificial-intelligence bert deep-learning information-retrieval natural-language-processing nlp pytorch question-answering reading-comprehension transformers
Last synced: 02 Aug 2024
https://github.com/rapidsai/raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
anns building-blocks clustering cuda distance gpu information-retrieval linear-algebra llm machine-learning nearest-neighbors neighborhood-methods primitives random-sampling solvers sparse statistics vector-search vector-similarity vector-store
Last synced: 01 Aug 2024
https://github.com/princeton-nlp/DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
information-retrieval knowledge-base nlp open-domain-qa passage-retrieval slot-filling
Last synced: 01 Aug 2024
https://github.com/sebastian-hofstaetter/teaching
Open-Source Information Retrieval Courses @ TU Wien
course deep-learning dpr education information-retrieval neural-ir remote-teaching search-engine teaching
Last synced: 02 Aug 2024
https://github.com/kreeben/resin
Vector space index based search engine that's available as an HTTP service or as an embedded library.
information-retrieval language-model machine-learning nlu nlu-engine resin search search-algorithms search-engine vector-space vector-space-model
Last synced: 31 Jul 2024
https://github.com/airalcorn2/Deep-Semantic-Similarity-Model
My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.
deep-learning information-retrieval keras natural-language-processing nlp
Last synced: 07 Aug 2024
https://github.com/gaoisbest/NLP-Projects
word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding
dialogue-systems information-extraction information-retrieval knowledge-graph machine-reading-comprehension network-embedding pretrained-language-model sentence2vec sequence-labeling text-classification text-generation word2vec
Last synced: 01 Aug 2024
https://github.com/eBay/Sequence-Semantic-Embedding
Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and recall fetching, cross-lingual information retrieval, and question answering.
classification-task information-retrieval nlp-tasks sse-embeddings text-classification
Last synced: 07 Aug 2024
https://github.com/sunnweiwei/rankgpt
Is ChatGPT Good at Search? LLMs as Re-Ranking Agent [EMNLP 2023 Outstanding Paper Award]
chatgpt information-retrieval large-language-models reranking
Last synced: 02 Aug 2024
https://github.com/seanlee97/angle
Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
dense-retrieval embeddings information-retrieval llama llama2 llm mteb rag retrieval-augmented-generation semantic-similarity semantic-textual-similarity sentence-embedding sentence-embeddings sentence-vector sts stsbenchmark text-embedding text-similarity text-vector text2vec
Last synced: 27 Sep 2024
https://github.com/thunlp/OpenMatch
An Open-Source Package for Information Retrieval.
information-retrieval open-domain-question-answering
Last synced: 02 Aug 2024
https://github.com/USCDataScience/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
big-data distributed-systems information-retrieval nutch search search-engine solr spark tika web-crawler
Last synced: 31 Jul 2024
https://github.com/IntelLabs/RAGFoundry
Framework for enhancing LLMs for RAG tasks using fine-tuning.
evaluation fine-tuning information-retrieval llm nlp question-answering rag semantic-search
Last synced: 21 Aug 2024
https://github.com/SciPhi-AI/agent-search
AgentSearch is a framework for powering search agents and enabling customizable local search.
artificial-intelligence information-retrieval llms rag retrieval-augmented-generation search search-engine
Last synced: 31 Jul 2024
https://github.com/AmenRa/ranx
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
comparison data-fusion evaluation evaluation-metrics information-retrieval information-retrieval-evaluation information-retrieval-metrics metasearch numba python rank-fusion ranking-metrics recommender-systems score-fusion
Last synced: 01 Aug 2024
https://github.com/Aquila-Network/aquila
An easy to use Neural Search Engine. Index latent vectors along with JSON metadata and do efficient k-NN search.
approximate-nearest-neighbor-search aquila embedding faiss feature-vectors image-search information-retrieval information-retrieval-engine knn-search nearest-neighbor-search neural-information-retrieval neural-search retrieval search-engine similarity-search similarity-searches vector-database video-search
Last synced: 01 Aug 2024
https://github.com/texttron/tevatron
Tevatron - A flexible toolkit for neural retrieval research and development.
dense-retrieval dpr flax information-retrieval jax pytorch question-answering transformer
Last synced: 02 Aug 2024
https://github.com/RUC-NLPIR/LLM4IR-Survey
This is the repo for the survey of LLM4IR.
information-retrieval large-language-models survey
Last synced: 03 Aug 2024
https://github.com/franccesco/getaltname
Extract subdomains from SSL certificates in HTTPS sites.
certificates discovery dns extract-subdomains https information-retrieval infosec pentest pentest-scripts pentest-tool pentesting ssl ssl-certificate ssl-certificates subdomain tool
Last synced: 01 Aug 2024
https://github.com/edoardottt/csprecon
Discover new target domains using Content Security Policy
bounty-hunting bugbounty bugbounty-tool content-security-policy csp golang hacking hacktoberfest information-retrieval offensive-security offensivesecurity recon recon-tool reconnaissance security security-tools
Last synced: 01 Aug 2024
https://github.com/momegas/megabots
🤖 State-of-the-art, production ready LLM apps made mega-easy, so you don't have to build them from scratch 🤯 Create a bot, now 🫵
chatbot faiss fastapi gpt-35-turbo gpt-4 information-retrieval langchain llama natural-language-processing nlp pinecone prompt-engineering python question-answering s3
Last synced: 01 Oct 2024
https://github.com/approach0/search-engine
A math-aware search engine.
fulltext-search information-retrieval math-search mathematics search-engine
Last synced: 02 Aug 2024
https://github.com/relari-ai/continuous-eval
Open-Source Evaluation for GenAI Application Pipelines
evaluation-framework evaluation-metrics information-retrieval llm-evaluation llmops rag retrieval-augmented-generation
Last synced: 01 Aug 2024
https://github.com/Anonyfox/elixir-scrape
Scrape any website, article or RSS/Atom Feed with ease!
data-science elixir feed html information-retrieval readability rss scrape scraping
Last synced: 01 Aug 2024
https://github.com/UKPLab/gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
bert domain-adaptation information-retrieval nlp transformers vector-search
Last synced: 05 Aug 2024
https://github.com/allenai/ir_datasets
Provides a common interface to many IR ranking datasets.
dataset information-retrieval ir
Last synced: 02 Aug 2024
https://github.com/artitw/text2text
Text2Text: Crosslingual NLP/G toolkit
backtranslation chatgpt cross-lingual embeddings information-retrieval levenshtein-distance llama llm multi-lingual natural-language-generation natural-language-processing nlp question-answering question-generation search summarization tf-idf tokenizer transformers translator
Last synced: 27 Sep 2024
https://github.com/cvangysel/pytrec_eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
evaluation information-retrieval
Last synced: 02 Aug 2024
https://github.com/aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
ai dataprep etl information-retrieval llm ml nlp opensearch search semantic-search
Last synced: 18 Aug 2024
https://github.com/apache/solr-operator
Official Kubernetes operator for Apache Solr
aks aws azure controller crd eks gcp gke golang information-retrieval kubernetes lucene nosql operator search search-engine solr
Last synced: 30 Sep 2024
https://github.com/terrier-org/terrier-core
Terrier IR Platform
information-retrieval java terrier
Last synced: 02 Aug 2024
https://github.com/victordibia/neuralqa
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT
bert-model deep-learning elastic-search information-retrieval natural-language-processing
Last synced: 31 Jul 2024
https://github.com/lgalke/vec4ir
Word Embeddings for Information Retrieval
data-science embedding-models embeddings evaluation information-retrieval natural-language-processing nlp retrieval-model similarity-scoring word-embeddings
Last synced: 02 Aug 2024
https://github.com/P3GLEG/PwnBack
Burp Extender plugin that generates a sitemap of a website using Wayback Machine
burp burp-extensions burpsuite information-retrieval osint security-tools
Last synced: 02 Aug 2024
https://github.com/maxent-ai/ocrpy
OCR, Archive, Index and Search: Implementation agnostic OCR framework.
aws azure computer-vision cv deep-learning google-vision-api image-processing information-retrieval nlp ocr ocr-python python semantic-search tesseract-ocr transformers
Last synced: 02 Aug 2024
https://github.com/mindflowai/mindflow
🧠 AI-powered CLI git wrapper, boilerplate code generator, chat history manager, and code search engine to streamline your dev workflow 🌊
chat-gpt cli code-generation command-line-interface dev-tools git git-wrapper information-retrieval large-language-models llm machine-learning modern-dev-tools nlp openai openai-api python search search-engine
Last synced: 31 Jul 2024
https://github.com/jina-ai/annlite
⚡ A fast embedded library for approximate nearest neighbor search
approximate-nearest-neighbor-search cython hnsw image-search information-retrieval neural-search product-quantization vector-quantization vector-search
Last synced: 01 Aug 2024
https://github.com/huangtinglin/Knowledge_Graph_based_Intent_Network
Learning Intents behind Interactions with Knowledge Graph for Recommendation, WWW2021
explainable-recommendation graph-neural-network information-retrieval knowledge-graph knowledge-graph-for-recommendation pytorch recommendation-system www2021
Last synced: 08 Aug 2024
https://github.com/AdeDZY/K-NRM
K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling
deep-learning information-retrieval neural-network
Last synced: 02 Aug 2024
https://github.com/xei/recommender-system-tutorial
A step-by-step tutorial on developing a practical recommendation system (retrieval and ranking) using TensorFlow Recommenders and Keras.
information-retrieval keras keras-tutorials movielens movielens-movie-recommendation ranking ranking-system recommendation-engine recommendation-system recommender-system tensorflow tensorflow-datasets tensorflow-examples tensorflow-recommenders tensorflow-serving tensorflow-tutorials
Last synced: 26 Sep 2024
https://github.com/AmenRa/retriv
A Python Search Engine for Humans 🥸
bm25 dense-retrieval hybrid-retrieval information-retrieval numba search search-engine search-engine-optimization semantic-search sparse-retrieval tf-idf
Last synced: 17 Aug 2024