https://github.com/NTMC-Community/MatchZoo-py
Facilitating the design, comparison and sharing of deep text matching models.
- Host: GitHub
- URL: https://github.com/NTMC-Community/MatchZoo-py
- Owner: NTMC-Community
- License: apache-2.0
- Created: 2019-06-17T14:15:36.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-05-03T19:50:11.000Z (8 months ago)
- Last Synced: 2024-11-10T14:17:46.580Z (2 months ago)
- Topics: deep-learning, matching, natural-language-processing, neural-network, pytorch, text, text-matching
- Language: Python
- Homepage:
- Size: 598 KB
- Stars: 495
- Watchers: 21
- Forks: 106
- Open Issues: 29
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: CODEOWNERS
Awesome Lists containing this project
- awesome-semantic-search - matchzoo-py
- StarryDivineSky - NTMC-Community/MatchZoo-py
README
# MatchZoo-py [![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=MatchZoo-py:%20deep%20learning%20for%20semantic%20matching&url=https://github.com/NTMC-Community/MatchZoo-py)
> PyTorch version of [MatchZoo](https://github.com/NTMC-Community/MatchZoo).
> Facilitating the design, comparison and sharing of deep text matching models.
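> MatchZoo is a general-purpose text matching toolkit designed to make it easy to quickly implement, compare, and share the latest deep text matching models.

[![Python 3.6](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)](https://www.python.org/downloads/release/python-360/)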
[![Pypi Downloads](https://img.shields.io/pypi/dm/matchzoo-py.svg?label=pypi)](https://pypi.org/project/MatchZoo-py/)
[![Documentation Status](https://readthedocs.org/projects/matchzoo-py/badge/?version=latest)](https://matchzoo-py.readthedocs.io/en/latest/?badge=latest)
[![Build Status](https://travis-ci.org/NTMC-Community/MatchZoo-py.svg?branch=master)](https://travis-ci.org/NTMC-Community/MatchZoo-py)
[![codecov](https://codecov.io/gh/NTMC-Community/MatchZoo-py/branch/master/graph/badge.svg)](https://codecov.io/gh/NTMC-Community/MatchZoo-py)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Requirements Status](https://requires.io/github/NTMC-Community/MatchZoo-py/requirements.svg?branch=master)](https://requires.io/github/NTMC-Community/MatchZoo-py/requirements/?branch=master)
[![Gitter](https://badges.gitter.im/NTMC-Community/community.svg)](https://gitter.im/NTMC-Community/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)

---

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.

| Tasks | Text 1 | Text 2 | Objective |
| --- | --- | --- | --- |
| Paraphrase Identification | string 1 | string 2 | classification |
| Textual Entailment | text | hypothesis | classification |
| Question Answer | question | answer | classification/ranking |
| Conversation | dialog | response | classification/ranking |
| Information Retrieval | query | document | ranking |

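
As a rough illustration of the task table, each task can be viewed as a collection of labeled text pairs, and ranking tasks additionally group the candidates that share the same left-hand text. The sketch below is plain Python; `TextPair`, `group_for_ranking`, and the sample sentences are hypothetical names and data chosen for illustration and are not part of the MatchZoo API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPair:
    """One labeled example: two texts plus a class or relevance label."""
    text_left: str   # e.g. question, query, dialog
    text_right: str  # e.g. answer, document, response
    label: int       # class id (classification) or relevance grade (ranking)


def group_for_ranking(pairs):
    """Group candidates by left text and sort each group by label, best first."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair.text_left].append(pair)
    return {
        left: sorted(candidates, key=lambda p: p.label, reverse=True)
        for left, candidates in groups.items()
    }


pairs = [
    TextPair("how tall is mount everest", "Mount Everest is 8,849 m high.", 1),
    TextPair("how tall is mount everest", "K2 is located in the Karakoram.", 0),
    TextPair("who wrote hamlet", "Hamlet was written by Shakespeare.", 1),
]

# Classification consumes the pairs one by one; ranking consumes them per group.
ranked = group_for_ranking(pairs)
```

In this view, a classification objective scores every pair independently, while a ranking objective compares the candidates inside each group.
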
## Get Started in 60 Seconds
To train a [Deep Structured Semantic Model](https://www.microsoft.com/en-us/research/project/dssm/), use MatchZoo's customized loss functions and evaluation metrics to define a task:
```python
import torch
import matchzoo as mz

# Define a ranking task whose loss uses 4 negative samples per positive
ranking_task = mz.tasks.Ranking(losses=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
```

Prepare input data:
```python
train_pack = mz.datasets.wiki_qa.load_data('train', task=ranking_task)
valid_pack = mz.datasets.wiki_qa.load_data('dev', task=ranking_task)
```

Preprocess the input data in three lines of code, keeping track of the parameters to be passed into the model:
```python
preprocessor = mz.models.ArcI.get_default_preprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
```

Generate pair-wise training data on-the-fly:
```python
# Training set: pair mode, with 4 negatives sampled per positive
trainset = mz.dataloader.Dataset(
    data_pack=train_processed,
    mode='pair',
    num_dup=1,
    num_neg=4,
    batch_size=32
)
# Validation set: point mode
validset = mz.dataloader.Dataset(
    data_pack=valid_processed,
    mode='point',
    batch_size=32
)
```

Define the padding callback and create the data loaders:
```python
padding_callback = mz.models.ArcI.get_default_padding_callback()

trainloader = mz.dataloader.DataLoader(
    dataset=trainset,
    stage='train',
    callback=padding_callback
)
validloader = mz.dataloader.DataLoader(
    dataset=validset,
    stage='dev',
    callback=padding_callback
)
```

Initialize the model and fine-tune the hyper-parameters:
```python
model = mz.models.ArcI()
model.params['task'] = ranking_task
model.params['embedding_output_dim'] = 100
model.params['embedding_input_dim'] = preprocessor.context['embedding_input_dim']
model.guess_and_fill_missing_params()
model.build()
```

`Trainer` is used to control the training flow:
```python
optimizer = torch.optim.Adam(model.parameters())

trainer = mz.trainers.Trainer(
    model=model,
    optimizer=optimizer,
    trainloader=trainloader,
    validloader=validloader,
    epochs=10
)

trainer.run()
```

## References
[Tutorials](https://github.com/NTMC-Community/MatchZoo-py/tree/master/tutorials)

[English Documentation](https://matchzoo-py.readthedocs.io/en/latest/)

If you're interested in cutting-edge research progress, take a look at [awesome neural models for semantic match](https://github.com/NTMC-Community/awaresome-neural-models-for-semantic-match).
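
For readers wondering what the quick-start settings `mode='pair'` and `num_neg=4` imply for training, here is a minimal sketch of a softmax-style ranking loss over one positive and several negative candidates. It assumes the loss is the cross-entropy of a softmax over the group's scores with the positive as the target. The function name `rank_cross_entropy` and the scores are illustrative and are not taken from the MatchZoo source.

```python
import math


def rank_cross_entropy(pos_score, neg_scores):
    """Softmax cross-entropy with the positive candidate as the target class."""
    scores = [pos_score] + list(neg_scores)
    max_score = max(scores)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - max_score) for s in scores]
    prob_pos = exp_scores[0] / sum(exp_scores)
    return -math.log(prob_pos)


# One training group: 1 positive and 4 negatives (num_neg = 4).
# Equal scores give the positive a probability of 1/5.
uniform = rank_cross_entropy(0.0, [0.0, 0.0, 0.0, 0.0])

# Scoring the positive well above the negatives lowers the loss.
confident = rank_cross_entropy(5.0, [0.0, 0.0, 0.0, 0.0])
```

Under this assumption, the loss falls as the model scores the positive candidate higher than the sampled negatives in its group, which is the behavior a pair-wise ranking setup aims for.
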
## Install
MatchZoo-py depends on [PyTorch](https://pytorch.org). There are two ways to install it:

**Install MatchZoo-py from PyPI:**

```bash
pip install matchzoo-py
```

**Install MatchZoo-py from the GitHub source:**

```bash
git clone https://github.com/NTMC-Community/MatchZoo-py.git
cd MatchZoo-py
python setup.py install
```

## Models

- [DRMM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/drmm.py): this model is an implementation of A Deep Relevance Matching Model for Ad-hoc Retrieval.
- [DRMMTKS](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/drmmtks.py): this model is an implementation of A Deep Top-K Relevance Matching Model for Ad-hoc Retrieval.
- [ARC-I](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/arci.py): this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences
- [ARC-II](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/arcii.py): this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences
- [DSSM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/dssm.py): this model is an implementation of Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
- [CDSSM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/cdssm.py): this model is an implementation of Learning Semantic Representations Using Convolutional Neural Networks for Web Search
- [MatchLSTM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/matchlstm.py): this model is an implementation of Machine Comprehension Using Match-LSTM and Answer Pointer
- [DUET](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/duet.py): this model is an implementation of Learning to Match Using Local and Distributed Representations of Text for Web Search
- [KNRM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/knrm.py): this model is an implementation of End-to-End Neural Ad-hoc Ranking with Kernel Pooling
- [ConvKNRM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/conv_knrm.py): this model is an implementation of Convolutional neural networks for soft-matching n-grams in ad-hoc search
- [ESIM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/esim.py): this model is an implementation of Enhanced LSTM for Natural Language Inference
- [BiMPM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/bimpm.py): this model is an implementation of Bilateral Multi-Perspective Matching for Natural Language Sentences
- [MatchPyramid](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/match_pyramid.py): this model is an implementation of Text Matching as Image Recognition
- [Match-SRNN](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/match_srnn.py): this model is an implementation of Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN
- [aNMM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/anmm.py): this model is an implementation of aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model
- [MV-LSTM](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/mvlstm.py): this model is an implementation of A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations
- [DIIN](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/diin.py): this model is an implementation of Natural Language Inference Over Interaction Space
- [HBMP](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/hbmp.py): this model is an implementation of Sentence Embeddings in NLI with Iterative Refinement Encoders
- [BERT](https://github.com/NTMC-Community/MatchZoo-py/tree/master/matchzoo/models/bert.py): this model is an implementation of BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

## Citation

If you use MatchZoo in your research, please cite it with the following BibTeX entry.

```
@inproceedings{Guo:2019:MLP:3331184.3331403,
author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
series = {SIGIR'19},
year = {2019},
isbn = {978-1-4503-6172-9},
location = {Paris, France},
pages = {1297--1300},
numpages = {4},
url = {http://doi.acm.org/10.1145/3331184.3331403},
doi = {10.1145/3331184.3331403},
acmid = {3331403},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {matchzoo, neural network, text matching},
}
```

## Development Team

| Name | Role | Affiliation |
| --- | --- | --- |
| Yixing Fan | Core Dev | ASST PROF, ICT |
| Jiangui Chen | Core Dev | PhD. ICT |
| Yinqiong Cai | Core Dev | M.S. ICT |
| Liang Pang | Core Dev | ASST PROF, ICT |
| Lixin Su | Dev | PhD. ICT |
| Ruibin Xiong | Dev | M.S. ICT |
| Yuyang Ding | Dev | M.S. ICT |
| Junfeng Tian | Dev | M.S. ECNU |
| Qinghua Wang | Documentation | B.S. Shandong Univ. |

## Contribution
Please make sure to read the [Contributing Guide](./CONTRIBUTING.md) before creating a pull request. If you have a MatchZoo-related paper, project, component, or tool, send a pull request to [this awesome list](https://github.com/NTMC-Community/awaresome-neural-models-for-semantic-match)!

Thank you to all the people who already contributed to MatchZoo!

[Bo Wang](https://github.com/bwanglzu), [Zeyi Wang](https://github.com/uduse), [Liu Yang](https://github.com/yangliuy), [Zizhen Wang](https://github.com/ZizhenWang), [Zhou Yang](https://github.com/zhouzhouyang520), [Jianpeng Hou](https://github.com/HouJP), [Lijuan Chen](https://github.com/githubclj), [Yukun Zheng](https://github.com/zhengyk11), [Niuguo Cheng](https://github.com/niuox), [Dai Zhuyun](https://github.com/AdeDZY), [Aneesh Joshi](https://github.com/aneesh-joshi), [Zeno Gantner](https://github.com/zenogantner), [Kai Huang](https://github.com/hkvision), [stanpcf](https://github.com/stanpcf), [ChangQF](https://github.com/ChangQF), [Mike Kellogg](https://github.com/wordreference)

## Project Organizers

- Jiafeng Guo
  * Institute of Computing Technology, Chinese Academy of Sciences
  * [Homepage](http://www.bigdatalab.ac.cn/~gjf/)
- Yanyan Lan
  * Institute of Computing Technology, Chinese Academy of Sciences
  * [Homepage](http://www.bigdatalab.ac.cn/~lanyanyan/)
- Xueqi Cheng
  * Institute of Computing Technology, Chinese Academy of Sciences
  * [Homepage](http://www.bigdatalab.ac.cn/~cxq/)

## License

[Apache-2.0](https://opensource.org/licenses/Apache-2.0)
Copyright (c) 2019-present, Yixing Fan (faneshion)