https://github.com/brianseong99/thu-ir-hw4

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/brianseong99/thu-ir-hw4
Owner: BrianSeong99
Created: 2020-11-12T02:09:07.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2020-11-15T10:00:52.000Z (over 4 years ago)
Last Synced: 2025-01-01T04:13:52.107Z (5 months ago)
Language: JavaScript
Size: 32 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# IR_system

成镇宇计76 2017080068

## 文件结构梳理

```bash
flaskapp
├── Data # 里面有原始语料库，
│   ├── CORPUS/ # 原始语料库
│   ├── INDEX/ # 建好的索引数据
│   ├── SEGMENTATION/ # thulac处理过的所有分词数据
│   └── SEGMENTATION.txt
├── README.md
├── Report # 报告
│   ├── pics/
│   ├── report.pdf
│   └── report.md
├── THULAC_lite_c++_v1/ # thulac c++ 库
├── config.py # 一些工程的配置文件
├── indexer.py # 索引建立
├── main.py # 主运行文件
├── retriever.py # 信息检索
├── stopwords.txt # 去除stopwords用的stopwords列表
├── word_segmentation.py # python实现的thulac代码
└── templates # 前端代码
   ├── css/
   ├── home.html
   ├── js/
└── result.html

```

## 命令行指令

**想要直接运行flaskapp，就输入`python main.py -a`即可**

```bash
python main.py --help
usage: main.py [-h] [-s] [-i] [-t] [-l LIMIT] [-a]

Parser for Backend Options

optional arguments:
-h, --help show this help message and exit
-s, --segmenting This argument is to choose whether you want to run
segmentation on corpus or not. If specified, the
segmenting will be executed.
-i, --indexing This argument is to choose whether you want to run the
indexing or not. If specified, then indexing will be
executed.
-t, --training This argument is to choose whether you want to process
all data to train the result or not. If specified,
then training of all data will be executed.
-l LIMIT, --limit LIMIT
This argument is to set the limits of both indexing
and segmenting
-a, --startapp This argument is to choose whether you want to
activate RETREIVER and the App or not. If specified,
then Flask app will start along side with Retriever.
```

## 前端使用

在浏览器上前往`localhost:5000/`

### **每次搜索的时候搜索框里一定得有东西！**

1. **无位置约束：**

1. word word 搜索：直接输入即可

```
中国强大
```

2. word/phrase word/phrase搜索：`词语/词性`

```
中国/n 强大/a
```

2. **有位置约束：**

1. word word 搜索 + 位置约束：直接输入即可，在`restriction`输入2-5之间的数值

```
中国强大
2
```

2. word/phrase word/phrase搜索 + 位置约束：`词语/词性`，在`restriction`输入2-5之间的数值

```
中国/n 强大/a
2
```

3. **前四种的混合：**

即搜索词中，有的词可以有phrase也可以没有，位置约束也是可加可以不加。比如

```
中国/n 强大
2
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/brianseong99/thu-ir-hw4

Awesome Lists containing this project

README