https://github.com/zhiqwang/sightseq

Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection

attention crnn ctc densenet faster-rcnn image-captioning mobilenet object-detection ocr pytorch scene-texts text-recognition transformer

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/zhiqwang/sightseq
Owner: zhiqwang
License: mit
Created: 2018-09-11T01:25:31.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2019-11-14T13:05:13.000Z (over 6 years ago)
Last Synced: 2024-11-27T03:35:04.918Z (over 1 year ago)
Topics: attention, crnn, ctc, densenet, faster-rcnn, image-captioning, mobilenet, object-detection, ocr, pytorch, scene-texts, text-recognition, transformer
Language: Python
Homepage:
Size: 203 KB
Stars: 125
Watchers: 11
Forks: 34
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 🔭sightseq

Let's go **sight**seeing around the deep learning world with multimodal vision and **seq**uence (language) models.

### What's New:

- **July 30, 2019:** Added Faster R-CNN models. I also renamed this repo from *image-captioning* to *sightseq*. This is the last time I rename this repo, I promise.

- **June 11, 2019:** I rewrote the text recognition part based on [fairseq](https://github.com/pytorch/fairseq). For a stable version, refer to the [crnn](https://github.com/zhiqwang/image-captioning/tree/crnn) branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and cooperation on the fairseq text recognition project are very welcome.

### Features:

sightseq provides reference implementations of various deep learning tasks, including:

- **Text Recognition**

  - [Shi et al. (2015), CRNN: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/abs/1507.05717)

- **Object Detection**

  - **_New_** [Ren et al. (2015), Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
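
CRNN is trained with a CTC objective, so its per-frame outputs have to be collapsed into a label sequence at inference time. The following is a minimal sketch of best-path (greedy) CTC decoding in plain Python, not the decoder used in this repository. The function name `ctc_greedy_decode` and the convention that index 0 of `labels` is the blank symbol are assumptions made for illustration (index 0 matches the default `blank` of `torch.nn.CTCLoss`).

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, labels, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: one list of class scores per time step.
    labels: string whose character at position `blank` is a placeholder
            for the CTC blank (hypothetical convention for this sketch).
    """
    # Pick the highest-scoring class index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive duplicates first, then remove the blank symbol.
    collapsed = [index for index, _ in groupby(best_path)]
    return "".join(labels[index] for index in collapsed if index != blank)


# Four frames over the classes (blank, "h", "i"); the argmax path is 1, 1, 0, 2.
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.1, 0.7],
]
print(ctc_greedy_decode(frames, "-hi"))  # prints "hi"
```

Repeats are merged before blanks are removed, which is why a blank frame between two identical characters keeps both of them.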

**Additionally:**

- All features of fairseq

- Flexible configuration of the convolutional and recurrent layers in CRNN

- Positional Encoding of images
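
The README does not say how images are positionally encoded. As an illustration only, one common option is a 2D extension of the Transformer's sinusoidal encoding, where half of the channels encode the row and the other half the column. The sketch below uses plain Python; the names `sinusoidal_1d` and `positional_encoding_2d` are hypothetical and do not come from sightseq or fairseq.

```python
import math


def sinusoidal_1d(position, channels):
    """Transformer-style sinusoidal encoding of a single integer position."""
    encoding = []
    for i in range(0, channels, 2):
        # Each sin/cos pair shares one frequency; frequency falls as i grows.
        angle = position / (10000 ** (i / channels))
        encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding[:channels]


def positional_encoding_2d(height, width, channels):
    """Return a height x width grid of vectors of length `channels`.

    The first half of each vector encodes the row index and the second
    half encodes the column index (an assumed layout for this sketch).
    """
    if channels % 2 != 0:
        raise ValueError("channels must be even")
    half = channels // 2
    return [
        [sinusoidal_1d(row, half) + sinusoidal_1d(col, half) for col in range(width)]
        for row in range(height)
    ]


grid = positional_encoding_2d(height=2, width=3, channels=8)
print(len(grid), len(grid[0]), len(grid[0][0]))  # prints "2 3 8"
```

Such a grid would typically be added to a feature map of the same shape before the sequence model sees it.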

# General Requirements and Installation

- [PyTorch](http://pytorch.org/) (there is a [bug](https://github.com/pytorch/pytorch/pull/21244) in [nn.CTCLoss](https://pytorch.org/docs/master/nn.html#ctcloss) that is fixed in the nightly version)

- Python version >= 3.5

- [Fairseq](https://github.com/pytorch/fairseq) version >= 0.7.1

- [torchvision](https://github.com/pytorch/vision) version >= 0.3.0

- For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)
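
To check an existing environment against the minimum versions listed above, a small script along these lines may help. It is a sketch, not part of sightseq: the helper names are hypothetical, the version parsing is deliberately simple (leading numeric components only), and it relies on `importlib.metadata`, which requires Python 3.8 or later even though the project itself lists Python 3.5.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions copied from the requirements list above.
MINIMUM_VERSIONS = {"fairseq": "0.7.1", "torchvision": "0.3.0"}


def version_tuple(version):
    """Turn '0.7.1' or '0.4.0+cu92' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for package, minimum in minimums.items():
        try:
            report[package] = meets_minimum(metadata.version(package), minimum)
        except metadata.PackageNotFoundError:
            report[package] = None
    return report


if __name__ == "__main__":
    for package, ok in check_requirements().items():
        print(package, "not installed" if ok is None else ("ok" if ok else "too old"))
```

The script does not check PyTorch itself, since the list above gives no minimum version for it.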

# Pre-trained models and examples

- [text recognition](examples/text_recognition)

- [object detection](examples/object_detection)

# License

sightseq is MIT-licensed.

The license applies to the pre-trained models as well.
