XiangBai——【AAAI2017】TextBoxes：A Fast Text Detector with a Single Deep Neural Network

作者和相关链接

作者

论文下载
廖明辉，石葆光，白翔, 王兴刚，刘文予
代码下载

方法概括

文章核心：

改进版的SSD用来解决文字检测问题

端到端识别的pipeline:

Step 1: 图像输入到修改版SSD网络中 + 非极大值抑制（NMS）→ 输出候选检测结果
Step 2: 候选检测结果 + CRNN进行单词识别 → 新的检测结果 + 识别结果

方法的性能

多尺度版本-定位：ICDAR2011-0.85（f）,ICDAR2013-0.85(f)，0.73s/per image
单尺度版本-定位：ICDAR2011-0.80（f）,ICDAR2013-0.80(f)，0.09s/per image

改进的SSD的地方：

default box的长宽比进行修改（长条形），使其更适合文字检测（单词）
作为classifier的卷积滤波器大小从3*3变成1*5，更适合文字检测
SSD原来为多类检测问题，现在转为单类检测问题
从输入图像为单尺度变为多尺度
利用识别来调整检测的结果（text spotting）

创新点和贡献

创新点

把SSD进行修改，使其适用于文字检测（SSD本身对小目标识别不鲁棒）

贡献

提出一个端到端可训练的非常简洁的文字检测框架（SSD本身是single stage的，不像普通方法需要有多步骤组成）
提出一个完整的端到端识别的文字检测+识别框架
实验方法结果好，速度快

方法细节

相关背景——SSD

SSD的网络结构

SSD的default box

Fig. 1: SSD framework. (a) SSD only needs an input image and ground truth boxes for each object during training. In a convolutional fashion, we evaluate a small set (e.g. 4) of default boxes of different aspect ratios at each location in several feature maps with different scales (e.g. 8 × 8 and 4 × 4 in (b) and (c)). For each default box, we predict both the shape offsets and the confidences for all object categories ((c1; c2; · · · ; cp)). At training time, we first match these default boxes to the ground truth boxes. For example, we have matched two default boxes with the cat and one with the dog, which are treated as positives and the rest as negatives. The model loss is a weighted sum between localization loss (e.g. Smooth L1 [6]) and confidence loss (e.g. Softmax).

TextBoxes与SSD网络结构对比

TextBoxes网络结构

SSD 网络结构

Text-box layers的输出

(与SSD一样）

TextBoxes与SSD不同的修改细节

default box长宽比

（右边图）Figure 2: Illustration of default boxes for a 4*4 grid. For better visualization, only a column of default boxes whose aspect ratios 1 and 5 are plotted. The rest of the aspect ratios are 2,3,7 and 10, which are placed similarly. The black (aspect ratio: 5) and blue (ar: 1) default boxes are centered in their cells. The green (ar: 5) and red (ar: 1) boxes have the same aspect ratios and a vertical offset(half of the height of the cell) to the grid center respectively

XiangBai——【AAAI2017】TextBoxes_A Fast Text Detector with a Single Deep Neural Network

XiangBai——【AAAI2017】TextBoxes：A Fast Text Detector with a Single Deep Neural Network

目录

作者和相关链接

方法概括

文章核心：

端到端识别的pipeline:

方法的性能

改进的SSD的地方：

创新点和贡献

创新点

贡献

方法细节

相关背景——文字识别的任务

相关背景——SSD

相关背景——CRNN

TextBoxes与SSD网络结构对比

Text-box layers的输出

(与SSD一样）

TextBoxes与SSD不同的修改细节

default box长宽比

卷积滤波器大小

损失函数

多尺度输入

TextBoxes+CRNN进行识别

实验结果

定位

text spotting和端到端识别

效果展示

总结与收获点