(Paper Review) RefineDet (2018) :: 누구나 쉽게, 인공지능

2021. 9. 13. 21:35

Single-Shot Refinement Neural Network for Object Detection

Paper link: https://arxiv.org/pdf/1711.06897.pdf

Contribution

Combines the strengths of both 1-stage and 2-stage detectors
Brings ideas that were mainly used in 2-stage detectors into a 1-stage detector
Achieved SOTA at the time of release

Motivation

The goal is to keep the speed of a 1-stage detector while achieving higher accuracy.

Core idea

Two modules + TCB (Transfer Connection Block)

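ARM (Anchor Refinement Module): takes feature maps at multiple scales extracted from the backbone network, filters out negative anchors, and coarsely adjusts (refines) the size and location of the anchors before handing them to the ODM
TCB (Transfer Connection Block): transforms the feature maps output by the ARM and passes them to the ODM; similar to FPN, it supports multi-scale prediction by helping the network use information from feature maps of different sizes
ODM (Object Detection Module): predicts the precise location and class label of each object, based on the refined anchors (with features delivered through the TCB)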
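The two-step flow (ARM refines the anchor, ODM regresses from the refined anchor) can be sketched in a few lines. This is a minimal illustration, assuming a plain (dx, dy, dw, dh) offset parameterization on center-form boxes; the function name `decode` and all numbers are hypothetical and not taken from the paper.

```python
import math

def decode(box, offsets):
    """Apply (dx, dy, dw, dh) offsets to a (cx, cy, w, h) box.

    Assumed parameterization: center shifts are scaled by the box size,
    width and height are scaled exponentially.
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# Hypothetical values, for illustration only.
anchor = (100.0, 100.0, 32.0, 32.0)            # predefined anchor
arm_offsets = (0.25, -0.125, 0.0, 0.0)         # coarse adjustment (ARM)
odm_offsets = (0.0, 0.0, math.log(2.0), 0.0)   # finer adjustment (ODM)

refined_anchor = decode(anchor, arm_offsets)       # step 1: ARM output
final_box = decode(refined_anchor, odm_offsets)    # step 2: ODM starts from the refined anchor
```

The point of the sketch is only that the ODM offsets are applied to the ARM output rather than to the original anchor, which is what makes the regression a two-step cascade.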

ARM(Anchor Refinement Module)

Built by taking feature maps from layers of the backbone network and adding conv operations on top of those layers
Produces one feature map holding the location information of the refined anchor boxes and another holding the foreground/background label of each anchor box
Plays the same role as the Region Proposal Network used in two-stage detectors
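To make the two ARM outputs concrete, the sketch below computes their channel counts and the number of anchors per feature map. The input size (320), strides (8, 16, 32, 64), and 3 anchors per cell are my recollection of the paper's 320x320 setting and should be treated as assumptions; `arm_output_shapes` is a hypothetical helper.

```python
def arm_output_shapes(input_size, strides, anchors_per_cell):
    """Describe the ARM outputs for each feature map scale."""
    shapes = []
    for stride in strides:
        side = input_size // stride
        shapes.append({
            "feature_map": (side, side),
            # 4 box offsets per anchor (refined location).
            "loc_channels": 4 * anchors_per_cell,
            # 2 scores per anchor (foreground / background).
            "conf_channels": 2 * anchors_per_cell,
            "num_anchors": side * side * anchors_per_cell,
        })
    return shapes

# Assumed configuration (see the lead-in).
shapes = arm_output_shapes(320, (8, 16, 32, 64), 3)
total_anchors = sum(s["num_anchors"] for s in shapes)
```

Under these assumptions the feature maps are 40x40, 20x20, 10x10, and 5x5, and every one of the resulting anchors gets its own pair of foreground/background scores and four offsets.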

[Image source: https://herbwood.tistory.com/22]

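Each cell of every feature map is assigned a preset number of anchor boxes
Each anchor box is labeled positive or negative, and its offset from the ground truth is computed
Negative anchors are removed so that the positive:negative ratio is adjusted to 1:3 (the paper does this with hard negative mining, keeping the negatives with the highest loss)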
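The labeling and 1:3 sampling steps above can be sketched as follows. This is a simplified illustration: anchors are labeled by a single IoU threshold (0.5 is an assumed value), and negatives are ranked by a per-anchor loss supplied by the caller. The function names and the toy boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """Label each anchor 1 (positive) or 0 (negative) by best IoU."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thresh else 0)
    return labels

def hard_negative_mining(labels, losses, neg_pos_ratio=3):
    """Keep all positives and the highest-loss negatives (at most ratio x positives)."""
    num_pos = sum(labels)
    negatives = [i for i, label in enumerate(labels) if label == 0]
    negatives.sort(key=lambda i: losses[i], reverse=True)
    kept = set(negatives[: neg_pos_ratio * num_pos])
    return [label == 1 or i in kept for i, label in enumerate(labels)]

# Toy data: one ground-truth box, six anchors, made-up losses.
gt_boxes = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50),
           (60, 60, 70, 70), (80, 80, 90, 90), (100, 100, 110, 110)]
losses = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]

labels = label_anchors(anchors, gt_boxes)
train_mask = hard_negative_mining(labels, losses)
```

With one positive anchor, only the three negatives with the largest loss stay in the training set; the remaining negatives are ignored for that step.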

TCB(Transfer Connection Block)

Converts the features coming from each ARM layer into the form expected by the ODM
Applied only to the feature maps associated with anchors
Upsamples feature maps of different scales and adds them element-wise, playing the same role as a Feature Pyramid Network

[Image source: https://herbwood.tistory.com/22]

[feature map 1] passes through a series of conv layers (conv-relu-conv), which sets its channel count to 256
[feature map 2] is the output obtained by feeding a feature map from a later backbone layer through the ARM and then through the TCB conv layers
Because [feature map 2] comes from a deeper layer, it is smaller than [feature map 1]; a deconvolution is applied to [feature map 2], and the result is added element-wise to [feature map 1]
The merged feature map goes through another conv layer stack (conv-relu-conv), and the result is passed to the ODM
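The size bookkeeping behind the deconvolution and the element-wise sum can be checked with a small sketch. The transposed-convolution size formula is standard; the kernel, stride, and padding values are assumptions, and nearest-neighbor upsampling is used here only as a stand-in for the learned deconvolution. All function names are hypothetical.

```python
def deconv_out_size(size, kernel, stride, padding=0):
    """Output side length of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def upsample_nearest_2x(fmap):
    """Stand-in for the learned deconvolution: repeat every value 2x2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def elementwise_sum(a, b):
    """Add two feature maps of identical spatial size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# A 20x20 map is doubled to 40x40 under either assumed setting.
size_k4 = deconv_out_size(20, kernel=4, stride=2, padding=1)
size_k2 = deconv_out_size(20, kernel=2, stride=2, padding=0)

# Single-channel toy maps: the deeper map is half the size of the shallower one.
feature_map_1 = [[1, 1, 1, 1] for _ in range(4)]
feature_map_2 = [[1, 2], [3, 4]]
merged = elementwise_sum(feature_map_1, upsample_nearest_2x(feature_map_2))
```

The sum is only defined once both maps share the same spatial size, which is why the deeper map has to be upsampled by a factor of two first.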

ODM(Object Detection Module)

Receives the refined anchor information from the ARM and predicts the precise location and class label of each object
Takes as input the ARM feature map that carries the positive/negative information and the transferred features from the TCB, and applies conv operations to each to produce the final prediction
The final outputs are bounding box regressors and class scores
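How the ARM's positive/negative information feeds into the ODM can be sketched as below: anchors that the ARM considers confidently background are dropped, and class scores are computed only for the rest. The threshold 0.99 is the value I recall from the paper and is an assumption here; the function names, probabilities, and logits are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def filter_negative_anchors(arm_bg_probs, theta=0.99):
    """Return indices of anchors whose ARM background probability is not above theta."""
    return [i for i, p in enumerate(arm_bg_probs) if p <= theta]

# Hypothetical ARM background probabilities for four refined anchors.
arm_bg_probs = [0.999, 0.40, 0.995, 0.10]
kept = filter_negative_anchors(arm_bg_probs)

# Hypothetical ODM class logits (index 0 = background) for the kept anchors.
odm_logits = {1: [0.0, 2.0, 0.0], 3: [3.0, 0.0, 0.0]}
class_scores = {i: softmax(odm_logits[i]) for i in kept}
best_class = {i: max(range(len(p)), key=p.__getitem__)
              for i, p in class_scores.items()}
```

Filtering first means the ODM classifier sees far fewer easy background anchors, which is the class imbalance problem that two-stage detectors address with their proposal stage.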

Miscellaneous

RefineDet uses a backbone network built by attaching additional layers to VGG-16

RefineDet's loss function
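For reference, the loss can be written out as follows. This is transcribed from my reading of the paper rather than from this post, so the notation should be checked against the original: $i$ indexes anchors, $l_i^*$ and $g_i^*$ are the ground-truth class label and box of anchor $i$, $p_i$ and $x_i$ are the ARM's objectness score and refined coordinates, $c_i$ and $t_i$ are the ODM's class prediction and box coordinates, and $N_{arm}$, $N_{odm}$ are the numbers of positive anchors in each module.

```latex
\mathcal{L}(\{p_i\},\{x_i\},\{c_i\},\{t_i\}) =
  \frac{1}{N_{arm}} \Big( \sum_i \mathcal{L}_b\big(p_i, [l_i^* \ge 1]\big)
    + \sum_i [l_i^* \ge 1]\, \mathcal{L}_r(x_i, g_i^*) \Big)
  + \frac{1}{N_{odm}} \Big( \sum_i \mathcal{L}_m(c_i, l_i^*)
    + \sum_i [l_i^* \ge 1]\, \mathcal{L}_r(t_i, g_i^*) \Big)
```

Here $\mathcal{L}_b$ is a binary (object vs. not object) classification loss, $\mathcal{L}_m$ is a multi-class softmax loss, $\mathcal{L}_r$ is the smooth L1 regression loss, and the indicator $[l_i^* \ge 1]$ is 1 for positive anchors and 0 otherwise, so the regression terms only count positive anchors.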

License: Attribution, NonCommercial, NoDerivatives
