This Artificial Intelligence Paper Proposes ‘SuperGlue,’ A Graph Neural Network That Simultaneously Performs Context Aggregation, Matching, And Filtering of Local Features for Wide-Baseline Pose Estimation

Imagine you have two images of the same scene taken from different angles. Most of the objects appear in both pictures, just seen from different viewpoints. In computer vision, images are described by local features such as edges and corners, and matching these features across images is critical for many applications. But what does it take to match features between two images?

Finding correspondences between images is a prerequisite for estimating 3D structure and camera poses in computer vision tasks such as simultaneous localization and mapping (SLAM) and structure-from-motion (SfM). This is done by matching local features, which is difficult because of changes in lighting conditions, occlusion, blur, and so on.

Traditionally, such pipelines are split into two stages. First, the front end extracts visual features from the images. Second, the back end performs bundle adjustment and pose estimation on the matched features. In between, the extracted features have to be matched, and this matching is typically modeled as a linear assignment problem.
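To make the linear assignment formulation concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the descriptors are made-up 2D vectors, `best_assignment` is a hypothetical helper, and the search is brute force over all permutations, which only works for a handful of features (real systems use a dedicated solver such as the Hungarian algorithm).

```python
from itertools import permutations
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_assignment(desc_a, desc_b):
    """Brute-force linear assignment for two equally sized descriptor lists.

    Returns (total_cost, matches), where matches[i] is the index in desc_b
    assigned to feature i of desc_a. Illustration only: the cost grows
    factorially with the number of features.
    """
    n = len(desc_a)
    cost = [[squared_distance(a, b) for b in desc_b] for a in desc_a]
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, list(best_perm)


# Toy descriptors (assumed values, not real image features).
desc_a = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
desc_b = [[1.0, 0.1], [0.4, 0.6], [0.1, 0.9]]
total_cost, matches = best_assignment(desc_a, desc_b)
print(matches, round(total_cost, 3))
```

Note that this formulation forces every feature to receive a partner, even when the true partner is occluded or was never detected in the other image.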

As in all other domains, deep neural networks have played a crucial role in recent years in feature matching problems. They have been used to learn better sparse detectors and local descriptors from data using convolutional neural networks (CNNs).

However, they were usually one component of the feature matching pipeline, not an end-to-end solution. What if a single neural network could perform context aggregation, matching, and filtering in a single architecture? Time to introduce SuperGlue.

SuperGlue approaches the matching problem in a different way. It learns the matching process from pre-existing local features using a graph neural network. This replaces existing approaches in which task-agnostic features are learned first and then matched using heuristics and simple methods. Being an end-to-end approach gives SuperGlue a strong advantage over existing methods. SuperGlue is a learnable middle end that can be used to improve existing approaches.

So how does SuperGlue achieve this? It takes a fresh look at the feature matching problem and views it as a partial assignment between two sets of local features. Instead of solving a plain linear assignment problem to match features, it treats matching as an optimal transport problem. SuperGlue uses a graph neural network (GNN) to predict the cost function of this transport optimization.
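The partial assignment idea can be sketched with a few lines of plain Python. The example below is an illustrative approximation, not the authors' implementation: it assumes a hand-written score matrix in place of the GNN output, adds an extra "dustbin" row and column so unmatched features have somewhere to send their mass, and runs basic Sinkhorn iterations. The function names, the dustbin score, and the 0.2 threshold are all assumptions chosen for the example.

```python
import math


def sinkhorn_with_dustbin(scores, dustbin_score=1.0, iters=500):
    """Soft partial assignment via Sinkhorn iterations.

    scores[i][j] is a matching score between feature i of image A and
    feature j of image B (assumed given; in SuperGlue it would come from
    the network). Returns an (m+1) x (n+1) matrix whose last row and
    column are the dustbins.
    """
    m, n = len(scores), len(scores[0])
    augmented = [row + [dustbin_score] for row in scores]
    augmented.append([dustbin_score] * (n + 1))
    kernel = [[math.exp(s) for s in row] for row in augmented]
    # Each real feature carries mass 1; a dustbin can absorb all features
    # of the other image.
    row_mass = [1.0] * m + [float(n)]
    col_mass = [1.0] * n + [float(m)]
    u = [1.0] * (m + 1)
    v = [1.0] * (n + 1)
    for _ in range(iters):
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(n + 1))
             for i in range(m + 1)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(m + 1))
             for j in range(n + 1)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n + 1)]
            for i in range(m + 1)]


def extract_matches(plan, threshold=0.2):
    """Keep mutual best matches among real features above a threshold."""
    m, n = len(plan) - 1, len(plan[0]) - 1
    matches = {}
    for i in range(m):
        j = max(range(n), key=lambda col: plan[i][col])
        best_row = max(range(m), key=lambda row: plan[row][j])
        if best_row == i and plan[i][j] > threshold:
            matches[i] = j
    return matches


# Three features in image A, two in image B: feature 2 of A has no partner.
scores = [[5.0, 0.0],
          [0.0, 5.0],
          [0.0, 0.0]]
plan = sinkhorn_with_dustbin(scores)
matches = extract_matches(plan)
print(matches)
```

In this toy case the third feature of image A sends most of its mass to the dustbin column instead of being forced onto a wrong partner, which is the behavior a plain linear assignment cannot express.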

We all know how transformers have achieved massive success in natural language processing and, more recently, computer vision tasks. SuperGlue uses transformer-style attention to leverage both the spatial relationships of keypoints and their visual appearance.
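As a rough illustration of how attention lets each keypoint gather context, the sketch below implements plain scaled dot-product attention in Python. It is a simplification under stated assumptions: there are no learned projections or multiple heads, the feature vectors are made-up numbers assumed to already combine position and appearance, and `attend` is a hypothetical helper rather than anything from the SuperGlue code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, sources):
    """Scaled dot-product attention without learned weights.

    Each query vector receives an attention-weighted average of the
    source vectors. Passing the same set twice gives self-attention
    (within one image); passing the other image's features gives
    cross-attention (between images).
    """
    dim = len(queries[0])
    messages = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, s)) / math.sqrt(dim)
                  for s in sources]
        weights = softmax(logits)
        messages.append([sum(w * s[k] for w, s in zip(weights, sources))
                         for k in range(dim)])
    return messages


# Toy keypoint features for two images (assumed values).
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

self_messages = attend(feats_a, feats_a)    # context from the same image
cross_messages = attend(feats_a, feats_b)   # context from the other image
print(self_messages)
print(cross_messages)
```

Alternating the two kinds of message lets a keypoint compare itself with its neighbors in the same image and with candidate partners in the other image before any matching decision is made.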

SuperGlue is trained end to end on image pairs. Priors for pose estimation are learned from a large labeled dataset, which lets SuperGlue reason about the underlying 3D scene.

SuperGlue can be applied to many multiview geometry problems that require high-quality feature matching. It runs in real time on a modern GPU and works with both classical and learned features. You can find more information about SuperGlue at the links below.

Check out the paper, project, and code. All credit for this research goes to the researchers on this project. Also, don’t forget to join our Reddit page and Discord channel, where we share the latest news on AI research, cool AI projects, and more.

Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Turkey. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.

