Elsevier

Neurocomputing

Volume 494, 14 July 2022, Pages 455-467

TranSalNet: Towards perceptually relevant visual saliency prediction

https://doi.org/10.1016/j.neucom.2022.04.080
Under a Creative Commons license
open access

Abstract

Convolutional neural networks (CNNs) have significantly advanced computational modelling for saliency prediction. However, accurately simulating the mechanisms of visual attention in the human cortex remains an academic challenge. It is critical to integrate properties of human vision into the design of CNN architectures, leading to perceptually more relevant saliency prediction. Owing to their inherent inductive biases, CNN architectures lack sufficient capacity for encoding long-range context. This hinders CNN-based saliency models from capturing properties that emulate the viewing behaviour of humans. Transformers have shown great potential in encoding long-range information by leveraging the self-attention mechanism. In this paper, we propose a novel saliency model that integrates transformer components into CNNs to capture long-range contextual visual information. Experimental results show that the transformers provide added value to saliency prediction, enhancing the perceptual relevance of its performance. Our proposed saliency model using transformers has achieved superior results on public benchmarks and competitions for saliency prediction models.

The source code of our proposed saliency model TranSalNet is available at: https://github.com/LJOVO/TranSalNet.

Keywords

Saliency prediction
Deep learning
Transformer
Convolutional neural network

Jianxun Lou received the B.Eng. degree from Central South University, Changsha, China, in 2018 and the M.S. degree from Cardiff University, Cardiff, UK, in 2020. He is now pursuing his Ph.D. degree at the School of Computer Science and Informatics, Cardiff University, Cardiff, UK.

Hanhe Lin received his Ph.D. at the Department of Information Science, University of Otago, New Zealand in 2016. From 2016 to 2021, he was a postdoc at the Department of Computer and Information Science at the University of Konstanz, Germany, where he was working on project A05 (visual quality assessment) of SFB-TRR 161, funded by the German Research Foundation (DFG). Currently, he is a research fellow at the National Subsea Centre at Robert Gordon University, UK. His research interests include image processing, computer vision, machine learning, deep learning, and visual quality assessment.

Prof. David Marshall has been working in the field of computer vision since 1986. In 1989 he joined Cardiff University as a lecturer and is now Professor of Computer Vision in the School of Computer Science and Informatics. David’s research interests include articulated modelling of human faces, models of human motion, statistical modelling, high dimensional subspace analysis, audio/video image processing, and data/sensor fusion. He has published over 150 papers and one book in these research areas and has attracted over £4M in research funding over his academic career. He is currently Head of the Visual Computing Research Group and Director of the Human Factors Technology Centre. http://users.cs.cf.ac.uk/Dave.Marshall/

Dietmar Saupe was born in Bremen, Germany, in 1954. He received the Dr.rer.nat. degree in mathematics from the University of Bremen, Germany, in 1982. From 1985 to 1993, he was an Assistant Professor with the Departments of Mathematics, first at the University of California, Santa Cruz, USA, and then at the University of Bremen, resulting in his habilitation. He was a Professor of computer science with the University of Freiburg, Germany, from 1993 to 1998, then with the University of Leipzig, Germany, until 2002, and has since been with the University of Konstanz, Germany. He is the coauthor of the book Chaos and Fractals, which won the Association of American Publishers Award for Best Mathematics Book of the Year in 1992, and of well over 100 research articles. His research interests include image and video processing, computer graphics, scientific visualisation, dynamical systems, and sport informatics.

Hantao Liu received the Ph.D. degree from the Delft University of Technology, Delft, The Netherlands, in 2011. He is currently an Associate Professor with the School of Computer Science and Informatics, Cardiff University, Cardiff, U.K. He is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology and IEEE Signal Processing Letters.