eprintid: 15609
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/56/09
datestamp: 2023-11-10 03:30:14
lastmod: 2023-11-10 03:30:14
status_changed: 2023-11-10 01:59:54
type: article
metadata_visibility: show
creators_name: Usmani, U.A.
creators_name: Watada, J.
creators_name: Jaafar, J.
creators_name: Aziz, I.A.
creators_name: Roy, A.
title: A Reinforcement Learning Based Adaptive ROI Generation for Video Object Segmentation
ispublished: pub
keywords: Deep learning; Feature extraction; Image segmentation; Motion analysis; Motion compensation; Object detection; Tracking (position), Computational modelling; Correlation; Features extraction; Model Adaptation; Motion segmentation; Object Tracking; Objects detection; Objects segmentation; Reinforcement learnings; Video objects segmentations, Reinforcement learning
note: cited By 6
abstract: The primary goal of video object segmentation is to automatically extract the principal foreground object(s) from the background in videos. Current deep learning-based models focus on learning discriminative foreground representations over motion and appearance within short temporal segments. The segmentation process must handle challenges such as deformation, scale variation, motion blur, and occlusion. Furthermore, if the segmentation target is lost in the current frame, relocating it in the next frame is difficult. This work addresses the zero-shot video object segmentation problem in a holistic fashion. We exploit the inherent correlations between video frames by incorporating a global co-attention mechanism to overcome these limitations. We propose a novel reinforcement learning framework that provides efficient and fast stages for gathering scene context and global correlations. The agent concurrently computes and aggregates the co-attention responses in the joint feature space.
To capture the different aspects of the common feature space, the agent can generate multiple co-attention versions. Our framework is trained on pairs (or groups) of video frames, which enriches the training content and thus increases the learning capacity. Our approach encodes the important information during the segmentation phase by simultaneously processing various reference frames, which are subsequently used to predict the persistent and conspicuous foreground objects. The proposed method has been validated on four commonly used video object segmentation datasets: SegTrack V2, DAVIS 2016, CDnet 2014, and the YouTube-Objects dataset. On DAVIS 2016, the results reveal that the proposed method improves on state-of-the-art techniques by 4 on the F1 measure; on SegTrack V2 by a Jaccard index of 12.03; and on YouTube-Objects by a Jaccard index of 13.11. Meanwhile, our algorithm improves accuracy by 8, the F1 measure by 12.25, and precision by 14 on CDnet 2014, thus ranking higher than the current state-of-the-art methods. © 2021 IEEE.
date: 2021
publisher: Institute of Electrical and Electronics Engineers Inc.
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85120911404&doi=10.1109%2fACCESS.2021.3132453&partnerID=40&md5=40be9bc0d80140390bf6fdd315a6b224
id_number: 10.1109/ACCESS.2021.3132453
full_text_status: none
publication: IEEE Access
volume: 9
pagerange: 161959-161977
refereed: TRUE
issn: 21693536
citation: Usmani, U.A. and Watada, J. and Jaafar, J. and Aziz, I.A. and Roy, A. (2021) A Reinforcement Learning Based Adaptive ROI Generation for Video Object Segmentation. IEEE Access, 9. pp. 161959-161977. ISSN 21693536
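The global co-attention between pairs of video frames that the abstract describes can be sketched under the standard affinity-matrix formulation. This is a generic illustration only: the function name `co_attention`, the plain dot-product affinity, and the row/column softmax normalization are assumptions, since the record does not specify the paper's exact operators, gating, or reinforcement learning agent.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(fa, fb):
    """Mutual attention between two frames' feature maps.

    fa: (HW_a, C) flattened features of one frame.
    fb: (HW_b, C) flattened features of a reference frame.
    Returns, for each frame, features attended from the other frame,
    i.e. the cross-frame "co-attention responses".
    """
    # Affinity between every location pair across the two frames.
    affinity = fa @ fb.T                          # (HW_a, HW_b)

    # Each location in frame A attends over frame B, and vice versa.
    attn_a_from_b = softmax(affinity, axis=1) @ fb    # (HW_a, C)
    attn_b_from_a = softmax(affinity.T, axis=1) @ fa  # (HW_b, C)
    return attn_a_from_b, attn_b_from_a
```

In this formulation, concatenating (or summing) each frame's own features with its attended features yields the joint representation from which a segmentation head could predict the common foreground object; generating "multiple co-attention versions" would correspond to repeating this with differently projected feature spaces.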