Deep Learning for Visual Tracking: A Comprehensive Survey

Kalaivani S, Appunraj C, Dinesh Kumar D, Sathya V

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337 | Impact Factor: 1.521 | Volume: 6 | Issue: 3 | Date: 22 April 2024 | Pages: 1894–1899


Abstract

This project introduces an approach that combines computer vision techniques with speech synthesis to detect objects and announce them audibly. The impact of this work extends beyond the technical contribution, finding relevance in diverse real-time applications. The fusion of speech synthesis with object detection adds a layer of accessibility and convenience: the system can change the way individuals with visual impairments interact with their surroundings by offering an auditory understanding of the environment. Furthermore, the model's ability to measure the distance between detected objects and the camera has far-reaching implications, from object recognition in autonomous vehicles to optimized industrial processes and security monitoring. The project focuses on combining the speech and vision modalities to yield accurate object detection and distance-calculation outcomes, setting the stage for intelligent systems that not only visualize the world but also communicate their findings audibly, opening doors to novel applications across various sectors.
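The abstract does not specify how the object-to-camera distance is computed or how detections are verbalized. A common monocular approach, sketched below under assumed values, uses the pinhole-camera similar-triangles relation D = W · f / w (known object width W, focal length f in pixels, observed pixel width w); the function names, the 45 cm object width, and the 600 px focal length are illustrative assumptions, not the authors' stated method.

```python
# Minimal sketch of the distance step and the spoken summary, assuming a
# pinhole camera model. Focal length, object width, and helper names are
# hypothetical; the paper itself does not state them.

def estimate_distance_cm(known_width_cm, focal_length_px, pixel_width):
    """Monocular distance via similar triangles: D = W * f / w."""
    if pixel_width <= 0:
        raise ValueError("pixel width must be positive")
    return known_width_cm * focal_length_px / pixel_width

def describe_detections(detections):
    """Turn (label, distance_cm) pairs into a sentence for a TTS engine."""
    if not detections:
        return "No objects detected."
    parts = [f"{label} at {dist:.0f} centimeters" for label, dist in detections]
    return "Detected " + ", ".join(parts) + "."

# Example: a 45 cm wide monitor seen 300 px wide by a camera with f = 600 px
# sits at 45 * 600 / 300 = 90 cm.
d = estimate_distance_cm(45, 600, 300)
print(describe_detections([("monitor", d)]))  # Detected monitor at 90 centimeters.
```

In a full pipeline, the sentence returned by `describe_detections` would be handed to a text-to-speech engine, and the focal length would come from a one-time camera calibration against an object of known size.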

Keywords

Object detection, Web Camera, Image Preprocessing, DNN Algorithm
