Image Captioning Generator Using Deep Learning Models: An Abbreviated Survey
DOI: https://doi.org/10.24237/ASJ.02.02.733B

Keywords: Image Caption, Deep Learning Models, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM)

Abstract
Image captioning is the task of coupling a visual comprehension system with a language model in order to construct sentences that are meaningful and syntactically accurate and that describe, in natural language, the visible content of an image. As a relatively young field of study, it is attracting growing attention. To caption an image, semantic information about the image must be extracted and then expressed in natural language, so the task draws on both computer vision and natural language processing and remains challenging. Many solutions have been proposed for this problem. This paper gives an abbreviated survey of image captioning studies. We concentrate on neural network-based approaches, which deliver the current state-of-the-art results, and classify them into subcategories according to the architecture and framework used. The most recent methodologies are then compared on standard benchmark data sets.
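To make the encoder-decoder pattern that most of the surveyed neural approaches share more concrete, the following is a minimal sketch in PyTorch of a CNN visual encoder feeding an LSTM language decoder. It illustrates the generic architecture only, not any specific method from the survey; the class name, layer sizes, and vocabulary size are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN encoder: a ResNet backbone with its classifier head removed
        # (in practice a pretrained backbone would normally be loaded).
        resnet = models.resnet50(weights=None)
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.img_proj = nn.Linear(resnet.fc.in_features, embed_dim)
        # LSTM decoder: generates the caption one token at a time.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        with torch.no_grad():                       # keep the visual encoder frozen
            feats = self.encoder(images).flatten(1)  # (B, 2048)
        img_emb = self.img_proj(feats).unsqueeze(1)  # (B, 1, E)
        tok_emb = self.embed(captions)               # (B, T, E)
        # The projected image feature is fed as the first "token" of the sequence.
        seq = torch.cat([img_emb, tok_emb], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                      # (B, T+1, vocab_size)

# Example usage with dummy data (batch of 2 images, captions of 12 token ids).
model = CaptionModel(vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 12))
logits = model(images, captions)
print(logits.shape)  # torch.Size([2, 13, 10000])

Attention-based variants discussed in the survey replace the single projected image vector with a grid of spatial features that the decoder attends over at each step, but the overall encoder-decoder structure remains the same.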
License
Copyright (c) 2024 CC BY 4.0
This work is licensed under a Creative Commons Attribution 4.0 International License.