Which Server Is Best for a Visual Question Answering System? A Visual Question Answering System Based on Deep Learning


CLC number: TP391.41; TP18         Document code: A         Article ID: 2096-4706(2019)11-0011-04

Visual Question Answering System Based on Deep Learning

GE Mengying, SUN Baoshan

(School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin 300387, China)

Abstract: With the development of the Internet, the amount of information available to people has grown exponentially, and the amount of knowledge that can be extracted from this data has grown with it. Artificial intelligence, once put on hold, is showing renewed vitality. As artificial intelligence has continued to develop, visual question answering (VQA) has in recent years emerged as a hot topic in the field. A VQA system takes an image and a question as input and combines the two sources of information to produce a natural-language answer as output. The key problem in VQA is how to fuse the visual and linguistic features extracted from the input image and question. This paper focuses on visual question answering, summarizes recent research progress in terms of concepts and models, and discusses existing deficiencies. Finally, future research directions for VQA are outlined.

Keywords: deep learning; artificial intelligence; visual question answering; natural language processing
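
The fusion step named in the abstract can be illustrated with a toy sketch. This is not the method of the paper: it assumes the image and the question have already been encoded as fixed-length feature vectors (in practice by a CNN and a recurrent or similar text encoder), and it uses an element-wise product as one simple fusion choice, followed by a linear scorer over candidate answers. All function names, weights, and feature values below are hypothetical and hand-picked for illustration.

```python
import math


def fuse(image_feat, question_feat):
    """Fuse two equal-length feature vectors by element-wise product."""
    if len(image_feat) != len(question_feat):
        raise ValueError("feature vectors must have the same length")
    return [v * q for v, q in zip(image_feat, question_feat)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_feat, question_feat, weights, answers):
    """Score each candidate answer from the fused vector; return the best one."""
    fused = fuse(image_feat, question_feat)
    # One weight row per candidate answer (a linear classifier, no bias).
    scores = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs


# Hypothetical toy inputs: 3-dimensional features and two candidate answers.
image_feat = [0.9, 0.1, 0.4]
question_feat = [1.0, 0.0, 0.5]
weights = [[2.0, 0.0, 0.0],   # row scoring the answer "yes"
           [0.0, 0.0, 2.0]]   # row scoring the answer "no"
answers = ["yes", "no"]

answer, probs = predict_answer(image_feat, question_feat, weights, answers)
print(answer, probs)
```

In a trained system the weights would be learned from question-answer pairs, and the element-wise product could be replaced by concatenation, bilinear pooling, or an attention mechanism; the sketch only shows where the two modalities meet.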

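About the authors:

GE Mengying (born December 1996), female, Han ethnicity, from Suzhou, Anhui; master's student. Research interests: natural language processing, deep learning, etc.

SUN Baoshan (born October 1978), male, Han ethnicity, from Tianjin; associate professor, master's supervisor, Doctor of Engineering. Research interests: machine learning, natural language processing, etc.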