Multimodal sentiment analysis extends traditional text-based sentiment analysis by incorporating additional modalities such as audio and visual data.[1] It can be bimodal, combining any two modalities, or trimodal, combining three.[2] With the large amount of social media data available online in forms such as videos and images, conventional text-based sentiment analysis has evolved into more complex models of multimodal sentiment analysis,[3][4] which can be applied in the development of virtual assistants,[5] analysis of YouTube movie reviews,[6] analysis of news videos,[7] and emotion recognition (sometimes known as emotion detection), such as depression monitoring,[8] among others.
As in traditional sentiment analysis, one of the most basic tasks in multimodal sentiment analysis is sentiment classification, which assigns sentiments to categories such as positive, negative, or neutral.[9] The complexity of analyzing text, audio, and visual features to perform such a task requires the application of different fusion techniques, such as feature-level, decision-level, and hybrid fusion.[3] The performance of these fusion techniques, and of the classification algorithms applied, is influenced by the type of textual, audio, and visual features employed in the analysis.[10]
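The three fusion strategies named above can be illustrated with a minimal sketch. It assumes that feature-level fusion concatenates per-modality feature vectors before a single classifier, that decision-level fusion averages the class probabilities of separate per-modality classifiers, and that hybrid fusion blends the two results. The feature dimensions, the random linear classifiers, and all function names are hypothetical and chosen only for illustration; they are not taken from the cited works.

```python
import numpy as np

LABELS = ("negative", "neutral", "positive")


def softmax(z):
    """Turn raw scores into class probabilities."""
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def feature_level_fusion(text, audio, visual):
    """Early fusion: join the modality features into one vector."""
    return np.concatenate([text, audio, visual], axis=-1)


def decision_level_fusion(probs_per_modality, weights=None):
    """Late fusion: (weighted) average of per-modality class probabilities."""
    stacked = np.stack(probs_per_modality)  # shape: (n_modalities, n_classes)
    return np.average(stacked, axis=0, weights=weights)


def hybrid_fusion(early_probs, late_probs, alpha=0.5):
    """Hybrid fusion: blend the early-fusion and late-fusion predictions."""
    return alpha * early_probs + (1.0 - alpha) * late_probs


# Toy setup (assumption): random features and untrained linear classifiers.
rng = np.random.default_rng(0)
text_feat = rng.normal(size=4)    # e.g. a text embedding
audio_feat = rng.normal(size=3)   # e.g. pitch or energy statistics
visual_feat = rng.normal(size=2)  # e.g. facial-expression descriptors


def linear_classifier(n_features):
    """Return a stand-in classifier with fixed random weights."""
    W = rng.normal(size=(n_features, len(LABELS)))
    return lambda x: softmax(x @ W)


# Feature-level: one classifier over the concatenated vector.
fused = feature_level_fusion(text_feat, audio_feat, visual_feat)
early_probs = linear_classifier(fused.shape[0])(fused)

# Decision-level: one classifier per modality, then combine their outputs.
late_probs = decision_level_fusion([
    linear_classifier(4)(text_feat),
    linear_classifier(3)(audio_feat),
    linear_classifier(2)(visual_feat),
])

# Hybrid: combine both predictions.
hybrid_probs = hybrid_fusion(early_probs, late_probs)
predicted = LABELS[int(np.argmax(hybrid_probs))]
print(predicted, hybrid_probs)
```

In practice the stand-in classifiers would be trained models, and the `weights` and `alpha` parameters would be tuned on held-out data rather than left at their defaults.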
Pereira, Moisés H. R.; Pádua, Flávio L. C.; Pereira, Adriano C. M.; Benevenuto, Fabrício; Dalip, Daniel H. (9 April 2016). "Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos". arXiv:1604.02612 [cs.CL].
Zucco, Chiara; Calabrese, Barbara; Cannataro, Mario (November 2017). "Sentiment analysis and affective computing for depression monitoring". 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE. pp. 1988–1995. doi:10.1109/bibm.2017.8217966. ISBN 978-1-5090-3050-7. S2CID 24408937.
Pang, Bo; Lee, Lillian (2008). Opinion mining and sentiment analysis. Hanover, MA: Now Publishers. ISBN 978-1601981509.