{"id":69973,"date":"2025-01-24T18:07:57","date_gmt":"2025-01-24T10:07:57","guid":{"rendered":"https:\/\/inventec2.mjitec.tw\/?page_id=69973"},"modified":"2025-01-24T18:52:00","modified_gmt":"2025-01-24T10:52:00","slug":"who-brings-the-frisbee-probing-hidden-hallucination-factors-in-large-vision-language-model-via-causality-analysis","status":"publish","type":"page","link":"https:\/\/inventec2.mjitec.tw\/en\/ai\/who-brings-the-frisbee-probing-hidden-hallucination-factors-in-large-vision-language-model-via-causality-analysis\/","title":{"rendered":"Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis"},"content":{"rendered":"<div class=\"wpb-content-wrapper\"><p>[vc_row full_width=&#8221;stretch_row&#8221;][vc_column]<div id=\"rs-space-69e116e93b2d4\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" data-conf=\"{&quot;uqid&quot;:&quot;69e116e93b2d4&quot;,&quot;space_lg&quot;:&quot;150&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[vc_row_inner el_class=&#8221;md-full-col&#8221;][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;1\/2&#8243;]\n        <div class=\"rs-heading    \">\n        \t<div class=\"title-inner\"  data-border-color=\"\">\n        \t\t\n\t            \n\t            <h2 class=\"title \" style=\"color: #333333\">Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis <\/h2>\n\t        <\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1737715755095{margin-bottom: 20px !important;}&#8221;]IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV 2025)[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1689320451803{margin-bottom: 5px !important;}&#8221;]<\/p>\n<div>\n<h6>Authors<\/h6>\n<\/div>\n<p>[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1737715744148{margin-bottom: 20px 
!important;}&#8221;]Po-Hsuan Huang\u2217, Jeng-Lin Li\u2217, Chin-Po Chen, Ming-Ching Chang, Wei-Chao Chen[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1689320463228{margin-bottom: 5px !important;}&#8221;]<\/p>\n<div>\n<h6>Published<\/h6>\n<\/div>\n<p>[\/vc_column_text][vc_column_text css=&#8221;&#8221;]2025\/3\/1[\/vc_column_text][\/vc_column_inner][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;1\/2&#8243;][vc_single_image image=&#8221;69970&#8243; img_size=&#8221;full&#8221; css=&#8221;&#8221;][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row][vc_row][vc_column]<div id=\"rs-space-69e116e93b412\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" data-conf=\"{&quot;uqid&quot;:&quot;69e116e93b412&quot;,&quot;space_lg&quot;:&quot;150&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[\/vc_column][\/vc_row][vc_row full_width=&#8221;stretch_row&#8221;][vc_column][vc_row_inner content_placement=&#8221;top&#8221; css=&#8221;.vc_custom_1657794580528{margin-bottom: 20px !important;}&#8221;][vc_column_inner el_class=&#8221;m_p paragraph_title&#8221; width=&#8221;1\/3&#8243;]\n        <div class=\"rs-heading   vc_custom_1689320478921  \">\n        \t<div class=\"title-inner\"  data-border-color=\"\">\n        \t\t\n\t            \n\t            <h2 class=\"title \" style=\"color: #333333\">Abstract <\/h2>\n\t        <\/div><\/div>[\/vc_column_inner][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;2\/3&#8243;][vc_column_text css=&#8221;&#8221;]<\/p>\n<div class=\"ewa-rteLine\">Recent advancements in large vision-language models (LVLM) have significantly enhanced their ability to comprehend visual inputs alongside natural language. However, a major challenge in their real-world application is hallucination, where LVLMs generate non-existent visual elements, eroding user trust. 
The underlying mechanism driving this multimodal hallucination is poorly understood. Little research has examined whether contexts such as sky, tree, or grass field lead the LVLM to hallucinate a<\/div>\n<div class=\"ewa-rteLine\">frisbee. We hypothesize that hidden factors, such as objects, contexts, and semantic foreground-background structures, induce hallucination. This study proposes a novel causal approach: a hallucination probing system to identify these hidden factors. By analyzing the causality between images, text prompts, and network saliency, we systematically explore interventions to block these factors. Our experimental findings show that a straightforward technique based on our analysis can significantly reduce hallucinations. Additionally, our analyses indicate the potential to edit network<\/div>\n<div class=\"ewa-rteLine\">internals to minimize hallucinated outputs.<\/div>\n<p>[\/vc_column_text][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row][vc_row][vc_column]<div id=\"rs-space-69e116e93b4c2\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" data-conf=\"{&quot;uqid&quot;:&quot;69e116e93b4c2&quot;,&quot;space_lg&quot;:&quot;80&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[\/vc_column][\/vc_row][vc_row][vc_column width=&#8221;1\/3&#8243; el_class=&#8221;m_p keyword_title&#8221;][vc_column_text]<\/p>\n<h2>Keywords<\/h2>\n<p>[\/vc_column_text][\/vc_column][vc_column width=&#8221;2\/3&#8243; el_class=&#8221;m_p keyword&#8221;][vc_row_inner content_placement=&#8221;middle&#8221;][vc_column_inner width=&#8221;1\/3&#8243;][vc_raw_html css=&#8221;&#8221;]JTNDdWwlMjBjbGFzcyUzRCUyMnN0eWxlbGlzdGluZyUyMiUzRSUwQSUyMCUwOSUzQ2xpJTIwc3R5bGUlM0QlMjJsaW5lLWhlaWdodCUzQTM0cHglM0IlMjIlM0VWaXNpb24lMjBMYW5ndWFnZSUyME1vZGVscyUzQyUyRmxpJTNFJTBBJTIwJTA5JTBBJTNDJTJGdWwlM0U=[\/vc_raw_html][\/vc_column_inner][vc_column_inner 
width=&#8221;1\/3&#8243;][vc_raw_html css=&#8221;&#8221;]JTNDdWwlMjBjbGFzcyUzRCUyMnN0eWxlbGlzdGluZyUyMiUzRSUwQSUyMCUwOSUzQ2xpJTIwc3R5bGUlM0QlMjJsaW5lLWhlaWdodCUzQTM0cHglM0IlMjIlM0VNdWx0aW1vZGFsJTNDJTJGbGklM0UlMEElMEElM0MlMkZ1bCUzRQ==[\/vc_raw_html][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row][vc_row][vc_column]<div id=\"rs-space-69e116e93b4f7\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" data-conf=\"{&quot;uqid&quot;:&quot;69e116e93b4f7&quot;,&quot;space_lg&quot;:&quot;80&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[\/vc_column][\/vc_row][vc_row full_width=&#8221;stretch_row&#8221; el_class=&#8221;bg&#8221; css=&#8221;.vc_custom_1657248474326{padding-top: 50px !important;padding-bottom: 50px !important;}&#8221;][vc_column][vc_column_text css=&#8221;.vc_custom_1689320511113{margin-bottom: 20px !important;}&#8221;]<\/p>\n<h3 style=\"text-align: center; color: #fff;\">Download<\/h3>\n<p>[\/vc_column_text][vc_row_inner content_placement=&#8221;middle&#8221;][vc_column_inner el_class=&#8221;download_btn_wrap&#8221;][vc_btn title=&#8221;PDF&#8221; style=&#8221;flat&#8221; color=&#8221;white&#8221; align=&#8221;center&#8221; css=&#8221;.vc_custom_1737715885106{padding-right: 20px !important;padding-left: 20px !important;}&#8221; link=&#8221;url:https%3A%2F%2Farxiv.org%2Fpdf%2F2412.02946|target:_blank&#8221; el_class=&#8221;download_btn&#8221;][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row]<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>[vc_row full_width=&#8221;stretch_row&#8221;][vc_column][vc_row_inner el_class=&#8221;md-full-col&#8221;][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;1\/2&#8243;][vc_column_text css=&#8221;.vc_custom_1737715755095{margin-bottom: 20px !important;}&#8221;]IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV 2025)[\/vc_column_text][vc_column_text 
css=&#8221;.vc_custom_1689320451803{margin-bottom: 5px !important;}&#8221;] Authors [\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1737715744148{margin-bottom: 20px !important;}&#8221;]Po-Hsuan Huang\u2217, Jeng-Lin Li\u2217, Chin-Po Chen, Ming-Ching Chang, Wei-Chao Chen[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1689320463228{margin-bottom: 5px !important;}&#8221;] Published [\/vc_column_text][vc_column_text css=&#8221;&#8221;]2025\/3\/1[\/vc_column_text][\/vc_column_inner][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;1\/2&#8243;][vc_single_image image=&#8221;69970&#8243; img_size=&#8221;full&#8221; css=&#8221;&#8221;][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row][vc_row][vc_column][\/vc_column][\/vc_row][vc_row full_width=&#8221;stretch_row&#8221;][vc_column][vc_row_inner content_placement=&#8221;top&#8221; css=&#8221;.vc_custom_1657794580528{margin-bottom: 20px !important;}&#8221;][vc_column_inner el_class=&#8221;m_p paragraph_title&#8221; width=&#8221;1\/3&#8243;][\/vc_column_inner][vc_column_inner el_class=&#8221;m_p&#8221; 
width=&#8221;2\/3&#8243;][vc_column_text&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"parent":4975,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-69973","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/pages\/69973","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/comments?post=69973"}],"version-history":[{"count":1,"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/pages\/69973\/revisions"}],"predecessor-version":[{"id":69994,"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/pages\/69973\/revisions\/69994"}],"up":[{"embeddable":true,"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/pages\/4975"}],"wp:attachment":[{"href":"https:\/\/inventec2.mjitec.tw\/en\/wp-json\/wp\/v2\/media?parent=69973"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}