{"id":69211,"date":"2024-10-25T14:51:43","date_gmt":"2024-10-25T06:51:43","guid":{"rendered":"https:\/\/inventec2.mjitec.tw\/?page_id=69211"},"modified":"2024-10-25T15:56:41","modified_gmt":"2024-10-25T07:56:41","slug":"benchmarking-smoothness-and-reducing-high-frequency-oscillations-in-continuous-control-policies","status":"publish","type":"page","link":"https:\/\/inventec2.mjitec.tw\/zh-hans\/ai\/benchmarking-smoothness-and-reducing-high-frequency-oscillations-in-continuous-control-policies\/","title":{"rendered":"Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies"},"content":{"rendered":"<div class=\"wpb-content-wrapper\"><p>[vc_row full_width=&#8221;stretch_row&#8221;][vc_column]<div id=\"rs-space-69e10eade3482\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" data-conf=\"{&quot;uqid&quot;:&quot;69e10eade3482&quot;,&quot;space_lg&quot;:&quot;150&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[vc_row_inner el_class=&#8221;md-full-col&#8221;][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;1\/2&#8243;]\n        <div class=\"rs-heading    \">\n        \t<div class=\"title-inner\"  data-border-color=\"\">\n        \t\t\n\t            \n\t            <h2 class=\"title \" style=\"color: #333333\">Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies <\/h2>\n\t        <\/div><\/div>[vc_column_text css=&#8221;.vc_custom_1729838205597{margin-bottom: 20px !important;}&#8221;]<\/p>\n<div>\n<p>IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)<\/p>\n<\/div>\n<p>[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1660542766334{margin-bottom: 5px !important;}&#8221;]<\/p>\n<div>\n<h6>Authors<\/h6>\n<\/div>\n<p>[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1729838226692{margin-bottom: 20px 
!important;}&#8221;]Guilherme Christmann*, Ying-Sheng Luo*, Hanjaya Mandala*, and Wei-Chao Chen[\/vc_column_text][vc_column_text css=&#8221;.vc_custom_1729842484092{margin-bottom: 5px !important;}&#8221;]<\/p>\n<div>\n<h6>Publication Date<\/h6>\n<\/div>\n<p>[\/vc_column_text][vc_column_text css=&#8221;&#8221;]<\/p>\n<div>\n<p>2024\/10\/22<\/p>\n<\/div>\n<p>[\/vc_column_text][\/vc_column_inner][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;1\/2&#8243;][vc_single_image image=&#8221;69212&#8243; img_size=&#8221;full&#8221; css=&#8221;&#8221;][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row][vc_row][vc_column]<div id=\"rs-space-69e10eade35a2\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" data-conf=\"{&quot;uqid&quot;:&quot;69e10eade35a2&quot;,&quot;space_lg&quot;:&quot;150&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[\/vc_column][\/vc_row][vc_row full_width=&#8221;stretch_row&#8221;][vc_column][vc_row_inner content_placement=&#8221;top&#8221; css=&#8221;.vc_custom_1657794580528{margin-bottom: 20px !important;}&#8221;][vc_column_inner el_class=&#8221;m_p paragraph_title&#8221; width=&#8221;1\/3&#8243;]\n        <div class=\"rs-heading   vc_custom_1657008747808  \">\n        \t<div class=\"title-inner\"  data-border-color=\"\">\n        \t\t\n\t            \n\t            <h2 class=\"title \" style=\"color: #333333\">Abstract <\/h2>\n\t        <\/div><\/div>[\/vc_column_inner][vc_column_inner el_class=&#8221;m_p&#8221; width=&#8221;2\/3&#8243;][vc_column_text css=&#8221;&#8221;]Reinforcement learning (RL) policies are prone to high-frequency oscillations, which are especially undesirable when deploying to real-world hardware. In this paper, we identify, categorize, and compare methods from the literature that aim to mitigate high-frequency oscillations in deep RL. 
We define two broad classes: loss regularization methods and architectural methods. At their core, these methods incentivize learning a smooth mapping, such that nearby states in the input space produce nearby actions in the output space.<\/p>\n<p>We present benchmarks in terms of policy performance and control smoothness on traditional RL environments from Gymnasium and a complex manipulation task, as well as three robotic locomotion tasks that include deployment and evaluation on real-world hardware. Finally, we propose hybrid methods that combine elements of both loss regularization and architectural methods. We find that the best-performing hybrid outperforms the other methods and improves control smoothness by 26.8% over the baseline, with a worst-case performance degradation of just 2.8%.[\/vc_column_text][\/vc_column_inner][\/vc_row_inner][vc_row_inner content_placement=&#8221;middle&#8221; el_class=&#8221;md-full-col video_wrap&#8221;][vc_column_inner el_class=&#8221;m_p&#8221;][vc_raw_html css=&#8221;&#8221;]JTNDaWZyYW1lJTIwd2lkdGglM0QlMjIxMjQwJTIyJTIwaGVpZ2h0JTNEJTIyODAwJTIyJTIwc3JjJTNEJTIyaHR0cHMlM0ElMkYlMkZ3d3cueW91dHViZS5jb20lMkZlbWJlZCUyRjBOUkhXN203aUs4JTNGc2klM0RBbEh0bldYdm85T3QtMllwJTIyJTIwdGl0bGUlM0QlMjJZb3VUdWJlJTIwdmlkZW8lMjBwbGF5ZXIlMjIlMjBmcmFtZWJvcmRlciUzRCUyMjAlMjIlMjBhbGxvdyUzRCUyMmFjY2VsZXJvbWV0ZXIlM0IlMjBhdXRvcGxheSUzQiUyMGNsaXBib2FyZC13cml0ZSUzQiUyMGVuY3J5cHRlZC1tZWRpYSUzQiUyMGd5cm9zY29wZSUzQiUyMHBpY3R1cmUtaW4tcGljdHVyZSUzQiUyMHdlYi1zaGFyZSUyMiUyMHJlZmVycmVycG9saWN5JTNEJTIyc3RyaWN0LW9yaWdpbi13aGVuLWNyb3NzLW9yaWdpbiUyMiUyMGFsbG93ZnVsbHNjcmVlbiUzRSUzQyUyRmlmcmFtZSUzRQ==[\/vc_raw_html][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row][vc_row][vc_column]<div id=\"rs-space-69e10eade36aa\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" 
data-conf=\"{&quot;uqid&quot;:&quot;69e10eade36aa&quot;,&quot;space_lg&quot;:&quot;80&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[\/vc_column][\/vc_row][vc_row][vc_column width=&#8221;1\/3&#8243; el_class=&#8221;m_p keyword_title&#8221;][vc_column_text css=&#8221;&#8221;]<\/p>\n<h2 class=\"p1\">Keywords<\/h2>\n<p>[\/vc_column_text][\/vc_column][vc_column width=&#8221;2\/3&#8243; el_class=&#8221;m_p keyword&#8221;][vc_row_inner content_placement=&#8221;middle&#8221;][vc_column_inner width=&#8221;1\/3&#8243;][vc_raw_html css=&#8221;&#8221;]JTNDdWwlMjBjbGFzcyUzRCUyMnN0eWxlbGlzdGluZyUyMiUzRSUwQSUyMCUwOSUzQ2xpJTIwc3R5bGUlM0QlMjJsaW5lLWhlaWdodCUzQTM0cHglM0IlMjIlM0VSb2JvdGljcyUzQyUyRmxpJTNFJTBBJTNDJTJGdWwlM0U=[\/vc_raw_html][\/vc_column_inner][vc_column_inner width=&#8221;1\/3&#8243;][vc_raw_html css=&#8221;&#8221;]JTNDdWwlMjBjbGFzcyUzRCUyMnN0eWxlbGlzdGluZyUyMiUzRSUwQSUyMCUwOSUzQ2xpJTIwc3R5bGUlM0QlMjJsaW5lLWhlaWdodCUzQTM0cHglM0IlMjIlM0VSZWluZm9yY2VtZW50JTIwTGVhcm5pbmclM0MlMkZsaSUzRSUwQSUzQyUyRnVsJTNF[\/vc_raw_html][\/vc_column_inner][vc_column_inner width=&#8221;1\/3&#8243;][vc_raw_html]JTNDdWwlMjBjbGFzcyUzRCUyMnN0eWxlbGlzdGluZyUyMiUzRSUwQSUyMCUwOSUwQSUzQyUyRnVsJTNF[\/vc_raw_html][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row][vc_row][vc_column]<div id=\"rs-space-69e10eade36e8\" class=\"rs-space\">\r\n                <div class=\"rs-space-data\" data-conf=\"{&quot;uqid&quot;:&quot;69e10eade36e8&quot;,&quot;space_lg&quot;:&quot;80&quot;,&quot;space_md&quot;:&quot;80&quot;,&quot;space_sm&quot;:&quot;60&quot;,&quot;space_xs&quot;:&quot;60&quot;}\"><\/div>\t\t\t\r\n\t\t\t<\/div>[\/vc_column][\/vc_row][vc_row full_width=&#8221;stretch_row&#8221; el_class=&#8221;bg&#8221; css=&#8221;.vc_custom_1657248474326{padding-top: 50px !important;padding-bottom: 50px !important;}&#8221;][vc_column][vc_column_text css=&#8221;.vc_custom_1729842998885{margin-bottom: 
20px !important;}&#8221;]<\/p>\n<h3 style=\"text-align: center;\"><span style=\"color: #ffffff;\">Download and Share<\/span><\/h3>\n<p>[\/vc_column_text][vc_row_inner content_placement=&#8221;middle&#8221;][vc_column_inner el_class=&#8221;download_btn_wrap&#8221;][vc_btn title=&#8221;PDF&#8221; style=&#8221;flat&#8221; color=&#8221;white&#8221; align=&#8221;center&#8221; css=&#8221;&#8221; link=&#8221;url:https%3A%2F%2Farxiv.org%2Fhtml%2F2410.16632v1|target:_blank&#8221; el_class=&#8221;download_btn&#8221;][\/vc_column_inner][\/vc_row_inner][\/vc_column][\/vc_row]<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>[vc_row full_width=&#8221;stretch_row&#8221;][vc_column&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"parent":4976,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-69211","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/pages\/69211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/comments?post=69211"}],"version-history":[{"count":5,"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/pages\/69211\/revisions"}],"predecessor-version":[{"id":69227,"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/pages\/69211\/revisions\/69227"}],"up":[{"embeddable":true,"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/pages\/4976"}],"wp:attachment":[{"href":"https:\/\/inventec2.mjitec.tw\/zh-hans\/wp-json\/wp\/v2\/media?parent=69211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.
org\/{rel}","templated":true}]}}