[00:00:00] Hello everyone, and welcome to the Hugging Face Daily AI Papers weekend special. Every Sunday we bring you a roundup of the week's most popular papers on Hugging Face. This episode covers June 2 through June 8, 2025. We've picked five papers that drew a lot of attention, covering self-improvement of large language models (LLMs) through reinforcement learning (RL), the role of high-entropy tokens in reasoning, prolonged reinforcement learning for expanding LLM reasoning, a test-time framework for slow and fast thinking in large models, and a cost-efficient vision-language-action model. Let's dive into this cutting-edge research and explore the latest advances in AI. The show starts now.

[00:00:45] The first paper of this episode is "Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning." It has earned 169 upvotes in the Hugging Face community, a sign of both its research value and the attention it is getting. The paper's core goal is to improve the performance of large language models through a new framework called Reflect, Retry, Reward. The key idea is to have the model reflect after it fails a task, analyze why it failed, and use that reflection to do better on the next attempt. Concretely, after a failure the model generates a short self-reflection explaining what went wrong and suggesting improvements, and then retries the task with that reflection in context. If the second attempt succeeds, the content the model produced during the reflection phase is rewarded through an algorithm called Group Relative Policy Optimization (GRPO), which further sharpens its ability to self-reflect. The experiments use several models, including Qwen 2, Llama 3.1, and Phi 3.5 Mini Instruct, on two main datasets, APIGen and Countdown. The APIGen dataset contains 60,000 high-quality function-calling examples and asks the model to produce the correct tool call; the Countdown dataset contains 450,000 lists of numbers with target values and asks the model to build an equation from those numbers that reaches the target. The results show that the Reflect, Retry, Reward approach is very effective at improving performance. On APIGen in particular, a GRPO-trained Qwen 2 7B model even outperformed an untrained Qwen 2 72B model. Self-reflection also clearly improved performance on Countdown, especially for models that started out weak. The paper further notes that self-reflection not only strengthens a model's ability to solve complex tasks but also lets smaller models beat larger untrained ones, showing advantages in both efficiency and generality. In addition, almost no catastrophic forgetting was observed, suggesting the method also improves model robustness. Overall, the paper proposes an innovative way to let LLMs reflect on and improve themselves through reinforcement learning, achieving better results on complex tasks.
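To make the loop described above concrete, here is a minimal Python sketch of the reflect-retry-reward idea. It is an illustration only: `generate`, `is_correct`, and `grpo_update` are hypothetical stand-ins for the model's sampling call, the task verifier, and the GRPO optimizer step, not the paper's actual code.

```python
# Minimal sketch of the reflect-retry-reward loop (illustration only).

def generate(prompt: str) -> str:
    """Placeholder for sampling a completion from the LLM."""
    return "model output for: " + prompt


def is_correct(task: dict, answer: str) -> bool:
    """Placeholder verifier, e.g. checking a tool call (APIGen) or an equation (Countdown)."""
    return answer == task.get("gold")


def reflect_retry_episode(task: dict) -> list[tuple[str, float]]:
    """Run one reflect-and-retry episode; return (reflection, reward) training samples."""
    first = generate(task["prompt"])
    if is_correct(task, first):
        return []  # first try succeeded: no reflection, nothing to reinforce

    # First attempt failed: ask the model to explain what went wrong.
    reflection = generate(
        task["prompt"]
        + "\nYour previous answer was wrong:\n" + first
        + "\nBriefly explain the mistake and how to fix it."
    )

    # Retry with the self-reflection prepended to the context.
    second = generate(task["prompt"] + "\nReflection: " + reflection)

    # Only the reflection is rewarded, and only when the retry succeeds.
    reward = 1.0 if is_correct(task, second) else 0.0
    return [(reflection, reward)]

# A batch of rewarded reflections would then be passed to a GRPO update step,
# e.g. grpo_update(model, samples), to reinforce useful self-reflection.
```

The point of the sketch is that the reward signal attaches only to the reflection tokens, which is what lets the training strengthen the model's self-reflection rather than the task answers directly.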
[00:02:54] This is the second paper of the episode, titled "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning." It currently has 130 upvotes in the Hugging Face community, showing it has drawn broad interest in the research world. The core research question is: in reinforcement learning with verifiable rewards (RLVR) for large language models, how do different kinds of tokens affect reasoning performance, and can RLVR be made more effective by focusing on a particular kind of token? The team's hypothesis is that a small minority of high-entropy tokens, acting as critical branching points along the reasoning path, drive RLVR more effectively than the low-entropy majority. They further hypothesize that restricting policy-gradient updates to these high-entropy tokens can match or even improve performance while offering computational advantages. To test this, the team designed detailed experiments on the 8B, 14B, and 32B base models of the Qwen3 LLM family. They analyzed token-entropy patterns in chain-of-thought (CoT) reasoning, ran controlled experiments that modulate token entropy, and selectively applied policy-gradient updates during RLVR training. For data, they used datasets such as AIME24 and AIME25 and validated on several evaluation sets. The experiments show that high-entropy tokens play a critical role in reasoning: they connect the individual steps of logical reasoning, and adjusting the decoding temperature at these tokens has a large effect on performance. Specifically, lowering the temperature at high-entropy tokens hurts performance, while raising it helps. RLVR also largely preserves the base model's entropy patterns during training and mainly changes the entropy values of the high-entropy tokens. Most encouragingly, the team found that restricting policy-gradient updates to high-entropy tokens not only did not hurt performance, it significantly improved reasoning on the Qwen3 models. This finding matters for optimizing LLM reasoning, especially on complex reasoning tasks, where focusing on high-entropy tokens balances exploration against training stability and delivers larger gains. Overall, by digging into how token entropy affects reasoning performance, the paper reveals the key role the high-entropy minority of tokens plays in driving LLM reasoning, and it offers new directions and methods for future LLM optimization.
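As a rough illustration of the token-selection idea, the sketch below computes per-token entropy from the policy's logits, keeps only the highest-entropy fraction of positions, and restricts a policy-gradient loss to those positions. The 20% keep ratio and the function names are assumptions for illustration, not the paper's implementation.

```python
import torch

def high_entropy_mask(logits: torch.Tensor, keep_ratio: float = 0.2) -> torch.Tensor:
    """Boolean mask over generated positions, keeping the top `keep_ratio` by entropy.

    logits: (seq_len, vocab_size) logits the policy produced at each generated token.
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)            # (seq_len,)

    k = max(1, int(keep_ratio * entropy.numel()))
    top_idx = torch.topk(entropy, k).indices
    mask = torch.zeros_like(entropy, dtype=torch.bool)
    mask[top_idx] = True
    return mask


def masked_policy_gradient_loss(logp_actions: torch.Tensor,
                                advantages: torch.Tensor,
                                mask: torch.Tensor) -> torch.Tensor:
    """Policy-gradient loss restricted to the masked (high-entropy) token positions."""
    per_token = -(logp_actions * advantages)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```

Masking the loss rather than the rollout is what keeps the low-entropy majority of tokens untouched while the update budget concentrates on the branching points.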
[00:05:04] This is the third paper of the episode, titled "ProRL: Prolonged Reinforcement Learning Expands the Reasoning Boundaries of Large Language Models." It currently has 115 upvotes in the Hugging Face community, showing broad interest from the research community. The core research question is whether prolonged reinforcement learning training can surface new reasoning strategies in large language models, strategies the base model cannot reach even under extensive sampling. The team's hypothesis is that with prolonged RL training, a model can extend its reasoning ability beyond its base model, discover new solution paths, and perform better across a variety of tasks. To test this, the team designed a new training method called ProRL, which combines KL-divergence control, reference-policy resetting, and a diverse set of tasks. They used three models in the experiments: DeepSeek-R1-1.5B as the base model, Nemotron-Research-Reasoning-Qwen-1.5B as the ProRL-trained model, and DeepSeek-R1-7B for comparison. ProRL training ran for more than 2,000 reinforcement-learning steps, with a KL-divergence penalty to maintain entropy and prevent policy drift, and the reference policy was reset periodically to allow continued improvement. The training data spans math, code, STEM, logic puzzles, and instruction following, forming a diverse dataset of 136,000 examples. The results show that the RL-trained model performs significantly better than the base model across tasks: for example, pass@1 improves by 14.7% on math tasks, 13.9% on coding, 54.8% on logic puzzles, 25.1% on STEM reasoning, and 18.1% on instruction following. The study also finds that ProRL keeps improving the model even beyond 2,000 training steps. The paper further introduces a Creativity Index to quantify the novelty of reasoning paths, and the results show that prolonged RL training does produce more novel solutions. This challenges earlier conclusions that RL-trained models do not acquire new reasoning abilities. Overall, the paper offers new insight into the conditions under which reinforcement learning can effectively expand a language model's reasoning boundaries. The results suggest that with stable, prolonged RL training, models can develop new reasoning patterns that go beyond the initial capabilities of their base models.
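Here is a small Python sketch of the two training ingredients just mentioned: a KL penalty toward a reference policy and a periodic reference reset. The per-token KL approximation, the beta value, and the reset interval are illustrative assumptions, not ProRL's exact recipe.

```python
import torch

def kl_regularized_pg_loss(logp_policy: torch.Tensor,
                           logp_ref: torch.Tensor,
                           advantages: torch.Tensor,
                           beta: float = 0.01) -> torch.Tensor:
    """Token-level policy-gradient loss plus a KL penalty toward the reference policy.

    Uses the simple per-token estimate KL ~ log pi(a) - log pi_ref(a); beta is illustrative.
    """
    kl_term = logp_policy - logp_ref
    return (-(logp_policy * advantages) + beta * kl_term).mean()


def maybe_reset_reference(step: int,
                          policy: torch.nn.Module,
                          ref_policy: torch.nn.Module,
                          reset_every: int = 500) -> None:
    """Periodically hard-reset the reference policy to the current policy so the
    KL anchor follows the model and prolonged training can keep improving."""
    if step > 0 and step % reset_every == 0:
        ref_policy.load_state_dict(policy.state_dict())
```

The reset is the piece that makes prolonged training viable: without it, the KL term keeps pulling the policy back toward a stale reference and improvement stalls.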
[00:07:25] For the fourth paper of the episode we turn to a study called "AlphaOne: a test-time framework that drives large models to think slow and fast." It currently has 89 upvotes in the Hugging Face community, reflecting broad attention from both academia and the developer community. The paper's core goal is to address how large reasoning models (LRMs) can dynamically modulate their reasoning process at test time. The researchers propose a framework called AlphaOne (α1) that aims to improve both the reasoning ability and the efficiency of LRMs. Put simply, α1 dynamically schedules the transition between slow thinking and fast thinking at test time, helping the model strike a balance between deep analysis and computational efficiency. Concretely, the team used three open-source LRMs as base models: DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, and QwQ-32B. They ran experiments on six benchmarks spanning math, coding, and science, including AIME 2024, AMC23, and Minerva Math, on NVIDIA L40S and A100 GPUs, ensuring sufficient compute and reliable results. The paper's main innovation is the notion of the "alpha moment": by modulating the reasoning process before and after the alpha moment, α1 can effectively scale LRMs at test time. Through comparative experiments, the researchers also verified significant gains in problem-solving accuracy and reasoning-efficiency metrics. For example, with α1 the 1.5B model's problem-solving accuracy improves by 6.15% while its token length drops by 14%. The results show that α1 not only beats conventional test-time scaling methods such as s1 and Chain of Draft on accuracy, it also performs well on inference efficiency. In particular, the paper finds that a linear schedule from slow thinking to fast thinking yields the highest reasoning accuracy, suggesting that slow thinking first plays a key role in improving reasoning efficiency. Overall, α1 gives large reasoning models a general framework for modulating the reasoning process, showing how dynamic switching between slow and fast thinking can effectively boost a model's reasoning ability. The work offers new ideas for putting LRMs into practice and valuable experience for optimizing model reasoning at test time. That wraps up our introduction to AlphaOne, the test-time framework for driving large models to think slow and fast.
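To give a feel for what test-time modulation of slow and fast thinking might look like, here is a toy Python sketch of a linearly decaying schedule that occasionally prolongs slow thinking before an alpha moment and forces the switch to fast thinking after it. The token markers, the starting probability, and the step-boundary heuristic are all illustrative assumptions rather than the paper's exact mechanism.

```python
import random

def slow_think_probability(t: int, alpha_moment: int, p_start: float = 0.4) -> float:
    """Linearly decay the chance of prolonging slow thinking from p_start to 0
    at the alpha moment; after the alpha moment it is always 0."""
    if t >= alpha_moment:
        return 0.0
    return p_start * (1.0 - t / alpha_moment)


def modulate_token(token: str, t: int, alpha_moment: int) -> str:
    """Toy test-time modulation: before the alpha moment, occasionally turn a
    step boundary into extra slow thinking; after it, cut slow thinking short."""
    if token == "\n\n" and random.random() < slow_think_probability(t, alpha_moment):
        return "\n\nWait,"        # prolong slow thinking at a reasoning-step boundary
    if t >= alpha_moment and token in ("Wait,", "Hmm,"):
        return "</think>"         # force the switch to fast thinking / final answer
    return token
```

A linear decay like this mirrors the paper's finding that front-loading slow thinking and then tapering it off gives the best accuracy-efficiency trade-off.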
[00:09:44] This is the fifth paper of the episode, titled "SmolVLA: a vision-language-action model for affordable and efficient robotics." It currently has 75 upvotes in the Hugging Face community. The paper's core goal is to address the high training costs and practical deployment difficulties that existing large-scale vision-language-action (VLA) models face in robotics. The team asks a key question: can we build a small, efficient, community-driven VLA model that sharply cuts training and inference costs while remaining competitive on robot tasks? The paper's answer is SmolVLA, a compact VLA model designed specifically for single-GPU training and deployment on consumer-grade hardware. By leveraging community-collected data and asynchronous inference, SmolVLA achieves performance comparable to much larger models. Methodologically, SmolVLA consists of a compact pretrained vision-language model (VLM) and an action expert. The VLM processes the language instruction, RGB images, and the robot's sensor state, while the action expert, trained with interleaved cross-attention and self-attention blocks, outputs low-level actions. On the data side, the team used a subset of 481 community datasets from Hugging Face, along with a new MetaWorld dataset and several real-world robot manipulation datasets. During training, SmolVLA is pretrained on the community datasets via imitation learning, and an off-the-shelf VLM such as Qwen2.5-VL-3B-Instruct is used to automatically generate task descriptions that improve the task annotations. At inference time, asynchronous inference decouples action execution from observation processing and action prediction, which raises the control frequency and shortens task completion time. In the evaluations, SmolVLA performs strongly on both simulated and real-world robot benchmarks, outperforming other VLA models especially on pick, place, stack, and sort tasks; asynchronous inference also cuts task completion time by roughly 30%. The paper concludes that by leveraging community-driven datasets, an optimized model architecture, and asynchronous inference, compact and efficient VLA models can achieve competitive performance on robot tasks. SmolVLA demonstrates that building affordable, efficient VLA models is feasible, opening new possibilities for robotics research and enabling more resource-constrained real-world applications.

[00:11:55] That's all for this episode. Thanks for listening. If you enjoyed it, feel free to leave a comment, like, share, and subscribe to the show. And don't forget to follow our Xiaohongshu account, ISOD. See you next episode!