Chinese AI startup DeepSeek introduces new method to make large models more efficient, reducing costs and boosting scalability.
DeepSeek, a Chinese AI startup, has unveiled a new training method called Manifold-Constrained Hyper-Connections, designed to make large AI models more efficient and scalable while reducing computational and energy costs.
The technique, detailed in a paper co-authored by founder Liang Wenfeng and published on arXiv, addresses the training instability and memory overhead of prior approaches, enabling stable training of systems ranging from 3 billion to 27 billion parameters with minimal added compute.
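The paper itself specifies the exact formulation; as a rough illustration of the hyper-connections idea it builds on, the sketch below keeps several parallel residual streams and mixes them with learnable weights instead of using a single residual connection. The class name `HyperConnection`, the stream count, the stand-in feed-forward block, and the softmax "constraint" are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn

class HyperConnection(nn.Module):
    """Illustrative sketch of a hyper-connection-style layer.

    Instead of one residual stream, the layer keeps n parallel
    hidden-state streams and mixes them with learnable weights.
    The softmax here is a simple stand-in for a manifold constraint
    (an assumption; the paper's exact constraint may differ).
    """

    def __init__(self, dim: int, n_streams: int = 4):
        super().__init__()
        # Learnable weights for reading streams into the block input.
        self.alpha = nn.Parameter(torch.ones(n_streams) / n_streams)
        # Learnable weights for writing the block output back to streams.
        self.beta = nn.Parameter(torch.ones(n_streams) / n_streams)
        # Stand-in for a transformer sub-block (attention or FFN).
        self.block = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq_len, dim)
        # Keep mixing weights on the probability simplex via softmax.
        a = torch.softmax(self.alpha, dim=0)
        b = torch.softmax(self.beta, dim=0)
        x = torch.einsum("s,sbld->bld", a, streams)  # read: mix streams
        y = self.block(x)                            # transform
        # write: add the output back into each stream, scaled per stream
        return streams + b.view(-1, 1, 1, 1) * y.unsqueeze(0)

layer = HyperConnection(dim=64, n_streams=4)
h = torch.randn(4, 2, 16, 64)  # (streams, batch, seq_len, dim)
print(layer(h).shape)          # torch.Size([4, 2, 16, 64])
```

The intuition is that widening the residual pathway into multiple streams gives the network more routes around each block, which is one way such designs aim to improve training stability without materially increasing compute.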
Building on ByteDance’s earlier work, the approach reflects China’s push for AI innovation despite U.S. semiconductor restrictions.
The release fuels anticipation for DeepSeek’s next major model, possibly R2, expected around the Spring Festival in February.