Released under the MIT License, DeepSeek-R1 provides responses comparable to those of other contemporary large language models, such as OpenAI's GPT-4 and o1.[10] Its training cost was reported to be significantly lower than that of other LLMs. The company claims that it trained its V3 model for US$6 million, far less than the US$100 million that OpenAI reportedly spent on GPT-4 in 2023,[11] while using approximately one-tenth the computing power consumed by Meta's comparable model, Llama 3.1.[11][12][13][14] DeepSeek's success against larger and more established rivals has been described as "upending AI".[15][16]
DeepSeek's models are described as "open weight," meaning the trained parameters are publicly released, although their usage conditions differ from those of typical open-source software.[17][18] The company reportedly recruits AI researchers from top Chinese universities[15] and also hires from outside traditional computer science fields to broaden its models' knowledge and capabilities.[12]
DeepSeek significantly reduced training expenses for its R1 model by incorporating techniques such as mixture of experts (MoE) layers.[19] The company also trained its models during ongoing trade restrictions on AI chip exports to China, using less powerful AI chips designed for export and deploying fewer of them overall.[13][20] Observers say this breakthrough sent "shock waves" through the industry, threatening established AI hardware leaders such as Nvidia; Nvidia's share price dropped sharply, losing US$600 billion in market value, the largest single-company decline in U.S. stock market history.[21][22]
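The MoE approach reduces compute because only a small subset of a model's parameters is activated for each input token. The following is a minimal, illustrative sketch of top-k MoE routing written in PyTorch; the layer sizes, expert count, and routing scheme are generic placeholders and do not reflect DeepSeek's actual architecture.

```python
# Illustrative sketch of a mixture-of-experts (MoE) feed-forward layer.
# All dimensions and the top-k routing scheme are assumptions for clarity,
# not DeepSeek's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # A learned router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward sub-network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten (batch, seq, d_model) into a list of tokens.
        tokens = x.reshape(-1, x.size(-1))
        scores = self.router(tokens)                       # (n_tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e
                if mask.any():
                    # Only the selected experts run, so per-token compute
                    # scales with top_k, not with the total expert count.
                    out[mask] += weights[mask, k : k + 1] * expert(tokens[mask])
        return out.reshape_as(x)

# Example: 8 experts but only 2 active per token, so roughly a quarter of
# the expert parameters are exercised on any given forward pass.
layer = MoELayer(d_model=64, d_hidden=256, n_experts=8, top_k=2)
y = layer(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

Because each token passes through only top_k of the n_experts sub-networks, compute per token grows with top_k while total parameter count grows with n_experts, decoupling model capacity from per-token training cost.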