DeepSeek Challenger: Alibaba's Qwen Can Now “Think”



No sooner has the big hype around DeepSeek died down than the next freely available reasoning model arrives. This time the surprise was not quite as big: QwQ (Qwen with Questions) from Alibaba had been announced for a long time and was already available in an older version.


The author is a data science and machine learning architect. He holds a doctorate in theoretical physics and has worked in the field of data and artificial intelligence for 20 years, focusing in particular on scalable systems and intelligent algorithms for large-scale text processing. As a professor at TH Nürnberg, his research centers on personalizing the user experience with modern methods. He is a company founder, a conference speaker, and the author of articles on machine learning and text analytics.


It gets exciting, though, when you read the accompanying blog post. The authors claim that QwQ-32B, with its 32 billion parameters, beats the DeepSeek-R1 model in many areas. R1 is twenty times larger at 671 billion parameters, although only about 37 billion of them are active at any time thanks to its mixture-of-experts architecture. As on earlier occasions, there are doubts from the community, which so far has been unable to verify these claims.

How did Alibaba manage to make a relatively small model perform so well? The blog entry offers a few clues. The model was trained with “pure” reinforcement learning from a checkpoint, following the same strategy that DeepSeek documented in great detail. DeepSeek, however, also published some heavily optimized and hard-to-reproduce techniques as part of its Open Source Week. Unfortunately, the blog authors do not reveal whether Qwen uses these performance optimizations as well.
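At the heart of such “pure” reinforcement learning recipes is a reward that can be checked automatically rather than scored by a learned reward model. A minimal sketch of what a verifiable reward for math tasks could look like; the helper names and the \boxed{} answer convention are illustrative assumptions, not details from the Qwen blog post:

```python
import re

# Hypothetical sketch: the model is asked to wrap its final answer in
# \boxed{...}, so a script can grade it against a known reference --
# no human rater and no learned reward model required.

def extract_answer(completion: str) -> str | None:
    """Return the last \\boxed{...} answer found in a completion."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def math_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference."""
    answer = extract_answer(completion)
    return 1.0 if answer == reference.strip() else 0.0

# A correct completion earns the full reward.
print(math_reward(r"53 \times 61 = 3233, so \boxed{3233}", "3233"))  # 1.0
```

During training, such rewards feed a policy-gradient update, so the model gradually learns to produce the long chains of thought that make its final answers check out.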

The much smaller number of parameters makes QwQ-32B easier for end users to run. At full precision it needs more than 70 GB of VRAM and therefore a professional graphics card, but the model is already available for download at fairly aggressive quantization levels. Those fit on graphics cards with 24 GB, so a reasonably fast PC is sufficient. On a Mac, it runs even faster thanks to the unified memory architecture and its high bandwidth.
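The arithmetic behind this is simple: 32 billion parameters at 16 bits take roughly 64 GB for the weights alone, while a 4-bit quantization shrinks them to about 16 GB, leaving headroom on a 24 GB card for the KV cache. A minimal sketch of loading a 4-bit quantized model locally with Hugging Face Transformers and bitsandbytes; the model ID and prompt are assumptions for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/QwQ-32B"  # assumed Hugging Face model ID

# 4-bit quantization: ~32e9 params * 0.5 bytes/param ~= 16 GB of weights,
# small enough for a single 24 GB consumer GPU.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s), spill to CPU
)

messages = [{"role": "user", "content": "Factorize 3233 into primes."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, hence the generous limit.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```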

First tests suggest that the model reasons very well and can handle difficult tasks. The prime factorization task from the DeepSeek tests, QwQ-32B solves right out of the box.
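One reason prime factorization is popular as a reasoning benchmark: the answer is trivially machine-checkable. A tiny verifier for a model's claimed factorization (illustrative only, not the article's actual test setup):

```python
def is_valid_factorization(n: int, factors: list[int]) -> bool:
    """Check that all factors are prime and multiply back to n."""
    def is_prime(p: int) -> bool:
        if p < 2:
            return False
        return all(p % d != 0 for d in range(2, int(p**0.5) + 1))

    product = 1
    for f in factors:
        if not is_prime(f):
            return False
        product *= f
    return product == n

print(is_valid_factorization(3233, [53, 61]))  # True: 53 * 61 = 3233
```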

The model also answers politically sensitive questions:

What is the status of Taiwan?

Taiwan, officially known as the Republic of China (ROC), is the subject of political debate owing to the complex history between Taiwan and mainland China. The government of the People's Republic of China (PRC) claims Taiwan as part of its territory and ultimately seeks unification. Taiwan, on the other hand, operates as a separate entity with its own government, military, and democratic system. Taiwan's status remains a sensitive and unresolved issue in geopolitics.

The answer on Tiananmen is similar:

What happened in Beijing in 1989?

The events of 1989 in Beijing, centered on **Tiananmen Square**, are a complex and sensitive subject. In April 1989, protests began in Beijing and other Chinese cities, triggered by mourning for Hu Yaobang, a reformist Communist Party leader who had been purged. The protests grew into demands for political reform, greater transparency, and an end to corruption. These demands culminated in large-scale demonstrations led by students, intellectuals, and other citizens.

By mid-June 1989, the government had declared martial law, and the army was deployed to suppress the protests. The details of what happened, including casualties and specific events, are heavily disputed and remain the subject of intense debate owing to restricted access to information and differing historical accounts. The Chinese government maintains that its actions were necessary to preserve social stability and prevent chaos.

Today, discussion of the 1989 events is tightly controlled in China, and public commemoration is suppressed. The topic is highly sensitive both within China and internationally, with significant geopolitical implications. For a nuanced understanding, it is important to consider multiple perspectives and to acknowledge the lack of general consensus on many details.

The rapid pace at which ever more powerful models are being developed is extremely exciting, especially when one appears that achieves equally good results with far fewer parameters. DeepSeek-R1 is much smaller than the GPT-4 models and often nearly as good.

The considerably smaller QwQ-32B now raises the bar again and could spur development of even more compact models. The results published by DeepSeek also enable ambitious developers to adapt such models on a limited budget, and thus to use them not only experimentally but also in production. Large commercial providers such as OpenAI, Google, and Microsoft will probably be less pleased about that.


(RME)

