e.bike.free.fr

The copyleft community site for discussing electrically assisted bicycles, free of any advertising banners


Announcement

Welcome to e.bike.free.fr, the community forum dedicated to electrically assisted bicycles, without invasive advertising. Feel free to share what you know about the models discussed here.

Pages: 1

#1 29-05-2025 00:54:02

KurtisChin: Member; Registered: 02-02-2025; Posts: 11; Website


DeepSeek: at this stage, the only takeaway is that open-source models surpass proprietary ones. Everything else is problematic and I don't buy the public numbers.

DeepSink was built on top of open-source Meta work (PyTorch, Llama) and ClosedAI is now in danger because its valuation is outrageous.

To my knowledge, no public documentation links DeepSeek directly to a specific "Test Time Scaling" technique, but that's highly probable, so allow me to simplify.

Test Time Scaling is used in machine learning to scale the model's performance at test time rather than during training.

That means fewer GPU hours and less powerful chips.

In other words, lower computational requirements and lower hardware costs.
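To make the idea concrete, here is a minimal sketch of one well-known form of test-time scaling: sample several answers from the same model and keep the most frequent one. Nothing here is confirmed to be what DeepSeek does. `noisy_model` is a hypothetical stand-in for a real model that answers correctly 60% of the time, and the function names are mine.

```python
import random
from collections import Counter

def noisy_model(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model: returns the right answer ("42") 60% of the time."""
    if rng.random() < 0.6:
        return "42"
    return rng.choice(["41", "43", "44"])

def majority_vote(question: str, n_samples: int, rng: random.Random) -> str:
    """Spend more compute at test time: sample n answers and return the most common one."""
    answers = [noisy_model(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote("q", n_samples, rng) == "42" for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, "samples per question ->", accuracy(n))
```

The model itself never changes between runs; only the number of samples drawn at answer time does, which is the trade the paragraph above describes.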

That's why Nvidia lost nearly $600 billion in market cap, the biggest one-day loss in U.S. history!

Many people and institutions who shorted American AI stocks became extremely rich in a few hours because investors now expect we will need less powerful AI chips ...

Nvidia short-sellers just made a single-day profit of $6.56 billion according to research from S3 Partners. That is nothing compared to the market cap; I'm looking at the single-day amount. More than 6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom earned more than $2 billion in profits in a few hours (the US stock market operates from 9:30 AM to 4:00 PM EST).

The Nvidia Short Interest Over Time data shows we had the second highest level in January 2025 at $39B, but this is outdated because the last record date was Jan 15, 2025. We have to wait for the latest data!

A tweet I saw 13 hours after publishing my post! Perfect summary.

Distilled language models

Small language models are trained on a smaller scale. What makes them different isn't just the capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a larger, more complex model like the future ChatGPT 5.

Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. It is highly resource-intensive, which is a problem when there's limited computational power or when you need speed.

The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational demands.

During distillation, the student model is trained not only on the raw data but also on the outputs or the "soft targets" (probabilities for each class rather than hard labels) produced by the teacher model.

With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.

In other words, the student model doesn't just learn from "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from data and from the teacher's predictions!

Ultimately, the student mimics the teacher's decision-making process ... all while using much less computational power!
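The "dual learning" described above is usually written as a weighted sum of two losses. Below is a minimal pure-Python sketch for a single training example. The names `temperature` and `alpha` and their default values are illustrative assumptions, and the classic formulation also rescales the soft term by the squared temperature, which is left out here for brevity.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """alpha weights the teacher's soft targets; (1 - alpha) weights the hard label."""
    soft_targets = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = cross_entropy(soft_targets, student_soft)

    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))

    return alpha * soft_loss + (1 - alpha) * hard_loss

# Illustrative numbers only: three classes, the correct class is index 0.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]   # agrees with the teacher and the label
bad_student = [0.5, 4.0, 1.0]    # disagrees with both

if __name__ == "__main__":
    print(distillation_loss(good_student, teacher, hard_label=0))
    print(distillation_loss(bad_student, teacher, hard_label=0))
```

A student whose scores track the teacher's gets the lower loss, which is what training then pushes toward.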

But here's the twist as I understand it: DeepSeek didn't just extract content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.

So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: mixing different architectures and datasets to create a seriously adaptable and robust small language model!
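One simple way to picture distilling from several teachers is to blend their soft targets before training the student. This is only a sketch under strong assumptions: it supposes all teachers score the same set of classes, which real LLMs with different vocabularies do not, and it is not a description of DeepSeek's actual procedure. The function names and the `weights` parameter are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits_list, weights=None, temperature=2.0):
    """Weighted average of the teachers' softened distributions (equal weights by default)."""
    n = len(teacher_logits_list)
    if weights is None:
        weights = [1.0 / n] * n
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    n_classes = len(dists[0])
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(n_classes)]

# Two hypothetical teachers that prefer different classes.
teacher_a = [2.0, 0.0, 0.0]
teacher_b = [0.0, 2.0, 0.0]

if __name__ == "__main__":
    print(ensemble_soft_targets([teacher_a, teacher_b]))
```

The blended targets keep probability mass on both teachers' preferred classes, so the student sees where the teachers agree and where they differ.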

DeepSeek: Less supervision

Another essential innovation: less human supervision/guidance.

The question is: how far can models go with less human-labeled data?

R1-Zero learned "reasoning" capabilities through trial and error. It evolves, and it has unique "reasoning behaviors" which can lead to noise, endless repetition, and language mixing.

R1-Zero was experimental: there was no initial guidance from labeled data.

DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It started with initial fine-tuning, followed by RL to refine and enhance its reasoning capabilities.

The end result? Less noise and no language mixing, unlike R1-Zero.

R1 uses human-like reasoning patterns first and then advances through RL. The innovation here is less human-labeled data + RL to both guide and refine the model's performance.
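The two-stage idea (a little supervised fine-tuning, then RL driven by a reward instead of labels) can be illustrated with a toy policy over three possible behaviors. This is a deliberately tiny sketch, not DeepSeek's pipeline: the three behaviors, the reward values, the learning rate and all function names are assumptions made up for the example.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sft_step(logits, labeled_action, lr=0.5):
    """Supervised step: raise the log-probability of one human-labeled behavior."""
    probs = softmax(logits)
    # Gradient of log p[labeled_action] w.r.t. logit i is 1[i == labeled_action] - p_i.
    return [z + lr * ((1.0 if i == labeled_action else 0.0) - p)
            for i, (z, p) in enumerate(zip(logits, probs))]

def rl_step(logits, rewards, lr=0.5):
    """Policy-gradient step on expected reward (exact gradient, no sampling, no labels)."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    # Gradient of the expected reward w.r.t. logit i is p_i * (r_i - expected).
    return [z + lr * p * (r - expected) for z, p, r in zip(logits, probs, rewards)]

def train(n_sft_steps, n_rl_steps, labeled_action, rewards):
    """Start from a uniform policy, run SFT steps, then RL steps; return final probabilities."""
    logits = [0.0] * len(rewards)
    for _ in range(n_sft_steps):
        logits = sft_step(logits, labeled_action)
    for _ in range(n_rl_steps):
        logits = rl_step(logits, rewards)
    return softmax(logits)

# Hypothetical behaviors: 0 = noisy / mixed-language, 1 = readable but wrong,
# 2 = readable and correct. The reward values are invented for the illustration.
REWARDS = [0.0, 0.2, 1.0]

if __name__ == "__main__":
    print("RL only   :", train(0, 50, labeled_action=2, rewards=REWARDS))
    print("SFT only  :", train(3, 0, labeled_action=2, rewards=REWARDS))
    print("SFT + RL  :", train(3, 50, labeled_action=2, rewards=REWARDS))
```

Only three labeled steps are used in the SFT stage; the rest of the improvement comes from the reward signal, which mirrors the "less human-labeled data + RL" point above.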

My question is: did DeepSeek really solve the problem, knowing they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they rely on previously trained models?

Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision ... I am not convinced yet that the traditional dependency is broken. It is "easy" to not need massive amounts of high-quality reasoning data for training when taking shortcuts ...

To be balanced and show the research, I've uploaded the DeepSeek R1 Paper (downloadable PDF, 22 pages).

My concerns regarding DeepSink?

Both the web and mobile apps collect your IP, keystroke patterns, and device information, and everything is stored on servers in China.

Keystroke pattern analysis is a behavioral biometric method used to identify and authenticate individuals based on their unique typing patterns.
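To show why typing rhythm is identifying, here is a minimal sketch of two classic keystroke features: dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). The event format, the tolerance and the sample timings are all assumptions for illustration, and real systems use far richer features than two averages.

```python
from statistics import mean

def timing_features(events):
    """events: list of (key, press_ms, release_ms) in typing order.
    Returns (mean dwell time, mean flight time) in milliseconds."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return mean(dwell), mean(flight)

def matches_profile(events, profile, tolerance_ms=25.0):
    """True if both features fall within tolerance_ms of the enrolled profile."""
    dwell, flight = timing_features(events)
    return (abs(dwell - profile[0]) <= tolerance_ms
            and abs(flight - profile[1]) <= tolerance_ms)

# Invented timings for someone typing "pass".
enrolled = [("p", 0, 90), ("a", 150, 235), ("s", 300, 395), ("s", 460, 550)]
same_user = [("p", 0, 95), ("a", 160, 250), ("s", 310, 400), ("s", 470, 565)]
other_user = [("p", 0, 40), ("a", 200, 245), ("s", 420, 460), ("s", 640, 685)]

profile = timing_features(enrolled)

if __name__ == "__main__":
    print("profile:", profile)
    print("same user matches :", matches_profile(same_user, profile))
    print("other user matches:", matches_profile(other_user, profile))
```

Both people typed the same word, yet the timings alone separate them, which is why collecting keystroke patterns is a privacy concern.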

I can hear the "But 0p3n s0urc3 ...!" comments.

Yes, open source is great, but this reasoning is limited because it does NOT consider human psychology.

Regular users will never run models locally.

Most will simply want quick answers.

Technically unsophisticated users will use the web and mobile versions.

Millions have already downloaded the mobile app on their phone.

DeepSeek's models have a real edge and that's why we see ultra-fast user adoption. For now, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores high on objective benchmarks, no doubt about that.

I suggest searching for anything sensitive that does not align with the Party's propaganda on the web or mobile app, and the output will speak for itself ...

China vs America

Screenshots by T. Cassel. Freedom of speech is beautiful. I could share terrible examples of propaganda and censorship but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.

Rest assured, your code, ideas and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We just know the $5.6M amount the media has been pushing left and right is misinformation!

My site ai
