e.bike.free.fr

The community site for discussing electric-assist bicycles, under copyleft and free of any advertising banners


Announcement

Welcome to e.bike.free.fr, the community forum dedicated to electric-assist bicycles, free of invasive advertising. Feel free to share your knowledge of the various models discussed.

#1 15-02-2025 14:57:32

RayfordLat
Member
Registered: 02-02-2025
Messages: 20
Website


Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.


For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model may then make incorrect predictions for female patients when deployed in a hospital.

To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, hurting the model's overall performance.
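As an illustration of that balancing step, here is a minimal sketch (my own, not code from the study) that subsamples every subgroup down to the size of the smallest one; the array names and the use of NumPy are assumptions made for the example.

```python
import numpy as np

def balance_by_subsampling(features, labels, groups, seed=0):
    """Subsample every subgroup down to the size of the smallest one.

    `groups` marks the subgroup of each example, e.g. 0 = male patients,
    1 = female patients.
    """
    rng = np.random.default_rng(seed)
    group_ids, counts = np.unique(groups, return_counts=True)
    target = counts.min()  # size of the rarest subgroup

    keep = []
    for g in group_ids:
        idx = np.flatnonzero(groups == g)
        keep.append(rng.choice(idx, size=target, replace=False))
    keep = np.concatenate(keep)

    # Everything outside `keep` is thrown away -- this is the data loss
    # that can hurt overall performance.
    return features[keep], labels[keep], groups[keep]
```

If one subgroup has 900 examples and another has 100, this keeps only 200 of the 1,000 examples, which is exactly the kind of loss the new method tries to avoid.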


MIT researchers developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer datapoints than other methods, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.


In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data in many applications.


This method could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes settings. For example, it might someday help ensure that underrepresented patients aren't misdiagnosed due to a biased AI model.

"Many other algorithms that attempt to resolve this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that presumption is not real. There are specific points in our dataset that are contributing to this bias, and we can discover those data points, eliminate them, and get better efficiency," says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate trainee at MIT and co-lead author of a paper on this technique.


She wrote the paper with co-lead authors Saachi Jain PhD '24 and fellow EECS graduate student Kristian Georgiev; Andrew Ilyas MEng '18, PhD '23, a Stein Fellow at Stanford University; and senior authors Marzyeh Ghassemi, an associate professor in EECS and a member of the Institute for Medical Engineering and Sciences and the Laboratory for Information and Decision Systems, and Aleksander Madry, the Cadence Design Systems Professor at MIT. The research will be presented at the Conference on Neural Information Processing Systems.

Removing bad examples


Often, machine-learning models are trained using huge datasets gathered from many sources across the internet. These datasets are far too large to be carefully curated by hand, so they may contain bad examples that hurt model performance.


Researchers also know that some data points affect a model's performance on certain downstream tasks more than others.


The MIT researchers combined these two ideas into an approach that identifies and removes these problematic datapoints. They seek to solve a problem known as worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
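As a rough sketch of how worst-group error can be measured when subgroup labels are available (again my own illustration, not the paper's code):

```python
import numpy as np

def worst_group_error(y_true, y_pred, groups):
    """Return the highest error rate over all labeled subgroups.

    A model can report a low average error and still fail badly on the
    smallest subgroup; this metric surfaces that failure.
    """
    per_group = [
        np.mean(y_true[groups == g] != y_pred[groups == g])
        for g in np.unique(groups)
    ]
    return max(per_group)
```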

The researchers' new technique is driven by prior work in which they introduced a method, called TRAK, that identifies the most important training examples for a specific model output.

For this new technique, they take incorrect predictions the model made about minority subgroups and use TRAK to identify which training examples contributed the most to each incorrect prediction.


"By aggregating this details throughout bad test forecasts in the proper way, we are able to discover the particular parts of the training that are driving worst-group accuracy down in general," Ilyas explains.


Then they remove those specific samples and retrain the model on the remaining data.
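Continuing the same sketch, the removal-and-retraining step might look like this; the choice of `k` and the logistic-regression model are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def remove_and_retrain(X_train, y_train, scores, k):
    """Drop the k training points most responsible for the worst-group
    failures (highest scores) and refit on what remains."""
    keep = np.argsort(scores)[:-k]  # indices of all but the top-k scores
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[keep], y_train[keep])
    return model, keep
```

A practitioner would then recompute worst-group error (as sketched earlier) on held-out data to check that the retrained model actually improved on the minority subgroups.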

Since having more data usually yields better overall performance, removing only the samples that drive worst-group failures maintains the model's overall accuracy while improving its performance on minority subgroups.


A more accessible method


Across three machine-learning datasets, their method outperformed multiple techniques. In one instance, it improved worst-group accuracy while removing about 20,000 fewer training samples than a conventional data balancing method. Their method also achieved higher accuracy than methods that require making changes to the inner workings of a model.


Because the MIT method involves changing the dataset instead, it would be easier for a practitioner to use and can be applied to many types of models.


It can also be used when bias is unknown because subgroups in a training dataset are not labeled. By identifying the datapoints that contribute most to a feature the model is learning, researchers can understand the variables it is using to make a prediction.


"This is a tool anybody can use when they are training a machine-learning design. They can take a look at those datapoints and see whether they are aligned with the ability they are attempting to teach the design," says Hamidieh.

Using the method to detect unknown subgroup bias would require intuition about which groups to look for, so the researchers hope to validate it and explore it more fully through future human studies.


They also want to improve the performance and reliability of their technique and ensure the method is accessible and easy to use for practitioners who may someday deploy it in real-world environments.


"When you have tools that let you seriously look at the information and find out which datapoints are going to cause predisposition or other unwanted behavior, it provides you a very first action toward building models that are going to be more fair and more reliable," Ilyas says.


This work is funded, in part, by the National Science Foundation and the U.S. Defense Advanced Research Projects Agency.




 
