AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.

Smarter AI Training with Few-Shot Natural Language Tasks

2025/10/02 17:00

Abstract and 1. Introduction

  2. Background

    2.1 Mixture-of-Experts

    2.2 Adapters

  3. Mixture-of-Adaptations

    3.1 Routing Policy

    3.2 Consistency regularization

    3.3 Adaptation module merging and 3.4 Adaptation module sharing

    3.5 Connection to Bayesian Neural Networks and Model Ensembling

  4. Experiments

    4.1 Experimental Setup

    4.2 Key Results

    4.3 Ablation Study

  5. Related Work

  6. Conclusions

  7. Limitations

  8. Acknowledgment and References

Appendix

A. Few-shot NLU Datasets

B. Ablation Study

C. Detailed Results on NLU Tasks

D. Hyper-parameters

A Few-shot NLU Datasets

Data. In contrast to the fully supervised setting in the above experiments, we also perform few-shot experiments following the prior study (Wang et al., 2021) on six tasks, including MNLI (Williams et al., 2018), RTE (Dagan et al., 2005; Bar Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009), QQP[1] and SST-2 (Socher et al., 2013). The results are reported on their development sets following (Zhang et al., 2021). MPQA (Wiebe et al., 2005) and Subj (Pang and Lee, 2004) are used for polarity and subjectivity detection, where we follow (Gao et al., 2021) and keep 2,000 examples for testing. The few-shot model only has access to |K| labeled samples for any task. Following the true few-shot learning setting (Perez et al., 2021; Wang et al., 2021), we do not use any additional validation set for hyper-parameter tuning or early stopping. The performance of each model is reported after a fixed number of training epochs. For a fair comparison, we use the same set of few-shot labeled instances for training as in (Wang et al., 2021). We train each model with 5 different seeds and report the average performance with standard deviation across the runs. In the few-shot experiments, we follow (Wang et al., 2021) and train AdaMix via the prompt-based fine-tuning strategy. In contrast to (Wang et al., 2021), we do not use any unlabeled data.
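The true few-shot protocol above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `sample_k_shot` and `report` are hypothetical helpers, the dataset is a toy list, and the metric values are placeholders (the paper itself reuses the exact few-shot splits released by Wang et al., 2021 rather than resampling). The key points it shows are sampling |K| examples per class with a fixed seed, training for a fixed number of epochs with no validation set, and reporting mean and standard deviation across 5 seeds.

```python
import random
import statistics

def sample_k_shot(dataset, k, seed):
    """Draw k labeled examples per class with a fixed seed (hypothetical helper;
    the paper reuses the exact few-shot splits from Wang et al., 2021)."""
    rng = random.Random(seed)
    by_label = {}
    for ex in dataset:
        by_label.setdefault(ex["label"], []).append(ex)
    subset = []
    for label in sorted(by_label):
        subset.extend(rng.sample(by_label[label], k))
    return subset

def report(scores):
    """Aggregate runs as mean with standard deviation, as in the result tables."""
    return statistics.mean(scores), statistics.stdev(scores)

# Toy binary-classification dataset standing in for, e.g., SST-2.
dataset = [{"text": f"ex{i}", "label": i % 2} for i in range(100)]

scores = []
for seed in [13, 21, 42, 87, 100]:
    train = sample_k_shot(dataset, k=16, seed=seed)
    # ... prompt-based fine-tuning of AdaMix on `train` for a fixed
    # number of epochs would go here; no validation set is used ...
    scores.append(0.80 + 0.01 * (seed % 3))  # placeholder dev-set metric

mean, std = report(scores)
```

Because no validation set exists in this setting, the seed loop is the only source of variance estimation, which is why the tables report a ± figure per cell.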


B Ablation Study

Table 11: Ablation study demonstrating the impact of parameter sharing in the AdaMix adapter framework.


C Detailed Results on NLU Tasks

The results on NLU tasks are included in Table 1 and Table 13. AdaMix with the RoBERTa-large encoder achieves the best performance in terms of different task metrics on the GLUE benchmark. AdaMix with adapters is the only PEFT method that outperforms full model fine-tuning on all tasks and on the average score. Additionally, the improvement brought by AdaMix is more significant with BERT-base as the encoder, demonstrating 2.2% and 1.2% improvements over full model fine-tuning and the best-performing baseline UNIPELT with BERT-base, respectively. The improvement is consistent with that observed with RoBERTa-large on every task. The NLG results are included in Table 4 and Table 5.

Table 12: Varying the bottleneck dimension of adapters in AdaMix with BERT-base and RoBERTa-large encoders. * denotes the bottleneck dimension used in AdaMix with adapters.

D Hyper-parameters

Detailed hyper-parameter configurations for the different tasks are presented in Table 15 and Table 16.


:::info Authors:

(1) Yaqing Wang, Purdue University (wang5075@purdue.edu);

(2) Sahaj Agarwal, Microsoft (sahagar@microsoft.com);

(3) Subhabrata Mukherjee, Microsoft Research (submukhe@microsoft.com);

(4) Xiaodong Liu, Microsoft Research (xiaodl@microsoft.com);

(5) Jing Gao, Purdue University (jinggao@purdue.edu);

(6) Ahmed Hassan Awadallah, Microsoft Research (hassanam@microsoft.com);

(7) Jianfeng Gao, Microsoft Research (jfgao@microsoft.com).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

[1] https://www.quora.com/q/quoradata/
