ML research this week:
▪️ AVFormer
▪️ Barkour
▪️ MatCha and DePlot
▪️ DIDACT
▪️ REVEAL
▪️ Improving mathematical reasoning
& more!
🧵
AVFormer: Injecting vision into frozen speech models for zero-shot AV-ASR
A simple method for augmenting existing large-scale audio-only models with visual information while also performing lightweight domain adaptation.
twitter.com/GoogleAI/status/1664680207511322636?s=20
SQL-PaLM
An LLM-based Text-to-SQL model adapted from PaLM-2 that achieves SoTA performance in both in-context learning and fine-tuning settings.
The few-shot model outperforms the previous fine-tuned SoTA by a whopping 3.8% on the Spider benchmark.
twitter.com/omarsar0/status/1664441085693657088
Foundation models for reasoning on charts
MatCha is a pixels-to-text foundation model trained on two complementary tasks:
▪️ chart de-rendering
▪️ math reasoning
DePlot is a model built on top of MatCha for one-shot reasoning on charts via translation to tables.
twitter.com/hardy_qr/status/1662222363629588485
Saliency Cards
Researchers introduce saliency cards: structured documentation of how saliency methods operate and how they perform across a battery of evaluative metrics.
twitter.com/MIT_CSAIL/status/1664663370824392709
Improving mathematical reasoning
Process supervision (feedback on each reasoning step) beats outcome supervision (feedback on the final answer only) for training models to tackle mathematical problems, according to recent research.
Plus, the complete dataset of 800K human feedback labels (PRM800K) is now available.
twitter.com/DrJimFan/status/1663972818160332800
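A toy sketch of the step-level vs. outcome-level distinction from the mathematical reasoning item above. This is not the paper's implementation: the function names, the per-step probabilities, and the choice to aggregate step scores by taking their product are all assumptions made for illustration.

```python
from math import prod


def outcome_score(final_answer: str, reference: str) -> float:
    """Outcome-style signal: looks only at whether the final answer matches."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_score(step_probs: list[float]) -> float:
    """Process-style signal: chance that every step is correct.

    Assumption: steps are scored independently and combined by product.
    """
    return prod(step_probs)


# Hypothetical per-step correctness probabilities from a step-level verifier.
# Both solutions end with the right answer, so outcome_score cannot separate them.
candidates = {
    "sound": [0.9, 0.9, 0.9],
    "lucky": [0.9, 0.2, 0.9],  # right answer reached through a flawed step
}

# Pick the solution whose steps the verifier trusts most.
best = max(candidates, key=lambda name: process_score(candidates[name]))
```

Here both candidates would get the same outcome score, while the step-level score prefers the solution without the doubtful middle step.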
Barkour: Benchmarking animal-level agility with quadruped robots
The paper introduces the Barkour agility benchmark for quadruped robots, along with a Transformer-based generalist locomotion policy.
twitter.com/GoogleAI/status/1662145329180053506/video/1