Attention Is All You Need (arXiv:1706.03762)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692)
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (arXiv:1910.01108)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv:1910.10683)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arXiv:2101.03961)
Finetuned Language Models Are Zero-Shot Learners (arXiv:2109.01652)
Multitask Prompted Training Enables Zero-Shot Task Generalization (arXiv:2110.08207)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (arXiv:2112.06905)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv:2201.11903)
LaMDA: Language Models for Dialog Applications (arXiv:2201.08239)
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (arXiv:2201.11990)
Training language models to follow instructions with human feedback (arXiv:2203.02155)
PaLM: Scaling Language Modeling with Pathways (arXiv:2204.02311)
Training Compute-Optimal Large Language Models (arXiv:2203.15556)
OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
UL2: Unifying Language Learning Paradigms (arXiv:2205.05131)
Language Models are General-Purpose Interfaces (arXiv:2206.06336)
Improving alignment of dialogue agents via targeted human judgements (arXiv:2209.14375)
Scaling Instruction-Finetuned Language Models (arXiv:2210.11416)
GLM-130B: An Open Bilingual Pre-trained Model (arXiv:2210.02414)
Holistic Evaluation of Language Models (arXiv:2211.09110)
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100)
Galactica: A Large Language Model for Science (arXiv:2211.09085)
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization (arXiv:2212.12017)
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
PaLM-E: An Embodied Multimodal Language Model (arXiv:2303.03378)
GPT-4 Technical Report (arXiv:2303.08774)
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling (arXiv:2304.01373)
PaLM 2 Technical Report (arXiv:2305.10403)
RWKV: Reinventing RNNs for the Transformer Era (arXiv:2305.13048)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 (arXiv:2306.02707)
Textbooks Are All You Need (arXiv:2306.11644)
Textbooks Are All You Need II: phi-1.5 technical report (arXiv:2309.05463)
Mistral 7B (arXiv:2310.06825)
PaLI-3 Vision Language Models: Smaller, Faster, Stronger (arXiv:2310.09199)
Zephyr: Direct Distillation of LM Alignment (arXiv:2310.16944)
CodeFusion: A Pre-trained Diffusion Model for Code Generation (arXiv:2310.17680)
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents (arXiv:2311.05437)
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models (arXiv:2311.16079)
SeaLLMs -- Large Language Models for Southeast Asia (arXiv:2312.00738)
Kandinsky 3.0 Technical Report (arXiv:2312.03511)
Large Language Models for Mathematicians (arXiv:2312.04556)
FLM-101B: An Open LLM and How to Train It with $100K Budget (arXiv:2309.03852)
arXiv:2309.03450
Baichuan 2: Open Large-scale Language Models (arXiv:2309.10305)
Qwen Technical Report (arXiv:2309.16609)
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch (arXiv:2309.10706)
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (arXiv:2310.09478)
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models (arXiv:2308.13437)
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4 (arXiv:2308.12067)
JudgeLM: Fine-tuned Large Language Models are Scalable Judges (arXiv:2310.17631)
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation (arXiv:2311.00272)
ChipNeMo: Domain-Adapted LLMs for Chip Design (arXiv:2311.00176)
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model (arXiv:2310.06266)
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models (arXiv:2312.04724)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
Generative Multimodal Models are In-Context Learners (arXiv:2312.13286)
Code Llama: Open Foundation Models for Code (arXiv:2308.12950)
Unsupervised Cross-lingual Representation Learning at Scale (arXiv:1911.02116)
YAYI 2: Multilingual Open-Source Large Language Models (arXiv:2312.14862)
Mini-GPTs: Efficient Large Language Models through Contextual Pruning (arXiv:2312.12682)
Gemini: A Family of Highly Capable Multimodal Models (arXiv:2312.11805)
LLM360: Towards Fully Transparent Open-Source LLMs (arXiv:2312.06550)
WizardLM: Empowering Large Language Models to Follow Complex Instructions (arXiv:2304.12244)
The Falcon Series of Open Language Models (arXiv:2311.16867)
Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding (arXiv:2305.12031)
ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge (arXiv:2303.14070)
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day (arXiv:2306.00890)
BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights (arXiv:2311.16075)
KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model (arXiv:2311.11564)
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences (arXiv:2311.06025)
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations (arXiv:2310.07276)
BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition (arXiv:2308.08625)
BioCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval (arXiv:2307.00589)
Radiology-GPT: A Large Language Model for Radiology (arXiv:2306.08666)
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks (arXiv:2305.17100)
Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation (arXiv:2305.07804)
Llemma: An Open Language Model For Mathematics (arXiv:2310.10631)
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model (arXiv:2309.11568)
Skywork: A More Open Bilingual Foundation Model (arXiv:2310.19341)
SkyMath: Technical Report (arXiv:2310.16713)
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models (arXiv:2309.12284)
UT5: Pretraining Non autoregressive T5 with unrolled denoising (arXiv:2311.08552)
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model (arXiv:2312.11370)
Language Is Not All You Need: Aligning Perception with Language Models (arXiv:2302.14045)
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing (arXiv:2303.10845)
BloombergGPT: A Large Language Model for Finance (arXiv:2303.17564)
PMC-LLaMA: Towards Building Open-source Language Models for Medicine (arXiv:2304.14454)
StarCoder: may the source be with you! (arXiv:2305.06161)
OctoPack: Instruction Tuning Code Large Language Models (arXiv:2308.07124)
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (arXiv:2312.16862)
GeoGalactica: A Scientific Large Language Model in Geoscience (arXiv:2401.00434)
TinyLlama: An Open-Source Small Language Model (arXiv:2401.02385)
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (arXiv:2401.02954)
Mixtral of Experts (arXiv:2401.04088)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv:2401.04081)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066)
WizardCoder: Empowering Code Large Language Models with Evol-Instruct (arXiv:2306.08568)
ChatQA: Building GPT-4 Level Conversational QA Models (arXiv:2401.10225)
Orion-14B: Open-source Multilingual Large Language Models (arXiv:2401.12246)
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence (arXiv:2401.14196)
Weaver: Foundation Models for Creative Writing (arXiv:2401.17268)
H2O-Danube-1.8B Technical Report (arXiv:2401.16818)
OLMo: Accelerating the Science of Language Models (arXiv:2402.00838)
GPT-NeoX-20B: An Open-Source Autoregressive Language Model (arXiv:2204.06745)
CroissantLLM: A Truly Bilingual French-English Language Model (arXiv:2402.00786)
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (arXiv:2402.16840)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (arXiv:2402.14905)
Nemotron-4 15B Technical Report (arXiv:2402.16819)
StarCoder 2 and The Stack v2: The Next Generation (arXiv:2402.19173)
Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295)
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (arXiv:2403.05530)
Sailor: Open Language Models for South-East Asia (arXiv:2404.03608)
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619)
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (arXiv:2405.04434)
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence (arXiv:2406.11931)
Aya 23: Open Weight Releases to Further Multilingual Progress (arXiv:2405.15032)
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models (arXiv:2406.06563)
Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491)
The Llama 3 Herd of Models (arXiv:2407.21783)