02-24 [PaperReading] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (MLA Part)
12-16 [PaperReading: NeurIPS 2024 Best Paper] Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators (Part 1)