Items from before this blog was launched have also been briefly added.
# Blog
GitHub / Hugging Face
Development & Technical
Research
Others
Beyond "I'm the thinker, you're the worker": How to Start Building, from Tomorrow, a Team Where Product Management Is the Norm / product management rsgt2023 - Speaker Deck
Aiming for a Japan-Born SaaS Product That Competes Worldwide: Sansan Global's Key People on Their Southeast Asia Strategy - Sansan Official note
"Change the Mindset and Behavior Will Follow" Gets the Order Backwards in Organizational Transformation: How "Change Management" Restores Vitality to a Company - logmi
What It Means to "Socially Implement" Machine Learning, 2023 Edition / Social Implementation of Machine Learning 2023 - Speaker Deck
# Papers
Open Domain Question Answering
Robinson+’22 - Leveraging Large Language Models for Multiple Choice Question Answering
Ram+’22 - What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary
Li+’22 - Self-Prompting Large Language Models for Open-Domain QA
Jong+’22 - FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
Sachan+’22 - Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP)
Zhou+’22 - Fine-Grained Distillation for Long Document Retrieval
Hai+'23 - CoSPLADE: Contextualizing SPLADE for Conversational Information Retrieval (ECIR)
Gao+'22 - Precise Zero-Shot Dense Retrieval without Relevance Labels
Yu+’23 - Generate rather than Retrieve: Large Language Models are Strong Context Generators (ICLR)
Vision and Language
Ji+’22 - Abstract Visual Reasoning with Tangram Shapes (EMNLP)
Lee+’22 - Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Chang+’22 - MapQA: A Dataset for Question Answering on Choropleth Maps
Liu+’22 - MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Tito+’22 - Hierarchical multimodal transformers for Multi-Page DocVQA
Wang+’23 - Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models (AAAI)
Yu+’22 - LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Guo+’22 - From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
Corona+’22 - Does unsupervised grammar induction need pixels?
Liu+'22 - Character-Aware Models Improve Visual Text Rendering
Yamada+'22 - When are Lemons Purple? The Concept Association Bias of CLIP
Tanaka+'23 - SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI)
Lerner+'23 - Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering (ECIR)
Aberdam+'23 - CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
Kang+'22 - Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Nukrai+'22 - Text-Only Training for Image Captioning using Noise-Injected CLIP (EMNLP)
Davis+’22 - End-to-end Document Recognition and Understanding with Dessurt
Brooks+’23 - InstructPix2Pix: Learning to Follow Image Editing Instructions
Chen+'23 - STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Others
Oswald+’22 - Transformers learn in-context by gradient descent
He+’22 - DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models
Wang+’22 - Text Embeddings by Weakly-Supervised Contrastive Pre-training
Weers+'23 - Self Supervision Does Not Help Natural Language Supervision at Scale
Zhou+'22 - Are All Losses Created Equal: A Neural Collapse Perspective (NeurIPS)