Licheng Yu

My name is Licheng Yu (虞立成). I received my PhD in Computer Science from the University of North Carolina at Chapel Hill in May 2019. My advisor was Tamara L. Berg, and I also worked closely with Mohit Bansal during my PhD. My research interests lie in computer vision and natural language processing.

I received Master's degrees from both Georgia Tech and Shanghai Jiao Tong University in 2014, and my Bachelor's degree from Shanghai Jiao Tong University.

Email: lichengyu [at] fb.com
Address: 1 Hacker Way, Menlo Park, CA 94025
More info: [Resume], [Google Scholar], [LinkedIn], [GitHub].

Work Experience

2023.06–Present: Research Scientist Manager
2022.07–2023.06: Staff Research Scientist
2021.07–2022.07: Senior Research Scientist
2020.03–2021.07: Research Scientist

2019.06–2020.03: Researcher

Graduated

2014.08–2019.05: Research Assistant

2018.05–2018.08: Research Intern

2017.05–2017.08: Research Intern

2016.05–2016.08: Research Intern

2011.09–2014.04: Research Assistant

Projects & Publications

The Llama 3 Herd of Models
arXiv:2407.21783v2
Llama team
(Led Llama 3.2 Multimodal 11B/90B pre-training and 11B post-training)
Animated Stickers: Bringing Stickers to Life with Video Diffusion
arXiv:2402.06088
David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu
[Paper]
AVID: Any-Length Video Inpainting with Diffusion Model
CVPR 2024
Zhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, Licheng Yu
SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Integrated Character-Level Diffusion and Contextual Consistency
CVPR 2024
Qilong Zhangli, Praveen Krishnan, Ankit Ramchandani, Xiaoliang Dai, Licheng Yu, Di Liu, Jindong Jiang, Dimitris N. Metaxas, Guan Pang
[Paper]
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
CVPR 2024
Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
CVPR 2024
Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
CVPR 2024
Yuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
arXiv:2311.10794
Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan
[Paper]
CiT: Curation in Training for Effective Vision-Language Data
ICCV 2023
Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
[Paper][Code]
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
CVPR 2023
Tsu-Jui Fu, Licheng Yu, Ning Zhang, Cheng-Yang Fu, Jong-Chyi Su, William Yang Wang, Sean Bell
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
CVPR 2023
Yiwu Zhong, Licheng Yu, Yang Bai, Shangwen Li, Xueting Yan, Yin Li
[Paper][Code]
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
CVPR 2023
Xiao Han, Xiatian Zhu, Licheng Yu, Li Zhang, Yi-Zhe Song, Tao Xiang
[Paper][Code] (Oral)
Learning and Verification of Task Structure in Instructional Videos
arXiv:2303.13519
Medhini Narasimhan, Licheng Yu, Sean Bell, Ning Zhang, Trevor Darrell
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
arXiv:2305.14725
Barry Menglong Yao, Yu Chen, Qifan Wang, Sijia Wang, Minqian Liu, Zhiyang Xu, Licheng Yu, Lifu Huang
[Paper]
RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data
ICLR 2023
Sangwoo Mo, Jong-Chyi Su, Kevin Chih-Yao Ma, Mido Assran, Ishan Misra, Licheng Yu, Sean Bell
[Paper]

Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace
WWW 2023
Yunzhong He, Yuxin Tian, Mengjiao Wang, Feier Chen, Licheng Yu, Maolong Tang, Congcong Chen, Ning Zhang, Bin Kuang, Arul Prakash
[Paper]
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
EMNLP 2022
Suvir Mirchandani, Licheng Yu, Mengjiao Wang, Animesh Sinha, Wenwen Jiang, Tao Xiang, Ning Zhang
[Paper]
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
ECCV 2022
Xiao Han, Licheng Yu, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang
[Paper][Code]
Generic Event Boundary Captioning: A Benchmark for Status Changes Understanding
ECCV 2022
Yuxuan Wang, Difei Gao, Licheng Yu, Weixian Lei, Matt Feiszli, Mike Zheng Shou
[Paper]
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval
KDD 2022
Licheng Yu, Jun Chen, Animesh Sinha, Mengjiao Wang, Yu Chen, Tamara L. Berg, Ning Zhang
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
CVPR 2022
Mingyang Zhou*, Licheng Yu*, Amanpreet Singh, Mengjiao Wang, Yu Zhou, Ning Zhang
(*First two authors contributed equally.)
[Paper][Code] (Oral)
LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval
arXiv:2203.05465v1
Jie Lei, Xinlei Chen, Ning Zhang, Mengjiao Wang, Mohit Bansal, Tamara L. Berg, Licheng Yu
[Paper]
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
NeurIPS 2021
Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara L. Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu
Connecting What to Say With Where to Look by Modeling Human Attention Traces
CVPR 2021
Zihang Meng, Licheng Yu, Ning Zhang, Tamara L. Berg, Babak Damavandi, Vikas Singh, Amy Bearman
[Paper][Code]
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
EMNLP 2020
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
[Paper][Code]
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
EMNLP 2020
Linjie Li*, Yen-Chun Chen*, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
(*First two authors contributed equally.)
Rank 1 on TVR Leaderboard
Rank 1 on TVC Leaderboard
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
ECCV 2020
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu
[Paper] (Spotlight)
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
ECCV 2020
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
UNITER: Learning UNiversal Image-Text Representations
ECCV 2020
Yen-Chun Chen*, Linjie Li*, Licheng Yu*, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
(*First three authors contributed equally.)
Achieving SOTA on 13 Vision+Language Datasets/Tasks, and
Rank 1 on VCR Leaderboard
Rank 1 on NLVR2 Leaderboard
TVQA+: Spatio-Temporal Grounding for Video Question Answering
ACL 2020
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
BachGAN: High-Resolution Image Synthesis from Salient Object Layout
CVPR 2020
Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu
[Paper][Code]
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
CVPR 2020
Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu
Multi-Target Embodied Question Answering
CVPR 2019
Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra
[Paper] [Video]
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
NAACL 2019
Hao Tan, Licheng Yu, Mohit Bansal
[Paper] [Code]
TVQA: Localized Compositional Video Question Answering
EMNLP 2018
Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg
[Paper] [Project] [Explore] (Oral)
MAttNet: Modular Attention Network for Referring Expression Comprehension
CVPR 2018
Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, Tamara L. Berg
From Image to Language and Back Again
Journal of Natural Language Engineering (JNLE), 2018
Anya Belz, Tamara L. Berg, Licheng Yu
[Paper]
Physics-Inspired Garment Recovery from a Single-View Image
ACM Transactions on Graphics, 2018
Shan Yang, Tanya Ambert, Zherong Pan, Ke Wang, Licheng Yu, Tamara L. Berg, Ming C. Lin
A Unified Framework for Manifold Landmarking
IEEE Transactions on Signal Processing, 2018
Hongteng Xu, Licheng Yu, Mark Davenport, Hongyuan Zha
[Paper]
Hierarchically-Attentive RNN for Album Summarization and Storytelling
EMNLP 2017
Licheng Yu, Mohit Bansal, Tamara L. Berg
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
CVPR 2017
Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg
[Paper] [Code] [Project] [Talk] (Spotlight presentation 8%)
Modeling Context in Referring Expressions
ECCV 2016
Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg
[Paper] [Dataset] [Talk] (Spotlight presentation 4.7%)
Visual Madlibs: Fill-in-the-blank Image Description and Question Answering
ICCV 2015
Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg
Dictionary Learning with Mutually Reinforcing Group-Graph Structures
AAAI 2015
Licheng Yu*, Hongteng Xu*, Hongyuan Zha, Yi Xu
(* denotes equal contribution)
[Paper]
Vector Sparse Representation of Color Image Using Quaternion Matrix Analysis
IEEE Transactions on Image Processing, TIP 2015
Yi Xu, Licheng Yu, Hongteng Xu, Truong Nguyen, Hao Zhang
[Paper][Code]
Quaternion-based Sparse Representation of Color Image
IEEE International Conference on Multimedia and Expo, ICME 2013
Licheng Yu, Yi Xu, Hongteng Xu, Hao Zhang
[Paper][Supplementary File] (Oral presentation)
Single Image Super-resolution via Phase Congruency Analysis
IEEE Visual Communications and Image Processing, VCIP 2013
Licheng Yu, Yi Xu, Bo Zhang
[Paper] (Oral presentation)
Self-Example Based Super-resolution with Fractal-based Gradient Enhancement
IEEE International Conference on Multimedia and Expo, ICME workshop 2013
Licheng Yu, Yi Xu, Hongteng Xu
[Paper]
Robust Single Image Super-resolution based on Gradient Enhancement
APSIPA Annual Summit and Conference, APSIPA 2012
Licheng Yu, Yi Xu, Hongteng Xu, Xiaokang Yang

Miscellaneous

Self-supervised Learning for Vision-and-Language
Recent Advances in Vision-and-Language Research
CVPR 2020 Tutorial
Licheng Yu, Linjie Li, Yen-Chun Chen
Revisiting Grid Features for VQA
Duy-Kien Nguyen, Huaizu Jiang, Vedanuj Goswami, Licheng Yu, Xinlei Chen
Winner of VQA 2020 Challenge
Gobang Android App (AI mode + 2-player mode)
Licheng Yu
Skill Measurement via Egocentric Vision in Wetlab
Licheng Yu, Yin Li, James Rehg


PhD Thesis: "Question Answering, Grounding, and Generation for Vision and Language" [PDF][Talk]