[ACM MM 2024 (Oral)] FLIP-80M: 80 Million Visual-Linguistic Pairs for Facial Language-Image Pre-Training