Face recognition (FR) technology has advanced significantly with the emergence of deep learning. However, existing training paradigms for FR models rely primarily on RGB face images, which limits training and inference efficiency and exposes users to potential privacy violations. To address these problems, this paper investigates the intrinsic properties of image bytes and proposes an FR model termed ByteFace, which is trained directly on image bytes rather than on RGB images. Specifically, because local correlations among bytes are important, an image bytes compression strategy named TIBC is introduced to extract prominent features from the raw bytes and integrate them with byte embeddings, which mitigates information loss during the byte mapping process. Moreover, to strengthen the model’s perception of the geometric information encoded in image bytes, a novel cross-attention module named SICA is designed to inject structure information into byte tokens for information interaction, which significantly improves the model’s generalization ability. Experiments on popular face benchmarks demonstrate the superiority of ByteFace.
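To make the notion of byte tokens and byte embeddings concrete, the following is a minimal sketch and not the ByteFace implementation. It assumes that each byte of an image file is treated as a token id in the range 0 to 255 and mapped to a vector through a lookup table. The function names, the embedding dimension, and the random initialization are hypothetical, and the sketch does not model TIBC or SICA.

```python
import random


def bytes_to_tokens(data):
    """Treat every byte as an integer token id in [0, 255]."""
    return list(data)


def make_embedding_table(vocab_size=256, dim=8, seed=0):
    """Build a randomly initialized lookup table (hypothetical; a real model would learn it)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(tokens, table):
    """Map each token id to its embedding vector by table lookup."""
    return [table[t] for t in tokens]


# Stand-in input: the first four bytes of a typical JPEG file
# (start-of-image marker FF D8 followed by the APP0 marker FF E0).
# Whether ByteFace consumes encoded file bytes or raw pixel bytes is an assumption here.
sample = bytes([0xFF, 0xD8, 0xFF, 0xE0])

tokens = bytes_to_tokens(sample)
table = make_embedding_table()
embedded = embed(tokens, table)
```

In this sketch, identical byte values always receive identical embeddings, so any positional or structural information would have to come from additional components of the model.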
Under Review