-
Efficient ear alignment using a two-stack hourglass network
Anja Hrovatič, Peter Peer, Vitomir Štruc, Žiga Emeršič: "Efficient ear alignment using a two-stack hourglass network", IET Biometrics, 2023.
Ear images have been shown to be a reliable modality for biometric recognition, with desirable characteristics such as high universality, distinctiveness, measurability and permanence. While a considerable amount of research has been directed towards ear recognition techniques, the problem of ear alignment is still under-explored in the open literature. Nonetheless, accurate alignment of ear images, especially in unconstrained acquisition scenarios, where the ear appearance is expected to vary widely due to pose and viewpoint variations, is critical for the performance of all downstream tasks, including ear recognition. Here, the authors address this problem and present a framework for ear alignment that relies on a two-step procedure: (i) automatic landmark detection and (ii) fiducial point alignment. For the first (landmark detection) step, the authors implement and train a Two-Stack Hourglass model (2-SHGNet) capable of accurately predicting 55 landmarks on diverse ear images captured in uncontrolled conditions. For the second (alignment) step, the authors use the Random Sample Consensus (RANSAC) algorithm to align the estimated landmark/fiducial points with a pre-defined ear shape (i.e. a collection of average ear landmark positions). The authors evaluate the proposed framework in comprehensive experiments on the AWEx and ITWE datasets and show that the 2-SHGNet model leads to more accurate landmark predictions than competing state-of-the-art models from the literature. Furthermore, the authors also demonstrate that the alignment step significantly improves recognition accuracy with ear images from unconstrained environments compared to unaligned imagery.
@article{hrovatic2023efficient, title={Efficient ear alignment using a two-stack hourglass network}, author={Hrovati{\v{c}}, Anja and Peer, Peter and {\v{S}}truc, Vitomir and Emer{\v{s}}i{\v{c}}, {\v{Z}}iga}, journal={IET Biometrics}, year={2023}, publisher={Wiley Online Library} }
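The second (alignment) step can be illustrated with a short sketch. This is a minimal, hypothetical example built on scikit-image's RANSAC and a similarity transform; the 55-point mean shape and the "predicted" landmarks below are random stand-ins, not the paper's data or model output.

```python
import numpy as np
from skimage.measure import ransac
from skimage.transform import SimilarityTransform

# Hypothetical stand-ins: a 55-point mean ear shape and noisy "predicted"
# landmarks related to it by an unknown similarity transform.
rng = np.random.default_rng(0)
mean_shape = rng.uniform(20.0, 100.0, size=(55, 2))
landmarks = 1.3 * mean_shape + 15.0 + rng.normal(scale=1.0, size=(55, 2))

# Robustly estimate the similarity transform mapping the detected landmarks
# onto the mean shape, discarding outlier detections along the way.
model, inliers = ransac(
    (landmarks, mean_shape),
    SimilarityTransform,
    min_samples=3,
    residual_threshold=3.0,
    max_trials=1000,
)
print(f"{int(inliers.sum())} of {len(landmarks)} landmarks kept as inliers")

# The fitted transform can then warp the ear image into the canonical frame,
# e.g. skimage.transform.warp(image, model.inverse).
```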
-
ContexedNet: Context-aware Ear Detection in Unconstrained Settings
Žiga Emeršič, Diego Sušanj, Blaž Meden, Peter Peer, Vitomir Štruc: "ContexedNet: Context-aware Ear Detection in Unconstrained Settings", IEEE Access, 2021.
Ear detection represents one of the key components of contemporary ear recognition systems. While significant progress has been made in the area of ear detection over recent years, most of the improvements are direct results of advances in the field of visual object detection. Only a limited number of techniques presented in the literature are domain-specific and designed explicitly with ear detection in mind. In this paper, we aim to address this gap and present a novel detection approach that does not rely only on general ear (object) appearance, but also exploits contextual information, i.e., face-part locations, to ensure accurate and robust ear detection with images captured in a wide variety of imaging conditions. The proposed approach is based on a Context-aware Ear Detection Network (ContexedNet) and poses ear detection as a semantic image segmentation problem. ContexedNet consists of two processing paths: i) a context-provider that extracts probability maps corresponding to the locations of facial parts from the input image, and ii) a dedicated ear segmentation model that integrates the computed probability maps into a context-aware segmentation-based ear detection procedure. ContexedNet is evaluated in rigorous experiments on the AWE and UBEAR datasets and shown to ensure competitive performance when evaluated against state-of-the-art ear detection models from the literature. Additionally, because the proposed contextualization is model agnostic, it can also be utilized with other ear detection techniques to improve performance.
@article{earContext, author={Emer{\v{s}}i{\v{c}}, {\v{Z}}iga and Su{\v{s}}anj, Diego and Meden, Bla{\v{z}} and Peer, Peter and {\v{S}}truc, Vitomir}, journal={IEEE Access}, title={ContexedNet: Context-aware Ear Detection in Unconstrained Settings}, year={2021}, doi={10.1109/ACCESS.2021.3121792} }
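As a rough illustration of the two-path idea, the toy PyTorch module below fuses face-part probability maps with the input image by channel concatenation before segmenting ear pixels. The layer sizes, the number of face-part maps and the concatenation-based fusion are illustrative assumptions for this sketch, not the published ContexedNet architecture.

```python
import torch
import torch.nn as nn

class ContextAwareEarSegmenter(nn.Module):
    """Toy two-path model: a context provider predicts face-part probability
    maps, which are concatenated with the input image before ear segmentation."""

    def __init__(self, num_face_parts: int = 4):
        super().__init__()
        # Path (i): context provider -> per-pixel face-part probabilities.
        self.context_provider = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_face_parts, 1), nn.Sigmoid(),
        )
        # Path (ii): ear segmenter consuming the image plus the context maps.
        self.segmenter = nn.Sequential(
            nn.Conv2d(3 + num_face_parts, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # one ear/non-ear logit per pixel
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        context = self.context_provider(image)       # (B, K, H, W)
        fused = torch.cat([image, context], dim=1)   # channel-wise fusion
        return self.segmenter(fused)                 # (B, 1, H, W) logits

model = ContextAwareEarSegmenter()
logits = model(torch.randn(1, 3, 128, 128))
print(logits.shape)  # torch.Size([1, 1, 128, 128])
```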
-
Evaluation and Analysis of Ear Recognition Models: Performance, Complexity and Resource Requirements
Žiga Emeršič, Blaž Meden, Vitomir Štruc, Peter Peer: "Evaluation and Analysis of Ear Recognition Models: Performance, Complexity and Resource Requirements", Neural Computing & Applications, 2018.
Ear recognition technology has long been dominated by (local) descriptor-based techniques due to their formidable recognition performance and robustness to various sources of image variability. While deep-learning-based techniques have started to appear in this field only recently, they have already shown potential for further boosting the performance of ear recognition technology and dethroning descriptor-based methods as the current state of the art. However, while recognition performance is often the key factor when selecting recognition models for biometric technology, it is equally important that the behavior of the models is understood and their sensitivity to different covariates is known and well explored. Other factors, such as the train- and test-time complexity or resource requirements, are also paramount and need to be considered when designing recognition systems. To explore these issues, we present in this paper a comprehensive analysis of several descriptor- and deep-learning-based techniques for ear recognition. Our goal is to discover weak points of contemporary techniques, study the characteristics of the existing technology and identify open problems worth exploring in the future. We conduct our analysis through identification experiments on the challenging Annotated Web Ears (AWE) dataset and report our findings. The results of our analysis show that the presence of accessories and high degrees of head movement significantly impact the identification performance of all types of recognition models, whereas mild degrees of the listed factors and other covariates such as gender and ethnicity impact the identification performance only to a limited extent. From a test-time-complexity point of view, the results suggest that lightweight deep models can be as fast as descriptor-based methods given appropriate computing hardware, but require significantly more resources during training, where descriptor-based methods have a clear advantage. As an additional contribution, we also introduce a novel dataset of ear images, called AWE Extended (AWEx), which we collected from the web for the training of the deep models used in our experiments. AWEx contains 4104 images of 346 subjects and represents one of the largest and most challenging (publicly available) datasets of unconstrained ear images at the disposal of the research community.
@article{EarEvaluation2018, title={Evaluation and analysis of ear recognition models: performance, complexity and resource requirements}, author={Emer{\v{s}}i{\v{c}}, {\v{Z}}iga and Meden, Bla{\v{z}} and Peer, Peter and {\v{S}}truc, Vitomir}, journal={Neural Computing and Applications}, pages={1--16}, year={2018}, publisher={Springer} }
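The identification protocol and the notion of test-time cost can be made concrete with a small sketch: rank-1 accuracy computed from cosine distances between probe and gallery descriptors, together with a crude wall-clock measurement of the matching step. The descriptors below are random stand-ins, so the printed numbers bear no relation to the results reported in the paper.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
# Stand-in features: 346 gallery subjects, one probe each, 512-D descriptors.
gallery = rng.normal(size=(346, 512))
probes = gallery + rng.normal(scale=0.5, size=gallery.shape)  # noisy matches
labels = np.arange(346)

start = time.perf_counter()
# Cosine distance between every probe and every gallery template.
g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
p = probes / np.linalg.norm(probes, axis=1, keepdims=True)
dist = 1.0 - p @ g.T
rank1 = (dist.argmin(axis=1) == labels).mean()  # nearest template correct?
elapsed = time.perf_counter() - start

print(f"rank-1 identification accuracy: {rank1:.3f}")
print(f"matching time for {len(probes)} probes: {elapsed * 1000:.1f} ms")
```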
-
Convolutional Encoder-Decoder Networks for Pixel-Wise Ear Detection and Segmentation
Žiga Emeršič, Luka Lan Gabriel, Vitomir Štruc, Peter Peer: "Convolutional Encoder-Decoder Networks for Pixel-Wise Ear Detection and Segmentation", IET Biometrics, 2018.
Object detection and segmentation represent the basis for many tasks in computer and machine vision. In biometric recognition systems the detection of the region-of-interest (ROI) is one of the most crucial steps in the processing pipeline, significantly impacting the performance of the entire recognition system. Existing approaches to ear detection are commonly susceptible to the presence of severe occlusions, ear accessories or variable illumination conditions and often deteriorate in their performance when applied to ear images captured in unconstrained settings. To address these shortcomings, we present a novel ear detection technique based on convolutional encoder-decoder networks (CEDs). We formulate the problem of ear detection as a two-class segmentation problem and design and train a CED-network architecture to distinguish between image-pixels belonging to the ear and the non-ear class. Unlike competing techniques, our approach does not simply return a bounding box around the detected ear, but provides detailed, pixel-wise information about the location of the ears in the image. Experiments on a dataset gathered from the web (a.k.a. in the wild) show that the proposed technique ensures good detection results in the presence of various covariate factors and significantly outperforms competing methods from the literature.
@article{emersic2018convolutional, title={Convolutional encoder--decoder networks for pixel-wise ear detection and segmentation}, author={Emer{\v{s}}i{\v{c}}, {\v{Z}}iga and Gabriel, Luka L and {\v{S}}truc, Vitomir and Peer, Peter}, journal={IET Biometrics}, volume={7}, number={3}, pages={175--184}, year={2018}, publisher={IET} }
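To illustrate what pixel-wise output buys over a plain bounding box, the snippet below post-processes a toy binary ear mask: connected components separate individual detections, and a box can still be derived per component whenever a downstream tool expects one. The mask is hand-crafted stand-in data, not actual network output.

```python
import numpy as np
from scipy import ndimage

# Toy binary ear/non-ear mask, standing in for the output of a pixel-wise
# CED segmenter after thresholding (1 = ear pixel).
mask = np.zeros((8, 12), dtype=int)
mask[1:4, 2:5] = 1    # first ear region
mask[4:7, 8:11] = 1   # second ear region

# Connected-component labelling separates individual ear detections...
labelled, num_ears = ndimage.label(mask)
print(f"detected {num_ears} ear region(s)")

# ...and a bounding box can still be derived per component when needed,
# while the full pixel-wise mask remains available for downstream processing.
for rows, cols in ndimage.find_objects(labelled):
    print(f"box: rows {rows.start}-{rows.stop - 1}, cols {cols.start}-{cols.stop - 1}")
```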
-
Ear Recognition: More Than a Survey
Žiga Emeršič, Vitomir Štruc, Peter Peer: "Ear Recognition: More Than a Survey", Neurocomputing, 2017.
Automatic identity recognition from ear images represents an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes the technology an appealing choice for surveillance and security applications as well as other application domains. Significant contributions have been made in the field over recent years, but open research problems still remain and hinder a wider (commercial) deployment of the technology. This paper presents an overview of the field of automatic ear recognition (from 2D images) and focuses specifically on the most recent, descriptor-based methods proposed in this area. Open challenges are discussed and potential research directions are outlined with the goal of providing the reader with a point of reference for issues worth examining in the future. In addition to a comprehensive review on ear recognition technology, the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques for ear recognition. The dataset and toolbox are meant to address some of the open issues in the field and are made publicly available to the research community.
@article{emersic2017ear, title={Ear recognition: More than a survey}, author={Emer{\v{s}}i{\v{c}}, {\v{Z}}iga and {\v{S}}truc, Vitomir and Peer, Peter}, journal={Neurocomputing}, volume={255}, pages={26--39}, year={2017}, publisher={Elsevier} }
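A minimal sketch of the descriptor-based pipeline the survey covers: a uniform-LBP histogram as the ear descriptor and a chi-square distance for matching. LBP is only one of several descriptors discussed in the paper, and the images here are random noise, so the printed distance demonstrates the computation rather than any meaningful comparison.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(gray: np.ndarray, points: int = 8, radius: float = 1.0) -> np.ndarray:
    """Uniform-LBP histogram, one classic descriptor covered by the survey/toolbox."""
    codes = local_binary_pattern(gray, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2), density=True)
    return hist

rng = np.random.default_rng(0)
ear_a = rng.integers(0, 256, size=(100, 100)).astype(np.uint8)
ear_b = rng.integers(0, 256, size=(100, 100)).astype(np.uint8)

# Chi-square distance between the histograms: smaller = more similar ears.
ha, hb = lbp_descriptor(ear_a), lbp_descriptor(ear_b)
chi2 = 0.5 * np.sum((ha - hb) ** 2 / (ha + hb + 1e-10))
print(f"chi-square distance: {chi2:.4f}")
```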
The AWE Toolbox: to obtain the toolbox, fill in and sign the request form and send it to ziga.emersic@fri.uni-lj.si with the subject "AWE Request: The Toolbox".
-
COM-Ear: a deep constellation model for ear recognition (book chapter)
This chapter introduces COM-Ear, a deep constellation model for ear recognition. Different from competing solutions, COM-Ear encodes global as well as local characteristics of ear images and generates descriptive ear representations that ensure competitive recognition performance. The model is designed as a dual-path convolutional neural network (CNN), where one path processes the input in a holistic manner, and the second captures local image characteristics from image patches sampled from the input image. A novel pooling operation, called patch-relevant-information pooling, is also proposed and integrated into the COM-Ear model. The pooling operation helps to select features from the input patches that are locally important and to focus the attention of the network on image regions that are descriptive and important for representation purposes. The model is trained in an end-to-end manner using a combined cross-entropy and center loss. Extensive experiments on the recently introduced Extended Annotated Web Ears (AWEx) dataset confirm the competitive recognition performance of the model.
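A toy sketch of the dual-path idea follows, assuming a simple attention-style weighting as a stand-in for the patch-relevant-information pooling (whose exact form is not described here): one path encodes the whole image, the other encodes sampled patches, and patch features are pooled by learned relevance weights before concatenation with the holistic feature. All layer sizes and the patch-sampling scheme are assumptions of this sketch, not the published COM-Ear design.

```python
import torch
import torch.nn as nn

class DualPathEarNet(nn.Module):
    """Toy dual-path model in the spirit of COM-Ear: a holistic path encodes
    the whole image, a local path encodes patches, and an attention-style
    pooling (a stand-in for patch-relevant-information pooling) weighs them."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.holistic = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, feat_dim),
        )
        self.local = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, feat_dim),
        )
        self.patch_score = nn.Linear(feat_dim, 1)  # relevance score per patch

    def forward(self, image: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); patches: (B, P, 3, h, w)
        g = self.holistic(image)                              # (B, D)
        b, p = patches.shape[:2]
        l = self.local(patches.flatten(0, 1)).view(b, p, -1)  # (B, P, D)
        w = torch.softmax(self.patch_score(l), dim=1)         # (B, P, 1)
        pooled = (w * l).sum(dim=1)                           # relevance-weighted pooling
        return torch.cat([g, pooled], dim=1)                  # final ear descriptor

net = DualPathEarNet()
desc = net(torch.randn(2, 3, 96, 96), torch.randn(2, 4, 3, 32, 32))
print(desc.shape)  # torch.Size([2, 128])
```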