Work Experience

  • 10/2019 – Present

    Research Scientist

    ObjectVideo Labs / Alarm.com, Remote / Spain

  • 12/2018 – 10/2019

    Chief Technology Officer (CTO)

    Pixcellence Inc., Remote / Spain

  • 10/2018 – 12/2018

    AI Tech Lead

    Pixcellence Inc., Remote / Spain

  • 3/2018 – 4/2018

    Computer Vision Scientist

    Nielsen, Madrid (Spain)

  • 9/2014 – Present

    Researcher, Electronics Department

    University of Alcalá (UAH), Alcalá de Henares (Spain)

Research Experience

  • 11/2018 – Present

    Post-doctoral researcher

    University of Alcalá, RobeSafe Research Group, Department of Electronics

  • 9/2015 – 11/2018

    Ph.D. (awarded cum laude)

    University of Alcalá, RobeSafe Research Group, Department of Electronics

  • 9/2017 – 12/2017

    Researcher (PhD Visit)

    University of California San Diego, California (USA)

  • 9/2016 – 12/2016

    Researcher (PhD Visit)

    NICTA/CSIRO (Data61), Canberra (Australia)

  • 9/2014 – 9/2015

    Researcher

    University of Alcalá, RobeSafe Research Group, Department of Electronics

  • 10/2013 – 9/2014

    Research Assistant

    Fraunhofer IOSB, Karlsruhe (Germany)

Education

  • Ph.D. Nov 2018

    Ph.D. (cum laude) in Deep Learning/Computer Vision applied to Autonomous Vehicles

    University of Alcalá (UAH), Spain

  • M.Sc. July 2015

    Master in Electronics: "Master in Advanced Electronic Systems. Intelligent Systems"

    University of Alcalá (UAH), Spain

  • Erasmus 2013-2014

    Completed the final year and Final Project of my Telecommunications studies

    Karlsruhe Institute of Technology (KIT), Karlsruhe (Germany)

  • B.Sc. + M.Sc. Sept 2014

    5-year degree in Telecommunications Engineering (Ingeniería Superior en Telecomunicaciones)

    University of Alcalá (UAH), Spain

Honors, Awards and Grants

  • Nov 2018
    Ph.D. awarded cum laude
    University of Alcalá (UAH), Madrid, Spain
  • June 2017
    Best Student Paper Award (1st Prize), IV 2017
    IEEE Intelligent Vehicles Symposium (IV 2017)
  • November 2015
    Best Master's Thesis on Intelligent Transportation Systems - Second Prize
    IEEE Intelligent Transportation Systems Society (ITSS), Spanish Chapter
  • July 2015
    Master's Thesis awarded with Honors
    University of Alcalá (UAH), Madrid, Spain
  • March 2015
    4-year "FPI" grant to pursue my Ph.D.
    University of Alcalá (UAH), Madrid, Spain
  • 2013-2014
    Erasmus grant to study in Germany

Publications

Vehicle Detection and Localization using 3D LIDAR Point Cloud and Image Semantic Segmentation

R. Barea, C. Pérez, L.M. Bergasa, E. López, E. Romera, E. Molinos, M. Ocaña and J. López
Conference Papers · IEEE Intelligent Transportation Systems Conference (ITSC), Hawaii, USA, Nov 2018 (accepted paper)

Unifying Terrain Awareness for the Visually Impaired through Real-Time Semantic Segmentation

K. Yang, K. Wang, L.M. Bergasa, E. Romera, W. Hu, D. Sun, J. Sun, R. Cheng, T. Chen and E. Lopez.
Journal Papers · Sensors, 2018

Abstract

Navigational assistance aims to help visually-impaired people move through the environment safely and independently. This topic becomes challenging as it requires detecting a wide variety of scenes to provide higher-level assistive awareness. Vision-based technologies with monocular detectors or depth sensors have sprung up within several years of research. These separate approaches have achieved remarkable results with relatively low processing time and have improved the mobility of impaired people to a large extent. However, running all detectors jointly increases the latency and burdens the computational resources. In this paper, we propose leveraging pixel-wise semantic segmentation to cover navigation-related perception needs in a unified way. This is critical not only for terrain awareness regarding traversable areas, sidewalks, stairs and water hazards, but also for the avoidance of short-range obstacles, fast-approaching pedestrians and vehicles. The core of our unification proposal is a deep architecture aimed at attaining efficient semantic understanding. We have integrated the approach into a wearable navigation system by incorporating robust depth segmentation. A comprehensive set of experiments demonstrates competitive accuracy with respect to state-of-the-art methods while maintaining real-time speed. We also present a closed-loop field test involving real visually-impaired users, demonstrating the effectiveness and versatility of the assistive framework.

Train Here, Deploy There: Robust Segmentation in Unseen Domains

E. Romera, L. M. Bergasa, J. M. Alvarez and M. Trivedi
Conference Papers · IEEE Intelligent Vehicles Symposium (IV), Changshu, China, June 2018

Abstract

Semantic Segmentation methods play a key role in today’s Autonomous Driving research, since they provide a global understanding of the traffic scene for upper-level tasks like navigation. However, most research effort is being put into enlarging deep architectures to achieve marginal accuracy boosts on existing datasets, forgetting that these algorithms must be deployed in a real vehicle with images that were not seen during training. On the other hand, achieving robustness in any domain is not an easy task, since deep networks are prone to overfitting even with thousands of training images. In this paper, we study in a systematic way the gap between the concepts of “accuracy” and “robustness”. A comprehensive set of experiments demonstrates the relevance of using data augmentation to yield models that can produce robust semantic segmentation outputs in any domain. Our results suggest that the existing domain gap can be significantly reduced when appropriate augmentation techniques regarding geometry (position and shape) and texture (color and illumination) are applied. In addition, the proposed training process results in better calibrated models, which is of special relevance to assess the robustness of current systems.
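
As a rough illustration of the kind of augmentation the abstract refers to (not the paper's actual training recipe; the parameter values below are invented), a minimal torchvision-based sketch combining geometric and texture jitter could look like this. For semantic segmentation, the geometric part must also be applied, with the same random parameters, to the label map, which is omitted here for brevity.

```python
# Illustrative sketch only (assumes torchvision); values are made up for the example.
import torchvision.transforms as T

# Geometric jitter: perturbs position and shape.
geometry = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomAffine(degrees=5, translate=(0.05, 0.05), scale=(0.9, 1.1)),
])
# Texture jitter: perturbs color and illumination.
texture = T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.05)

train_transform = T.Compose([geometry, texture, T.ToTensor()])
```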

CNN-based Fisheye Image Real-Time Semantic Segmentation

A. Saez, L.M. Bergasa, E. Romera, E. Lopez, R. Barea
Conference Papers · IEEE Intelligent Vehicles Symposium (IV), Changshu, China, June 2018

Abstract

Semantic segmentation based on Convolutional Neural Networks (CNNs) has proven to be an efficient way of tackling scene understanding for autonomous driving applications. Traditionally, environment information is acquired using narrow-angle pin-hole cameras, but autonomous vehicles need a wider field of view to perceive their complex surroundings, especially in urban traffic scenes. Fisheye cameras have begun to play an increasingly important role in covering this need. This paper presents a real-time CNN-based semantic segmentation solution for urban traffic images from fisheye cameras. We adapt our Efficient Residual Factorized CNN (ERFNet) architecture to handle distorted fisheye images. A new fisheye image dataset for semantic segmentation is generated from the existing Cityscapes dataset to train and evaluate our CNN. We also test a data augmentation technique for fisheye images proposed in [1]. Experiments show outstanding results of our proposal compared with other state-of-the-art methods.
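
As an illustration of how such a fisheye training set could be synthesized from pinhole imagery (the paper's exact projection model is not reproduced here), a minimal OpenCV sketch assuming an equidistant fisheye model and invented focal lengths:

```python
# Illustrative sketch only (assumes OpenCV and NumPy); equidistant model r = f * theta.
import cv2
import numpy as np

def pinhole_to_fisheye(img, f_fish=300.0, f_pin=1000.0):
    """Warp a pinhole image into a fisheye-like image (constants are example values)."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    dx, dy = xs - cx, ys - cy
    r_fish = np.sqrt(dx * dx + dy * dy)
    theta = np.clip(r_fish / f_fish, 0.0, np.pi / 2 - 1e-3)  # ray angle per output pixel
    r_pin = f_pin * np.tan(theta)                            # radius in the source image
    scale = np.where(r_fish > 1e-6, r_pin / np.maximum(r_fish, 1e-6), 1.0)
    map_x = (cx + dx * scale).astype(np.float32)
    map_y = (cy + dy * scale).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)

# Label maps should be warped the same way but with cv2.INTER_NEAREST,
# so that class ids are not blended across object boundaries.
```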

Unifying terrain awareness through real-time semantic segmentation

K. Yang, L.M. Bergasa, E. Romera, R. Cheng, T. Chen and K. Wang
Conference Papers · IEEE Intelligent Vehicles Symposium (IV), Changshu, China, June 2018

Abstract

Active research in computer vision is accelerating progress in autonomous driving. Following this trend, we aim to leverage recently emerged methods for Intelligent Vehicles (IV) and transfer them to develop navigation assistive technologies for the Visually Impaired (VI). This topic is notoriously challenging, as it requires detecting a variety of scenes to provide a higher level of assistance. Computer-vision-based techniques with monocular detectors or depth sensors have sprung up over years of research. These separate approaches have achieved remarkable results with relatively low processing time, and have improved the mobility of visually impaired people to a large extent. However, running all detectors jointly increases the latency and burdens the computational resources. In this paper, we propose leveraging pixel-wise semantic segmentation to cover the perception needs of navigational assistance in a unified way. This is critical not only for terrain awareness regarding traversable areas, sidewalks, stairs and water hazards, but also for the avoidance of short-range obstacles, fast-approaching pedestrians and vehicles. At the heart of our proposal is a combination of the Efficient Residual Factorized Network (ERFNet), the Pyramid Scene Parsing Network (PSPNet) and 3D point-cloud-based segmentation. A comprehensive set of experiments on a wearable navigation system shows that this approach achieves the accuracy and speed required for real-world applications.

ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation

E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo
Journal Papers · IEEE Transactions on Intelligent Transportation Systems (T-ITS), Dec. 2017

Abstract

Semantic segmentation is a challenging task that addresses most of the perception needs of Intelligent Vehicles (IV) in a unified way. Deep Neural Networks excel at this task, as they can be trained end-to-end to accurately classify multiple object categories in an image at pixel level. However, a good trade-off between high quality and computational resources is not yet present in state-of-the-art semantic segmentation approaches, limiting their application in real vehicles. In this paper, we propose a deep architecture that is able to run in real-time while providing accurate semantic segmentation. The core of our architecture is a novel layer that uses residual connections and factorized convolutions in order to remain efficient while retaining remarkable accuracy. Our approach is able to run at over 83 FPS on a single Titan X, and at 7 FPS on a Jetson TX1 (embedded GPU). A comprehensive set of experiments on the publicly available Cityscapes dataset demonstrates that our system achieves an accuracy that is similar to the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision. The resulting trade-off makes our model an ideal approach for scene understanding in IV applications. The code is publicly available at: https://github.com/Eromera/erfnet
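
For readers unfamiliar with factorized convolutions, the sketch below is a minimal PyTorch illustration of a residual block built from 1D (3x1 and 1x3) convolutions in the spirit of the layer described; it is an approximation for exposition, not the exact ERFNet module (the real implementation is in the repository linked above).

```python
# Illustrative sketch only (assumes PyTorch); not the authors' exact layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedResidualBlock(nn.Module):
    """Residual block where each 3x3 convolution is factorized into 3x1 + 1x3."""
    def __init__(self, channels: int, dilation: int = 1, dropout: float = 0.0):
        super().__init__()
        self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.bn1 = nn.BatchNorm2d(channels)
        # Second factorized pair, optionally dilated to enlarge the receptive field.
        self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                   padding=(dilation, 0), dilation=(dilation, 1))
        self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                   padding=(0, dilation), dilation=(1, dilation))
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop = nn.Dropout2d(dropout)

    def forward(self, x):
        out = F.relu(self.conv3x1_1(x))
        out = F.relu(self.bn1(self.conv1x3_1(out)))
        out = F.relu(self.conv3x1_2(out))
        out = self.bn2(self.conv1x3_2(out))
        out = self.drop(out)
        return F.relu(out + x)  # residual connection keeps the input path intact

# Shape check: a 128-channel feature map passes through unchanged in size.
x = torch.randn(1, 128, 64, 128)
y = FactorizedResidualBlock(128, dilation=2)(x)
assert y.shape == x.shape
```

Splitting each 3x3 convolution into a 3x1 followed by a 1x3 pair roughly halves the parameters and multiply-accumulate operations of the block while preserving its receptive field, which is where the efficiency gain comes from.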

Are you ABLE to perform a life-long visual topological localization?

R. Arroyo, P.F. Alcantarilla, L.M. Bergasa and E. Romera.
Journal Papers · Autonomous Robots (AURO), 2017

Abstract

Visual topological localization is a process typically required by a variety of mobile autonomous robots, but it is a complex task if long operating periods are considered. This is because of the appearance variations a place suffers due to dynamic elements, illumination or weather. Due to these problems, long-term visual place recognition across seasons has become a challenge for the robotics community. For this reason, we propose an innovative method for robust and efficient life-long localization using cameras. In this paper, we describe our approach (ABLE), which includes three different versions depending on the type of images: monocular, stereo and panoramic. This distinction makes our proposal more adaptable and effective, because it makes it possible to exploit the extra information that can be provided by each type of camera. In addition, we contribute a novel methodology for identifying places, based on a fast matching of global binary descriptors extracted from sequences of images. The presented results demonstrate the benefits of using ABLE, which is compared to the most representative state-of-the-art algorithms in long-term conditions.
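
As an illustration of the matching idea only (ABLE itself builds on global LDB descriptors and FLANN-based nearest neighbor search), a minimal NumPy sketch of Hamming-distance matching over sequence-level binary codes might look like this; the descriptors here are random stand-ins.

```python
# Illustrative sketch only (assumes NumPy); descriptor extraction is left abstract.
import numpy as np

def sequence_code(binary_descriptors):
    """Concatenate per-image bit-packed descriptors (uint8) into one sequence code."""
    return np.concatenate(binary_descriptors)

def hamming_distance(a, b):
    """Number of differing bits between two packed binary codes of equal length."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def localize(query_code, database):
    """Return the index and distance of the most similar previously seen place."""
    distances = [hamming_distance(query_code, code) for code in database]
    best = int(np.argmin(distances))
    return best, distances[best]

# Toy example with random codes standing in for real (e.g. LDB-based) descriptors.
rng = np.random.default_rng(0)
db = [rng.integers(0, 256, size=128, dtype=np.uint8) for _ in range(100)]
query = db[42].copy()
query[:4] ^= 0xFF  # simulate a mild appearance change (flip 32 bits)
print(localize(query, db))  # -> (42, 32)
```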

Efficient ConvNet for Real-time Semantic Segmentation

E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo
Conference Papers · IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017. Best Student Paper Award

Abstract

Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in a unified way. ConvNets excel at this task, as they can be trained end-to-end to accurately classify multiple object categories in an image at the pixel level. However, current approaches normally involve complex architectures that are expensive in terms of computational resources and are not feasible for ITS applications. In this paper, we propose a deep architecture that is able to run in real-time while providing accurate semantic segmentation. The core of our ConvNet is a novel layer that uses residual connections and factorized convolutions in order to remain highly efficient while still retaining remarkable performance. Our network is able to run at 83 FPS on a single Titan X, and at more than 7 FPS on a Jetson TX1 (embedded GPU). A comprehensive set of experiments demonstrates that our system, trained from scratch on the challenging Cityscapes dataset, achieves a classification performance that is among the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision. This makes our model an ideal approach for scene understanding in intelligent vehicle applications.

A Multi-Sensorial Simultaneous Localization and Mapping (SLAM) System for Low-Cost Micro Aerial Vehicles in GPS-Denied Environments

E. López, S. García, R. Barea, L. M. Bergasa, E. J. Molinos, R. Arroyo, E. Romera and S. Pardo
Journal Papers · Sensors, vol. 17, no. 4, article 802, April 2017

Abstract

One of the main challenges of aerial robot navigation in indoor or GPS-denied environments is position estimation using only the available onboard sensors. This paper presents a Simultaneous Localization and Mapping (SLAM) system that remotely calculates the pose and environment map of different low-cost commercial aerial platforms, whose onboard computing capacity is usually limited. The proposed system adapts to the sensory configuration of the aerial robot by integrating different state-of-the-art SLAM methods based on vision, laser and/or inertial measurements using an Extended Kalman Filter (EKF). To do this, a minimum onboard sensory configuration is assumed, consisting of a monocular camera, an Inertial Measurement Unit (IMU) and an altimeter. This makes it possible to improve the results of well-known monocular visual SLAM methods (LSD-SLAM and ORB-SLAM are tested and compared in this work) by solving the scale ambiguity and providing additional information to the EKF. When payload and computational capabilities permit, a 2D laser sensor can easily be incorporated into the SLAM system, obtaining a local 2.5D map and a footprint estimation of the robot position that improves the 6D pose estimation through the EKF. We present experimental results with two different commercial platforms, and validate the system by applying it to their position control.
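
As a toy illustration of the scale-recovery idea mentioned in the abstract (a 1D example with invented noise values, not the paper's full 6D EKF), assuming NumPy:

```python
# Illustrative sketch only: fuse a scale-ambiguous monocular altitude with an altimeter.
import numpy as np

x = np.array([1.0, 1.0])           # state: [true altitude z, monocular scale s]
P = np.diag([1.0, 1.0])            # state covariance
R_alt, R_vis = 0.05**2, 0.02**2    # measurement noise variances (assumed)

def ekf_update(x, P, z_meas, h, H, R):
    """Generic EKF update for a scalar measurement z_meas with Jacobian H (1x2)."""
    y = z_meas - h(x)                    # innovation
    S = H @ P @ H.T + R                  # innovation covariance (1x1)
    K = (P @ H.T) / S                    # Kalman gain (2x1)
    x = x + (K * y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Altimeter observes the true altitude directly: h(x) = z.
x, P = ekf_update(x, P, 2.0, lambda st: st[0], np.array([[1.0, 0.0]]), R_alt)
# Monocular SLAM reports altitude only up to scale: h(x) = z / s.
z, s = x
H_vis = np.array([[1.0 / s, -z / s**2]])
x, P = ekf_update(x, P, 1.0, lambda st: st[0] / st[1], H_vis, R_vis)
print(x)  # estimates move toward the consistent solution z ≈ 2, s ≈ 2
```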

Fusion and binarization of CNN features for robust topological localization across seasons

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Papers · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4656-4663, Daejeon (Korea), October 2016

Abstract

The extreme variability in the appearance of a place across the four seasons of the year is one of the most challenging problems in life-long visual topological localization for mobile robotic systems and intelligent vehicles. Traditional solutions are typically based on describing images using hand-crafted features, which have proven not to be completely reliable against these seasonal changes. In this paper, we present a new proposal focused on robust, automatically learned features, which are processed by means of a revolutionary concept recently popularized in the computer vision community: Convolutional Neural Networks (CNNs). Commonly, deep learning involves a high consumption of resources and computational costs. Due to this, we contribute our CNN-VTL architecture, adapted to the conditions of our place recognition system with the aim of optimizing efficiency while maintaining effectiveness. The final CNN features are also reduced as much as possible using compression techniques and binarized for fast matching based on the Hamming distance. A wide set of results is discussed, confirming the outstanding performance of our method against the main state-of-the-art algorithms and over varied long-term datasets recorded across seasons.

Need Data for Driver Behaviour Analysis? Presenting the Public UAH-DriveSet

E. Romera, L.M. Bergasa and R. Arroyo
Conference Papers · IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 387-392, Rio de Janeiro (Brazil), November 2016

Abstract

Driving analysis is a recent topic of interest due to growing safety concerns in vehicles. However, the lack of publicly available driving data currently limits progress in this field. Machine learning techniques could greatly enhance research, but they rely on large amounts of data that are difficult and very costly to obtain through Naturalistic Driving Studies (NDSs), resulting in limited accessibility for the general research community. Additionally, the proliferation of smartphones has provided a cheap and easy-to-deploy platform for driver behavior sensing, but existing applications do not provide open access to their data. For these reasons, this paper presents the UAH-DriveSet, a public dataset that allows deep driving analysis by providing a large amount of data captured by our driving monitoring app DriveSafe. The application was run by 6 different drivers and vehicles, performing 3 different behaviors (normal, drowsy and aggressive) on two types of roads (motorway and secondary road), resulting in more than 500 minutes of naturalistic driving with its associated raw data and processed semantic information, together with the video recordings of the trips. This work also introduces a tool that helps to plot the data and display the trip videos simultaneously, in order to ease data analytics. The UAH-DriveSet is available at: http://www.robesafe.com/personal/eduardo.romera/uah-driveset

OpenABLE: An Open-Source Toolbox for Application in Life-Long Visual Localization of Autonomous Vehicles

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Papers · IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 965-970, Rio de Janeiro (Brazil), November 2016

Abstract

Life-long visual localization has been one of the most challenging topics in robotics over the last few years. The difficulty of this task lies in the strong appearance changes that a place suffers due to dynamic elements, illumination, weather or seasons. In this paper, we propose a novel method (ABLE-M) to cope with the main problems of carrying out robust visual topological localization over time. The novelty of our approach resides in describing sequences of monocular images as binary codes, which are extracted from a global LDB descriptor and efficiently matched using FLANN for fast nearest neighbor search. In addition, an illumination invariant technique is applied. The proposed binary description and matching method provides a reduction of memory and computational costs, which is necessary for long-term performance. Our proposal is evaluated in different life-long navigation scenarios, where ABLE-M outperforms some of the main state-of-the-art algorithms, such as WI-SURF, BRIEF-Gist, FAB-MAP or SeqSLAM. Tests are presented for four public datasets where the same route is traversed at different times of day or night, across months or across all four seasons.

Adaptive Fuzzy Classifier to Detect Driving Events from the Inertial Sensors of a Smartphone

C. Arroyo, L. M. Bergasa and E. Romera
Conference Papers · IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 1896-1901, Rio de Janeiro (Brazil), November 2016

Abstract

In recent years there has been rising interest in monitoring driver behavior using smartphones, due to their increasing market penetration. Inertial sensors embedded in these devices are key to carrying out this task. Most state-of-the-art apps use fixed thresholds to detect driving events from the inertial sensors. However, sensor output values can differ depending on many parameters. In this paper we present an Adaptive Fuzzy Classifier to identify sudden driving events (acceleration, steering, braking) and road bumps from the inertial and GPS sensors. An on-line calibration method is proposed to adjust the decision thresholds of the Membership Functions (MFs) to the specific phone pose and vehicle dynamics. To validate our method, we use the UAH-DriveSet database, which includes more than 500 minutes of naturalistic driving, and we compare results with our previous DriveSafe app version, based on fixed thresholds. Results show a notable improvement in event detection compared with our previous version.
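
As a hypothetical illustration of the idea (the membership-function shape, the signal and the thresholds below are invented for the example, not the paper's calibrated ones), a short Python sketch of a fuzzy membership whose breakpoints adapt to observed driving:

```python
# Illustrative sketch only (assumes NumPy); all constants are made up.
import numpy as np

def ramp(x, low, high):
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    return float(np.clip((x - low) / (high - low), 0.0, 1.0))

class AdaptiveBrakingDetector:
    """Degree of 'sudden braking' from longitudinal deceleration (m/s^2)."""
    def __init__(self):
        self.low, self.high = 1.0, 3.0   # default breakpoints until calibrated

    def calibrate(self, normal_decels):
        """Place the breakpoints from decelerations observed in normal driving,
        so they adapt to the specific phone pose and vehicle dynamics."""
        mu = float(np.mean(normal_decels))
        sigma = float(np.std(normal_decels)) or 0.5
        self.low, self.high = mu + 2 * sigma, mu + 4 * sigma

    def membership(self, decel):
        """Degree (0..1) to which `decel` counts as a sudden braking event."""
        return ramp(decel, self.low, self.high)

# Example: calibrate on quiet driving, then score a hard braking sample.
det = AdaptiveBrakingDetector()
det.calibrate([0.30, 0.40, 0.20, 0.50, 0.35])
print(det.membership(0.40), det.membership(2.0))  # -> 0.0 1.0
```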

Can we unify monocular detectors for autonomous driving by using the pixel-wise semantic segmentation of CNNs?

E. Romera, L.M. Bergasa and R. Arroyo
Conference Papers (Workshop) · IEEE Intelligent Vehicles Symposium (IV), Gothenburg (Sweden), June 2016. Workshop: "DeepDriving. Learning Representations for Intelligent Vehicles"

Abstract

Autonomous driving is a challenging topic that requires complex solutions in perception tasks such as recognition of the road, lanes, traffic signs and lights, vehicles and pedestrians. Through years of research, computer vision has grown capable of tackling these tasks with monocular detectors that can provide remarkable detection rates with relatively low processing times. However, the recent appearance of Convolutional Neural Networks (CNNs) has revolutionized the computer vision field and has made it possible to perform full pixel-wise semantic segmentation at close to real-time speeds (even on hardware that can be carried on a vehicle). In this paper, we propose to use full image segmentation as an approach to simplify and unify most of the detection tasks required in the perception module of an autonomous vehicle, analyzing major concerns such as computation time and detection performance.

A Real-time Multi-scale Vehicle Detection and Tracking Approach for Smartphones

E. Romera, L.M. Bergasa and R. Arroyo
Conference Papers · IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 1298-1303, Las Palmas, Canary Islands (Spain), September 2015

Abstract

Automated vehicle detection is a research field in constant evolution due to new technological advances and the safety requirements demanded by current intelligent transportation systems. For these reasons, in this paper we present a vision-based vehicle detection and tracking pipeline that is able to run on an iPhone in real time. An approach based on smartphone cameras offers a versatile solution and an alternative to expensive and complex on-vehicle sensors such as LiDAR or other range-based methods. A multi-scale proposal and simple road-geometry considerations based on the vanishing point are combined to overcome the computational constraints. Our algorithm is tested on a publicly available road dataset, thus demonstrating its real applicability to ADAS or autonomous driving.
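
As a back-of-the-envelope illustration of the vanishing-point geometry alluded to in the abstract (the vehicle and camera heights below are assumptions, not the paper's calibration), under a flat-road pinhole model the expected on-image size of a vehicle is tied to how far below the vanishing point it appears:

```python
# Illustrative sketch only; constants are invented for the example.
def expected_vehicle_height_px(row, vp_row, vehicle_height_m=1.5, camera_height_m=1.2):
    """Approximate pixel height of a vehicle whose bottom edge lies at image row `row`,
    assuming a flat road and a camera whose horizon maps to row `vp_row`."""
    if row <= vp_row:
        return 0.0  # above the horizon: no vehicles on a flat road
    return (row - vp_row) * vehicle_height_m / camera_height_m

# Example: with the vanishing point at row 240, a vehicle whose bottom edge is at
# row 400 should appear roughly (400 - 240) * 1.5 / 1.2 = 200 pixels tall, so only
# detection windows near that scale need to be evaluated around that row.
print(expected_vehicle_height_px(400, 240))  # -> 200.0
```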

Towards Life-Long Visual Localization using an Efficient Matching of Binary Sequences from Images

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Papers · IEEE International Conference on Robotics and Automation (ICRA), pp. 6328-6335, Seattle, Washington (United States), May 2015

Abstract

Life-long visual localization has been one of the most challenging topics in robotics over the last few years. The difficulty of this task lies in the strong appearance changes that a place suffers due to dynamic elements, illumination, weather or seasons. In this paper, we propose a novel method (ABLE-M) to cope with the main problems of carrying out robust visual topological localization over time. The novelty of our approach resides in describing sequences of monocular images as binary codes, which are extracted from a global LDB descriptor and efficiently matched using FLANN for fast nearest neighbor search. In addition, an illumination invariant technique is applied. The proposed binary description and matching method provides a reduction of memory and computational costs, which is necessary for long-term performance. Our proposal is evaluated in different life-long navigation scenarios, where ABLE-M outperforms some of the main state-of-the-art algorithms, such as WI-SURF, BRIEF-Gist, FAB-MAP or SeqSLAM. Tests are presented for four public datasets where the same route is traversed at different times of day or night, across months or across all four seasons.