Multilingual text detection and recognition in images and video

Multilingual text detection and recognition

Hundreds of text detection methods have been proposed, motivated by their widespread use in several applications. Despite the huge progress in the area, including the use of sophisticated learning schemes, ad-hoc post-processing procedures are still often employed to improve text detection results by reducing both false positives and false negatives. Another issue is that the complementary views provided by different text detection methods are rarely exploited. This paper aims to fill these gaps. We propose a soft computing framework, based on genetic programming (GP), that guides the definition of suitable post-processing procedures by combining basic operators, which may be applied to improve the detection results provided by multiple methods at the same time. Experiments on the widely used ICDAR'11, ICDAR'13, and ICDAR'15 datasets demonstrate that our GP-based approach leads to F1 gains of up to 5.1 percentage points when compared with several baselines.
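As a rough illustration of the idea in the abstract, the sketch below evaluates one candidate combination of detector outputs and scores it with F1. It is not the paper's implementation: the pixel-set representation, the operator set (`or`, `and`, `minus`), the detector names `A`, `B`, `C`, and the toy data are all assumptions made for this example, and the evolutionary search itself (selection, crossover, mutation) is omitted.

```python
# Hypothetical illustration: each detector output is a set of (row, col)
# pixels labelled as text. None of these names come from the paper.
OPERATORS = {
    "or": lambda a, b: a | b,      # keep pixels found by either input
    "and": lambda a, b: a & b,     # keep pixels found by both inputs
    "minus": lambda a, b: a - b,   # drop pixels also found by the second input
}


def evaluate(tree, detections):
    """Recursively evaluate an expression tree over detector outputs."""
    if isinstance(tree, str):      # leaf: the name of a detector
        return detections[tree]
    op, left, right = tree
    return OPERATORS[op](evaluate(left, detections),
                         evaluate(right, detections))


def f1_score(predicted, truth):
    """Pixel-level F1, used here as the fitness of a candidate tree."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy outputs of three assumed detectors and the ground truth.
detections = {
    "A": {(0, 0), (0, 1), (1, 1)},
    "B": {(0, 1), (1, 1), (2, 2)},
    "C": {(0, 0), (0, 1), (1, 1), (3, 3)},
}
truth = {(0, 0), (0, 1), (1, 1)}

# One candidate individual: (A OR B) AND C.
candidate = ("and", ("or", "A", "B"), "C")
combined = evaluate(candidate, detections)
fitness = f1_score(combined, truth)
print(sorted(combined), fitness)
```

In a GP setting, many such trees would be generated and evolved, with the F1 score on training images acting as the fitness that guides the search toward useful combinations.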

Jose Luis Flores Campana
Ph.D. in Computer Science

Jose Luis Flores received his B.Sc. in Computer and Software Engineering from the University of San Antonio Abad de Cusco (UNSAAC), Peru, in 2016. As a bachelor’s student, Jose worked on a research paper on the recognition and classification of sign language hand gestures using handcrafted and deep learning techniques. Afterwards, Jose obtained his M.Sc. in Computer Science from the State University of Campinas (Unicamp), Brazil, in 2020. As a master’s student, Jose was part of a team of researchers from SAMSUNG Brasil and UNICAMP, where he worked on two projects: “Multilingual text detection and recognition in images and videos” and “Generation of parallax motion effects”. In 2024, Jose received his Ph.D. from the State University of Campinas (Unicamp), Brazil. As a Ph.D. student, Jose worked on topics such as Image Inpainting and Image Synthesis, focusing on Deep Learning models such as Generative Adversarial Networks and Vision Transformers. His research interests are Machine Learning, Deep Learning, and Image Processing, with a specialization in Text Detection and Recognition in images and videos, Image Inpainting, and Image Synthesis. He currently works as a software engineer at Loggi.