Jose Luis Flores Campana

Ph.D. in Computer Science

University of Campinas

Biography

Jose Luis Flores is a highly accomplished software engineer and researcher specializing in Machine Learning, Deep Learning, and Image Processing. He holds a B.Sc. in Computer and Software Engineering from the University of San Antonio Abad de Cusco (UNSAAC), Peru, where his undergraduate research focused on the recognition and classification of hand gestures in sign language using advanced machine and deep learning techniques. He pursued his M.Sc. in Computer Science at the University of Campinas (Unicamp), Brazil, graduating in 2020. During this time, Jose collaborated with SAMSUNG Brasil and Unicamp on groundbreaking projects, including multilingual text detection and recognition in images and videos, as well as the generation of parallax motion effects. In 2024, Jose completed his Ph.D. at Unicamp, where his research centered on cutting-edge topics like Image Inpainting and Image Synthesis. Leveraging state-of-the-art Deep Learning models such as Generative Adversarial Networks (GANs) and Vision Transformers, he made significant contributions to the fields of computer vision and image processing. Currently, Jose works as a software engineer at Loggi, a leading logistics and technology company. At Loggi, he has contributed to numerous impactful projects spanning backend, frontend, and machine learning domains. His backend expertise includes the development of APIs, microservices, event-driven architectures, and cloud computing solutions using AWS services like S3 and SageMaker. On the frontend, he has developed robust user interfaces with technologies such as React, JavaScript, Node.js, HTML, and TypeScript. Jose’s work is characterized by his innovative approach to solving complex problems and his passion for leveraging technology to drive meaningful impact.

Download my resumé.

Interests

Pattern Recognition
Computer Vision
Image Processing
Image Synthesis
Machine Learning
Deep Learning

Education

Ph.D. in Computer Science, 2024
University of Campinas (IC/Unicamp)
M.Sc. in Computer Science, 2020
University of Campinas (IC/Unicamp)
B.Sc. in Computer Engineering, 2017
University of San Antonio Abad de Cusco (UNSAAC)

Skills

Git

70%

GitHub

70%

Docker

80%

Java

60%

Python

90%

C++

70%

PyTorch

70%

TensorFlow

60%

Keras

60%

SQL Server

70%

Looker

50%

Grafana

70%

Elasticsearch

70%

PostgreSQL

60%

JavaScript

50%

React

60%

Experience

Software Engineer

Loggi

Aug 2021 – Present São Paulo

Responsibilities include:

Designed and implemented a machine learning-based solution to automate damaged package declarations, reducing processing time by 10x compared to manual methods. The application utilized text detection for damage assessment, cloud storage via Amazon S3 for photo management, and was developed using Python and JavaScript.
Built and deployed microservices and APIs for Loggi’s desktop and mobile applications using Python, JavaScript, React, and Node.js. These services enable real-time tracking of packages across various statuses—such as in-progress, delivered, and damaged—helping the operations team optimize decision-making.
Developed an intelligent chatbot to handle common customer inquiries, including tracking missing or in-progress packages. The chatbot improved response times by 5x, significantly enhancing customer satisfaction and operational efficiency.

Researcher

Unicamp

Mar 2020 – Apr 2024 Campinas

Responsibilities included:

Proposed an advanced image inpainting model combining CNNs and transformers to effectively address challenges posed by large missing regions. The model delivered competitive performance compared to state-of-the-art methods.
Developed an innovative variable hyperparameter strategy for transformers, significantly reducing computational complexity. The proposed approach demonstrated a 3x improvement in efficiency over recent methods.
Introduced a novel image inpainting model that leverages auxiliary information from the pencil sketch domain to address structural and textural inconsistencies. Achieved state-of-the-art results on datasets such as CelebA and Paris StreetView, and competitive performance on the Places365 dataset.

Researcher

Unicamp and Samsung Electronics America

Apr 2020 – Jun 2021 Campinas

Responsibilities included:

Developed innovative algorithms using scene representations such as Layered Depth Images (LDI) and Multiplane Images (MPI) for generating parallax motion effects from a single image. Proposed a lightweight scene representation tailored for constrained devices like smartphones, achieving 3% higher efficiency than recent MPI-based methods. This work was published in a peer-reviewed paper.
Implemented and evaluated advanced image inpainting algorithms utilizing GANs and Vision Transformers. Achieved competitive results on the Places2 dataset and outperformed state-of-the-art methods by 2–3% on CelebA and Paris Street View datasets. These results demonstrated the model’s ability to effectively address complex inpainting challenges.

Researcher

Unicamp and Samsung Electronics America

Aug 2018 – Mar 2020 Campinas

Responsibilities included:

Implemented advanced post-processing algorithms to address challenges in text localization methods using Tesseract OCR, achieving a 4% improvement in accuracy.
Developed and evaluated multilingual text localization and recognition algorithms optimized for devices with low computational resources, ensuring efficient and accurate performance in constrained environments.

Researcher

Unicamp

Jun 2018 – Jan 2020 Campinas

Responsibilities included:

Implemented advanced post-processing algorithms to address challenges in text localization methods using Tesseract OCR, achieving a 4% improvement in accuracy.
Developed and evaluated multilingual text localization and recognition algorithms optimized for devices with low computational resources, ensuring efficient and accurate performance in constrained environments.

Researcher

UNSAAC

Jun 2016 – Jun 2017 Cusco

Responsibilities included:

Researched and developed a hand gesture detection and classification system for sign language using CNN-based deep learning, achieving 96% accuracy across different environments in comparisons against handcrafted methods.
Created a new hand gesture dataset with 10,000+ images, incorporating variations like rotation, translation, background changes, and noise, to train and test the hand gesture detection and classification method.

Software Engineer

Brain Systems

Jan 2016 – Jun 2017 Cusco

Responsibilities included:

Designed and developed APIs and services for generating XML files and PDF reports as part of the electronic invoicing project (BS EFACT). Played a key role in establishing one of the first electronic invoicing solutions in Cusco, leveraging efficient SQL procedures with SQL Server to ensure robust and scalable performance.

Projects

Parallax effect motion generation

My team and I worked on a project to generate parallax motion effects from single-view images using depth information, segmentation maps, and inpainting methods.

Multilingual text detection and recognition in images and video

My team and I worked on a project to develop new algorithms for multilingual text detection and recognition in images and videos (including Latin and Korean alphabets) with high accuracy and low computational cost.

Detection and classification of hand gestures based on the sign language using handcrafted and deep learning methods

I developed a project to detect and classify sign language hand gestures using handcrafted and deep learning methods.

Evolving neural networks to play Mega Man X

My group and I used Genetic Algorithms (GA), Artificial Neural Networks (ANN), and NeuroEvolution of Augmenting Topologies (NEAT) to learn to beat the Highway Stage of Mega Man X on the Super Nintendo Entertainment System.

Featured Publications

Jose Luis Flores Campana, Luis Gustavo Lorgus Decker, Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini

February, 2024 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

Image Inpainting on the Sketch-Pencil Domain with Vision Transformers

Image inpainting aims to realistically fill missing regions in images, which requires both structural and textural understanding. Traditionally, methods in the literature have employed Convolutional Neural Networks (CNN), especially Generative Adversarial Networks (GAN), to restore missing regions in a coherent and reliable manner. However, limited receptive fields of the CNNs can sometimes result in unreliable outcomes due to their inability to capture the broader context of the image. Transformer-based models, on the other hand, can learn long-range dependencies through self-attention mechanisms. In order to generate more consistent results, some approaches have further incorporated auxiliary information to guide the understanding of structural information of the model. In this work, we propose a new method for image inpainting that uses sketch-pencil information to guide the restoration of structural, as well as textural elements. Unlike previous works that employ edges, lines, or segmentation maps, we leverage the sketch-pencil domain and the capabilities of Transformers to learn long-range dependencies to properly match structural and textural information, resulting in more consistent results. Experimental results show the effectiveness of our approach, demonstrating either superior or competitive performance when compared to existing methods, especially in scenarios involving complex images and large missing areas.

Jose Luis Flores Campana, Luis Gustavo Lorgus Decker, Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini

May, 2023 Computers & Graphics

Variable-hyperparameter visual transformer for efficient image inpainting

Image inpainting has shown a great evolution in the reconstruction of damaged regions or holes since the advent of deep neural networks. Recently, transformers have been used in the field of computer vision to capture global information about the image, which cannot be done with convolutional neural networks due to the limitation of their local receptive fields. Therefore, the transformer may be essential to achieve realistic results when damaged regions cover a large part of the image. However, the quadratic computational and memory costs in the self-attention layer have led to its prohibited usage in high-resolution images and restricted devices, especially for image inpainting when the method must deal with large masks. To overcome this problem, we propose a variable-hyperparameter visual transformer architecture that (i) subdivides the feature maps into a variable number of multi-scale patches, (ii) distributes the feature map into a variable number of heads to balance the complexity of the self-attention operation, and (iii) includes a new strategy based on depth-wise convolution to reduce the number of channels of the feature map sent to each transformer block. We conduct experiments on three datasets from the literature. Our experiments show that our method consistently achieved the best results for the FID and LPIPS metrics on the CelebA dataset. We obtained competitive results for Places2 and Paris StreetView datasets compared to state-of-the-art methods. Moreover, our model presents the best performance in terms of model size, number of parameters, and FLOPS. Our qualitative results indicate that our proposed method is capable of reconstructing semantic content, such as parts of human faces.
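
The quadratic cost mentioned in this abstract can be illustrated with a little arithmetic. The sketch below is not the paper's implementation: the function name `attention_cost` and the 64×64 feature-map size are assumptions chosen only for illustration. It counts how many patch tokens a feature map yields for a given patch size and how many pairwise scores one self-attention head would then compute.

```python
def attention_cost(height, width, patch_size):
    """Return (num_tokens, num_pairwise_scores) for one self-attention head.

    Hypothetical helper for illustration: the feature map is split into
    non-overlapping square patches, each patch becomes one token, and
    self-attention scores every token against every token.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide both height and width")
    num_tokens = (height // patch_size) * (width // patch_size)
    return num_tokens, num_tokens ** 2


if __name__ == "__main__":
    # Assumed 64x64 feature map; the patch sizes are arbitrary examples.
    for p in (4, 8, 16):
        tokens, scores = attention_cost(64, 64, p)
        print(f"patch {p:>2}: {tokens:>4} tokens, {scores:>6} attention scores")
```

Under these assumptions, halving the patch size quadruples the token count and multiplies the number of attention scores by sixteen, which is the growth that varying the patch size is meant to keep in check.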

Jose Luis Flores Campana, Luis Gustavo Lorgus Decker, Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini

October, 2022 35th Conference on Graphics, Patterns and Images

Multi-Scale Patch Partitioning for Image Inpainting Based on Visual Transformers

Image inpainting is a challenging task that aims to reconstruct missing pixels with semantically coherent content and realistic texture using available information. Modern inpainting works rely on neural networks to generate realistic images. However, due to their limited receptive field in convolution operators, they may produce distorted content when a large region needs to be filled. Recent methods have employed transformers to deal with this problem, but their high computational cost makes it difficult to work with global image information. To address this, we propose a multi-scale patch partitioning strategy to subdivide feature maps into non-overlapping patches and a transformer with a variable number of heads to control the computational cost growth according to the number of patches. Smaller patches enable a broader image coverage, helping to recover structural information, whereas larger patches lead to a reduced computational cost. In contrast to the fixed and small sizes employed in other literature methods, here we explore different patch sizes in the transformer blocks to achieve a good balance between the computational cost and the number of pixel references used in the reconstruction. Extensive experiments on three datasets show that our method achieves very competitive results compared to the state-of-the-art, reaching the best scores in various scenarios, especially for metrics based on human perception. Moreover, our model presented the smallest size. Our qualitative results suggest that the proposed method can reconstruct structural content such as parts of human faces.
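
As a rough illustration of the patch partitioning idea described in this abstract, the sketch below splits a 2-D grid into non-overlapping square patches. It is a simplified stand-in, not the published model: the function name `partition_into_patches` is hypothetical, the grid is a plain list of rows, and the channel dimension of a real feature map is omitted.

```python
def partition_into_patches(feature_map, patch_size):
    """Split a 2-D grid (list of rows) into non-overlapping square patches.

    Returns the patches in row-major order; each patch is a list of rows.
    Illustrative only: real feature maps also carry a channel dimension.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide both height and width")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Take patch_size rows, then patch_size columns from each row.
            patch = [row[left:left + patch_size]
                     for row in feature_map[top:top + patch_size]]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # A 4x4 grid whose cells hold their own row-major index.
    grid = [[r * 4 + c for c in range(4)] for r in range(4)]
    small = partition_into_patches(grid, 2)  # many small patches
    large = partition_into_patches(grid, 4)  # one large patch
    print(len(small), small[0])
    print(len(large))
```

Smaller patches produce more tokens for the transformer to relate to one another, while larger patches produce fewer tokens and therefore a smaller attention computation, matching the trade-off the abstract describes.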

J. L. Flores Campana, A. Pinto, M. Alberto Córdova Neira, L. Gustavo Lorgus Decker, A. Santos, J. S. Conceição, R. da Silva Torres

April, 2020 IEEE Access

On the Fusion of Text Detection Results: A Genetic Programming Approach

Hundreds of text detection methods have been proposed, motivated by their widespread use in several applications. Despite the huge progress in the area, which includes even the use of sophisticated learning schemes, ad-hoc post-processing procedures are often employed to improve the text detection rate, by removing both false positives and negatives. Another issue refers to the lack of the use of the complementary views provided by different text detection methods. This paper aims to fill these gaps. We propose the use of a soft computing framework, based on genetic programming (GP), to guide the definition of suitable post-processing procedures through the combination of basic operators, which may be applied to improve detection results provided by multiple methods at the same time. Performed experiments in the widely used ICDAR 2011, ICDAR 2013, and ICDAR 2015 datasets demonstrate that our GP-based approach leads to F1 effectiveness gains up to 5.1 percentage points, when compared to several baselines.

Recent Publications

Jose Luis Flores Campana, Luis Gustavo Lorgus Decker, Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini (2024). Image Inpainting on the Sketch-Pencil Domain with Vision Transformers. 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.

Jose Luis Flores Campana, Luis Gustavo Lorgus Decker, Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini (2023). Variable-hyperparameter visual transformer for efficient image inpainting. Computers & Graphics.

Jose Luis Flores Campana, Luis Gustavo Lorgus Decker, Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini (2022). Multi-Scale Patch Partitioning for Image Inpainting Based on Visual Transformers. 35th Conference on Graphics, Patterns and Images.

Diogo C. Luvizon, Gustavo Sutter P. Carvalho, Andreza A. dos Santos, Jhonatas S. Conceição, Jose L. Flores-Campana, Luis G. L. Decker, Marcos R. Souza, Helio Pedrini, Antonio Joia, Otavio A. B. Penatti (2021). Adaptive Multiplane Image Generation From a Single Internet Picture. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).

Marcos R. Souza, Jhonatas S. Conceição, Jose L. Flores-Campana, Luis G. L. Decker, Diogo C. Luvizon, Gustavo Sutter P. Carvalho, Helena A. Maia, Helio Pedrini (2021). Pyramidal Layered Scene Inference with Image Outpainting for Monocular View Synthesis. Computer Analysis of Images and Patterns.

A. Pinto, M. A. Córdova, L. G. L. Decker, J. L. Flores Campana, M. R. Souza, A. A. dos Santos, J. S. Conceição, H. F. Gagliardi, D. C. Luvizon, R. d. S. Torres, H. Pedrini (2020). Parallax Motion Effect Generation Through Instance Segmentation And Depth Estimation. 2020 IEEE International Conference on Image Processing (ICIP).

J. L. Flores Campana, A. Pinto, M. Alberto Córdova Neira, L. Gustavo Lorgus Decker, A. Santos, J. S. Conceição, R. da Silva Torres (2020). On the Fusion of Text Detection Results: A Genetic Programming Approach. IEEE Access.

(2020). MobText: A Compact Method for Scene Text Localization. Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP.

M. A. Córdova, L. G. L. Decker, J. L. Flores Campana, A. A. dos Santos, J. S. Conceição, A. Pinto, H. Pedrini, R. da S. Torres (2019). Pelee-Text: A Tiny Convolutional Neural Network for Multi-oriented Scene Text Detection. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA).

Jhonatas Conceição, Allan Pinto, Luis Decker, Jose Luis Campana, Manuel Neira, Andrezza dos Santos, Helio Pedrini, Ricardo Torres (2019). Multi-Lingual Text Localization via Language-Specific Convolutional Neural Networks. Anais Estendidos da XXXII Conference on Graphics, Patterns and Images.

C. Jose L. Flores, A. E. Gladys Cutipa, R. Lauro Enciso (2017). Application of convolutional neural networks for static hand gestures recognition under different invariant features. 2017 IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON).
