Scene text detection has recently received considerable attention because of its complexity, which stems from variations in orientation, font size, aspect ratio, and natural backgrounds. In this vein, several deep neural networks have been proposed to address this challenging problem. However, these networks produce "heavy" models, which hampers their use in applications running on devices with computational constraints. Additionally, few works focus on detecting multi-oriented and/or multi-lingual text. Herein, we propose Pelee-Text, an end-to-end tiny convolutional neural network for multi-oriented, multi-lingual scene text detection. Experimental results show that Pelee-Text is at least 3 times smaller than its counterparts and runs at 2.93 and 18.64 frames per second for its multi-scale and 768-scale versions, respectively. Moreover, our method achieves competitive F-measure results on four well-known datasets: ICDAR'2011 (90.96%), ICDAR'2013 (85.24%), ICDAR'2015 (80.08%), and MSRA-TD500 (80.90%).