Mario Sato



Graduate Student in Computer Science University of Tsukuba

Undergraduate in Business Administration INSPER - Instituto de Ensino e Pesquisa


View My LinkedIn Profile

View My GitHub Profile

My current CV


2024/07 Inception v1 module

The original idea of the Inception module was to apply several kernel sizes in parallel to the same input, so that the network captures features at different scales and learns from all of them.

Inception v1 module optimized: To reduce the computational cost, the authors added 1x1 convolution layers, which reduce the number of channels (dimensions) before the data is fed to the larger convolutions. This technique can potentially reduce the convergence time.
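To make the saving concrete, here is a minimal sketch that counts the parameters of a 5x5 convolution with and without a 1x1 reduction in front of it. The channel sizes (192, 32, 16) and the helper `conv_params` are assumptions chosen for illustration, not values taken from this post.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Hypothetical sizes, chosen only for illustration.
in_ch, out_ch, reduced_ch = 192, 32, 16

# A 5x5 convolution applied directly to all input channels.
direct = conv_params(in_ch, out_ch, 5)

# A 1x1 convolution first shrinks the channels, then the 5x5 runs on fewer channels.
bottleneck = conv_params(in_ch, reduced_ch, 1) + conv_params(reduced_ch, out_ch, 5)

print(direct)      # 153632
print(bottleneck)  # 15920
```

With these assumed sizes, the reduced branch needs roughly a tenth of the parameters of the direct 5x5 convolution.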

Xception model

The Xception model applies a different logic from the one used in Inception. It replaces the complex Inception modules with depthwise separable convolutions, which factorize a standard convolution into a depthwise convolution (a single filter per input channel) followed by a pointwise convolution (a 1x1 convolution that combines the outputs of the depthwise convolution). This reduces the number of parameters and the computational cost.
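The factorization can be checked with a quick weight count. This is a sketch under assumed sizes (128 input channels, 256 output channels, 3x3 kernel); the function names are made up for this example and biases are left out for simplicity.

```python
def standard_conv_weights(in_ch, out_ch, k):
    """Every output channel has its own k x k filter over every input channel."""
    return in_ch * out_ch * k * k


def separable_conv_weights(in_ch, out_ch, k):
    """Depthwise step plus pointwise step of a depthwise separable convolution."""
    depthwise = in_ch * k * k   # one k x k filter per input channel
    pointwise = in_ch * out_ch  # 1x1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
print(standard_conv_weights(128, 256, 3))   # 294912
print(separable_conv_weights(128, 256, 3))  # 33920
```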


2024/07 Global Average Pooling

Currently, instead of using a Dense layer at the end of the CNN architecture, one of the most used techniques is Global Average Pooling. This technique takes the average of each channel's feature map over its spatial dimensions, which produces one value per channel and reduces the number of parameters significantly.


The pooled vector can be connected directly to the output layer.
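As a minimal sketch of the operation, the function below pools a small feature map stored as nested lists in `[channels][height][width]` order. The layout and the sample values are assumptions for illustration; a deep learning framework would do the same on tensors.

```python
def global_average_pooling(feature_maps):
    """Average each channel over all of its spatial positions.

    feature_maps: nested lists shaped [channels][height][width].
    Returns one float per channel.
    """
    pooled = []
    for channel in feature_maps:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two hypothetical 2x2 feature maps.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled = global_average_pooling(feature_maps)
print(pooled)  # [2.5, 2.0]
```

The pooling step itself has no trainable weights, which is where the parameter saving over a Dense layer on the flattened feature maps comes from.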



2024/07 Vim tutorial


2024/07: ResNet: pre-activation

ResNet addressed the vanishing gradient problem by introducing residual learning. Instead of learning a direct mapping from the input to the output, ResNet learns the residual, or difference, between the input and the output. This is implemented with skip connections, which allow the input to bypass one or more layers and be added directly to their output. This helps preserve the gradient and allows much deeper networks to be trained.

The ResNet with Full Pre-Activation has performed better in terms of error reduction than the classical ResNet. One of the biggest changes it brought to the AI community is the reordering of the layers inside the residual block. Before, the residual block followed this order: Convolution layer, Batch Normalization, and Activation Function. With Full Pre-Activation, the order became: Batch Normalization, Activation Function, and Convolution. This approach is referred to as “pre-activation” because the activations are applied before the convolutions. This structure has been found to improve training and generalization in very deep networks.
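The two orderings described above can be sketched with toy layers. This is only an illustration: `batch_norm`, `relu`, and `make_conv` are simplified stand-ins written for this example (the "convolution" here just scales each value), not the layers of a real framework.

```python
import math


def batch_norm(xs, eps=1e-5):
    """Normalize a list of values to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def relu(xs):
    return [max(0.0, x) for x in xs]


def make_conv(weight):
    """Toy stand-in for a convolution: scales every value by a fixed weight."""
    return lambda xs: [weight * x for x in xs]


def original_block(xs, conv1, conv2):
    # Conv -> BN -> ReLU -> Conv -> BN, add the shortcut, then ReLU.
    out = relu(batch_norm(conv1(xs)))
    out = batch_norm(conv2(out))
    return relu([x + o for x, o in zip(xs, out)])


def pre_activation_block(xs, conv1, conv2):
    # BN -> ReLU -> Conv -> BN -> ReLU -> Conv, then add the shortcut.
    out = conv1(relu(batch_norm(xs)))
    out = conv2(relu(batch_norm(out)))
    return [x + o for x, o in zip(xs, out)]


xs = [-1.0, 2.0, 3.0]
zero_conv = make_conv(0.0)  # residual branch that contributes nothing
print(original_block(xs, zero_conv, zero_conv))        # [0.0, 2.0, 3.0]
print(pre_activation_block(xs, zero_conv, zero_conv))  # [-1.0, 2.0, 3.0]
```

With a residual branch that outputs zeros, the pre-activation block passes the input through unchanged, while the original block still applies a ReLU after the addition. That unmodified shortcut path is one commonly cited reason for the easier training of very deep networks.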

2024/07 N-BEATS

(Oreshkin et al., 2019)


2024/07 N-HiTS

(Challu et al., 2022)

N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

2023/10: Talking about @dataclass, @abstractmethod and @property.

@dataclass: With @dataclass, you do not need to write the __init__() method in your class, because it is generated automatically from the declared fields. This makes the class easier to declare.
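A minimal sketch of this, using a hypothetical `Candle` class invented for the example:

```python
from dataclasses import dataclass


@dataclass
class Candle:  # hypothetical example class
    symbol: str
    close: float
    volume: int = 0  # fields with defaults come after fields without


# No __init__ was written; the decorator generated it from the fields.
candle = Candle("ABC", 101.5)
print(candle.close, candle.volume)

# __eq__ and __repr__ are generated as well.
print(candle == Candle("ABC", 101.5))
```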

@abstractmethod: Once you mark a method with @abstractmethod in an abstract base class, any class that inherits from it must implement that method before it can be instantiated.
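A short sketch of that rule. The class names `Model`, `DoubleModel`, and `IncompleteModel` are made up for this example:

```python
from abc import ABC, abstractmethod


class Model(ABC):
    @abstractmethod
    def predict(self, x):
        """Every concrete subclass has to implement this."""


class DoubleModel(Model):
    def predict(self, x):
        return 2 * x


class IncompleteModel(Model):
    pass  # predict() is not implemented


print(DoubleModel().predict(3))  # 6

try:
    IncompleteModel()
except TypeError as error:
    # Instantiating a class with a missing abstract method fails.
    print("cannot instantiate:", error)
```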

@property: It makes getters and setters much easier to use in Object-Oriented Programming.
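For example, a hypothetical `Temperature` class (invented for this sketch) can validate assignments and expose a computed value while still being used like plain attributes:

```python
class Temperature:  # hypothetical example class
    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Read-only value computed from the stored one.
        return self._celsius * 9 / 5 + 32


t = Temperature(100.0)
print(t.fahrenheit)  # 212.0
t.celsius = 0.0      # plain assignment, but the setter validates it
print(t.fahrenheit)  # 32.0
```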

2023/09: Why you need a buffer in your time series data.

A time series data buffer consists of the initial part and the final part of the source data that will be used to train a model.


t0 --(initial buffer)-- t1 --(features and labels)-- t2 --(final buffer)-- t3

The period t0-t1 is the initial part, and the period t2-t3 is the final part. A buffer needs to be considered when the model uses features and labels that are calculated from past or future information. For example, you can use a 180-minute return as one of the features. However, in the very first rows of this feature, the values will be NaN because there is no data from the previous 180 minutes. The initial buffer supplies that history so the feature can be computed for those rows. The same applies to the label: if you use something like a return with a 180-minute lookahead, you are using future data, which is only possible because the dataset has a final buffer.
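The effect can be seen on a tiny series. In this sketch the prices, the 3-step lookback, and the 2-step lookahead are assumptions standing in for the 180-minute windows, and the helper names are made up for the example.

```python
import math


def lookback_return(prices, window):
    """Return over the previous `window` steps; NaN where history is missing."""
    return [
        prices[i] / prices[i - window] - 1 if i >= window else math.nan
        for i in range(len(prices))
    ]


def lookahead_return(prices, horizon):
    """Return over the next `horizon` steps; NaN where the future is missing."""
    n = len(prices)
    return [
        prices[i + horizon] / prices[i] - 1 if i + horizon < n else math.nan
        for i in range(n)
    ]


# Hypothetical prices; window and horizon stand in for the 180-minute periods.
prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0]
window, horizon = 3, 2

features = lookback_return(prices, window)
labels = lookahead_return(prices, horizon)

# Rows where both the feature and the label are defined.
usable = [
    i for i in range(len(prices))
    if not math.isnan(features[i]) and not math.isnan(labels[i])
]
print(usable)  # [3, 4, 5]
```

The first 3 rows and the last 2 rows only exist to make the remaining rows computable, which is the role of the initial and final buffers.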

2023/09: CoAtNet (2021) - Overview.



2023/09: DenseNet (2016) - Overview.



2023/09: What is Data Leakage in Machine Learning (Time series data)

In this post, I explain what data leakage is in time series data, how it can happen, and how you can keep it from happening in your machine learning model. Data leakage typically happens during feature engineering. It consists of introducing into a feature information that is not available at the moment of prediction. For example, let’s say that you want to predict whether the price of a stock will go up or down. You select a feature that contains information from date x+10, while your label reflects the increase or decrease in the stock value at date x+5. You are then using future information that will not be available to you at the moment of prediction. In other words, the feature includes information about the very thing you are trying to predict. Some of the best practices to avoid data leakage are the following:
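A simple check in that spirit, as a minimal sketch: record for every feature how many days after the prediction day x its data comes from, and flag anything with a positive offset. The feature names and the helper `find_leaky_features` are hypothetical and only mirror the x+10 example above.

```python
def find_leaky_features(feature_offsets):
    """Names of features whose data comes from after the prediction day.

    feature_offsets maps a feature name to its offset in days relative to
    the prediction day x (0 or negative = known at prediction time).
    """
    return sorted(name for name, offset in feature_offsets.items() if offset > 0)


# Hypothetical features for a prediction made on day x.
feature_offsets = {
    "return_past_3_days": 0,    # uses data up to day x
    "volume_previous_day": -1,  # uses data from day x-1
    "close_10_days_ahead": 10,  # uses data from day x+10
}

print(find_leaky_features(feature_offsets))  # ['close_10_days_ahead']
```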

2023: Deploy your Object Detection app in Streamlit using YOLOv8 model (COCO dataset)

Github: HERE
DEMO in Streamlit: HERE
This is an object detection project using the YOLOv8 model with the COCO dataset, deployed on Streamlit.

2023: Deep Learning - Cat and Dog classification with pre-trained Resnet50

Github: HERE
Cat and dog classification in a Jupyter Notebook (ipynb) with a pre-trained ResNet50.

2022: Deep Learning - U-Net model applied to the rope detection

Github: HERE
A project that applies semantic segmentation to rope detection.

2022: For and While loop comparison

An experiment comparing For and While loops, with statistical analysis.
Conclusion: the For loop is relatively faster at counting than the While loop.
Check the analysis in the PDF by clicking on the image.
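A comparison of this kind can be reproduced with a small timing script. The sketch below assumes Python and uses `timeit`; it is not the setup from the PDF, and the measured times depend on the machine and interpreter.

```python
import timeit


def count_with_for(n):
    total = 0
    for _ in range(n):
        total += 1
    return total


def count_with_while(n):
    total = 0
    i = 0
    while i < n:
        total += 1
        i += 1
    return total


n = 10_000  # hypothetical workload size
for_seconds = timeit.timeit(lambda: count_with_for(n), number=50)
while_seconds = timeit.timeit(lambda: count_with_while(n), number=50)

print(f"for:   {for_seconds:.4f} s")
print(f"while: {while_seconds:.4f} s")
```

A single run like this is only a rough indication; repeating the measurement many times gives the samples needed for a statistical comparison.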


Page template forked from evanca