Mario Sato




English/Japanese/Portuguese(BR)

Graduate Student in Computer Science, University of Tsukuba

Undergraduate in Business Administration, INSPER - Instituto de Ensino e Pesquisa

contact: mariotsato.wb@gmail.com

View My LinkedIn Profile

View My GitHub Profile

My current CV

Blog:


Cardinality in CNN (ResNeXt)

The definition of cardinality is the size of the set of transformations.

[Image: ResNet block (left) vs. ResNeXt block (right)]

In the illustration above, the architecture on the left represents ResNet, while on the right, you see ResNeXt. Both these networks utilize the split-transform-merge strategy.

This approach initially divides the input into lower dimensions through a 1x1 convolutional layer, then applies transformations using 3x3 convolutional filters, and finally integrates the outputs through a summation operation.

The key aspect of this strategy is that the transformations are derived from the same structural design, facilitating ease of implementation without necessitating specialized architectural modifications. The primary goal of ResNeXt is to effectively manage large input sizes and enhance network accuracy.

This is achieved not by adding more layers, but by increasing the cardinality - the number of parallel paths in the network. This approach effectively boosts performance while keeping complexity relatively low compared to deeper networks.
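Below is a minimal sketch of what a ResNeXt-style block could look like in Keras, using a grouped 3x3 convolution so that the `groups` argument plays the role of the cardinality. The sizes (32 groups, bottleneck width 4) follow the common 32x4d setting and are assumptions, not values from the source above.

```python
from tensorflow.keras import layers

def resnext_block(x, cardinality=32, bottleneck_width=4, out_channels=256):
    # Assumed 32x4d setting: 32 parallel paths, each 4 channels wide
    group_channels = cardinality * bottleneck_width
    shortcut = x

    # Split: 1x1 convolution reduces the dimensions
    y = layers.Conv2D(group_channels, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)

    # Transform: 3x3 grouped convolution, groups = cardinality
    y = layers.Conv2D(group_channels, 3, padding="same",
                      groups=cardinality, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)

    # Merge: 1x1 convolution restores the dimensions, then sum with the shortcut
    y = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)

    if shortcut.shape[-1] != out_channels:
        shortcut = layers.Conv2D(out_channels, 1, use_bias=False)(shortcut)

    return layers.ReLU()(layers.Add()([y, shortcut]))
```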

Source: https://www.ikomia.ai/blog/resnext-cnn-cardinality-efficiency-explained


2024/07 Inception v1 module

The first idea of the Inception module was to apply kernels of different sizes to the same input to capture features at different scales and learn them.

[Image: Inception v1 module]

Inception v1 module optimized: Then, they reduced the computational cost by adding 1x1 convolutional layers, which reduce the number of channels before the inputs reach the larger 3x3 and 5x5 convolutions. This technique can potentially reduce the convergence time.

[Image: Inception v1 module with 1x1 dimension-reduction layers]
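As a rough illustration, here is a sketch of an Inception-v1-style module in Keras with 1x1 reduction layers before the 3x3 and 5x5 branches; the filter counts are illustrative placeholders, not the values from the original paper.

```python
from tensorflow.keras import layers

def inception_module(x, f1=64, f3_reduce=96, f3=128, f5_reduce=16, f5=32, pool_proj=32):
    # Branch 1: plain 1x1 convolution
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)

    # Branch 2: 1x1 reduction, then 3x3 convolution
    b2 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b2)

    # Branch 3: 1x1 reduction, then 5x5 convolution
    b3 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b3)

    # Branch 4: 3x3 max pooling, then 1x1 projection
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(b4)

    # Concatenate all branches along the channel axis
    return layers.Concatenate()([b1, b2, b3, b4])
```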


Xception model

The Xception model applies a different logic from the one used in Inception. It replaces the complex Inception modules with depthwise separable convolutions, which factorize a standard convolution into a depthwise convolution (applying a single filter per input channel) followed by a pointwise convolution (a 1x1 convolution that combines the outputs of the depthwise convolution). This reduces the number of parameters and the computational cost.

[Image: Xception depthwise separable convolution]
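A minimal sketch of the depthwise separable convolution described above, written with explicit DepthwiseConv2D and 1x1 Conv2D layers in Keras (Keras also offers this as a single SeparableConv2D layer); the filter count is an arbitrary example.

```python
from tensorflow.keras import layers

def separable_conv_block(x, out_channels=128):
    # Depthwise convolution: one 3x3 filter per input channel
    y = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(x)
    # Pointwise convolution: 1x1 convolution mixes the channels
    y = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(y)
```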


2024/07 Global Average Pooling

Currently, instead of using a Dense layer as the final layer of a CNN architecture, one of the most used techniques is Global Average Pooling. This technique consists of taking the spatial average of each feature map (channel), which reduces the number of parameters significantly.

[Image: Global Average Pooling over the feature maps]

This can be connected directly to the output layer.

[Image: Global Average Pooling connected directly to the output layer]

model.add(GlobalAveragePooling2D())
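For context, a minimal Keras sketch (with made-up toy shapes and filter counts) showing Global Average Pooling in place of Flatten + a large Dense layer, connected directly to the output layer:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(64, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.Conv2D(128, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),         # averages each feature map: (H, W, 128) -> (128,)
    layers.Dense(10, activation="softmax"),  # connected directly to the output layer
])
```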

2024/07 Vim tutorial

Play: https://vim-adventures.com/


2024/07: ResNet: pre-activation

ResNet addressed the vanishing gradient problem by introducing residual learning. Instead of learning a direct mapping from the input to the output, ResNet learns the residual, or difference, between the input and the output. This is implemented using skip connections, which allow the input to bypass one or more layers and be added directly to the output. This helps preserve the gradient and allows the training of much deeper networks.

Now, the ResNet with full pre-activation has performed better in terms of error reduction than the classical ResNet. One of the biggest changes it brought to the AI community is the reordering of the residual block. Originally, a residual block applied its layers in the order Convolution, Batch Normalization, and Activation Function. Full pre-activation reorders this to Batch Normalization, Activation Function, and then Convolution, which is why it is called "pre-activation": the activations are applied before the convolutions. This structure has been found to improve training and generalization in very deep networks.
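A minimal sketch of a full pre-activation residual block in Keras, with BatchNormalization and the activation applied before each convolution; the filter count is an arbitrary example.

```python
from tensorflow.keras import layers

def preact_residual_block(x, filters=64):
    shortcut = x

    # Pre-activation: BN and ReLU come before each convolution
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)

    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)

    # Skip connection: the input bypasses the layers and is added to the output
    return layers.Add()([shortcut, y])
```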


2024/07 N-Beats

(Oreshkin et al., 2019)

N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting


2024/07 N-Hits

(Challu et al., 2022)

N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting


2023/10: Talking about @dataclass and @property.

@dataclass: By using @dataclass, you do not need to write the __init__() method in your class, because it is generated automatically. This makes the class easier to declare.

@abstractmethod: Once you declare a method with @abstractmethod in a class, any class that inherits from it must implement that method.

@property: It makes the use of getters and setters much easier in object-oriented programming.
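A small sketch combining the three decorators; the class and attribute names here are made up for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Any subclass is forced to implement this method."""

@dataclass
class Rectangle(Shape):
    # @dataclass generates __init__, __repr__ and __eq__ automatically
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

    @property
    def aspect_ratio(self) -> float:
        # Accessed like an attribute, e.g. Rectangle(2, 1).aspect_ratio
        return self.width / self.height
```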


2023/09: Why do you need a buffer in your time series data?

A time series data buffer consists of the initial part and the final part of the source data that will be used to train a model.

Example:

|------|------------------------------|------|
t0     t1    (features and labels)    t2     t3

The period t0-t1 is the initial part, and the period t2-t3 is the final part. A buffer needs to be considered whenever the model uses features or labels that are calculated from past or future information. For example, you might use the 180-minute return as one of the features. In the very first rows, this feature would be NaN because there is no previous 180 minutes of data; the initial buffer supplies that history so the model can use the feature. The same applies to the label: if the label uses 180 minutes of lookahead data, it depends on future information, which is only available because of the final buffer in the dataset.
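A minimal pandas sketch of how the buffers come into play: a 180-minute lookback feature produces NaN in the initial buffer, a 180-minute lookahead label produces NaN in the final buffer, and both buffer regions are dropped before training. The column and function names are illustrative.

```python
import pandas as pd

def build_dataset(prices: pd.Series, lookback: int = 180, lookahead: int = 180) -> pd.DataFrame:
    df = pd.DataFrame({"price": prices})
    # Feature: return over the past `lookback` minutes (NaN in the initial buffer)
    df["ret_180m"] = df["price"].pct_change(lookback)
    # Label: return over the next `lookahead` minutes (NaN in the final buffer)
    df["label_180m_ahead"] = df["price"].shift(-lookahead) / df["price"] - 1
    # Drop the buffer rows: they exist only to supply the past/future information
    return df.dropna()
```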


2023/09: CoAtNet (2021) - Overview.

[Image: CoAtNet architecture overview]

Source: https://paperswithcode.com/paper/coatnet-marrying-convolution-and-attention


2023/09: DenseNet (2016) - Overview.

[Image: DenseNet architecture overview]

Source: https://github.com/liuzhuang13/DenseNet


2023/09: What is Data Leakage in Machine Learning (Time series data)

In this post, I explain what data leakage is in time series data, how it can happen, and how you can avoid it in your machine learning model. Data leakage happens at the feature engineering stage: it is the introduction into a feature of information that is not available at the moment of prediction.

For example, suppose you want to predict whether the price of a stock will go up or down, and you select a feature that contains information from date x+10, while your label reflects the increase or decrease in the stock value at date x+5. You are then using future information that will not be available to you at prediction time; in other words, you are including in the features the very information you are trying to predict. Some of the best practices to avoid data leakage are the following: build features only from information that would already be available at prediction time, split the data chronologically (train on the past, validate and test on the future) instead of shuffling, and fit any preprocessing, such as scalers, on the training period only.
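To make this concrete, here is a small pandas sketch contrasting a leaky feature (built from future information) with a safe one (built only from the past); the column names and numbers are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [100, 101, 103, 102, 105, 107]})

# Label: will the price be higher one step ahead? (uses the future by definition)
df["label_up_next"] = (df["price"].shift(-1) > df["price"]).astype(int)

# LEAKY feature: the return over the *next* two steps is not known at prediction time
df["leaky_future_return"] = df["price"].shift(-2) / df["price"] - 1

# Safe feature: the return over the *past* two steps is known at prediction time
df["past_return"] = df["price"].pct_change(2)
```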


2023: Deploy your Object Detection app in Streamlit using YOLOv8 model (COCO dataset)

Github: HERE
DEMO in Streamlit: HERE
This is an object detection project using the YOLOv8 model with the COCO dataset, deployed in Streamlit.


2023: Deep Learning - Cat and Dog classification with pre-trained Resnet50

Github: HERE
Cat and dog classification in a Jupyter Notebook (ipynb) with a pre-trained ResNet50.


2022: Deep Learning - U-Net model applied to the rope detection

Github: HERE
A project carried out to apply semantic segmentation to rope detection.


2022: For and While loop comparison

PDF: HERE
An experiment with statistical analysis comparing for and while loops.
Conclusion: a for loop is relatively faster for counting than a while loop.
Check the analysis in the PDF by clicking on the image.


Experiences


Page template forked from evanca