Quaternion Temporal Convolutional Networks

Recently, the field of deep learning has made significant advancements, especially in sequence classification. Recent research has shown that traditional recurrent architectures for sequence processing, such as the Long Short-Term Memory (LSTM) cell and the Gated Recurrent Unit (GRU), can be replaced by dilated fully convolutional networks called Temporal Convolutional Networks (TCNs), as shown in Figure 1. Other research has shown that representing deep networks with complex and quaternion numbers tends to improve convergence while using significantly fewer learned parameters. We combine these two observations to create the Quaternion Temporal Convolutional Network (QTCN) and evaluate its performance on sequence classification tasks.
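The building block that lets a TCN replace an LSTM or GRU is the dilated causal 1D convolution: each output depends only on the current and past inputs, and dilation spaces the kernel taps so the receptive field grows exponentially with depth. As a minimal illustrative sketch (a hypothetical helper written for this summary, not the authors' code):

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """Dilated causal 1D convolution, the building block of a TCN.

    x: input sequence, shape (T,)
    w: kernel weights, shape (K,)
    dilation: spacing between kernel taps.

    Output y[t] depends only on x[t], x[t-d], ..., x[t-(K-1)*d],
    so no information from the future leaks into the present.
    """
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            idx = t - k * dilation
            if idx >= 0:  # causal: taps before the sequence start contribute nothing
                y[t] += w[k] * x[idx]
    return y
```

Stacking such layers with dilations 1, 2, 4, ... gives a receptive field that doubles per layer, which is how a TCN covers long histories without recurrence.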

Figure 1. The TCN architecture

  • A new deep learning architecture, the Quaternion Temporal Convolutional Network (QTCN)
  • Exhibits the same general performance as the TCN
  • Reduces the parameter count relative to the traditional TCN
  • Introduces a technique for quaternion weight normalization
  • Explores the performance of 1D quaternion convolutions
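Standard weight normalization reparameterizes a weight vector as w = g · v/‖v‖, decoupling its direction from its magnitude. A natural quaternion analogue, sketched below as an assumption (the poster does not spell out the authors' exact formulation), computes the norm jointly over all four quaternion components so a single learned gain scales the quaternion weight as a whole:

```python
import numpy as np

def quaternion_weight_norm(V, g):
    """Hedged sketch of quaternion weight normalization.

    V: tuple of four real arrays (Vr, Vi, Vj, Vk), the quaternion
       weight's components.
    g: scalar gain.

    The norm is taken jointly over all four components, so the
    normalized quaternion weight has magnitude exactly g.
    """
    Vr, Vi, Vj, Vk = V
    norm = np.sqrt(np.sum(Vr**2 + Vi**2 + Vj**2 + Vk**2))
    return tuple(g * Vc / norm for Vc in (Vr, Vi, Vj, Vk))
```

Normalizing per component instead would be another plausible design; the joint norm shown here treats the quaternion as a single geometric object.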


Figure 2. The difference between CNN and a Quaternion CNN

The architecture was created by converting the residual block that the TCN consists of into a quaternion version. Each layer of a TCN is a residual block that performs a series of operations on the input data and produces a sequence of the same length as the input. By swapping in quaternion convolutions and applying weight normalization to the weights of each quaternion convolution, a quaternion residual block is created, as shown in Figure 3.
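The parameter saving of a quaternion layer comes from the Hamilton product: the input and weights are each split into four components (r, i, j, k), and four real weight matrices are shared across all sixteen component interactions. A minimal sketch of the quaternion product at the heart of such a layer (the function name is my own; a quaternion convolution applies this mixing at every kernel tap):

```python
import numpy as np

def hamilton_product_matmul(W, x):
    """Apply a quaternion weight to a quaternion input via the
    Hamilton product.

    W: tuple of four real matrices (Wr, Wi, Wj, Wk), each (out, in).
    x: tuple of four real vectors (xr, xi, xj, xk), each (in,).

    Returns the four components of the quaternion output. The same
    four matrices are reused for every output component, which is
    where the parameter reduction over a real layer comes from.
    """
    Wr, Wi, Wj, Wk = W
    xr, xi, xj, xk = x
    r = Wr @ xr - Wi @ xi - Wj @ xj - Wk @ xk
    i = Wr @ xi + Wi @ xr + Wj @ xk - Wk @ xj
    j = Wr @ xj - Wi @ xk + Wj @ xr + Wk @ xi
    k = Wr @ xk + Wi @ xj - Wj @ xi + Wk @ xr
    return r, i, j, k
```

A real dense layer mapping 4n inputs to 4m outputs needs 16nm weights; the quaternion layer reuses four n-by-m matrices (4nm weights), a fourfold reduction, which is the same saving the quaternion convolutions bring to each residual block.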

Figure 3. The conversion between the Residual Block from the TCN to QTCN


To evaluate the model, comparisons were made on a variety of tasks, including sequence classification, synthetic memory tasks, and modeling tasks. The QTCN performs very well on classification tasks, roughly on par with the TCN on synthetic memory tasks, and generally worse on modeling tasks, all while using significantly fewer parameters.

*Note on result metrics: an h indicates that a higher result is better, while an l indicates that a lower score is better.

Table 1. Sequence classification task results for TCN and QTCN.


Table 2. Synthetic memory task results for TCN and QTCN.


Table 3. Music modeling task results for TCN and QTCN.


Table 4. Language modeling task results for TCN and QTCN.

Future Work

Future work could investigate mixing quaternion and real-valued layers within a network to see whether performance changes. A comparison to the Quaternion LSTM should also be made. Finally, the existence of a Quaternion TCN suggests that higher-order hypercomplex networks can also be constructed.


Vision Lab, Dr. Vijayan Asari, Director

Kettering Lab
300 College Park
Dayton, Ohio 45469 - 0232