Graph Neural Network for Multivariate Time Series Analytics

Multivariate time series analytics has become increasingly crucial in fields such as economics, finance, cloud computing, energy, IoT, and social networks.
Many multivariate time series involve complex interactions across the temporal domain (such as lags in the propagation of effects) and between variables (such as the relationships among variables representing neighboring traffic sensors).
GNNs can be used for various multivariate time series analytics tasks, such as forecasting: by treating time points or variables as nodes and their relationships as edges, a graph-structured model can effectively learn the intricacies of these relationships.
However, most GNNs that capture temporal correlations in the time domain depend on a well-defined graph structure for information propagation, which means they cannot be directly applied to multivariate series where the dependencies are unknown.
Some methods learn graph structures directly from data, end-to-end with the task. The process typically optimizes the graph structure & model parameters simultaneously during training. An exciting step beyond heuristic methods!
One such approach is discussed in the paper Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks arxiv.org/pdf/2005.11650.pdf
They propose a graph learning module that learns hidden spatial dependencies among variables. The model architecture interleaves graph convolutions with temporal convolutions to capture spatial and temporal dependencies, respectively.
The graph learning layer learns sparse, unidirectional relationships from learnable node embeddings: it scores pairs of nodes with an asymmetric similarity and keeps only the top-k strongest connections per node.
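Roughly, in code (a minimal PyTorch sketch of that idea; the class name GraphLearner and the defaults for k and alpha are illustrative, not the authors' implementation):

import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    # Sketch of the graph learning layer: builds a sparse, directed
    # adjacency matrix from learnable node embeddings.
    def __init__(self, num_nodes, emb_dim, k=20, alpha=3.0):
        super().__init__()
        self.emb1 = nn.Embedding(num_nodes, emb_dim)  # "source" role embeddings
        self.emb2 = nn.Embedding(num_nodes, emb_dim)  # "target" role embeddings
        self.lin1 = nn.Linear(emb_dim, emb_dim)
        self.lin2 = nn.Linear(emb_dim, emb_dim)
        self.k = k          # neighbors kept per node (controls sparsity)
        self.alpha = alpha  # saturation rate of tanh

    def forward(self, idx):
        m1 = torch.tanh(self.alpha * self.lin1(self.emb1(idx)))
        m2 = torch.tanh(self.alpha * self.lin2(self.emb2(idx)))
        # antisymmetric score -> unidirectional (one-way) relationships
        a = torch.relu(torch.tanh(self.alpha * (m1 @ m2.T - m2 @ m1.T)))
        # keep only the top-k strongest edges per node, zero out the rest
        mask = torch.zeros_like(a)
        mask.scatter_(1, a.topk(self.k, dim=1).indices, 1.0)
        return a * mask

adj = GraphLearner(num_nodes=137, emb_dim=40)(torch.arange(137))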
For the graph convolution module, they used the mix-hop propagation layer to fuse the node's information with its neighbors' information.
Exploring the Mix-Hop Propagation Layer! It's a two-step process: information propagation and selection, enabling efficient use of local and broader neighborhood information in node representations.
The propagation step recursively spreads node features to neighboring nodes along the graph, retaining a portion of the root node's features at each step. This approach mitigates the over-smoothing issue common in standard graph convolutions.
The selection step passes the propagated features from each hop through their own linear layer and sums the results. This selects the useful features at each 'hop,' effectively balancing local and broader neighborhood information.
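A minimal PyTorch sketch of both steps, assuming the propagation rule h_k = beta * x + (1 - beta) * A_norm @ h_{k-1} described in the paper (class and argument names are mine):

import torch
import torch.nn as nn

class MixHopPropagation(nn.Module):
    # Sketch: propagate features K hops along the graph while retaining
    # part of the root node's features, then select per-hop features.
    def __init__(self, in_dim, out_dim, depth=2, retain=0.05):
        super().__init__()
        self.depth = depth    # number of hops K
        self.retain = retain  # beta: fraction of root features kept per step
        # one "selection" weight matrix per hop (including hop 0)
        self.select = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(depth + 1)])

    def forward(self, x, adj):
        # x: [num_nodes, in_dim], adj: [num_nodes, num_nodes]
        adj = adj + torch.eye(adj.size(0))         # add self-loops
        adj = adj / adj.sum(dim=1, keepdim=True)   # row-normalize by degree
        h, out = x, self.select[0](x)
        for k in range(1, self.depth + 1):
            # information propagation with root retention (fights over-smoothing)
            h = self.retain * x + (1 - self.retain) * (adj @ h)
            # information selection: pick the useful features from this hop
            out = out + self.select[k](h)
        return out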
Mix-hop propagation is argued to be more parameter-efficient than previous works: the per-hop weight matrices in the selection step can represent differences between consecutive hops, avoiding the need to concatenate hop outputs.
The retain ratio is key, controlling the proportion of the root node's features kept at each step. Balancing this hyperparameter is crucial to avoid constraining the model to local info or causing over-smoothing.
The depth of propagation matters too! How many hops the features spread out determines the context from a wider neighborhood. Too deep leads to over-smoothing, but the right depth gathers relevant context. Typically, 2-3 hops work well.
Let's delve into the Temporal Convolution Module - a mechanism designed to capture temporal patterns and long-term dependencies in time series data using 1D convolutional filters.
The module comprises stacked dilated inception layers that apply 1D filters of different sizes (1x2, 1x3, 1x6, 1x7); dilation lets each filter skip over inputs at a fixed interval, which helps expand the receptive field exponentially as layers stack.
By using multiple filter sizes, the model can capture temporal patterns at different scales. The paper suggests the 1x2, 1x3, 1x6, 1x7 filter set outperforms 1x1, 1x3, 1x5 at covering common periodicities in time series.
The module harnesses dilated convolution, handling long sequences without requiring extremely deep networks or large filters. It achieves rapid receptive field expansion with exponential increase in dilation rate.
Merging inception and dilated convolution creates a temporal module capturing multi-scale temporal patterns and efficiently modeling long sequences - a great boon for time series data.
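A rough PyTorch sketch of a single dilated inception layer (the structure follows the paper's description; channel counts and the truncate-to-shortest-branch handling are my assumptions):

import torch
import torch.nn as nn

class DilatedInception(nn.Module):
    # Sketch: parallel 1D convolutions with kernel sizes 2, 3, 6, 7
    # sharing one dilation factor, outputs concatenated on channels.
    def __init__(self, in_ch, out_ch, dilation=1, kernels=(2, 3, 6, 7)):
        super().__init__()
        assert out_ch % len(kernels) == 0
        self.convs = nn.ModuleList(
            [nn.Conv1d(in_ch, out_ch // len(kernels), k, dilation=dilation)
             for k in kernels])

    def forward(self, x):
        # x: [batch, channels, time]; each branch shortens the sequence
        # differently, so truncate every output to the shortest one
        outs = [conv(x) for conv in self.convs]
        min_len = min(o.size(-1) for o in outs)
        return torch.cat([o[..., -min_len:] for o in outs], dim=1)

# stacking layers with dilation 1, 2, 4, ... grows the receptive field
# exponentially with depth
y = DilatedInception(in_ch=16, out_ch=32, dilation=2)(torch.randn(8, 16, 64))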
The module proves more efficient and flexible than RNNs or standard CNNs at capturing complex temporal dependencies. Its multi-scale filtering and sequence modeling capabilities are particularly adept for time series data.
Proposed learning algorithm: - Curriculum learning: starts the model on the easiest 1-step prediction task, then gradually increases to full multi-step prediction. Enables a good initialization (see the sketch after this list).
- Graph sampling: splits nodes randomly into groups per batch. Reduces the time & memory complexity of graph learning from O(N^2) to O(N^2/s), where s is the number of groups.
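A toy training-loop skeleton illustrating both tricks (the schedule, epoch count, and group count are made-up values, not the paper's settings):

import torch

num_nodes, max_horizon, groups = 207, 12, 3  # illustrative values

for epoch in range(50):
    # curriculum learning: grow the prediction horizon every few epochs,
    # starting from the easiest 1-step task
    horizon = min(max_horizon, 1 + epoch // 5)
    # graph sampling: re-split the nodes randomly into `groups` subsets,
    # so graph learning in each batch runs over N/s nodes instead of N
    for node_idx in torch.randperm(num_nodes).chunk(groups):
        ...  # forward/backward on the sub-graph over `node_idx`,
             # predicting only the first `horizon` steps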
Model Performance/Results: MTGNN delivers state-of-the-art results on 3 out of 4 single-step forecasting benchmarks, outshining both statistical and deep learning models. It demonstrates its prowess, even when learning the graph.
The authors also performed ablation studies showing mix-hop's strength in incorporating neighborhood information and the limitations of standard graph convolutions. The temporal module proves effective in modeling multiple periodicities and long sequences.
MTGNN stands out as it jointly learns graph structure and spatiotemporal model in an end-to-end manner. It proves its applicability even to data without explicit graph information. #AI #ML #GNN #EndToEndLearning #timeseries #forecasting

Mohit Nihalani🛸

@MohitNihalani9

Data Scientist @ Bloomberg