# Feature Extraction for Time Series Analysis: A Practical Guide

Time series data presents unique challenges and opportunities in data science.

When I embarked on my journey in machine learning, my passion for physics initially drew me in—a rather unconventional motivation. My academic pursuits revealed a deep appreciation for coding and data science. At that time, the nature of the data didn’t concern me; my primary goal was to immerse myself in programming and produce extensive lines of code daily.

However, irrespective of initial preferences, career paths tend to guide you toward certain data types. For instance, working at SpaceX likely involves substantial signal processing, whereas a position at Netflix would probably engage you with natural language processing (NLP) and recommendation systems. If you found yourself at Tesla, your focus would likely shift towards computer vision and image-related tasks.

During my early career as a physicist, and later through my PhD in engineering, I was quickly introduced to the realm of signal processing. This is the essence of engineering: whenever you collect data and derive insights, you are essentially dealing with signals. It’s important to note that signals are not exclusive to engineering; finance also heavily relies on time series data, such as stock price fluctuations.

The key takeaway here is: time series data is distinct. Many transformations and processing techniques applicable to tabular data or images may not hold the same relevance for time series. Take feature extraction, for example.

The concept of **feature extraction** involves refining the data to identify and retain essential characteristics that can enhance subsequent machine learning processes. In essence, it serves to optimize the input for machine learning by prioritizing significant features while discarding irrelevant ones.

The complete feature extraction process is illustrated below:

When we differentiate between feature extraction methods for tabular data versus signal data, we recognize that they operate in entirely different contexts.

For instance, concepts such as **peaks** and **valleys**, the **Fourier Transform**, and **Wavelet Transform** only gain significance in the context of signals. My intention in outlining these points is to emphasize that a specific suite of feature extraction techniques is uniquely suited for signal data.

Broadly, feature extraction methods can be categorized into two groups:

- **Data-driven methods:** These techniques derive features by analyzing the signals themselves, without reference to the machine learning objective, such as classification or forecasting.
- **Model-based methods:** These approaches take a holistic view, seeking to identify features tailored to the specific problem at hand.

Data-driven methods typically offer computational simplicity and do not rely on target outputs. However, their drawback is that the features they produce may lack specificity for your particular task. For example, applying a Fourier Transform to a signal might yield general features that are not as effective as those specifically learned through an end-to-end model.

In this article, we will concentrate on **data-driven methods**. Specifically, we will explore **domain-specific**, **frequency-based**, **time-based**, and **statistical-based** methods. Let's dive in!

## 1. Domain-Specific Feature Extraction

The first method I’ll discuss is intentionally somewhat vague. The optimal way to extract features often hinges on the particular problem you are tackling. For example, if you are analyzing a signal from an engineering experiment and need to focus on the **amplitude after t = 6s**, that information is crucial for your analysis, even if it may not seem significant in a broader context.

## 2. Frequency-Based Feature Extraction

### 2.1 Explanation

This technique pertains to the **spectral analysis** of your time series or signal. The most straightforward way to analyze a signal is within the **time domain**, where we consider it as a value at a given moment.

To illustrate, let’s examine a signal in its natural domain: when we visualize it, we simply see its value plotted against time.

This represents the simplest form of our dataset. We can transform this into the **frequency domain**, where we decompose the signal into its periodic components—frequencies, amplitudes, and phases.

The **Fourier Transform** Y(k) of the signal y(t) is expressed as follows (in its discrete form, for a signal sampled at N points):

Y(k) = \sum_{n=0}^{N-1} y(n) \, e^{-2\pi i \, k n / N}, \quad k = 0, \dots, N-1

This illustrates the amplitude and phase for each frequency component k. For feature extraction, we can derive the amplitudes, phases, and frequency values of the top 10 components (those with the highest amplitudes), yielding 30 features (10 each for amplitude, frequency, and phase).

Moreover, we can extend this method by utilizing **wavelets** instead of sines or cosines, leading to what is known as **Wavelet Decomposition**.

Understanding this material can be complex, so let’s proceed with some coding to demonstrate its application.

### 2.2 Code

Let's implement the basic **Fourier Transform** in practice.

First, we need to import the necessary libraries:

```python
import numpy as np
import matplotlib.pyplot as plt
```

Next, let’s consider this signal as our example:

```python
t = np.linspace(0, 1, 1000)
y = np.sin(2 * np.pi * 1 * t) + 0.4 * np.sin(2 * np.pi * 2 * t) + 2 * np.sin(2 * np.pi * 3.2 * t)
```

This signal comprises three primary components: one with amplitude = 1 and frequency = 1, one with amplitude = 0.4 and frequency = 2, and one with amplitude = 2 and frequency = 3.2. We can recover these through the Fourier Transform:

```python
from scipy.fft import fft

y_f = fft(y)
frequencies = np.fft.fftfreq(len(y), d=t[1] - t[0])
```

Plotting the amplitude spectrum against `frequencies`, we observe three distinct peaks corresponding to the components’ frequencies and amplitudes.
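Since the spectrum plot itself isn’t reproduced here, a quick numerical check can stand in for it. The snippet below is a minimal sketch that re-creates the arrays from above so it runs on its own; note that the strongest peak lands near, but not exactly at, 3.2 Hz, because that frequency does not fall on a frequency bin:

```python
import numpy as np

# Re-create the example signal from above
t = np.linspace(0, 1, 1000)
y = np.sin(2 * np.pi * 1 * t) + 0.4 * np.sin(2 * np.pi * 2 * t) + 2 * np.sin(2 * np.pi * 3.2 * t)

y_f = np.fft.fft(y)
frequencies = np.fft.fftfreq(len(y), d=t[1] - t[0])

# Single-sided amplitude spectrum: positive frequencies only
pos = frequencies > 0
amps = 2 * np.abs(y_f[pos]) / len(y)
freqs = frequencies[pos]

# The strongest component should sit near 3.2 Hz with amplitude close to 2
# (slightly less than 2 due to spectral leakage)
dominant = freqs[np.argmax(amps)]
print(dominant)
```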

While elaborate plotting isn’t necessary here, we could certainly create a simple function for this task:

```python
def extract_features(signal, time_array=None, num_features=10, max_frequency=None):
    """Return frequencies, amplitudes, and phases of the strongest FFT components."""
    dt = time_array[1] - time_array[0] if time_array is not None else 1.0
    spectrum, freqs = np.fft.fft(signal), np.fft.fftfreq(len(signal), d=dt)
    mask = (freqs > 0) if max_frequency is None else (freqs > 0) & (freqs <= max_frequency)
    amps, phases = 2 * np.abs(spectrum[mask]) / len(signal), np.angle(spectrum[mask])
    top = np.argsort(amps)[::-1][:num_features]  # strongest components first
    return freqs[mask][top], amps[mask][top], phases[mask][top]
```

This function allows you to input the signal y and, optionally, the time array, the number of peaks to consider, and the maximum frequency to explore.

If we aim to extract features using **wavelets**, we would need to install the following library:

```shell
pip install PyWavelets
```

Then we would execute the wavelet transform.
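As a rough sketch of what that looks like, the snippet below runs a multilevel wavelet decomposition on the example signal and turns it into a small feature vector. The `db4` wavelet and `level=4` are arbitrary choices for illustration, and band energy is just one of many possible summaries of the coefficients:

```python
import numpy as np
import pywt

# Example signal (same as above)
t = np.linspace(0, 1, 1000)
y = np.sin(2 * np.pi * 1 * t) + 0.4 * np.sin(2 * np.pi * 2 * t) + 2 * np.sin(2 * np.pi * 3.2 * t)

# Multilevel wavelet decomposition: one approximation band + four detail bands
coeffs = pywt.wavedec(y, 'db4', level=4)

# One simple feature vector: the energy of each coefficient band
features = [float(np.sum(c ** 2)) for c in coeffs]
print(len(features))
```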

> *Note: I delve into wavelets in detail in another article; feel free to check it out for more insights.*

## 3. Statistical-Based Feature Extraction

### 3.1 Explanation

Another viable approach for feature extraction is leveraging traditional **statistical methods**. There are numerous statistical metrics that can be computed from a signal, ranging from simple to complex:

- The **mean**: simply the sum of the signal divided by its number of time steps.
- The **variance**: how much the signal deviates from the mean.
- **Skewness** and **kurtosis**: assess the non-Gaussian character of the signal’s distribution. Skewness measures asymmetry, while kurtosis evaluates “tailedness.”
- **Quantiles**: values that segment the time series into intervals defined by probability ranges.
- **Autocorrelation**: quantifies how patterned the time series is, indicating how the current values relate to past values.
- **Entropy**: reflects the complexity or unpredictability of the time series.

In 2024, each of these properties can be easily computed with just a single line of code.
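To make that concrete, here is a minimal sketch that collects these statistics into a feature dictionary. The random signal is a stand-in for your own time series, and the entropy line uses one common recipe (binning the signal into a histogram first, with a +1 to avoid empty bins) rather than the only possible definition:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(size=1000)  # stand-in signal; replace with your own series

features = {
    "mean": float(np.mean(y)),
    "variance": float(np.var(y)),
    "skewness": float(stats.skew(y)),
    "kurtosis": float(stats.kurtosis(y)),
    "quantile_25": float(np.quantile(y, 0.25)),
    # Lag-1 autocorrelation: correlation of the series with itself shifted by one step
    "autocorr_lag1": float(np.corrcoef(y[:-1], y[1:])[0, 1]),
    # Shannon entropy of the binned value distribution
    "entropy": float(stats.entropy(np.histogram(y, bins=20)[0] + 1)),
}
print(sorted(features))
```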

## 4. Time-Based Feature Extraction

### 4.1 Explanation

In this section, we focus on extracting time features, particularly peaks and valleys within the signal. We will utilize the **find_peaks** function from SciPy for this purpose.

The function accepts numerous parameters that refine peak detection, such as the expected width, threshold, or plateau size. If such constraints are known (e.g., you may only want to consider peaks with an amplitude greater than 2), set them accordingly; otherwise, the default settings usually suffice.

We also have the flexibility to determine how many peaks/features we want to extract. For instance, if we choose N = 10, we will identify the 10 largest peaks and valleys, resulting in 20 features (10 locations and 10 amplitudes).

### 4.2 Code

The implementation for this is straightforward:

```python
from scipy.signal import find_peaks

peaks, _ = find_peaks(y, height=2)
```

Be mindful that if you specify N = 10 peaks, but only have M = 4 peaks in your signal, the remaining 6 locations and amplitudes will default to 0.
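That padding logic can be sketched as follows. `peak_features` is a hypothetical helper (not part of SciPy) that returns the locations and heights of the N largest peaks, zero-padded to a fixed length so the feature vector always has the same size:

```python
import numpy as np
from scipy.signal import find_peaks

def peak_features(signal, n_peaks=10):
    """Locations and heights of the n largest peaks, zero-padded to fixed length."""
    peaks, _ = find_peaks(signal)
    order = np.argsort(signal[peaks])[::-1][:n_peaks]  # largest peaks first
    locs = np.zeros(n_peaks)
    heights = np.zeros(n_peaks)
    locs[:len(order)] = peaks[order]
    heights[:len(order)] = signal[peaks][order]
    return np.concatenate([locs, heights])  # 2 * n_peaks features in total

t = np.linspace(0, 1, 1000)
y_demo = np.sin(2 * np.pi * 3 * t)  # only 3 peaks, so 7 slots stay zero-padded
print(peak_features(y_demo, n_peaks=10).shape)
```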

## 5. Which Method Should You Use?

Having explored four distinct classes of methods, you may wonder which one is most suitable for your analysis.

While I could take the diplomatic route and say, "It depends on the problem," it is indeed true that it often does.

In reality, if you have the opportunity for **domain-based feature extraction**, that should be your primary focus. When the underlying physics of an experiment or prior knowledge of the issue is clear, these features should be prioritized, and sometimes even considered as the only relevant ones. However, it’s also common not to have domain-based features at your disposal, and that’s perfectly acceptable.

Regarding **frequency**, **statistical**, and **time-based features**, I recommend integrating them all. Add these features to your dataset and assess their effectiveness—whether they aid, hinder, or confuse your machine learning model.

## 6. Conclusions

I appreciate your time and attention as we recap the key points covered in this article:

- We introduced the concept of **feature extraction**, emphasizing its significance and the need for techniques specific to time series analysis.
- We clarified the distinction between **model-based** and **data-driven** feature extraction methods, focusing on the latter in this discussion.
- We explored **domain-based feature extraction techniques**, which are tailored to the specific problem at hand.
- We examined **spectral techniques** that utilize the Fourier/frequency spectrum of signals.
- We covered **statistical techniques** that derive metrics like mean, standard deviation, entropy, and autocorrelation from signals.
- We looked into **time-based techniques**, which focus on extracting peak information from signals.
- We provided guidance on selecting the appropriate technique for your particular situation.

## 7. About Me

Thank you once again for your engagement; it truly means a lot!

My name is Piero Paialunga, and I am currently pursuing a Ph.D. in Aerospace Engineering at the University of Cincinnati while also working as a Machine Learning Engineer for Gen Nine. I write about AI and Machine Learning on my blog and LinkedIn. If you enjoyed this article and wish to learn more about machine learning, feel free to:

- Connect with me on **LinkedIn** for updates and insights.
- Subscribe to my **newsletter** for new stories and the opportunity to reach out with questions.
- Become a **referred member** to access unlimited articles from myself and many other leading writers in Machine Learning and Data Science.
- Interested in collaborating? Check out my rates and projects on **Upwork**!

For inquiries or potential collaborations, you can reach me at: