Classifying Time-Series Faults with PyTorch
Predictive maintenance and fault detection are critical in industrial applications to prevent severe equipment damage, reduce costly downtime, and ensure worker safety. This blog post explores how to preprocess time-series data and train a Multilayer Perceptron (MLP) using PyTorch to classify faults efficiently. For this project, we used the MAFAULDA dataset, which is composed of multivariate time series acquired by sensors on a Machinery Fault Simulator (MFS). All the code presented in this post is accessible in my GitHub repository.
These are the steps covered in the project:
- Exploratory Data Analysis (EDA)
- Preprocessing/Feature Engineering
- Downsampling & Rolling Mean
- Data Transformation & Visualization (t-SNE)
- Building a Multi-Layer Perceptron (MLP) with PyTorch
- Custom Dataset Class
- Model Architecture & Training
- Evaluation Metrics (Accuracy, F1-score, Precision, Recall, AUC-ROC)
Let’s dive in!
1. Exploratory Data Analysis (EDA)
Understanding the Data
We are dealing with vibration and microphone signals recorded from a motor simulator under two conditions:
1. Normal
2. Imbalance (faulty)
Other fault types are available on the MAFAULDA website, but we only consider the normal and imbalance (6g) datasets. After downloading the data, we have this folder structure:
data
├── imbalance
│ └── imbalance
│ ├── 10g
│ ├── 15g
│ ├── 20g
│ ├── 25g
│ ├── 30g
│ ├── 35g
│ └── 6g
└── normal
└── normal
Each labeled folder above contains multiple recordings stored as CSV files. Each recording is about 5 seconds at a 50 kHz sampling rate, resulting in 250,000 samples per sensor. The dataset includes eight features:
tachometer
underhang_axial
underhang_radiale
underhang_tangential
overhang_axial
overhang_radiale
overhang_tangential
microphone
The first step is to get a sense of data distributions, missing values, basic statistics, etc. In EDA.ipynb, we use the ydata-profiling package for a quick review of what the data looks like:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ydata_profiling import ProfileReport

# Loading a CSV file for normal condition
col_names = [
    'tachometer', 'underhang_axial', 'underhang_radiale', 'underhang_tangential',
    'overhang_axial', 'overhang_radiale', 'overhang_tangential', 'microphone'
]
normal_df = pd.read_csv("path/to/some_normal.csv", names=col_names)

# Generate HTML report (ydata-profiling)
profile = ProfileReport(normal_df, title="Normal Data")
profile.to_file("normal_data_report.html")
The generated report normal_data_report.html reveals distributions and correlations, which helps us spot features that might be redundant or highly correlated. For instance, you might notice that microphone and underhang_radiale are highly correlated, which could inform feature selection later. We also noticed that the features are approximately normally distributed, which can lead to smoother convergence when training models with gradient descent. Even so, standardizing features (zero mean, unit variance) remains an important preprocessing step.
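To check the flagged pair directly, here is a minimal sketch using pandas' built-in corr() on the normal_df loaded above:

# Pairwise Pearson correlation between the two features flagged by the report
print(normal_df[['microphone', 'underhang_radiale']].corr())

# Full correlation matrix, useful for spotting other redundant features
print(normal_df.corr())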
Visualizing Time-Series
To visualize how the signal data changes over time, we can create quick plots of 50,000 samples (a 1-second snippet) from each class:
def plot_timeseries(df, columns, n_samples=50000):
    plt.figure(figsize=(12, len(columns)))
    for i, col in enumerate(columns, 1):
        plt.subplot(len(columns), 1, i)
        plt.plot(df[col].values[:n_samples])
        plt.title(f"Time Series of {col}")
    plt.tight_layout()
    plt.show()

plot_timeseries(normal_df, columns=normal_df.columns)
2. Feature Engineering
High-frequency time-series data can be noisy and large, so feature engineering becomes crucial. Two primary transformations were used in FeatureEngineering.ipynb and ultimately in the main pipeline:
- Downsampling: Consolidates raw data by averaging every b samples. In the repository, b=2500 was used, reducing 250,000 samples per file to just 100 samples.
- Rolling Mean: Applies a moving average (with a given window size) to smooth abrupt fluctuations and incorporate temporal context into each feature.
In FeatureEngineering.ipynb, you’ll see:
def downSampler(data, b):
    """
    Downsamples the given DataFrame by averaging every 'b' rows.
    """
    return data.groupby(data.index // b).mean().reset_index(drop=True)

def rolling_mean_data(df, window_size=100, columns=None):
    """
    Applies a rolling mean transformation to specified columns.
    """
    if columns is None:
        columns = df.select_dtypes(include=[np.number]).columns
    df_copy = df.copy()
    df_copy[columns] = df_copy[columns].rolling(window=window_size, min_periods=1).mean()
    return df_copy

# Usage:
normal_df = downSampler(normal_df, 2500)
normal_df = rolling_mean_data(normal_df, window_size=100)
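As a quick sanity check of the averaging logic, here is a toy example (hypothetical values, not from the dataset):

toy = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6]})
print(downSampler(toy, 3))  # each group of 3 rows is averaged -> rows become [2.0, 5.0]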
After applying both transformations, the behavior of each feature over time is shown below:
t-SNE Visualization for Feature Separability
After the feature transformations, we want to test whether the classes (normal vs. imbalance) are more distinguishable. We can check this visually with dimensionality reduction methods such as Principal Component Analysis (PCA) for linear structure or t-distributed Stochastic Neighbor Embedding (t-SNE) for non-linear structure. Here we apply t-SNE, since it can capture non-linear relationships.
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

def plot_tsne(df, label_column='label'):
    features = df.select_dtypes(include=[np.number]).drop(columns=[label_column], errors='ignore')
    features_scaled = StandardScaler().fit_transform(features)

    tsne = TSNE(n_components=2, perplexity=30, random_state=42)
    tsne_results = tsne.fit_transform(features_scaled)

    # Plot
    plt.figure(figsize=(8, 6))
    plt.scatter(tsne_results[:, 0], tsne_results[:, 1], c=df[label_column], cmap="viridis", alpha=0.7)
    plt.title("t-SNE Visualization")
    plt.show()
The resulting t-SNE plot clearly shows that the two classes can be distinguished visually. This gives us confidence that there is a non-linear transformation capable of producing a rule that correctly classifies this binary time-series dataset.
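If you want a number to back up the visual impression, one option (not part of the original notebook) is a silhouette score on the 2-D embedding; values near 1 indicate well-separated clusters. This sketch assumes plot_tsne is modified to return tsne_results:

from sklearn.metrics import silhouette_score

embedding = plot_tsne(all_data, label_column='label')  # assumes the function returns tsne_results
print(f"Silhouette score on the 2-D embedding: {silhouette_score(embedding, all_data['label']):.3f}")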
3. Building & Training a Multi-Layer Perceptron (MLP) in PyTorch
More sophisticated sequence models (e.g., LSTMs, 1-D CNNs) are generally better suited to raw time-series data, while a Multi-Layer Perceptron is a basic architecture that does not intrinsically model temporal dependencies. Since our feature engineering already injects basic temporal context into the features, an MLP may be sufficient to classify this dataset correctly. In general, if sequential dependencies matter, alternatives like RNNs, LSTMs, 1-D CNNs, or Transformers may perform better, whereas an MLP is effective for feature-based classification once the raw time series has been converted into useful representations through feature engineering.
3.1 The Dataset & DataLoader
In PyTorch, we create a custom Dataset class to handle how features and labels are fed to the model:
import torch
from torch.utils.data import Dataset, DataLoader

class MachineryDataset(Dataset):
    def __init__(self, data, label_column='label'):
        self.labels = data[label_column].values.astype(np.float32)
        self.features = data.drop(columns=[label_column, 'time'], errors='ignore').values.astype(np.float32)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        x = self.features[idx]
        y = self.labels[idx]
        return x, y
- __getitem__: returns a single (features, label) sample.
- __len__: returns the total number of samples in the dataset.
We also build a DataLoader object that batches the data and shuffles it during training:
train_dataset = MachineryDataset(all_data, label_column='label')
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
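A quick way to confirm the loader emits what the model expects (a sanity-check sketch, not part of the original code):

xb, yb = next(iter(train_loader))   # fetch one batch
print(xb.shape, xb.dtype)           # e.g. torch.Size([64, 8]) torch.float32
print(yb.shape, yb.dtype)           # e.g. torch.Size([64]) torch.float32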
3.2 MLP Model Architecture
A simple feed-forward neural network can be built using fully connected layers (nn.Linear):
import torch.nn as nn

class TimeSeriesMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, n_layers, dropout_prob=0.3):
        super(TimeSeriesMLP, self).__init__()
        layers = []
        layers.append(nn.Linear(input_dim, hidden_dim))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(dropout_prob))
        for _ in range(n_layers - 1):
            layers.append(nn.Linear(hidden_dim, hidden_dim))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_prob))
        layers.append(nn.Linear(hidden_dim, 1))
        layers.append(nn.Sigmoid())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
- Hidden Layers and Activation: The model consists of n_layers hidden layers, each applying a linear transformation (nn.Linear) followed by a ReLU activation (nn.ReLU()). ReLU (Rectified Linear Unit) is used because it helps mitigate the vanishing gradient problem and accelerates training.
- Dropout Regularization: A dropout layer (nn.Dropout(dropout_prob)) is applied after each hidden layer to reduce overfitting. Dropout randomly disables a fraction of neurons during training, forcing the model to learn more robust features.
- Output Layer and Activation: The final layer maps the last hidden representation to a single output neuron using nn.Linear(hidden_dim, 1). A sigmoid activation (nn.Sigmoid()) ensures the output lies in the range [0, 1], making it suitable for binary classification.
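For instance, instantiating a small model and counting its trainable parameters (the dimensions here are illustrative, not the repository's defaults):

model = TimeSeriesMLP(input_dim=8, hidden_dim=16, n_layers=2)
print(model)  # shows the nn.Sequential stack

n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params}")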
3.3 Training Loop
The train_model function in src/main.py trains the MLP using a binary classification approach, tracking both loss and accuracy.
import torch
import torch.optim as optim

def train_model(model, train_loader, val_loader, epochs=50, lr=0.0005):
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    history = {
        'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []
    }

    for epoch in range(epochs):
        model.train()  # Enable training mode
        epoch_train_loss = 0
        correct_train = 0
        total_train = 0

        for inputs, targets in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs).squeeze()   # Forward pass
            loss = criterion(outputs, targets)  # Compute loss
            loss.backward()                     # Backpropagation
            optimizer.step()                    # Update weights

            epoch_train_loss += loss.item()
            preds = (outputs > 0.5).float()     # Convert probabilities to binary predictions
            correct_train += (preds == targets).sum().item()
            total_train += targets.size(0)

        train_accuracy = correct_train / total_train  # Compute training accuracy

        model.eval()  # Enable evaluation mode
        epoch_val_loss = 0
        correct_val = 0
        total_val = 0

        with torch.no_grad():
            for inputs, targets in val_loader:
                outputs = model(inputs).squeeze()
                val_loss = criterion(outputs, targets)
                epoch_val_loss += val_loss.item()

                preds = (outputs > 0.5).float()
                correct_val += (preds == targets).sum().item()
                total_val += targets.size(0)

        val_accuracy = correct_val / total_val  # Compute validation accuracy

        # Store metrics for analysis
        history['train_loss'].append(epoch_train_loss / len(train_loader))
        history['val_loss'].append(epoch_val_loss / len(val_loader))
        history['train_acc'].append(train_accuracy)
        history['val_acc'].append(val_accuracy)

        print(f"Epoch {epoch+1}/{epochs}, Validation Loss: {epoch_val_loss / len(val_loader):.4f}, Validation Accuracy: {val_accuracy:.4f}")

    return history
- Binary Cross-Entropy Loss (nn.BCELoss()): suitable for binary classification, where the target labels are 0 or 1.
- Adam Optimizer (optim.Adam()): adaptive learning rate for better convergence.
- Accuracy Tracking: uses thresholding (outputs > 0.5) to determine class predictions, then compares predictions to the true labels (targets) to compute accuracy.
- Training & Validation Loss History: logs loss and accuracy at each epoch for performance monitoring.
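Since train_model returns the history dict, here is a minimal sketch for visualizing the learning curves (assuming matplotlib, imported earlier):

def plot_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    ax1.plot(history['train_loss'], label='train')
    ax1.plot(history['val_loss'], label='val')
    ax1.set_title('Loss per epoch')
    ax1.legend()
    ax2.plot(history['train_acc'], label='train')
    ax2.plot(history['val_acc'], label='val')
    ax2.set_title('Accuracy per epoch')
    ax2.legend()
    plt.tight_layout()
    plt.show()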
4. Putting It All Together
Finally, in src/main.py we orchestrate the entire workflow:
1. Data Ingestion & Labeling
2. Feature Engineering (Downsampling, Rolling Mean, StandardScaler)
3. Splitting (Time-series split into train/val/test sets)
4. MLP Training
5. Evaluation: Accuracy, F1, Precision, Recall, AUC-ROC
Below is a condensed snippet showing the pipeline’s main logic:
# Load normal and imbalance data
normal_dfs = load_filtered_dfs(data_path, "normal")
imbalance_dfs = load_filtered_dfs(data_path, "imbalance-6g")

# Apply augmentation (downsampling + rolling) to each DF, then concat
normal_df = pd.concat([augment_features(df) for df in normal_dfs], ignore_index=True)
imbalance_df = pd.concat([augment_features(df) for df in imbalance_dfs], ignore_index=True)

# Label the data
normal_df["label"] = 0
imbalance_df["label"] = 1

all_data = pd.concat([normal_df, imbalance_df], ignore_index=True)

# Show correlation matrix
save_correlation_matrix(all_data)

# t-SNE visualization
plot_tsne(all_data, label_column='label', output_file="../figures/tsne_visualization.png")

# Time-series split (train/val/test)
train_data, val_data, test_data = time_series_split(all_data)

# Normalize features
scaler = StandardScaler()
train_data.iloc[:, :-1] = scaler.fit_transform(train_data.iloc[:, :-1])
val_data.iloc[:, :-1] = scaler.transform(val_data.iloc[:, :-1])
test_data.iloc[:, :-1] = scaler.transform(test_data.iloc[:, :-1])

# Datasets & Loaders
train_dataset = MachineryDataset(train_data)
val_dataset = MachineryDataset(val_data)
test_dataset = MachineryDataset(test_data)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Initialize and train MLP
model = TimeSeriesMLP(
    input_dim=train_dataset.features.shape[1],
    hidden_dim=3,
    n_layers=2
)
history = train_model(model, train_loader, val_loader)  # default epochs=50

# Evaluate
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
test_metrics = evaluate_model(model, test_loader)
plot_evaluation_results(test_metrics, output_file="../figures/evaluation_plot.png")
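The helpers load_filtered_dfs, augment_features, time_series_split, evaluate_model, and plot_evaluation_results live in the repository. To illustrate the splitting idea only, a chronological split could look like this (a hypothetical sketch; the repository's implementation may differ):

def time_series_split(df, train_frac=0.7, val_frac=0.15):
    # Chronological split: no shuffling, so later samples never leak into training
    n = len(df)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]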
Evaluation & Metrics
After training, we apply the model to the test set. For binary classification, we typically measure:
- Accuracy: Ratio of correct predictions over total.
- F1-score: Harmonic mean of precision & recall.
- Precision: Among predicted positives, how many are truly positive?
- Recall: Among all actual positives, how many did we predict correctly?
- AUC-ROC: Area under the ROC curve, summarizing the true-positive/false-positive trade-off across all thresholds.
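For reference, here is a hypothetical sketch of what evaluate_model could compute with scikit-learn (the repository's version may differ):

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score

def evaluate_model(model, test_loader):
    model.eval()
    all_probs, all_targets = [], []
    with torch.no_grad():
        for inputs, targets in test_loader:
            all_probs.extend(model(inputs).squeeze(-1).tolist())
            all_targets.extend(targets.tolist())
    preds = [1.0 if p > 0.5 else 0.0 for p in all_probs]  # same 0.5 threshold as training
    return {
        'accuracy': accuracy_score(all_targets, preds),
        'f1': f1_score(all_targets, preds),
        'precision': precision_score(all_targets, preds),
        'recall': recall_score(all_targets, preds),
        'auc_roc': roc_auc_score(all_targets, all_probs),
    }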
The model achieves excellent scores across these metrics, which suggests that even a relatively straightforward MLP can separate the normal and imbalance classes well, thanks to the feature engineering (downsampling + rolling mean).
5. Next Steps & Enhancements
- Hyperparameter Optimization
  - Test different hidden layer sizes, dropout probabilities, and learning rates.
  - Consider grid search or Bayesian optimization for an automated approach.
- Include More Fault Conditions
  - The MAFAULDA dataset has multiple fault types (imbalance, misalignment, bearing faults). Extending beyond binary normal vs. imbalance classification adds realism.
- Sequence Models
  - For a deeper time-series approach, experiment with 1-D CNNs or LSTM architectures; these are better at capturing sequential dependencies without relying only on rolling averages.
- Real-Time Inference
  - Deploy the trained model in a streaming or edge environment for real-time fault detection in industrial settings.
Conclusion
We’ve walked through the basic steps of a complete pipeline for classifying mechanical faults using time-series data. The key lessons include:
- EDA is indispensable for quickly assessing data quality and distributions.
- Feature Engineering (downsampling, rolling means) can convert raw time series into useful representations for training deep learning models that do not inherently model temporal dependencies.
- Even a basic MLP can achieve high accuracy if the features reflect the underlying process well.
- Evaluation metrics (Accuracy, F1, Precision, Recall) are critical to understand true performance.
If you’re looking to adapt this pipeline to your own fault classification tasks—whether it’s rotating machinery, bearings, or other mechanical equipment—these concepts should be straightforward to customize. Feel free to explore the MAFAULDA dataset for your own experiments or extend it with advanced deep learning architectures.
Thanks for reading, and happy fault detecting!