Modeling, learning, and leveraging functional data in smart and connected systems

Jinwei Yao

doi:10.25820/etd.007996

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Spring 2025

Files and links (1)

University_of_Iowa_Thesis_Jinwei_Re (pdf, 4.75 MB)

Free to read and download, Open Access

Abstract

In smart and connected systems, functional data is becoming increasingly prevalent due to advancements in sensing technologies. The uniqueness of functional data is that it provides continuous and smooth measurements. For example, temperature readings in a roasting process cycle or heart rate measurements from wearable health devices are inherently time-varying, serving as functional data. Compared with scalar data, functional data exhibits intrinsic correlations in temporal and/or spatial domains, allowing for the description of patterns and trends embedded within it. When modeled correctly, these patterns can provide valuable insights for anomaly detection, performance optimization, and smart decision-making.

Although functional data offers many promising benefits, fully leveraging these benefits in the context of smart and connected systems is a challenging task. This is because smart and connected systems typically generate multi-stream functional data with heterogeneous features, which requires modeling not only within-system correlation but also between-system correlation. As a result, the key to modeling, learning, and leveraging multi-stream functional data in smart and connected systems is to correctly understand the within- and between-system relationships. In practice, these relationships are highly relevant to the data acquisition processes. In general, there are two typical ways functional data is acquired: 1) online functional data, which refers to real-time data collection through sensors, and 2) offline functional data, which refers to historical data collection over an extended period. Although both cases generate a large amount of functional data, the practical challenges raised in each case are different. When multi-stream functional data is collected online, it usually faces challenges due to the conflict between massive real-time data and limited processing/storage capability. For example, in semiconductor manufacturing, thousands of sensors or sensing systems are installed to collect process information, generating terabytes of data in real time. It is indeed impossible to record all the data for in-situ analysis, but an efficient strategy is still needed to guide in-situ monitoring based on the massive functional data. On the other hand, when a large amount of functional data has already been collected, the data analysis is usually performed offline with sufficient computational power. In this case, the research focus is usually on how to leverage the massive amount of already collected functional data to benefit the understanding of new, yet data-scarce, systems, i.e., transfer learning.

Nevertheless, there are few transfer learning works applicable to multi-stream functional data in smart and connected systems. The critical research gap arises from the lack of modeling components that differentiate within- and between-system correlations. For example, when modeling multiple systems with multiple functional streams in each system, existing methods focus on modeling either the correlation among multiple functional streams within a system or the general correlation among multiple systems. There is a lack of a comprehensive framework for leveraging multi-system and multi-functional stream information to benefit the learning of a specific functional stream in another system.

These challenges indeed originate from engineering problems in smart and connected systems. This thesis focuses on real engineering applications facing challenges in modeling, learning, and leveraging multi-stream functional data, with a focus on online process monitoring and offline functional prediction.

Adaptive resource allocation for online process monitoring

To address the conflict between massive real-time data and limited processing and sensing resources, an intuitive approach is to allocate the limited resources to the most informative data streams. In the context of online process monitoring for multi-stream functional data, the goal is to detect anomalies that occur at any time in any of the streams as quickly as possible. Here, the 'informative stream' refers to data streams containing anomalies. Consequently, if the limited processing and sensing resources can focus on and continuously monitor these streams with anomalies, online process monitoring can achieve satisfactory results, even if most streams are not recorded or analyzed.

Adaptive sampling for multi-profile data. When dealing with 1D (input) multi-stream functional data, i.e., multi-profile data, the critical challenge in adaptively selecting and sampling 'informative' streams is finding an index that represents both within- and between-stream correlations, as well as the anomaly information when the functional data deviates from normal conditions. We leverage multivariate functional principal component analysis (MFPCA) to define such an index. Specifically, the multi-profile functional data is modeled by a summation of products between eigenfunctions and MFPC scores, with the MFPC scores used as the monitoring index. This index is fed into a multivariate CUSUM control chart to implement the monitoring. We also demonstrate key theoretical properties of the proposed monitoring strategy.

Adaptive sampling for partially observed image data. The proposed method can also be applied to 2D (input) multi-stream functional data, i.e., multiple images. However, directly applying it to the 2D case requires a significant computational load when constructing the 2D MFPCA. Furthermore, the monitoring performance depends on the accuracy of parameter estimation in the training stage, but there is a lack of effective estimators in 2D MFPCA that guarantee consistency. To address these issues, we apply practical strategies to reduce the computational demands of 2D MFPCA. We also demonstrate the asymptotic consistency of 2D MFPCA estimators, which justifies using MFPC scores as monitoring statistics. Thanks to these investigations, the online process monitoring of 2D multi-stream functional data inherits the properties of monitoring performance obtained in the 1D case.

Transfer learning for individualized prediction

The focus of transfer learning for individualized prediction is to leverage multi-system and multi-functional stream information to enhance the learning of a specific functional stream in a data-scarce system. The critical challenge in this task is differentiating function-to-function correlations within a system and system-to-system correlations between systems. Meanwhile, computational complexity is another concern, as transfer learning usually deals with a large amount of data in the source domain. As a result, it requires a comprehensive, flexible, and computationally efficient correlation structure to model the within- and between-system relationships.

Transfer learning of stochastic Kriging for individualized prediction. The multi-output stochastic Kriging (SK) is investigated to model within- and between-system correlation. Specifically, a tailored covariance matrix is designed to model both types of correlations. This matrix incorporates a sub-matrix into the system-to-system covariance matrix to facilitate the modeling of function-to-function correlations. We also provide a theoretical analysis to ensure that the designed matrix and its embedded correlation components achieve superior performance in transfer learning, asymptotically.

Transferred Neural Processes for Individualized Prediction. To further enhance time efficiency and modeling flexibility, we propose the transfer neural process for individualized prediction, which incorporates transfer learning into Neural Processes (NPs) for individualized prediction. The proposed method leverages both within-system information and cross-system information through feature sharing with a joint Neural Network (NN) structure to achieve transfer learning. The transferred information is exclusively directed toward the individual function of interest, with a specially designed structure to compensate for the impact of data scarcity on NP. The statistical properties of the proposed method are analyzed, providing theoretical guarantees for its predictive performance.

The proposed methods are suitable for a range of engineering applications involving both online and offline multi-stream functional data. They address emerging challenges in the modeling, learning, and application of functional data within smart and connected systems.

Regression

Data Scarcity

Functional Data

Individualized Prediction

Online Monitoring

Transfer Learning

Details

Title: Modeling, learning, and leveraging functional data in smart and connected systems
Creators: Jinwei Yao
Contributors: Chao Wang (Advisor)
Andrew Kusiak (Committee Member)
Yong Chen (Committee Member)
Xin Zan (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Industrial Engineering
Date degree season: Spring 2025
DOI: 10.25820/etd.007996
Publisher: University of Iowa
Number of pages: xvii, 199 pages
Language: English
Date submitted: 04/29/2025
Description illustrations: illustrations (some color)
Description bibliographic: Includes bibliographical references (pages 183-199).
Public Abstract (ETD): In smart and connected systems, functional data is becoming increasingly prevalent due to advancements in sensing technologies. The uniqueness of functional data is that it provides continuous and smooth measurements. For example, temperature readings in a roasting process cycle or heart rate measurements from wearable health devices are inherently time-varying, serving as functional data. Compared with scalar data, functional data exhibits intrinsic correlations in temporal and/or spatial domains, allowing for the description of patterns and trends embedded within it. When modeled correctly, these patterns can provide valuable insights for anomaly detection, performance optimization, and smart decision-making.

Although functional data offers many promising benefits, fully leveraging these benefits in the context of smart and connected systems is a challenging task. This is because smart and connected systems typically generate multi-stream functional data with heterogeneous features, which requires modeling not only within-system correlation but also between-system correlation. As a result, the key to modeling, learning, and leveraging multi-stream functional data in smart and connected systems is to correctly understand the within- and between-system relationships. In practice, these relationships are highly relevant to the data acquisition processes. In general, there are two typical ways the functional data is acquired: 1) online functional data, which refers to real-time data collection through sensors, and 2) offline functional data, which refers to historical data collection over an extended period. Although both cases generate a large amount of functional data, the practical challenges raised in each case are different. When multi-stream functional data is collected online, it usually faces challenges due to the conflict between massive real-time data and limited processing/storage capability. For example, in semiconductor manufacturing, thousands of sensors or sensing systems are installed to collect process information, generating terabytes of data in real time. It is indeed impossible to record all the data for in-situ analysis, but an efficient strategy is still needed to guide in-situ monitoring based on the massive functional data. On the other hand, when a large amount of functional data has already been collected, the data analysis is usually performed offline with sufficient computational power. In this case, the research focus is usually on how to leverage the massive amount of already collected functional data to benefit the understanding of new, yet data-scarce, systems, i.e., transfer learning. 
Nevertheless, there are few transfer learning works applicable to multi-stream functional data in smart and connected systems. The critical research gap arises from the lack of modeling components that differentiate within- and between-system correlations. For example, when modeling multiple systems with multiple functional streams in each system, existing methods focus on modeling either the correlation among multiple functional streams within a system or the general correlation among multiple systems. There is a lack of a comprehensive framework for leveraging multi-system and multi-functional stream information to benefit the learning of a specific functional stream in another system.

These challenges indeed originate from engineering problems in smart and connected systems. This thesis focuses on real engineering applications facing challenges in modeling, learning, and leveraging multi-stream functional data, with a focus on online process monitoring and offline functional prediction.

Adaptive resource allocation for online process monitoring

To address the conflict between massive real-time data and limited processing and sensing resources, an intuitive approach is to allocate the limited resources to the most informative data streams. In the context of online process monitoring for multi-stream functional data, the goal is to detect anomalies that occur at any time in any of the streams as quickly as possible. Here, the ‘informative stream’ refers to data streams containing anomalies. Consequently, if the limited processing and sensing resources can focus on and continuously monitor these streams with anomalies, online process monitoring can achieve satisfactory results, even if most streams are not recorded or analyzed.
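
As a rough illustration of allocating limited sensing resources to the most informative streams, the sketch below applies a generic top-r rule: at each step only r of p streams are read, observed streams update a one-sided CUSUM statistic, and unobserved streams receive a small compensation so that they are eventually revisited. This is a simplified stand-in, not the thesis's MFPCA-based procedure; the function name `top_r_adaptive_sampling` and the parameters `delta` and `comp` are assumptions made for illustration.

```python
import numpy as np

def top_r_adaptive_sampling(data, r, delta=1.0, comp=0.1):
    """Generic top-r adaptive sampling sketch (illustrative assumption).

    data  : (T, p) array of standardized readings, in-control mean 0
    r     : number of streams that can be read at each time step
    delta : upward mean shift the local CUSUM is tuned to detect
    comp  : compensation added to the statistic of each unobserved stream
    Returns the list of observed index sets and the final local statistics.
    """
    T, p = data.shape
    stats = np.zeros(p)
    observed = np.arange(r)  # arbitrary initial sensor layout
    history = []
    for t in range(T):
        mask = np.zeros(p, dtype=bool)
        mask[observed] = True
        # Observed streams: one-sided CUSUM update, reset at zero.
        stats[mask] = np.maximum(
            stats[mask] + delta * data[t, mask] - delta ** 2 / 2, 0.0
        )
        # Unobserved streams: drift upward so they are sampled again later.
        stats[~mask] += comp
        history.append(observed.copy())
        # Next step, read the r streams with the largest statistics.
        observed = np.argsort(stats)[-r:]
    return history, stats
```

With a sustained shift in one stream, the statistic of that stream grows quickly once it is read, so the rule keeps it in the observed set while the remaining resources rotate over the other streams.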

Adaptive sampling for multi-profile data. When dealing with 1D (input) multi-stream functional data, i.e., multi-profile data, the critical challenge in adaptively selecting and sampling ‘informative’ streams is finding an index that represents both within- and between-stream correlations, as well as the anomaly information when the functional data deviates from normal conditions. We leverage multivariate functional principal component analysis (MFPCA) to define such an index. Specifically, the multi-profile functional data is modeled by a summation of products between eigenfunctions and MFPC scores, with the MFPC scores used as the monitoring index. This index is fed into a multivariate CUSUM control chart to implement the monitoring. We also demonstrate key theoretical properties of the proposed monitoring strategy.
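
A minimal sketch of the score-then-chart idea, under stated assumptions: all profiles are fully observed on a common equally spaced grid, MFPCA is approximated by an SVD of the concatenated, centered profiles, and the control chart is a Crosier-style multivariate CUSUM. The abstract does not specify the estimator or the chart variant, and partial observation is not handled here; the names `fit_mfpca`, `mfpc_scores`, and `mcusum`, and the parameters `k` and `h`, are illustrative.

```python
import numpy as np

def fit_mfpca(train, n_components):
    """Discretized MFPCA from in-control data (illustrative assumption).

    train : (n_samples, n_streams, n_grid) array of profiles.
    Returns the stacked mean, the leading discretized multivariate
    eigenfunctions (rows of `basis`), and the variance of each score.
    """
    n = train.shape[0]
    flat = train.reshape(n, -1)  # concatenate the streams of each sample
    mean = flat.mean(axis=0)
    _, sing, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:n_components]
    score_var = sing[:n_components] ** 2 / (n - 1)
    return mean, basis, score_var

def mfpc_scores(sample, mean, basis):
    """Project one (n_streams, n_grid) sample onto the eigenfunctions."""
    return basis @ (sample.reshape(-1) - mean)

def mcusum(scores, score_var, k, h):
    """Crosier-style multivariate CUSUM on MFPC scores.

    Scores are uncorrelated by construction, so the in-control covariance
    is diagonal. Returns the first alarm index, or None if no alarm.
    """
    inv_var = 1.0 / score_var
    s = np.zeros_like(score_var)
    for t, x in enumerate(scores):
        d = s + x
        c = np.sqrt(np.sum(d * d * inv_var))
        # Shrink the cumulative vector toward zero by k in Mahalanobis norm.
        s = np.zeros_like(d) if c <= k else d * (1.0 - k / c)
        if np.sqrt(np.sum(s * s * inv_var)) > h:
            return t
    return None
```

Here each sample is approximated as the stacked mean plus a weighted sum of the rows of `basis`, with the weights being the MFPC scores, so a shift in any stream that loads on the retained eigenfunctions moves the score vector and accumulates in the chart.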

Adaptive sampling for partially observed image data. The proposed method can also be applied to 2D (input) multi-stream functional data, i.e., multiple images. However, directly applying it to the 2D case requires a significant computational load when constructing the 2D MFPCA. Furthermore, the performance of the monitoring depends on the accuracy of parameter estimation in the training stage, but there is a lack of effective estimators in 2D MFPCA that guarantee consistency. To address these issues, we apply practical strategies to reduce the computational demands of 2D MFPCA. We also demonstrate the asymptotic consistency of 2D MFPCA estimators, which justifies using MFPC scores as monitoring statistics. Thanks to these investigations, the online process monitoring of 2D multi-stream functional data inherits the properties of monitoring performance obtained in the 1D case.

Transfer learning for individualized prediction

The focus of transfer learning for individualized prediction is to leverage multi-system and multi-functional stream information to enhance the learning of a specific functional stream in a data-scarce system. The critical challenge in this task is differentiating function-to-function correlations within a system and system-to-system correlations between systems. Meanwhile, computational complexity is another concern, as transfer learning usually deals with a large amount of data in the source domain. As a result, it requires a comprehensive, flexible, and computationally efficient correlation structure to model the within- and between-system relationships.

Transfer learning of stochastic Kriging for individualized prediction. The multi-output stochastic Kriging (SK) is investigated to model within- and between-system correlation. Specifically, a tailored covariance matrix is designed to model both types of correlations. This matrix incorporates a sub-matrix into the system-to-system covariance matrix to facilitate the modeling of function-to-function correlations. We also provide a theoretical analysis to ensure that the designed matrix and its embedded correlation components achieve superior performance in transfer learning, asymptotically.
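
To make the idea of combining system-to-system and function-to-function correlation concrete, the sketch below uses one common construction: a separable covariance in which the covariance between two observations is the product of a system-level entry, a function-level entry, and an input kernel. The abstract does not give the exact form of the tailored matrix, so this structure, the squared-exponential kernel, the zero prior mean, and the names `rbf`, `joint_cov`, and `sk_predict` are all assumptions. The predictor follows the usual stochastic Kriging form, where the intrinsic noise variance of each sample mean is divided by its number of replications.

```python
import numpy as np

def rbf(x1, x2, length=0.2):
    """Squared-exponential kernel on scalar inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def joint_cov(sys_a, fun_a, x_a, sys_b, fun_b, x_b, B_sys, B_fun):
    """Assumed separable covariance: B_sys[s, s'] * B_fun[f, f'] * k(x, x').

    sys_* and fun_* are integer index arrays labelling the system and the
    functional stream of each observation; x_* holds the input locations.
    """
    return (B_sys[np.ix_(sys_a, sys_b)]
            * B_fun[np.ix_(fun_a, fun_b)]
            * rbf(x_a, x_b))

def sk_predict(train, target, B_sys, B_fun, noise_var, n_rep):
    """Stochastic-Kriging-style prediction with zero prior mean.

    train  : (sys, fun, x, ybar), with ybar the sample means at each point
    target : (sys, fun, x) for the stream and locations to predict
    noise_var, n_rep : per-point intrinsic variance and replication count
    """
    s, f, x, ybar = train
    K = joint_cov(s, f, x, s, f, x, B_sys, B_fun)
    K_noisy = K + np.diag(noise_var / n_rep)  # variance of each sample mean
    k_star = joint_cov(target[0], target[1], target[2], s, f, x, B_sys, B_fun)
    return k_star @ np.linalg.solve(K_noisy, ybar)
```

In this construction a data-scarce target stream borrows strength from densely sampled source streams in proportion to the corresponding entries of `B_sys` and `B_fun`. In practice those matrices would be estimated from data rather than fixed by hand.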

Transfer Learning for Individualized Neural Process. To further enhance the time efficiency and modeling flexibility of transfer learning for individualized prediction, we propose a transfer learning architecture based on the Neural Process (NP). This architecture efficiently leverages both within-system and between-system information through shared hyper-features in the neural networks. The transferred knowledge is exclusively directed toward the individual function of interest using a specially designed structure, which mitigates the effects of data scarcity. Theoretical analysis confirms that the proposed method defines a valid stochastic process and can be trained by optimizing a tailored evidence lower bound (ELBO).
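
For orientation, the objective commonly used to train a standard latent-variable Neural Process is written out below. This is the generic form, not the tailored ELBO developed in the thesis, and the notation is an assumption: C is the context set, T the target set, z the global latent variable, and q the amortized encoder, which also stands in for the intractable conditional prior given the context.

```latex
\log p\left(y_T \mid x_T, x_C, y_C\right)
  \;\gtrsim\;
  \mathbb{E}_{q\left(z \mid x_{C \cup T},\, y_{C \cup T}\right)}
    \left[ \sum_{t \in T} \log p\left(y_t \mid z, x_t\right) \right]
  \;-\;
  \mathrm{KL}\left( q\left(z \mid x_{C \cup T},\, y_{C \cup T}\right)
    \,\middle\|\, q\left(z \mid x_C, y_C\right) \right)
```

The first term rewards accurate reconstruction of the target outputs, and the KL term keeps the latent distribution inferred from context alone close to the one inferred from context and targets together. Because q replaces the true conditional prior, the expression is an approximate rather than an exact lower bound.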

The proposed methods are suitable for a range of engineering applications involving both online and offline multi-stream functional data. They address emerging challenges in the modeling, learning, and application of functional data within smart and connected systems.
Academic Unit: Industrial and Systems Engineering
Record Identifier: 9984831229402771
