Modeling, learning, and leveraging functional data in smart and connected systems
Details
- Title: Subtitle
- Modeling, learning, and leveraging functional data in smart and connected systems
- Creators
- Jinwei Yao
- Contributors
- Chao Wang (Advisor); Andrew Kusiak (Committee Member); Yong Chen (Committee Member); Xin Zan (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Industrial Engineering
- Date degree season
- Spring 2025
- DOI
- 10.25820/etd.007996
- Publisher
- University of Iowa
- Number of pages
- xvii, 199 pages
- Copyright
- Copyright 2025 Jinwei Yao
- Language
- English
- Date submitted
- 04/29/2025
- Description illustrations
- illustrations (some color)
- Description bibliographic
- Includes bibliographical references (pages 183-199).
- Public Abstract (ETD)
In smart and connected systems, functional data is becoming increasingly prevalent due to advancements in sensing technologies. What distinguishes functional data is that it provides continuous and smooth measurements. For example, temperature readings over a roasting process cycle or heart rate measurements from wearable health devices are inherently time-varying and are therefore functional data. Compared with scalar data, functional data exhibits intrinsic correlations in the temporal and/or spatial domains, which makes it possible to describe the patterns and trends embedded within it. When modeled correctly, these patterns can provide valuable insights for anomaly detection, performance optimization, and smart decision-making.
Although functional data offers many promising benefits, fully leveraging them in smart and connected systems is challenging. Such systems typically generate multi-stream functional data with heterogeneous features, which requires modeling not only within-system correlation but also between-system correlation. As a result, the key to modeling, learning, and leveraging multi-stream functional data in smart and connected systems is to correctly understand the within- and between-system relationships.
In practice, these relationships depend heavily on how the data is acquired. There are two typical acquisition modes: 1) online functional data, collected in real time through sensors, and 2) offline functional data, collected historically over an extended period. Although both modes generate a large amount of functional data, they raise different practical challenges. When multi-stream functional data is collected online, the main challenge is the conflict between massive real-time data and limited processing/storage capability. For example, in semiconductor manufacturing, thousands of sensors or sensing systems are installed to collect process information, generating terabytes of data in real time. Recording all of this data for in-situ analysis is impossible, so an efficient strategy is needed to guide in-situ monitoring based on the massive functional data. On the other hand, when a large amount of functional data has already been collected, the analysis is usually performed offline with sufficient computational power. In this case, the research focus is usually on how to leverage the already collected functional data to benefit the understanding of new, yet data-scarce, systems, i.e., transfer learning.
Nevertheless, few transfer learning methods are applicable to multi-stream functional data in smart and connected systems. The critical research gap arises from the lack of modeling components that differentiate within- and between-system correlations. For example, when modeling multiple systems with multiple functional streams in each system, existing methods model either the correlation among multiple functional streams within a system or the general correlation among multiple systems. A comprehensive framework that leverages multi-system and multi-functional stream information to benefit the learning of a specific functional stream in another system is still lacking.
These challenges originate from real engineering problems in smart and connected systems. This thesis addresses engineering applications that face challenges in modeling, learning, and leveraging multi-stream functional data, with a focus on online process monitoring and offline functional prediction.
Adaptive resource allocation for online process monitoring
To address the conflict between massive real-time data and limited processing and sensing resources, an intuitive approach is to allocate the limited resources to the most informative data streams. In the context of online process monitoring for multi-stream functional data, the goal is to detect anomalies that occur at any time in any of the streams as quickly as possible. Here, an ‘informative stream’ is a data stream that contains anomalies. Consequently, if the limited processing and sensing resources can focus on and continuously monitor the streams with anomalies, online process monitoring can achieve satisfactory results, even if most streams are not recorded or analyzed.
- Adaptive sampling for multi-profile data. When dealing with 1D (input) multi-stream functional data, i.e., multi-profile data, the critical challenge in adaptively selecting and sampling ‘informative’ streams is finding an index that represents both within- and between-stream correlations, as well as the anomaly information when the functional data deviates from normal conditions. We leverage multivariate functional principal component analysis (MFPCA) to define such an index. Specifically, the multi-profile functional data is modeled as a sum of products of eigenfunctions and MFPC scores, and the MFPC scores are used as the monitoring index. This index is fed into a multivariate CUSUM control chart to implement the monitoring. We also demonstrate key theoretical properties of the proposed monitoring strategy.
- Adaptive sampling for partially observed image data. The proposed method can also be applied to 2D (input) multi-stream functional data, i.e., multiple images. However, directly applying it to the 2D case incurs a significant computational load when constructing the 2D MFPCA. Furthermore, the monitoring performance depends on the accuracy of parameter estimation in the training stage, but effective 2D MFPCA estimators that guarantee consistency are lacking. To address these issues, we apply practical strategies to reduce the computational demands of 2D MFPCA. We also demonstrate the asymptotic consistency of the 2D MFPCA estimators, which justifies using MFPC scores as monitoring statistics. As a result, the online process monitoring of 2D multi-stream functional data inherits the monitoring performance properties obtained in the 1D case.
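The idea of using MFPC scores as a monitoring index in a multivariate CUSUM chart can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the simulated profiles, the function names (`simulate`, `fit_mfpca`, `mfpc_scores`, `mcusum`), the discretized MFPCA computed from an SVD of the stacked profiles, the choice of Crosier's MCUSUM as the chart, and the values of `k` and `h`. The sketch also assumes that every stream is fully observed, so it does not implement the adaptive sampling step.

```python
import numpy as np


def simulate(n, rng, shift=0.0):
    """Hypothetical data: n samples of 3 correlated profiles on a 50-point grid."""
    t = np.linspace(0.0, 1.0, 50)
    basis = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])  # (2, 50)
    loadings = np.array([[1.0, 0.5], [0.8, -1.0], [0.6, 0.7]])        # (3 streams, 2)
    latent = rng.normal(size=(n, 2)) * np.array([1.0, 0.5])           # shared scores
    X = np.einsum("nk,jk,km->njm", latent, loadings, basis)
    X += 0.1 * rng.normal(size=X.shape)
    X[:, 0, :] += shift * basis[0]  # mean shift in stream 0 only
    return X


def fit_mfpca(X, n_components):
    """Discretized MFPCA: stack all streams of a sample into one vector, then PCA."""
    n = X.shape[0]
    flat = X.reshape(n, -1)
    mean = flat.mean(axis=0)
    _, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    eigvals = s[:n_components] ** 2 / (n - 1)
    eigvecs = vt[:n_components].T  # columns are discretized eigenfunctions
    return mean, eigvecs, eigvals


def mfpc_scores(x, mean, eigvecs, eigvals):
    """Standardized scores, so in-control scores have roughly identity covariance."""
    return (x.reshape(-1) - mean) @ eigvecs / np.sqrt(eigvals)


def mcusum(scores, k, h):
    """Crosier-type multivariate CUSUM on standardized scores."""
    s = np.zeros(scores.shape[1])
    stats = []
    for z in scores:
        c = np.linalg.norm(s + z)
        s = np.zeros_like(s) if c <= k else (s + z) * (1.0 - k / c)
        stats.append(np.linalg.norm(s))
    stats = np.array(stats)
    alarms = np.flatnonzero(stats > h)
    first_alarm = int(alarms[0]) if alarms.size else None
    return stats, first_alarm


rng = np.random.default_rng(0)
mean, eigvecs, eigvals = fit_mfpca(simulate(200, rng), n_components=2)

# 30 in-control samples followed by 30 samples with a shifted stream 0.
stream = np.concatenate([simulate(30, rng), simulate(30, rng, shift=3.0)])
scores = np.array([mfpc_scores(x, mean, eigvecs, eigvals) for x in stream])
stats, first_alarm = mcusum(scores, k=0.5, h=8.0)
print("first alarm index:", first_alarm)
```

Because the eigenvectors are estimated from all streams jointly, a shift in a single stream still moves the shared scores, which is why a score-based index can carry both within- and between-stream information.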
Transfer learning for individualized prediction
The focus of transfer learning for individualized prediction is to leverage multi-system and multi-functional stream information to enhance the learning of a specific functional stream in a data-scarce system. The critical challenge in this task is differentiating function-to-function correlations within a system from system-to-system correlations between systems. Computational complexity is another concern, as transfer learning usually deals with a large amount of data in the source domain. The task therefore requires a comprehensive, flexible, and computationally efficient correlation structure to model the within- and between-system relationships.
- Transfer learning of stochastic Kriging for individualized prediction. Multi-output stochastic Kriging (SK) is investigated to model within- and between-system correlation. Specifically, a tailored covariance matrix is designed to model both types of correlations. This matrix incorporates a sub-matrix into the system-to-system covariance matrix to facilitate the modeling of function-to-function correlations. We also provide a theoretical analysis showing that the designed matrix and its embedded correlation components asymptotically achieve superior performance in transfer learning.
- Transfer learning for individualized Neural Process. To further enhance the time efficiency and modeling flexibility of transfer learning for individualized prediction, we propose a transfer learning architecture based on the Neural Process. This architecture efficiently leverages both within-system and between-system information through shared hyper features in the neural networks. The transferred knowledge is exclusively directed toward the individual function of interest using a specially designed structure, which mitigates the effects of data scarcity. Theoretical analysis confirms that the proposed method defines a valid stochastic process and can be trained by optimizing a tailored evidence lower bound (ELBO).
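One simple way to picture a covariance that separates system-to-system from function-to-function correlation is a product of three terms: a system matrix, a function matrix, and a kernel over the input. The Python sketch below uses that construction in a Gaussian process predictor. It is an illustrative assumption, not the tailored covariance matrix designed in the dissertation. The matrices `B_sys` and `B_fun`, the RBF kernel and its lengthscale, the test function `truth`, and the per-observation noise variances (a stand-in for the intrinsic simulation noise in stochastic Kriging) are all hypothetical, and the hyperparameters are fixed by hand rather than estimated.

```python
import numpy as np


def rbf(x1, x2, lengthscale=0.2):
    """Squared-exponential kernel over the scalar input."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def joint_cov(x1, s1, f1, x2, s2, f2, B_sys, B_fun):
    """Covariance = system term * function term * input kernel (elementwise)."""
    return B_sys[np.ix_(s1, s2)] * B_fun[np.ix_(f1, f2)] * rbf(x1, x2)


def predict(train, test, B_sys, B_fun, noise_var):
    """Zero-mean GP predictive mean with per-observation noise variances."""
    x, s, f, y = train
    xs, ss, fs = test
    K = joint_cov(x, s, f, x, s, f, B_sys, B_fun) + np.diag(noise_var)
    Ks = joint_cov(xs, ss, fs, x, s, f, B_sys, B_fun)
    return Ks @ np.linalg.solve(K, y)


def truth(x, s, f):
    """Hypothetical response of function f in system s."""
    return (1.0 + 0.1 * s) * np.sin(2 * np.pi * x + 0.4 * f)


rng = np.random.default_rng(0)
dense = np.linspace(0.0, 1.0, 20)
# (system, function, design points); the target (1, 0) has only 3 observations.
blocks = [(0, 0, dense), (0, 1, dense), (1, 1, dense),
          (1, 0, np.array([0.1, 0.5, 0.9]))]
x = np.concatenate([g for _, _, g in blocks])
s = np.concatenate([np.full(g.size, si) for si, _, g in blocks])
f = np.concatenate([np.full(g.size, fi) for _, fi, g in blocks])
y = truth(x, s, f) + 0.05 * rng.normal(size=x.size)
noise_var = np.full(x.size, 0.05 ** 2)

B_sys = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed system-to-system correlation
B_fun = np.array([[1.0, 0.8], [0.8, 1.0]])  # assumed function-to-function correlation

xs = np.linspace(0.0, 1.0, 101)
ss = np.ones(xs.size, dtype=int)   # target system
fs = np.zeros(xs.size, dtype=int)  # target function
y_true = truth(xs, ss, fs)

pred_transfer = predict((x, s, f, y), (xs, ss, fs), B_sys, B_fun, noise_var)
tgt = (s == 1) & (f == 0)
pred_alone = predict((x[tgt], s[tgt], f[tgt], y[tgt]), (xs, ss, fs),
                     B_sys, B_fun, noise_var[tgt])


def rmse(pred):
    return float(np.sqrt(np.mean((pred - y_true) ** 2)))


print("RMSE with transfer:", rmse(pred_transfer))
print("RMSE target only:  ", rmse(pred_alone))
```

In this toy setting the data-scarce target borrows strength from the three densely observed source streams. The neural-process approach described above would replace the fixed covariance with learned shared features, which this sketch does not attempt.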
The proposed methods are suitable for a range of engineering applications involving both online and offline multi-stream functional data. They address emerging challenges in the modeling, learning, and application of functional data within smart and connected systems.
- Academic Unit
- Industrial and Systems Engineering
- Record Identifier
- 9984831229402771