Research Spotlight: Improving Data Sampling Methods

Dr. Xiaochen Xian, assistant professor in the Department of Industrial & Systems Engineering and Warren B. Nelms Institute faculty member, and her students are currently researching new methods that can make sampling and monitoring data more efficient. These methods have significant real-world applications including COVID-19 pandemic surveillance and faster wildfire detection.

Spatial Rank-Based Augmentation for Nonparametric Online Monitoring and Adaptive Sampling of Big Data Streams

Xin Zan, Di Wang, Xiaochen Xian

In the era of the Internet of Things (IoT), evolving data acquisition devices in communicating-actuating networks generate big data streams that are shared across platforms, enabling remote and efficient decision making for critical systems in real-world applications including, but not limited to, manufacturing, environmental geology, and healthcare. In practice, real-time monitoring of such big data streams remains challenging, mainly because of their complex interrelationships, large volume, and high velocity. These characteristics place prohibitive demands on monitoring methodologies and resources (e.g., online data acquisition, communication, storage, and analysis) and thus restrict access to full observations during online monitoring.

To tackle the challenges of monitoring general big data streams when only partial observations are available under resource constraints, we propose a spatial rank-based method that incorporates effective data augmentation techniques for the data streams that are unobservable online, so that monitoring and sampling decisions can be informed analytically from the partially observed data streams alone. By exploiting historical data, the proposed method preserves strong descriptive power for general big data streams under partial observations and thus allows effective monitoring and equitable sampling without relying on simplifying assumptions (e.g., exchangeability), in contrast to existing methods. Theoretical investigations are carried out to guarantee the superiority of its sampling performance over existing methods.
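
For readers unfamiliar with spatial ranks, the minimal Python sketch below shows the standard definition: the spatial rank of an observation relative to a reference sample is the average of the unit vectors pointing from each reference point toward that observation. It illustrates only this building block, not the proposed augmentation or sampling scheme. The function names and the toy reference sample are assumptions made for illustration.

```python
import math


def spatial_sign(v):
    """Return the unit vector in the direction of v (the zero vector maps to zero)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [c / norm for c in v]


def spatial_rank(x, reference):
    """Average spatial sign of (x - y) over all reference observations y."""
    total = [0.0] * len(x)
    for y in reference:
        sign = spatial_sign([a - b for a, b in zip(x, y)])
        total = [t + s for t, s in zip(total, sign)]
    return [t / len(reference) for t in total]


def rank_norm(x, reference):
    """Length of the spatial rank: near 0 for central points, near 1 for outlying ones."""
    rank = spatial_rank(x, reference)
    return math.sqrt(sum(c * c for c in rank))


# Hypothetical in-control reference sample, standing in for historical data.
reference = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

central = rank_norm((0.0, 0.0), reference)    # point at the center of the sample
outlying = rank_norm((10.0, 0.0), reference)  # point far outside the sample
```

Because the rank depends only on directions, not magnitudes, its norm always lies between 0 and 1 and does not require a distributional assumption on the data, which is what makes rank-based statistics attractive for nonparametric monitoring.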

Positive rates of 54 regions (data streams) in Brown County from March 2 to June 23, 2020 (114 days), where an abrupt change occurs on day 88.


Data-driven Pathwise Sampling Approaches for Online Anomaly Detection

Miao Bai, Dongmin Li, Xiaochen Xian

Moving vehicle-based sensors (MVSs), such as unmanned aerial vehicles, unmanned surface vehicles, and unmanned underwater vehicles, have been increasingly used for real-time sensing and anomaly detection in applications such as the detection of wildfires and oil spills. In this work, we propose data-driven sampling strategies that use MVSs to quickly identify abrupt changes in an area of interest in real time while accounting for the vehicles' movement constraints. In these applications, the number of MVSs is much smaller than the number of locations in the area, so only some of the locations can be observed at any given time. To tackle the challenges posed by this partial observability, we integrate statistical process control methods and mathematical optimization to monitor the global status of the area and adaptively adjust the paths of the MVSs so that they sample from suspicious locations based on real-time data. We provide theoretical investigations and conduct simulations to validate the superior performance of the proposed methods. In a numerical study based on real-world wildfire data, we illustrate that the proposed strategies detect wildfires much earlier than benchmark methods and can significantly reduce wildfire-related costs.
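
The idea of combining a control-chart statistic with adaptive sampling can be sketched in a few lines of Python. The sketch below is a deliberately simplified illustration, not the authors' method: it keeps a one-sided CUSUM statistic per location, adds a small compensation to locations that were not observed so they are eventually revisited, and sends the sensors to the locations with the largest statistics. It ignores movement constraints and path optimization entirely. The function names and the parameters `k_ref` and `delta` are assumptions made for illustration.

```python
def update_cusum(stats, observed, k_ref=0.5, delta=0.1):
    """Update per-location statistics after one sampling round.

    stats:    dict mapping location -> current statistic
    observed: dict mapping location -> standardized reading from a sensor
    Observed locations get a one-sided CUSUM update; unobserved locations
    get a small compensation `delta` (an assumed exploration rule).
    """
    updated = {}
    for loc, value in stats.items():
        if loc in observed:
            updated[loc] = max(0.0, value + observed[loc] - k_ref)
        else:
            updated[loc] = value + delta
    return updated


def choose_locations(stats, n_sensors):
    """Pick the n_sensors most suspicious locations (ties broken by location id)."""
    ranked = sorted(stats, key=lambda loc: (-stats[loc], loc))
    return ranked[:n_sensors]


# Three hypothetical locations, one sensor, and an elevated reading at location 1.
stats = {0: 0.0, 1: 0.0, 2: 0.0}
stats = update_cusum(stats, observed={1: 3.0})
next_targets = choose_locations(stats, n_sensors=1)
```

In this toy run the sensor stays at location 1 because its statistic jumped after the elevated reading, while the compensation term slowly raises the priority of the unvisited locations. A path-aware version would replace `choose_locations` with an optimization over routes that the vehicles can actually travel.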

Satellite image of a detected wildfire.