Reduced execution time, automated manual analysis work, and made the pipeline scalable from field computers to distributed clusters.

Software Engineer
1.5 years
2023-25
At Oregon State University's Coastal Imaging Lab (CIL), the original workflow for analyzing wave footage and LiDAR data relied on MatPIV, a MATLAB-based program. MatPIV took 15-18 seconds to analyze a single frame pair, a significant bottleneck when working through terabytes of video and LiDAR data. Researchers needed a faster, more scalable way to process large datasets reproducibly across different computing environments.
A high-performance data pipeline was developed to enhance the lab's wave analysis capabilities. The core algorithm was translated from MATLAB to Julia and optimized by restructuring memory-intensive data operations, cutting processing time from 15-18 seconds to 6-8 seconds per frame pair. To manage large datasets, an automated Python pipeline was created to wrap the underlying Julia code, enabling reproducible workflows across Linux systems. The pipeline provides a modular command-line interface driven by YAML configuration files, which makes parallel processing configurable per run. This system allowed researchers to process data efficiently both on limited field computers and on the high-performance computing resources at the College of Earth, Ocean, and Atmospheric Sciences (CEOAS).
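
As a rough illustration of the wrapper idea, the sketch below pairs consecutive frames and spreads the pairs across a configurable number of workers. It is not the lab's actual code. The names `PipelineConfig`, `frame_pairs`, `run_pipeline`, and `fake_analyze` are hypothetical, the settings are hard-coded where the real pipeline would read them from a YAML file, and `fake_analyze` stands in for the call into the Julia code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

FramePair = Tuple[str, str]


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings; assumed to come from a YAML file in practice."""
    workers: int = 2     # number of frame pairs analyzed concurrently
    frame_step: int = 1  # distance between the two frames in a pair


def frame_pairs(frames: Sequence[str], step: int = 1) -> List[FramePair]:
    """Pair each frame with the frame `step` positions after it."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return [(frames[i], frames[i + step]) for i in range(len(frames) - step)]


def run_pipeline(
    frames: Sequence[str],
    analyze_pair: Callable[[FramePair], str],
    config: PipelineConfig,
) -> List[str]:
    """Analyze every frame pair, using `config.workers` workers at a time."""
    pairs = frame_pairs(frames, config.frame_step)
    # Threads are assumed to be enough here because each worker would mostly
    # wait on an external analysis process rather than run Python code.
    with ThreadPoolExecutor(max_workers=config.workers) as pool:
        # Executor.map returns results in the same order as the input pairs.
        return list(pool.map(analyze_pair, pairs))


def fake_analyze(pair: FramePair) -> str:
    """Placeholder for the real analysis step (the Julia call)."""
    first, second = pair
    return f"{first}->{second}"


if __name__ == "__main__":
    frames = ["f0.png", "f1.png", "f2.png"]
    print(run_pipeline(frames, fake_analyze, PipelineConfig(workers=2)))
```

Keeping the worker count in a configuration object rather than in the code is what would let the same script run with one or two workers on a field laptop and many more on a cluster node.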