Scientists are generating data at an ever-growing scale with supercomputers in this big data era. Visualization has become increasingly important for analyzing and understanding data and for revealing the insights it contains. Our research interests generally involve scalable visualization and analysis of large scientific data.
In the context of our studies, "scalability" is not limited to its narrow sense, but also covers the following:
Our research focuses on multivariate, multi-valued, and ensemble flow data, which consist of both scalar and vector attributes. As an alternative to traditional flow visualization methods, which have been studied for decades, we emphasize the scalable analysis of indirect and multi-faceted features. We also make extensive use of both Eulerian and Lagrangian methods to analyze flow data from multiple perspectives.
It is challenging to investigate multivariate features and the correlations between scalar attributes in real applications. We map the multivariate samples into seamlessly integrated Parallel Coordinates Plots (PCP) and multidimensional scaling (MDS) projection plots for feature identification and selection. The PCP visualizes the numerical distribution of each attribute, and the MDS plot presents the similarities between samples. After Gaussian transfer functions are constructed from the user's selection, the selected features are further visualized by volume rendering. In a case study of a hurricane simulation, the region surrounding the hurricane eye (red) shows lower pressure, medium temperature, and high wind speed. Various insightful features can be selected and explored in different views. We developed a scalable parallel system for distributed many-core environments to accelerate the multivariate volume rendering, the PCP rendering, and the MDS computation used in the analysis. In the oil exploration application, finding the underground flow path from the scalar intensities collected by seismic devices is a significant step in the search for oil and gas. However, neither classic volume exploration methods, e.g., transfer function design, nor traditional volume cut algorithms can be applied directly, because of three natural properties of seismic data: varied composition, discontinuity, and noise. Thus, a new interactive approach to visualizing the underground flow path is needed.
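For the multivariate volume rendering, a Gaussian transfer function built from a user selection can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system's implementation: each brushed PCP axis is assumed to yield an interval (lo, hi), a 1D Gaussian is centered on that interval with a hypothetical width_scale controlling its spread, and attributes are combined as a separable product. The names gaussian_from_interval and multivariate_opacity are made up for this example.

```python
import math


def gaussian_from_interval(lo, hi, width_scale=0.5):
    """Center a 1D Gaussian on a brushed interval; sigma is a fraction of its width (assumed)."""
    mu = 0.5 * (lo + hi)
    sigma = max(width_scale * (hi - lo), 1e-12)  # guard against a zero-width brush
    return mu, sigma


def multivariate_opacity(sample, selection, max_opacity=1.0):
    """Opacity of one voxel under a separable Gaussian transfer function (hypothetical design).

    sample:    dict attribute -> value at the voxel
    selection: dict attribute -> (lo, hi) brushed by the user on a PCP axis
    Attributes without a brush do not constrain the opacity.
    """
    opacity = max_opacity
    for attr, (lo, hi) in selection.items():
        mu, sigma = gaussian_from_interval(lo, hi)
        x = sample[attr]
        opacity *= math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
    return opacity
```

For example, with a hypothetical selection such as {"pressure": (950.0, 970.0), "wind_speed": (40.0, 60.0)}, a voxel at the center of both brushes receives full opacity, and the opacity falls off smoothly as any attribute leaves its brushed range.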
We also couple multivariate analysis with flow advection, which makes it possible to discover and explain complicated transport phenomena in studies such as carbon footprint and pollution. However, such compound analysis is lacking in current flow visualization practice. Lagrangian-based Attribute Space Projection (LASP) helps users identify features with similar pathline attributes. Compared to our multivariate analysis of scalar attributes, which can be categorized as an Eulerian method, LASP accounts not only for the scalar values at the current location, but also for those at the locations along the pathlines. For example, the orange cluster in the figure shows the transport of water vapor to the peripheral regions: the water vapor mixing ratio decreases as the particles travel, while the temperature and pressure also drop as they move outward. The projection, which shows the similarities, provides a good entry point for feature selection and refinement. Because of the high computational cost of LASP, we tightly integrate the MapReduce-style particle tracer DStep [Kendall et al. 2011] with the scalable projection algorithm for large-scale analysis.
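A Lagrangian attribute vector in the spirit of LASP can be sketched by sampling scalar attributes at several steps along each pathline and comparing the concatenated vectors. This is an illustrative assumption rather than LASP's actual definition: the even resampling, the Euclidean metric, and the names lagrangian_attribute_vector and distance_matrix are hypothetical. The resulting distance matrix is the kind of input a projection such as MDS would take.

```python
import math


def lagrangian_attribute_vector(pathline, attrs, n_samples=4):
    """Concatenate scalar attributes sampled at n_samples evenly spaced steps of a pathline.

    pathline: list of per-step attribute dicts, e.g. {"qvapor": 0.8, "temp": 290.0}
    (hypothetical attribute names).
    """
    if not pathline or n_samples < 2:
        raise ValueError("need a non-empty pathline and n_samples >= 2")
    last = len(pathline) - 1
    steps = [round(i * last / (n_samples - 1)) for i in range(n_samples)]
    return [pathline[s][a] for s in steps for a in attrs]


def distance_matrix(vectors):
    """Pairwise Euclidean distances between attribute vectors (input to a projection such as MDS)."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]
```

Two pathlines that start with identical attributes but evolve differently get a nonzero distance here, whereas a purely Eulerian comparison of the starting samples would call them identical.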
Furthermore, to reduce the trial-and-error interaction burden in the projection plot, we present a novel feature extraction approach for unsteady flow fields, called FLDA, based on the Latent Dirichlet allocation (LDA) model. Analogous to topic modeling in text analysis, our approach defines pathlines and features in a given flow field as documents and words, respectively. Flow topics are then extracted with LDA. Unlike other feature extraction methods, our approach clusters pathlines with probabilistic assignments and simultaneously aggregates features into meaningful topics. We built a prototype system to support the exploration of unsteady flow fields with the proposed LDA-based method. Interactive techniques are also developed to explore the extracted topics and to gain insight from the data.
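The document/word analogy maps directly onto a standard LDA sampler. The sketch below uses a textbook collapsed Gibbs sampler, a generic inference technique that FLDA may or may not use, and assumes pathline features have already been quantized into word tokens such as "swirl" or "fast"; those token names and lda_gibbs are hypothetical. The returned doc_topic rows give each pathline a probabilistic topic membership, i.e. a soft clustering.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.

    docs: one token list per pathline; tokens are (hypothetical) quantized
    feature labels observed along that pathline.
    Returns (doc_topic, topic_word, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]             # pathline-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                              # tokens assigned to each topic
    z = []                                           # topic of every token

    def assign(d, v, k, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            assign(d, wid[w], k, 1)

    topics = range(n_topics)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                assign(d, v, z[d][i], -1)  # remove the token's current assignment
                # Full conditional p(z = k | all other assignments), unnormalized.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + n_words * beta)
                           for k in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                assign(d, v, k, 1)

    # Posterior-mean estimates: soft pathline-to-topic membership and topic-word distributions.
    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[k][v] + beta) / (nk[k] + n_words * beta) for v in range(n_words)]
                  for k in topics]
    return doc_topic, topic_word, vocab
```

The topic_word rows play the role of "flow topics": each is a distribution over feature tokens, so a topic that concentrates on, say, "swirl" and "slow" aggregates those features across all pathlines that exhibit them.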
We provide comparative methods for ensemble flow datasets, which are increasingly common in computational science, yet few effective tools have been developed for comparing ensemble members, especially for vector fields. Differences between ensemble members are measured with a Lagrangian-based distance metric, instead of at fixed locations as in the Eulerian specification. For example, the transport behaviors of the wind fields can appear entirely different in certain regions. Although the idea seems straightforward, pathline computation and comparison require extraordinary computational power and memory. We therefore extend DStep with an improved intermediate data management mechanism. The system shows good scalability in the benchmark performed at the National Supercomputer Center in Jinan, Shandong, China.
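A minimal sketch of a Lagrangian distance between two ensemble members: trace pathlines from the same seeds through both members and average the separation of corresponding points. The midpoint (RK2) integrator, the mean-separation metric, and the callable velocity(x, y, t) interface are assumptions made for illustration; the names are hypothetical and this is not the DStep-based implementation.

```python
import math


def trace_pathline(velocity, seed, t0, dt, n_steps):
    """Trace a 2D pathline with the midpoint (RK2) method through an unsteady field velocity(x, y, t)."""
    x, y = seed
    t = t0
    points = [(x, y)]
    for _ in range(n_steps):
        u1, v1 = velocity(x, y, t)
        u2, v2 = velocity(x + 0.5 * dt * u1, y + 0.5 * dt * v1, t + 0.5 * dt)
        x, y, t = x + dt * u2, y + dt * v2, t + dt
        points.append((x, y))
    return points


def lagrangian_distance(field_a, field_b, seeds, t0=0.0, dt=0.1, n_steps=50):
    """Mean separation of pathlines started from the same seeds in two ensemble members."""
    total = 0.0
    for s in seeds:
        pa = trace_pathline(field_a, s, t0, dt, n_steps)
        pb = trace_pathline(field_b, s, t0, dt, n_steps)
        total += sum(math.dist(p, q) for p, q in zip(pa, pb)) / len(pa)
    return total / len(seeds)
```

With two uniform fields (1, 0) and (0, 1), an Eulerian difference is the constant sqrt(2) at every point, whereas this Lagrangian distance grows with the integration time because the pathlines keep separating. Every seed requires a full trace in every member, which is why the cost scales so quickly with ensemble size.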
Besides the scalable parallel computation for ensemble data, we have also proposed a longest common subsequence (LCSS)-based approach to compute the distance among vector field ensembles. The LCSS distance defines the similarity among ensemble members by counting the number of domain data blocks that their pathlines pass through in common. Compared with traditional methods (e.g., pointwise Euclidean distance or dynamic time warping distance), the proposed approach is robust to outliers, missing data, and the sampling rate of the pathline timesteps. Because the intermediate output is small and reusable, visualization based on the proposed LCSS approach reveals temporal trends in the data at low storage cost and avoids tracing pathlines repeatedly. We evaluate our method on both synthetic and simulation data, demonstrating the robustness of the proposed approach.
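The LCSS distance on block sequences can be sketched with the classic dynamic-programming recurrence. Each pathline is assumed to be already encoded as a list of block IDs; normalizing by the shorter sequence length is one common convention and an assumption here, and the function names are hypothetical.

```python
def lcss_length(a, b):
    """Length of the longest common subsequence of two block-ID sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # LCSS lengths for the previous prefix of a
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcss_distance(a, b):
    """Normalized LCSS distance in [0, 1]: 0 when the shorter sequence is fully shared."""
    if not a or not b:
        return 1.0
    return 1.0 - lcss_length(a, b) / min(len(a), len(b))
```

For instance, lcss_distance([1, 2, 9, 3], [1, 2, 3]) is 0.0: the outlier block 9 is skipped rather than penalized, unlike a pointwise Euclidean comparison, which would have to pair it with some point of the other pathline.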
We first trace pathlines in parallel from the raw ensemble data. The generated pathlines are then encoded into LCSS sequences in parallel, and the LCSS distance metric is applied to them for further visualization and multiscale temporal comparison.
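The encoding step can be sketched as mapping each pathline point to the ID of the regular data block containing it and collapsing consecutive repeats. The regular block grid, the row-major block numbering, and the names block_id and encode_pathline are assumptions for illustration; distributing pathlines across processes is omitted.

```python
def block_id(point, domain_min, block_size, blocks_per_axis):
    """Row-major linear index of the block containing `point` (coordinates clamped to the domain)."""
    bid = 0
    for x, lo, size, n in zip(point, domain_min, block_size, blocks_per_axis):
        i = int((x - lo) // size)
        i = min(max(i, 0), n - 1)  # clamp points on or beyond the domain boundary
        bid = bid * n + i
    return bid


def encode_pathline(points, domain_min, block_size, blocks_per_axis):
    """Encode a pathline as the sequence of distinct consecutive blocks it passes through."""
    seq = []
    for p in points:
        b = block_id(p, domain_min, block_size, blocks_per_axis)
        if not seq or seq[-1] != b:
            seq.append(b)
    return seq
```

Because repeated IDs are collapsed, sampling the same pathline more densely typically yields the same sequence, and the encoded sequences are far smaller than the raw pathlines, so they can be stored and reused across comparisons.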
In addition to scalable data analysis methods, we also introduce novel user interactions for the visualization of flow data. In WYSIWYG VolViz, a sketch-based user interface is provided for flexible exploration of scalar fields without editing visualization parameters such as transfer functions.
When computing integral curves and integral surfaces for large-scale unsteady flow fields, a major bottleneck is the widening gap between data access demands and the available bandwidth (both I/O and in-memory). In this work, we explore a novel advection-based scheme to manage flow field data for both efficiency and scalability. The key is to first partition the flow field into blocklets (e.g., cells or very fine-grained blocks of cells), and then (pre)fetch and manage blocklets on demand using a parallel key-value store. The benefits are (1) greatly increasing the scale of local-range analysis (e.g., source-destination queries, streak surface generation) that can fit within any given hardware resource limit, and (2) improving memory and I/O bandwidth efficiency as well as the scalability of naive task-parallel particle advection. We demonstrate our method with a prototype system that runs on workstations as well as in supercomputing environments. Results show significantly reduced I/O overhead compared to accessing raw flow data, and high scalability on a supercomputer for a variety of applications.
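On-demand blocklet management can be sketched as an LRU cache in front of a key-value store. Here a plain Python mapping stands in for the parallel key-value store, and BlockletCache, blocklet_key, and the LRU eviction policy are hypothetical choices for illustration, not the system's actual interface or policy.

```python
from collections import OrderedDict


def blocklet_key(point, blocklet_size):
    """Key of the fine-grained blocklet containing a point (regular blocklet grid assumed)."""
    return tuple(int(c // blocklet_size) for c in point)


class BlockletCache:
    """Fetch blocklets on demand from a key-value store, keeping at most `capacity` in memory.

    `store` stands in for a (parallel) key-value store: any mapping from blocklet key to data.
    """

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.cache = OrderedDict()
        self.fetches = 0  # number of store accesses, a proxy for I/O cost

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.fetches += 1
        data = self.store[key]  # miss: fetch only this blocklet from the store
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used blocklet
        return data

    def prefetch(self, keys):
        """Load blocklets expected to be needed soon (e.g. predicted by an access model)."""
        for key in keys:
            self.get(key)
```

A particle advection loop would call cache.get(blocklet_key(position, size)) at each step, so only the blocklets a particle actually visits are ever read, rather than whole raw data blocks.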
Furthermore, based on the observation that more sophisticated access patterns exist in particle tracing, we present a novel model based on high-order access dependencies for efficient pathline computation in unsteady flow visualization. By taking longer access sequences into account, our method greatly improves the accuracy and reliability of data access prediction. High-order access dependencies are calculated in a preprocessing stage by tracing uniformly seeded pathlines in both forward and backward directions. The effectiveness of our approach is demonstrated in a parallel particle tracing framework with high-order data prefetching. Results show that our method achieves higher data locality and thus improves the efficiency of pathline computation.
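A high-order access model can be sketched as an order-k Markov model over block access sequences. The Markov formulation and the names build_access_model and predict_next are assumptions for illustration; access sequences recorded from forward and backward tracing in preprocessing would simply both be passed as training input.

```python
from collections import Counter, defaultdict


def build_access_model(sequences, order=2):
    """Count which block follows each length-`order` access history in the training sequences."""
    model = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            model[tuple(seq[i - order:i])][seq[i]] += 1
    return model


def predict_next(model, history, order=2, k=1):
    """Return up to k most likely next blocks given the last `order` accessed blocks."""
    if len(history) < order:
        return []
    counts = model.get(tuple(history[-order:]))  # .get avoids inserting empty entries
    if not counts:
        return []
    return [block for block, _ in counts.most_common(k)]
```

With training sequences [1, 2, 3] and [4, 2, 5], a first-order model sees block 2 followed equally often by 3 and 5, whereas the second-order model predicts 3 after (1, 2) and 5 after (4, 2). This is the kind of ambiguity that longer access histories resolve, and the predicted blocks are natural candidates for prefetching.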
Scientific visualization, an interdisciplinary field of research and application, aims to help domain scientists analyze and understand their data. Visualization systems play a central role in this process. However, current production visualization systems have several problems, including unintuitive user interfaces, complex parameter configuration, and difficulty in developing new visualization methods. In this work, we design and implement a new visualization system. It not only supports common data types and formats, but also incorporates several cutting-edge research results. Its distributed software architecture also makes it easy to create extensions. The system can be applied in a wide range of domains, including climate and environmental sciences, geology, biology, and medical sciences.
Seismic visualization plays an indispensable role in oil and gas exploration. The formation and distribution of gas resources correlate closely with structural geological compositions such as underground flow paths, which may influence the positions of major subsurface folds and faults. Seismic volume data are collected by sending sound waves into the earth and then recording and processing the reflected echoes. Exploring seismic volumes is challenging because of their natural properties: they are composition-intensive, discontinuous, and noisy, and they generally consist of a variety of sedimentary deposits. Therefore, we can neither use transfer function design methods nor employ existing intelligent volume cut algorithms to extract the structural geological compositions from the volume. In this work, we propose an interactive approach to visualizing 3D structural geological compositions with a 2D slice analyzer guided by multi-scale transfer function sensitivity. The contributions of this paper are twofold. First, we carefully design a local transfer function guided by visualized sensitivity. In particular, we use a sensitivity-aware lightweight transfer function that guides the user toward suitable cut-off values with predicted cues. Second, we propose GPU-based volume cut methods built on algebraic set operators to extract the target 3D seismic structure. Specifically, we use an automatic volume cut method (named convexhull cut) derived from the algebraic union operator on single-step cuts, and then an interactive volume cut approach derived from the algebraic intersection operator on single-step cuts.
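The algebraic set operators behind the volume cuts can be sketched on the CPU with voxel-index sets. This simplified stand-in does not reproduce the GPU implementation or the convexhull cut geometry: a single-step cut is assumed to select the voxels whose intensity lies in one user-chosen cut-off range, and the names single_step_cut, union_cut, and intersection_cut are hypothetical.

```python
def single_step_cut(volume, lo, hi):
    """Voxels whose intensity lies in [lo, hi] (assumed form of one single-step cut).

    volume: dict mapping voxel index (i, j, k) -> scalar intensity
    """
    return {idx for idx, value in volume.items() if lo <= value <= hi}


def union_cut(cuts):
    """Algebraic union: keep voxels selected by any single-step cut."""
    return set().union(*cuts)


def intersection_cut(cuts):
    """Algebraic intersection: keep only voxels selected by every single-step cut."""
    if not cuts:
        return set()
    return set.intersection(*cuts)
```

Union grows the extracted region as more cuts are combined, which suits an automatic pass, while intersection only ever shrinks it, which suits interactive refinement where each new cut removes unwanted voxels.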