Hidden structure within datasets can be extracted and used to improve a variety of data processing applications. Exploiting such structure makes it possible to use characteristics of the input data that are not captured by the data format, and potentially to achieve better performance than traditional algorithms. In this talk, I will summarize a general framework for algorithms that use automatically extracted hidden structure to improve data processing performance, and demonstrate how hidden structure can be applied to data processing tasks through two examples: (a) tabular dataset compression and (b) log dataset structure extraction. I believe this framework offers new opportunities to design algorithms that surpass current limits, and that it will find applications in database research and many other data-centric disciplines.
Yihan Gao is a Computer Science PhD candidate at the University of Illinois at Urbana-Champaign. He works with Prof. Aditya Parameswaran in the Database Group. His research interests center on database applications, with additional expertise in data mining.