Research

My research interests are in the field of data management, including data-intensive computing, databases, text processing, and large-scale analytics and visualization. My PhD thesis at Stanford was on data integration, with an emphasis on both theoretical and practical aspects. My recent research, especially after spending a few quarters at Google and a few years running a startup as its founder and CTO, has a strong emphasis on engineering and open source system building. I believe “Computer Science” is a “Science” to support great engineering, and we need to build systems to stay relevant in this fast-paced IT era. My recent research projects are closely related to social media data analytics due to its increasing importance in many disciplines.

Current Projects

  • Apache AsterixDB: A scalable, open source Big Data Management System (BDMS).
  • Cloudberry: Supports interactive analytics and visualization on big data sets (e.g., sub-second queries on billions of records).
  • Texera: Supports cloud-based declarative text analytics by allowing users to formulate workflows using a Web service.

The following figure illustrates one scenario where these projects are integrated to support management of social media data. Together with machine learning techniques, they can complete the lifecycle of data analytics. Each system is also independent and general purpose.

Figure: Research overview

Past Projects

Released Prototypes and Source Code Packages