My research interests are in the field of data management, including data-intensive computing, databases, text processing, and large-scale analytics and visualization. My PhD thesis at Stanford was on data integration, with an emphasis on both its theoretical and practical aspects. My recent research, especially after spending a few quarters at Google and a few years at a startup as its founder and CTO, has a strong focus on engineering and open source system building. I believe “Computer Science” is a “Science” that supports great engineering, and we need to build systems to stay relevant in this fast-paced IT era. My recent research projects are closely related to social media data analytics, given the increasing importance of social media data in many disciplines.

Current Projects

  • Apache AsterixDB: A scalable, open source Big Data Management System (BDMS).
  • Cloudberry: An intelligent middleware layer on top of an AsterixDB cluster that supports interactive analytics and visualization on big data sets (e.g., sub-second queries on billions of records).
  • TextDB: A text-centric data management system that supports declarative and scalable text processing, in particular information extraction.

Past Projects

Released Prototypes and Source Code Packages