2022

  • (11/22) PhD student Avinash Kumar successfully defended his thesis titled "Towards Interactive, Adaptive and Result-aware Big Data Analytics". Congratulations, Dr. Kumar!
  • (11/22) Elevated to IEEE Fellow, effective January 1, 2023.
  • (11/22) Check our latest blog on how we built a real-time collaborative workflow editor in Texera.
  • (9/22) Paper titled “Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees” by Zuozhi Wang, Shengquan Ni, Avinash Kumar, Chen Li accepted by VLDB 2023.
  • (9/22) Our team showed a demo at the VLDB conference to illustrate how to support collaborative data analytics in Texera, including shared editing and shared execution.
  • (9/22) Look forward to serving on the Startups Panel at VLDB 2022 in Sydney.
  • (9/22) Check our first Texera blog on how to use the service to do debuggable web crawling.
  • (8/22) Together with colleagues from UCI and UCLA, we received an NSF PIPP Phase I grant.
  • (8/22) Paper titled “GSViz: Progressive Visualization of Geospatial Influences in Social Networks” with Sadeem, Qiushi Bai, and Shuang accepted by SIGSPATIAL 2022.
  • (6/22) Thrilled to learn that our former student, Alex Behm, was the lead author and a main contributor of the Databricks Photon paper, which received the 2022 SIGMOD Best Industrial Paper Award! He worked on the Flamingo and AsterixDB projects at UCI. Great to see our students making a big impact in the field of big data systems.
  • (6/22) Paper titled “Maliva: Using Machine Learning to Rewrite Visualization Queries Under Time Constraints” accepted by EDBT 2023.
  • (6/22) Two demo papers titled “Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera” and “Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models” accepted by VLDB 2022.
  • (4/22) Paper titled "Optimizing Machine Learning Inference Queries with Correlative Proxy Models" with colleagues accepted by VLDB 2022.
  • (2/22) Will organize ICDE 2023 in Anaheim CA as the General Chair.
2021

  • (12/21) Paper titled "JEDI: These aren’t the JSON documents you’re looking for… " with colleagues accepted by SIGMOD 2022.
  • (9/21) Teaching CS122B and CS220P this quarter.
  • (9/21) Welcome new PhD student Xiaozhen Liu to join our team!
  • (9/21) A collaborative paper titled "The Social Amplification and Attenuation of COVID-19 Risk Perception Shaping Mask Wearing Behavior: A Longitudinal Twitter Analysis" accepted by PLOS ONE.
  • (7/21) Together with Prof. Suellen Hopfer (Public Health, UCI) and Prof. Wei Wang (CS, UCLA), received an NSF IIS-2107150 award titled "Collaborative Research: III: Medium: Collaborative Machine-Learning-Centric Data Analytics at Scale" related to our Texera project.
  • (2/21) Co-chairing the VLDB 2021 Industrial Track.
  • (2/21) A collaborative paper with Informatics colleagues titled "Why Do People Oppose Mask Wearing? A Comprehensive Analysis of US Tweets During the COVID-19 Pandemic" accepted by JAMIA.
  • (1/21) Co-teaching Stats170A titled "Project in Data Science".
2020

  • (10/20) Together with Prof. David Timberlake of Public Health, received a grant from TRDRP on social media analysis on tobacco.
  • (10/20) Gave a keynote talk titled Collaborative Interdisciplinary ML-Centric Data Analytics at Scale at NDBC 2020.
  • (9/20) Check our Amber video and Texera demo video at VLDB 2020.
  • (7/20) Paper titled "Tempura: A General Cost-Based Optimizer Framework for Incremental Data Processing" accepted by VLDB 2021.
  • (7/20) Became the Faculty Director of the ICS Master of Computer Science Program.
  • (6/20) Received an NSF RAPID grant with Profs. Gloria Mark and Suellen Hopfer on COVID-19 analysis using social media.
  • (6/20) Our paper titled "Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera" has been accepted by VLDB 2020.
  • (4/20) Teaching CS122B ("Projects in Databases and Web Applications") and STATS ("Project in Data Science") this quarter.
  • (3/20) Check the CoronavirusTwitterMap our team is developing to visualize coronavirus-related tweets.
  • (3/20) Our paper titled "Marviq: Quality-Aware Geospatial Visualization of Range-Selection Queries Using Materialization" has been accepted by ACM SIGMOD 2020.
  • (2/20) Our demo paper titled "Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing" has been accepted by ACM SIGMOD 2020.
  • (2/20) Our paper titled "Robust and efficient memory management in Apache AsterixDB" has been accepted by Software: Practice and Experience (PDF).
  • (1/20) Start serving as the treasurer and a board member of the VLDB Endowment.
2019

  • (12/19) Our paper titled "Amber: A Debuggable Dataflow System Based on the Actor Model" has been accepted by VLDB 2020 (PDF).
  • (9/19) Our People search service (PSearch) has been integrated into the official UCI Directory Search.
  • (9/19) Received an NSF grant for AsterixDB with Professor Mike Carey.
  • (7/19) Gave talks at TU Berlin (Germany), Alibaba (Seattle), and Fudan University (China).
  • (7/19) Gave a tutorial on visualization of big spatial data at the VLDB Summer School, Beijing, China.
  • (5/19) Glad to receive a UCI Chancellor’s Award for Excellence in Fostering Undergraduate Research.
  • (4/19) Undergraduates who participated in our research did very well in their graduate school applications. Here are the results so far.
  • (3/19) Teaching CS221: Information Retrieval this quarter. It’s my first time offering it, which will be fun!
  • (1/19) Teaching CS122B this quarter.
2018

  • (11/18) Honored to become an ACM Distinguished Member.
  • (10/18) Attended the NDBC conference in Dalian, China and gave a keynote talk.
  • (9/18) Will serve as the Liaison for Industrial Sponsors and Exhibitors of VLDB 2019.
  • (9/18) Teaching CS222 and CS222P this quarter.
  • (9/18) Taewoo Kim has successfully defended his PhD thesis. Congratulations, Dr. Kim!
  • (9/18) Visited various universities and companies in China.
  • (7/18) Attended the Microsoft Faculty Summit 2018 with a “systems” theme.
  • (5/18) The SoCal Social Analytics Workshop was a success. Check the web site for the talk slides and videos, and an ICS School article about this event.
  • (3/18) I am organizing a SoCal Social Analytics Workshop on May 11 at UC Irvine.
  • (1/18) This quarter I am teaching CS122B with a few ideas to further improve this course.
2017

  • (12/17) Our paper titled “Supporting Similarity Queries in Apache AsterixDB” has been accepted by EDBT. AsterixDB is the first open-source DBMS system with full support of various similarity operations (edit distance and Jaccard, selection and join).
  • (12/17) I am looking for 1-2 new systems-oriented PhD students working on AsterixDB and Cloudberry. Previous system-building experiences are a big plus!
  • (11/17) Jianfeng has successfully defended his PhD thesis.  Congratulations, Dr. Jia!
  • (10/17) Paper titled “Drum: A Rhythmic Approach to Interactive Analytics on Large Data” with Jianfeng Jia and Mike Carey accepted by the IEEE Big Data 2017 conference.
  • (10/17) Big Viz of Big Data: Thanks to our colleagues at ARL, we release this nice picture of running Cloudberry/AsterixDB on a large display of 24 monitors.  Call it “1 billion tweets on 48 million pixels!” 🙂
  • (9/17) Paper titled “Caching Geospatial Objects in Web Browsers” with students accepted by ACM SIGSPATIAL 2017 (demo track)
  • (9/17) Paper titled “Visual Analytics Ecology for Complex System Testing” with students and ARL colleagues accepted by Visualization in Practice 2017 at IEEE VIS 2017.
  • (9/17) Together with Profs. Mike Carey and Vassilis Tsotras (UCR), we received a gift grant from Google for our AsterixDB work.  Thank you, Google!
  • (9/14/17) Our TwitterMap (powered by Cloudberry and AsterixDB) has more than 1 billion tweets now (~2TB)!
  • (9/7/17) Visited MSR in Redmond and happy to see the great colleagues again!
  • (9/17) Congratulations to Jianfeng for receiving a Google Graduate Student Award in ICS!
  • (9/17) Received an NSF EAGER grant for the Texera project.
  • (8/17) Received an NIH subcontract (through UCLA) on using AsterixDB and Cloudberry for HIV studies with social media data.
  • (8/17) We renamed TextDB to Texera to better reflect its value proposition, since it’s not a DB.
  • (8/17) Visited the Army Research Lab and gave a talk about Cloudberry and AsterixDB. Excited to see the TwitterMap on a huge display!
  • (8/2/17) Check the video to show our latest TextDB user interface.
  • (7/2/17) Visited a few companies in the Bay Area, including Google, Huawei, and Teradata.
  • (6/17) Summer: working with our team on the research projects!
  • (4/26/17) Our TextDB demo at ICDE 2017 won the Best Demo award 🙂
  • (4/2017) This quarter I am again teaching CS122B titled “Projects in Databases and Web Applications,” and planning to make some changes, e.g., adding Google Cloud Platform.
  • (4/2017) I also teach CS290 titled “Text Analytics in the Big Data Era,” in which I work with a team of graduate students to conduct research in the context of the TextDB project.
  • (1/2017) Glad to announce that the recent Couchbase Analytics extension is based on the Apache AsterixDB codebase 🙂
  • (1/2017) This quarter I am teaching CS122B titled “Projects in Databases and Web Applications.”
2016

  • (11/2016) I am looking for 2-3 PhD students to work on my current projects. If you are interested in large-scale data management (in particular, analytics and visualization), text analytics, and open source system building, feel free to contact me.
  • (11/2016) At the ACM GIS conference in San Francisco, Jianfeng will show our Cloudberry system to support interactive analytics and visualization on one billion tweets. Here’s the paper.
  • (9/2016) I gave a talk about Cloudberry at APWeb, Suzhou, China.
  • (9/2016) This quarter I am teaching CS222/CS122C titled “Principles of Data Management.”
  • (8/18/2016) Our PhD student, Young-Seok Kim, co-advised by Prof. Mike Carey, has successfully defended his PhD thesis. He will join Samsung in Korea.  Congratulations!
  • (8/16/2016) I attended the MHSRS symposium in Florida and presented a poster about Cloudberry.
  • (8/15/2016) Jianfeng made a great video about  Cloudberry.
  • (7-8/2016) Summer talks:
  • (6/2016) Summer plan: I am working with a group of students on the following projects: (1) Improve AsterixDB; (2) Use AsterixDB to develop Cloudberry to do large-scale data analytics and visualization; and (3) Continue developing TextDB to do scalable and declarative information extraction. We also study how to use these techniques to solve Zika-related problems.
  • (6/2016) Together with Prof. Aditi Majumder,  we received a grant from ARL to study how to use analytics and visualization on large data sets using AsterixDB.
  • (6/2016) Our Apache AsterixDB project has officially graduated from its incubator.
  • (5/15/2016) Our student team used Cloudberry to build a system to win an award at the UCI Data Science Hackathon. Congratulations!
  • (5/2016) We have an active project called “Cloudberry” on exploring and visualizing large amounts of spatio-temporal data (e.g., social media information) using AsterixDB.
  • (3/28/2016) This quarter I am teaching CS122A/EECS116, Spring 2016: Introduction to Data Management. It has been a long time since I last taught it 🙂
  • (3/28/2016) I am also teaching a graduate course CS290 on text-centric data management.  For the first time I work with a group of students to build an open-source data system using github. The vision is to study how to store, index, and query text information efficiently and declaratively.
  • (1/2016) Teaching CS122B this quarter. I finally got time to make significant changes to the course materials, including using AWS and adding new topics such as JMeter, database replication, load balancing, and securing HTTP. It’s going to be fun!
2015

  • (9/2015) Teaching CS222 this quarter.
  • (9/2015) We had a great VLDB 2015 conference in Hawaii!
  • (1/2015) I came back to UCI after a 1.5-year leave at SRCH2.
2014

  • (2/2014) Together with Prof. Volker Markl, I will be a Program Co-Chair (PVLDB Editor-in-Chief) for VLDB 2015, which will be in Hawaii 🙂
2013

  • (7/2013) Starting from July 2013, I am taking a leave of absence from UCI to work at my startup, SRCH2.
  • From August 2012 to June 2013, I was the Vice Chair of Department of Computer Science.
  • (6/10/2013) We are very excited to release our AsterixDB Beta! Here are some pictures at our celebration lunch in Laguna Beach.
  • (4/2013) Attending DASFAA 2013 in Wuhan, China. Sharad and I gave a talk for our 10-year best paper award. Here’s a picture at the ceremony. Here are our slides: [Chen’s PPT], [Sharad’s PPT].
  • (4/2013) Two papers written in collaboration with my Chinese colleagues were accepted by SIGMOD 2013, one titled “String Similarity Measures and Joins with Synonyms” with Jiaheng Lu and Chunbin Lin at Renmin University, and one titled “Improving Regular-Expression Matching on Strings Using Negative Factors” with Xiaochun Yang et al. at Northeastern University.
  • (4/6/2013) We are very excited to release our AsterixDB alpha! Here are some pictures at our celebration dinner. Stay tuned for the beta release, which is coming soon!
  • (3/2013) Prof. Xiaohui Xie and I received an NIH grant of $662K on assembling complete individual genomes. I am working with a team to do efficient genome assembly using parallel computing in our ASTERIX project.
  • (2/2013) Our PhD student, Alex Behm, has graduated and will join Cloudera. See the pictures taken at his party.
  • (1/13/2013) Our DASFAA 2003 paper titled “Efficient Record Linkage in Large Data Sets” received the 10-year Best Paper Award for DASFAA 2013. It was my first paper in the area of data cleaning and approximate string search in the context of the Flamingo project.
  • (1/7/2013) This quarter I am teaching CS122B titled “Projects in Databases and Web Applications.”
2012

  • (11/6/2012) On the Election Day I gave an invited talk about Election and ASTERIX at the ACM GIS BigSpatial workshop in Redondo Beach, CA.
  • (11/5/2012) I was invited to write an article titled “Entrepreneurship in Data Management Research” at the ACM SIGMOD Blog.
  • (11/1/2012) “Full professor-ed” 🙂
  • (9/27/2012) This quarter I am teaching CS222/CS122C titled “Principles of Data Management.” For the first time it’s co-listed as an undergraduate course CS122C since we want to encourage undergraduate students to get familiar with “what’s inside a DBMS system” earlier.
  • (9/2012) I visited several universities and companies in China to talk about our research on powerful search and ASTERIX.
  • (8/2012) I gave a talk titled “Search as You Type: From Research to Commercialization” at the DBRank 2012 workshop at VLDB in Istanbul, Turkey.
  • (8/2012) I gave a talk titled “Supporting Efficient Top-k Queries in Type-Ahead Search” at SIGIR.
  • (5/2012) Our paper titled “Supporting Efficient Top-k Queries in Type-Ahead Search” with Tsinghua colleagues (Guoliang Li, Jiannan Wang, and Jianhua Feng) got accepted by SIGIR. It is amazing to see how reviewers from different communities (Databases and Information Retrieval) have such different tastes 🙂
  • (5/2012) Our paper titled “Executing SQL over encrypted data in the database-service-provider model” received the ACM SIGMOD 2012 Test-of-Time Award. The paper, published 10 years ago, envisioned the “Database as a service” model.
  • (4/2012) This quarter I am again teaching CS122B titled “Projects in Databases and Web Applications”. I am also organizing the CS Seminar Series.
  • (3/2012) I gave a talk at University of Toronto titled Improving Search for Emerging Applications.
  • (3/2012) We recently released a paper titled Analysis of Instant Search Query Logs. It is based on our study to analyze the log of our instant, fuzzy search system called PSearch. We compared it with a traditional search system and showed the benefits of the new search paradigm. Some user behavior patterns are very interesting.
  • (2/2012) I am glad to receive the 2012 ICS Dean’s Award for Graduate Student Mentoring.
  • (1/2012) We released an improved version of the source code of the Hobbes project.
2011

  • (12/2011) Our paper titled Hobbes: optimized gram-based methods for efficient read alignment was published by Nucleic Acids Research.
  • (9/2011) This quarter I am teaching CS122B titled “Projects in Databases and Web Applications”. I am also organizing the CS Seminar Series.
  • (9/2011) Check OmniPlaces.com, a location-based search engine to demonstrate the technology of Bimaple. It also has an iPhone App.
  • (9/2011) Check a cool system built by our students, Sattam Alsubaiee and Zachary Heilbron, to support spatial aggregation on Twitter data using ASTERIX.
  • (8/26/2011) Our PhD student, Rares Vernica, co-advised by Prof. Mike Carey, has successfully graduated and will join HP Labs. Here’s a picture of our celebration. We will surely miss Rares!
  • (8/2011) Our MS student, Nagesh Honnalli, has successfully graduated and will join Amazon. Here’s a picture of our celebration.
  • (7/2011) Check my blog on instant search.
  • (7/8/2011) We are glad to release the first software to support instant fuzzy search on large data sets.
  • (6/24/2011) Check the video clip on the Bimaple homepage to show location-based instant, fuzzy search on iPhone and a live demo on more than 17 million records.
  • (6/17/2011) I advised a group of students who participated in the Microsoft Speller Challenge and won third place. Congratulations to the team! Here is our qSpeller project page for the Microsoft Speller Challenge.
  • (5/18/2011) Bimaple released a prototype to do location-based instant, fuzzy search. To the best of our knowledge, it is the first system that can do this type of search in a unified framework.
  • (5/2011) We (my Tsinghua colleagues and I) released our CHIME demo to support error-tolerant Chinese input. It’s based on our coming IJCAI 2011 paper.
  • (4/22/2011) I gave an invited talk titled “The Flamingo Software Package on Approximate String Queries” at the DQIS 2011 workshop in Hong Kong. Here is the Powerpoint file.
  • (4/2011) Our paper titled “ASTERIX: Towards a Scalable, Semistructured Data Platform for Evolving-World Models” by the ASTERIX project has been accepted for publication in Distributed and Parallel Databases.
  • (4/2011) Our paper titled “An Efficient Error-Tolerant Chinese Pinyin Input Method” with Tsinghua collaborators (Yabin Zheng and Maosong Sun) has been accepted for publication in IJCAI 2011. It’s my first paper in this conference 🙂
  • (4/2011) Our paper titled “Location-Based Instant Search” with my graduated student Shengyue Ji has been accepted by the SSDBM conference.
  • (4/2011) I am glad to launch the Hobbes project on genome sequence mapping.
  • (3/26/2011) This quarter I am again teaching CS122B: Projects in Databases and Web Applications.
  • (2/2011) My PhD student, Shengyue Ji, has just graduated and joined the “Don’t be evil” company.
  • (2/2011) Check a new system prototype Bimaple built to support instant, error-tolerant search on Stack Overflow messages.
  • (1/2/2011) This quarter I am teaching CS122B: Projects in Databases and Web Applications.
  • (1/2/2011) The company I am starting, Bimaple, is hiring: http://www.bimaple.com/jobs.html.
2010

  • (12/5/2010) On the weekend of Dec. 4-5, I attended the Random Hacks of Kindness (RHoK) in Chicago. Together with three other people on a team and my UCI students, Manik Sikka, Vijay Rajakumar, and Inci Cetindil, we built a project supporting full-text search for the Person Finder project on the Google App Engine platform. Our project won the third-best-project prize.
  • (11/2010) Check my new “photo” above. Thanks to Heri Ramampiaro for taking the nice picture 🙂
  • (10/2010) Our paper titled “Answering Approximate String Queries on Large Data Sets Using External Memory” with Alexander Behm and Michael Carey has been accepted by ICDE 2011.
  • (10/23/2010) We are glad to release the Flamingo Package Version 4.0.
  • (10/2010) My student, Shengyue Ji, received a Yahoo! Best Dissertation Student Award.
  • (9/2010) My student, Alex Behm, received an ARCS scholar award.
  • (9/2010) Together with Professor Xiaohui Xie, I am receiving an NIH grant to support our research on the iPubMed system.
  • (9/2010) I am teaching CS222: Principles of Data Management this quarter.
  • (8/2010) On August 14, 2010, I gave a talk about scalable interactive search at the NFIC conference. Here are my talk slides.
  • (6/2010) On June 29, I gave a talk about set-similarity joins using Hadoop at the Yahoo Hadoop Summit. Here is my talk file.
  • (5/2010) Together with Prof. Xiaohui Xie, I received an Intel grant to study compression of personal human genome data. See the ICS news for details. This is a collaboration with our colleagues, Bin Wang and Xiaochun Yang, at Northeastern University in China.
  • (4/2010) Ray wins a Yahoo! Key Scientific Challenge award: Here is the Yahoo announcement and ICS news.
  • (4/2010) DASFAA excellent demo: Our demo won a DASFAA excellent demo award.
  • (3/2010) Source-code/Demo Releases: My research team released the flamingo package version 3.0, source code of fuzzy joins using MapReduce, and demos of supporting fuzzy keyword search on spatial data (such as maps).
  • (3/2010) Teaching: CS223 – Transaction Processing and Distributed Data Management
  • (3/2010) New NSF Grant: We are glad to receive an NSF award 1030002 to support research on powerful keyword search with efficient indexing structures and algorithms in a cloud-computing environment, especially in the domain of family reunification in disasters such as the Haiti Earthquake.
  • (2/28/2010) Chile Earthquake Family Reunification: My team is working on family reunification in the Chile Earthquake. Here is the project home page.
  • (2/28/2010) ICDE 2010: Busy with local arrangements at ICDE 2010 in Long Beach.
  • (2/2010) Media article on our Haiti Project: On Feb. 8, the UCI homepage published an article to report our Haiti Family Reunification Project
  • (2/2010) SIGMOD 2010 paper: Our paper titled “Efficient Parallel Set-Similarity Joins Using MapReduce” with Rares Vernica and Mike Carey has been accepted by ACM SIGMOD 2010. The paper studies how to do set-similarity joins (such as record linkage) on large amounts of data using MapReduce.
  • (1/2010) Haiti Earthquake Family Reunification: My team is working on getting data about missing people in the Haiti Earthquake and doing powerful search on it. Here is the project home page.
  • (1/2010) Teaching: This quarter I am again teaching CS122B: Projects in Database Management.
2009

  • (11/2009) iPubMed: Check out our new iPubMed system co-developed by my team and Tsinghua University to support type-ahead, fuzzy search on more than 18 million MEDLINE records.
  • (9/2009) Life after Sabbatical: I am teaching two courses this quarter: CS122B: Projects in Database Management, and CS295: Database Management and Information Retrieval.
  • (9/2009) VLDB 2009 Tutorial: Marios Hadjieleftheriou and I gave a tutorial at VLDB 2009 on approximate string matching. Here are the slides: [Part I], [Part II]. Here are the slides of our ICDE09 tutorial: [Part I], [Part II].
  • (9/2009) NSF Funding for ASTERIX: The multi-UC-campus project ASTERIX led by Prof. Mike Carey and me has been funded at $2.7M for three years from the NSF Data Intensive Computing program. The project, based at UCI, also includes UCSD and UCR participants. UCI’s share is $1.8M.
  • (6/2009) Summer: I will be visiting colleagues at Tsinghua University, China in the summer. I will also work with several colleagues in China during the visit.
  • (5/2009) PSearch News: Read this NACS news article about our PSearch prototype.
  • (5/2009) Our research needs a student: We are looking for an undergraduate or MS student for a research project. The details are here.
  • (4/2009) Students’ award: I am proud that two of our ISG students, Shengyue Ji and Mingya Gao, together with Wen Pu from UIUC, have been selected as one of the five finalist teams for the SIGMOD 2009 programming contest (Main Memory Transactional Index).
  • (4/2009) Dean’s Award for Mid-Career Research: I am glad to receive the ICS Dean’s Award for Mid-Career Research.
  • (3/27/2009) Pictures of my home where I grew up: I had a trip to my hometown in Jinan, Shandong, China. I took several pictures of the home where I grew up as a child.
  • (3/2009) Startup: I have officially started a company BiMaple to support a novel, powerful way to do search.
  • (3/2009) Launching new project: I am glad to officially launch TASTIER: a joint research project with Tsinghua University on efficient auto-complete and type-ahead search on large data sets.
  • (3/2009) New SIGMOD 2009 paper: Our paper titled “Type-Ahead Search on Relational Data: a TASTIER Approach” by Guoliang Li, Shengyue Ji, Chen Li, and Jianhua Feng has been accepted by the SIGMOD 2009 conference.
  • (2/2009) New NSF award: We are glad to receive an NSF award IIS-0844574 from the NSF CluE program to support our research on large-scale data cleaning using MapReduce/Hadoop environments. In addition to receiving the NSF support, we will also use software and services on a Google-IBM cluster to explore innovative research ideas in data-intensive computing.
  • (1/2009) New WWW2009 paper: Our paper titled “Efficient Interactive Fuzzy Keyword Search” by Shengyue Ji, Guoliang Li, Chen Li, and Jianhua Feng has been accepted by the WWW 2009 conference.
2008

  • (11/2008) Launch of our new ISG group home page: Check out this new page of our Information Systems Group (ISG)!
  • (11/2008) First paper on bioinformatics: My first paper on bioinformatics titled “Human genomes as email attachments” has been published in the journal Bioinformatics. We used novel techniques to compress a human genome from 3.2GB to 4.1MB. From the date we submitted the paper (Oct. 7, 2008) to the date it was published online (Nov. 7, 2008), it took just one month! The PDF is available here. It was once the No. 1 most-frequently read article in the Journal of Bioinformatics in January and February of 2009 according to the following link (as of March 2009).
  • (10/2008) Flamingo Release 2.0: we are glad to release version 2.0 of the package to support fuzzy string search. Version 2.0.1 (released on Nov. 7, 2008) fixed compatibility issues for GCC 4.3.2.
  • (9/2008) New funding award from China: Together with Prof. Xiaochun Yang from Northeastern University of China, I received a funding award from the “Research Funds for Oversea Scholars” program of the National Natural Science Foundation of China. It will support our research on fuzzy search on text documents.
  • (9/2008) Sabbatical: I am on sabbatical this year. I will be mainly at UCI.
  • (9/2008) New PhD students: Two new PhD students, Minh Doan and Sattam Mubark Alsubaiee, have joined our research team.
  • (9/2008) New ICDE2009 Publications: We have two full research papers accepted by ICDE 2009: “Space-Constrained Gram-Based Indexing for Efficient Approximate String Search,” by Alexander Behm, Shengyue Ji, Chen Li, and Jiaheng Lu; “Best-Effort Top-k Query Processing Under Budgetary Constraints,” by Michal Shmueli-Scheuer, Chen Li, Yosi Mass, Haggai Roitman, Ralf Schenkel, and Gerhard Weikum. In addition, I will be presenting a tutorial titled “Efficient Approximate Search on String Collections” with Marios Hadjieleftheriou (from AT&T Labs–Research).
  • (8/2008) Mike Carey joined us! We are extremely happy that Prof. Mike Carey has joined our department.
  • (7/3/2008) Launching Search@ICS: I am glad that our research prototype has been launched on the ICS Homepage. It supports interactive, fuzzy search for ICS people and general pages at ICS.UCI.EDU.
  • (4/1/2008) Launching PSearch: I am glad to release the PSearch Prototype to support interactive, fuzzy search for the UCI Directory.
  • (3/31/2008) This quarter I am teaching CS122B and CS224.
  • (2/22/2008) New SIGMOD08 paper: The conference has accepted our paper titled “Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently”, a joint work with Bin Wang and Xiaochun Yang when they visited our place last fall. The paper solves several open, important problems not addressed in our VLDB07 VGRAM paper.
  • (2/1/2008) New Visitor: I am glad that Guoliang Li from Tsinghua University is visiting my research team for about four months.
2007

  • (12/12/2007) Today I attended a local computer industry forum about the computer cluster workforce in Orange County. There is an excellent survey on the needs of computer cluster workforce in the county. One interesting finding is that the county is facing the challenge of not being able to find enough workers in the IT industry. The survey also gives us some thoughts on how we design our education curriculum to meet the need of the industry.
  • (12/2007) I am looking for a motivated BS/MS student for an independent research project. Requirements: strong Java programming skills. Please contact me if you are interested.
  • (10/2007) New paper on approximate string matching: Our recent paper titled “Efficient Merging and Filtering Algorithms for Approximate String Searches” by Chen Li, Jiaheng Lu, and Yiming Lu will appear in ICDE 2008. We developed new algorithms and indexing structures that can significantly improve the performance of approximate string search.
  • (10/2007) New NSF Grant: We received an NSF grant of $95K for our proposal titled “SGER: Answering Approximate String Queries Using Variable-Length Grams.”
  • (8/2007) Visitors: Bin Wang and Xiaochun Yang are visiting our team again this summer. We will continue working on topics related to approximate query answering.
  • (8/2007) New PhD student: I am glad that Alex Behm has joined our research team as a new PhD student.
  • (6/2007) Summer: My students, Ray and Yiming, will be doing summer internships at Microsoft Research and IBM T.J. Watson, respectively. I will be traveling in China in early summer, attending conferences and visiting schools and companies. After that, I will be working with my students, postdoc, and visitors at UCI. There are several very exciting ideas I would like to pursue.
  • (6/2007) Tenured.
  • (6/2007) VGRAM for VLDB07: Our paper titled “VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams” by Chen Li, Bin Wang, and Xiaochun Yang will appear in VLDB 2007. I am glad that the reviewers liked the work as much as we do.
  • (4/17/2007) Flamingo 1.0 Release: I am glad to release our Flamingo Package 1.0 on approximate string matching.
  • (4/17/2007) Release of Web-object-history data: I am glad to release our data set of the history of data objects collected from 6 web sites in 1.5 years.
  • (4/2007) SIGMOD07 Undergraduate Scholarship Program: I am chairing this program. Click here for more information.
  • (4/2007) Teaching: This quarter I am teaching CS223 (formerly ICS214B) – Transaction Processing and Distributed Data Management.
  • (1/2007) Teaching: This quarter I am teaching CS122B (formerly ICS185), Projects in Database Management.
2006

  • (12/2006) Research Funds: I received an ICS Ted & Janice Smith Faculty Seed Fund and an ICS CORCLR research/travel fund.
  • (12/2006) NSF Proposals: My team and I submitted two proposals to the NSF IIS program. Both proposals are based on our observations on several critical problems the solutions of which are greatly needed by many real applications.
  • (9/2006) New Project on Family Reunification: Ray and I have started working on a new project called Family Reunification. It’s a data-integration project using real data from many Web sources. It’s part of the RESCUE project. More information will come soon.
  • (9/2006) Release of SEPIA 1.0: Ray has released SEPIA 1.0 on selectivity estimation of fuzzy string predicates based on our VLDB 2005 paper.
  • (9/2006) New Junior Specialist: We have a new junior specialist, Jiaheng Lu, who is joining our research team. He’s expecting his PhD from the National University of Singapore. He will be working on projects related to data integration.
  • (9/2006) Google Research Award: I received a Google Research Award in the amount of $37,500 renewable for a second year. It will be used to support my research on data cleaning, especially on approximate string searching. I am very thankful for their support, especially since this is the largest support I received from the industry.
  • (7/2006) Work on Data Exchange: Recently I finished a technical report with Foto Afrati and Vassia Pavlaki (at NTUA, Greece) titled “Data Exchange with Arithmetic Comparisons.” We have been working on it for almost a year: all of us went to Stanford for one week, and Vassia visited UCI twice. It took us a lot of time to think about all the subtle issues that are not covered in the excellent paper on data exchange by Fagin et al. I am glad that we finally completed the work, and I really like it.
  • (6/2006) Summer: My student, Ray, is doing a summer internship at Yahoo!. My other students are working with me during the summer. I will have two visitors (Xiaochun Yang and Bin Wang). I will visit a few places (IBM, SRI, Yahoo, Google, possibly Toronto, and VLDB in Korea). Well, these will keep me busy enough, not to mention I have two sons to play with 🙂
  • (5/2006) New PhD Student: I am glad that a new student, Yiming Lu, is joining our PhD program soon. He graduated from Shanghai Jiaotong University with a BS and an MS, and has been working on data quality at Microsoft Research Asia.
  • (5/2006) Work on Query Relaxation: Our paper titled Relaxing Join and Selection Queries (joint work with Nick Koudas, Anthony Tung, and my student, Rares Vernica) will appear in VLDB 2006, Seoul, Korea. It is about how to relax empty-answer SQL queries in RDBMS in order to compute answers for users with a minimal relaxation. We use skyline as our relaxation framework, in which we need to consider join conditions as well. The work extends our previous work on supporting approximate query answering in applications such as data cleaning. See our two VLDB’2005 papers on similar topics.
  • (5/2006) CleanDB Workshop: I am currently organizing the CleanDB Workshop with Dongwon Lee. It will be colocated with VLDB2006 in Seoul, Korea.
  • (5/2006) New Release of StringMap: I spent some days cleaning the StringMap code that supports approximate string searches and joins. The new release is available here.
  • (4/2006) $$ from M$R: In April 2006, I received an unrestricted gift fund from Microsoft Research. I want to thank them for their generous support. It’s very encouraging, and I wish to receive more support from the industry in the future.