Teaching

The following are the courses I teach and their recent offerings.

  • CS 122A: Introduction to Data Management. [S16]
This course teaches introductory concepts related to data management, including information modeling, entity-relationship (ER) diagrams, the relational model, ER-to-relation translation, SQL, functional dependencies, 1NF, 2NF, 3NF, BCNF, normalization theory, views, and access control. It includes various hands-on assignments to design a database and write SQL queries. I haven’t taught it recently due to other teaching commitments.
This course teaches students how to use database systems to build a user-facing application. Topics include database connectivity, JDBC, Web servers (Tomcat), Java servlets, frontend techniques (HTML, CSS, JavaScript), user management, stored procedures, user-defined functions (UDFs), web security (HTTPS and reCAPTCHA), mobile programming (Android), cloud services (AWS and GCP), load balancing, database replication, performance tuning, and performance testing (JMeter). Students do five hands-on projects to build a full-stack web-based application. I prefer to teach it once a year.
This course teaches “under the hood” concepts inside a database system, including disk storage, record format, page structures, heap files, sorted files, buffer manager, catalog manager, indexing (clustered/unclustered), hash tables, tree indexes (ISAM and B+ tree), relational operators, query processing, query optimization (System-R and Volcano), and query parsing. Students implement a database system by doing four hands-on projects in C++. 
  • CS 220P: Databases and Data Management. [F21]
This course is offered for MCS and MDS students. It’s an extension of CS 122A, with additional material on basic database principles (some of which is covered in CS 222P).
This course is offered for MCS and MDS students. Its content is similar to that of CS 222. Some students found the content challenging but rewarding, especially for their job search.
  • CS 221: Information Retrieval. [S19]
In 2019 I decided to teach it, drawing on my research knowledge and my SRCH2 startup experience in search, where I led a team that developed a search engine in C++ from scratch. The course teaches advanced topics in information retrieval and search, including text processing, tokenization, stop words, stemming, phrases, character encoding, inverted index, log-structured merge tree (LSM), positional indexing, ranking, vector space model, web search, enterprise search, and my own startup story. I have taught it only once so far due to other teaching commitments. I hope to have a chance to teach it again in the future.
  • CS 223: Database Transactions and Distributed Systems.
I taught it quite a long time ago and haven’t had a chance to teach it recently.
  • STATS 170AB: Project in Data Science. [S21], [W21], [S20], [W20].
I co-taught the courses with Prof. Vladimir Minin for Statistics students. This two-course sequence is intended to be the “grand finale” for Data Science majors. Its goal is to tie together many of the topics that are independently covered in the first 3+ years of Data Science requirements and electives; it also aims to fill in some of the potential gaps required to solve an end-to-end problem. In addition to a brief review of some of the required skills, the course will cover problem definition and analysis, data representation, algorithm selection, solution validation, and results presentation. Students will do data science projects, while the lecture periods will cover analysis alternatives, project planning, and data analysis issues.  
  • DATA 296P/297P: Capstone Professional Writing and Communication for Data Science Careers. [F23], [F22]
I co-taught them with Prof. Annie Qu and Prof. Babak Shahbaba for MDS students. In the class, students did capstone projects related to data science and learned communication skills.
 
A course I may teach in the future is CS 122D “Beyond SQL Data Management”, regularly taught by my colleague, Prof. Mike Carey. It must be fun to teach!