SEEM 5680 Text Mining Models and Applications

Teacher

Prof. Lam Wai
wlam@se.cuhk.edu.hk
 

Announcement

Instructions for Online Quiz

Course Schedule and Content (ongoing updates)

Jan 12,13 recorded lecture basic information retrieval (IR), text preprocessing
Jan 19,20 recorded lectures (3) parametric search, term weighting, vector space model, evaluation of IR
Jan 26,27 Zoom: 292 425 107 recorded lectures (2) evaluation of IR, probabilistic IR, text classification - Bayes
Feb 2,3 Holiday
Feb 9 Zoom: 292 425 107 recorded lecture text classification - feature selection, kNN
Feb 10 Zoom: 292 425 107 (lecture: 15:30 / Quiz0: 16:30) recorded lecture text classification - linear classifier, SVM; Quiz0
Feb 16 Zoom: 5925617453 (by Haoran Yang) recorded lecture Lucene IR package
Feb 17 Zoom: 292 425 107 (lecture: 15:30 / Quiz1: 16:15) recorded lecture text classification - SVM; Quiz1
Feb 23 Zoom: 933 1752 8226 (by Jackie Ho) recorded lecture Demo on text classification
Feb 24 Zoom: 292 425 107 recorded lecture latent semantic analysis
Mar 2 Zoom: 292 425 107 recorded lecture link analysis
Mar 3 Zoom: 292 425 107 recorded lecture vector semantics, word embedding
Mar 9 Zoom: 292 425 107 recorded lecture word embedding
Mar 10 Zoom: 292 425 107 (Quiz2 15:30) Quiz2
Mar 16, 17 Zoom: 292 425 107 recorded lectures (2) neural models and neural language models
Mar 23, 24 Zoom: 292 425 107 recorded lectures (2) recurrent networks, Transformers
Mar 30, 31 Zoom: 292 425 107 recorded lectures (2) Transformers, encoder-decoder
Apr 6 Zoom: 292 425 107 recorded lecture transfer learning with pretrained language models
Apr 7 Zoom: 921 0334 6933 (by Wenxuan Zhang) recorded lecture Demo Class on neural models
Apr 13 Zoom: 292 425 107 recorded lecture transfer learning with pretrained language models
Apr 14 Zoom: 292 425 107 Quiz3
Apr 20,21 (no class; submit presentation video by the deadline mentioned below) preparation of project presentation video

 

Course Outline

  1. basic information retrieval (IR)
  2. vector space model
  3. evaluation in IR
  4. Naive Bayes text classification
  5. text classification - linear classifiers and SVM
  6. link analysis
  7. vector semantics
  8. word embeddings
  9. neural models
  10. recurrent networks, encoder-decoder
  11. neural attention
  12. neural IR

 

Reference Books

Lecture Notes

 

Assessment Scheme

 

Project

Project description is given here
Each student's project presentation must be under 10 minutes. There is no live presentation. Record your presentation via Zoom with your camera on and your face visible, and save the recording to the cloud. Then submit the link to the cloud recording via the Blackboard system before Apr 29 (Fri) 23:59. The deadline for submitting an electronic version of the project report via the Blackboard system is May 4, 17:00.