| Short description | Machine learning techniques to mine the Web and other unstructured/semistructured, hypertextual, distributed information repositories. Crawling, indexing, ranking and filtering algorithms using text and link analysis. Applications to search, classification, tracking, monitoring, and Web intelligence. Group project on one of the topics covered in class. |
|---|---|
| Prerequisites | This course is open to CS, Informatics, SLIS, CogSci, and other graduate students with an interest in information systems, artificial intelligence, and the Web. Although prior exposure to machine learning algorithms, information retrieval, and/or Web programming is helpful, there are no advanced AI or DB prerequisites. |
| Textbook | Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data by Bing Liu (with a chapter on crawling by yours truly, slides here), Springer, 2007. Another excellent reference is Mining the Web by Soumen Chakrabarti, Morgan Kaufmann, 2002, which we used in past offerings of this course. Note that the second edition of this book is in the making. |
| Lecture | TR 11:15A-12:30P in room I 107 (map) |
| Instructor | Fil Menczer (Office hours by appointment in I2:300; please schedule in class) |
| AI | Jacob Ratkiewicz (Office hours by appointment in I2:310) |
| Contact | Use the group discussions and pages for all class-related questions and communications, unless privacy is necessary. |