| Short description | Machine learning techniques to mine the Web and other unstructured/semistructured, hypertextual, distributed information repositories. Crawling, indexing, ranking and filtering algorithms using text and link analysis. Applications to search, classification, tracking, monitoring, and Web intelligence. Group project on one of the topics covered in class. |
|---|---|
| Prerequisites | This course is open to CS, Informatics, SLIS, and other graduate students with an interest in information systems, artificial intelligence, and the Web. Although prior exposure to machine learning algorithms, information retrieval, and/or Web programming is helpful, there are no advanced AI or DB prerequisites. |
| Textbook | S. Chakrabarti, Mining the Web (Morgan Kaufmann, 2002). Note that a second edition of this book is in the making. Another excellent reference is Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data by Bing Liu (Springer, 2007), which includes a chapter on crawling by yours truly (slides available). |
| Lecture | Tuesdays and Thursdays, 11:15 AM-12:30 PM, in Eigenmann 921 (maps: Google, IUB) |
| Instructor | Filippo Menczer (Office hours by appointment in Eig 909; please schedule in class) |
| Associate instructor (AI) | Heather Roinestad (Office hours by appointment in Eig 914; please schedule in class) |
| Contact | Use the group discussions and pages for all class-related questions and communications, unless privacy is necessary. |