CSCI B656 Web Mining (3 CR)

Class Projects


Ideas for Future Projects


Project Proposal Format

The project proposal is free-format but must not exceed one page. Use your best judgement about margins and font size (fitting too much on a page would be a bad idea!). The proposal should be concise, concrete, and focused. It should answer a few basic questions:
  1. Why? (Motivate your idea; is it interesting, important, relevant?)
  2. What? (Exactly what do you propose?)
  3. How? (State your hypothesis and evaluation procedure)
  4. When? (You need a realistic timetable and deliverable; is it doable?)


Project Wiki

Each group should maintain a wiki page about their project. The Oncourse site wiki tool is available for this purpose. The wiki functions as an open lab notebook: use it to note planned research, problems and solutions, delays, and preliminary results and analyses, and to track progress along the proposed timeline and toward the stated project objectives, as well as any deviations from the approved proposal. The instructors will monitor the wiki weekly to check for progress, and the wiki will be used as a component of the project evaluation toward the final course grade.


Sample of past projects

Spring 2007

  1. Mining Pharmacogenomics Information using Topical Web Crawlers
  2. PageRank & Sample Size
  3. Can we beat Cinematch (Netflix Recommendation System)?
  4. SMILES Index for Retrieving Chemical Information
  5. Social network community emergence around new digital media
  6. Trend Prediction
  7. Evaluating Hypertext Documents for Authenticity

Spring 2006

  1. Usage Statistics of Robots Exclusion Standard (paper in Proc. IADIS WWW/Internet 2006)
  2. Mining for Blog communities
  3. Directed News Analysis
  4. KidsCrawler
  5. Ontology Generation from Specialized Corpora
  6. Using Online Social Networks in Topical Sentiment Analysis
  7. Using Page History to Rank Search Results
  8. Web Mining Developmental Trends in Social Networks
  9. Web Topology of the Indiana University Domain (paper in Proc. IV07)
  10. Web user profiling and its applicability to system security

Spring 2005

  1. Multilingual news search
  2. Effects of guided summarization on QA using the Web
  3. Mining people connections
  4. Personalized search by history context
  5. Phishing Attacks Using Social Networks (see coverage in IDS, IDS, and Slashdot; paper to appear in CACM)
  6. Structural evolution of Web content
  7. Clustering of political opinion sites using unsupervised techniques

Spring 2004

  1. Experiments with PageRank Computation
  2. Clustering Weblogs using LSA and Link-based Methods
  3. Sherlock News Search Engine
  4. Focused Crawlers vs Accelerated Focused Crawlers
  5. Domain-Based PageRank Personalization (paper presented at WebKDD 2004!)