This course focused on practical recommender systems and web mining, with Python-based implementations over real interaction data (user–book logs, ratings, etc.).


Project 1: Book Recommendation from Implicit & Explicit Feedback

Goal:
Given a large set of (user, book, rating) interactions, build two models:

  1. A rating predictor r̂(u, b) that estimates how a user would rate a book.
  2. A read predictor p̂_read(u, b) that guesses whether a user would read a given book at all.

All predictions were written out to CSV files to be graded and evaluated on hidden test sets.

Part 1 – Rating Prediction with User & Item Biases

For ratings, I implemented a classic bias-based model:

r^(u,b)=α+βu+βb

where:

From the training data:

Then I iteratively refined the biases with L2 regularization to avoid overfitting users/books with few ratings. Conceptually:

# Pseudocode – not the exact assignment code
alpha = global_mean_rating
user_bias = defaultdict(float)
book_bias = defaultdict(float)

for _ in range(max_iters):
    # update user biases
    for u in users:
        # average residual over books user u rated
        residuals = [
            r - alpha - book_bias[b]
            for (b, r) in ratings_per_user[u]
        ]
        user_bias[u] = f_user(residuals)  # regularized update

    # update book biases
    for b in books:
        residuals = [
            r - alpha - user_bias[u]
            for (u, r) in ratings_per_book[b]
        ]
        book_bias[b] = f_book(residuals)  # regularized update

Key ideas (without giving away the exact formulas):

For prediction on new (user, book) pairs, I combined everything and clamped ratings to the allowed range (e.g., 1–5):

pred = alpha + user_bias.get(u, 0.0) + book_bias.get(b, 0.0)
pred = min(max(pred, 1.0), 5.0)

This model is simple, fast, and surprisingly strong as a baseline, especially on large, sparse recommendation datasets. The RMSE on the hidden test set was around 1.5, placing me top 3% in the class.

Part 2 – Read Prediction with User & Item Read Biases

The second task was to predict whether a user would read a given book (binary: 0/1). There were no explicit read labels beyond the observed interactions, so this is an implicit feedback problem.

I built a rule-based model combining:

1. Book popularity

2. User reading activity

Using these, I defined thresholds and bands (e.g., “highly popular”, “moderately popular”) based on percentiles of the popularity distribution. The prediction rule, in words:

In pseudocode:

# Pseudocode – high-level logic
def predict_read(user, book):
    pop = book_popularity.get(book, 0.0)
    seen_before = book in user_read_books[user]
    user_reads = len(user_read_books[user])

    if pop >= high_pop_threshold:
        return 1
    if pop >= mid_pop_threshold and user_reads >= active_user_threshold:
        return 1
    if seen_before:
        return 1
    return 0

This keeps the model interpretable and avoids overfitting while still reacting to:

The accuracy on the hidden test set was around 76%, placing me in the top 24% of the class.


Project 2: Open-Ended Recommender System Exploration

For the second assignment, we could choose any recommendation approach and dataset. The project can be read about in more detail here.