Makayla's Project Guide

Project: CLP Curator
Category: Web Development (Flask) + Web Scraping
Last updated: April 18

Note: This guide reflects your project repo as of the date above. If you've committed work since then, some details may no longer match.

Where You Are

You have a working Flask app: form takes interest categories, results page shows recommendations. recommend_clps in data_scraper.py already scores events against interests with keyword matching — the core logic is there.

Two things to tackle:

  1. get_clp_events() returns hardcoded placeholder events, not real scraped data.
  2. Your data_scraper.py currently holds two very different things — the scraping (library code) and the recommendation algorithm (your business logic). We're splitting them.

Project Structure

Your project splits into two kinds of code:

  • Business logic — you handwrite this. The recommendation algorithm (recommend.py): scoring, deduplication, edge-case handling, fallback when no CLP matches. Also text cleaning in utils.py. This is what makes CLP Curator different from "just scraping a calendar."
  • Library / view code — agent-assisted is fine. The scraping code (scraper.py): HTTP fetch, BeautifulSoup parsing. Flask routes, HTML templates. These patterns are the same for any scraping web app.

Target layout by Thursday:

final-project-makaylacarnahan/
├── app.py                  ← Flask routes — agent-assisted OK
├── recommend.py            ← business logic — handwrite (NEW, yours to own)
├── scraper.py              ← BeautifulSoup scraping — agent-assisted OK (renamed from data_scraper.py)
├── utils.py                ← text cleaning — yours (it's small but it's yours)
├── pyproject.toml
├── templates/              ← HTML — agent-assisted OK
└── data/

Why the split? From Lecture 1: The MVP — on demo day the interesting question is "how does your curator decide which CLPs to recommend?" That's recommend_clps — your handwritten scoring logic. Scraping HTML is just how you get the inputs; anyone could write that part.

recommend.py should not import requests or bs4. It takes a list of events and returns a ranked list — pure logic.
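If you want an automated check of that rule, here is a minimal sketch using only the standard library. The helper name forbidden_imports and the two inline sample sources are made up for illustration; nothing here comes from your repo.

```python
import ast

FORBIDDEN = {"requests", "bs4"}  # network/parsing libraries that belong in scraper.py


def forbidden_imports(source: str) -> set:
    """Return the forbidden top-level modules that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" -> top-level package "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from a.b import c" -> top-level package "a"
            found.add(node.module.split(".")[0])
    return found & FORBIDDEN


# Inline sample sources (hypothetical) so the sketch runs without touching your repo.
pure_source = "from utils import clean_text\n\ndef recommend_clps(events, interests):\n    return []\n"
leaky_source = "import requests\nfrom bs4 import BeautifulSoup\n"

print(sorted(forbidden_imports(pure_source)))   # []
print(sorted(forbidden_imports(leaky_source)))  # ['bs4', 'requests']
```

To run it against the real file, pass in the text of recommend.py (for example, the result of open("recommend.py").read()). An empty result means the file is still pure logic.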

Phase 1: Inspect the CLP Page HTML

Handwrite this yourself. You need to SEE the page's structure before scraping it. Eyes on the HTML first; code later.

Objective

Identify the HTML selectors for each event before writing any scraper code.

Instructions

Sample Notes

Event wrapper: <div class="event-card">
  Title:       <h3 class="event-title">
  Date:        <time class="event-date">
  Description: <p class="event-description">

(Your selectors will be different — what matters is getting them from the real page.)

Hints

If the page doesn't load cleanly, save it with curl and open the HTML file locally:

curl https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/ > clp.html

Warning. If the page loads events via JavaScript (not in the raw HTML), scraping will return zero events. Phase 2 includes a fallback for that case — don't panic if your scraper returns empty on the first try.
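One way to check the JavaScript question before writing any scraper code: count how many tags in the saved HTML carry the class you saw in the browser. This is a rough sketch using only the standard library. The class names and the sample string are placeholders taken from the sample notes above, not the real CLP page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts opening tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    counter = ClassCounter(class_name)
    counter.feed(html)
    return counter.count


# Placeholder HTML; with the real page you would read clp.html instead.
sample = (
    '<div class="event-card featured"><h3 class="event-title">Jazz Night</h3></div>'
    '<div class="event-card"></div>'
)
print(count_class(sample, "event-card"))   # 2
print(count_class(sample, "no-such-class"))  # 0
```

With the real file, read clp.html into a string and pass your wrapper class. If the count is zero for a class you can clearly see in the browser's inspector, the events are probably being added by JavaScript after the page loads.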

Optional — get help from your agent:

Fetch this URL and tell me the HTML tag and class name for (a) the
wrapper around each event, (b) the title, (c) the description/date.
URL: https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/
Don't write the scraper yet — just list the selectors.

Phase 2: Rename + Write the Real Scraper

Agent-assisted is fine here. You hand over the selectors from Phase 1; the agent writes the BeautifulSoup loop. Read what you get back and tweak.

Objective

Rename data_scraper.py to scraper.py. Replace the placeholder list with a real parser that uses your Phase 1 selectors, with a fallback if scraping fails.

Instructions

Hints

The real scraper pattern (replace the select call with YOUR selectors):

# scraper.py
import requests
from bs4 import BeautifulSoup


def get_clp_events():
    url = "https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat 4xx/5xx responses as errors so the except branch runs
        soup = BeautifulSoup(response.text, "html.parser")

        events = []
        for card in soup.select("div.event-card"):       # ← your wrapper selector
            title_el = card.select_one("h3.event-title")
            desc_el  = card.select_one("p.event-description")
            if not title_el:
                continue
            events.append({
                "title": title_el.get_text(strip=True),
                "description": desc_el.get_text(strip=True) if desc_el else "",
            })

        if events:
            return events
        print("Scraper found zero events, falling back to placeholders")
        return _placeholder_events()
    except Exception as e:
        print("Scraping error:", e)
        return _placeholder_events()


def _placeholder_events():
    return [
        {"title": "Climate Change Talk", "description": "environment sustainability climate"},
        {"title": "AI Ethics Panel", "description": "technology ethics ai future"},
        # ... keep your current 6
    ]

One cleanup while you're there: your current data_scraper.py imports clean_text from utils twice (lines 3 and 27). Delete the second import when you move the code over.
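It can help to try the parsing loop on a small HTML string before pointing it at the live page, so a network problem and a selector problem don't look the same. The sketch below assumes the placeholder selectors from the Phase 1 sample notes. parse_events is a hypothetical helper name, and the "date" field is an addition that the scraper above does not collect.

```python
from bs4 import BeautifulSoup

# Placeholder markup built from the Phase 1 sample notes; your real selectors will differ.
SAMPLE_HTML = """
<div class="event-card">
  <h3 class="event-title">Jazz Night</h3>
  <time class="event-date">April 25</time>
  <p class="event-description">Live music from the jazz ensemble.</p>
</div>
<div class="event-card">
  <p class="event-description">Card with no title, should be skipped.</p>
</div>
"""


def parse_events(html):
    """Turn an HTML string into a list of event dicts (no network access)."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for card in soup.select("div.event-card"):
        title_el = card.select_one("h3.event-title")
        if not title_el:
            continue  # same rule as get_clp_events: skip cards without a title
        date_el = card.select_one("time.event-date")
        desc_el = card.select_one("p.event-description")
        events.append({
            "title": title_el.get_text(strip=True),
            "date": date_el.get_text(strip=True) if date_el else "",
            "description": desc_el.get_text(strip=True) if desc_el else "",
        })
    return events


print(parse_events(SAMPLE_HTML))
```

Once this works on a string, the same function can be fed the text of your saved clp.html, and get_clp_events() only has to handle the fetch and the fallback.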

Optional — get help from your agent:

Here are my selectors from inspecting the CLP page: [paste from Phase 1].
Wire them into get_clp_events() using BeautifulSoup. Keep the
placeholder fallback. Show me the diff, don't commit.

Phase 3: Create recommend.py + Fix Edge Cases

Handwrite this yourself. This IS your project — the algorithm that decides which 5 CLPs to show a student.

Objective

Move recommend_clps into a new recommend.py file. Fix the edge cases: duplicates, empty input, no matches.

Instructions

    • Duplicate interests (["music", "music", "music"]) — dedupe before scoring
    • Empty-string interests ([""]) — filter out
    • Zero matches — return featured events instead of an empty page

Hints

Improved recommend_clps:

# recommend.py
from utils import clean_text


def recommend_clps(events, interests):
    # Clean + dedupe interests (business decision: "music, music, music" = one "music")
    interests = [i.strip().lower() for i in interests if i.strip()]
    interests = list(dict.fromkeys(interests))  # dedupe but keep the order typed (a set's order can change between runs)

    scored = []
    for event in events:
        text = clean_text(event["description"])
        score = 0
        hits = []
        for word in interests:
            if word in text:
                score += 1
                hits.append(word)
        if score > 0:
            scored.append({"score": score, "event": event, "hits": hits})

    scored.sort(key=lambda x: x["score"], reverse=True)

    if not scored:
        # No matches — fall back to the first 5 events (business decision: show SOMETHING)
        return [{"score": 0, "event": e, "hits": []} for e in events[:5]]

    return scored[:5]

Notice the change: the function now returns dicts with event and hits (so your template can show which interests matched). Update home.html / results.html to use r.event.title etc.
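One more edge case to know about in the word in text check: it matches substrings, so a short interest can hit inside a longer word. The sketch below shows this with one of the placeholder descriptions and a whole-word alternative. has_word is a hypothetical helper, not something in your repo, and whether to adopt it is your business decision.

```python
import re

description = "environment sustainability climate"  # from the placeholder events

# Substring check: "ai" is found inside "sustainability".
print("ai" in description)  # True


def has_word(word, text):
    """True only when `word` appears as a whole word (or whole phrase) in `text`."""
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


print(has_word("ai", description))                    # False
print(has_word("ai", "technology ethics ai future"))  # True
```

With the substring check, a student who types "ai" would see Climate Change Talk recommended with an "ai" badge. The \b word-boundary markers stop that without changing how multi-word interests behave.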

Test it quickly from a REPL or paste into a test_recommend.py:

from scraper import get_clp_events
from recommend import recommend_clps
events = get_clp_events()
print(recommend_clps(events, ["music", "music", "music"]))
print(recommend_clps(events, [""]))
print(recommend_clps(events, ["underwater basket weaving"]))
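The snippet above depends on the scraper, so its output changes with whatever the CLP page lists that day. For repeatable checks, here is a self-contained sketch with a fixed event list. The clean_text below is a hypothetical stand-in for your utils.clean_text, recommend_clps is a condensed copy of the logic in this phase, and the Jazz Night event is an invented fixture.

```python
import re


def clean_text(text):
    # Hypothetical stand-in for utils.clean_text: lowercase, punctuation to spaces.
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def recommend_clps(events, interests):
    # Condensed copy of the Phase 3 logic so this file runs on its own.
    interests = list(dict.fromkeys(i.strip().lower() for i in interests if i.strip()))
    scored = []
    for event in events:
        text = clean_text(event["description"])
        hits = [word for word in interests if word in text]
        if hits:
            scored.append({"score": len(hits), "event": event, "hits": hits})
    scored.sort(key=lambda x: x["score"], reverse=True)
    if not scored:
        # No matches: fall back to the first 5 events
        return [{"event": e, "hits": []} for e in events[:5]]
    return scored[:5]


EVENTS = [
    {"title": "Climate Change Talk", "description": "environment sustainability climate"},
    {"title": "AI Ethics Panel", "description": "technology ethics ai future"},
    {"title": "Jazz Night", "description": "live music jazz ensemble"},  # invented fixture
]

dupes = recommend_clps(EVENTS, ["music", "music", "music"])
empty = recommend_clps(EVENTS, [""])
no_match = recommend_clps(EVENTS, ["underwater basket weaving"])

print([r["event"]["title"] for r in dupes])  # ['Jazz Night']
print(dupes[0]["hits"])                      # ['music']
print(len(empty), len(no_match))             # 3 3
```

The duplicate input scores Jazz Night once, and both the empty and no-match inputs fall back to the full three-event list with no hits. If you paste these into test_recommend.py, turning the prints into assert statements gives you a check you can rerun after every change.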

Optional — get help from your agent:

Walk me through my new recommend_clps function. For each of these
inputs, tell me what it returns: ["music","music","music"], [""],
["underwater basket weaving"]. Don't change the code — I want to
verify my mental model.

Phase 4: Bootstrap the UI

Agent-assisted is fine here. Forms, cards, badges — all standard Bootstrap.

Objective

Style the interest form and results so the app looks intentional.

Instructions

Hints

Bootstrap CDN:

<link
  href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css"
  rel="stylesheet"
/>

CLP result card (with badges for hits):

{% for r in results %}
<div class="card mb-3">
  <div class="card-body">
    <h5 class="card-title">{{ r.event.title }}</h5>
    <p class="card-text">{{ r.event.description }}</p>
    {% for h in r.hits %}
    <span class="badge bg-primary">{{ h }}</span>
    {% endfor %}
  </div>
</div>
{% endfor %}

Optional — get help from your agent:

Style home.html and my results template using Bootstrap 5. Each
result should be a card with title + description + matched-interest
badges. Keep the HTML simple enough for me to edit.

Checkpoint 2 Readiness

By Thursday April 23 at 3pm:

Helpful Resources