Makayla's Project Guide
Project: CLP Curator
Category: Web Development (Flask) + Web Scraping
Last updated: April 18
Note: This guide reflects the state of your project repo as of the date above. If you've made changes since then, parts of it may be out of date.
Where You Are
You have a working Flask app: form takes interest categories, results page shows recommendations. recommend_clps in data_scraper.py already scores events against interests with keyword matching — the core logic is there.
Two things to tackle:
- get_clp_events() returns hardcoded placeholder events, not real scraped data.
- Your data_scraper.py currently holds two very different things: the scraping (library code) and the recommendation algorithm (your business logic). We're splitting them.
Project Structure
Your project splits into two kinds of code:
- Business logic: you handwrite this. The recommendation algorithm (recommend.py): scoring, deduplication, edge-case handling, fallback when no CLP matches. Also text cleaning in utils.py. This is what makes CLP Curator different from "just scraping a calendar."
- Library / view code: agent-assisted is fine. The scraping code (scraper.py): HTTP fetch, BeautifulSoup parsing. Also Flask routes and HTML templates. These patterns are the same for any scraping web app.
Target layout by Thursday:
final-project-makaylacarnahan/
├── app.py ← Flask routes — agent-assisted OK
├── recommend.py ← business logic — handwrite (NEW, yours to own)
├── scraper.py ← BeautifulSoup scraping — agent-assisted OK (renamed from data_scraper.py)
├── utils.py ← text cleaning — yours (it's small but it's yours)
├── pyproject.toml
├── templates/ ← HTML — agent-assisted OK
└── data/
Why the split? From Lecture 1: The MVP — on demo day the interesting question is "how does your curator decide which CLPs to recommend?" That's recommend_clps — your handwritten scoring logic. Scraping HTML is just how you get the inputs; anyone could write that part.
recommend.py should not import requests or bs4. It takes a list of events and returns a ranked list — pure logic.
Phase 1: Inspect the CLP Page HTML
Handwrite this yourself. You need to SEE the page's structure before scraping it. Eyes on the HTML first; code later.
Objective
Identify the HTML selectors for each event before writing any scraper code.
Instructions
Sample Notes
Event wrapper: <div class="event-card">
Title: <h3 class="event-title">
Date: <time class="event-date">
Description: <p class="event-description">
(Your selectors will be different — what matters is getting them from the real page.)
Hints
If the page doesn't load cleanly, save it with curl and open the HTML file locally:
curl https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/ > clp.html
Warning. If the page loads events via JavaScript (not in the raw HTML), scraping will return zero events. Phase 2 includes a fallback for that case — don't panic if your scraper returns empty on the first try.
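One rough way to check whether the events are actually in the raw HTML is to count how often each tag/class pair appears in the saved file. A class that repeats once per event is a good candidate for your wrapper selector; if nothing event-like repeats, the page is probably rendering events with JavaScript. This is a minimal sketch using only the standard library. The sample HTML and its class names are made up, and `ClassCounter` / `count_classes` are illustrative names, not part of your repo.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts (tag, class) pairs seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        for name, value in attrs:
            if name == "class" and value:
                for cls in value.split():
                    self.counts[(tag, cls)] += 1


def count_classes(html):
    parser = ClassCounter()
    parser.feed(html)
    return parser.counts


# Tiny made-up page for illustration. For the real check, read your saved file:
#   html = open("clp.html", encoding="utf-8").read()
sample = """
<div class="event-card"><h3 class="event-title">A</h3></div>
<div class="event-card"><h3 class="event-title">B</h3></div>
<div class="footer">contact</div>
"""
counts = count_classes(sample)
for (tag, cls), n in counts.most_common(5):
    print(f"{n:3d}  <{tag} class={cls!r}>")
```

Treat the output as a list of candidates only. You still need to open the page in the browser inspector to confirm which repeated class really wraps one event.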
Optional — get help from your agent:
Fetch this URL and tell me the HTML tag and class name for (a) the wrapper around each event, (b) the title, (c) the description/date. URL: https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/ Don't write the scraper yet — just list the selectors.
Phase 2: Rename + Write the Real Scraper
Agent-assisted is fine here. You hand over the selectors from Phase 1; the agent writes the BeautifulSoup loop. Read what you get back and tweak.
Objective
Rename data_scraper.py → scraper.py. Replace the placeholder list with a real parser that uses your Phase 1 selectors, with a fallback if scraping fails.
Instructions
Hints
The real scraper pattern (replace the select call with YOUR selectors):
# scraper.py
import requests
from bs4 import BeautifulSoup


def get_clp_events():
    url = "https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat 4xx/5xx responses as scraping errors
        soup = BeautifulSoup(response.text, "html.parser")
        events = []
        for card in soup.select("div.event-card"):  # ← your wrapper selector
            title_el = card.select_one("h3.event-title")
            desc_el = card.select_one("p.event-description")
            if not title_el:
                continue  # skip cards without a title
            events.append({
                "title": title_el.get_text(strip=True),
                "description": desc_el.get_text(strip=True) if desc_el else "",
            })
        if events:
            return events
        print("Scraper found zero events, falling back to placeholders")
        return _placeholder_events()
    except Exception as e:
        print("Scraping error:", e)
        return _placeholder_events()


def _placeholder_events():
    return [
        {"title": "Climate Change Talk", "description": "environment sustainability climate"},
        {"title": "AI Ethics Panel", "description": "technology ethics ai future"},
        # ... keep your current 6
    ]
Also cleanup: your current data_scraper.py has from utils import clean_text imported twice (lines 3 and 27). Delete the second one when you move code over.
Optional — get help from your agent:
Here are my selectors from inspecting the CLP page: [paste from Phase 1]. Wire them into get_clp_events() using BeautifulSoup. Keep the placeholder fallback. Show me the diff, don't commit.
Phase 3: Create recommend.py + Fix Edge Cases
Handwrite this yourself. This IS your project — the algorithm that decides which 5 CLPs to show a student.
Objective
Move recommend_clps into a new recommend.py file. Fix the edge cases: duplicates, empty input, no matches.
Instructions
- Duplicate interests (["music", "music", "music"]): dedupe before scoring
- Empty-string interests ([""]): filter out
- Zero matches: return featured events instead of an empty page
Hints
Improved recommend_clps:
# recommend.py
from utils import clean_text


def recommend_clps(events, interests):
    # Clean + dedupe interests (business decision: "music, music, music" = one "music")
    interests = [i.strip().lower() for i in interests if i.strip()]
    # dict.fromkeys dedupes while keeping first-seen order, so hits come out
    # in the same order on every run (a plain set does not guarantee order)
    interests = list(dict.fromkeys(interests))

    scored = []
    for event in events:
        text = clean_text(event["description"])
        score = 0
        hits = []
        for word in interests:
            if word in text:
                score += 1
                hits.append(word)
        if score > 0:
            scored.append({"score": score, "event": event, "hits": hits})

    # Highest score first; ties keep their original event order
    scored.sort(key=lambda x: x["score"], reverse=True)

    if not scored:
        # No matches: fall back to the first 5 events (business decision: show SOMETHING)
        return [{"score": 0, "event": e, "hits": []} for e in events[:5]]
    return scored[:5]
Notice the change: the function now returns dicts with event and hits (so your template can show which interests matched). Update home.html / results.html to use r.event.title etc.
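One thing to be aware of in the scoring loop: `word in text` is a substring check, so an interest like "art" also matches inside "party". Whether that is acceptable is your call (it's a business decision). If you decide you want whole-word matching instead, one possible approach is sketched below with the standard `re` module. The helper name `matches_word` is hypothetical, and the sketch assumes the text has already been lowercased.

```python
import re


def matches_word(word, text):
    """Return True if `word` appears in `text` as a whole word or phrase."""
    # re.escape keeps any regex metacharacters in an interest literal;
    # \b anchors the match at word boundaries on both sides.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None


text = "a birthday party with live music"
print("art" in text)                 # True: substring match inside "party"
print(matches_word("art", text))     # False
print(matches_word("music", text))   # True
```

If you adopt something like this, the only line that changes in the scoring loop is the `if word in text:` check.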
Test it quickly from a REPL or paste into a test_recommend.py:
from scraper import get_clp_events
from recommend import recommend_clps
events = get_clp_events()
print(recommend_clps(events, ["music", "music", "music"]))
print(recommend_clps(events, [""]))
print(recommend_clps(events, ["underwater basket weaving"]))
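Because recommend.py is meant to be pure logic, the same three checks can also run against hand-written events with no network call at all. The sketch below is self-contained so you can run it anywhere: it includes a condensed version of recommend_clps and a stand-in for clean_text (assumption: your real utils.clean_text at least lowercases the text). The two sample events are made up.

```python
def clean_text(text):
    # Stand-in for utils.clean_text (assumption: the real one lowercases too)
    return text.lower()


def recommend_clps(events, interests):
    # Condensed version of recommend_clps so this snippet runs on its own
    interests = [i.strip().lower() for i in interests if i.strip()]
    interests = list(dict.fromkeys(interests))  # dedupe, keep first-seen order
    scored = []
    for event in events:
        text = clean_text(event["description"])
        hits = [word for word in interests if word in text]
        if hits:
            scored.append({"score": len(hits), "event": event, "hits": hits})
    scored.sort(key=lambda x: x["score"], reverse=True)
    if not scored:
        return [{"score": 0, "event": e, "hits": []} for e in events[:5]]
    return scored[:5]


events = [
    {"title": "Jazz Night", "description": "live music and jazz history"},
    {"title": "AI Ethics Panel", "description": "technology ethics ai future"},
]

dupes = recommend_clps(events, ["music", "music", "music"])
assert len(dupes) == 1 and dupes[0]["score"] == 1   # duplicates count once

blank = recommend_clps(events, [""])
assert [r["event"]["title"] for r in blank] == ["Jazz Night", "AI Ethics Panel"]

no_match = recommend_clps(events, ["underwater basket weaving"])
assert all(r["hits"] == [] for r in no_match)       # fallback, no badges
print("all edge cases behave as expected")
```

In your own test_recommend.py you would import the real function from recommend.py instead of copying it; the asserts stay the same.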
Optional — get help from your agent:
Walk me through my new recommend_clps function. For each of these inputs, tell me what it returns: ["music","music","music"], [""], ["underwater basket weaving"]. Don't change the code — I want to verify my mental model.
Phase 4: Bootstrap the UI
Agent-assisted is fine here. Forms, cards, badges — all standard Bootstrap.
Objective
Style the interest form and results so the app looks intentional.
Instructions
Hints
Bootstrap CDN:
<link
href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css"
rel="stylesheet"
/>
CLP result card (with badges for hits):
{% for r in results %}
  <div class="card mb-3">
    <div class="card-body">
      <h5 class="card-title">{{ r.event.title }}</h5>
      <p class="card-text">{{ r.event.description }}</p>
      {% for h in r.hits %}
        <span class="badge bg-primary">{{ h }}</span>
      {% endfor %}
    </div>
  </div>
{% endfor %}
Optional — get help from your agent:
Style home.html and my results template using Bootstrap 5. Each result should be a card with title + description + matched-interest badges. Keep the HTML simple enough for me to edit.