> Source URL: /unit-3/project-paths/makayla-c/makayla-c-2026-04-18.guide
# Makayla's Project Guide

**Project:** CLP Curator
**Category:** Web Development (Flask) + Web Scraping
**Last updated:** April 18

---

> Note: This guide reflects your project repo as of the date above. If you've made changes since then, parts of it may be out of date.

## Where You Are

You have a working Flask app: form takes interest categories, results page shows recommendations. `recommend_clps` in `data_scraper.py` already scores events against interests with keyword matching — the core logic is there.

Two things to tackle:

1. `get_clp_events()` returns hardcoded placeholder events, not real scraped data.
2. Your `data_scraper.py` currently holds two very different things — the scraping (library code) and the recommendation algorithm (your business logic). We're splitting them.

---

## Project Structure

Your project splits into two kinds of code:

- **Business logic — you handwrite this.** The recommendation algorithm (`recommend.py`): scoring, deduplication, edge-case handling, fallback when no CLP matches. Also text cleaning in `utils.py`. This is what makes CLP Curator different from "just scraping a calendar."
- **Library / view code — agent-assisted is fine.** The scraping code (`scraper.py`): HTTP fetch, BeautifulSoup parsing. Flask routes, HTML templates. These patterns are the same for any scraping web app.

Target layout by Thursday:

```
final-project-makaylacarnahan/
├── app.py                  ← Flask routes — agent-assisted OK
├── recommend.py            ← business logic — handwrite (NEW, yours to own)
├── scraper.py              ← BeautifulSoup scraping — agent-assisted OK (renamed from data_scraper.py)
├── utils.py                ← text cleaning — yours (it's small but it's yours)
├── pyproject.toml
├── templates/              ← HTML — agent-assisted OK
└── data/
```

Why the split? From [Lecture 1: The MVP](../../lectures/01-the-mvp/01-the-mvp.lecture.md): on demo day, the interesting question is "how does your curator decide which CLPs to recommend?" The answer is `recommend_clps`, your handwritten scoring logic. Scraping HTML is just how you get the inputs; anyone could write that part.

**`recommend.py` should not import `requests` or `bs4`.** It takes a list of events and returns a ranked list — pure logic.
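
One way to confirm the rule above without reading the file by eye is to parse it and list its imports. This is a minimal sketch using only the standard library; `forbidden_imports` is a hypothetical helper name, not something already in your repo.

```python
import ast


def forbidden_imports(source, forbidden=("requests", "bs4")):
    """Return the set of forbidden top-level packages that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "bs4.element" -> "bs4"
            if root in forbidden:
                found.add(root)
    return found


# Inline strings stand in for file contents here.
clean = "from utils import clean_text\n"
leaky = "import requests\nfrom bs4 import BeautifulSoup\n"
print(sorted(forbidden_imports(clean)))  # []
print(sorted(forbidden_imports(leaky)))  # ['bs4', 'requests']
```

To run it against the real file, pass `open("recommend.py", encoding="utf-8").read()` as `source`. An empty result means `recommend.py` is pure logic.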

---

## Phase 1: Inspect the CLP Page HTML

> **Handwrite this yourself.** You need to SEE the page's structure before scraping it. Eyes on the HTML first; code later.

### Objective

Identify the HTML selectors for each event before writing any scraper code.

### Instructions

- [ ] Open https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/
- [ ] Right-click one event → "Inspect" (opens browser dev tools)
- [ ] Find the HTML tag + class that wraps **one event**
- [ ] Inside that wrapper, find the tags/classes for **title**, **date**, and **description**
- [ ] Write the selectors down — you'll hand them to your agent in Phase 2

### Sample Notes

```
Event wrapper: <div class="event-card">
  Title:       <h3 class="event-title">
  Date:        <time class="event-date">
  Description: <p class="event-description">
```

(Your selectors will be different — what matters is getting them from the real page.)

### Hints

**If the page doesn't load cleanly, save it with `curl` and open the HTML file locally:**

```bash
curl https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/ > clp.html
```

**Warning.** If the page loads events via JavaScript (not in the raw HTML), scraping will return zero events. Phase 2 includes a fallback for that case — don't panic if your scraper returns empty on the first try.
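
To find out which case you are in before writing the scraper, you can count how often your wrapper appears in the raw HTML. The sketch below uses only the standard library so it runs without installing anything; `count_wrappers` is a hypothetical helper, and `div` / `event-card` are the placeholder selectors from the sample notes, not the real ones.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags that match a tag name and carry a given class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # A class attribute can hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_wrappers(html, tag="div", css_class="event-card"):
    counter = ClassCounter(tag, css_class)
    counter.feed(html)
    return counter.count


sample = (
    '<div class="event-card featured"><h3 class="event-title">Jazz Night</h3></div>'
    '<div class="sidebar"></div>'
)
print(count_wrappers(sample))  # 1
```

For the saved page, call `count_wrappers(open("clp.html", encoding="utf-8").read(), "your-tag", "your-class")`. A count of 0 suggests either the selector is off or the events are injected by JavaScript after the page loads.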

> **Optional — get help from your agent:**
>
> ```text
> Fetch this URL and tell me the HTML tag and class name for (a) the
> wrapper around each event, (b) the title, (c) the date, (d) the description.
> URL: https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/
> Don't write the scraper yet — just list the selectors.
> ```

---

## Phase 2: Rename + Write the Real Scraper

> **Agent-assisted is fine here.** You hand over the selectors from Phase 1; the agent writes the BeautifulSoup loop. Read what you get back and tweak.

### Objective

Rename `data_scraper.py` → `scraper.py`. Replace the placeholder list with a real parser that uses your Phase 1 selectors, with a fallback if scraping fails.

### Instructions

- [ ] Rename `data_scraper.py` → `scraper.py`
- [ ] Move `recommend_clps` OUT of it and into a new file `recommend.py` (Phase 3)
- [ ] Replace `get_clp_events()` with a real BeautifulSoup parser using your selectors
- [ ] If the fetch fails OR finds zero events, fall back to your existing placeholder list
- [ ] Update the import in `app.py` (`from scraper import get_clp_events`, `from recommend import recommend_clps`)

### Hints

**The real scraper pattern (swap the selectors in the `select` / `select_one` calls for YOURS):**

```python
# scraper.py
import requests
from bs4 import BeautifulSoup


def get_clp_events():
    url = "https://www.furman.edu/academics/cultural-life-program/upcoming-clp-events/"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat 4xx/5xx responses as errors so the except branch falls back
        soup = BeautifulSoup(response.text, "html.parser")

        events = []
        for card in soup.select("div.event-card"):       # ← your wrapper selector
            title_el = card.select_one("h3.event-title")
            desc_el  = card.select_one("p.event-description")
            if not title_el:
                continue
            events.append({
                "title": title_el.get_text(strip=True),
                "description": desc_el.get_text(strip=True) if desc_el else "",
            })

        if events:
            return events
        print("Scraper found zero events, falling back to placeholders")
        return _placeholder_events()
    except Exception as e:
        print("Scraping error:", e)
        return _placeholder_events()


def _placeholder_events():
    return [
        {"title": "Climate Change Talk", "description": "environment sustainability climate"},
        {"title": "AI Ethics Panel", "description": "technology ethics ai future"},
        # ... keep your current 6
    ]
```

**Also clean up:** your current `data_scraper.py` imports `from utils import clean_text` twice (lines 3 and 27). Delete the second one when you move code over.

> **Optional — get help from your agent:**
>
> ```text
> Here are my selectors from inspecting the CLP page: [paste from Phase 1].
> Wire them into get_clp_events() using BeautifulSoup. Keep the
> placeholder fallback. Show me the diff, don't commit.
> ```

---

## Phase 3: Create `recommend.py` + Fix Edge Cases

> **Handwrite this yourself.** This IS your project — the algorithm that decides which 5 CLPs to show a student.

### Objective

Move `recommend_clps` into a new `recommend.py` file. Fix the edge cases: duplicates, empty input, no matches.

### Instructions

- [ ] Create `recommend.py` at the project root
- [ ] Move `recommend_clps(events, interests)` from `data_scraper.py`/`scraper.py` into it
- [ ] Fix three edge cases:
  - Duplicate interests (`["music", "music", "music"]`) — dedupe before scoring
  - Empty-string interests (`[""]`) — filter out
  - Zero matches — return featured events instead of an empty page
- [ ] Update `app.py` import to `from recommend import recommend_clps`

### Hints

**Improved `recommend_clps`:**

```python
# recommend.py
from utils import clean_text


def recommend_clps(events, interests):
    # Clean + dedupe interests (business decision: "music, music, music" = one "music")
    interests = [i.strip().lower() for i in interests if i.strip()]
    interests = list(dict.fromkeys(interests))  # dedupe but keep the order typed (set() would shuffle it)

    scored = []
    for event in events:
        text = clean_text(event["description"])
        score = 0
        hits = []
        for word in interests:
            if word in text:
                score += 1
                hits.append(word)
        if score > 0:
            scored.append({"score": score, "event": event, "hits": hits})

    scored.sort(key=lambda x: x["score"], reverse=True)

    if not scored:
        # No matches: fall back to the first 5 events (business decision: show SOMETHING).
        # Same dict shape as the scored results, with score 0 and no hits.
        return [{"score": 0, "event": e, "hits": []} for e in events[:5]]

    return scored[:5]
```

**Notice the change:** the function now returns dicts with `event` and `hits` (so your template can show which interests matched). Update `home.html` / `results.html` to use `r.event.title` etc.
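
One behavior of the scoring loop worth knowing about: `word in text` is a substring test, so a short interest like "art" also counts as a hit inside "party". Whether that is acceptable is your business decision. If you prefer whole-word matches, here is a minimal sketch using `re`; `matches_whole_word` is a hypothetical helper name, and it has not been checked against what your `clean_text` returns.

```python
import re


def matches_whole_word(word, text):
    """True if `word` (or a multi-word phrase) appears in `text` on word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print("art" in "a birthday party")                           # True (substring hit)
print(matches_whole_word("art", "a birthday party"))         # False
print(matches_whole_word("art", "student art and music"))    # True
```

If you adopt it, the `if word in text:` line would become `if matches_whole_word(word, text):`. Note that `\b` only behaves as expected for interests that start and end with a letter or digit.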

**Test it quickly from a [REPL](../../../resources/REPL.guide.md) or paste into a `test_recommend.py`:**

```python
from scraper import get_clp_events
from recommend import recommend_clps
events = get_clp_events()
print(recommend_clps(events, ["music", "music", "music"]))
print(recommend_clps(events, [""]))
print(recommend_clps(events, ["underwater basket weaving"]))
```

> **Optional — get help from your agent:**
>
> ```text
> Walk me through my new recommend_clps function. For each of these
> inputs, tell me what it returns: ["music","music","music"], [""],
> ["underwater basket weaving"]. Don't change the code — I want to
> verify my mental model.
> ```

---

## Phase 4: Bootstrap the UI

> **Agent-assisted is fine here.** Forms, cards, badges — all standard Bootstrap.

### Objective

Style the interest form and results so the app looks intentional.

### Instructions

- [ ] Add the Bootstrap CDN link to the `<head>` of `base.html`; if `home.html` and your results template don't extend `base.html`, add it to each of them too
- [ ] Style the interest input as a clean form with a labeled textarea or text input
- [ ] Render each recommended CLP as a Bootstrap card showing title + description + matched interests as badges

### Hints

**Bootstrap CDN:**

```html
<link
  href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css"
  rel="stylesheet"
/>
```

**CLP result card (with badges for hits):**

```html
{% for r in results %}
<div class="card mb-3">
  <div class="card-body">
    <h5 class="card-title">{{ r.event.title }}</h5>
    <p class="card-text">{{ r.event.description }}</p>
    {% for h in r.hits %}
    <span class="badge bg-primary">{{ h }}</span>
    {% endfor %}
  </div>
</div>
{% endfor %}
```

> **Optional — get help from your agent:**
>
> ```text
> Style home.html and my results template using Bootstrap 5. Each
> result should be a card with title + description + matched-interest
> badges. Keep the HTML simple enough for me to edit.
> ```

---

## Checkpoint 2 Readiness

By Thursday April 23 at 3pm:

- [ ] `scraper.py` has a real BeautifulSoup parser (with placeholder fallback)
- [ ] `recommend.py` exists with `recommend_clps`
- [ ] `recommend.py` does **not** import `requests` or `bs4`
- [ ] Edge cases handled: duplicates deduped, empty strings filtered, zero-match fallback
- [ ] Duplicate `from utils import clean_text` removed
- [ ] Interest form + results styled (Bootstrap)
- [ ] Checkpoint 2 entry in `project.journal.md`
- [ ] Committed and pushed

## Helpful Resources

- [Checkpoint 2 Instructions](../../projects/final-project-checkpoint-2.project.md)
- [Lecture 1: The MVP](../../lectures/01-the-mvp/01-the-mvp.lecture.md)
- [Flask Setup Guide](../../resources/flask-setup.guide.md)
- [BeautifulSoup quickstart](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start)

