# Thu's Project Guide

**Project:** H1-B Employer Data Hub
**Category:** Web App (Flask) + Data Science
**Last updated:** April 18

---

> Note: This guide reflects your project repo as of April 18. If you've pushed work since then, parts of it may be out of date.

## Where You Are

**Honest read:** your repo still has only template files — spec, journal, and code are blank. **Checkpoint 1 didn't happen.** That puts us behind, but there's a clear plan from our earlier conversation and 5 days to catch up. Doable if we move now.

Reminder of the plan:

- **Project:** career-exploration tool for H1-B sponsorship data
- **Data:** [USCIS H-1B Employer Data Hub](https://www.uscis.gov/tools/reports-and-studies/h-1b-employer-data-hub) (~12k records)
- **Stack:** Flask + pandas
- **MVP feel:** browse / filter sponsor companies by a couple of fields

## ⚠️ Most Important: Message me on Discord today

Before you start any of the phases below, **message me on Discord** to confirm the plan and unblock you. A 5-minute conversation now saves days. I'd rather you start with _something_ today than spend more days deciding.

---

## Project Structure

Your project splits into two kinds of code:

- **Business logic — you handwrite this.** The filter functions in `search.py` (`filter_by_company`, `filter_by_state`, and combining them). These ARE your app — the decisions about how a user narrows down 12,000 employers to find the ones they care about.
- **Library / view code — agent-assisted is fine.** Flask routes in `app.py` (they just read URL params and pass data to templates), HTML templates, Bootstrap classes.

Target layout by Thursday:

```
final-project-huynth4/
├── app.py                  ← Flask routes — agent-assisted OK
├── search.py               ← business logic — handwrite (yours to own)
├── pyproject.toml
├── project.spec.md
├── project.journal.md
├── README.md
├── templates/              ← HTML — agent-assisted OK
│   └── home.html
└── data/
    └── h1b.csv             ← data
```

Why the split? From [Lecture 1: The MVP](../../lectures/01-the-mvp/01-the-mvp.lecture.md) — on demo day the interesting question is "how does your search work?" The answer is in `search.py` — not in the Flask plumbing.

**`search.py` should not import `flask`.** Each filter function takes a dataframe plus the user's search parameters and returns a filtered dataframe.

---

## Phase 1: Fill Out the Spec

> **Handwrite this yourself.** Your spec is your plan. No code, just writing.

### Objective

Replace the blank template with a real spec.

### Instructions

- [ ] Open `project.spec.md`
- [ ] Fill every section:
  - **Project Name:** H1-B Employer Data Hub
  - **Category:** Web Development (Flask)
  - **Description:** 2 sentences
  - **MVP features:** 3 to 4 realistic ones (see sample)
  - **Stretch features:** 2
  - **Tech stack:** Flask, pandas

### Sample MVP Features

```markdown
**Must have (MVP):**
- Load a CSV of H1-B sponsor companies from `data/`
- Filter results by company name (partial match, case-insensitive)
- Filter results by state
- Display a table of matching companies
```

> **Optional — get help from your agent:**
>
> ```text
> Help me fill in my @project.spec.md for an H1-B Employer Data Hub
> Flask web app. Use these MVP features: filter by company name,
> filter by state, display a results table. Don't write any code.
> ```

---

## Phase 2: Download the Data

> **Agent-assisted is fine here.** No code — just downloading a file.

### Objective

Get one fiscal-year CSV into your project's `data/` folder.

### Instructions

- [ ] Visit https://www.uscis.gov/tools/reports-and-studies/h-1b-employer-data-hub
- [ ] Download one fiscal year's CSV (pick the most recent)
- [ ] Save it as `data/h1b.csv`
- [ ] Open the file and note the column names — you'll need them in Phases 3–5

### Hints

**Typical columns:**

```
Fiscal Year, Employer (Petitioner) Name, Tax ID, Industry (NAICS) Code,
Petitioner City, Petitioner State, Petitioner Zip Code, Initial Approval,
Initial Denial, Continuing Approval, Continuing Denial
```

Exact names may vary year to year. Write yours down.
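
A quick way to write them down is to let pandas read just the header row. This is a small sketch assuming the download is a regular comma-separated file; `list_columns` is a hypothetical helper name, not something the checkpoint requires.

```python
import pandas as pd


def list_columns(path):
    """Return the column names pandas sees in the CSV at `path`."""
    # nrows=0 reads only the header row, so this is fast even on a large file
    df = pd.read_csv(path, nrows=0)
    return df.columns.tolist()
```

Once Phase 3 has set up the project, run `print(list_columns("data/h1b.csv"))` in a `uv run python` shell and copy the exact strings into `COMPANY_COL` and `STATE_COL` in Phases 4 and 5.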

> **Optional — get help from your agent:**
>
> Skip — downloading a file is not a coding task.

---

## Phase 3: Scaffold Flask + Display First 10 Rows

> **Agent-assisted is fine here.** Minimal Flask setup + a table render: the same boilerplate any data-display Flask app needs.

### Objective

Smallest possible first slice: load the CSV, show the first 10 rows in an HTML table.

### Instructions

- [ ] Run `uv init` and `uv add flask pandas`
- [ ] Create `app.py` with one route that loads the CSV and displays the top 10 rows
- [ ] Create `templates/home.html` with a plain table
- [ ] Confirm it runs: `uv run flask --app app run --debug`

### Hints

**Minimal `app.py`:**

```python
from flask import Flask, render_template
import pandas as pd

app = Flask(__name__)

@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    top = df.head(10).to_dict(orient="records")
    columns = df.columns.tolist()
    return render_template("home.html", rows=top, columns=columns)
```

**`templates/home.html`:**

```html
<!DOCTYPE html>
<html>
<head><title>H1-B Data Hub</title></head>
<body>
  <h1>H1-B Sponsor Companies</h1>
  <table border="1">
    <thead>
      <tr>{% for c in columns %}<th>{{ c }}</th>{% endfor %}</tr>
    </thead>
    <tbody>
      {% for row in rows %}
        <tr>{% for c in columns %}<td>{{ row[c] }}</td>{% endfor %}</tr>
      {% endfor %}
    </tbody>
  </table>
</body>
</html>
```

Visit `http://127.0.0.1:5000` — you should see 10 rows. Don't worry about style yet.

> **Optional — get help from your agent:**
>
> ```text
> Scaffold a minimal Flask app that loads data/h1b.csv and displays
> the first 10 rows in a table. Show me app.py and templates/home.html.
> Walk me through how .head(10).to_dict(orient="records") shapes the
> data so I know what my template is receiving.
> ```
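
To see what the template receives, here is the same `.head(...).to_dict(orient="records")` shaping on a toy dataframe. The company names and states are made up for illustration.

```python
import pandas as pd

# Made-up rows standing in for data/h1b.csv
df = pd.DataFrame({
    "Employer (Petitioner) Name": ["Acme Corp", "Globex LLC", "Initech"],
    "Petitioner State": ["CA", "NY", "TX"],
})

# Same shaping as in home(): first N rows, one dict per row
rows = df.head(2).to_dict(orient="records")
columns = df.columns.tolist()

# rows is a list of dicts keyed by column name; columns is the header list
for row in rows:
    print([row[c] for c in columns])
```

This is why `{{ row[c] }}` works in the template: each `row` is a dict and each `c` is one of its keys.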

---

## Phase 4: Build `search.py` + Company-Name Filter

> **Handwrite this yourself.** The filter logic is your product. You're designing how someone searches 12k companies.

### Objective

Create `search.py` with a `filter_by_company(df, query)` function. Wire it into `app.py`.

### Instructions

- [ ] Create `search.py` at the project root
- [ ] Write `filter_by_company(df, query)` that returns a new dataframe with rows whose company name contains the query (case-insensitive)
- [ ] Add a `<form method="get">` with one text input to `home.html`
- [ ] Update `app.py` to read `q` from `request.args` and call `filter_by_company`

### Hints

**`search.py`:**

```python
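# search.py
COMPANY_COL = "Employer (Petitioner) Name"   # ← match YOUR CSV's column name

def filter_by_company(df, query):
    if not query:
        return df   # empty search box: no filtering
    # regex=False matches the query as plain text, so "C++" or "(US)" won't raise a regex error
    return df[df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)]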
```

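**Why `na=False`?** Some rows have a missing company name. Without `na=False`, `.str.contains` returns `NaN` for those rows instead of `True`/`False`, and indexing the dataframe with that mask raises a `ValueError`. With it, missing names simply count as "no match".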
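
Because `search.py` doesn't import Flask, you can check the filter in a plain script before wiring it into a route. This sketch is a copy of the hint's `filter_by_company` that passes `regex=False` (so characters like `+` or `(` in a query are matched literally instead of being read as regex syntax), run on a made-up dataframe that includes a missing name.

```python
import pandas as pd

COMPANY_COL = "Employer (Petitioner) Name"


def filter_by_company(df, query):
    # Empty query means "no filter": return every row
    if not query:
        return df
    # case=False: ignore upper/lower case
    # na=False: missing names become False ("no match") instead of NaN
    # regex=False: treat the query as literal text
    mask = df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)
    return df[mask]


# Made-up rows, including one missing company name
df = pd.DataFrame({
    COMPANY_COL: ["Google LLC", "GOOGLE INC", None, "C++ Consulting"],
})

google = filter_by_company(df, "google")    # both Google rows, any case
cpp = filter_by_company(df, "C++")          # literal match, no regex error
everything = filter_by_company(df, "")      # empty query returns all rows
```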

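**In `app.py`** (replaces the Phase 3 version):

```python
from flask import Flask, render_template, request
import pandas as pd

from search import filter_by_company

app = Flask(__name__)

@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    # "q" comes from the form's <input name="q">; missing or blank means no filter
    query = request.args.get("q", "").strip()
    df = filter_by_company(df, query)
    top = df.head(50).to_dict(orient="records")   # show up to 50 matches
    columns = df.columns.tolist()
    return render_template("home.html",
                           rows=top, columns=columns, query=query)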
```

**In `home.html`:**

```html
<form method="get">
  <input name="q" placeholder="Search company name" value="{{ query or '' }}">
  <button type="submit">Search</button>
</form>
```

> **Optional — get help from your agent:**
>
> ```text
> Walk me through pandas .str.contains — what does case=False do,
> and why do I need na=False? Don't change my code — I want to
> understand before I commit.
> ```

---

## Phase 5: Add State Filter (in `search.py`)

> **Handwrite this yourself.** Second filter function. Simple pattern, but it's yours.

### Objective

Add `filter_by_state(df, state)` in `search.py`. Both filters should work together.

### Instructions

- [ ] Add `filter_by_state(df, state)` in `search.py`
- [ ] Add a `<select>` for state to the form in `home.html`
- [ ] In `app.py`, read `state` from `request.args` and call `filter_by_state` after `filter_by_company`
- [ ] Confirm both filters combine (e.g. company contains "Google" AND state = "CA")

### Hints

**`search.py`:**

```python
STATE_COL = "Petitioner State"   # ← match YOUR CSV

def filter_by_state(df, state):
    if not state:
        return df
    return df[df[STATE_COL] == state]


def unique_states(df):
    return sorted(df[STATE_COL].dropna().unique().tolist())
```

**`app.py`:**

```python
# Keep the Flask/pandas imports and `app = Flask(__name__)` from earlier phases
from search import filter_by_company, filter_by_state, unique_states

@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    states = unique_states(df)
    query = request.args.get("q", "").strip()
    state = request.args.get("state", "").strip()

    df = filter_by_company(df, query)
    df = filter_by_state(df, state)

    top = df.head(50).to_dict(orient="records")
    return render_template("home.html",
                           rows=top,
                           columns=df.columns.tolist(),
                           query=query, state=state, states=states)
```

**`home.html` (state dropdown):**

```html
<select name="state">
  <option value="">Any state</option>
  {% for s in states %}
    <option value="{{ s }}" {% if s == state %}selected{% endif %}>{{ s }}</option>
  {% endfor %}
</select>
```

**Notice the pattern:** each filter function takes a df and returns a df. You can chain them. This is why they live in `search.py` and not inline in the route — they're reusable.
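
Here is that chaining on a made-up dataframe, with a hypothetical `apply_filters` wrapper (not required by the checkpoint) that runs both filters in order. It also shows that the original dataframe is untouched, because each function returns a filtered dataframe rather than editing its input.

```python
import pandas as pd

COMPANY_COL = "Employer (Petitioner) Name"
STATE_COL = "Petitioner State"


def filter_by_company(df, query):
    if not query:
        return df
    return df[df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)]


def filter_by_state(df, state):
    if not state:
        return df
    return df[df[STATE_COL] == state]


def unique_states(df):
    return sorted(df[STATE_COL].dropna().unique().tolist())


def apply_filters(df, query, state):
    # Hypothetical wrapper: each step takes a df and returns a df
    df = filter_by_company(df, query)
    df = filter_by_state(df, state)
    return df


# Made-up rows for illustration
df = pd.DataFrame({
    COMPANY_COL: ["Google LLC", "Google LLC", "Acme Corp"],
    STATE_COL: ["CA", "NY", "CA"],
})

both = apply_filters(df, "google", "CA")   # company contains "google" AND state == "CA"
states = unique_states(df)                 # dropdown options, sorted
```

Swapping the order of the two filter calls gives the same rows, since each one only removes rows.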

> **Optional — get help from your agent:**
>
> ```text
> Walk me through why my filter functions take a df and return a df,
> instead of mutating the df. What does that buy me? Don't change
> my code.
> ```

---

## Phase 6: Polish + Journal Catch-Up

> **Agent-assisted is fine here.** Bootstrap styling + a README.

### Objective

Make it look like a real app. Catch up the journal.

### Instructions

- [ ] Add Bootstrap via CDN so the table and form aren't raw HTML
- [ ] Add a short README: description + how to run
- [ ] Fill in **both** Checkpoint 1 and Checkpoint 2 sections of `project.journal.md`
- [ ] Commit and push

### Hints

**Bootstrap CDN (in `<head>`):**

```html
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
```

**Table class:**

```html
<table class="table table-striped">
```

**README template:**

```markdown
# H1-B Employer Data Hub

A Flask web app to explore H1-B visa sponsor data from USCIS.

## Run locally

    uv run flask --app app run --debug

Open http://127.0.0.1:5000
```

> **Optional — get help from your agent:**
>
> ```text
> Style my table with Bootstrap 5 and make my filter form look like
> a real search bar. Keep the HTML simple enough for me to edit.
> ```

---

## Checkpoint 2 Readiness

By Thursday, April 23, at 3pm:

- [ ] `project.spec.md` filled out
- [ ] `pyproject.toml` with flask + pandas
- [ ] H1-B CSV in `data/`, app loads it without crashing
- [ ] `search.py` exists with `filter_by_company`, `filter_by_state`, `unique_states`
- [ ] `search.py` does **not** import `flask`
- [ ] Both filters combine
- [ ] Basic Bootstrap styling
- [ ] README in place
- [ ] Checkpoint 1 + Checkpoint 2 entries in `project.journal.md`
- [ ] Committed and pushed
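
The "no `flask` in `search.py`" item can also be checked mechanically. A minimal sketch using Python's standard `ast` module; `imports_flask` is a hypothetical helper name, and it only detects ordinary `import` / `from ... import` statements, not dynamic imports via `importlib`.

```python
import ast


def imports_flask(source):
    """Return True if the Python source contains an import of flask or a flask submodule."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # e.g. `import flask` or `import flask.json as fj`
            if any(alias.name.split(".")[0] == "flask" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like `from . import x`
            if node.module and node.module.split(".")[0] == "flask":
                return True
    return False
```

To check your real file, `imports_flask(open("search.py").read())` should return `False`.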

## Helpful Resources

- [Checkpoint 2 Instructions](../../projects/final-project-checkpoint-2.project.md)
- [Lecture 1: The MVP](../../lectures/01-the-mvp/01-the-mvp.lecture.md)
- [Flask Setup Guide](../../resources/flask-setup.guide.md)
- [USCIS H-1B Employer Data Hub](https://www.uscis.gov/tools/reports-and-studies/h-1b-employer-data-hub)
