Thu's Project Guide

Project: H1-B Employer Data Hub
Category: Web App (Flask) + Data Science
Last updated: April 18

Note: This guide reflects your project repo as of the date above. If you've pushed work since then, it may be out of date.

Where You Are

Honest read: your repo still has only template files — spec, journal, and code are blank. Checkpoint 1 didn't happen. That puts us behind, but there's a clear plan from our earlier conversation and 5 days to catch up. Doable if we move now.

Reminder of the plan:

- Project: career-exploration tool for H1-B sponsorship data
- Data: USCIS H1-B Employer Data Hub (~12k records)
- Stack: Flask + pandas
- MVP feel: browse / filter sponsor companies by a couple of fields

⚠️ Most Important: Message me on Discord today

Before you start any of the phases below, message me on Discord to confirm the plan and unblock you. A 5-minute conversation now saves days. I'd rather you start with something today than spend more days deciding.

Project Structure

Your project splits into two kinds of code:

Business logic — you handwrite this. The filter functions in search.py (filter_by_company, filter_by_state, and combining them). These ARE your app — the decisions about how a user narrows down 12,000 employers to find the ones they care about.
Library / view code — agent-assisted is fine. Flask routes in app.py (they just read URL params and pass data to templates), HTML templates, Bootstrap classes.

Target layout by Thursday:

final-project-huynth4/
├── app.py                  ← Flask routes — agent-assisted OK
├── search.py               ← business logic — handwrite (yours to own)
├── pyproject.toml
├── templates/              ← HTML — agent-assisted OK
└── data/
    └── h1b.csv             ← data

Why the split? From Lecture 1: The MVP — on demo day the interesting question is "how does your search work?" The answer is in search.py — not in the Flask plumbing.

search.py should not import flask. It takes a dataframe and some params, returns a filtered dataframe.

Phase 1: Fill Out the Spec

Handwrite this yourself. Your spec is your plan. No code, just writing.

Objective

Replace the blank template with a real spec.

Instructions

Open project.spec.md
Fill every section:
- Project Name: H1-B Employer Data Hub
- Category: Web Development (Flask)
- Description: 2 sentences
- MVP features: 3 to 4 realistic (see sample)
- Stretch features: 2
- Tech stack: Flask, pandas

Sample MVP Features

**Must have (MVP):**
- Load a CSV of H1-B sponsor companies from `data/`
- Filter results by company name (partial match, case-insensitive)
- Filter results by state
- Display a table of matching companies

Optional — get help from your agent:

Help me fill in my @project.spec.md for an H1-B Employer Data Hub
Flask web app. Use these MVP features: filter by company name,
filter by state, display a results table. Don't write any code.

Phase 2: Download the Data

Agent-assisted is fine here. No code — just downloading a file.

Objective

Get one fiscal-year CSV into your project's data/ folder.

Instructions

- Visit https://www.uscis.gov/tools/reports-and-studies/h-1b-employer-data-hub
- Download one fiscal year's CSV (pick the most recent)
- Save it as data/h1b.csv
- Open the file and note the column names; you'll need them in Phases 3–5

Hints

Typical columns:

Fiscal Year, Employer (Petitioner) Name, Tax ID, Industry (NAICS) Code,
Petitioner City, Petitioner State, Petitioner Zip Code, Initial Approval,
Initial Denial, Continuing Approval, Continuing Denial

Exact names may vary year to year. Write yours down.
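
A quick way to write the names down is to let pandas print them. This is a minimal sketch, assuming your download is a regular comma-separated CSV. The two-line sample is made up so the snippet runs on its own, and list_columns is a hypothetical helper name, not something your project needs to have.

```python
import io
import pandas as pd

def list_columns(source):
    """Return the column names of a CSV (a file path or a file-like object)."""
    # nrows=0 reads only the header row, so this stays fast on a large file
    return pd.read_csv(source, nrows=0).columns.tolist()

# Made-up sample standing in for data/h1b.csv
sample = io.StringIO(
    "Fiscal Year,Employer (Petitioner) Name,Petitioner State\n"
    "2024,EXAMPLE CORP,CA\n"
)
columns = list_columns(sample)
print(columns)
```

On your machine, call list_columns("data/h1b.csv") instead and copy the exact strings into your notes. Spacing and capitalization matter when you use them as column keys later.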

Optional — get help from your agent:

Skip — downloading a file is not a coding task.

Phase 3: Scaffold Flask + Display First 10 Rows

Agent-assisted is fine here. Minimal Flask setup + a table render. Same for any data-display Flask app.

Objective

Smallest possible first slice: load the CSV, show the first 10 rows in an HTML table.

Instructions

- Run uv init and uv add flask pandas
- Create app.py with one route that loads the CSV and displays the top 10 rows
- Create templates/home.html with a plain table
- Confirm it runs: uv run flask --app app run --debug

Hints

Minimal app.py:

from flask import Flask, render_template
import pandas as pd

app = Flask(__name__)

@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    top = df.head(10).to_dict(orient="records")
    columns = df.columns.tolist()
    return render_template("home.html", rows=top, columns=columns)

templates/home.html:

<!DOCTYPE html>
<html>
<head><title>H1-B Data Hub</title></head>
<body>
  <h1>H1-B Sponsor Companies</h1>
  <table border="1">
    <thead>
      <tr>{% for c in columns %}<th>{{ c }}</th>{% endfor %}</tr>
    </thead>
    <tbody>
      {% for row in rows %}
        <tr>{% for c in columns %}<td>{{ row[c] }}</td>{% endfor %}</tr>
      {% endfor %}
    </tbody>
  </table>
</body>
</html>

Visit http://127.0.0.1:5000 — you should see 10 rows. Don't worry about style yet.
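
It helps to know what the template actually receives. The sketch below uses made-up rows (the company names are invented) to show how .head() and .to_dict(orient="records") turn a dataframe into a list of dicts, one dict per row, keyed by column name.

```python
import pandas as pd

# Made-up data standing in for the real CSV
df = pd.DataFrame({
    "Employer (Petitioner) Name": ["ALPHA INC", "BETA LLC", "GAMMA CORP"],
    "Petitioner State": ["CA", "NY", "TX"],
})

rows = df.head(2).to_dict(orient="records")   # first 2 rows as a list of dicts
columns = df.columns.tolist()                 # column names, in file order

# The same lookup the template does with {{ row[c] }}
first_cells = [rows[0][c] for c in columns]
print(rows)
print(first_cells)
```

That is why the template loops over columns inside each row: the column list fixes the order of the cells, and each row dict supplies the values.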

Optional — get help from your agent:

Scaffold a minimal Flask app that loads data/h1b.csv and displays
the first 10 rows in a table. Show me app.py and templates/home.html.
Walk me through how .head(10).to_dict(orient="records") shapes the
data so I know what my template is receiving.

Phase 4: Build `search.py` + Company-Name Filter

Handwrite this yourself. The filter logic is your product. You're designing how someone searches 12k companies.

Objective

Create search.py with a filter_by_company(df, query) function. Wire it into app.py.

Instructions

- Create search.py at the project root
- Write filter_by_company(df, query) that returns a new dataframe with rows whose company name contains the query (case-insensitive)
- Add a <form method="get"> with one text input to home.html
- Update app.py to read q from request.args and call filter_by_company

Hints

search.py:

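# search.py
COMPANY_COL = "Employer (Petitioner) Name"   # ← match YOUR CSV's column name

def filter_by_company(df, query):
    """Return rows whose company name contains query (case-insensitive)."""
    if not query:
        return df  # empty search box: no filtering
    # regex=False treats the query as plain text, so input like "(" can't raise a regex error.
    # na=False counts rows with a missing name as non-matches.
    mask = df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)
    return df[mask]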

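Why na=False? Some rows have missing values. Depending on your pandas version, .str.contains can return NaN for those rows instead of True/False, and pandas refuses to filter with a mask that contains NaN. With na=False, missing names simply count as non-matches.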
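
To see this concretely, here is a small sketch on a made-up Series with one missing name. What an unguarded call returns for the missing entry can depend on your pandas version, so the snippet only relies on the na=False behavior.

```python
import pandas as pd

# Made-up names; None stands in for a row with a missing employer name
names = pd.Series(["Google LLC", None, "Amazon.com Services"], dtype=object)

# na=False: a missing name counts as "does not match"
mask = names.str.contains("goo", case=False, na=False)

matches = names[mask].tolist()
print(mask.tolist())
print(matches)
```

The mask holds only True/False values, so using it to select rows is safe even when the column has gaps.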

In app.py:

from flask import request
from search import filter_by_company

@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    query = request.args.get("q", "").strip()
    df = filter_by_company(df, query)
    top = df.head(50).to_dict(orient="records")
    columns = df.columns.tolist()
    return render_template("home.html",
                           rows=top, columns=columns, query=query)

In home.html:

<form method="get">
  <input name="q" placeholder="Search company name" value="{{ query or '' }}">
  <button type="submit">Search</button>
</form>

Optional — get help from your agent:

Walk me through pandas .str.contains — what does case=False do,
and why do I need na=False? Don't change my code — I want to
understand before I commit.

Phase 5: Add State Filter (in `search.py`)

Handwrite this yourself. Second filter function. Simple pattern, but it's yours.

Objective

Add filter_by_state(df, state) in search.py. Both filters should work together.

Instructions

- Add filter_by_state(df, state) in search.py
- Add a <select> for state to the form in home.html
- In app.py, read state from request.args and call filter_by_state after filter_by_company
- Confirm both filters combine (e.g. company contains "Google" AND state = "CA")

Hints

search.py:

STATE_COL = "Petitioner State"   # ← match YOUR CSV

def filter_by_state(df, state):
    if not state:
        return df
    return df[df[STATE_COL] == state]


def unique_states(df):
    return sorted(df[STATE_COL].dropna().unique().tolist())

app.py:

from search import filter_by_company, filter_by_state, unique_states

@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    states = unique_states(df)
    query = request.args.get("q", "").strip()
    state = request.args.get("state", "").strip()

    df = filter_by_company(df, query)
    df = filter_by_state(df, state)

    top = df.head(50).to_dict(orient="records")
    return render_template("home.html",
                           rows=top,
                           columns=df.columns.tolist(),
                           query=query, state=state, states=states)

home.html (state dropdown):

<select name="state">
  <option value="">Any state</option>
  {% for s in states %}
    <option value="{{ s }}" {% if s == state %}selected{% endif %}>{{ s }}</option>
  {% endfor %}
</select>

Notice the pattern: each filter function takes a df and returns a df. You can chain them. This is why they live in search.py and not inline in the route — they're reusable.
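
Because every filter has the same df-in, df-out shape, combining them can take just a few lines. The sketch below assumes the column names from the hints. apply_filters is a hypothetical helper (the guide does not require it), regex=False is a choice made in this sketch so the query is treated as plain text, and the three rows are made up. Nothing here imports Flask, so you can try it in a plain Python session.

```python
import pandas as pd

COMPANY_COL = "Employer (Petitioner) Name"   # assumption: match your CSV
STATE_COL = "Petitioner State"               # assumption: match your CSV

def filter_by_company(df, query):
    if not query:
        return df
    return df[df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)]

def filter_by_state(df, state):
    if not state:
        return df
    return df[df[STATE_COL] == state]

def apply_filters(df, query="", state=""):
    """Hypothetical helper: run both filters in sequence."""
    return df.pipe(filter_by_company, query).pipe(filter_by_state, state)

# Made-up rows for a quick check
df = pd.DataFrame({
    COMPANY_COL: ["GOOGLE LLC", "GOOGLE LLC", "AMAZON.COM SERVICES"],
    STATE_COL: ["CA", "NY", "CA"],
})

both = apply_filters(df, query="google", state="CA")
only_state = apply_filters(df, state="CA")
print(len(both), len(only_state), len(df))
```

An empty query or state leaves the dataframe untouched, which is what lets the same call handle "no filters", "one filter", and "both filters". The original df is never modified, so you can reuse it for the next request.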

Optional — get help from your agent:

Walk me through why my filter functions take a df and return a df,
instead of mutating the df. What does that buy me? Don't change
my code.

Phase 6: Polish + Journal Catch-Up

Agent-assisted is fine here. Bootstrap styling + a README.

Objective

Make it look like a real app. Catch up the journal.

Instructions

- Add Bootstrap via CDN so the table and form aren't raw HTML
- Add a short README: description + how to run
- Fill in both Checkpoint 1 and Checkpoint 2 sections of project.journal.md
- Commit and push

Hints

Bootstrap CDN (in <head>):

<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">

Table class:

<table class="table table-striped">

README template:

# H1-B Employer Data Hub

A Flask web app to explore H1-B visa sponsor data from USCIS.

## Run locally

    uv run flask --app app run --debug

Open http://127.0.0.1:5000

Optional — get help from your agent:

Style my table with Bootstrap 5 and make my filter form look like
a real search bar. Keep the HTML simple enough for me to edit.

Checkpoint 2 Readiness

By Thursday April 23 at 3pm:

- project.spec.md filled out
- pyproject.toml with flask + pandas
- H1-B CSV in data/, app loads it without crashing
- search.py exists with filter_by_company, filter_by_state, unique_states
- search.py does not import flask
- Both filters combine
- Basic Bootstrap styling
- README in place
- Checkpoint 1 + Checkpoint 2 entries in project.journal.md
- Committed and pushed