Thu's Project Guide
Project: H1-B Employer Data Hub
Category: Web App (Flask) + Data Science
Last updated: April 18
Note: This guide reflects the latest state of your project repo. It may not match the most up-to-date version if you've worked since.
Where You Are
Honest read: your repo still has only template files — spec, journal, and code are blank. Checkpoint 1 didn't happen. That puts us behind, but there's a clear plan from our earlier conversation and 5 days to catch up. Doable if we move now.
Reminder of the plan:
- Project: career-exploration tool for H1-B sponsorship data
- Data: USCIS H1-B Employer Data Hub (~12k records)
- Stack: Flask + pandas
- MVP feel: browse / filter sponsor companies by a couple of fields
⚠️ Most Important: Message me on Discord today
Before you start any of the phases below, message me on Discord to confirm the plan and unblock you. A 5-minute conversation now saves days. I'd rather you start with something today than spend more days deciding.
Project Structure
Your project splits into two kinds of code:
- Business logic: you handwrite this. The filter functions in search.py (filter_by_company, filter_by_state, and combining them). These ARE your app: the decisions about how a user narrows down 12,000 employers to find the ones they care about.
- Library / view code: agent-assisted is fine. Flask routes in app.py (they just read URL params and pass data to templates), HTML templates, Bootstrap classes.
Target layout by Thursday:
final-project-huynth4/
├── app.py            ← Flask routes (agent-assisted OK)
├── search.py         ← business logic: handwrite (yours to own)
├── pyproject.toml
├── templates/        ← HTML (agent-assisted OK)
└── data/
    └── h1b.csv       ← data
Why the split? From Lecture 1: The MVP — on demo day the interesting question is "how does your search work?" The answer is in search.py — not in the Flask plumbing.
search.py should not import flask. It takes a dataframe and some params, returns a filtered dataframe.
Phase 1: Fill Out the Spec
Handwrite this yourself. Your spec is your plan. No code, just writing.
Objective
Replace the blank template with a real spec.
Instructions
- Project Name: H1-B Employer Data Hub
- Category: Web Development (Flask)
- Description: 2 sentences
- MVP features: 3 to 4 realistic ones (see sample)
- Stretch features: 2
- Tech stack: Flask, pandas
Sample MVP Features
**Must have (MVP):**
- Load a CSV of H1-B sponsor companies from `data/`
- Filter results by company name (partial match, case-insensitive)
- Filter results by state
- Display a table of matching companies
Optional — get help from your agent:
Help me fill in my @project.spec.md for an H1-B Employer Data Hub Flask web app. Use these MVP features: filter by company name, filter by state, display a results table. Don't write any code.
Phase 2: Download the Data
Agent-assisted is fine here. No code — just downloading a file.
Objective
Get one fiscal-year CSV into your project's data/ folder.
Instructions
Hints
Typical columns:
Fiscal Year, Employer (Petitioner) Name, Tax ID, Industry (NAICS) Code,
Petitioner City, Petitioner State, Petitioner Zip Code, Initial Approval,
Initial Denial, Continuing Approval, Continuing Denial
Exact names may vary year to year. Write yours down.
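A quick way to write the names down is to read only the header row. This is a minimal sketch: SAMPLE_CSV is a made-up stand-in so the snippet runs anywhere, and list_columns is a hypothetical helper name. In your project you would pass "data/h1b.csv" instead.

```python
import io

import pandas as pd

# Made-up stand-in for data/h1b.csv (an assumption for this sketch).
# The header is borrowed from the "typical columns" list and may not match your file.
SAMPLE_CSV = io.StringIO(
    "Fiscal Year,Employer (Petitioner) Name,Petitioner State,Initial Approval\n"
    "2023,ACME CORP,CA,12\n"
)


def list_columns(source):
    """Return the column names exactly as they appear in the file."""
    # nrows=0 parses only the header, so this stays fast on a large CSV.
    header = pd.read_csv(source, nrows=0)
    return list(header.columns)


columns = list_columns(SAMPLE_CSV)
for name in columns:
    # repr() makes stray leading or trailing spaces visible.
    print(repr(name))
```

Copy the printed names exactly (including capitalization and parentheses) into the column constants you define later in search.py.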
Optional — get help from your agent:
Skip — downloading a file is not a coding task.
Phase 3: Scaffold Flask + Display First 10 Rows
Agent-assisted is fine here. Minimal Flask setup + a table render. Same for any data-display Flask app.
Objective
Smallest possible first slice: load the CSV, show the first 10 rows in an HTML table.
Instructions
Hints
Minimal app.py:
from flask import Flask, render_template
import pandas as pd

app = Flask(__name__)


@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    top = df.head(10).to_dict(orient="records")
    columns = df.columns.tolist()
    return render_template("home.html", rows=top, columns=columns)
templates/home.html:
<!DOCTYPE html>
<html>
<head><title>H1-B Data Hub</title></head>
<body>
  <h1>H1-B Sponsor Companies</h1>
  <table border="1">
    <thead>
      <tr>{% for c in columns %}<th>{{ c }}</th>{% endfor %}</tr>
    </thead>
    <tbody>
      {% for row in rows %}
      <tr>{% for c in columns %}<td>{{ row[c] }}</td>{% endfor %}</tr>
      {% endfor %}
    </tbody>
  </table>
</body>
</html>
Visit http://127.0.0.1:5000 — you should see 10 rows. Don't worry about style yet.
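To see what the template receives, here is a small sketch of the same two calls on a toy dataframe. The two rows are made up for illustration; only the column names come from the typical-columns list.

```python
import pandas as pd

# Toy stand-in for the real CSV (made-up rows, for illustration only).
df = pd.DataFrame(
    {
        "Employer (Petitioner) Name": ["ACME CORP", "GLOBEX LLC"],
        "Petitioner State": ["CA", "NY"],
    }
)

# head(10) keeps at most 10 rows; here there are only 2.
# orient="records" turns each row into a dict of {column name: cell value}.
rows = df.head(10).to_dict(orient="records")
columns = df.columns.tolist()

print(columns)
for row in rows:
    print(row)
```

So rows is a list of dicts and columns is a list of strings, which is why the template can loop over columns and look up row[c] for each cell.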
Optional — get help from your agent:
Scaffold a minimal Flask app that loads data/h1b.csv and displays the first 10 rows in a table. Show me app.py and templates/home.html. Walk me through how .head(10).to_dict(orient="records") shapes the data so I know what my template is receiving.
Phase 4: Build search.py + Company-Name Filter
Handwrite this yourself. The filter logic is your product. You're designing how someone searches 12k companies.
Objective
Create search.py with a filter_by_company(df, query) function. Wire it into app.py.
Instructions
Hints
search.py:
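# search.py
COMPANY_COL = "Employer (Petitioner) Name"  # ← match YOUR CSV's column name


def filter_by_company(df, query):
    # An empty search box means "no filter": return everything.
    if not query:
        return df
    # regex=False treats the query as plain text, so input like "(" cannot crash the search.
    mask = df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)
    return df[mask]
Why na=False? Some rows have a missing company name. With na=False those rows count as non-matches. Without it the mask can contain NaN for those rows, and pandas refuses to filter with a mask like that. Why regex=False? By default .str.contains treats the query as a regular expression, so a search containing a character like "(" would raise an error.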
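One way to convince yourself the filter behaves is to try it on a tiny dataframe before wiring it into Flask. This is a sketch only: the four rows are made up, and regex=False (literal matching) is a choice made for this sketch so that typed punctuation cannot break the search.

```python
import pandas as pd

COMPANY_COL = "Employer (Petitioner) Name"  # assumed column name; match your CSV


def filter_by_company(df, query):
    if not query:
        return df
    # case=False: "google" matches "GOOGLE"; na=False: missing names never match;
    # regex=False: the query is matched as literal text (a choice made in this sketch).
    mask = df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)
    return df[mask]


# Made-up rows, including one missing name and one name with parentheses.
toy = pd.DataFrame(
    {COMPANY_COL: ["Google LLC", None, "GOOGLE PAYMENT CORP", "Amazon (AWS)"]}
)

google = filter_by_company(toy, "google")
paren = filter_by_company(toy, "(aws")
everything = filter_by_company(toy, "")

print(google[COMPANY_COL].tolist())
print(paren[COMPANY_COL].tolist())
print(len(everything))
```

The row with the missing name is skipped rather than causing an error, and the "(aws" query is handled as ordinary text.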
In app.py:
from flask import request
from search import filter_by_company


@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    query = request.args.get("q", "").strip()
    df = filter_by_company(df, query)
    top = df.head(50).to_dict(orient="records")
    columns = df.columns.tolist()
    return render_template("home.html",
                           rows=top, columns=columns, query=query)
In home.html:
<form method="get">
  <input name="q" placeholder="Search company name" value="{{ query or '' }}">
  <button type="submit">Search</button>
</form>
Optional — get help from your agent:
Walk me through pandas .str.contains — what does case=False do, and why do I need na=False? Don't change my code — I want to understand before I commit.
Phase 5: Add State Filter (in search.py)
Handwrite this yourself. Second filter function. Simple pattern, but it's yours.
Objective
Add filter_by_state(df, state) in search.py. Both filters should work together.
Instructions
Hints
search.py:
STATE_COL = "Petitioner State"  # ← match YOUR CSV


def filter_by_state(df, state):
    if not state:
        return df
    return df[df[STATE_COL] == state]


def unique_states(df):
    return sorted(df[STATE_COL].dropna().unique().tolist())
app.py:
from search import filter_by_company, filter_by_state, unique_states


@app.route("/")
def home():
    df = pd.read_csv("data/h1b.csv")
    states = unique_states(df)
    query = request.args.get("q", "").strip()
    state = request.args.get("state", "").strip()
    df = filter_by_company(df, query)
    df = filter_by_state(df, state)
    top = df.head(50).to_dict(orient="records")
    return render_template("home.html",
                           rows=top,
                           columns=df.columns.tolist(),
                           query=query, state=state, states=states)
home.html (state dropdown):
<select name="state">
  <option value="">Any state</option>
  {% for s in states %}
  <option value="{{ s }}" {% if s == state %}selected{% endif %}>{{ s }}</option>
  {% endfor %}
</select>
Notice the pattern: each filter function takes a df and returns a df. You can chain them. This is why they live in search.py and not inline in the route — they're reusable.
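The chaining idea can be sketched without Flask at all. In this sketch apply_filters is a hypothetical helper (not something the guide requires), the rows are made up, and regex=False is a choice made here for literal matching.

```python
import pandas as pd

COMPANY_COL = "Employer (Petitioner) Name"  # assumed column names; match your CSV
STATE_COL = "Petitioner State"


def filter_by_company(df, query):
    if not query:
        return df
    # regex=False (literal text match) is a choice made in this sketch.
    mask = df[COMPANY_COL].str.contains(query, case=False, na=False, regex=False)
    return df[mask]


def filter_by_state(df, state):
    if not state:
        return df
    return df[df[STATE_COL] == state]


def apply_filters(df, query="", state=""):
    """Hypothetical helper: each filter takes a df and returns a df, so they chain."""
    return filter_by_state(filter_by_company(df, query), state)


# Made-up rows for illustration.
toy = pd.DataFrame(
    {
        COMPANY_COL: ["Google LLC", "Google LLC", "Amazon.com Services LLC", None],
        STATE_COL: ["CA", "NY", "WA", "CA"],
    }
)

both = apply_filters(toy, query="google", state="CA")
state_only = apply_filters(toy, state="CA")
no_filters = apply_filters(toy)

print(both)
print(len(state_only), len(no_filters), len(toy))
```

Because each function returns a new dataframe instead of changing the one it was given, toy still has all four rows afterwards, and an empty filter simply passes the data through.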
Optional — get help from your agent:
Walk me through why my filter functions take a df and return a df, instead of mutating the df. What does that buy me? Don't change my code.
Phase 6: Polish + Journal Catch-Up
Agent-assisted is fine here. Bootstrap styling + a README.
Objective
Make it look like a real app. Catch up the journal.
Instructions
Hints
Bootstrap CDN (in <head>):
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
Table class:
<table class="table table-striped">
README template:
# H1-B Employer Data Hub
A Flask web app to explore H1-B visa sponsor data from USCIS.
## Run locally
uv run flask --app app run --debug
Open http://127.0.0.1:5000
Optional — get help from your agent:
Style my table with Bootstrap 5 and make my filter form look like a real search bar. Keep the HTML simple enough for me to edit.