Structured YouTube data for your research pipeline
Turn YouTube's unstructured content into scored, typed JSON ready for analysis. Title patterns with effectiveness ratings, content gaps with demand scores, channel benchmarks with tier classifications. Pipe it straight into pandas or R.
The problem
YouTube research shouldn't mean web scraping
Manual collection
Web scrapers, rate limits, inconsistent HTML parsing. Weeks of engineering work before any actual analysis begins. And the scraper breaks every time YouTube changes their page structure.
YouTube Data API alone
Raw metadata without scoring or content quality signals. You get view counts and upload dates, but nothing about why certain titles or formats outperform others.
With BrightBean
Structured, scored data via API. Content gaps with demand scores, title patterns with effectiveness ratings, channel benchmarks with tier classifications. All typed JSON, ready for your pipeline.
See it in action
How a media researcher studies title patterns across 3 niches
A researcher wants to compare which title patterns drive engagement in different YouTube categories. Here's the full workflow, from data collection to export.
Collect title scores across niches
You have a list of titles from three niches: tech reviews, cooking, and fitness. Instead of manually coding each title for patterns, you score them all against BrightBean's niche-calibrated models in a single Python loop.
Each title gets a 0-100 score plus a list of detected patterns. The scoring adjusts per niche, so a title pattern that works in fitness may score differently in cooking. That difference is exactly what you're measuring.
import requests

titles_by_niche = {
    "tech": [
        "5 Laptops Under $500 That Don't Suck",
        "I Tested Every USB-C Hub So You Don't Have To",
    ],
    "cooking": [
        "My Grandmother's Pasta Recipe (Finally Revealed)",
        "I Ate Only $1 Meals for a Week",
    ],
    "fitness": [
        "The Workout Nobody Talks About",
        "I Did 100 Pushups a Day for 30 Days",
    ],
}

results = []
for niche, titles in titles_by_niche.items():
    for title in titles:
        resp = requests.post(
            "https://api.brightbean.xyz/v1/score/title",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            json={"title": title, "niche": niche},
        )
        data = resp.json()
        data["niche"] = niche
        results.append(data)

# Sample output per title:
# { "title_score": 74, "patterns": ["number_list", "under_price"], "niche": "tech" }
Compare patterns across niches
Tech reviews
Avg. score: 54
Titles lean on number_list and under_price patterns. Specificity drives performance here: exact dollar amounts and product counts outscore vague alternatives.
Top pattern: number_list (68% of top-performers)
Cooking
Avg. score: 48
Personal connection matters most. personal_pronoun and experience dominate. "My" and "I" in titles signal authenticity that cooking audiences reward.
Top pattern: personal_pronoun (72% of top-performers)
Fitness
Avg. score: 61
Highest average of the three niches. Titles combining curiosity_gap with controversy consistently outperform straightforward how-to formats.
Top pattern: curiosity_gap (64% of top-performers)
Map content gaps by category
Beyond title patterns, you can map which topics are underserved in each niche. The /content-gaps endpoint returns topics where audience demand is high but few quality videos exist.
Each result includes a demand score, a competition count (videos with 10k+ views), and an opportunity rating. For cross-niche research, you can compare gap density: some niches have many unfilled topics while others are saturated.
This data works for media studies (how content supply matches audience demand) and market research (where brands should invest in video).
{
  "content_gaps": [
    {
      "topic": "AI ethics in higher education",
      "demand_score": 71,
      "competing_videos_above_10k_views": 1,
      "opportunity_rating": "very_high"
    },
    {
      "topic": "peer review process explained",
      "demand_score": 63,
      "competing_videos_above_10k_views": 4,
      "opportunity_rating": "medium"
    },
    {
      "topic": "open access publishing for grad students",
      "demand_score": 58,
      "competing_videos_above_10k_views": 2,
      "opportunity_rating": "high"
    }
  ],
  "niche": "education",
  "videos_analyzed": 28400
}
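Once a response like the one above is in hand, ranking opportunities is a few lines of local Python. This sketch reproduces the response as a literal so it runs standalone; the demand-per-competitor ranking heuristic is an illustration, not an official metric.

```python
# Sample /content-gaps response, reproduced from above
response = {
    "content_gaps": [
        {"topic": "AI ethics in higher education", "demand_score": 71,
         "competing_videos_above_10k_views": 1, "opportunity_rating": "very_high"},
        {"topic": "peer review process explained", "demand_score": 63,
         "competing_videos_above_10k_views": 4, "opportunity_rating": "medium"},
        {"topic": "open access publishing for grad students", "demand_score": 58,
         "competing_videos_above_10k_views": 2, "opportunity_rating": "high"},
    ],
    "niche": "education",
}

# Rank by demand per competing video: high demand, low competition first
gaps = sorted(
    response["content_gaps"],
    key=lambda g: g["demand_score"] / (g["competing_videos_above_10k_views"] + 1),
    reverse=True,
)
for g in gaps:
    print(g["topic"], "->", g["opportunity_rating"])
```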
Export to pandas for analysis
Every BrightBean response is typed JSON, so converting to a DataFrame takes one line. From there, run statistical tests, build visualizations, or export to CSV for collaborators who prefer spreadsheets.
Field names and types are consistent across all endpoints. You can join title scores with content gap data or channel benchmarks without any data cleaning.
import pandas as pd

# results = list of JSON responses from step 1
df = pd.json_normalize(results)

# Quick summary by niche
print(df.groupby("niche")["title_score"].agg(["count", "mean", "std", "min", "max"]))

# Export for collaborators
df.to_csv("title_patterns_by_niche.csv", index=False)

# Output:
#          count  mean   std   min   max
# niche
# cooking     42  48.2  12.1  22.0  79.0
# fitness     38  61.4  14.3  28.0  91.0
# tech        45  54.1  11.8  31.0  85.0
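For a formal test of whether mean title scores differ across niches, scipy (already named above as a compatible analysis library) slots in directly. The score samples here are made-up stand-ins for a DataFrame's per-niche `title_score` values.

```python
from scipy import stats

# Hypothetical per-niche score samples standing in for real API results
tech = [54, 61, 48, 59, 52, 57]
cooking = [48, 42, 51, 46, 50, 44]
fitness = [61, 66, 58, 63, 70, 55]

# One-way ANOVA: do mean title scores differ across the three niches?
f_stat, p_value = stats.f_oneway(tech, cooking, fitness)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```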
Getting started
Two ways to use BrightBean
Pick whichever fits your workflow. Both use the same data, same scoring, same intelligence.
Option A: Python quickstart
Call the API from your research scripts
Install requests, grab your API key, and start collecting data. The response is JSON you can feed directly into pandas, scipy, or any analysis library you already use.
# pip install requests
import requests

resp = requests.post(
    "https://api.brightbean.xyz/v1/score/title",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "title": "I Tested 5 Budget Cameras So You Don't Have To",
        "niche": "tech"
    }
)
print(resp.json())
Response in under 500ms. JSON you can pipe into any tool.
See all endpoints →
Option B: Exploratory research
Use BrightBean through Claude Desktop
For hypothesis generation before formal data collection, connect BrightBean to Claude Desktop via MCP. Ask questions in plain English and get structured answers without writing a single line of code.
{
  "mcpServers": {
    "brightbean": {
      "url": "https://api.brightbean.xyz/mcp",
      "transport": "sse"
    }
  }
}
Add this to claude_desktop_config.json, then ask: "Compare title patterns in cooking vs tech review niches."
Why researchers choose BrightBean
Built for rigorous analysis
Structured JSON
Every response is typed and documented. Fields are consistent across endpoints, ready for your pipeline without cleanup.
Niche-calibrated models
Scoring is calibrated per category. A 70 in cooking reflects cooking-specific patterns, not a generic average across all of YouTube.
Reproducible
Same inputs, same outputs. Deterministic scoring means your results hold up to replication and peer review. Good for longitudinal studies.
Common questions from researchers
Can I use BrightBean data in academic publications?
Yes. BrightBean provides structured data via API that you can cite as a data source. We recommend noting the API version, endpoint used, and date of data collection in your methodology section for full reproducibility. The deterministic scoring means other researchers can replicate your results with the same inputs.
How many niches does BrightBean cover?
All YouTube niches. Scoring models are trained across hundreds of content categories with niche-specific calibration. This includes broad categories like "education" and specific sub-niches like "organic chemistry tutorials." If a niche exists on YouTube, BrightBean can score titles in it.
Can I export data to CSV or pandas DataFrame?
BrightBean returns JSON. Converting to a DataFrame is one line: pd.json_normalize(results). From there, df.to_csv() gives you a spreadsheet for SPSS, Stata, or Excel. In R, jsonlite::fromJSON() handles the same conversion.
What's the rate limit?
Depends on your plan. Free: 500 calls. Standard: 100,000 calls/month. Growth: 500,000 calls/month. For a study analyzing 50 channels, you need about 100 API calls. The free tier covers most small-scale studies. Each call returns in under 500ms, so collecting a dataset of 10,000 title scores takes about 90 minutes with serial requests.
Is there a bulk endpoint?
Currently each API call processes one item. For bulk analysis, run requests in parallel using Python's concurrent.futures or asyncio. With 10 parallel workers, you can process 1,000 items in about 50 seconds. Batch endpoints are on our roadmap. Join the waitlist for early access.
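A minimal sketch of that parallel pattern with concurrent.futures, using a stand-in `score_title` function where the real `requests.post` call to `/v1/score/title` would go:

```python
from concurrent.futures import ThreadPoolExecutor


def score_title(title):
    # Stand-in for the real API call, e.g.:
    # requests.post("https://api.brightbean.xyz/v1/score/title",
    #               headers={...}, json={"title": title, "niche": "tech"})
    return {"title": title, "title_score": 50}


titles = [f"Video idea #{i}" for i in range(100)]

# 10 workers issue requests concurrently; pool.map preserves input order
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(score_title, titles))

print(len(results))  # 100
```

Because the work is network-bound, threads (rather than processes) are the idiomatic choice here.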
How is the title score calculated?
Each title is scored 0-100 across 16 pattern categories grouped into four areas: structure (number lists, brackets, length), engagement (curiosity gaps, questions, power words), authenticity (personal pronouns, experience markers), and specificity (price anchors, timeframes, named brands). Scores are weighted by how strongly each pattern correlates with performance in the given niche.
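As a toy illustration only — the weights, base score, and detection rules below are invented for this sketch and are not BrightBean's actual model — pattern-weighted, niche-calibrated scoring can look like this:

```python
import re

# Invented niche weights for two of the documented pattern categories
WEIGHTS = {
    "tech": {"number_list": 30, "under_price": 25},
    "cooking": {"number_list": 10, "under_price": 5},
}


def detect_patterns(title):
    patterns = []
    if re.search(r"\b\d+\b", title):
        patterns.append("number_list")
    if re.search(r"under \$\d+", title, re.IGNORECASE):
        patterns.append("under_price")
    return patterns


def toy_score(title, niche, base=40):
    # Base score plus niche-specific weight per detected pattern, capped at 100
    score = base + sum(WEIGHTS[niche].get(p, 0) for p in detect_patterns(title))
    return min(score, 100)


print(toy_score("5 Laptops Under $500 That Don't Suck", "tech"))     # 95
print(toy_score("5 Laptops Under $500 That Don't Suck", "cooking"))  # 55
```

The same title scoring differently per niche is the calibration effect described above.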
Can I compare patterns across niches?
Yes. Score the same title across different niches, or collect scores for niche-specific titles and compare the pattern distributions. Because scoring is calibrated per niche, cross-niche comparisons reveal which patterns are universal and which are category-specific. The walkthrough above shows exactly this workflow.
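Given scored results like those from the walkthrough, comparing pattern distributions per niche is a Counter away. The pattern lists here are illustrative stand-ins for real API responses.

```python
from collections import Counter

# Illustrative scored results (title_score fields omitted for brevity)
results = [
    {"niche": "tech", "patterns": ["number_list", "under_price"]},
    {"niche": "tech", "patterns": ["number_list"]},
    {"niche": "cooking", "patterns": ["personal_pronoun", "experience"]},
    {"niche": "cooking", "patterns": ["personal_pronoun"]},
]

# Tally pattern frequency per niche
by_niche = {}
for r in results:
    by_niche.setdefault(r["niche"], Counter()).update(r["patterns"])

for niche, counts in by_niche.items():
    print(niche, counts.most_common(1))
```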
Structured YouTube data for your research.
Free tier: 500 API calls, no credit card required.