Structured YouTube data for your research pipeline
Turn YouTube's unstructured content into scored, typed JSON ready for analysis. Title patterns with effectiveness ratings, content gaps with demand scores, channel benchmarks with tier classifications. Pipe it straight into pandas or R.
The problem
YouTube research shouldn't mean web scraping
Manual collection
Web scrapers, rate limits, inconsistent HTML parsing. Weeks of engineering work before any actual analysis begins. And the scraper breaks every time YouTube changes their page structure.
YouTube Data API alone
Raw metadata without scoring or content quality signals. You get view counts and upload dates, but nothing about why certain titles or formats outperform others.
With BrightBean
Structured, scored data via API. Content gaps with demand scores, title patterns with effectiveness ratings, channel benchmarks with tier classifications. All typed JSON, ready for your pipeline.
See it in action
How a media researcher studies title patterns across 3 niches
A researcher wants to compare which title patterns drive engagement in different YouTube categories. Here's the full workflow, from data collection to export.
Collect title scores across niches
You have a list of titles from three niches: tech reviews, cooking, and fitness. Instead of manually coding each title for patterns, you score them all against BrightBean's niche-calibrated models in a single Python loop.
Each title gets a 0-100 score plus a list of detected patterns. The scoring adjusts per niche, so a title pattern that works in fitness may score differently in cooking. That difference is exactly what you're measuring.
import requests

titles_by_niche = {
    "tech": [
        "5 Laptops Under $500 That Don't Suck",
        "I Tested Every USB-C Hub So You Don't Have To",
    ],
    "cooking": [
        "My Grandmother's Pasta Recipe (Finally Revealed)",
        "I Ate Only $1 Meals for a Week",
    ],
    "fitness": [
        "The Workout Nobody Talks About",
        "I Did 100 Pushups a Day for 30 Days",
    ],
}

results = []
for niche, titles in titles_by_niche.items():
    for title in titles:
        resp = requests.post(
            "https://api.brightbean.xyz/v1/score/title",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            json={"title": title, "niche": niche},
        )
        data = resp.json()
        data["niche"] = niche
        results.append(data)

# Sample output per title:
# { "title_score": 74, "patterns": ["number_list", "under_price"], "niche": "tech" }
Compare patterns across niches
Tech reviews
Avg. score: 54
Titles lean on number_list and under_price patterns. Specificity drives performance here: exact dollar amounts and product counts outscore vague alternatives.
Top pattern: number_list (68% of top-performers)
Cooking
Avg. score: 48
Personal connection matters most. personal_pronoun and experience dominate. "My" and "I" in titles signal authenticity that cooking audiences reward.
Top pattern: personal_pronoun (72% of top-performers)
Fitness
Avg. score: 61
Highest average of the three niches. Titles combining curiosity_gap with controversy consistently outperform straightforward how-to formats.
Top pattern: curiosity_gap (64% of top-performers)
Map content gaps by category
Beyond title patterns, you can map which topics are underserved in each niche. The /content-gaps endpoint returns topics where audience demand is high but few quality videos exist.
Each result includes a demand score, a competition count (videos with 10k+ views), and an opportunity rating. For cross-niche research, you can compare gap density: some niches have many unfilled topics while others are saturated.
This data works for media studies (how content supply matches audience demand) and market research (where brands should invest in video).
{
  "content_gaps": [
    {
      "topic": "AI ethics in higher education",
      "demand_score": 71,
      "competing_videos_above_10k_views": 1,
      "opportunity_rating": "very_high"
    },
    {
      "topic": "peer review process explained",
      "demand_score": 63,
      "competing_videos_above_10k_views": 4,
      "opportunity_rating": "medium"
    },
    {
      "topic": "open access publishing for grad students",
      "demand_score": 58,
      "competing_videos_above_10k_views": 2,
      "opportunity_rating": "high"
    }
  ],
  "niche": "education",
  "videos_analyzed": 28400
}
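Once a response like the one above is in hand, ranking opportunities is a few lines of local Python. This sketch reproduces the response as a literal so it runs standalone; the demand-per-competitor ranking heuristic is an illustration, not an official metric.

```python
# Sample /content-gaps response, reproduced from above
response = {
    "content_gaps": [
        {"topic": "AI ethics in higher education", "demand_score": 71,
         "competing_videos_above_10k_views": 1, "opportunity_rating": "very_high"},
        {"topic": "peer review process explained", "demand_score": 63,
         "competing_videos_above_10k_views": 4, "opportunity_rating": "medium"},
        {"topic": "open access publishing for grad students", "demand_score": 58,
         "competing_videos_above_10k_views": 2, "opportunity_rating": "high"},
    ],
    "niche": "education",
}

# Rank by demand per competing video: high demand, low competition first
gaps = sorted(
    response["content_gaps"],
    key=lambda g: g["demand_score"] / (g["competing_videos_above_10k_views"] + 1),
    reverse=True,
)
for g in gaps:
    print(g["topic"], "->", g["opportunity_rating"])
```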
Export to pandas for analysis
Every BrightBean response is typed JSON, so converting to a DataFrame takes one line. From there, run statistical tests, build visualizations, or export to CSV for collaborators who prefer spreadsheets.
Field names and types are consistent across all endpoints. You can join title scores with content gap data or channel benchmarks without any data cleaning.
import pandas as pd

# results = list of JSON responses from step 1
df = pd.json_normalize(results)

# Quick summary by niche
print(df.groupby("niche")["title_score"].agg(["count", "mean", "std", "min", "max"]))

# Export for collaborators
df.to_csv("title_patterns_by_niche.csv", index=False)

# Output:
#          count  mean   std   min   max
# niche
# cooking     42  48.2  12.1  22.0  79.0
# fitness     38  61.4  14.3  28.0  91.0
# tech        45  54.1  11.8  31.0  85.0
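For a formal test of whether mean title scores differ across niches, scipy (already named above as a compatible analysis library) slots in directly. The score samples here are made-up stand-ins for a DataFrame's per-niche `title_score` values.

```python
from scipy import stats

# Hypothetical per-niche score samples standing in for real API results
tech = [54, 61, 48, 59, 52, 57]
cooking = [48, 42, 51, 46, 50, 44]
fitness = [61, 66, 58, 63, 70, 55]

# One-way ANOVA: do mean title scores differ across the three niches?
f_stat, p_value = stats.f_oneway(tech, cooking, fitness)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```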
Getting started
Two ways to use BrightBean
Pick whichever fits your workflow. Both use the same data, same scoring, same intelligence.
Option A: Python quickstart
Call the API from your research scripts
Install requests, grab your API key, and start collecting data. The response is JSON you can feed directly into pandas, scipy, or any analysis library you already use.
# pip install requests
import requests

resp = requests.post(
    "https://api.brightbean.xyz/v1/score/title",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "title": "I Tested 5 Budget Cameras So You Don't Have To",
        "niche": "tech"
    }
)
print(resp.json())
Response in under 500ms. JSON you can pipe into any tool.
See all endpoints →
Option B: Exploratory research
Use BrightBean through Claude Desktop
For hypothesis generation before formal data collection, connect BrightBean to Claude Desktop via MCP. Ask questions in plain English and get structured answers without writing a single line of code.
{
  "mcpServers": {
    "brightbean": {
      "url": "https://api.brightbean.xyz/mcp",
      "transport": "sse"
    }
  }
}
Add this to claude_desktop_config.json, then ask: "Compare title patterns in cooking vs tech review niches."
Why researchers choose BrightBean
Built for rigorous analysis
Structured JSON
Every response is typed and documented. Fields are consistent across endpoints, ready for your pipeline without cleanup.
Niche-calibrated models
Scoring is calibrated per category. A 70 in cooking reflects cooking-specific patterns, not a generic average across all of YouTube.
Reproducible
Same inputs, same outputs. Deterministic scoring means your results hold up to replication and peer review. Good for longitudinal studies.
Common questions from researchers
Can I use BrightBean data in academic publications?
Yes. BrightBean provides structured data via API that you can cite as a data source. We recommend noting the API version, endpoint used, and date of data collection in your methodology section for full reproducibility. The deterministic scoring means other researchers can replicate your results with the same inputs.
How many niches does BrightBean cover?
All YouTube niches. Scoring models are trained across hundreds of content categories with niche-specific calibration. This includes broad categories like "education" and specific sub-niches like "organic chemistry tutorials." If a niche exists on YouTube, BrightBean can score titles in it.
Can I export data to CSV or pandas DataFrame?
BrightBean returns JSON. Converting to a DataFrame is one line: pd.json_normalize(results). From there, df.to_csv() gives you a spreadsheet for SPSS, Stata, or Excel. In R, jsonlite::fromJSON() handles the same conversion.
What's the rate limit?
Depends on your plan. Free: 500 calls. Standard: 100,000 calls/month. Growth: 500,000 calls/month. For a study analyzing 50 channels, you need about 100 API calls. The free tier covers most small-scale studies. Each call returns in under 500ms, so collecting a dataset of 10,000 title scores takes about 90 minutes with serial requests.
Is there a bulk endpoint?
Currently each API call processes one item. For bulk analysis, run requests in parallel using Python's concurrent.futures or asyncio. With 10 parallel workers, you can process 1,000 items in about 50 seconds. Batch endpoints are on our roadmap. Join the waitlist for early access.
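A minimal sketch of that parallel pattern with concurrent.futures, using a stand-in `score_title` function where the real `requests.post` call to `/v1/score/title` would go:

```python
from concurrent.futures import ThreadPoolExecutor


def score_title(title):
    # Stand-in for the real API call, e.g.:
    # requests.post("https://api.brightbean.xyz/v1/score/title",
    #               headers={...}, json={"title": title, "niche": "tech"})
    return {"title": title, "title_score": 50}


titles = [f"Video idea #{i}" for i in range(100)]

# 10 workers issue requests concurrently; pool.map preserves input order
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(score_title, titles))

print(len(results))  # 100
```

Because the work is network-bound, threads (rather than processes) are the idiomatic choice here.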
How is the title score calculated?
Each title is scored 0-100 across 16 pattern categories grouped into four areas: structure (number lists, brackets, length), engagement (curiosity gaps, questions, power words), authenticity (personal pronouns, experience markers), and specificity (price anchors, timeframes, named brands). Scores are weighted by how strongly each pattern correlates with performance in the given niche.
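As a toy illustration only — the weights, base score, and detection rules below are invented for this sketch and are not BrightBean's actual model — pattern-weighted, niche-calibrated scoring can look like this:

```python
import re

# Invented niche weights for two of the documented pattern categories
WEIGHTS = {
    "tech": {"number_list": 30, "under_price": 25},
    "cooking": {"number_list": 10, "under_price": 5},
}


def detect_patterns(title):
    patterns = []
    if re.search(r"\b\d+\b", title):
        patterns.append("number_list")
    if re.search(r"under \$\d+", title, re.IGNORECASE):
        patterns.append("under_price")
    return patterns


def toy_score(title, niche, base=40):
    # Base score plus niche-specific weight per detected pattern, capped at 100
    score = base + sum(WEIGHTS[niche].get(p, 0) for p in detect_patterns(title))
    return min(score, 100)


print(toy_score("5 Laptops Under $500 That Don't Suck", "tech"))     # 95
print(toy_score("5 Laptops Under $500 That Don't Suck", "cooking"))  # 55
```

The same title scoring differently per niche is the calibration effect described above.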
Can I compare patterns across niches?
Yes. Score the same title across different niches, or collect scores for niche-specific titles and compare the pattern distributions. Because scoring is calibrated per niche, cross-niche comparisons reveal which patterns are universal and which are category-specific. The walkthrough above shows exactly this workflow.
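Given scored results like those from the walkthrough, comparing pattern distributions per niche is a Counter away. The pattern lists here are illustrative stand-ins for real API responses.

```python
from collections import Counter

# Illustrative scored results (title_score fields omitted for brevity)
results = [
    {"niche": "tech", "patterns": ["number_list", "under_price"]},
    {"niche": "tech", "patterns": ["number_list"]},
    {"niche": "cooking", "patterns": ["personal_pronoun", "experience"]},
    {"niche": "cooking", "patterns": ["personal_pronoun"]},
]

# Tally pattern frequency per niche
by_niche = {}
for r in results:
    by_niche.setdefault(r["niche"], Counter()).update(r["patterns"])

for niche, counts in by_niche.items():
    print(niche, counts.most_common(1))
```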
Structured YouTube data for your research.
Free tier: 500 API calls, no credit card required.