12 Demo: Meta-Analysis of Cell Type Mapping Tools
In this demo, we’ll determine which cell type annotation tools are most commonly used in single-cell RNA sequencing studies. This is a practical literature mining exercise with real scientific relevance.
12.1 The Question
Single-cell RNA-seq requires mapping cells to known cell types. Several tools exist:
- CellTypist - Automated cell type annotation
- SingleR - Reference-based annotation
- scType - Marker gene-based annotation
- Azimuth - Reference mapping via Seurat
- scArches - Transfer learning approach
Which one do researchers actually use most?
We’ll answer this by mining the literature.
12.2 Project Setup
```bash
mkdir ~/Projects/celltype-meta-analysis
cd ~/Projects/celltype-meta-analysis
claude
```
Start with setup:
> Set up a Python project for analyzing cell type annotation tool usage in literature.
> I'll need requests, pandas, matplotlib, and biopython for PubMed access.
12.3 Phase 1: Finding the Original Papers
First, let’s get citation information for each tool’s original publication.
> Help me find the original publication DOI for each of these cell type annotation tools:
> 1. CellTypist
> 2. SingleR
> 3. scType
> 4. Azimuth
> 5. scArches
Claude will search (possibly using web search or asking you to provide them) and create a reference file:
> Create a JSON file with this information: tool name, original paper title, DOI, publication year
Example output (tools_metadata.json):
```json
{
  "tools": [
    {
      "name": "CellTypist",
      "title": "Cross-tissue immune cell analysis reveals tissue-specific features in humans",
      "doi": "10.1126/science.abl5197",
      "year": 2022
    },
    ...
  ]
}
```
12.4 Phase 2: Searching for Citations with MCPs
Now we need to find papers that cite these tools. This is where MCP (Model Context Protocol) servers shine.
You could write a Python script using Biopython’s Entrez module or the Semantic Scholar API to do this—and that would work fine. But this demo is a great opportunity to introduce MCPs: extensions that give Claude Code direct access to external services.
With an MCP, you simply ask Claude in natural language, and it handles the API calls, pagination, rate limiting, and data formatting for you. It’s the difference between writing code about your research question versus just asking your research question.
12.4.1 Using the PubMed MCP
With a literature search MCP configured, searching becomes conversational:
> Search PubMed for papers mentioning CellTypist in the context of single-cell RNA-seq analysis. Return the title, authors, year, and PMID for each.
Claude uses the MCP to query PubMed and returns structured results directly.
> Now do the same for SingleR, scType, Azimuth, and scArches
> Combine all results into a single table and save as citations_raw.csv
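Under the hood, assembling that combined table is a few lines of pandas. A minimal sketch, assuming the MCP returns per-tool lists of (title, first author, year, PMID) tuples; the entries below are placeholders, not real results:

```python
import pandas as pd

# Placeholder for the structured results the MCP returns per tool:
# each entry is (title, first_author, year, pmid).
results_by_tool = {
    "CellTypist": [("Example paper A", "Smith", 2023, "36000001")],
    "SingleR": [("Example paper B", "Jones", 2022, "35000002")],
}

# Flatten the per-tool lists into one row per citing paper
rows = [
    {"tool": tool, "title": t, "first_author": a, "year": y, "pmid": p}
    for tool, papers in results_by_tool.items()
    for (t, a, y, p) in papers
]
citations = pd.DataFrame(rows, columns=["tool", "title", "first_author", "year", "pmid"])
citations.to_csv("citations_raw.csv", index=False)
```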
MCPs need to be configured before use. See Appendix B: MCP Setup for instructions on installing and configuring the PubMed MCP and others.
12.4.2 What the MCP Does Behind the Scenes
When you ask Claude to search PubMed via an MCP, it’s essentially doing what you’d otherwise script manually:
```python
# This is what you'd write without an MCP
from Bio import Entrez

Entrez.email = "your.email@example.com"

def search_pubmed(tool_name, max_results=100):
    query = f'"{tool_name}"[All Fields] AND single-cell[All Fields]'
    handle = Entrez.esearch(db="pubmed", term=query, retmax=max_results)
    record = Entrez.read(handle)
    return record["IdList"]
```
The MCP wraps this complexity, handles authentication, manages rate limits, and presents results in a conversational way. You focus on what you want, not how to get it.
12.5 Phase 3: Analyzing Actual Usage
Finding citations isn’t enough—a paper might cite a tool to critique it or mention alternatives. We need to check actual usage.
> Create a function that, given a PubMed ID, fetches the full text (if available via PMC)
> and checks whether the tool is mentioned in the Methods section
This involves:
1. Checking whether the paper is in PubMed Central (open access)
2. Fetching the full-text XML
3. Parsing the Methods section
4. Looking for tool usage patterns (not just mentions)
```python
from xml.etree import ElementTree

def check_methods_usage(pmc_id, tool_name):
    """Check whether a tool is mentioned in the Methods section."""
    # Fetch the full-text XML from PMC (Entrez imported earlier)
    handle = Entrez.efetch(db="pmc", id=pmc_id, rettype="xml")
    root = ElementTree.fromstring(handle.read())
    handle.close()

    # Collect text from sections whose title mentions "Methods"
    methods_text = ""
    for sec in root.iter("sec"):
        title = sec.findtext("title", default="")
        if "method" in title.lower():
            methods_text += " ".join(sec.itertext())

    # Look for usage phrases, not bare mentions
    usage_patterns = [
        f"used {tool_name}",
        f"using {tool_name}",
        f"annotated with {tool_name}",
        f"{tool_name} was used",
        f"ran {tool_name}",
    ]
    return any(p.lower() in methods_text.lower() for p in usage_patterns)
```
12.6 Phase 4: Tallying Results
> Create a script that:
> 1. Loads the tool metadata
> 2. For each tool, searches for citing papers
> 3. For papers with open full text, checks actual Methods usage
> 4. Creates a tally of confirmed usage per tool
> 5. Saves results to a CSV
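The resulting script might be organized around a single driver function. This is a sketch, not Claude's actual output; `search_citing_pmids`, `pmc_id_for`, and `check_methods_usage` stand in for the PubMed/PMC helpers developed above:

```python
import json
import pandas as pd

def tally_usage(metadata_path, search_citing_pmids, pmc_id_for, check_methods_usage):
    """Tally confirmed Methods-section usage for each tool.

    The three callables are stand-ins for the PubMed/PMC lookup
    helpers built earlier in the demo.
    """
    with open(metadata_path) as fh:
        tools = json.load(fh)["tools"]

    rows = []
    for tool in tools:
        name = tool["name"]
        pmids = search_citing_pmids(name)
        # Keep only papers with a PMC full-text record
        pmc_ids = [p for p in (pmc_id_for(pmid) for pmid in pmids) if p]
        confirmed = sum(check_methods_usage(pmc, name) for pmc in pmc_ids)
        rows.append({
            "tool": name,
            "total_citations": len(pmids),
            "pmc_available": len(pmc_ids),
            "confirmed_methods_usage": confirmed,
        })

    results = pd.DataFrame(rows)
    results.to_csv("results/tool_usage.csv", index=False)
    return results
```

Passing the helpers in as arguments keeps the tally logic testable without hitting the live APIs.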
The output might look like:
| Tool | Total Citations | PMC Available | Confirmed Methods Usage |
|---|---|---|---|
| CellTypist | 245 | 128 | 89 |
| SingleR | 892 | 456 | 312 |
| scType | 156 | 78 | 45 |
| Azimuth | 523 | 267 | 198 |
| scArches | 178 | 89 | 52 |
12.7 Phase 5: Visualization
> Create a publication-quality bar chart showing:
> 1. Total citations per tool
> 2. Confirmed usage per tool
> Use a colorblind-safe palette and save as both PNG and PDF
Claude will generate matplotlib code:
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("colorblind")

fig, ax = plt.subplots(figsize=(10, 6))

# Create grouped bar chart
x = range(len(tools))
width = 0.35
bars1 = ax.bar([i - width/2 for i in x], citations, width, label='Total Citations')
bars2 = ax.bar([i + width/2 for i in x], confirmed, width, label='Confirmed Usage')

ax.set_ylabel('Number of Papers')
ax.set_xlabel('Cell Type Annotation Tool')
ax.set_title('Usage of Cell Type Annotation Tools in Literature')
ax.set_xticks(x)
ax.set_xticklabels(tool_names)
ax.legend()

plt.tight_layout()
plt.savefig('results/tool_usage_comparison.png', dpi=300)
plt.savefig('results/tool_usage_comparison.pdf')
```
12.8 Phase 6: Going Deeper (Optional)
If time permits, we can extend the analysis:
> Add a timeline showing adoption over time (citations per year for each tool)
> Identify which journals publish the most single-cell studies and whether tool preference varies by journal
> Look for patterns: do certain tools tend to be used together?
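The timeline extension reduces to a groupby-and-plot. A sketch, assuming `citations` carries the `tool` and `year` columns from citations_raw.csv:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_adoption_timeline(citations, outfile="results/adoption_timeline.png"):
    """Plot citing papers per year for each tool.

    `citations` is expected to have `tool` and `year` columns,
    as in citations_raw.csv from Phase 2.
    """
    # One row per year, one column per tool, zero-filled
    counts = citations.groupby(["year", "tool"]).size().unstack(fill_value=0)
    ax = counts.plot(marker="o", figsize=(10, 6))
    ax.set_xlabel("Publication Year")
    ax.set_ylabel("Citing Papers")
    ax.set_title("Adoption of Cell Type Annotation Tools Over Time")
    plt.tight_layout()
    plt.savefig(outfile, dpi=300)
    return counts
```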
12.9 Handling Real-World Messiness
This analysis will encounter issues. Claude Code helps handle them:
12.9.1 Rate Limiting
> I'm getting rate limited by the API. Add appropriate delays and retry logic.
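What Claude typically adds is a small backoff wrapper around the API calls. A generic sketch (the delay values are illustrative; NCBI permits about 3 requests per second without an API key):

```python
import time

def with_retries(fn, *args, max_attempts=4, base_delay=1.0, **kwargs):
    """Call fn with exponential backoff on failure.

    Wrap any Entrez/API call so transient errors and rate-limit
    responses trigger a wait-and-retry instead of a crash.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```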
12.9.2 Missing Data
> Some papers don't have full text available. Create a separate category for "unable to verify"
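One way to carry that category through the tally is a three-way status instead of a boolean (the names here are illustrative):

```python
def usage_status(pmc_available, methods_hit):
    """Classify a citing paper for the tally.

    Papers without open full text can't be checked, so they get
    their own "unable to verify" bucket rather than counting as
    non-usage.
    """
    if not pmc_available:
        return "unable to verify"
    return "confirmed usage" if methods_hit else "cited only"
```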
12.9.3 Ambiguous Mentions
> Some papers mention multiple tools. How should we count those?
12.9.4 False Positives
> "SingleR" sometimes matches unrelated uses of "single" + "R". Add filtering for false positives.
12.10 Complete Working Example
Here’s how a session might flow:
> Let me start by understanding what we have. Show me the tools_metadata.json
> Good. Now let's test the Semantic Scholar approach with just CellTypist first
> That returned 150 citing papers. Let's check how many have PMC full text
> Only 67 have full text. Let's sample 10 and manually verify our Methods detection works
> The detection missed some papers that use phrases like "cell type labels were assigned using..."
> Add that pattern
> Better. Now run the full analysis for all 5 tools. This might take a while due to API limits.
> Great, save the results and create the visualization
12.11 What You’ve Learned
By completing this demo, you’ve practiced:
- Using MCPs: Leveraging external services through natural language instead of writing API code
- Literature mining: Searching and filtering scientific publications programmatically
- Text mining: Parsing and searching document text for specific patterns
- Data handling: Managing incomplete/messy real-world data
- Iterative development: Building up complexity step by step
- Visualization: Creating publication-ready figures
12.12 Final Cleanup
> Add a README explaining what this project does and how to run it
> Create a requirements.txt with pinned versions of all packages we used
> Commit everything with a descriptive message
12.13 Next Steps
Try another demo or continue to Part 4: Paper Writing Track.