12 Demo: Meta-Analysis of Cell Type Mapping Tools
In this demo, we’ll determine which cell type annotation tools are most commonly used in single-cell RNA sequencing studies. This is a practical literature mining exercise with real scientific relevance.
12.1 The Question
Single-cell RNA-seq requires mapping cells to known cell types. Several tools exist:
- CellTypist - Automated cell type annotation
- SingleR - Reference-based annotation
- scType - Marker gene-based annotation
- Azimuth - Reference mapping via Seurat
- scArches - Transfer learning approach
Which one do researchers actually use most?
We’ll answer this by mining the literature.
12.2 Project Setup
```bash
mkdir ~/Projects/celltype-meta-analysis
cd ~/Projects/celltype-meta-analysis
claude
```
Start with setup:
> Set up a Python project for analyzing cell type annotation tool usage in literature.
> I'll need requests, pandas, matplotlib, and biopython for PubMed access.
12.3 Phase 1: Finding the Original Papers
First, let’s get citation information for each tool’s original publication.
> Help me find the original publication DOI for each of these cell type annotation tools:
> 1. CellTypist
> 2. SingleR
> 3. scType
> 4. Azimuth
> 5. scArches
Claude will search (possibly using web search or asking you to provide them) and create a reference file:
> Create a JSON file with this information: tool name, original paper title, DOI, publication year
Example output (tools_metadata.json):
```json
{
  "tools": [
    {
      "name": "CellTypist",
      "title": "Cross-tissue immune cell analysis reveals tissue-specific features in humans",
      "doi": "10.1126/science.abl5197",
      "year": 2022
    },
    ...
  ]
}
```
12.4 Phase 2: Searching for Citations with MCPs
Now we need to find papers that cite these tools. This is where MCP (Model Context Protocol) servers shine.
You could write a Python script using Biopython’s Entrez module or the Semantic Scholar API to do this—and that would work fine. But this demo is a great opportunity to introduce MCPs: extensions that give Claude Code direct access to external services.
With an MCP, you simply ask Claude in natural language, and it handles the API calls, pagination, rate limiting, and data formatting for you. It’s the difference between writing code about your research question versus just asking your research question.
12.4.1 Using the PubMed MCP
With a literature search MCP configured, searching becomes conversational:
> Search PubMed for papers mentioning CellTypist in the context of single-cell RNA-seq analysis. Return the title, authors, year, and PMID for each.
Claude uses the MCP to query PubMed and returns structured results directly.
> Now do the same for SingleR, scType, Azimuth, and scArches
> Combine all results into a single table and save as citations_raw.csv
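Under the hood, assembling that combined table is a few lines of pandas. A minimal sketch, assuming the MCP returns per-tool lists of (title, first author, year, PMID) tuples; the entries below are placeholders, not real results:

```python
import pandas as pd

# Placeholder for the structured results the MCP returns per tool:
# each entry is (title, first_author, year, pmid).
results_by_tool = {
    "CellTypist": [("Example paper A", "Smith", 2023, "36000001")],
    "SingleR": [("Example paper B", "Jones", 2022, "35000002")],
}

# Flatten the per-tool lists into one row per citing paper
rows = [
    {"tool": tool, "title": t, "first_author": a, "year": y, "pmid": p}
    for tool, papers in results_by_tool.items()
    for (t, a, y, p) in papers
]
citations = pd.DataFrame(rows, columns=["tool", "title", "first_author", "year", "pmid"])
citations.to_csv("citations_raw.csv", index=False)
```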
MCPs need to be configured before use. See Appendix B: MCP Setup for instructions on installing and configuring the PubMed MCP and others.
12.4.2 What the MCP Does Behind the Scenes
When you ask Claude to search PubMed via an MCP, it’s essentially doing what you’d otherwise script manually:
```python
# This is what you'd write without an MCP
from Bio import Entrez

Entrez.email = "your.email@example.com"

def search_pubmed(tool_name, max_results=100):
    query = f'"{tool_name}"[All Fields] AND single-cell[All Fields]'
    handle = Entrez.esearch(db="pubmed", term=query, retmax=max_results)
    record = Entrez.read(handle)
    return record["IdList"]
```
The MCP wraps this complexity, handles authentication, manages rate limits, and presents results in a conversational way. You focus on what you want, not how to get it.
12.5 Phase 3: Analyzing Actual Usage
Finding citations isn’t enough—a paper might cite a tool to critique it or mention alternatives. We need to check actual usage.
> Create a function that, given a PubMed ID, fetches the full text (if available via PMC)
> and checks whether the tool is mentioned in the Methods section
This involves:
1. Checking whether the paper is in PubMed Central (open access)
2. Fetching the full-text XML
3. Parsing the Methods section
4. Looking for tool usage patterns (not just mentions)
```python
from xml.etree import ElementTree

def check_methods_usage(pmc_id, tool_name):
    """Check whether a tool is mentioned in the Methods section."""
    # Fetch the full-text XML from PMC (Entrez imported earlier)
    handle = Entrez.efetch(db="pmc", id=pmc_id, rettype="xml")
    root = ElementTree.fromstring(handle.read())
    handle.close()

    # Collect text from sections whose title mentions "Methods"
    methods_text = ""
    for sec in root.iter("sec"):
        title = sec.findtext("title", default="")
        if "method" in title.lower():
            methods_text += " ".join(sec.itertext())

    # Look for usage phrases, not bare mentions
    usage_patterns = [
        f"used {tool_name}",
        f"using {tool_name}",
        f"annotated with {tool_name}",
        f"{tool_name} was used",
        f"ran {tool_name}",
    ]
    return any(p.lower() in methods_text.lower() for p in usage_patterns)
```
12.6 Phase 4: Tallying Results
> Create a script that:
> 1. Loads the tool metadata
> 2. For each tool, searches for citing papers
> 3. For papers with open full text, checks actual Methods usage
> 4. Creates a tally of confirmed usage per tool
> 5. Saves results to a CSV
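The resulting script might be organized around a single driver function. This is a sketch, not Claude's actual output; `search_citing_pmids`, `pmc_id_for`, and `check_methods_usage` stand in for the PubMed/PMC helpers developed above:

```python
import json
import pandas as pd

def tally_usage(metadata_path, search_citing_pmids, pmc_id_for, check_methods_usage):
    """Tally confirmed Methods-section usage for each tool.

    The three callables are stand-ins for the PubMed/PMC lookup
    helpers built earlier in the demo.
    """
    with open(metadata_path) as fh:
        tools = json.load(fh)["tools"]

    rows = []
    for tool in tools:
        name = tool["name"]
        pmids = search_citing_pmids(name)
        # Keep only papers with a PMC full-text record
        pmc_ids = [p for p in (pmc_id_for(pmid) for pmid in pmids) if p]
        confirmed = sum(check_methods_usage(pmc, name) for pmc in pmc_ids)
        rows.append({
            "tool": name,
            "total_citations": len(pmids),
            "pmc_available": len(pmc_ids),
            "confirmed_methods_usage": confirmed,
        })

    results = pd.DataFrame(rows)
    results.to_csv("results/tool_usage.csv", index=False)
    return results
```

Passing the helpers in as arguments keeps the tally logic testable without hitting the live APIs.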
The output might look like:
| Tool | Total Citations | PMC Available | Confirmed Methods Usage |
|---|---|---|---|
| CellTypist | 245 | 128 | 89 |
| SingleR | 892 | 456 | 312 |
| scType | 156 | 78 | 45 |
| Azimuth | 523 | 267 | 198 |
| scArches | 178 | 89 | 52 |
12.7 Phase 5: Visualization
> Create a publication-quality bar chart showing:
> 1. Total citations per tool
> 2. Confirmed usage per tool
> Use a colorblind-safe palette and save as both PNG and PDF
Claude will generate matplotlib code:
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("colorblind")

fig, ax = plt.subplots(figsize=(10, 6))

# Create grouped bar chart
x = range(len(tools))
width = 0.35
bars1 = ax.bar([i - width/2 for i in x], citations, width, label='Total Citations')
bars2 = ax.bar([i + width/2 for i in x], confirmed, width, label='Confirmed Usage')

ax.set_ylabel('Number of Papers')
ax.set_xlabel('Cell Type Annotation Tool')
ax.set_title('Usage of Cell Type Annotation Tools in Literature')
ax.set_xticks(x)
ax.set_xticklabels(tool_names)
ax.legend()

plt.tight_layout()
plt.savefig('results/tool_usage_comparison.png', dpi=300)
plt.savefig('results/tool_usage_comparison.pdf')
```
12.8 Phase 6: Going Deeper (Optional)
If time permits, we can extend the analysis:
> Add a timeline showing adoption over time (citations per year for each tool)
> Identify which journals publish the most single-cell studies and whether tool preference varies by journal
> Look for patterns: do certain tools tend to be used together?
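The timeline extension reduces to a groupby-and-plot. A sketch, assuming `citations` carries the `tool` and `year` columns from citations_raw.csv:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_adoption_timeline(citations, outfile="results/adoption_timeline.png"):
    """Plot citing papers per year for each tool.

    `citations` is expected to have `tool` and `year` columns,
    as in citations_raw.csv from Phase 2.
    """
    # One row per year, one column per tool, zero-filled
    counts = citations.groupby(["year", "tool"]).size().unstack(fill_value=0)
    ax = counts.plot(marker="o", figsize=(10, 6))
    ax.set_xlabel("Publication Year")
    ax.set_ylabel("Citing Papers")
    ax.set_title("Adoption of Cell Type Annotation Tools Over Time")
    plt.tight_layout()
    plt.savefig(outfile, dpi=300)
    return counts
```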
12.9 Handling Real-World Messiness
This analysis will encounter issues. Claude Code helps handle them:
12.9.1 Rate Limiting
> I'm getting rate limited by the API. Add appropriate delays and retry logic.
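What Claude typically adds is a small backoff wrapper around the API calls. A generic sketch (the delay values are illustrative; NCBI permits about 3 requests per second without an API key):

```python
import time

def with_retries(fn, *args, max_attempts=4, base_delay=1.0, **kwargs):
    """Call fn with exponential backoff on failure.

    Wrap any Entrez/API call so transient errors and rate-limit
    responses trigger a wait-and-retry instead of a crash.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```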
12.9.2 Missing Data
> Some papers don't have full text available. Create a separate category for "unable to verify"
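One way to carry that category through the tally is a three-way status instead of a boolean (the names here are illustrative):

```python
def usage_status(pmc_available, methods_hit):
    """Classify a citing paper for the tally.

    Papers without open full text can't be checked, so they get
    their own "unable to verify" bucket rather than counting as
    non-usage.
    """
    if not pmc_available:
        return "unable to verify"
    return "confirmed usage" if methods_hit else "cited only"
```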
12.9.3 Ambiguous Mentions
> Some papers mention multiple tools. How should we count those?
12.9.4 False Positives
> "SingleR" sometimes matches unrelated uses of "single" + "R". Add filtering for false positives.
12.10 Complete Working Example
Here’s how a session might flow:
> Let me start by understanding what we have. Show me the tools_metadata.json
> Good. Now let's test the Semantic Scholar approach with just CellTypist first
> That returned 150 citing papers. Let's check how many have PMC full text
> Only 67 have full text. Let's sample 10 and manually verify our Methods detection works
> The detection missed some papers that use phrases like "cell type labels were assigned using..."
> Add that pattern
> Better. Now run the full analysis for all 5 tools. This might take a while due to API limits.
> Great, save the results and create the visualization
12.11 What You’ve Learned
By completing this demo, you’ve practiced:
- Using MCPs: Leveraging external services through natural language instead of writing API code
- Literature mining: Searching and filtering scientific publications programmatically
- Text mining: Parsing and searching document text for specific patterns
- Data handling: Managing incomplete/messy real-world data
- Iterative development: Building up complexity step by step
- Visualization: Creating publication-ready figures
12.12 Final Cleanup
> Add a README explaining what this project does and how to run it
> Create a requirements.txt with pinned versions of all packages we used
> Commit everything with a descriptive message
12.13 Next Steps
Try another demo or continue to Part 4: Paper Writing Track.