9 Demo: Citation Recovery with PubMed MCP
In this demo, we’ll take an introduction stripped of citations and systematically find and insert appropriate references using Claude Code with the PubMed MCP.
This demonstration is more accessible than the data analysis track—you don’t need programming knowledge or domain expertise. It showcases how MCPs let you interact with scientific databases through natural conversation, making it an ideal introduction to Claude Code’s extended capabilities.
9.1 Setting Up the PubMed MCP
Before we begin, we need to configure the PubMed MCP. This is a one-time setup.
9.1.1 Installation
The PubMed MCP runs via npx, which ships with Node.js. Add it to your Claude Code configuration:
1. Open your Claude config file:
- Mac/Linux: ~/.claude/config.json
- Windows: %USERPROFILE%\.claude\config.json
2. Add the MCP configuration:
{
"mcpServers": {
"pubmed": {
"command": "npx",
"args": ["-y", "@anthropic-ai/mcp-pubmed"]
}
}
}
3. Restart Claude Code for the changes to take effect.
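If the config file already lists other MCP servers, pasting the snippet over it would discard them. The following is a minimal sketch of merging the entry instead. The helper names `add_pubmed_server` and `update_config_file` are hypothetical, and the server entry is copied from the configuration shown above.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown above.
PUBMED_ENTRY = {"command": "npx", "args": ["-y", "@anthropic-ai/mcp-pubmed"]}


def add_pubmed_server(config: dict) -> dict:
    """Return a copy of config with the pubmed entry added under mcpServers.

    Hypothetical helper: existing servers and unrelated settings are kept.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["pubmed"] = dict(PUBMED_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_pubmed_server(existing), indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".claude" / "config.json")` would apply the merge to the path listed in step 1. Back up the file first, since the sketch rewrites it in place.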
For heavy usage, get a free API key from NCBI, which raises the E-utilities rate limit from 3 to 10 requests per second, and add it:
{
"mcpServers": {
"pubmed": {
"command": "npx",
"args": ["-y", "@anthropic-ai/mcp-pubmed"],
"env": {
"NCBI_API_KEY": "your-api-key-here"
}
}
}
}
9.2 The Task
You have a draft introduction with claims that need citations. Instead of manually searching PubMed for each one, we’ll use Claude Code to:
- Analyze the text and identify citation needs
- Expand each claim into searchable sub-claims
- Search PubMed for the most relevant and highly-cited papers
- Select 2-3 citations per claim for robust support
- Generate a properly formatted reference list
9.3 Project Setup
mkdir -p ~/Projects/paper-citations
cd ~/Projects/paper-citations
claude
9.4 The Sample Introduction
Let’s start with a draft introduction that needs citations:
> Create a file called draft_introduction.md with the following content:
# Introduction
Single-cell RNA sequencing has revolutionized our understanding of cellular
heterogeneity [CITATION NEEDED]. Unlike bulk RNA-seq, which averages gene
expression across thousands of cells, single-cell approaches reveal the
diversity of cell types and states within a tissue [CITATION NEEDED].
The development of droplet-based methods dramatically increased the throughput
of single-cell experiments, enabling the profiling of millions of cells in a
single experiment [CITATION NEEDED]. This scalability has enabled ambitious
projects to create comprehensive cell atlases of entire organisms [CITATION
NEEDED].
A key challenge in single-cell analysis is cell type annotation—assigning
biological identities to clusters of cells [CITATION NEEDED]. While manual
annotation based on marker genes remains common, computational methods have
emerged to automate this process [CITATION NEEDED]. These methods typically
rely on reference datasets of annotated cells or curated marker gene lists
[CITATION NEEDED].
Recent advances in large language models and artificial intelligence are
beginning to impact biological research [CITATION NEEDED]. However, their
application to single-cell genomics remains in early stages [CITATION NEEDED].
9.5 Phase 1: Analyzing and Expanding Claims
The key to good citations is understanding what each claim actually asserts. We’ll have Claude create a detailed analysis in a separate working file.
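Before running the prompt, it can help to confirm how many markers the draft contains, so you can tell whether the analysis covers them all. The sketch below is one way to do that. The function name `find_citation_points` is hypothetical, and the sentence splitting is deliberately naive. The pattern allows whitespace inside the marker because a line wrap can split it, as in `[CITATION` followed by `NEEDED]` on the next line.

```python
import re

# \s+ lets the marker match even when a line wrap splits it in two.
MARKER = re.compile(r"\[CITATION\s+NEEDED\]")


def find_citation_points(text: str) -> list[str]:
    """Return the sentence fragment leading up to each marker, in order."""
    claims = []
    start = 0
    for match in MARKER.finditer(text):
        preceding = text[start:match.start()].strip()
        # Naive split: keep only the text after the last sentence-ending mark.
        sentence = re.split(r"(?<=[.!?])\s+", preceding)[-1]
        # Collapse line breaks and repeated spaces into single spaces.
        claims.append(" ".join(sentence.split()))
        start = match.end()
    return claims
```

Running `len(find_citation_points(open("draft_introduction.md").read()))` on the draft above should give 9, matching the nine citation points Claude is expected to analyze.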
> Read draft_introduction.md and create a new file called citation_analysis.md
>
> For each [CITATION NEEDED] marker:
> 1. Extract the exact claim being made
> 2. Break it down into 2-3 specific sub-claims that could each be cited
> 3. Identify what type of source would best support it (seminal paper, review, methods paper, recent application)
> 4. Suggest PubMed search terms for each sub-claim
>
> Format as a structured markdown document with clear sections for each citation point.
Claude creates citation_analysis.md:
# Citation Analysis
## Citation Point 1: "Single-cell RNA sequencing has revolutionized our understanding of cellular heterogeneity"
**Main claim**: scRNA-seq is transformative for understanding heterogeneity
**Sub-claims to support**:
1. scRNA-seq enables measurement of gene expression in individual cells (methodological advance)
2. This revealed previously unknown cellular diversity (biological finding)
3. The field has grown rapidly and changed how we study tissues (impact/review)
**Source types needed**:
- Seminal methods paper (Tang et al. 2009 or similar early work)
- Influential review documenting the revolution
- High-impact study demonstrating heterogeneity discovery
**PubMed searches**:
- `"single-cell RNA-seq"[Title] AND review[pt] AND heterogeneity`
- `"single cell transcriptomics" history development`
- `"single-cell RNA sequencing" cellular heterogeneity discovery`
---
## Citation Point 2: "single-cell approaches reveal the diversity of cell types and states"
**Main claim**: scRNA-seq uncovers cell type and state diversity
**Sub-claims to support**:
1. scRNA-seq can identify distinct cell types within tissues
2. It can also reveal cell states (activation, differentiation, etc.)
3. This granularity was not possible with bulk methods
**Source types needed**:
- Study demonstrating novel cell type discovery
- Paper on cell state dynamics
- Comparison of single-cell vs bulk approaches
**PubMed searches**:
- `"single-cell" "cell type" discovery identification`
- `"single-cell RNA-seq" "cell states" dynamics`
- `single-cell versus bulk RNA-seq resolution`
---
[continues for all 9 citation points...]
9.6 Phase 2: Systematic Literature Search
Now we use the PubMed MCP to search for papers. The goal is to find 2-3 highly-cited, relevant papers for each citation point.
> Using the PubMed MCP, search for papers to support Citation Point 1.
>
> For each of the three sub-claims, run a PubMed search.
> For each search, return the top 5 results sorted by relevance, including:
> - Title
> - Authors (first author et al.)
> - Journal
> - Year
> - PMID
> - Approximate citation count if available
>
> Then recommend which 2-3 papers best support the overall claim, explaining why.
Claude searches via the MCP and responds:
### Sub-claim 1: scRNA-seq enables single-cell measurement
**Search**: "single-cell RNA-seq"[Title] AND methods[Title]
Results:
1. "Single-cell RNA-seq profiling of human preimplantation embryos" - Yan et al., 2013, Nature Structural & Molecular Biology (PMID: 23934149) - ~2,800 citations
2. "mRNA-Seq whole-transcriptome analysis of a single cell" - Tang et al., 2009, Nature Methods (PMID: 19349980) - ~3,500 citations
3. ...
### Sub-claim 2: Revealed unknown diversity
**Search**: "single-cell RNA sequencing" cellular heterogeneity discovery
Results:
1. "Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells" - Shalek et al., 2013, Nature (PMID: 23685454) - ~2,100 citations
2. ...
### Recommendation for Citation Point 1:
I recommend citing:
1. **Tang et al., 2009** - The seminal paper establishing scRNA-seq methodology
2. **Shalek et al., 2013** - Demonstrated unexpected heterogeneity in immune cells
3. **Kolodziejczyk et al., 2015** - Comprehensive review titled "The technology and biology of single-cell RNA sequencing"
These three papers together establish the method, show a key biological discovery, and provide a review documenting the field's impact.
> Continue this process for Citation Points 2-9.
> Add all findings to citation_analysis.md under each citation point.
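For readers curious what one of these searches looks like at the API level, here is a minimal sketch that builds a request URL for NCBI's public ESearch endpoint. This chapter does not show how the PubMed MCP server talks to PubMed, so treating it as an E-utilities client is an assumption, and `build_esearch_url` is a hypothetical helper. The query parameters themselves (`db`, `term`, `retmax`, `sort`, `retmode`, `api_key`) are standard ESearch parameters.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 5,
                      api_key: Optional[str] = None) -> str:
    """Build an ESearch URL for a PubMed query, sorted by relevance.

    Hypothetical helper: it only constructs the URL and sends no request.
    """
    params = {
        "db": "pubmed",       # search the PubMed database
        "term": term,         # query string, field tags such as [Title] allowed
        "retmax": retmax,     # number of PMIDs to return
        "sort": "relevance",  # PubMed's best-match ordering
        "retmode": "json",
    }
    if api_key:
        params["api_key"] = api_key  # optional NCBI API key
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

ESearch returns only matching PMIDs. Titles, authors, and journal details need a follow-up request (for example to ESummary or EFetch), which is the kind of plumbing the MCP hides behind the conversational prompts above.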
9.7 Phase 3: Building the Reference List
Once we have candidate papers for all citation points, we compile them:
> Based on the analysis in citation_analysis.md, create a file called references.bib
> containing BibTeX entries for all selected papers (approximately 20-25 references).
>
> For each entry, fetch the complete citation information from PubMed including:
> - All authors
> - Full title
> - Journal name
> - Volume, issue, pages
> - DOI if available
Claude generates a complete references.bib:
@article{tang2009mrna,
title={mRNA-Seq whole-transcriptome analysis of a single cell},
author={Tang, Fuchou and Barbacioru, Catalin and Wang, Yangzhou and others},
journal={Nature Methods},
volume={6},
number={5},
pages={377--382},
year={2009},
doi={10.1038/nmeth.1315},
pmid={19349980}
}
@article{shalek2013single,
title={Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells},
author={Shalek, Alex K and Satija, Rahul and Adiconis, Xian and others},
journal={Nature},
volume={498},
number={7453},
pages={236--240},
year={2013},
doi={10.1038/nature12172},
pmid={23685454}
}
% ... additional entries
9.8 Phase 4: Inserting Citations
Now we update the draft with proper citations:
> Update draft_introduction.md to replace each [CITATION NEEDED] with the
> appropriate citations from our analysis.
>
> Use the format (Author et al., Year) for in-text citations.
> Where we have multiple citations for one claim, combine them appropriately.
>
> Save the updated version as introduction_with_citations.md
The result:
# Introduction
Single-cell RNA sequencing has revolutionized our understanding of cellular
heterogeneity (Tang et al., 2009; Shalek et al., 2013; Kolodziejczyk et al.,
2015). Unlike bulk RNA-seq, which averages gene expression across thousands
of cells, single-cell approaches reveal the diversity of cell types and
states within a tissue (Trapnell, 2015; Tanay & Regev, 2017).
The development of droplet-based methods dramatically increased the throughput
of single-cell experiments, enabling the profiling of millions of cells in a
single experiment (Macosko et al., 2015; Klein et al., 2015; Zheng et al.,
2017). This scalability has enabled ambitious projects to create comprehensive
cell atlases of entire organisms (Regev et al., 2017; Tabula Muris Consortium,
2018; HuBMAP Consortium, 2019).
...
9.9 Phase 5: Verification
Finally, let’s verify the citations are appropriate:
> Review introduction_with_citations.md and verify:
>
> 1. Each citation actually supports the claim it's attached to
> 2. The papers are from reputable journals
> 3. We have a good mix of foundational and recent papers
> 4. No claim is under-supported (at least 2 citations each)
>
> Flag any issues or gaps.
Claude reviews and might identify:
Review complete. Issues found:
1. Citation Point 8 (LLMs in biology): The papers I found are from 2023.
Given how fast this field moves, should I search for 2024-2025 papers?
2. Citation Point 7 (reference datasets): We have only 2 citations here, the
minimum. Recommend adding Aran et al., 2019 (SingleR paper) as a third.
All other citation points have appropriate support.
> Yes, search for more recent LLM papers and add the SingleR citation.
> Update all files accordingly.
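Claude's review judges whether each paper supports its claim. A separate mechanical check can confirm that every in-text citation has a matching entry in references.bib. The sketch below assumes BibTeX keys start with the first author's lowercase surname followed by the year, a convention inferred from the two entries shown earlier (`tang2009mrna`, `shalek2013single`). Both function names are hypothetical.

```python
import re

# Captures the entry key from lines such as "@article{tang2009mrna,".
BIB_KEY = re.compile(r"@\w+\{\s*([^,\s]+)\s*,")
# Parenthesized groups that contain a four-digit year.
CITATION_GROUP = re.compile(r"\(([^()]*\d{4}[^()]*)\)")
# First word of the author string plus the trailing year.
SINGLE_CITATION = re.compile(r"\s*([A-Za-z\-]+).*?(\d{4})\s*$", re.S)


def cited_prefixes(text: str) -> set[str]:
    """Return surname+year prefixes, e.g. 'tang2009', for each in-text citation."""
    prefixes = set()
    for group in CITATION_GROUP.findall(text):
        for part in group.split(";"):
            match = SINGLE_CITATION.match(part)
            if match:
                prefixes.add(match.group(1).lower() + match.group(2))
    return prefixes


def missing_from_bib(intro_text: str, bib_text: str) -> list[str]:
    """List cited prefixes that no BibTeX key starts with."""
    keys = [key.lower() for key in BIB_KEY.findall(bib_text)]
    return sorted(
        prefix for prefix in cited_prefixes(intro_text)
        if not any(key.startswith(prefix) for key in keys)
    )
```

An empty list from `missing_from_bib` means every citation has a candidate entry. Consortium names such as "Tabula Muris Consortium" reduce to their first word (`tabula2018`), so keys for those entries may need adjusting by hand.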
9.10 What You’ve Learned
By completing this demo, you’ve practiced:
- Setting up an MCP for extending Claude Code’s capabilities
- Systematic claim analysis: Breaking broad statements into citable sub-claims
- Using PubMed via MCP: Natural language searches of scientific literature
- Citation selection: Choosing highly-cited, relevant papers
- Reference management: Generating BibTeX from PubMed data
9.11 Next Steps
Continue to Extended Paper Workflow for additional paper writing support, or move on to Part 4: Data Analysis Track for more advanced demonstrations.