14 Demo: Reverse-Engineering TRISCO Oligo Design
In this demo, we’ll work out and reproduce the oligo design logic behind the TRISCO method (Kanatani et al.). It shows how Claude Code can help decode experimental protocols and reverse-engineer designs.
14.1 Background
TRISCO (Triple-stranded RNA-guided Intron-based Splicing Control) is a method for controlling gene expression using designed oligonucleotides. Understanding how the oligos are designed helps us:
- Apply the method to new targets
- Troubleshoot when things don’t work
- Potentially improve the design
14.2 The Challenge
Papers often describe what was done but not why specific sequences were chosen. We’ll infer the design rules from the published sequences themselves.
14.3 Project Setup
mkdir -p ~/Projects/trisco-analysis
cd ~/Projects/trisco-analysis
claude
> Set up a project for analyzing oligonucleotide designs. I'll need Biopython for sequence handling and pandas for data organization.
14.4 Phase 1: Gathering Information
First, let’s collect the relevant information from the paper:
> I need to understand the TRISCO method from Kanatani et al.
> Can you help me find:
> 1. The paper details and DOI
> 2. The general principle of how it works
> 3. Where the oligo sequences might be listed (main text, methods, supplements)
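Claude will answer from its training knowledge, or search for the paper if web tools are available, and guide you to the relevant sections. Check citation details such as the DOI against the paper itself.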
> Create a document summarizing the key points about TRISCO oligo design based on what we know
14.5 Phase 2: Extracting Oligo Sequences
The oligo sequences are typically in supplementary tables. Let’s organize them:
> Create a CSV file to store the oligo sequences from the paper.
> Columns should include: oligo_name, sequence, target_gene, target_region, and any other relevant annotations from the paper
Example structure (data/oligos.csv):
oligo_name,sequence,target_gene,target_region,length,notes
TRISCO_1,ACGTACGTACGT...,GeneA,intron_1,25,main targeting
TRISCO_2,TACGTACGTACG...,GeneA,intron_1,25,backup
...
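Before any analysis, it helps to check the transcribed table for copy-paste errors such as stray characters or sequences that don't match their stated length. A minimal sketch, assuming the column names shown above; `check_oligo_rows` is a hypothetical helper, it uses only the standard library so it runs before the environment is set up, and the example rows are made up rather than taken from the paper.

```python
import csv
import io

VALID_BASES = set("ACGT")


def check_oligo_rows(rows):
    """Flag rows whose sequence has non-ACGT characters or a length mismatch.

    rows: dicts with 'oligo_name', 'sequence' and 'length' keys, e.g. from
    csv.DictReader over data/oligos.csv. Returns a list of (name, problems).
    """
    flagged = []
    for row in rows:
        seq = row["sequence"].strip().upper()
        problems = []
        if not set(seq) <= VALID_BASES:
            problems.append("non-ACGT characters")
        if len(seq) != int(row["length"]):
            problems.append(f"length {len(seq)} != stated {row['length']}")
        if problems:
            flagged.append((row["oligo_name"], problems))
    return flagged


# Made-up rows for illustration; in practice: open("data/oligos.csv", newline="")
example = io.StringIO(
    "oligo_name,sequence,target_gene,target_region,length,notes\n"
    "demo_1,ACGTACGTAC,GeneA,intron_1,10,ok\n"
    "demo_2,ACGTNCGTAC,GeneA,intron_1,10,ambiguous base\n"
    "demo_3,ACGTACGT,GeneA,intron_1,10,truncated\n"
)
for name, problems in check_oligo_rows(csv.DictReader(example)):
    print(name, "; ".join(problems))
```

Placeholder rows that still contain "..." will also be flagged, which is a useful reminder to fill them in.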
14.6 Phase 3: Analyzing Sequence Features
Now let’s look for patterns in the designed oligos:
> Write a script that analyzes the oligo sequences for:
> 1. Length distribution
> 2. GC content
> 3. Presence of specific motifs
> 4. Any repeated patterns
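from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction
import pandas as pd

def analyze_oligo(sequence):
    """Analyze a single oligonucleotide."""
    sequence = sequence.upper()
    seq = Seq(sequence)
    n_gc = sequence.count('G') + sequence.count('C')
    n_at = sequence.count('A') + sequence.count('T')
    return {
        'length': len(seq),
        'gc_content': gc_fraction(seq) * 100,
        # True if any base repeats 4 or more times in a row
        'has_homopolymer': any(base * 4 in sequence for base in 'ACGT'),
        # Wallace rule (2 °C per A/T, 4 °C per G/C): a rough estimate meant
        # for short oligos (under ~14 nt); it overestimates Tm for 20-25-mers
        'tm_estimate': 4 * n_gc + 2 * n_at,
    }

> Run this analysis on all oligos and show me the summary statistics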
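The `analyze_oligo` function covers length, GC content and homopolymers, but not item 4, repeated patterns. One simple way to look for them is to count the k-mers shared across oligos. A minimal sketch using only the standard library; `shared_kmers` is a hypothetical helper and the sequences are made up.

```python
from collections import Counter


def shared_kmers(sequences, k=4, min_oligos=2):
    """Return (k-mer, n_oligos) pairs for k-mers found in at least `min_oligos` oligos.

    Each oligo counts at most once per k-mer, so a k-mer repeated inside a
    single oligo is not over-counted. Sorted by n_oligos, highest first.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Set of all overlapping k-mers in this oligo
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return [(kmer, n) for kmer, n in counts.most_common() if n >= min_oligos]


# Made-up sequences for illustration, not from the paper
oligos = ["ACGTTGCAAT", "TTGCAGGCTA", "CCTTGCATGA"]
for kmer, n in shared_kmers(oligos):
    print(kmer, n)
```

Keep chance in mind: a 20-mer contains 17 overlapping 4-mers out of 256 possible, so across a dozen oligos some sharing is expected at random. Comparing against shuffled versions of the same sequences is a quick sanity check before reading much into a shared k-mer.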
14.7 Phase 4: Understanding the Target Sites
TRISCO targets specific intronic regions. Let’s understand why:
> For each oligo, extract information about its target site:
> 1. Position relative to splice sites
> 2. Conservation of the target region
> 3. Secondary structure predictions
This might involve:
- Fetching gene sequences from NCBI
- Aligning oligos to their targets
- Calculating distances from splice sites
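Once the intron sequence is available (for example, fetched from NCBI with Biopython's Entrez module), locating each oligo and measuring splice-site distances is simple string arithmetic. A minimal sketch, assuming the intron is a DNA string written 5' to 3' on the transcribed strand, starting at the 5' splice site and ending at the 3' splice site, and that an antisense oligo appears in it as its reverse complement. `locate_oligo` and the toy intron are hypothetical, and only exact matches are found.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def locate_oligo(oligo, intron):
    """Find an oligo's target site in an intron and its splice-site distances.

    Tries the oligo itself, then its reverse complement (for an antisense
    oligo, the reverse complement is what appears in the intron).
    Positions are 0-based, end exclusive. Returns None if neither is found.
    """
    intron = intron.upper()
    for strand, site in (("sense", oligo.upper()), ("antisense", reverse_complement(oligo))):
        start = intron.find(site)
        if start != -1:
            end = start + len(site)
            return {
                "strand": strand,
                "start": start,
                "end": end,
                "dist_from_5ss": start,              # bases after the 5' splice site
                "dist_from_3ss": len(intron) - end,  # bases before the 3' splice site
            }
    return None


# Toy intron (GT...AG) with a made-up target site embedded at position 26
intron = "GTAAGT" + "A" * 20 + "CCTGAGGATC" + "T" * 30 + "TTTCAG"
oligo = reverse_complement("CCTGAGGATC")  # an antisense oligo against that site
print(locate_oligo(oligo, intron))
```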
> Create a visualization showing where each oligo maps relative to the intron structure
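A plotting library such as matplotlib is the natural choice for a publication figure. For a quick look in the terminal, a text track is enough. A minimal sketch; `render_intron_map` is a hypothetical helper, positions are 0-based with exclusive ends, and the example coordinates are made up.

```python
def render_intron_map(intron_length, sites, width=60):
    """Draw an intron as a text track with one row per oligo target site.

    sites: list of (name, start, end) with 0-based, end-exclusive positions.
    The intron is scaled to `width` characters; '=' marks each site.
    """
    scale = width / intron_length
    lines = ["5'ss |" + "-" * width + "| 3'ss"]
    for name, start, end in sites:
        left = int(start * scale)
        # Draw at least one character so very short sites stay visible
        right = max(left + 1, int(end * scale))
        track = " " * left + "=" * (right - left) + " " * (width - right)
        lines.append("     |" + track + f"| {name} ({start}-{end})")
    return "\n".join(lines)


print(render_intron_map(300, [("oligo_A", 80, 105), ("oligo_B", 120, 144)], width=30))
```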
14.8 Phase 5: Inferring Design Rules
Based on our analysis, let’s formulate hypotheses about the design rules:
> Based on the patterns we've found, what do you think are the design rules for TRISCO oligos?
> Consider:
> - Sequence constraints (length, GC, motifs to avoid)
> - Positional requirements (where in the intron)
> - Thermodynamic properties (Tm, secondary structure)
Claude might identify rules like:
- Oligos are 20-25 nucleotides
- GC content between 40% and 60%
- Avoid long homopolymers
- Target regions 50-200 nt from splice sites
- Must bind with sufficient affinity (estimated Tm > 55°C)
> Create a specification document that codifies these design rules
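Alongside the prose document, the rules can be encoded as data and checked against the published oligos, which gives the "evidence" for each rule: what fraction of known designs actually satisfy it. A minimal sketch using the example thresholds listed above (illustrative values, not rules stated in the paper); `DesignRules`, `rule_coverage` and the input dict keys are hypothetical, and the sequences are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignRules:
    """Example thresholds from the inferred rules (illustrative, not from the paper)."""
    min_len: int = 20
    max_len: int = 25
    min_gc: float = 40.0          # percent
    max_gc: float = 60.0          # percent
    max_homopolymer: int = 3      # longest allowed run of one base
    min_dist_from_ss: int = 50    # bases from the nearest splice site
    max_dist_from_ss: int = 200


def gc_percent(seq):
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq):
    """Length of the longest run of identical characters."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def rule_coverage(oligos, rules=DesignRules()):
    """Fraction of oligos satisfying each rule.

    oligos: list of dicts with 'sequence' and 'dist_from_ss' keys.
    """
    checks = {
        "length": lambda o: rules.min_len <= len(o["sequence"]) <= rules.max_len,
        "gc": lambda o: rules.min_gc <= gc_percent(o["sequence"]) <= rules.max_gc,
        "homopolymer": lambda o: longest_run(o["sequence"].upper()) <= rules.max_homopolymer,
        "position": lambda o: rules.min_dist_from_ss <= o["dist_from_ss"] <= rules.max_dist_from_ss,
    }
    return {name: sum(check(o) for o in oligos) / len(oligos) for name, check in checks.items()}


made_up = [
    {"sequence": "ACGTTGCAATCGGATCCTGAGCTA", "dist_from_ss": 90},   # passes all
    {"sequence": "AAAAGCGCTTAGCATGCATCGATC", "dist_from_ss": 120},  # run of 4 A
    {"sequence": "ACGTACGTAC", "dist_from_ss": 300},                # too short, too far
]
print(rule_coverage(made_up))
```

A rule that only half of the published oligos satisfy is probably mis-specified, or not a rule at all.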
14.9 Phase 6: Building a Design Tool
Now let’s create a tool that can design new oligos following these rules:
> Create a Python function that, given a gene and intron number:
> 1. Fetches the intron sequence
> 2. Identifies candidate target regions
> 3. Designs oligos following our inferred rules
> 4. Scores and ranks the candidates
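The design function below calls several helpers that the prompt leaves for Claude to write. Here is one possible sketch of all of them except `fetch_intron_sequence`, which depends on your data source. It assumes the example thresholds from Phase 5. The helper names match the calls in `design_trisco_oligos`, but the window logic, Tm formula and score are illustrative choices, not rules taken from the paper. Tm uses the basic GC formula Tm = 64.9 + 41 × (nGC - 16.4) / N, a rough estimate for oligos longer than about 13 nt. A nearest-neighbor model such as Biopython's `Bio.SeqUtils.MeltingTemp.Tm_NN` is more accurate.

```python
def gc_percent(oligo):
    oligo = oligo.upper()
    return 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)


def tm_basic(oligo):
    """Basic GC-content Tm estimate in °C (rough; ignores salt and sequence context)."""
    oligo = oligo.upper()
    n_gc = oligo.count("G") + oligo.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(oligo)


def passes_gc_filter(oligo, low=40.0, high=60.0):
    return low <= gc_percent(oligo) <= high


def passes_homopolymer_filter(oligo, max_run=3):
    # Reject runs of max_run + 1 (here 4) or more identical bases
    return not any(base * (max_run + 1) in oligo.upper() for base in "ACGT")


def passes_tm_filter(oligo, min_tm=55.0):
    return tm_basic(oligo) > min_tm


def get_target_windows(intron_seq, near=50, far=200):
    """(start, end) windows 50-200 nt from either splice site, end exclusive.

    Windows are clipped to the intron, empty ones are dropped, and the two
    windows may overlap in short introns.
    """
    n = len(intron_seq)
    windows = [(min(near, n), min(far, n)),             # after the 5' splice site
               (max(n - far, 0), max(n - near, 0))]     # before the 3' splice site
    return [(s, e) for s, e in windows if e > s]


def score_oligo(oligo, pos, ideal_gc=50.0):
    """Higher is better: penalize deviation from a balanced GC content.

    pos is accepted to match the call in design_trisco_oligos but unused
    here; a position term would also need the intron length.
    """
    return -abs(gc_percent(oligo) - ideal_gc)
```

With this formula a 20-mer needs 12 G/C bases (60% GC) to exceed 55 °C, while a 25-mer needs only 11 (44%), so the Tm and GC rules interact and short candidates are pushed to the top of the GC range.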
def design_trisco_oligos(gene_id, intron_number, num_candidates=5):
    """Design TRISCO oligos for a target intron.

    Relies on helper functions not shown here: fetch_intron_sequence,
    get_target_windows, passes_gc_filter, passes_homopolymer_filter,
    passes_tm_filter and score_oligo.
    """
    # Fetch intron sequence
    intron_seq = fetch_intron_sequence(gene_id, intron_number)
    # Define target windows (based on inferred rules); end positions exclusive
    windows = get_target_windows(intron_seq)
    candidates = []
    for window_start, window_end in windows:
        for length in range(20, 26):  # Oligo lengths 20-25
            # + 1 so the last oligo that fits in the window is included
            for pos in range(window_start, window_end - length + 1):
                # Target-site sequence; an antisense oligo that base-pairs
                # with this site would be its reverse complement
                oligo = intron_seq[pos:pos + length]
                # Apply design rules
                if (passes_gc_filter(oligo)
                        and passes_homopolymer_filter(oligo)
                        and passes_tm_filter(oligo)):
                    candidates.append({
                        'sequence': oligo,
                        'position': pos,
                        'score': score_oligo(oligo, pos),
                    })
    # Return top candidates
    return sorted(candidates, key=lambda x: x['score'], reverse=True)[:num_candidates]
14.10 Phase 7: Validation
Let’s validate our design rules by checking if they reproduce the original designs:
> Test our design tool on the same genes/introns from the paper.
> Do our top candidates match the published oligos?
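The validation function below relies on a `find_most_similar` helper that is never defined. One possible sketch uses the standard library's `difflib.SequenceMatcher`, whose `ratio()` returns a similarity between 0 and 1. The helper name comes from the code, but its behavior here is an assumption. `ratio()` is a generic string similarity, not an alignment score, so for closer analysis a proper pairwise aligner (for example Biopython's `PairwiseAligner`) would be a better fit.

```python
from difflib import SequenceMatcher


def find_most_similar(target, candidates):
    """Return the candidate most similar to `target`, with its similarity (0-1).

    candidates: list of dicts with a 'sequence' key, as returned by
    design_trisco_oligos. Returns sequence None and similarity 0.0 if empty.
    """
    best = {"sequence": None, "similarity": 0.0}
    for cand in candidates:
        ratio = SequenceMatcher(None, target.upper(), cand["sequence"].upper()).ratio()
        if ratio > best["similarity"]:
            best = {"sequence": cand["sequence"], "similarity": ratio}
    return best


cands = [{"sequence": "ACGTACGTAC"}, {"sequence": "ACGTACGTTT"}, {"sequence": "GGGGCCCCGG"}]
print(find_most_similar("ACGTACGTAC", cands))  # the identical candidate, similarity 1.0
```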
import pandas as pd

def validate_design_rules(original_oligos):
    """Check if our rules reproduce the original designs.

    original_oligos: list of dicts with 'gene', 'intron' and 'sequence' keys.
    """
    matches = []
    for oligo in original_oligos:
        candidates = design_trisco_oligos(oligo['gene'], oligo['intron'])
        # Exact check: is the published oligo among our top candidates?
        original_in_top = oligo['sequence'] in [c['sequence'] for c in candidates]
        # Fuzzy check: how close is our most similar candidate?
        best_match = find_most_similar(oligo['sequence'], candidates)
        matches.append({
            'original': oligo['sequence'],
            'reproduced': original_in_top,
            'best_match_similarity': best_match['similarity'],
        })
    return pd.DataFrame(matches)
14.11 Phase 8: Extending the Analysis
If our rules don’t fully reproduce the designs, we can dig deeper:
> The validation shows 60% match. What features might we be missing?
> Let's look more carefully at the non-matching cases.
Additional features to consider:
- RNA secondary structure of the target region
- Presence of regulatory motifs
- Off-target binding potential
- Modifications (LNA, 2’-OMe) affecting binding
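Off-target binding potential can be approximated by scanning other sequences for near-matches to an oligo's target site. A minimal brute-force sketch that counts ungapped, same-strand windows within a given number of mismatches; `count_near_matches` is hypothetical and the background sequence is made up. A real screen would use an aligner such as BLAST or Bowtie against the transcriptome.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_near_matches(site, background, max_mismatches=2):
    """Count windows of `background` within `max_mismatches` of `site`.

    Ungapped, single-strand scan costing O(len(background) * len(site)):
    fine for a few transcripts, far too slow for a whole transcriptome.
    Returns a dict mapping mismatch count -> number of windows.
    """
    site, background = site.upper(), background.upper()
    k = len(site)
    hits = {m: 0 for m in range(max_mismatches + 1)}
    for i in range(len(background) - k + 1):
        d = hamming(site, background[i:i + k])
        if d <= max_mismatches:
            hits[d] += 1
    return hits


# Made-up background: one exact copy of the site and one 1-mismatch copy
background = "TTTT" + "ACGTACGTAC" + "GGGG" + "ACGTACCTAC" + "CCCC"
print(count_near_matches("ACGTACGTAC", background, max_mismatches=1))
```

If the scanned sequence includes the intended target, that site shows up as a 0-mismatch hit, so only the extra hits indicate off-target risk.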
> Add secondary structure prediction using RNAfold and incorporate it into our scoring
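RNAfold (from the ViennaRNA package) prints the input sequence followed by a line with the minimum free energy (MFE) structure in dot-bracket notation and its energy in kcal/mol in parentheses. A minimal sketch that runs the command-line tool, parses that line, and scores how unpaired a target site is. It assumes RNAfold is on your PATH, and the accessibility score (fraction of unpaired bases in the site) is a simple illustrative choice, not part of the TRISCO method. ViennaRNA also ships Python bindings (`import RNA`, then `RNA.fold(seq)` returns the structure and MFE), which avoid the subprocess call.

```python
import re
import subprocess

# Structure line of RNAfold output: dot-bracket, then "(energy)"
STRUCTURE_LINE = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold_output(text):
    """Return (dot_bracket, mfe_kcal_per_mol) from RNAfold's stdout."""
    for line in text.splitlines():
        match = STRUCTURE_LINE.match(line.strip())
        if match:
            return match.group(1), float(match.group(2))
    raise ValueError("no structure line found in RNAfold output")


def run_rnafold(seq):
    """Fold one sequence with the RNAfold command-line tool (must be installed)."""
    result = subprocess.run(
        ["RNAfold", "--noPS"], input=seq + "\n",
        capture_output=True, text=True, check=True,
    )
    return parse_rnafold_output(result.stdout)


def site_accessibility(structure, start, end):
    """Fraction of unpaired ('.') positions in structure[start:end]."""
    site = structure[start:end]
    return site.count(".") / len(site)


# A hand-written example of the output format (not a real fold result)
example_output = "GGGGAAAACCCC\n((((....)))) ( -2.10)\n"
structure, mfe = parse_rnafold_output(example_output)
print(structure, mfe, site_accessibility(structure, 4, 8))
```

The accessibility of each candidate's site could then be added as a term in the oligo score, so that sites buried in stable stems rank lower.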
14.12 Phase 9: Documentation
> Create comprehensive documentation including:
> 1. Summary of the TRISCO method
> 2. Our inferred design rules with evidence
> 3. How to use our design tool
> 4. Validation results
> 5. Limitations and caveats
14.13 Complete Workflow Example
Here’s how a session might flow:
> Let's start by extracting all oligo sequences from the paper's supplementary data. I've downloaded it as supplementary_table_1.xlsx
> Good, I see 15 oligos. Let's analyze their basic properties first.
> Interesting - they're all 23-25nt and have GC content 45-55%. Let's look at their target positions.
> I notice they all seem to be 80-150 bases from the 5' splice site. Is that significant?
> Let's formalize these observations into design rules and test if they're predictive.
> Our design tool matches 12/15 published oligos. Let's look at the 3 mismatches closely.
> Ah, those three all have unusual secondary structure. Let's add that to our model.
> Now we match 14/15. Good enough to publish our analysis. Let's write it up.
14.14 What You’ve Learned
By completing this demo, you’ve:
- Extracted information from scientific papers systematically
- Identified patterns in experimental design
- Reverse-engineered rules from examples
- Built a predictive tool based on those rules
- Validated your model against ground truth
14.15 Applying This Approach
This reverse-engineering approach works for many experimental methods:
- Primer design for specific PCR methods
- Guide RNA design for CRISPR variants
- Probe design for FISH
- Aptamer selection rules
The key is: extract examples, find patterns, codify rules, validate.
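The same loop can be written generically. Learn an allowed range for each feature from known-good examples, then check whether each held-out example falls inside the ranges learned from the others. A minimal leave-one-out sketch; the feature functions and toy sequences are placeholders. Min/max ranges are the simplest possible "rule" and will overfit when examples are few.

```python
def learn_ranges(examples, features):
    """For each feature, the (min, max) value observed across the examples."""
    return {
        name: (min(fn(x) for x in examples), max(fn(x) for x in examples))
        for name, fn in features.items()
    }


def satisfies(example, ranges, features):
    """True if every feature of `example` lies inside its learned range."""
    return all(lo <= features[name](example) <= hi for name, (lo, hi) in ranges.items())


def leave_one_out(examples, features):
    """Fraction of examples accepted by rules learned from all the others."""
    accepted = 0
    for i, held_out in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        if satisfies(held_out, learn_ranges(rest, features), features):
            accepted += 1
    return accepted / len(examples)


features = {
    "length": len,
    "gc_percent": lambda s: 100 * (s.count("G") + s.count("C")) / len(s),
}
toy_examples = ["ACGTAC", "ACGGAC", "ACGTACGT", "AT"]
print(leave_one_out(toy_examples, features))  # 0.25
```

A low score means the learned rules are too tight or the feature set is incomplete. A score near 1 with very wide ranges means the rules are too loose to guide new designs.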
14.16 Next Steps
Try another demo or continue to Part 4: Paper Writing Track.