14 Demo: Reverse-Engineering TRISCO Oligo Design

In this demo, we’ll work out and reproduce the oligo design logic from the TRISCO method (Kanatani et al.). It shows how to use Claude Code to decode experimental protocols and reverse-engineer design rules.

14.1 Background

TRISCO (Triple-stranded RNA-guided Intron-based Splicing Control) is a method for controlling gene expression using designed oligonucleotides. Understanding how the oligos are designed helps us:

Apply the method to new targets
Troubleshoot when things don’t work
Potentially improve the design

14.2 The Challenge

Papers often describe what was done but not why specific sequences were chosen. We’ll reverse-engineer the design rules.

14.3 Project Setup

mkdir -p ~/Projects/trisco-analysis
cd ~/Projects/trisco-analysis
claude

> Set up a project for analyzing oligonucleotide designs. I'll need Biopython for sequence handling and pandas for data organization.

14.4 Phase 1: Gathering Information

First, let’s collect the relevant information from the paper:

> I need to understand the TRISCO method from Kanatani et al.
> Can you help me find:
> 1. The paper details and DOI
> 2. The general principle of how it works
> 3. Where the oligo sequences might be listed (main text, methods, supplements)

Claude will help search or explain based on its knowledge, and guide you to the relevant sections.

> Create a document summarizing the key points about TRISCO oligo design based on what we know

14.5 Phase 2: Extracting Oligo Sequences

The oligo sequences are typically in supplementary tables. Let’s organize them:

> Create a CSV file to store the oligo sequences from the paper.
> Columns should include: oligo_name, sequence, target_gene, target_region, and any other relevant annotations from the paper

Example structure (data/oligos.csv):

oligo_name,sequence,target_gene,target_region,length,notes
TRISCO_1,ACGTACGTACGT...,GeneA,intron_1,25,main targeting
TRISCO_2,TACGTACGTACG...,GeneA,intron_1,25,backup
...
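Before analyzing anything, it helps to check that the CSV was transcribed cleanly. The following is a minimal sketch using only the standard library; the function name `load_oligos`, the set of required columns, and the allowed alphabet are assumptions for illustration, and the two rows are made-up placeholders rather than sequences from the paper.

```python
import csv
import io

# Assumed column set, taken from the example structure above.
REQUIRED_COLUMNS = {"oligo_name", "sequence", "target_gene", "target_region"}
# Assumption: plain DNA/RNA letters only, with no modification codes.
VALID_BASES = set("ACGTU")


def load_oligos(handle):
    """Read oligo rows from an open CSV handle and flag obvious problems."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows, problems = [], []
    for row in reader:
        seq = row["sequence"].strip().upper()
        row["sequence"] = seq
        bad = set(seq) - VALID_BASES
        if bad:
            problems.append((row["oligo_name"], f"unexpected characters {sorted(bad)}"))
        # If a length column is present, it should agree with the sequence.
        declared = (row.get("length") or "").strip()
        if declared and int(declared) != len(seq):
            problems.append(
                (row["oligo_name"], f"length column says {declared}, sequence has {len(seq)}")
            )
        rows.append(row)
    return rows, problems


# Made-up rows for illustration only.
example = io.StringIO(
    "oligo_name,sequence,target_gene,target_region,length,notes\n"
    "demo_1,ACGTACGTACGTACGTACGT,GeneA,intron_1,20,placeholder\n"
    "demo_2,ACGTNNACGT,GeneA,intron_1,12,placeholder\n"
)
rows, problems = load_oligos(example)
for name, message in problems:
    print(name, "->", message)
```

For a real file, pass `open("data/oligos.csv", newline="")` instead of the in-memory example. Catching transcription errors here is cheaper than discovering them after the pattern analysis.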

14.6 Phase 3: Analyzing Sequence Features

Now let’s look for patterns in the designed oligos:

> Write a script that analyzes the oligo sequences for:
> 1. Length distribution
> 2. GC content
> 3. Presence of specific motifs
> 4. Any repeated patterns

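from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction  # requires Biopython 1.80 or later
import pandas as pd

def analyze_oligo(sequence):
    """Analyze a single oligonucleotide (DNA alphabet assumed)."""
    sequence = sequence.strip().upper()  # normalize so the counts below ignore case
    seq = Seq(sequence)
    gc = sequence.count('G') + sequence.count('C')
    at = sequence.count('A') + sequence.count('T')
    return {
        'length': len(seq),
        'gc_content': gc_fraction(seq) * 100,
        # True if any base occurs four or more times in a row
        'has_homopolymer': any(base * 4 in sequence for base in 'ACGT'),
        # Wallace rule (4*GC + 2*AT): a rough estimate, most reliable for short oligos
        'tm_estimate': 4 * gc + 2 * at,
    }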

> Run this analysis on all oligos and show me the summary statistics
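The per-oligo function covers length, GC content, and homopolymers, but not motifs or repeated patterns across the set. One simple way to look for those is to count k-mers that occur in more than one oligo. The sketch below uses only the standard library; `shared_kmers`, `summarize`, and the demo sequences are illustrative assumptions, not data or methods from the paper.

```python
from collections import Counter
from statistics import mean


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def shared_kmers(sequences, k=4, min_oligos=2):
    """Return k-mers found in at least `min_oligos` different sequences."""
    seen_in = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Use a set so each oligo counts a k-mer once, however often it repeats.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        seen_in.update(kmers)
    return {kmer: n for kmer, n in seen_in.items() if n >= min_oligos}


def summarize(sequences):
    """Basic summary statistics over a list of oligo sequences."""
    lengths = [len(s) for s in sequences]
    gcs = [gc_percent(s) for s in sequences]
    return {
        "n": len(sequences),
        "length_min": min(lengths),
        "length_max": max(lengths),
        "gc_mean": mean(gcs),
    }


# Placeholder sequences for illustration only.
demo = ["ACGTACGTAA", "TTACGTGGCC", "GGGGCCCCAT"]
print(summarize(demo))
print(shared_kmers(demo, k=4))
```

A k-mer shared by most oligos may be a deliberate design element (for example a common tail or adapter), or it may just reflect a small, biased sample, so treat hits as hypotheses to check against the paper.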

14.7 Phase 4: Understanding the Target Sites

TRISCO targets specific intronic regions. Let’s understand why:

> For each oligo, extract information about its target site:
> 1. Position relative to splice sites
> 2. Conservation of the target region
> 3. Secondary structure predictions

This might involve:

Fetching gene sequences from NCBI
Aligning oligos to their targets
Calculating distances from splice sites
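The alignment and distance steps can be sketched with exact string matching. This assumes the intron is given 5'→3' on the sense strand, starting at its first base and ending at its last, and that each oligo matches the intron or its reverse complement exactly. The names `locate_oligo` and `reverse_complement` and the toy intron are hypothetical; oligos with mismatches would need a real aligner.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq):
    """Uppercase and treat RNA 'U' as 'T' so DNA and RNA inputs compare equal."""
    return seq.strip().upper().replace("U", "T")


def reverse_complement(seq):
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def locate_oligo(oligo, intron):
    """Find an oligo's target site in an intron by exact matching.

    Returns None if neither the oligo nor its reverse complement occurs.
    Only the first occurrence is reported.
    """
    intron = normalize(intron)
    oligo = normalize(oligo)
    for strand, query in (("sense", oligo), ("antisense", reverse_complement(oligo))):
        start = intron.find(query)
        if start != -1:
            end = start + len(query)  # 0-based, half-open interval
            return {
                "strand": strand,
                "start": start,
                "end": end,
                "dist_from_5ss": start,              # bases between 5' splice site and the site
                "dist_from_3ss": len(intron) - end,  # bases between the site and 3' splice site
            }
    return None


# Toy intron for illustration: 36 bases with a 10-base target site at position 16.
toy_intron = "GTAAGT" + "A" * 10 + "CCATGGTTCA" + "T" * 8 + "AG"
hit = locate_oligo("TGAACCATGG", toy_intron)  # antisense to the target site
print(hit)
```

An oligo meant to hybridize to the transcript will usually be found on the "antisense" branch, which is why both orientations are tried.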

> Create a visualization showing where each oligo maps relative to the intron structure

14.8 Phase 5: Inferring Design Rules

Based on our analysis, let’s formulate hypotheses about the design rules:

> Based on the patterns we've found, what do you think are the design rules for TRISCO oligos?
> Consider:
> - Sequence constraints (length, GC, motifs to avoid)
> - Positional requirements (where in the intron)
> - Thermodynamic properties (Tm, secondary structure)

Claude might identify rules like:

Oligos are 20-25 nucleotides
GC content between 40-60%
Avoid long homopolymers
Target regions 50-200 nt from splice sites
Must bind with sufficient affinity (estimated Tm > 55°C)

> Create a specification document that codifies these design rules
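Alongside a written specification, the rules can be captured in a small data structure so that the thresholds live in one place and are easy to revise. This is a sketch under the assumption that the example rules listed above hold; every default is a placeholder, and `DesignRules` and `violations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignRules:
    """Hypothesized constraints; every default is a placeholder to be revised."""
    min_length: int = 20
    max_length: int = 25
    min_gc: float = 40.0           # percent
    max_gc: float = 60.0           # percent
    max_homopolymer: int = 3       # longest allowed run of a single base
    min_splice_distance: int = 50  # nt
    max_splice_distance: int = 200 # nt
    min_tm: float = 55.0           # deg C, by the rough 4*(G+C) + 2*(A+T) estimate


def longest_run(seq):
    """Length of the longest stretch of one repeated character."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def violations(seq, splice_distance, rules=DesignRules()):
    """Return the names of the rules that a candidate oligo breaks."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    problems = []
    if not rules.min_length <= len(seq) <= rules.max_length:
        problems.append("length")
    if seq and not rules.min_gc <= 100.0 * gc / len(seq) <= rules.max_gc:
        problems.append("gc")
    if longest_run(seq) > rules.max_homopolymer:
        problems.append("homopolymer")
    if not rules.min_splice_distance <= splice_distance <= rules.max_splice_distance:
        problems.append("splice_distance")
    if 4 * gc + 2 * at <= rules.min_tm:
        problems.append("tm")
    return problems


print(violations("ACGTACGTACGTACGTACGT", splice_distance=100))
print(violations("AAAAAAAAAA", splice_distance=100))
```

Returning the list of broken rules, rather than a single pass/fail flag, makes it easier to see later which rule excludes a published oligo.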

14.9 Phase 6: Building a Design Tool

Now let’s create a tool that can design new oligos following these rules:

> Create a Python function that, given a gene and intron number:
> 1. Fetches the intron sequence
> 2. Identifies candidate target regions
> 3. Designs oligos following our inferred rules
> 4. Scores and ranks the candidates

def design_trisco_oligos(gene_id, intron_number, num_candidates=5):
    """Design TRISCO oligos for a target intron.

    Outline only: fetch_intron_sequence, get_target_windows, the passes_*
    filters, and score_oligo are helpers still to be written.
    """

    # Fetch intron sequence
    intron_seq = fetch_intron_sequence(gene_id, intron_number)

    # Define target windows (based on inferred rules)
    windows = get_target_windows(intron_seq)

    candidates = []
    for window_start, window_end in windows:
        for length in range(20, 26):  # Oligo lengths 20-25
            # +1 so the last oligo that fits in the window is included
            # (windows are assumed half-open: start inclusive, end exclusive)
            for pos in range(window_start, window_end - length + 1):
                oligo = intron_seq[pos:pos+length]

                # Apply design rules
                if (passes_gc_filter(oligo)
                        and passes_homopolymer_filter(oligo)
                        and passes_tm_filter(oligo)):
                    score = score_oligo(oligo, pos)
                    candidates.append({
                        'sequence': oligo,
                        'position': pos,
                        'score': score
                    })

    # Return top candidates
    return sorted(candidates, key=lambda x: x['score'], reverse=True)[:num_candidates]
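The design function leaves `get_target_windows` and `score_oligo` undefined. One possible shape for them is sketched below, assuming windows are half-open `(start, end)` index pairs, that the distance limits apply from both splice sites, and that a GC content near 50% is preferred. These choices and the default values are illustrative assumptions and should be replaced with whatever the analysis actually supports.

```python
def get_target_windows(intron_seq, min_dist=50, max_dist=200):
    """Return half-open (start, end) windows min_dist..max_dist from either splice site."""
    n = len(intron_seq)
    # Window measured from the 5' splice site (intron position 0).
    five_prime = (min(min_dist, n), min(max_dist, n))
    # Mirror-image window measured from the 3' splice site (intron position n).
    three_prime = (max(n - max_dist, 0), max(n - min_dist, 0))

    # Drop empty windows, then merge the two if a short intron makes them touch.
    windows = sorted(w for w in (five_prime, three_prime) if w[1] > w[0])
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def score_oligo(oligo, pos, target_gc=50.0):
    """Toy score: higher (closer to zero) when GC content is nearer target_gc.

    `pos` is accepted to match the call in the design function but is unused here.
    """
    oligo = str(oligo).upper()
    gc = 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)
    return -abs(gc - target_gc)


print(get_target_windows("A" * 1000))
print(get_target_windows("A" * 300))
```

Merging overlapping windows avoids generating the same candidate twice for short introns.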

14.10 Phase 7: Validation

Let’s validate our design rules by checking if they reproduce the original designs:

> Test our design tool on the same genes/introns from the paper.
> Do our top candidates match the published oligos?

import pandas as pd

def validate_design_rules(original_oligos, num_candidates=5):
    """Check if our rules reproduce original designs.

    original_oligos: iterable of dicts with 'gene', 'intron', and 'sequence' keys.
    """
    matches = []
    for oligo in original_oligos:
        candidates = design_trisco_oligos(oligo['gene'], oligo['intron'], num_candidates)

        # Check if original is in our top candidates
        original_in_top = oligo['sequence'] in [c['sequence'] for c in candidates]

        # Or check if we're close (find_most_similar is a helper still to be written;
        # it is skipped when the design tool returns no candidates)
        best_match = find_most_similar(oligo['sequence'], candidates) if candidates else None

        matches.append({
            'original': oligo['sequence'],
            'reproduced': original_in_top,
            'best_match_similarity': best_match['similarity'] if best_match else None
        })

    return pd.DataFrame(matches)
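The validation code relies on a `find_most_similar` helper that is not defined above. A minimal sketch using the standard library's `difflib.SequenceMatcher` is shown below. The similarity ratio is a generic string measure chosen here for simplicity, not a biologically motivated score, and the return shape (a dict with `sequence` and `similarity`) is an assumption made to match how the result is used.

```python
from difflib import SequenceMatcher


def find_most_similar(target, candidates):
    """Return the candidate closest to `target`, with a similarity ratio in 0..1.

    `candidates` is a list of dicts that each have a 'sequence' key.
    """
    best = {"sequence": None, "similarity": 0.0}
    for cand in candidates:
        # autojunk=False disables difflib's heuristic for long, repetitive inputs.
        ratio = SequenceMatcher(None, target, cand["sequence"], autojunk=False).ratio()
        if ratio > best["similarity"]:
            best = {"sequence": cand["sequence"], "similarity": ratio}
    return best


# Placeholder candidates for illustration only.
demo_candidates = [{"sequence": "ACGTACGT"}, {"sequence": "TTTTTTTT"}]
print(find_most_similar("ACGTACGT", demo_candidates))
```

A candidate that is the published oligo shifted by a base or two scores high on this measure, which helps separate "nearly reproduced" designs from genuine misses.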

14.11 Phase 8: Extending the Analysis

If our rules don’t fully reproduce the designs, we can dig deeper:

> The validation shows 60% match. What features might we be missing?
> Let's look more carefully at the non-matching cases.

Additional features to consider:

RNA secondary structure of the target region
Presence of regulatory motifs
Off-target binding potential
Modifications (LNA, 2’-OMe) affecting binding

> Add secondary structure prediction using RNAfold and incorporate it into our scoring
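One way this could look is a thin wrapper around the `RNAfold` command-line tool from the ViennaRNA package, with a parser for its text output. The sketch assumes `RNAfold` is installed and on the PATH, and that its output has the sequence on one line followed by a dot-bracket structure with the energy in parentheses. The sample text in the example is made up to show the format and is not real RNAfold output. `unpaired_fraction` is a hypothetical accessibility proxy, not a feature of the published method.

```python
import re
import subprocess

# Matches a line such as "(((....))) ( -1.20)".
_STRUCTURE_LINE = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(output):
    """Extract (dot-bracket structure, minimum free energy) from RNAfold text output."""
    for line in output.splitlines():
        match = _STRUCTURE_LINE.match(line.strip())
        if match:
            return match.group(1), float(match.group(2))
    raise ValueError("no structure line found in RNAfold output")


def fold(sequence):
    """Run RNAfold on one sequence. Requires the ViennaRNA command-line tools."""
    result = subprocess.run(
        ["RNAfold", "--noPS"],  # --noPS suppresses the PostScript drawing
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rnafold(result.stdout)


def unpaired_fraction(structure):
    """Share of positions predicted to be unpaired ('.' in dot-bracket notation)."""
    return structure.count(".") / len(structure)


# Illustrative text in RNAfold's output format (values are made up).
sample = "GGGAAAUCCC\n(((....))) ( -1.20)\n"
structure, energy = parse_rnafold(sample)
print(structure, energy, unpaired_fraction(structure))
```

To fold a target site, call `fold()` on the target region plus some flanking sequence. The unpaired fraction of the site could then be added to the candidate score, with its weight treated as another parameter to validate.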

14.12 Phase 9: Documentation

> Create comprehensive documentation including:
> 1. Summary of the TRISCO method
> 2. Our inferred design rules with evidence
> 3. How to use our design tool
> 4. Validation results
> 5. Limitations and caveats

14.13 Complete Workflow Example

Here’s how a session might flow:

> Let's start by extracting all oligo sequences from the paper's supplementary data. I've downloaded it as supplementary_table_1.xlsx

> Good, I see 15 oligos. Let's analyze their basic properties first.

> Interesting - they're all 23-25nt and have GC content 45-55%. Let's look at their target positions.

> I notice they all seem to be 80-150 bases from the 5' splice site. Is that significant?

> Let's formalize these observations into design rules and test if they're predictive.

> Our design tool matches 12/15 published oligos. Let's look at the 3 mismatches closely.

> Ah, those three all have unusual secondary structure. Let's add that to our model.

> Now we match 14/15. Good enough to publish our analysis. Let's write it up.

14.14 What You’ve Learned

By completing this demo, you’ve:

Extracted information from scientific papers systematically
Identified patterns in experimental design
Reverse-engineered rules from examples
Built a predictive tool based on those rules
Validated your model against ground truth

14.15 Applying This Approach

This reverse-engineering approach works for many experimental methods:

Primer design for specific PCR methods
Guide RNA design for CRISPR variants
Probe design for FISH
Aptamer selection rules

The key is: extract examples, find patterns, codify rules, validate.

14.16 Next Steps

Try another demo or continue to Part 4: Paper Writing Track.