16  Automating Repetitive Tasks

Claude Code excels at automating the tedious parts of research: file manipulation, format conversion, batch processing, and more.

16.1 The Automation Mindset

Whenever you find yourself doing something repeatedly, ask:

“Can Claude automate this?”

Usually, yes.

16.2 File Organization

16.2.1 Batch Renaming

> I have 500 image files named IMG_0001.jpg through IMG_0500.jpg
> Rename them to experiment_001.jpg through experiment_500.jpg
import os

# Filter first so the counter only advances for matching files
images = sorted(f for f in os.listdir('images') if f.startswith('IMG_'))
for i, filename in enumerate(images, 1):
    new_name = f"experiment_{i:03d}.jpg"
    os.rename(f"images/{filename}", f"images/{new_name}")

16.2.2 Organizing by Date

> Sort all PDFs in my downloads folder into year/month subfolders
> based on their modification date
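
A script like the one Claude produces for this prompt might look as follows (a minimal sketch; the folder path and the choice of modification time over creation time are assumptions you would confirm in the prompt):

```python
import shutil
from datetime import datetime
from pathlib import Path

def organize_pdfs(folder):
    """Move each PDF into a year/month subfolder based on its modification time."""
    folder = Path(folder)
    for pdf in folder.glob('*.pdf'):
        mtime = datetime.fromtimestamp(pdf.stat().st_mtime)
        dest = folder / f"{mtime.year}" / f"{mtime.month:02d}"
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(dest / pdf.name))
```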

16.2.3 Cleaning Up

> Find and delete all .DS_Store files in this project
> Remove all empty directories
> Find duplicate files and list them
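
For the duplicate-finding prompt, Claude would typically hash file contents and group by digest. A minimal sketch:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under root by content hash; return groups with more than one file."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob('*'):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```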

16.3 Data Format Conversion

16.3.1 Spreadsheet Formats

> Convert all .xlsx files in data/ to CSV format
import pandas as pd
from pathlib import Path

for xlsx in Path('data').glob('*.xlsx'):
    df = pd.read_excel(xlsx)
    df.to_csv(xlsx.with_suffix('.csv'), index=False)
    print(f"Converted {xlsx.name}")

16.3.2 Between Data Formats

> Convert this JSON file to YAML
> Convert this CSV to a SQL INSERT statement
> Parse this XML and extract specific fields to CSV
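
As one illustration, the CSV-to-SQL prompt might yield something like this (a sketch only; the table name is a placeholder, and all values are quoted as strings, which you would adapt for numeric columns):

```python
import csv
import io

def csv_to_inserts(csv_text, table):
    """Turn CSV text (header row first) into SQL INSERT statements."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(header)
    stmts = []
    for row in reader:
        # Escape single quotes by doubling them, per SQL convention
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return stmts
```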

16.3.3 Image Formats

> Convert all PNG files to JPEG with 85% quality
> Resize all images in figures/ to max 1000px wide, maintaining aspect ratio

16.4 Batch Processing

16.4.1 Processing Multiple Files

> For each .fastq.gz file in raw_data/:
> 1. Run FastQC
> 2. Save reports to qc_reports/
> 3. Generate a summary CSV of quality metrics
import subprocess
from pathlib import Path
import pandas as pd

# Run FastQC on all files; check=True raises if a run fails
for fastq in Path('raw_data').glob('*.fastq.gz'):
    subprocess.run(['fastqc', str(fastq), '-o', 'qc_reports/'], check=True)

# Parse reports and create summary
# ... (parsing logic)

16.4.2 Parallel Processing

> The processing is slow. Run it in parallel using 4 cores.
from concurrent.futures import ProcessPoolExecutor

# files: the list of paths; process_file: the per-file function from the serial version
with ProcessPoolExecutor(max_workers=4) as executor:
    executor.map(process_file, files)

16.5 Web Scraping and APIs

16.5.1 Downloading Data

> Download all supplementary files from this paper's URL
> Fetch protein sequences for these 50 gene IDs from UniProt

16.5.2 API Interactions

> Every day at 9am, check PubMed for new papers matching my search and email me a summary

(This would involve creating a script and scheduling it with cron)
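
The search step of such a script could use NCBI's E-utilities. A sketch of building the query URL (the `reldate`/`datetype` parameters restrict results to recent days; fetching, summarizing, and emailing are left to the full script):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, days=1):
    """Build an E-utilities esearch URL for papers from the last `days` days."""
    params = {"db": "pubmed", "term": term, "reldate": days,
              "datetype": "edat", "retmode": "json"}
    return EUTILS + "?" + urlencode(params)
```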

16.6 Report Generation

16.6.1 Automated Reports

> Create a script that:
> 1. Reads all CSV files in results/
> 2. Generates summary statistics
> 3. Creates plots
> 4. Compiles into a PDF report
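
The first three steps might be sketched like this with the standard library alone (a Markdown report is written here; the PDF compilation step would typically use a plotting library plus a tool such as Pandoc, both left out of this sketch):

```python
import csv
import statistics
from pathlib import Path

def summarize_results(results_dir, report_path):
    """Summarize numeric columns of each CSV in results_dir into a Markdown report."""
    lines = ["# Results Summary", ""]
    for csv_file in sorted(Path(results_dir).glob('*.csv')):
        with open(csv_file, newline='') as f:
            rows = list(csv.DictReader(f))
        if not rows:
            continue
        lines.append(f"## {csv_file.name}")
        for col in rows[0]:
            try:
                values = [float(r[col]) for r in rows]
            except ValueError:
                continue  # skip non-numeric columns
            lines.append(f"- {col}: mean={statistics.mean(values):.3g}, n={len(values)}")
        lines.append("")
    Path(report_path).write_text("\n".join(lines))
```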

16.6.2 Email Automation

> Send an email notification when the analysis pipeline completes

16.7 Text Processing

16.7.1 Bulk Find and Replace

> In all .py files, replace 'old_function_name' with 'new_function_name'
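
A minimal sketch of what Claude generates for this, writing back only files that actually change:

```python
from pathlib import Path

def replace_in_files(root, old, new, pattern='*.py'):
    """Replace old with new in all files matching pattern; return changed files."""
    changed = []
    for path in Path(root).rglob(pattern):
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))
            changed.append(path)
    return changed
```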

16.7.2 Extracting Information

> Extract all email addresses from these documents
> Find all DOIs mentioned in this folder of papers
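
The DOI-extraction prompt usually comes down to a regular expression. A sketch (DOI suffixes have no single rigid grammar, so the pattern below trims common trailing punctuation rather than trying to be exhaustive):

```python
import re

DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')

def extract_dois(text):
    """Return unique DOIs found in text, preserving first-seen order."""
    dois = [m.rstrip('.;') for m in DOI_PATTERN.findall(text)]
    return list(dict.fromkeys(dois))
```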

16.7.3 Log Analysis

> Parse the error log and summarize the most common error types
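
A sketch of the kind of summary script this prompt produces, assuming log lines of the form `... ERROR: message` (your log's format will dictate the actual regex):

```python
import re
from collections import Counter

def summarize_errors(log_text, top=5):
    """Count ERROR lines grouped by message; return the most common."""
    counts = Counter()
    for line in log_text.splitlines():
        m = re.search(r'ERROR[:\s]+(.*)', line)
        if m:
            counts[m.group(1).strip()] += 1
    return counts.most_common(top)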

16.8 Scheduling Tasks

16.8.1 Cron Jobs (Mac/Linux)

> Create a cron job that runs my backup script every night at 2am
# Edit crontab
crontab -e

# Add line:
0 2 * * * /path/to/backup_script.sh

16.8.2 Windows Task Scheduler

> Set up a scheduled task to run this Python script weekly

16.9 Practical Examples

16.9.1 Example 1: Paper Download and Organization

> I have a list of DOIs in papers_to_read.txt
> For each DOI:
> 1. Download the PDF if available via open access
> 2. Save to papers/ with filename: FirstAuthor_Year_ShortTitle.pdf
> 3. Log any that couldn't be downloaded

16.9.2 Example 2: Data Backup

> Create a backup script that:
> 1. Compresses the data/ folder
> 2. Adds today's date to the filename
> 3. Copies to my external drive
> 4. Deletes backups older than 30 days
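
The first, second, and fourth steps can be sketched with the standard library (the destination path stands in for your external drive's mount point):

```python
import shutil
import time
from datetime import date
from pathlib import Path

def backup(data_dir, dest_dir, keep_days=30):
    """Zip data_dir into dest_dir with today's date; prune backups older than keep_days."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    name = f"backup_{date.today():%Y%m%d}"
    archive = shutil.make_archive(str(dest / name), 'zip', data_dir)
    cutoff = time.time() - keep_days * 86400
    for old in dest.glob('backup_*.zip'):
        if old.stat().st_mtime < cutoff:
            old.unlink()
    return archive
```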

16.9.3 Example 3: Lab Equipment Log

> Our microscope outputs a log file each session.
> Create a script that:
> 1. Parses all log files
> 2. Extracts usage time per user
> 3. Generates a monthly usage report
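
The parsing core of such a script might look like this. The log format here is entirely hypothetical (lines like `user=alice duration_min=42`); the real microscope log will dictate the regex:

```python
import re
from collections import defaultdict

def usage_by_user(log_lines):
    """Sum usage minutes per user from lines like 'user=alice duration_min=42'.
    (Assumed format for illustration; adjust the regex to your instrument's logs.)"""
    totals = defaultdict(float)
    for line in log_lines:
        m = re.search(r'user=(\S+)\s+duration_min=([\d.]+)', line)
        if m:
            totals[m.group(1)] += float(m.group(2))
    return dict(totals)
```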

16.10 Building Reusable Tools

When you solve a problem, make it reusable:

> Package this file conversion script as a command-line tool I can use anywhere

Claude creates:

#!/usr/bin/env python
"""Convert XLSX files to CSV.

Usage: xlsx2csv input.xlsx [output.csv]
"""

import sys
import pandas as pd

def main():
    if len(sys.argv) < 2:
        sys.exit(__doc__)  # print usage and exit if no input given
    input_file = sys.argv[1]
    output_file = sys.argv[2] if len(sys.argv) > 2 else input_file.replace('.xlsx', '.csv')

    df = pd.read_excel(input_file)
    df.to_csv(output_file, index=False)
    print(f"Converted {input_file} -> {output_file}")

if __name__ == "__main__":
    main()

Make it executable:

chmod +x xlsx2csv.py
mv xlsx2csv.py ~/bin/xlsx2csv  # Add ~/bin to PATH

16.11 Tips for Automation

16.11.1 Start Simple

> First, let's just process one file to make sure the logic is right

16.11.2 Add Progress Indicators

> Add a progress bar for the batch processing

16.11.3 Handle Errors Gracefully

> Add try/except so one bad file doesn't stop the whole batch
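
The pattern Claude typically applies here: collect failures instead of aborting, then report them at the end (`process` is whatever per-file function your batch uses):

```python
def process_batch(paths, process):
    """Apply process to each path, collecting failures instead of aborting."""
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = process(path)
        except Exception as exc:  # one bad file shouldn't stop the batch
            failures[path] = str(exc)
    return results, failures
```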

16.11.4 Log Everything

> Add logging so I can see what was processed and catch any issues

16.12 What You’ve Learned

Automation with Claude Code means:

- Less repetitive work
- Fewer human errors
- More reproducible processes
- More time for actual science

16.13 Next Steps

Continue to Building Websites.