darian.lagman
Note · 2026-04-30T00:00:00.000Z

BLAST Bioinformatics

A short note on implementing BLAST's seed-and-extend heuristic from scratch and where the speed actually comes from.


Why BLAST is interesting

BLAST is a practical lesson in algorithm design: exact global alignment is expensive, but biological sequences often contain short high-signal matches. The system is fast because it searches for promising seeds first, then spends alignment work only where the seed suggests a useful hit.

Implementation notes

Index fixed-width words

Split the query into k-mers, score neighboring words, and keep only candidates above a threshold.

Scan the database

Look up exact word hits quickly rather than trying to align every sequence pair.

Extend both directions

Grow each seed while the running score improves; stop when the score drops below a cutoff.

Rank local alignments

Emit high-scoring segment pairs, then merge or filter overlaps before reporting.

The useful engineering habit here is separating the cheap candidate stage from the expensive verification stage. That pattern shows up everywhere: search systems, compiler passes, agent tool routing, and inventory alerts.