Table of Contents Diamond transition or how technicalities can break concepts But let's take a closer look Conclusion References

Diamond transition or how technicalities can break concepts

We assume the reader has some basic knowledge about pairwise alignment and in particular the WFA algorithm.
In this post we dive into a potential 2x speedup of WFA — one that turns out not to work.
Let's take a look at one of the most important and efficient algorithms for pairwise alignment — WFA (Marco-Sola et al.
small lecture introduces \(h_f(u) = \frac 12 (\pi_f(u) - \pi_r)\). I have not found a paper yet.
- An Improved Bidirectional Heuristic Search Algorithm (Champeaux 1977): introduces a bidirectional variant.
- Bidirectional Heuristic Search Again (Champeaux 1983): fixes a bug in the above paper.
- Efficient modified bidirectional A* algorithm for optimal route-finding: didn't read closely yet.
- A new bidirectional algorithm for shortest paths (Pijls 2008): actually a new method.

The BiWFA meeting condition

Table of Contents References

Cross references: BiWFA GitHub issue
It seems that getting the meeting/overlap condition of BiWFA (Marco-Sola et al. (2022), Algorithm 1 and Lemma 2.1) correct is tricky.
Let \(p := \max(x, o+e)\) be the maximal cost of any edge in the edit graph. As in the BiWFA paper, let \(s_f\) and \(s_r\) be the distances of the forward and reverse fronts computed so far.
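As a small illustration of the definition of \(p\), here is a minimal Python sketch. It assumes that \(x\), \(o\), and \(e\) are the mismatch, gap-open, and gap-extend costs, following the usual WFA notation; the function name `max_edge_cost` and the example costs are hypothetical.

```python
def max_edge_cost(x: int, o: int, e: int) -> int:
    """Maximal cost p of a single edge in the affine-cost edit graph."""
    # Assumption: a mismatch edge costs x, and opening a gap costs o + e
    # (gap-open plus the first extension). A plain extension costs e,
    # which never exceeds o + e when o >= 0.
    return max(x, o + e)


# Arbitrary example costs, chosen only for illustration.
print(max_edge_cost(4, 6, 2))
```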
We prove the following lemma:

A* variants
https://research.curiouscoding.nl/notes/astar-variants/
Sun, 12 Jun 2022 12:04:00 +0200

These are some quick notes listing papers related to A* itself and variants. In particular, here I’m interested in papers that update \(h\) during the A* search, as a background for pruning.
Specifically, our version of pruning increases \(h\) during a single A* search, and in fact the heuristic becomes inadmissible after pruning.
These are the slides Pesho Ivanov and I presented at IGGSY 2022 on Astarix and A*PA.
Pdf: here

Benchmark attention points
https://research.curiouscoding.nl/notes/benchmarks/
Thu, 28 Apr 2022 23:33:00 +0200

Pin CPU frequency

CPUs, especially in laptops, have turbo boost, (thermal) throttling, and powersave features. Make sure to pin the CPU core frequency low enough that it can be sustained for long periods without throttling. In my case, the `performance` governor can fix the CPU frequency. The base frequency of my CPU is 2.6GHz, but I set it slightly lower since I prefer consistency.
sudo cpupower frequency-set -g performance
sudo cpupower frequency-set -u 1.

Motivation
https://research.curiouscoding.nl/notes/motivation/
Thu, 28 Apr 2022 23:22:00 +0200

It’s not the need for faster software that motivates; it’s the mathematical discovery that needs sharing.

[WIP] Linear time pairwise alignment of random strings
https://research.curiouscoding.nl/notes/linear-time-pa/
Sun, 24 Apr 2022 00:00:00 +0200

Table of Contents Pairwise alignment in subquadratic time Random model Comparison Algorithm Counting-seeds heuristic Match pruning TODO Analysis References

This post is a work in progress [WIP]/sketch proof to show that pairwise alignment of random strings with random mutations can be done in linear time.

Pairwise alignment in subquadratic time

Backurs and Indyk (2018) show that edit distance cannot be computed in strongly subquadratic time (i.e. \(O(n^{2-\delta})\) for any \(\delta >0\)) assuming the Strong Exponential Time Hypothesis.
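For reference, the quadratic baseline that this lower bound is measured against can be sketched as the textbook dynamic program. This is a generic unit-cost edit distance and not the algorithm developed in this post; the function name `edit_distance` is an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the classic O(|a| * |b|) DP, one row at a time."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # aligning a[:i] against the empty prefix of b costs i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitution (cost 1)
            ))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

Only two rows are kept in memory, but the running time stays quadratic, which is the regime the lower bound says cannot be beaten by a polynomial factor in the worst case.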
In this post I will explore some variations of the recursion used by WFA/BiWFA for the affine version of the diagonal transition algorithm. In particular, we will go over a gap-close variant, and look into some more symmetric formulations.
(Groot Koerkamp and van der Wegen 2019) (Groot Koerkamp and Živný 2021)
Groot Koerkamp, Ragnar, and Marieke van der Wegen. 2019. “Stable gonality is computable.” Discrete Mathematics & Theoretical Computer Science vol. 21 no. 1, ICGT 2018 (June). https://doi.org/10.23638/DMTCS-21-1-10.
Groot Koerkamp, Ragnar, and Stanislav Živný. 2021. “On Rainbow-Free Colourings of Uniform Hypergraphs.” Theoretical Computer Science 885 (September): 69–76. https://doi.org/10.1016/j.tcs.2021.06.022.

Research topics
https://research.curiouscoding.nl/pages/todo/
Fri, 15 Apr 2022 00:00:00 +0200

Table of Contents In progress On hold Pending ideas/blogposts Smaller tasks Future plans Open questions

Here I list some ideas for research topics / papers / tasks that need doing:
In progress

A* pairwise aligner [GitHub]

Exact global pairwise alignment of random strings in expected linear time. Contains proof of correctness, implementation, evals and comparison with WFA and edlib on random data.
Proof of expected linear time alignment

I have a proof of concept to show that a simplified version of the algorithm currently implemented by A* pairwise aligner runs in expected linear time on random input with sufficiently low edit distance (\(|\Sigma|^{1/e} \ll n\)), but need to spend some time on details and writing it down.

Glossary
https://research.curiouscoding.nl/pages/glossary/
Thu, 14 Apr 2022 00:00:00 +0200

This is a growing list of ambiguous terms and their definitions. More of a place to store random remarks than a complete reference for now.
diagonal transition
name introduced by Navarro (2001)

approximate
approximate algorithm: an algorithm that does not always give the correct answer.
$k$-approximate string matching: a variant of semi-global alignment where we find all matches of a pattern in a reference with at most \(k\) mistakes.
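To make the definition of $k$-approximate string matching concrete, here is a minimal Python sketch of the standard semi-global DP, in which a match may start anywhere in the reference for free. It assumes that "mistakes" means unit-cost edits (substitutions, insertions, deletions); the function name `approx_matches` is hypothetical.

```python
def approx_matches(pattern: str, text: str, k: int) -> list[int]:
    """End positions (exclusive) in text where pattern matches with at most k edits."""
    # col[i] = minimal cost to align pattern[:i] with some substring of the
    # text ending at the current position (assumption: unit edit costs).
    col = list(range(len(pattern) + 1))
    ends = []
    for j, ct in enumerate(text, start=1):
        new = [0]  # the empty pattern prefix matches at every text position for free
        for i, cp in enumerate(pattern, start=1):
            new.append(min(
                col[i] + 1,               # skip the text character ct
                new[i - 1] + 1,           # skip the pattern character cp
                col[i - 1] + (cp != ct),  # match or substitution
            ))
        col = new
        if col[-1] <= k:
            ends.append(j)
    return ends


print(approx_matches("abc", "xxabcxx", 1))
```

The sketch reports end positions only; recovering the start positions or the alignments themselves would need a traceback, which is omitted here.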
Table of Contents Variants of pairwise alignment Cost models Alignment types A chronological overview of global pairwise alignment Algorithms in detail Classic DP algorithms Cubic algorithm of Needleman and Wunsch (1970) A quadratic DP Local alignment Affine costs Minimizing vs. maximizing duality Four Russians method TODO \(O(ns)\) methods TODO Exponential search on band TODO LCS: thresholds, $k$-candidates and contours TODO Diagonal transition: furthest reaching and wavefronts TODO Suffixtree for \(O(n+s^2)\) expected runtime Using less memory Computing the score in linear space Divide-and-conquer TODO LCSk[++] algorithms Theoretical lower bound TODO A note on DP (toposort) vs Dijkstra vs A* TODO Tools TODO Notes for other posts Semi-global alignment papers Approximate pairwise aligners Old vs new papers References

This post explains the many variants of pairwise alignment, and covers papers defining and exploring the topic.
Say we’re running A* in a graph from \(s\) to \(t\). \(d(s,t)\) is the distance we are looking for.
Papers

AStarix: Fast and Optimal Sequence-to-Graph Alignment
Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds

AStarix is a method for aligning sequences (reads) to graphs:
Input
- A reference sequence or graph
- Alignment costs \((\Delta_{match}, \Delta_{subst}, \Delta_{del}, \Delta_{ins})\) for a match, substitution, deletion, and insertion
- Sequence(s) to align

Output
- An optimal alignment of each input sequence

The input is a reference graph (automaton really) \(G_r = (V_r, E_r)\) with edges \(E_r \subseteq V_r\times V_r\times \Sigma\) that indicate the transitions between states.

Neighbor joining
https://research.curiouscoding.nl/notes/neighbor-joining/
Fri, 12 Nov 2021 11:57:00 +0100

Neighbor joining (NJ, paper) is a phylogeny reconstruction method. It differs from UPGMA in the way it computes the distances between clusters.
This algorithm first assumes that the phylogeny is a star graph. Then it finds the pair of vertices that when merged and split out gives the minimal total edge length \(S_{ij}\) of the new almost-star graph. (See eq. (4) and figure 2a and 2b in the paper.)

\[ S_{i,j} = \frac1{2(n-2)} \sum_{k\not\in \{i,j\}}(d(i, k)+d(j,k)) + \frac 12 d(i,j)+\frac 1{n-2} \sum_{k<l,\, k, l\not\in\{i,j\}}d(k,l). \]

UPGMA
https://research.curiouscoding.nl/notes/upgma/
Thu, 28 Oct 2021 11:56:00 +0200

Unweighted pair group method with arithmetic mean (UPGMA) is a phylogeny reconstruction method.
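The neighbor-joining criterion \(S_{i,j}\) from the neighbor joining note above can be evaluated directly from a distance matrix. The following is a minimal Python sketch, assuming a symmetric matrix `d` over \(n > 2\) taxa given as a list of lists; the function names `nj_total_length` and `best_pair` and the example matrix are hypothetical.

```python
from itertools import combinations


def nj_total_length(d, i, j):
    """S_{i,j}: total edge length of the almost-star tree after joining i and j."""
    n = len(d)  # assumption: n > 2 and d is symmetric with a zero diagonal
    rest = [k for k in range(n) if k not in (i, j)]
    # First term: average distance from the pair {i, j} to all other vertices.
    to_pair = sum(d[i][k] + d[j][k] for k in rest) / (2 * (n - 2))
    # Last term: distances among the remaining vertices, over pairs k < l.
    among_rest = sum(d[k][l] for k, l in combinations(rest, 2)) / (n - 2)
    return to_pair + d[i][j] / 2 + among_rest


def best_pair(d):
    """Pair (i, j) minimizing S_{i,j}; ties go to the first pair in index order."""
    return min(combinations(range(len(d)), 2),
               key=lambda p: nj_total_length(d, *p))


# Example: distances induced by the tree ((0,1),(2,3)) with all edge lengths 1.
d = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
print(best_pair(d), nj_total_length(d, 0, 1))
```

The sketch only evaluates the selection criterion for one step; a full neighbor-joining implementation would also merge the chosen pair and update the distance matrix.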
I’ve asked for automated scripts to reproduce test data on 3+ GitHub repositories now, and got a satisfactory answer zero times:
WFA: https://github.com/smarco/WFA/issues/26
Table of Contents Background $k$-mers Sketching MinHash Terminology Introduction Spaced $k$-mer Seeded Distance Improving performance Analysis Pruning false positive candidate matches Phylogeny reconstruction Running the algorithm TODO Assembly

\[ \newcommand{\vp}{\varphi} \newcommand{\A}{\mathcal A} \newcommand{\O}{\mathcal O} \newcommand{\N}{\mathbb N} \newcommand{\ed}{\mathrm{ed}} \newcommand{\mh}{\mathrm{mh}} \newcommand{\hash}{\mathrm{hash}} \]

Background

Quickly finding similar pieces of DNA within large datasets is at the core of computational biology. This has many applications:
Alignment: Given two pieces of related DNA, align them to find where mutations (i.

Open Science
https://research.curiouscoding.nl/posts/open-science/
Tue, 19 Oct 2021 00:00:00 +0200

Let’s go over some reasons for why I’m writing this blog.
The internet is more accessible than papers

The inspiration for this blog is the post on Succinct de Bruijn Graphs by Alex Bowe. I think blog posts are a great way to quickly learn about new ideas and concepts, since they are usually more accessible than papers. A blog post can omit some of the more formal text required in papers and spend more time explaining things on an intuitive level.

Hugo and ox-hugo
https://research.curiouscoding.nl/notes/hugo/
Thu, 14 Oct 2021 00:00:00 +0200

Here’s the customary “how I made this site using X” post.
This site is built using Hugo and ox-hugo.
The source is written in Org mode, which is converted to markdown by ox-hugo. To get started yourself, check out the initial commit of the source repository and build from there.
Some notes:
Hi there ;) I'm doing a PhD in bioinformatics at the BMI lab at ETH Zurich. Currently I'm working on near-linear algorithms for exact pairwise alignment.

This blog is where I dump my thoughts on my PhD research. For now it includes some short notes/remarks/ideas for research, and a few longer posts that may eventually turn into papers.

Feel free to use this blog as inspiration and build on the ideas you see here, as long as you cite appropriately.
