| Step | Subproblem | Condition | Transition Choice |
|---|
Optimal Substructure & Two-Dimensional Sequence Alignment
The Longest Common Subsequence (LCS) problem is a cornerstone paradigm in string matching, bio-informatics, and computational biology. It exhibits a strict optimal substructure because the global optimal solution for a pair of string prefixes depends directly on the optimal solutions of smaller, overlapping sub-problems.
By establishing a two-dimensional dynamic programming matrix, the system maps out the alignment landscape. The state cell dp[i][j] systematically caches the exact length of the longest common subsequence existing between the prefix of string X spanning from index 1 to i, and the prefix of string Y spanning from index 1 to j.
dp[i][j] = 0
Character Match Event (X[i-1] == Y[j-1]):
dp[i][j] = dp[i-1][j-1] + 1
Character Mismatch Event (X[i-1] ≠ Y[j-1]):
dp[i][j] = max( dp[i-1][j], dp[i][j-1] )
When an identical character pair is discovered at the current indices, it is mathematically guaranteed to contribute to the optimal subsequence. The engine increments the historical diagonal count by 1. If a mismatch occurs, the algorithm drops into a branch condition, inheriting the maximum historical valuation found by dropping a character from either the vertical string sequence or the horizontal string sequence.
Combinatorial Matrix Backtracking Strategy
Compiling the quantitative scalar matrix table is only the first phase. Reconstructing the actual human-readable sequence string requires a targeted linear reverse-traversal. Starting at the absolute termination coordinate dp[m][n] (representing the full lengths of both strings), the algorithm steps backwards through the state coordinates by reading historical data dependencies:
- Diagonal Shifts (↖): If the boundary characters match, that character is safely locked into our solution pool. The pointer steps diagonally upward to the left cell: dp[i-1][j-1].
- Directional Tracking Vector (↑ or ←): If the indices do not match, the system looks at the values of the cells immediately above (dp[i-1][j]) and to the left (dp[i][j-1]). The pointer moves toward whichever neighbor holds the higher value, effectively isolating and skipping the non-contributing element.
Asymptotic Complexity Framework
This matrix cache strategy bypasses the exponential, brute-force complexity that tracks with native recursion. The time complexity scales smoothly at O(m · n), where m and n are the respective lengths of the two inputs. The auxiliary space footprint matches this scale at O(m · n) to preserve the structural graph trace. This approach guarantees robust, predictable execution scales when processing large data strings such as file diffs or genomic sequence payloads.