Haplotyping Populations by Pure Parsimony: Complexity of Exact and Approximation Algorithms
Abstract
In this paper we address the pure parsimony haplotyping problem: Find a minimum number of haplotypes that explains a given set of genotypes. We prove that the problem is APX-hard and present a 2k− 1-approximation algorithm for the case in which each genotype has at most k ambiguous positions. We further give a new integer-programming formulation that has (for the first time) a polynomial number variables and constraints. Finally, we give approximation algorithms, not based on linear programming, whose running times are almost linear in the input size.

