23rd annual congress of the Société Française de Recherche Opérationnelle et d'Aide à la Décision

sciencesconf.org:roadef2022:379218

Answer Set Programming based haplotype phasing of long reads for polyploid species

Clara Delahaye 1, 2, Jacques Nicolas 3

1 : Université de Rennes 1

2 : Institut de Recherche en Informatique et Systèmes Aléatoires

Université de Rennes 1, Institut National de Recherche en Informatique et en Automatique, Centre National de la Recherche Scientifique : UMR6074

3 : Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA)

Université de Rennes 1, Institut National de Recherche en Informatique et en Automatique

Avenue du général Leclerc, Campus de Beaulieu, 35042 RENNES CEDEX - France

The DNA of living organisms is organized into chromosomes, and each complete set of chromosomes is present in two copies (in diploid organisms such as humans and many animals) or more (in polyploid organisms such as some plants). We call each copy within a given set a haplotype. Haplotypes are highly similar but show biologically important differences, called variants, which are of high interest because they may be involved in biological processes or genetic diseases.
However, most genome representations available today are monoploid, in the sense that they are a mix of all haplotypes, which masks variants and leads to missing or erroneous information. Our aim is to build a reference sequence for each haplotype of a genome, taking as input genome subsequences, called reads, produced by a sequencing machine.

Here we propose a combinatorial method for diploid and polyploid haplotype phasing of long-read data. We formulate haplotype phasing as an optimization problem and solve it with Answer Set Programming (ASP), using the clingo system.
Rather than providing a single, likely erroneous answer to this hard problem, the ASP framework allows reasoning over the set of possible solutions. Moreover, ASP is a high-level declarative language that offers both efficiency (its solvers build on SAT-solver techniques) and expressiveness (more than ILP, for example): the user can easily express preferences and get a global view of confident and ambiguous positions in phased regions.
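
To make the idea of reasoning over the set of optimal solutions concrete, here is a minimal sketch in plain Python. It is not the authors' ASP encoding, and it does not use clingo. It assumes a minimum-error-correction style objective (the abstract does not state the actual objective): reads are strings over '0', '1' and '-' (no coverage) at variant positions, each read is assigned to one of `ploidy` groups, and the cost is the number of alleles that disagree with the majority allele of their group. All names (`READS`, `optimal_phasings`, `ambiguous_positions`) are hypothetical, and the exhaustive search is only usable on toy inputs.

```python
# Toy reads over 4 variant positions: '0'/'1' are alleles, '-' means no coverage.
# This data set is invented for illustration only.
READS = ["010-", "101-", "-10-", "10--", "011-", "---1", "---0"]


def canonical_assignments(n_reads, ploidy):
    """Yield group labels for each read, skipping relabelings of the same partition."""
    def extend(prefix, used):
        if len(prefix) == n_reads:
            yield tuple(prefix)
            return
        # A read may join an already used group or open the next unused one.
        for g in range(min(used + 1, ploidy)):
            yield from extend(prefix + [g], max(used, g + 1))
    yield from extend([], 0)


def consensus_and_cost(group, n_positions):
    """Return the majority haplotype ('?' on ties or no coverage) and its mismatch count."""
    haplotype, cost = [], 0
    for j in range(n_positions):
        zeros = sum(read[j] == "0" for read in group)
        ones = sum(read[j] == "1" for read in group)
        cost += min(zeros, ones)  # minority alleles count as errors
        haplotype.append("0" if zeros > ones else "1" if ones > zeros else "?")
    return "".join(haplotype), cost


def optimal_phasings(reads, ploidy=2):
    """Exhaustively collect every distinct haplotype tuple reaching the minimum cost."""
    n_positions = len(reads[0])
    best_cost, best = None, []
    for labels in canonical_assignments(len(reads), ploidy):
        groups = [[r for r, g in zip(reads, labels) if g == h] for h in range(ploidy)]
        results = [consensus_and_cost(g, n_positions) for g in groups]
        cost = sum(c for _, c in results)
        haplotypes = tuple(h for h, _ in results)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [haplotypes]
        elif cost == best_cost and haplotypes not in best:
            best.append(haplotypes)
    return best_cost, best


def ambiguous_positions(solutions):
    """Positions where at least two optimal solutions disagree."""
    n_positions = len(solutions[0][0])
    return [j for j in range(n_positions)
            if len({tuple(h[j] for h in sol) for sol in solutions}) > 1]


if __name__ == "__main__":
    cost, solutions = optimal_phasings(READS, ploidy=2)
    print("minimum cost:", cost)
    for sol in solutions:
        print("haplotypes:", sol)
    print("ambiguous positions:", ambiguous_positions(solutions))
```

On this toy input the two optimal solutions agree on the first three positions and differ only at the last one, which is covered solely by reads that are not linked to any other position. This is the kind of confident versus ambiguous distinction that a single returned answer would hide.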

Type	:	Article
Topics	:	Session "Recherche Opérationnelle en Bio-Informatique"
Keywords	:	Constraint logic programming ; partitioning ; Phasing ; DNA sequencing data
