Bioinformatics and Genome Questions
Lander and Waterman derived formulae for the expected completeness of an assembly as a function of coverage $c = NL/G$, where $G$ = genome length, $N$ = number of reads, and $L$ = read length. Then:

probability that a base is not sequenced = $e^{-c}$
total expected gap length = $G e^{-c}$
total number of gaps = $N e^{-c}$

(a) What fraction of a genome could you expect to assemble from eightfold coverage? Present your result as a percentage.
(b) What total gap length would you expect in an assembly of a 2 Mb target genome from eightfold coverage?
(c) How many gaps would you expect in an assembly of a 2 Mb target genome from eightfold coverage of fragments with a read length of 500?
(d) You want to sequence a 4 Mb genome by the shotgun method, by assembling random fragments with read length 500. What coverage would you require to expect no more than four gaps, assuming no complications arising from repetitive sequences or skewed base composition?
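The formulae above can be evaluated numerically. The following is a minimal sketch rather than a definitive solution; the function names are hypothetical, 1 Mb is assumed to mean 10^6 bases, and part (d) is solved by a simple bisection that assumes the required coverage lies between 1 and 50. Since $N = cG/L$, the expected number of gaps can be written as $(cG/L)\,e^{-c}$, which decreases steadily once $c > 1$.

```python
import math


def fraction_assembled(c):
    """Expected fraction of the genome covered: 1 - e^(-c)."""
    return 1.0 - math.exp(-c)


def total_gap_length(genome_length, c):
    """Expected total gap length: G * e^(-c)."""
    return genome_length * math.exp(-c)


def expected_gaps(genome_length, read_length, c):
    """Expected number of gaps: N * e^(-c), with N = c * G / L."""
    n_reads = c * genome_length / read_length
    return n_reads * math.exp(-c)


def required_coverage(genome_length, read_length, max_gaps, lo=1.0, hi=50.0):
    """Smallest coverage in [lo, hi] giving at most max_gaps expected gaps.

    Assumption: the answer lies in [lo, hi]. For c > 1 the expected gap
    count is strictly decreasing, so bisection on that interval is valid.
    """
    if expected_gaps(genome_length, read_length, hi) > max_gaps:
        raise ValueError("upper bound 'hi' is too small")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_gaps(genome_length, read_length, mid) > max_gaps:
            lo = mid  # still too many gaps: need more coverage
        else:
            hi = mid  # few enough gaps: try less coverage
    return hi


if __name__ == "__main__":
    MB = 1_000_000  # assumption: 1 Mb = 10^6 bases
    print(f"(a) fraction assembled: {100 * fraction_assembled(8):.2f}%")
    print(f"(b) total gap length:   {total_gap_length(2 * MB, 8):.0f} bases")
    print(f"(c) expected gaps:      {expected_gaps(2 * MB, 500, 8):.1f}")
    print(f"(d) required coverage:  {required_coverage(4 * MB, 500, 4):.2f}")
```

Under these assumptions the sketch gives roughly 99.97% of the genome assembled for (a), about 671 bases of total gap for (b), about 10.7 gaps for (c), and a required coverage just under tenfold (about 9.9) for (d).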