And it turned out that the idea Craig Venter was really pushing finessed all of this combinatorics. Take the whole entire genome, chop it up into lots and lots and lots of little pieces, take reads off every one of those, and do so many that you have enough information to fit those pieces back together. For any given book, if you will, you have multiple copies of the book: one you're ripping in half at chapter one, and the other you're ripping in half at chapter four. If you do enough of them, it will cohere into one. Given this fragment and that fragment, 200 bases overlap, so we'll put them together and see if we can fit the next one in. And if those two overlap and then a third one comes in and it matches, you know you have it right. With a human genome, which is 3 billion A's, C's, G's and T's, you need to be a little smarter. So now you set all sorts of heuristics. You say, "Well, let's take every pair of reads that overlaps by at least, say, 100 letters," where the two reads share the identical 100 letters. The problem is there's always sequencing error. For