Long-read sequencing: Jonas Korlach, CSO of Pacific Biosciences
Note: These podcasts are produced to be heard. If you can, please tune in. Transcripts are generated using speech recognition software and there’s a human editor. But a transcript may contain errors. Please check the corresponding audio before quoting.
Long-read sequencing: A conversation with Jonas Korlach, CSO of Pacific Biosciences.
Vivien Marx [1:30]
When scientists want to know about genes, chances are they use instruments called sequencers. There are quite a few companies that make sequencers. These instruments can give a read-out for example of a stretch of DNA or many stretches of DNA, even entire genomes and many genomes.
The challenge has been that the instruments deliver short reads, that is, short readouts of sequence. What happens then is that scientists face the challenging computational task of stitching together short reads into contiguous sequence.
Genomes have a lot of gnarly bits for which that assembly isn't possible. That means that you can't find out the sequence at those locations.
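The stitching task described here can be illustrated with a toy sketch in Python. It is only an illustration under stated assumptions, not how production assemblers work: reads are assumed error-free, all from the same strand, and the function names `overlap` and `greedy_assemble` as well as the example reads are hypothetical. The sketch repeatedly merges the two reads with the longest suffix-to-prefix overlap.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly: merge the best-overlapping pair until none is left."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining pieces stay separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


# Hypothetical short reads drawn from one short stretch of sequence
short_reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
print(greedy_assemble(short_reads))
```

When a repeated stretch is longer than the reads themselves, several overlaps look equally good and a scheme like this cannot tell which merge is right. That is the kind of region where reads that span the whole repeat help.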
To address this problem, a few companies developed instruments that can perform long-read sequencing, one of which is Pacific Biosciences, or PacBio for short. There are others such as Oxford Nanopore Technologies, and there are newer offerings from other companies such as MGI, Ultima Genomics and Element Biosciences. Illumina, one of the more well-known sequencer makers, is also offering long-read technology.
As part of a story for one of the Nature journals, Nature Methods, I spoke with researchers in academia and with scientists at companies about long read sequencing. This is one episode of what will be several on this topic to share more of what I found out as I did that story. Nature Methods calls long-read sequencing the method of the year for 2022.
Trumpet
Several companies make sequencers that can generate long reads. In this episode I'd like to introduce you to Dr. Jonas Korlach, the chief scientific officer of PacBio. He's originally from Germany, did his bachelor's and master's degrees there and then came to the US for his PhD. He lives in California; the PacBio headquarters are in Menlo Park. When we chatted, he showed me a photo of him with former German Chancellor Angela Merkel at the Max Delbrück Center for Molecular Medicine on the outskirts of Berlin.
-
Vivien
I love that photo of you and Angela Merkel. That was at the Max Delbrück?
Jonas Korlach [3:30]
That's the Delbrück center. They got the first system there, and she came by and started the first run. And she has a physics background. So there's a funny story. So the instrument, after you load it, and we let her load everything. I mean, there are actually two funny stories. One was, we had to describe to the security service what happens, and we said, “And then Frau Merkel will open this bag,” and the security service guy said, “Frau Merkel doesn't open any bags.”
And we said, okay, the bag will already be open. And so she loaded the thing, and then you push a button, and the machine takes a few seconds, or maybe 30 seconds, to make sure everything is okay before it starts running. And so we had talked for quite a bit and she was really interested. And we had a good chat, and she started it. And then one of the ministers tapped her on the shoulder: “Frau Merkel, we have to go,” and she said, “No, I want to wait for this run to start.” And then, luckily, five seconds later it started and then off they went. So that was pretty cool.
Vivien
Pretty cool indeed when a politician is intrigued by a scientific instrument. Before you hear more from and with Jonas Korlach, let me just say: in these podcasts, I do try, not terribly successfully, to get people's names right. Here is the way his name should be pronounced.
Jonas Korlach
Thank you, I appreciate that. My name is pronounced Jonas Korlach.
Vivien
I wondered what Jonas Korlach thinks about having a technology he co-developed be named a method of the year.
Jonas Korlach [5:10]
The dream scenario of any method developer is to have scientists use the method that you developed. Right. And so we've obviously been very fortunate and very honored and humbled by the scientific community using PacBio long-read sequencing now and having published over 9000 peer-reviewed publications to date. And so this is another form of such recognition.
And so it's incredibly meaningful, and we're very honored and proud. And I think it, you know, speaks to the transformation that's been happening within the biological sciences, the impact that long-read sequencing has had to really fundamentally change some of the paradigms that we've had, seeing new biology all the time, and looking at genomes in a completely different way, whether it be seeing for the first time the regions of the genome that people have not been able to look at at all, or maybe more seemingly mundane facts, like the fact that the human genome is now six gigabases in size, and not three gigabases, because for the first time, now, we are able to completely separate the parental haplotypes.
Of course, in every biological cell, there's two copies of your genome, one from the mother and one from the father. And they're not the same, because otherwise your mother and father would have looked the same, which they didn't. And now being able to separate those two alleles and express a human genome as it exists in every one of our cells, as two copies, and expressing it as six gigabases of sequence is just a really dramatic and fundamental paradigm shift in the community.
Vivien
Genomes have some awfully difficult stretches to analyze with a sequencer. For example, you can't readily distinguish maternal and paternal contributions to the human genome. But long reads make that easier. That's why Jonas Korlach talks about the 3 billion bases of the human genome that are now 6 billion bases.
And since he mentioned over 9,000 papers, no worries we are not going to go through them all here. But it's clear that he enjoys seeing work published that involves using PacBio sequencers. And actually he is often part of these projects and is on the papers as a co-author quite frequently.
Jonas Korlach [7:50]
As CSO at the company, you know, part of my job, one of the main portions of my responsibilities is to interact with the scientific community and discuss with the scientists around the world, how they could apply the technology to the questions that they want to answer in their research. And that's incredibly rewarding, both in terms of being at the cutting edge, and these types of discussions and experimental study designs and so forth. And then to see the results of that and to see the enthusiasm of the researchers to finally get the answer.
And oftentimes, after having tried, sometimes for many years, to see these regions or to answer these questions. And so it's very rewarding to see this enthusiasm, to hear, “Finally, I've tried for three years and I've been able to resolve this gene or assemble this genome, see this region that I study,” and so forth. And part of that, you know, job description is to keep pace and stay up to date on what the community is doing with our technology. And so it's a great pleasure to read the preprints and the latest papers, and then I reach out to the corresponding authors, congratulating them and asking them, you know, I look for these connections.
Vivien
Jonas Korlach cares about the science and I wondered how involved he gets with projects. It seems he enjoys collaboration with academics.
And when the project is a new approach, there are ways to have PacBio join and help a project in various ways. So if you are a researcher with just such an idea, it can't hurt to reach out.
Jonas Korlach
This is exactly how we've been approaching it and how I've liked to interact, because we are on the cutting edge. And by its definition, a new method applied to something, you do it for the first time. And so we want to do the best we can to help the researchers and do that collaboratively, because there will be challenges that neither of us anticipated. And so forth, and that's how you develop applications for a fundamental method, whereby you apply it to something for the first time, and then it becomes routine.
Vivien
Some aspects in genomics have become routine, both with sequencing and then assembling genomes, but there are still plenty of aspects that are not yet routine, and PacBio is keen on helping with those first-time kinds of experiments. I asked him a bit more about how the company helps. I had heard that it offers discounts of various types.
Jonas Korlach [10:05]
Yeah, that is correct. And that, I think, is fairly standard in the industry, and, you know, just at a very high level, this is tied to the scale that people want to do things at. Obviously, if they want to do a lot of sequencing, then we try to incentivize that. And then, more directly connected to the previous point, if somebody wants to do something that's never been done before, and we're excited about it, we're happy to support that, you know, and incentivize that with maybe a discount and try to help make it happen, essentially.
Vivien
Let's say a postdoctoral fellow is trying to land a grant for a project but needs data to do that.
Jonas Korlach [10:50]
And it's a little bit of an incentive to let us know, right? And so this is part of the motivation, maybe, for them to reach out to me and reach out to us as a company, and then we can start working together.
Vivien
I wondered about the history of Pacific Biosciences, about where the idea came from and how the technology development unfolded. So Jonas Korlach did a bit of time travel. As a graduate student at Cornell University in the lab of Watt Webb, he was fascinated by molecular machines.
Jonas Korlach [11:20]
The story begins when I started graduate school at Cornell University, and that was in the fall of 1997. So 25 years ago. So as I mentioned, this fall is very special, because it's the 25th anniversary of all these things that happened there to me. And one of the first graduate courses was Bio33, by Jeff Roberts, who is a very well-known researcher in the field of, I would call it, fundamental biology. So he worked on phages and bacteria in the 60s and 70s to elucidate all these fundamental, you know, dogma questions. And so the course was entitled, you know, macromolecular machines. And the late 1990s was a very exciting time, because all these high-resolution crystal structures became available of DNA polymerase, shown here, and RNA polymerase, and the ribosomes, and so forth.
And so, during my undergraduate work, most of the work was like bands on the gel. And now this was replaced with these types of pictures, right. And so I just was fascinated by that. And it was just incredible, because now all of a sudden, we had such a detailed picture of how these incredible macromolecular machines that make DNA and make proteins and make RNA and so forth, are organized.
And so that was, I mean, I was just absolutely fascinated. What struck me is that there was very, very little by comparison, or, in some cases, no kinetic information, no information about how these incredible machines move in time. And so I started thinking about how one might be able to look at this. And so this is basically how this sort of idea came about. And DNA polymerase was particularly fascinating to me, because it's just an incredible enzyme. It goes at 1000 bases per second in your body. It never makes a mistake, it copies your genome exactly one time. I mean, if you think about it, it's an incredible sequencing machine, it's the most powerful sequencing machine that's ever been built, you know, by nature through hundreds of millions of years of evolution.
And so, while initially the idea was, you know, sort of academic, how can we look at these machines in time, it very quickly dawned on me that if you could do that, if you could watch a DNA polymerase in real time and see it incorporate the bases, one after another, you would have a sequencing machine. And so, then, you know, about 25 years ago, I jotted down in my notebook the idea to watch DNA polymerase make DNA and thereby sequence.
Vivien
His idea, initially, was to find a way to watch DNA polymerase incorporate DNA bases right as it was copying a DNA strand. One issue was to figure out how to get a signal from the different bases so you collect information and don't just watch, or have the machine watch the sequencing. The other issue was about figuring out how to hold the DNA strand in place to get those measurements, capture the sequence to have a readout. As Jonas Korlach spoke with me he showed me a sketch he did in his notebook. There's a link in this transcript of the podcast where you can see some visuals.
Jonas Korlach [15:10]
And there's two aspects of this. One is, how do you differentiate the four bases, right? And so, you know, one obvious thing would be color, and then the other, which is jotted down here, would be different lifetimes of the fluorophores. Because I thought, at the time, maybe that would be easier: you'll have one color. And the lab I was in had a history of doing fluorescence lifetime measurements, and so forth.
And then the other thing that's visible from this initial sort of conceptual graphic or sketch is that I realized that, well, if the DNA is bound to the surface in whatever observation volume I'm looking at, then that means that the DNA polymerase will move along the DNA strand as it makes DNA, and will walk out of the volume, essentially.
And so that's why this alternative configuration, which, you know, is the one that we're using now, is where you actually attach the polymerase, the enzyme itself, to the surface, because then it stays stable and the DNA moves relative to the volume. And what I'm saying here is that the polymerase has to be very processive, and that's the term in enzymology to say that the enzyme holds on to the substrate, the DNA in this case, for a long time and makes long stretches of DNA. Because there are also DNA polymerases, like Klenow, that make one incorporation and then the complex falls apart, right, and that's obviously useless in this scenario. So we want to make thousands of, and what we have now is hundreds of thousands of, bases made by the enzyme before this complex falls apart.
Vivien
This was the initial concept. There were challenges. But he had ideas about what to try. He reached out to a neighboring lab where a graduate student Steve Turner became intrigued by the project.
Jonas Korlach [17:10]
So this is the initial concept. And so there were two main problems. One, there were no available microscopes to watch a single polymerase molecule. All the available microscopes at the time, you know, would look at a volume that is much bigger than a single enzyme molecule. And so if you had, let's say, labeled nucleotides, you would look at hundreds of thousands at a time, and you would never see the one that's being processed by the polymerase.
My graduate advisor suggested, actually, that we approach a lab at Cornell that was right next to ours, the lab of Harold Craighead. And Steve Turner was a graduate student in his lab. And of course, Steve is our CTO. I think you know Steve very well.
Steve then invented the zero mode waveguides, which are, you know, one of the pillars of the technology.
Vivien
So that's the beginning of the zero mode waveguide that Steve Turner developed. It's a way to build a tiny workspace, a nanostructure in which you can make measurements. In the case of PacBio the measurements are from fluorescent molecules that label the bases at the single molecule level.
Jonas Korlach [18:20]
Steve has been as instrumental and as much of a heart and soul of this endeavor, both at Cornell and then with the company. He founded the company, and so forth. So I had no intellectual contribution to the inception of the zero-mode waveguide. I mean, this is the rig that I built, and then, you know, tested the zero-mode waveguides.
But Steve had the initial idea, he figured out how to make them and so forth. And, you know, we met, again, 25 years ago, and he was really interested in this problem of doing DNA sequencing with single molecules in real time. And that's how we started working together. And so he figured out how to make them, I measured the volume that these waveguides can. And so this then led, in 2003, to what we call the first Science paper, for which we were pleased to get the cover, demonstrating that with zero-mode waveguides, you can make such a small volume. So this was the first implementation of the original idea.
Vivien
As is so common in science and in tech development, there was another issue, and it related to the molecular biology: how to label the nucleotides so you could identify them, and where to attach the labels physically on the nucleotide.
Jonas Korlach [19:40]
The second problem was a molecular biology problem. All the labeled nucleotides at the time that you could buy had the label attached to the base of the nucleotide. And so that's a problem because, you know, you have to have four things to make the idea work. You have to distinguish the four bases, let's say by color. You have to label all of them, because you want to detect all of them. And then you have to keep the polymerase happy. And you have to keep the background low. And with this type of implementation, those two are not met. Because if the label is attached to the base, it stays with the DNA, it's incorporated in the growing DNA chain. And the polymerase says, okay, this doesn't look like DNA anymore, and I'm quitting, and so forth.
And then of course, the other problem is that if the label stays with the DNA, you have all these labels accumulate as the polymerase even if it could incorporate those. So another notebook sketch, other possibility of the detection could be nucleotide analog with tail attached to the triphosphate moiety. So the idea, then that I had was to put the label on the other end of the molecule. So you see previously that it was attached to the base.
Now we're attaching it to the terminal phosphate. And so of course, what the polymerase does when it incorporates a base into the growing DNA chain is it cleaves this bond right here. This part stays with the DNA and this pyrophosphate floats away and is the waste product. And so now we are, and as you know, this is SMRT sequencing.
Now we are detecting the base while it's being held by the polymerase, as the polymerase cleaves this bond. This part stays with the DNA, which is the natural nucleotide, and then the label floats away after incorporation. And so the polymerase has no knowledge that it was dealing with a labeled nucleotide after the incorporation. So I led the development of this, showing that you could synthesize DNA while replacing all the nucleotides with 100% dye-labeled, terminal phosphate-linked nucleotides.
So then the third thing, and it's probably too much detail: we needed to develop a surface to do what I mentioned before, attach the polymerase, keep it happy there and not have it bind to the surfaces that we don't want it to. And so I led the development of that surface chemistry, with this PNAS paper. And then, you had this in your video already, a lot of engineering and a lot of other innovations had to happen for what we call the second Science paper in January 2009, where we then finally were able to demonstrate, you know, four-color sequencing. So, you know, I hope that gave you a little bit of a sense of the early history.
In summary, two things. You know, this is the concept, and so this picture hadn't fleshed out the specific implementation of the zero-mode waveguides and the phospholinked nucleotides and so forth, but it certainly, without doubt, is the picture that sort of got it all started, and it has the hallmark features of SMRT sequencing in it.
Vivien
At PacBio, the relationship between Jonas Korlach and Steve Turner was a scientific collaboration that grew also into a scientific friendship that lasts to this day.
Jonas Korlach [23:30]
He's a brilliant guy, and so full of creativity and ideas. And that was really, I think, a perfect combination, because his background is in physics, and he's an expert in nanofabrication. And all that, of course, was necessary and was the underlying reason for why he was able to invent and develop the zero-mode waveguides. And my background is in molecular biology and biochemistry. And so I think what we found stimulating is to feed off each other, approaching what is ultimately a biophysics problem with our two very complementary backgrounds.
Vivien
Since we were talking about friendships I thought I would ask Jonas Korlach about his mentors. And about how he sees his role in science these days, since he is involved in many different realms of science.
Jonas Korlach [24:20]
Yeah, a biologist and a method developer, I would say. My two earliest mentors, the first one was my mom's boss, who was a chemist, and then my undergraduate advisor, both independently told me during my formative years that they felt that every major progress in science is mediated by a new method. And that struck me, and my undergraduate advisor was actually a method developer. He was a biophysicist. And so I was really fascinated by his lectures, and then did my thesis with him, the Diplom in Germany, and tried to develop a method, which didn't work, didn't happen, was a completely different area, but I found the field, the activity, fascinating.
And with regard to what you mentioned, dabbling in, you know, a number of different things, I should mention that I was very grateful to Cornell University. So I was in the biochemistry department at Cornell as a grad student, but I was able to do my PhD work in the applied physics department, and my PhD advisor was Watt Webb, who was, as you know, a world-famous physicist. And so that nurturing of interdisciplinarity, I mean, like real interdisciplinarity: I spent 90% of my time at Cornell, probably 95% of my time, in the applied physics department, as a biochemistry grad student, right. And so that is really interdisciplinary research in practice, rather than lip service.
And so that way, I think I was exposed to all sorts of different things that I would never have seen in a classical sort of system where you stay in your lane of work. And then the other aspect to this is that when you are with a startup company, you know, there are like 12 people and it's like, okay, we need to fix this, you know, we have 12 warm bodies. And so I said, you know, I'm going to try the best I can to be a surface chemist for a year and a half or something like that. And, you know, as long as you know your limitations and know how to reach out and get help from whoever you can, and then try to do the best you can, I think that's really rewarding too.
Vivien
Building on PacBio's SMRT sequencing is an approach called HiFi sequencing, which stands for high fidelity. The instrument makes multiple passes around a circular molecule essentially correcting errors it made on the previous pass. HiFi sequencing made the PacBio instrument much more accurate than it was previously. What is also true, however, is that HiFi sequencing is more expensive than other methods. But the good news is: it's going to get faster and cheaper.
Jonas Korlach [27:25]
If you look at protein structures and crystallography, right, so when Max Perutz, you know, and Watson and Crick, were doing their work, it was super cumbersome. And it took several years to get the first structure of a protein. And now it's, you know, it's a, it's a production type activity, and, you know, protein structures come out in minutes, literally, right?
When new methods are being developed, and initially, it's, it's a bit cumbersome, and more expensive, and so forth. And then you know, but whatever you look at NMR and mass spec, or, you know, it there is, over time, as the scientific community realizes, you know, that there's a lot of value there. And you can see things that you've never been able to see before, then through largely engineering, it gets faster and higher throughput.
That way, it becomes more scalable, easier to use. I mean, just as one example for, you know, PacBio long-read sequencing: the genome assemblies, even three or four years ago, typically took several days. And now with the advent of HiFi sequencing, they take half an hour or an hour. And these types of improvements are being made all the time. We have a partnership with Google, and they've made the HiFi data more accurate, increased the yield, increased the computational speed and so forth.
On the computation side, of course, we're piggybacking on the telecom industry. Computers are getting faster all the time, GPUs and so forth. Right. So I think it's inevitable that long-read sequencing is going to get faster and cheaper and easier to use. We are seeing an increased adoption in the clinic.
Just this morning, I saw a new paper about Nationwide Children's now has implemented PacBio, full length RNA sequencing in the clinic, right? So and we can send you this, it's a nice article in this RNA seq magazine. So fusion and long isoform pipeline for cancer transcriptome base resolution.
And so here's the patient care. They call this PB flip: sample preparation, they do Iso-Seq, they have the full, you know. And so you get patients who consent, and now they can have PacBio Iso-Seq done in the context of their cancer to detect these fusions.
Right, as you know, and so this happened this morning, right. And so, coming back to what you mentioned, you know, this is the type of thing that gets me out of bed in the morning. I mean, it's just incredible to see these advances. And so we're seeing more and more of those.
And I think you can take a look at the historical evolution of how initially Sanger sequencing, and then of course Illumina sequencing, has evolved in developing these applications in the research domain, and then moving into the clinic. And I have no doubt that we're going to see exactly the same thing with PacBio long-read sequencing.
Vivien
For PhD students and postdoctoral fellows and others embarking on a career in science and looking around, Jonas Korlach has some advice.
Jonas Korlach [31:20]
I think there are three areas. If you feel like you are a method or application developer type person, which I commend you for, because it takes a lot of perseverance to develop methods: 99% of the time, things won't work. So you have to take a lot of energy from the 1% of the time when something does succeed. Then I think you work on either, you know, method improvements or new applications for the method. You know, Vijay Ramani at UCSF is a perfect example, or Jay Shendure has done this, of course, for many, many years, where they just create brilliant new methods and applications and use cases of the technology. And we've seen many examples of this for PacBio.
The second type is if you want to work in industry, at the company. So we have a very similar sort of research-focused branch where we do largely academic work, trying to, you know, test new things and so forth, but in the context of the more structured and, of course, more goal-oriented and product-oriented system of the company. And then, you know, you have the associated engineering and so forth.
And then the third one, like you said, is just to be a practitioner of the method for the different applications. And if you're in the biodiversity or conservation genomics space and, like you said, go out into the field and want to understand ecology and so forth, apply the method to that. And the great thing with a new method is, you're always going to find something new. I haven't met a single researcher who has applied PacBio Iso-Seq and has told me, I didn't find anything that I didn't know before. They all find new things, because you're looking at things with a new magnifying glass, right? You're looking at it with a different mousetrap and a better mousetrap to see isoforms, for example, right.
Or, like you mentioned, in the clinic, applying it to, as you saw by the latest example, patients with rare disease or cancer patients. And that's, of course, tremendously rewarding. You mentioned the example of rare disease. Some of those families have had diagnostic odysseys for, you know, over a decade in some cases. And I think we don't appreciate the emotional journey of those families, because it's not that nothing is happening during those 10 years. It's, oh, we have another method and let's do another test and let's do this.
And there's always a little bit of hope that finally you're going to be able to find the answer to the underlying reason. And even if, in many cases, it doesn't mean that we now have a way to treat or to cure, even just having the knowledge helps to end the question mark and the uncertainty of what is happening, and, in some cases, it has real-world consequences, like: if I want to have another child, what is the risk that the child's gonna have the same disease, and so forth.
So the impact that we have seen, and, you know, I only interact with the researchers, but they're telling me about their encounters with the families, of now, finally, being able to see and conclusively close the book on what has caused this particular rare disease. I mean, I'm getting goosebumps right now just talking about it. It is just absolutely incredible.
Vivien
Knowing what is likely amiss with a loved one is terrifically powerful. That information is likely not yet a cure or a treatment per se but it can help to move toward a treatment. And it helps to avoid treatments that won't work and also avoid having people go from one physician to another and tests and more tests without getting good answers.
As Jonas Korlach looks ahead, he sees a lot of new uses for long-read sequencing and a need for new approaches that take spatial context into account. The knowledge to be gleaned from single cells is one aspect but readouts from cells in their native spatial habitat so to speak is another important aspect.
Jonas Korlach [35:40]
This is something that is happening now, over the last two years, we've certainly seen really great adoption of single cell, full-length RNA sequencing. And so there's a product that we're going to launch this quarter, to facilitate that, and so forth. And I've had multiple interactions with researchers who now want to take this, it's the logical next level, we're looking at single cells, we can resolve the transcriptomes with unprecedented isoform resolution in single cells, but we've lost their spatial organization and context. And so now, so I know of several groups that are actively applying PacBio to spatial transcriptomics right now.
Vivien
At one point in the not so distant past, long reads from long-read sequencing were quite error-prone.
Jonas Korlach [36:40]
I think, fundamentally, what happened in 2019 is this: in 2019, we had two different worlds in sequencing, right? We had one world, which was accurate sequencing with short reads, and then we had long reads, but they had lots of errors in them.
And so the HiFi sequencing takes the best of those two worlds and puts them together and now generates highly accurate long reads. And so, you know, as they say, the rest is history. What we've seen are those fundamental paradigm changes, some of which I mentioned. And so I think the power of that is that you're sequencing directly a single molecule, and you do it multiple times to wash out any sort of random errors that you would make, to arrive at the most accurate reflection of the sequence of that molecule, together with its epigenetic markers, together with the 5-methylcytosine that may also be on there.
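Editor's illustration: the idea of reading one molecule several times to wash out random errors can be sketched with a toy majority vote. This is only a sketch under strong assumptions: the passes are already aligned, have equal length, and contain substitution errors only. It is not PacBio's actual consensus algorithm, and the function name `majority_consensus` and the example sequences are made up for illustration.

```python
from collections import Counter


def majority_consensus(passes):
    """Return the per-position majority base across pre-aligned passes.

    Assumption (illustrative only): every pass has the same length and
    errors are substitutions, so position i means the same thing in each pass.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")

    consensus = []
    for column in zip(*passes):
        # The most common base in this column wins the vote.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule; two carry one random error each.
passes = [
    "ACGTTAGC",
    "ACGATAGC",  # substitution at index 3
    "ACGTTAGC",
    "ACCTTAGC",  # substitution at index 2
    "ACGTTAGC",
]
consensus = majority_consensus(passes)
print(consensus)
```

Because the errors are random, they rarely hit the same position in most passes, so the vote recovers the underlying sequence.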
Vivien
Since PacBio introduced HiFi sequencing and scientists began using HiFi reads and other technology, they have been able to close gaps in the human genome sequence. For example, there is the Telomere-to-Telomere consortium, which is about sequencing chromosomes end to end, from telomere to telomere. There is also the Human Pangenome Reference Consortium, which is using long-read sequencing, too. Also: the human genome is no longer 3 billion base pairs that have been sort of computationally mashed up, combining maternal and paternal genotypes.
Jonas Korlach [38:20]
So the human genome was completed for the first time, you know, you've seen this. And the basis for this assembly, and this was, you know, obviously publicized heavily, this was built directly from the HiFi reads. Now, we talked about this already, the human genome is now 6 billion bases in size, no longer 3 billion bases.
I wanted to mainly, with this slide from the Human Pangenome Reference Consortium, and these terrific new preprints from about two or three months ago, attribute sort of the notion, because in one of these papers, they have this terrific sentence: we no longer consider a collapsed three-gigabase genome assembly as state of the art, but instead consider two genomes for every diploid genome assembly, that is, six gigabases versus three gigabases, where parental haplotypes are phased and fully resolved. So I think that is a very powerful sentence.
Vivien
And now one idea is that a pangenome could begin to be the new reference genome. It will involve telomere-to-telomere sequencing and capture more of people's genomic diversity.
Jonas Korlach [39:25]
So this paradigm shift that a pangenome reference is the new reference concept. I mean, this is tremendous. And, you know, as much justified media attention that the telomere to telomere consortium has gotten for their paper, this I think is probably as relevant or impactful as fundamental, probably more, because it deals with diploid genomes, and for the first time, it represents the human species with regard to genome in the population context, right. So I mean, I think this is actually more, you know, important or transformative, should I say, to the community.
And so these are built from PacBio HiFi assemblies of 47 genetically diverse individuals. And as we just talked about, you have two copies. So these are 94 haplotypes that are contributing to this new pangenome reference, adding about, you know, 120 million bases of new sequence, over 1,500 gene duplications relative to the linear.
And then it immediately shows the dramatic benefits that you get from moving from the single linear reference concept to a population-encompassing pangenome reference concept: 34% fewer errors in variant calling, twice the number of structural variants per haplotype compared to before, resolving complex regions. Actually, the largest absolute increase in accuracy was in these challenging medically relevant genes. Better representation of tandem repeats, RNA-seq mapping, ChIP-seq mapping and so forth. So I mean, this is a very powerful paper.
Vivien
PacBio is a member of the Human Pangenome Reference Consortium. And the paper Jonas Korlach is talking about is one he was directly involved with.
Jonas Korlach [41:20]
This is the one that I was on and directly involved, where they do a very detailed comparison of the different sequencing technologies and assemblers to develop the most accurate, complete and cost-effective diploid, and they find that HiFi sequencing gives the best results across the board, quoting, "approaches that use highly accurate long reads outperform those that did not." And so the assemblies, you know, I won't read all of this, but they have the best performance.
And again, best variant calls in these, and the authors directly attribute this to the high degree of accuracy: Q50, that's less than one error in 100,000 bases, with long reads has only been a recent advance due to the high base-calling accuracy of HiFi reads. So this is the one, because it was, you know, technology related, that I was directly involved in. And then the other, so I attend all the calls, there are weekly calls by the HPRC. And then we certainly support these efforts. But these are then to apply these methods to generating the pangenome reference, look at segmental duplications and so forth. So that's the secondary work that we are less directly involved in.
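Editor's illustration: the accuracy figure mentioned here, less than one error in 100,000 bases, corresponds to a Phred quality score of 50 under the standard definition Q = -10 · log10(p), where p is the error probability. A minimal sketch of that arithmetic follows; the function names are chosen for illustration and do not come from any particular library.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability p to a Phred quality score Q = -10*log10(p)."""
    return -10 * math.log10(p)


for q in (20, 30, 50):
    p = phred_to_error_prob(q)
    # 1/p is the expected number of bases per error at this quality.
    print(f"Q{q}: about one error per {round(1 / p):,} bases")
```

On this scale every 10 points is a tenfold drop in error rate, so Q50 is 100 times more accurate than Q30.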
Vivien
What he finds exciting about the Human Pangenome Reference Consortium work is its scale and scope. What he is also happy about is when more accurate long-read sequencing leads to new insights about the genome.
For example, it enables looking at aspects of the genome such as segmental duplications. These are a kind of gnarly structural bit in genomes.
Jonas Korlach [43:00]
Segmental duplications. It's a form of structural variant. And this paper describes how segmental duplications have not been systematically addressed, because of the difficulties in mapping short-read sequence to these regions. So segmental duplications are large sequences that are duplicated, and they're very similar to each other. And so with short reads, you don't know which one the read came from, and, as a quote, "as a result became blacklisted from subsequent genomic analysis." So we just did not look at them because we couldn't.
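Editor's illustration: the mapping problem described here can be shown with a toy example. The sequences below are invented, and exact string matching stands in for a real read aligner, which is a large simplification. Two near-identical copies differ at a single base; a short read that does not cover that base matches both copies, while a read spanning the whole copy matches only one.

```python
def exact_match_positions(reference, read):
    """Return every start index where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Two hypothetical duplicated segments that differ at one base (index 21).
copy_a = "ACGTACGGTTCAGGCTAACGTTGCA"
copy_b = "ACGTACGGTTCAGGCTAACGTAGCA"
reference = "TTTT" + copy_a + "GGGGCCCC" + copy_b + "AAAA"

short_read = copy_a[:10]  # does not reach the differing base
long_read = copy_a        # spans the differing base

short_hits = exact_match_positions(reference, short_read)
long_hits = exact_match_positions(reference, long_read)
print("short read matches at:", short_hits)  # two candidate locations
print("long read matches at:", long_hits)    # one location
```

With two equally good placements, the short read cannot be assigned to a copy, which is why such regions were left out of analyses.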
And so in this paper, they use PacBio HiFi data to extend the variant calling into 5% of the human genome, 120 million bases of additional segmental duplication sequence. What do they find? Almost 2 million SNPs in the gene-rich portion of the genome previously considered largely inaccessible. And they find that genetic variation is actually higher, 60% higher, in these regions compared to the unique regions that we've been looking at all along. So we haven't been able to see it. And now that we can, we find, wow, there's actually more genetic variation there than in the regions that we have been looking at. One example where that's immediately clinically relevant is the immunoglobulin regions.
This is an extremely complex region, these are the segmental duplications here, and you have other repeat elements. And it has been typically ignored in genome-wide studies, believe it or not. I mean, that's incredible, right. So we know these are antibodies, they matter critically in disease. And we have not looked at the genetics of this at all.
And so this paper, about three months ago now, used PacBio HiFi sequencing for the first time to develop a comprehensive catalogue of the genetic variation in this locus. And they find that, well, your genetics actually makes a significant contribution to the antibody gene usage.
It's pretty, it's pretty incredible. So in the field, because this was a black box, and we couldn't see it, the field largely assumed that the immune response for IgH is largely random, with this, you know, VDJ recombination of the different parts of the antibody and so forth. And now that we can see this region for the first time, lo and behold, we find that your genetics actually makes a pretty important contribution to how you're going to respond to certain diseases. And so obviously, very clear implications for any response in cancer immunotherapy, infectious disease, you know, there's something to be said about why some people get very sick with COVID, and others don't, and so forth.
Vivien
To Jonas Korlach, advances such as those from the Telomere-to-Telomere Consortium, the T2T Consortium, and the Human Pangenome Reference Consortium, the HPRC, are enabling a new approach and scale for sequencing and assembly of genomes.
Jonas Korlach [46:20]
I think what we're going to be seeing is that telomere to telomere is going to be the new gold standard that everything's going to be measured, plant and animal genomes, we haven't talked very much about plant and animals, but a lot has been sequenced with.
And I think for the plant and animal field, they had basically largely given up on genome assemblies, because as you know, plant and animal genomes can be very complex, and you just can't do it with short reads at all. And so I just put one example, a recent example by the USDA, they sequenced locusts, six different species of locusts. And, you know, they're three times bigger than the human genome.
And so I liked this quote, you know, it's an 18-wheeler, next to a compact car for fruit fly. And so we've seen that transformation. And we've been a proud partner, and collaborator, and all these biodiversity and conservation genomics projects that you know about.
Vivien
If you heard the beginning of this podcast, I had this comment from Jonas Korlach about a photo of him and former German Chancellor Angela Merkel at the Max Delbrück Center of Molecular Medicine. Let me just share it again here, because you might hear it differently now that you have heard about other aspects from him.
-
Vivien
I love the photo of you with Angela Merkel. I think this was at the Max Delbrück Center.
Jonas Korlach [47:40]
That's the Delbrück center, they got the first system there. And she came by and started the first run. And she has a physics background. So there's a funny story. So the instrument, after you load it, and we let her load everything. I mean, there's actually two funny stories. One was, we had to describe to the security service what happens, and we said, and then Frau Merkel will open this bag, and the security service guy said, “Frau Merkel doesn't open any bags.”
And we said, okay, the bag will already be open. And so she loaded the thing, and then you push a button, and the machine takes a few seconds, or maybe 30 seconds, to make sure everything is okay before it starts running. And so we had talked for quite a bit and she was really interested. And we had a good chat, and she started it. And then one of the ministers tapped her on the shoulder, “Frau Merkel, we have to go,” and she said, “No, I want to wait for this run to start.” And then, luckily, five seconds later it started and then off they went. But that was pretty cool.
Vivien
Jonas Korlach has seen plenty of developments in the sequencing area and has been part of the development of long-read sequencing. I wondered what he might say to a pioneer like Frederick Sanger if he had had a chance to meet him. Sanger developed the first way to sequence DNA, which became known as Sanger sequencing. He passed away in 2013.
Jonas Korlach [49:00]
Fred Sanger is obviously one of my, you know, all-time heroes. And I, you know, obviously I never met him, but I had the privilege to meet some people who had, you know, so to work sort of this bridge, and told me, you know, their impressions and their experiences interacting with him. And, you know, I would have obviously loved to meet him.
So, I think he would be delighted to see how sequencing has evolved, and how it is now done globally, on this massive scale. And in seeing how biology and medicine is changing, I think we would probably muse about all the things we don't yet know, even though people had previously, you know, proclaimed that the human genome was finished. And I've been very pleased to see that it got the measured and thoughtful attention and media coverage that was appropriate to put the most recent work of now finally completing the human genome in the right light. Not as a way to denigrate or to somehow shed the Human Genome Project in a negative light, because that would be inappropriate. And the Human Genome Project was absolutely transformational to the whole field.
But I think it was a great example of the progress in science that I would love to talk to Fred Sanger about, and that with every new advance, you learn things, but you also learn about all the things you don't yet understand. And those are great opportunities then to go forth and... I like gardening and I understand he really liked his roses. So I think we'd talk about that a little bit too.
Vivien
That was Conversations with scientists. Today's guest was Dr. Jonas Korlach, chief scientific officer of Pacific Biosciences. And the music used for this media project is Winnie The Moog Funky Energetic Intro and Acid Trumpet by Kevin MacLeod, downloaded from filmmusic.io and licensed from filmmusic.io
Here's a shoutout and thank you to Lizelda Lopez who helped make this podcast happen.
And I just wanted to say, because there's confusion about these things sometimes: PacBio didn't pay for this podcast and nobody paid to be in this podcast. This is independent journalism that I produce in my living room. I'm Vivien Marx, thanks for listening.