How to remove contamination that arises during sequencing

[China Pharmaceutical Network Technology News] What contamination problems can arise during sequencing? How can contamination be detected, and how can it be removed?



How to deal with contamination that arises during sequencing

While analyzing data, Supratim Mukherjee was surprised to find the same phage sequence repeated in hundreds of microbial genomes. The bioinformatician at Lawrence Berkeley National Laboratory had set out to compare the metabolic pathways of these microbes, but then he noticed an almost ubiquitous sequence. "I thought we had found something new," he recalled. "Across these different microorganisms, this entire phage genome was completely conserved."

But once Mukherjee examined the phage sequence, he recognized it as PhiX, a phage used as a control standard in Illumina sequencing kits. PhiX is meant to serve as a quality control that tracks the error rate of each sequencing run, but in hundreds of cases Mukherjee found that researchers had not removed the PhiX sequences from their published genome sequences.

Mukherjee is not the only one to have noticed this. A number of recent reports indicate that published genomes are far more contaminated than previously thought. So how does this contamination arise? What can be done to avoid it?

The Scientist magazine consulted several researchers, who shared their tips for detecting and preventing "rogue sequences."

Widespread genome contamination

After Mukherjee's team realized that PhiX contamination might be present in many published microbial genomes, they set out to quantify how often it occurs. Their analysis found that more than 1,000 of the 18,000 published genomes in the Integrated Microbial Genomes database were contaminated with PhiX sequences. Mukherjee et al. published this finding this year in Standards in Genomic Sciences. About 10% of these contaminated genomes also appear in peer-reviewed journals.

PhiX contamination is just the tip of the iceberg, and the problem is now growing exponentially, said NCBI Director David Lipman, who is also screening the data submitted to GenBank over the past five years.

“In 2012 we detected contamination in only 2%-3% of bacterial and archaeal genomes,” Lipman said. “But it has risen sharply, and by 2014 it was close to 10%. So far this year, the proportion has reached 23%.”

Scientists at the Sanger Institute have also found that DNA extraction kits, chemical reagents, and bacteria in the laboratory environment can easily cause contamination and affect the results of microbiome analysis.

An uncontaminated control sample should have contained only one strain, but the researchers sometimes found 270 different bacteria. Low-biomass samples, such as those from blood or the lungs, are particularly susceptible to contamination compared with high-biomass samples such as fecal samples.

“DNA sequencing technology now allows deep sequencing and is widely used to analyze rare microbial populations. We have found that such samples are easily contaminated by DNA from other sources, either when the samples are collected or during DNA extraction and amplification. This contamination can have a great impact on research results, so researchers need to pay close attention to it,” said Dr. Alan Walker of the Sanger Institute.

Microbial genomics is not the only field with this much contamination. Last year, University College London computer scientist William Langdon found that at least 7% of the samples in the 1000 Genomes Project were contaminated with mycoplasma genetic material (BioData Mining, 7:3, 2014). So if a contaminated genome is giving you a headache, rest assured that you are not alone.

Where does the contamination come from?

Rob Edwards, a bioinformatician at San Diego State University, said contamination has many sources. “First, lab members may have mixed up two samples, or accidentally mislabeled files or samples. These problems are easy to solve by strengthening laboratory management and improving the record-keeping system.”

Contamination can also come from foreign genetic material that should not be present in the sample, or from the environment surrounding the cultured bacteria, Edwards said. Even if you think you are sequencing a pure culture, it is not uncommon to see multiple species in a single sequencing run.

In addition, if microorganisms from the human gut are being sequenced, human cells will naturally be present in the sample. And even if you only want to sequence an organism's nuclear genes, mitochondrial and chloroplast genes will show up as well. These also count as contamination. Such contamination is difficult to avoid completely, but some measures can be taken: clean up the sample before sequencing, or remove the contaminating sequences from the sequencing results.

Edwards' research team focuses on metagenomic sequencing of environmental samples, and he said the team often uses filtration to separate viruses from bacteria by size. If they suspect that a sample is contaminated with human DNA, they remove those sequences first, leaving only the microbial genetic material.

A similar approach can be used to remove contamination introduced by the workflow itself, such as the PhiX control sequence, the amplification and sequencing primers, and cloning vectors.
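As a rough illustration of removing primer sequence from reads, the sketch below strips a known primer from the 5' end of a read when it matches within a small number of mismatches. The primer sequence, the function names, and the mismatch threshold are assumptions made for this example; they are not taken from any of the tools mentioned in this article.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_5prime_primer(read, primer, max_mismatches=1):
    """Remove `primer` from the start of `read` if it matches closely enough.

    The read is returned unchanged when it is shorter than the primer or
    when the first len(primer) bases differ at more than `max_mismatches`
    positions.
    """
    if len(read) < len(primer):
        return read
    head = read[:len(primer)]
    if hamming(head.upper(), primer.upper()) <= max_mismatches:
        return read[len(primer):]
    return read


# Hypothetical primer and reads, for illustration only.
PRIMER = "ACGTACGTAC"
reads = [
    "ACGTACGTACGGGTTTAAA",  # exact primer at the start
    "ACGTTCGTACGGGTTTAAA",  # primer with one mismatch
    "TTTTTTTTTTGGGTTTAAA",  # no primer
]
trimmed = [trim_5prime_primer(r, PRIMER) for r in reads]
```

Real reads also carry adapters at the 3' end and may contain insertions or deletions, so dedicated trimming software is the better choice in practice; this only shows the basic idea.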

Another source of contamination that is easy to overlook is carryover between experiments: a dirty "bleed-through" in which genetic material from a previous sequencing run shows up in the next run on the same machine. Simply being aware that such contamination may exist in your experiment, and understanding where it comes from, can help you choose a method to remove it after sequencing, Edwards said. If it appears repeatedly, you may need to change the protocol or troubleshoot the machine.

How to detect contamination

Undoubtedly, the earlier contaminants are removed in an experiment, the better. “Contamination increases the direct cost of the experiment,” said Dominik Laetsch of the University of Edinburgh. “With contamination, you get less useful information for every penny you spend,” because it takes time to process and analyze the unwanted sequences. But there is also good news: even if your sequences are full of PhiX, primers, vectors, and genes from unwanted species, you can eliminate them before anyone else sees your final published genome.

Laetsch has developed a tool to help clean up sequences before data analysis. The latest version of the tool, called Blobtools-light, compares your contigs (overlapping stretches of sequenced DNA that are assembled into the final sequence) against known sequences in the NCBI databases. The software then presents the matches visually, so that sequences from similar organisms stand out as groups.

“We use this as a preliminary screening tool,” says Laetsch, who is conducting research on pathogenic bacteria.

A similar program is ProDeGe (Protocol for fully automated Decontamination of Genomes) (ISME J, doi: 10.1038/ismej.2015.100, 2015).

Like Blobtools, ProDeGe uses public databases to detect contamination in a genome, and then sorts the contigs into a "clean" group and a "contaminant" group. By comparison, Blobtools-light provides a visual plot of the sequences, while ProDeGe helps researchers identify what the contaminants are.
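The grouping step can be pictured with a small sketch. It assumes each contig has already been given a best-matching taxon from a database search (or None when nothing matched), and it sorts contigs by whether that taxon is the expected target organism. The function name, the input format, and the example taxa are hypothetical. This is not how ProDeGe or Blobtools work internally, only a simplified picture of the idea.

```python
from collections import defaultdict


def partition_contigs(best_hits, target_taxon):
    """Sort contig IDs into 'clean', 'contaminant', and 'unassigned' bins.

    best_hits maps contig ID -> best-matching taxon name, or None if the
    contig had no database match. Contigs without a match are kept apart
    rather than treated as contaminants, because a novel genome may simply
    be absent from the database.
    """
    bins = defaultdict(list)
    for contig_id, taxon in best_hits.items():
        if taxon is None:
            bins["unassigned"].append(contig_id)
        elif taxon == target_taxon:
            bins["clean"].append(contig_id)
        else:
            bins["contaminant"].append(contig_id)
    return {name: sorted(bins[name])
            for name in ("clean", "contaminant", "unassigned")}


# Hypothetical search results, for illustration only.
hits = {
    "contig_1": "Escherichia",
    "contig_2": "Escherichia",
    "contig_3": "Homo",
    "contig_4": None,
    "contig_5": "Mycoplasma",
}
groups = partition_contigs(hits, target_taxon="Escherichia")
```

Keeping unmatched contigs in a separate bin leaves the decision about them to the researcher, who knows whether the sample is expected to contain previously unsequenced material.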

“This method is relatively simple and does not require much background knowledge,” Mukherjee said. “So it is better suited to researchers who are not familiar with such tools.”

There are other methods as well, such as NCBI's VecScreen, which quickly identifies vector contamination in a sequence. More advanced tools will be published on the NCBI website later.

However, every tool for detecting contaminants has to balance specificity against sensitivity, that is, identify contaminants accurately without deleting target sequences. So it is important to understand your data as a whole. For example, if you are analyzing a novel genome, a program may well report a high level of contamination simply because existing databases do not contain your sequence.
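To make the trade-off concrete, the sketch below computes sensitivity (the fraction of true contaminants that were flagged) and specificity (the fraction of genuine target contigs that were left alone) for one screening run. It assumes you have a test set in which the true contaminants are known; the function name and the contig labels are made up for the example.

```python
def screening_metrics(flagged, true_contaminants, all_contigs):
    """Return (sensitivity, specificity) for one contamination screen.

    flagged           -- contigs the tool reported as contaminants
    true_contaminants -- contigs known to be contaminants
    all_contigs       -- every contig that was screened
    """
    flagged = set(flagged)
    true_contaminants = set(true_contaminants)
    all_contigs = set(all_contigs)

    tp = len(flagged & true_contaminants)   # contaminants caught
    fn = len(true_contaminants - flagged)   # contaminants missed
    fp = len(flagged - true_contaminants)   # target contigs wrongly flagged
    tn = len(all_contigs - flagged - true_contaminants)  # target contigs kept

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical test set of ten contigs, two of them true contaminants.
contigs = [f"c{i}" for i in range(1, 11)]
sens, spec = screening_metrics(
    flagged={"c1", "c3"},
    true_contaminants={"c1", "c2"},
    all_contigs=contigs,
)
```

In this toy case the screen catches one of the two contaminants (sensitivity 0.5) and wrongly flags one of the eight target contigs (specificity 0.875). Loosening the tool's threshold would usually raise the first number at the expense of the second.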

Or, if you already know that a particular bacterial genome is likely to be a major contaminant, you can list the contaminants to screen for, Edwards said. "I recommend running a few more tools and comparing the results."

How to remove contamination

Once the contaminants and their sources have been identified, data cleanup can begin. There are a variety of tools to choose from, such as DeconSeq, developed by the Edwards research group. Unlike other automated contamination screening programs, DeconSeq requires the user to specify the contaminating species, and then automatically removes that species' sequences from the genome assembly.
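The idea of removing sequences that belong to a user-specified contaminant can be sketched with a simple k-mer comparison. This is a deliberately simplified stand-in and not DeconSeq's actual algorithm: the functions, the k-mer length, the 50% threshold, and the toy sequences below are all assumptions chosen for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_contaminant_index(contaminant_seqs, k):
    """Collect k-mers from both strands of every contaminant sequence."""
    index = set()
    for seq in contaminant_seqs:
        seq = seq.upper()
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def filter_reads(reads, contaminant_seqs, k=8, min_fraction=0.5):
    """Split reads into (kept, removed) by shared k-mer fraction.

    A read is removed when at least `min_fraction` of its distinct k-mers
    occur in the contaminant index. Reads shorter than k are kept.
    """
    index = build_contaminant_index(contaminant_seqs, k)
    kept, removed = {}, {}
    for name, seq in reads.items():
        read_kmers = kmers(seq.upper(), k)
        shared = len(read_kmers & index)
        if read_kmers and shared / len(read_kmers) >= min_fraction:
            removed[name] = seq
        else:
            kept[name] = seq
    return kept, removed


# Toy sequences, for illustration only (not a real contaminant genome).
contaminant = "ATGCGTACGTTAGCCTAGGCTAACGGATCCGATTACA"
reads = {
    "r1": contaminant[5:25],                      # forward-strand fragment
    "r2": reverse_complement(contaminant[10:30]), # reverse-strand fragment
    "r3": "TTTTTTTTTTCCCCCCCCCCGGGGGGGG",         # unrelated sequence
}
kept, removed = filter_reads(reads, [contaminant])
```

Here r1 and r2 end up in `removed` and r3 in `kept`. Production tools rely on proper alignment with coverage and identity thresholds, which tolerates sequencing errors far better than exact k-mer matching.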

Skipping this step may cause trouble. Lipman's team at NCBI screens every sequence submitted to GenBank for foreign contaminants. He hopes that when a sequence is flagged as contaminated, scientists will treat it as an opportunity to examine their data, understand the weaknesses of their techniques, and avoid the problem in the future.

"If you just say 'well, my submission has a problem, I'll fix it now,' then the problem will keep coming back," Lipman said.

But what if contamination is found in a genome after the paper has been published? If you discover errors after further experiments, for example, the key is to correct them as early as possible, before others use the erroneous results in their own research. In some cases, this may mean contacting the journal to see whether an erratum can be issued.

“Everyone needs to take responsibility for their sequence data,” Mukherjee said. “If you find a problem, withdraw the data, correct it, and then release it again.”

How to reduce genome contamination

As sequencing technology advances, many sources of contamination may disappear on their own. "That is indeed possible," Laetsch said. "As assembly becomes easier and read lengths become longer, contamination will be easier to find." But researchers cannot use this as an excuse to stop screening for contaminants. "The better the sample you put in, the better the sequencing machine will do."

As genomic datasets grow ever larger, it becomes increasingly difficult to obtain clean sequences. It depends on every researcher doing everything they can to ensure that their genome sequences are not contaminated. "I think the scientific community knows contaminants are a big problem, but it takes more effort," Mukherjee said.

The frequency of contaminants in GenBank has soared, and Lipman shares this concern. Why is there more and more contamination? "More and more laboratories are able to carry out sequencing research," Lipman said in answer to the question. "That in itself is a good thing."
