Escherichia coli is one of the key microbes that cause disease in humans. It occurs in many different variants, each with its own specialization. Some E. coli variants are part of our normal flora and can thus be regarded as harmless. Other variants cause disease and are highly specialized: some cause urinary tract infections, while others cause various forms of intestinal infection that can lead to severe sequelae. Food contaminated with such variants represents an important health risk in society. New strains of E. coli with extensive resistance to antibiotics are an increasing problem in Norway. In this project we are studying protein expression in these variants to better understand how E. coli is specialized to cause various forms of illness. Today the complete genetic material of a wide range of E. coli variants is known. We have collected protein expression profiles from the main variants for which the exact composition of the genetic material (DNA sequence) is also known, and have performed "synonymous proteogenomics" analysis of these. We have established a simplified method for examining the proteins of an E. coli strain; with it we can detect more than 2000 different proteins simultaneously and obtain good quantitative data for the majority of them. E. coli is currently thought to express about 2600-2700 different proteins, which means that our method can identify about 80% of the proteins expected to be expressed. The method is more efficient than earlier methods and makes it possible to examine and compare many different strains of E. coli. We have also implemented data analysis programs that make such comparisons considerably easier, so that the proteins that distinguish E. coli variants from one another can be separated from those that recur in all variants.
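The comparison described above, separating proteins that recur in all variants from those that distinguish them, can be illustrated with a minimal sketch. This is not the project's analysis software; the function name, strain names and protein identifiers are hypothetical, and each strain is assumed to be represented simply by the set of proteins identified in it.

```python
from typing import Dict, Set, Tuple


def split_core_and_variable(
    profiles: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split identified proteins into a shared core and per-strain extras.

    profiles maps a strain name to the set of protein IDs identified in it.
    Returns (core, variable), where core holds proteins found in every
    strain and variable maps each strain to its non-core proteins.
    """
    if not profiles:
        return set(), {}
    # Proteins present in every strain.
    core = set.intersection(*profiles.values())
    # Whatever is left in each strain after removing the core.
    variable = {strain: proteins - core for strain, proteins in profiles.items()}
    return core, variable


# Hypothetical toy data, for illustration only.
profiles = {
    "strain_A": {"P1", "P2", "P3"},
    "strain_B": {"P1", "P2", "P4"},
    "strain_C": {"P1", "P2", "P3", "P5"},
}
core, variable = split_core_and_variable(profiles)
```

In this toy example the core consists of P1 and P2, while P4 is seen only in strain_B. With real data the same split would have to account for proteins that are present but missed by the measurement.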
Interpreting all the information in the genetic material of bacteria is very demanding and requires advanced software to identify each gene. These methods still have some weaknesses. One gene encodes one protein, so by analyzing proteins it is possible to determine where the genes are. The information we obtain about all the proteins of the E. coli variants can therefore be used to update the interpretation of their genetic repertoire. The same methods can also be used to compare how protein expression varies with growth conditions. Ideally, we would like to know which proteins are expressed when the bacteria cause an infection in the body. To get a step closer, we have established a method to purify bacteria from blood cultures, i.e. directly from samples obtained from patients with bloodstream infections. We have characterized the protein expression profiles of bacteria purified directly from blood cultures of patients with sepsis arising from urinary tract infections, and have obtained interesting data on how the profiles look under conditions that resemble those in the human body. We also have interesting data from the main E. coli variants that cause traveler's diarrhea and are working on a publication comparing the protein expression profiles of these strains. In addition, we have collected data from all of the main variants of E. coli and are working to complete the analysis of these findings. A main observation is that proteins encoded on plasmids (i.e. bacterial mini-chromosomes that can be transferred between bacteria) are often produced in large quantities by the bacteria. Future development of proteomic methods for microbiology will aim to establish better and simpler methods for characterizing the protein expression profiles of bacteria.
Such information could provide a good overview of sensitivity and resistance to antibiotics, indicate whether the bacteria have special pathogenic capabilities, and give detailed knowledge about how problematic bacteria spread in the community and in health institutions.
Escherichia coli is responsible for some of the most serious infections observed in medical units. The ongoing genomic sequencing effort has facilitated high-throughput, large-scale analysis of 30 sequenced E. coli strains, including laboratory strains, commensal strains, pathogens with different specializations and a multidrug-resistant specimen. Genomics alone has not provided a full understanding of the bacterial biology, which suggests that the acquisition of virulence and drug resistance is often multifactorial.
We will use high-resolution, highly accurate mass spectrometry to identify, characterize and quantify peptides from proteins present in extracts of E. coli samples. Our recent work shows that, with such an approach, we can identify 2200-2500 bacterial proteins per sample at a 1% false discovery rate in a single experiment, obtaining quantitative information for >95% of the identified proteins in a sample, with a dynamic range spanning 4-6 orders of magnitude.
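The text does not state how the 1% false discovery rate is estimated. One common approach in mass spectrometry is a target-decoy search, where the proportion of decoy hits among the accepted identifications serves as an estimate of the error rate. The sketch below assumes that approach; the function name, the (protein ID, score, decoy flag) tuple layout and the example scores are hypothetical.

```python
from typing import List, Tuple


def filter_at_fdr(
    hits: List[Tuple[str, float, bool]], max_fdr: float = 0.01
) -> List[str]:
    """Keep target identifications up to the deepest rank with FDR <= max_fdr.

    hits holds (protein_id, score, is_decoy) tuples; a higher score means
    a better match. FDR at each rank is estimated as decoys / targets
    among all hits at or above that rank.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked hits to accept
    for rank, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = rank
    # Report only the target (non-decoy) identifications.
    return [pid for pid, _, is_decoy in ranked[:cutoff] if not is_decoy]


# Hypothetical toy hits: three strong targets, one decoy, one weak target.
hits = [
    ("T0", 100.0, False),
    ("T1", 99.0, False),
    ("T2", 98.0, False),
    ("D0", 50.0, True),
    ("T_low", 10.0, False),
]
accepted = filter_at_fdr(hits, max_fdr=0.01)
```

With these toy scores the 1% threshold accepts T0, T1 and T2 and rejects everything ranked at or below the first decoy. Real pipelines typically refine this estimate, for example by handling tied scores and by separating peptide-level from protein-level error rates.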
Protein data for laboratory, commensal, and pathogenic strains of E. coli will be compared. Samples originating from similar strains will be clustered, and each sample will be analyzed individually while taking its cluster into consideration. While specific strains might develop independent mechanisms of virulence or drug resistance, certain key molecules might be differentially expressed in the strains of a given cluster. The analysis of data from individual strains can be used to generate data-driven hypotheses about the biology involved in virulence or the acquisition of drug resistance. The analysis of a cluster of strains might be useful for narrowing down potential targets for vaccines or therapy. This is an important feature, since it has been demonstrated in certain disease models that different strains induce differential responses during infection in vaccinated or treated hosts.
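The clustering method is not specified above. As one possible illustration, strains can be grouped by the overlap of their identified protein sets, linking any two strains whose Jaccard similarity reaches a chosen threshold. The sketch below makes exactly those assumptions; the function names, the threshold of 0.5 and the toy profiles are hypothetical, and a real analysis would more likely work from quantitative abundances.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared proteins divided by all proteins seen."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_strains(
    profiles: Dict[str, Set[str]], threshold: float = 0.5
) -> List[Set[str]]:
    """Group strains into connected components of the similarity graph."""
    parent = {strain: strain for strain in profiles}

    def find(s: str) -> str:
        # Follow parent links to the cluster representative.
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    # Merge every pair of strains that is similar enough.
    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for strain in profiles:
        clusters.setdefault(find(strain), set()).add(strain)
    return list(clusters.values())


# Hypothetical toy profiles, for illustration only.
profiles = {
    "lab_1": {"P1", "P2", "P3"},
    "lab_2": {"P1", "P2", "P3", "P4"},
    "path_1": {"P7", "P8", "P9"},
    "path_2": {"P7", "P8", "P10"},
}
clusters = cluster_strains(profiles, threshold=0.5)
```

Here the two laboratory-like profiles end up in one cluster and the two pathogen-like profiles in another. Proteins shared within a cluster but absent elsewhere would then be candidates for closer inspection.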