Let’s say you have just had an amazing dessert prepared by a friend, and you’d love to recreate the experience yourself. But instead of a recipe with standard measurements you can reproduce in your own kitchen, they tell you to pour a handful of sugar, half a glass of water, and three fork lengths of butter into a bowl. Get the portions wrong, and the result may not be so sweet. Baking becomes a science only when recipes and ingredient measurements are standardized. The same concept applies in genomic research labs.
Breakthrough research depends on replicable laboratory procedures and reliable analysis. Sample collection and testing should follow universally accepted protocols for the resulting data to be considered valid, and the analysis and interpretation of that data, no matter the lab of origin, must follow common metrics.
Standardization helps find variants
To measure results from one individual against those of a control group, or against individuals with similar genomic profiles, data analysis must be fully reproducible. Standardized data enables variant detection and identification, batch analysis, and genetic assessment, insights that benefit researchers today and clinicians and patients tomorrow.
It’s hard to discuss standardization without also discussing normalization. Normalization, also known as min-max scaling, rescales data values to fall between 0 and 1, based on the smallest and largest values observed. You normalize data when the features being compared sit on different scales. Min-max scaling, however, is highly sensitive to outliers: a single extreme value stretches the range and compresses every other value into a narrow band. And in genomic research, outliers are often the very data points researchers are looking for.
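To see the problem concretely, here is a minimal sketch in Python with NumPy; the coverage-depth numbers are invented for illustration and are not from any real dataset:

```python
import numpy as np

# Invented coverage-depth values; the last one is an extreme outlier --
# exactly the kind of signal a researcher might care about.
depths = np.array([28.0, 30.0, 31.0, 29.0, 32.0, 250.0])

def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Rescale values to [0, 1] using the observed minimum and maximum."""
    return (x - x.min()) / (x.max() - x.min())

print(min_max_scale(depths))
# roughly [0.0, 0.009, 0.0135, 0.0045, 0.018, 1.0]
# The five typical values are squashed into less than 2% of the range
# because the single outlier defines the maximum.
```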
Standardization, also called z-score normalization, transforms each value by subtracting the mean and dividing by the standard deviation, so the data ends up with a mean of zero and a standard deviation of one. Standardized data is not bound to a fixed range, so datasets from different sources can be compared on a common scale, and extreme values remain visible as large z-scores rather than being flattened, which makes relevant variants easier to identify.
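Applying the same sketch to the same invented numbers shows the difference: the typical values cluster near zero, while the outlier stays clearly separated instead of distorting everyone else’s scale.

```python
import numpy as np

depths = np.array([28.0, 30.0, 31.0, 29.0, 32.0, 250.0])

def z_score(x: np.ndarray) -> np.ndarray:
    """Standardize values: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()

print(z_score(depths))
# roughly [-0.47, -0.45, -0.43, -0.46, -0.42, 2.24]
# The ordinary values stay tightly grouped near zero, and the outlier
# stands out as the largest z-score -- preserved as a signal, not a distortion.
```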
Standardization supports universal learning
Consider what’s happening inside all of us at a molecular level. It took more than a decade to compile the data for the Human Genome Project. In the years since, we’ve started to see how genomic ingredients create a unique recipe for each person.
To understand cancer, inherited diseases, or even COVID-19, we must understand not only the recipes but also where they mutate. Medical research has advanced rapidly, driven in part by the urgency of the COVID-19 response. Labs use different machines for analysis, different sample sources, and different algorithms to discover new variants. But when data is standardized, peers working in labs across the globe can produce desperately needed solutions in record time.
Standardization is the future of data-driven medicine
To advance research into disease prevention, detection, and the spread of viruses, data must be standardized so that scientists in a European lab can easily share their work with colleagues in Asia or North America while observing applicable laws. Without standardized data, language barriers would be the least of our concerns when it comes to understanding the work of our international peers. Data standardization acts like a universal language that connects all research with one shared purpose: to eliminate corrupted or “bad” data and to preserve the work of thousands as a foundation for the future.
If you’d like to learn more about how SOPHiA GENETICS can help you create a more efficient workflow in your lab, you can contact us today.