
Identification of Significant Genes and Pathways Related to Lung Cancer via Statistical Methods

In the 21st century, cancer research, which integrates biology, genetics, cytology, and statistics, remains a highly active field. Since the last century, many researchers have worked in this field through clinical observation and theoretical deduction. Among the many lines of research, the genetic approach, a relatively new way of studying the causes and prevention of cancer, has begun to show its potential. Over the past decades, large-scale research projects have been launched, but they have faced certain challenges.

Researchers often have to deal with tens of thousands of genes but only a relatively small sample of patient cases, a dilemma referred to as the “curse of dimensionality”: the data are sparse in high-dimensional space, which makes it hard to learn from them reliably. To address this dilemma, this study used the p-values of individual genes as the input to pathway enrichment in order to find statistically significant pathways. The aim of the study was to identify genes and biological pathways significantly related to lung cancer using statistical methods and pathway enrichment analysis.
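The step from per-gene p-values to significant pathways can be sketched as an over-representation test. This is a minimal illustration, not the study's actual procedure: the abstract does not say which enrichment statistic or tool was used, so the hypergeometric test, the 0.05 cutoff, and all gene and pathway names below are assumptions made for the example.

```python
from math import comb


def enrichment_p_value(universe_size, n_significant, pathway_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of significant genes that land in a pathway of
    `pathway_size` genes drawn from a universe of `universe_size` genes,
    of which `n_significant` are significant.
    """
    total = comb(universe_size, pathway_size)
    upper = min(pathway_size, n_significant)
    tail = sum(
        comb(n_significant, i)
        * comb(universe_size - n_significant, pathway_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def enrich_pathways(gene_p_values, pathways, alpha=0.05):
    """Return {pathway name: enrichment p-value} for each pathway."""
    universe = set(gene_p_values)
    # Genes whose individual p-value passes the (assumed) cutoff.
    significant = {g for g, p in gene_p_values.items() if p < alpha}
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe  # ignore genes that were not measured
        if not members:
            continue
        overlap = len(members & significant)
        results[name] = enrichment_p_value(
            len(universe), len(significant), len(members), overlap
        )
    return results


# Hypothetical toy input: ten genes, three of them individually significant.
toy_gene_p_values = {f"g{i}": (0.01 if i <= 3 else 0.5) for i in range(1, 11)}
toy_pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4"],
    "pathway_B": ["g5", "g6", "g7", "g8"],
}
for name, p in enrich_pathways(toy_gene_p_values, toy_pathways).items():
    print(name, round(p, 4))
```

Testing pathways rather than single genes reduces the number of hypotheses from tens of thousands to a few hundred, which is one way such an analysis softens the dimensionality problem. In practice the pathway p-values would still need a multiple-testing correction.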

The dataset was retrieved from Gene Set Enrichment Analysis (GSEA), which collected the data in collaboration with the National Cancer Institute, the National Institutes of Health, the National Institute of General Medical Sciences, and others. Normalized RNA sequencing data from 868 lung cancer patients were analyzed together with the patients’ clinical observations. Two clinical variables were chosen as regression outcomes: recurrence-free survival (RFS) and whether the patient had a new tumor event after initial treatment (cancer recurrence). Two main statistical methods, linear regression and logistic regression, were used in this paper, along with gene set pathway enrichment analysis.
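A per-gene version of the two regressions might look like the sketch below: linear regression for a continuous outcome such as RFS, and logistic regression for a binary outcome such as recurrence, each producing one p-value per gene. The abstract does not give the model details, so several things here are assumptions: one gene per model with no covariates, a normal approximation for the p-values (reasonable for hundreds of samples, crude for the toy data), and hypothetical function and variable names throughout.

```python
from math import exp, sqrt
from statistics import NormalDist, fmean


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def linear_gene_test(expression, outcome):
    """Least-squares fit of a continuous outcome on one gene: (slope, p)."""
    n = len(expression)
    mx, my = fmean(expression), fmean(outcome)
    sxx = sum((x - mx) ** 2 for x in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expression, outcome))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(expression, outcome))
    se = sqrt(rss / (n - 2) / sxx)  # zero if the fit is perfect
    return slope, two_sided_p(slope / se)


def logistic_gene_test(expression, event, iterations=25):
    """Newton's method fit of a 0/1 outcome on one gene: (slope, Wald p).

    Assumes the classes are not perfectly separated by the gene.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(expression, event):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se = sqrt(h00 / det)  # slope variance from the inverse information matrix
    return b1, two_sided_p(b1 / se)


# Hypothetical toy data: two genes measured in eight patients.
toy_expression = {
    "gene_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "gene_b": [5, 3, 6, 2, 7, 4, 8, 1],
}
toy_rfs = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.9, 16.1]  # continuous outcome
toy_recurrence = [0, 0, 1, 0, 1, 0, 1, 1]              # binary outcome

for gene, values in toy_expression.items():
    _, p_linear = linear_gene_test(values, toy_rfs)
    _, p_logistic = logistic_gene_test(values, toy_recurrence)
    print(gene, p_linear, p_logistic)
```

Running one small model per gene keeps each fit well-posed even when genes far outnumber patients. The resulting per-gene p-values are the kind of input a pathway enrichment step can then consume.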

The results showed that several genes, such as WNT2B and VAV2, and several pathways, such as Metabolism of xenobiotics by cytochrome P450-Homo sapiens (human) and Fatty acid degradation-Homo sapiens (human), were both statistically significant and supported by biological studies. Other significant genes, including TESK2, C5orf43, and ZSCAN21, and significant pathways, such as Pentose and glucuronate interconversions-Homo sapiens (human), were identified as new cancer-related genes and pathways.

Overall, this study is one of the many steps that humanity is taking to understand and ultimately defeat cancer. As statistics, biology, and genetics continue to advance, a world without cancer draws steadily closer.

Article by Yuhang Wu, from Cranbrook Educational Community, Bloomfield Hills, MI, USA.

