Faculty Summaries
Karthik Devarajan, PhD
Associate Member & Assistant Professor
Office Phone: 215-728-2794
Fax: 215-728-2553
Office: R383
Statistical Methods in Bioinformatics

Advances in high-throughput technologies in the past decade have given rise to large-scale biological data measured on a variety of scales. Gene expression studies enable the simultaneous measurement of the expression profiles of tens of thousands of genes and proteins, often from only a handful of biological samples. Data is typically presented as a two-way numeric table in which the rows represent the genes, the columns represent the samples, and each entry is the expression level of a given gene in a given sample. The samples may represent a phenotype such as tissue type, experimental condition or time point. Traditionally these studies have used microarray technology to measure mRNA expression. More recently they have used SNP arrays to measure allele-specific expression and DNA copy number variation, methylation arrays to quantify DNA methylation, and next-generation sequencing technologies such as RNA-Seq and ChIP-Seq to measure digital gene expression. In addition, high-throughput compound and siRNA screening assays are specifically designed to detect interactions with compounds by directly measuring inhibition of siRNA or kinase activity.

These studies have resulted in massive amounts of data requiring analysis and interpretation, while offering tremendous potential for growth in our understanding of the pathophysiology of many diseases. The focus of my research is the development of novel statistical methodology for the analysis of data stemming from such high-throughput studies. It includes methods for dimension reduction and molecular pattern discovery, as well as methods for correlating a qualitative or quantitative outcome variable (tissue type, presence of disease, patient response to treatment, survival time etc.) with large numbers of covariates (genes, SNPs or sequence tags) based on supervised and unsupervised learning. My research activities center on the following two problems from statistical learning theory: nonnegative matrix factorization (NMF) and continuum regression (CR).
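As a rough illustration of what NMF does with a genes-by-samples table, the sketch below factors a nonnegative matrix V into two nonnegative factors W (genes x patterns) and H (patterns x samples). It uses the standard Lee-Seung multiplicative updates for squared-error loss, which is a textbook algorithm and not necessarily any of the methods developed in this research program. The function name, the toy data and the parameter values are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Approximate a nonnegative matrix V (genes x samples) by W @ H.

    Illustrative sketch only: Lee-Seung multiplicative updates for the
    squared-error (Frobenius) loss. W is genes x rank, H is rank x samples.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    # Random nonnegative starting values (hypothetical initialization choice).
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update rescales the current factor elementwise, so entries
        # that start nonnegative stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "expression" table: 50 genes, 6 samples, built from 2 hidden patterns.
rng = np.random.default_rng(1)
V = rng.random((50, 2)) @ rng.random((2, 6))

W, H = nmf_multiplicative(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this picture each row of H is a low-dimensional pattern across samples and each column of W gives the genes' weights on that pattern. Other loss functions, for example ones suited to count data, lead to different update rules.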

Selected Publications
  1. Devarajan, K., Wang, G., Ebrahimi, N. (2014). A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing, Machine Learning, 1-27. doi: 10.1007/s10994-014-5470-z. COBRA pre-print series, Article 80 (July 2011).
  2. Devarajan, K., Cheung, V.C.K. (2014). On non-negative matrix factorization algorithms for signal-dependent noise with application to electromyography data, Neural Computation, Jun;26(6):1128-68. Epub 2014 Mar 31. doi: 10.1162/NECO_a_00576.
  3. Devarajan, K., Ebrahimi, N. (2013). On penalized likelihood estimation for a non-proportional hazards regression model. Statistics and Probability Letters. 83, 1703-1710. NIHMS 462966.
  4. Devarajan, K., Ebrahimi, N. (2011). A semi-parametric generalization of the Cox proportional hazards regression model: Inference and Applications, Computational Statistics and Data Analysis, 55(1):667-76, doi:10.1016/j.csda.2010.06.010. PMCID: PMC2976538.
  5. Devarajan, K., Zhou, Y., Chachra, N., Ebrahimi, N. (2010). A supervised approach for predicting patient survival with gene expression data, Proceedings of the IEEE Tenth International Conference in Bioinformatics and Bioengineering, 2010(5521718):26-31. PMCID: PMC2941901.
  6. Devarajan, K. (2008). Non-negative matrix factorization: an analytical and interpretive tool in computational biology, PLoS Computational Biology, 4(7): e1000029. doi:10.1371/journal.pcbi.1000029. PMCID: PMC2447881.
  7. Wang, M., Mehta, A., Block, T.M., Marrero, J., Di Bisceglie, A.M., and Devarajan, K. (2013). A comparison of statistical methods for the detection of hepatocellular carcinoma based on serum biomarkers and clinical variables. BMC Medical Genomics 6(Suppl 3):S9 (11 November 2013).
  8. Anastassiadis, T., Deacon, S., Devarajan, K., Ma, H., Peterson, J. Comprehensive assay of kinase catalytic activity reveals features of kinase inhibitor selectivity. Nat. Biotechnol. 29(11):1039-1045, 2011. PMCID: PMC3230241.
  9. Cortellino, S., Xu, J., Sannai, M., Moore, R., Caretti, E., Cigliano, A., Le Coz, M., Devarajan, K., Wessels, A., Soprano, D., Abramowitz, L.K., Bartolomei, M.S., Rambow, F., Bassi, M.R., Bruno, T., Fanciulli, M., Renner, C., Klein-Szanto, A.J., Matsumoto, Y., Kobi, D., Davidson, I., Alberti, C., Larue, L., Bellacosa, A. Thymine DNA Glycosylase Is Essential for Active DNA Demethylation by Linked Deamination-Base Excision Repair (ch. 108), Cell. 2011 Jul 8;146(1):67-79. Epub 2011 Jun 30. PMCID: PMC3230223.
  10. Astsaturov, I., Ratushny, V., Sukhanova, A., Einarson, M.B., Bagnukova, T., Zhou, Y., Devarajan, K., Silverman, J.S., Tikhmyanova, N., Skobeleva, N., Pecherskaya, A., Sharma, C., Nasto, R., Jablonski, S., Serebriiskii, I., Weiner, L., Golemis, E. (2010). Synthetic lethal screen of an EGFR-centered network to improve targeted therapies, Science Signaling, 2010 Sep 21;3(140):ra67. PMCID: PMC2950064.
  11. Altomare, D.A., Vaslet, C.A., Skele, K.L., De Rienzo, A., Devarajan, K., McClatchey, A.I., Kane, A.B., Jhanwar, S.C., Testa, J.R. (2005). A Mouse Model Recapitulating Molecular Features of Human Mesothelioma, Priority Report, Cancer Research, 65 (18): 8090-8095.