Gene clustering at intensive science: comparison using biological stability index

Authors

  • Miguel Urgilés, Escuela Superior Politécnica de Chimborazo, Facultad de Ciencias, Grupo de Investigación Ciencia de Datos/Carrera de Estadística Informática, Riobamba, Ecuador
  • Michael Ulcuango, Escuela Superior Politécnica de Chimborazo, Facultad de Ciencias, Grupo de Investigación Ciencia de Datos/Carrera de Estadística Informática, Riobamba, Ecuador
  • Rubén Pazmiño, Escuela Superior Politécnica de Chimborazo, Facultad de Ciencias, Grupo de Investigación Ciencia de Datos/Carrera de Estadística Informática, Riobamba, Ecuador

DOI:

https://doi.org/10.47187/perf.v1i23.257

Keywords:

genes, biological indices, statistics, comparison, intensive science

Abstract

This research evaluates the performance of the best-known clustering algorithms using the biological stability index (BSI). The algorithms are compared to determine which is optimal according to the score each one obtains. The work addresses gene clustering within Intensive Science, which relies on extensive databases to cover almost all the results that could plausibly occur. The method is applied to a gene expression (microarray) database: the "mouse" data set included in the clValid package for R, which comes from a study of mouse mesenchymal cells (neural crest and mesoderm derived). Graphical methods, such as dendrograms, are used for a first approach. To select the optimal algorithm, the BSI was calculated for each clustering algorithm, the best being the one whose value is closest to one. By this criterion, the most stable algorithm for this database is "Diana". To reach this result, the number of clusters and the response obtained in each case were visualized graphically. The optimal algorithm was taken to be the one that most closely matches the reality of the problem, taking into account its scores on the indices and, as a final check, a phylogenetic graph.
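
The selection rule described above can be illustrated with a minimal sketch. It assumes the usual definition of the BSI given for the clValid package: cluster the genes once with all samples, once more with each sample (column) removed, and measure how consistently genes of the same functional class stay together, averaged over the classes. The article itself works in R with clValid; the Python code below is only an illustration, and the function names, the two toy "algorithms" and the four-gene data set are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Set

Labels = List[int]


def _members(labels: Labels) -> Dict[int, Set[int]]:
    """Map each cluster label to the set of gene indices it contains."""
    groups: Dict[int, Set[int]] = {}
    for gene, label in enumerate(labels):
        groups.setdefault(label, set()).add(gene)
    return groups


def bsi(full: Labels, reduced: List[Labels], classes: Dict[str, List[int]]) -> float:
    """Biological stability index in [0, 1]; larger means more stable.

    full    -- cluster labels obtained from all columns
    reduced -- one label list per removed column
    classes -- functional class name -> gene indices
    Assumption: classes with fewer than two genes are skipped.
    """
    full_sets = _members(full)
    reduced_sets = [_members(labels) for labels in reduced]
    total, used = 0.0, 0
    for genes in classes.values():
        n = len(genes)
        if n < 2:
            continue
        used += 1
        acc = 0.0
        for labels, sets in zip(reduced, reduced_sets):
            for i in genes:
                cluster_i = full_sets[full[i]]      # cluster of gene i, all data
                for j in genes:
                    if i != j:
                        cluster_j = sets[labels[j]]  # cluster of gene j, column removed
                        acc += len(cluster_i & cluster_j) / len(cluster_i)
        total += acc / (n * (n - 1) * len(reduced))
    if used == 0:
        raise ValueError("no functional class with at least two genes")
    return total / used


def bsi_for_algorithm(
    data: Sequence[Sequence[float]],
    cluster_fn: Callable[[List[List[float]]], Labels],
    classes: Dict[str, List[int]],
) -> float:
    """Run cluster_fn on the full data and on every leave-one-column-out copy."""
    rows = [list(row) for row in data]
    full = cluster_fn(rows)
    reduced = []
    for col in range(len(rows[0])):
        trimmed = [row[:col] + row[col + 1:] for row in rows]
        reduced.append(cluster_fn(trimmed))
    return bsi(full, reduced, classes)


# Hypothetical stand-ins for real clustering algorithms (two clusters each).
def row_mean_sign(rows: List[List[float]]) -> Labels:
    return [int(sum(r) / len(r) > 0) for r in rows]


def first_column_sign(rows: List[List[float]]) -> Labels:
    return [int(r[0] > 0) for r in rows]


# Toy expression matrix: 4 genes (rows) x 3 samples (columns), invented values.
data = [
    [2.0, 2.0, 2.0],
    [-0.5, 2.0, 2.0],
    [-2.0, -2.0, -2.0],
    [-1.0, -2.0, -2.0],
]
classes = {"up": [0, 1], "down": [2, 3]}  # invented functional classes

scores = {
    fn.__name__: bsi_for_algorithm(data, fn, classes)
    for fn in (row_mean_sign, first_column_sign)
}
best = max(scores, key=scores.get)  # the algorithm whose BSI is closest to one
print(scores, best)
```

In practice the toy functions would be replaced by real algorithms (hierarchical, k-means, PAM, DIANA and so on), and the comparison would be repeated for each candidate number of clusters, as the abstract describes.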


Published

2020-01-31

How to Cite

Urgilés, M., Ulcuango, M., & Pazmiño, R. (2020). Gene clustering at intensive science: comparison using biological stability index. Perfiles, 1(23), 12-19. https://doi.org/10.47187/perf.v1i23.257