OPTIMIZED ALGORITHMS FOR DATA ANALYSIS IN PARALLEL DATABASE SYSTEMS

dc.contributor.advisor: Ordonez, Carlos
dc.contributor.committeeMember: Gabriel, Edgar
dc.contributor.committeeMember: Gnawali, Omprakash
dc.contributor.committeeMember: Han, Zhu
dc.creator: Cabrera, Wellington 1969-
dc.date.accessioned: 2017-06-30T20:51:43Z
dc.date.available: 2017-06-30T20:51:43Z
dc.date.created: May 2017
dc.date.issued: 2017-05
dc.date.submitted: May 2017
dc.date.updated: 2017-06-30T20:51:44Z
dc.description.abstract: Large data sets are generally stored on disk organized as rows, columns, or arrays, with row storage being the most common. Meanwhile, matrix multiplication is an important primitive operation in many machine learning algorithms. Since database management systems do not support matrix operations, analytical tasks are commonly performed outside the database system, in external libraries or mathematical tools. In this work, we optimize several analytic algorithms that benefit from fast in-database matrix multiplication. Specifically, we study how to compute parallel matrix multiplication inside the database to solve two major families of big data analytics problems: machine learning models and graph algorithms. We focus on three cases: the product of a matrix by its transpose, the powers of a square matrix, and iterated matrix-vector multiplication. Building on this foundation, we introduce important optimizations to the computation of fundamental linear models in machine learning: linear regression, variable selection, and principal components analysis. In addition, we present parallel graph algorithms that take advantage of matrix powers and parallel matrix-vector multiplication to solve several graph problems: transitive closure, all pairs shortest paths, reachability from a single source vertex, single source shortest paths, connected components, and PageRank.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: C. Ordonez, W. Cabrera, and A. Gurram. "Comparing columnar, row and array DBMSs to process recursive queries on graphs," Information Systems, 63 (2017): 66-79. https://doi.org/10.1016/j.is.2016.04.006
dc.identifier.uri: http://hdl.handle.net/10657/1854
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Algorithms
dc.subject: Graph summarization
dc.subject: Matrix model
dc.title: OPTIMIZED ALGORITHMS FOR DATA ANALYSIS IN PARALLEL DATABASE SYSTEMS
dc.type.dcmi: text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: CABRERA-DISSERTATION-2017.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.83 KB
Format: Plain Text