# Foundations of Data Science

Computer science as an academic discipline began in the 1960s. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context-free languages, and computability. In the 1970s, the study of algorithms was added as an important component of theory. The emphasis was on making computers useful. Today, a fundamental change is taking place and the focus is more on applications. There are many reasons for this change. The merging of computing and communications has played an important role. The enhanced ability to observe, collect, and store data in the natural sciences, in commerce, and in other fields calls for a change in our understanding of data and how to handle it in the modern setting. The emergence of the web and social networks as central aspects of daily life presents both opportunities and challenges for theory.

## Foundations of Data Science – Lecture 2

Modern data often consists of feature vectors with a large number of features. High-dimensional geometry and linear algebra (Singular Value Decomposition) are two of the crucial areas that form the mathematical foundations of data science. This mini-course covers these areas, providing intuition and rigorous proofs, and brings out connections between geometry and probability. Textbook: Foundations of Data Science.

## Foundations of Data Science – Lecture 1

## Foundations of Data Science – Lecture 3

## Foundations of Data Science – Lecture 4

## Foundations of Data Science – Lecture 5 – Length Squared Sampling in Matrices

## Foundations of Data Science – Lecture 6 – Singular Value Decomposition – I

## Foundations of Data Science – Lecture 7 – Singular Value Decomposition – II

## Foundations of Data Science – Lecture 8 – Low Rank Approximation (LRA) via Length Squared Sampling

## Foundations of Data Science – Lecture 9 – Two Applications of SVD

## Foundations of Data Science

The book is available and freely downloadable at https://www.cs.cornell.edu/jeh/book.pdf