Topics in Algorithms and Data Science
Introduction
Omid Etesami
Early Computer Science (according to John Hopcroft)
• CS in the 1960s: emphasis on programming languages, compilers, operating systems
• CS theory in the 1960s: finite automata, regular expressions, context-free languages, computability
• CS in the 1970s: making computers more useful for well-defined tasks
• CS theory in the 1970s: important addition of algorithms
Modern CS
• More focus on applications
• Merging of computing and communication
• More collected data in natural sciences, commerce, …
• Web, social networks
• Requires understanding data
Modern CS theory
Not only discrete mathematics, but also probability, statistics, and numerical methods
Textbook for the course
• Foundations of Data Science (draft of a new book as of May 2015) by Avrim Blum, John Hopcroft, Ravi Kannan
• We will cover the first four chapters.
• Available online: http://www.cs.cornell.edu/jeh/bookMay2015.pdf
Outline of the course
• Random graphs
• High-dimensional geometry
• Singular value decomposition
Random Graphs
• Models for web and social networks
• Simplest model: Erdős-Rényi random graph model
• Understanding global phenomena, such as the giant connected component, in terms of local choices
• Other models of random graphs: non-uniform models, growth models with or without preferential attachment, small-world graphs
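A minimal sketch of the "local choice, global phenomenon" idea, assuming the G(n, p) form of the Erdős-Rényi model (each possible edge is included independently with probability p). The function names, the seed, and the values of n and c are illustrative choices, not taken from the slides. Running it for p = c/n with c below and above 1 typically shows the largest component jump from a small size to a constant fraction of the vertices.

```python
import random
from collections import deque


def sample_gnp(n, p, rng):
    """Sample G(n, p): each of the n*(n-1)/2 possible edges appears independently with probability p."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # the purely local choice for edge {u, v}
                adj[u].append(v)
                adj[v].append(u)
    return adj


def largest_component_size(adj):
    """Return the number of vertices in the largest connected component (breadth-first search)."""
    seen = [False] * len(adj)
    best = 0
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        best = max(best, size)
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
    n = 1000                # illustrative size
    for c in (0.5, 1.0, 1.5, 2.0):
        adj = sample_gnp(n, c / n, rng)
        print(f"average degree ~{c}: largest component has {largest_component_size(adj)} vertices")
```

The quadratic edge loop keeps the sketch close to the definition; it is not meant for large n.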
Random graphs (continued)
• Random constraint satisfaction problems (like 3-SAT)
• Non-uniform random graphs and their relation to modern coding theory (like fountain codes)
Figure: 3-SAT solution space (height represents the number of unsatisfied constraints)
High-dimensional geometry
• Represent data with vectors of many components (e.g. in Search or Machine Learning)
• Intuition from two or three dimensions does not carry over to high dimensions!
Figures: sphere in 3 dimensions; stereographic projection of a sphere in 4 dimensions
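One standard illustration of how low-dimensional intuition fails: the volume of the unit ball in d dimensions, given by the closed form π^(d/2) / Γ(d/2 + 1), grows up to d = 5 and then shrinks toward zero. The sketch below only evaluates that formula; the function name and the list of dimensions are illustrative assumptions.

```python
import math


def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


if __name__ == "__main__":
    for d in (1, 2, 3, 5, 10, 20, 50):
        print(f"d = {d:2d}: volume = {unit_ball_volume(d):.6g}")
```

For d = 2 and d = 3 the formula reduces to the familiar π and 4π/3.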
Singular value decomposition (SVD)
• To deal with high-dimensional data, we need matrix algebra and matrix algorithms
• Singular value decomposition is an important tool
• Applications of SVD:
  • Principal Component Analysis
  • Clustering statistical mixtures of Gaussian probability densities
  • Discrete optimization like Max-CUT
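To make "matrix algorithms" concrete, here is a rough sketch of power iteration, one simple way to approximate the largest singular value and the corresponding right singular vector of a matrix. It is not necessarily the method the course uses. The function name, the iteration count, and the seeded random start are assumptions made for this example; in practice a library SVD routine would be the usual choice.

```python
import math
import random


def top_singular_value(a, iterations=200, seed=0):
    """Approximate the largest singular value of a (a list of rows) by power iteration on A^T A."""
    rows, cols = len(a), len(a[0])
    rng = random.Random(seed)
    # Random start: almost surely not orthogonal to the top right singular vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # u = A v
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u = (A^T A) v
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # v lies in the null space of A (e.g. A is the zero matrix)
            return 0.0, v
        v = [x / norm for x in w]
    # The singular value is the length of A v for the converged unit vector v.
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u)), v


if __name__ == "__main__":
    sigma, v = top_singular_value([[3.0, 0.0], [0.0, 1.0]])
    print(sigma, v)
```

Convergence is slow when the two largest singular values are close, which is one reason this is only a sketch.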
Grading
• Around 7 points for homework and quizzes.
• Around 5 points for midterm
• Around 8 points for final
• Additional points for presentation and project
Homework
• Late homework is NOT accepted. Prepare early.
• You can work on homework together, but you should acknowledge your collaborators, and your write-up should be your own. (If you do not acknowledge them, you can receive negative points.)
• If you use the internet, you should acknowledge your sources.
Prerequisites
• Probability, including problem-solving skills and basic inequalities
• Linear algebra, including eigenvalues and eigenvectors
• Asymptotic analysis of algorithms
• Basic discrete math, basic calculus
• Most importantly, mathematical maturity, such as being able to prove things rigorously.
A few teasers (reflecting the background you need for the course)
Sex bias in graduate admissions
• 8442 men applied (44% admitted)
• 4321 women applied (35% admitted)
• In each department:
  (women admitted) / (women who applied) >= (men admitted) / (men who applied)
Can this happen?
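It can: this is Simpson's paradox. A small sketch with two departments shows how. The numbers below are hypothetical, invented for this example, and are not the data behind the figures on the slide. Women are admitted at a higher rate in each department, yet at a lower rate overall, because most women apply to the department that admits few applicants.

```python
from fractions import Fraction

# Hypothetical numbers, NOT the data behind the slide: (applied, admitted) per department.
departments = {
    "A": {"men": (800, 480), "women": (100, 65)},   # department with a high admission rate
    "B": {"men": (200, 40), "women": (900, 225)},   # department with a low admission rate
}


def rate(applied, admitted):
    """Exact admission rate as a fraction."""
    return Fraction(admitted, applied)


def overall_rate(group):
    """Admission rate for a group pooled over all departments."""
    applied = sum(d[group][0] for d in departments.values())
    admitted = sum(d[group][1] for d in departments.values())
    return rate(applied, admitted)


if __name__ == "__main__":
    for name, d in departments.items():
        print(f"dept {name}: men {float(rate(*d['men'])):.0%}, women {float(rate(*d['women'])):.0%}")
    print(f"overall: men {float(overall_rate('men')):.0%}, women {float(overall_rate('women')):.0%}")
```

The pooled rate is a weighted average of the department rates, and the two groups use different weights.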
Generating a random permutation
for i = 1 to n
    j = uniformly random integer between 1 and n (inclusive)
    swap(x[i], x[j])
How can you prove the above algorithm does not generate a uniformly random permutation (for all n >= 3)?
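One possible argument is by counting: the algorithm makes n independent choices with n options each, so there are n^n equally likely executions. A uniform output would require each of the n! permutations to arise from exactly n^n / n! executions. For n >= 3 this is not an integer, since n - 1 >= 2 divides n! but shares no factor with n. The sketch below (0-indexed, with an illustrative function name) checks this exhaustively for small n.

```python
from collections import Counter
from itertools import product
from math import factorial


def naive_shuffle_outcomes(n):
    """Count how often each permutation results over all n**n equally likely choice sequences."""
    counts = Counter()
    for choices in product(range(n), repeat=n):
        x = list(range(n))
        for i, j in enumerate(choices):
            x[i], x[j] = x[j], x[i]  # same swap as the pseudocode, 0-indexed
        counts[tuple(x)] += 1
    return counts


if __name__ == "__main__":
    n = 3
    counts = naive_shuffle_outcomes(n)
    for perm, c in sorted(counts.items()):
        print(perm, c)
    print("n^n mod n! =", n ** n % factorial(n))
```

Restricting j to the range i..n (the Fisher-Yates shuffle) gives exactly n! executions, one per permutation.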
Matrix rank
Why is the number of linearly independent rows exactly equal to the number of linearly independent columns?
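One way to see it, sketched here under the assumption of a real m × n matrix (the same argument works over any field), is to factor A through a basis of its column space:

```latex
% Sketch: row rank equals column rank via a rank factorization.
Let $A \in \mathbb{R}^{m \times n}$ have column rank $r$, and let
$C \in \mathbb{R}^{m \times r}$ have columns forming a basis of the column space of $A$.
Each column of $A$ is a linear combination of the columns of $C$, so
\[
  A = C R \quad \text{for some } R \in \mathbb{R}^{r \times n}.
\]
Reading $A = CR$ row by row, every row of $A$ is a linear combination of the
$r$ rows of $R$, hence
\[
  \operatorname{rowrank}(A) \le r = \operatorname{colrank}(A).
\]
Applying the same argument to $A^{\mathsf{T}}$ gives
$\operatorname{colrank}(A) \le \operatorname{rowrank}(A)$, so the two ranks are equal.
```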
Volume of the sphere
Can you work out the volume of the sphere in 3 dimensions?
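One possible route, assuming "sphere" here means the solid ball of radius r, is to slice the ball into disks perpendicular to the x-axis; the disk at position x has radius sqrt(r^2 - x^2):

```latex
\[
  V = \int_{-r}^{r} \pi \left( r^2 - x^2 \right) dx
    = \pi \left[ r^2 x - \frac{x^3}{3} \right]_{-r}^{r}
    = \pi \left( 2 r^3 - \frac{2 r^3}{3} \right)
    = \frac{4}{3} \pi r^3 .
\]
```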