pg program in data science & data management systems
PG PROGRAM IN DATA SCIENCE & DATA MANAGEMENT SYSTEMS
FOR RECENT GRADUATES AND EARLY CAREER PROFESSIONALS
INTRODUCTION
Every business, from agriculture to technology, generates large volumes of data for every process it handles.
In the words of Peter Drucker, "What gets measured, gets managed."
The data generated by modern companies is an important asset that can be leveraged to make effective business decisions. Data Science enables businesses to draw meaningful insights from massive amounts of data. As organizations realise the importance of being data-driven in an increasingly digital world, there has been a sharp uptick in the demand for data scientists.
The curriculum and content for PGP-DSDMS are designed with practice through execution as the central idea. Throughout the program, you will work on 5+ hands-on projects using the tools taught in the lectures. Along with the projects, you will apply your learnings through assignments and quizzes.
The topics covered in this course can be categorized into Data Science, Data Visualization and Data Engineering, giving you well-rounded exposure to the most important Data Science concepts. The course is divided into 3 modules, each taking a comprehensive approach to the concepts covered, with the projects at the end of each course giving you a holistic problem-solving vision for Data Science problems.
Throughout the duration of this program, you will progress through the life-cycle of a Data Science project. Starting with Data Preparation and Data Wrangling, you will learn how to build relationships between datasets and automate data transformations. Further, you will learn how a data pipeline can be created and run on a cloud service like AWS. You will also learn the optimal techniques for Data Visualization and Dashboarding using Tableau.
The final leg of this course gives you the opportunity to choose your specialization based on your area of interest. Elective A, Big Data Engineering, uses PySpark and Hadoop to compute large volumes of data efficiently, while Elective B, Building Data Science Models, implements mathematical concepts like regression, classification, clustering, etc. to build solutions for business problems.
According to the U.S. Bureau of Labor Statistics, Data Science roles are expected to grow by more than 25% in the coming years. According to a recent survey conducted by Analytics Insight, there will be more than 3 million new job openings in Data Science worldwide, and the average Data Science salary is $110,000.
PROGRAM BENEFITS
PROGRAM STRUCTURE*
The program will be delivered in an online format through recorded lectures and more than 45 hours of online, live-mentored learning sessions. The mentored learning sessions are conducted by industry experts, who will help you gain industry exposure. You will also work on projects to get a simulated experience of the real challenges faced by a data scientist.
6 MONTHS
5 HANDS-ON PROJECTS
1 INTEGRATIVE CAPSTONE PROJECT
WEEKLY MENTORED LEARNING SESSIONS
*Refer to program fees for more information.
MOST SOUGHT-AFTER DATA ENGINEERING AND DATA ANALYTICS TOOLS
MENTORED LEARNING SESSIONS WITH EXPERTS
CASE STUDIES FOCUSING ON REAL WORLD BUSINESS SCENARIOS
18 WEEKLY MENTORED LEARNING SESSIONS
36+ ASSIGNMENTS AND QUIZZES
18 PRACTICE EXERCISES
LAB SESSIONS AND PROJECTS FOR HANDS-ON EXPERIENCE
CAPSTONE PROJECT TO CONSOLIDATE YOUR LEARNINGS THROUGHOUT THE PROGRAM
CERTIFICATE FROM THE UNIVERSITY OF TEXAS AT AUSTIN
WHO IS THIS PROGRAM FOR?
The PG Program in Data Science and Data Management Systems is designed to include practical, hands-on skills across various tools and technologies that are in high demand and are critical for young career professionals to land their first job in the Data Science domain. The program takes you through the entire Data Science value chain, which includes:
• Data Management
• Data Extraction
• Exploratory Data Analysis
• Data Visualization
• Leveraging Cloud Infrastructure
• Building Data Science Models
• Orchestrating Data Pipelines
Our learners in the DSDMS program are early career professionals, with more than 85% of the cohort having less than 2 years of experience. They come from varied backgrounds like Information Technology, Banking, Pharma, Consulting and Research, with the drive to transform their careers in step with the rapidly growing Data Science industry.
Industry Trends
$349 Billion global spending on Data Science in 2025.
Source: IDC Spending Guide
$110K average salary for Data Science roles.
Source: Glassdoor
11.5 Million new jobs for Data Science professionals.
Source: US Bureau of Labor Statistics
28% annual growth in Data Science jobs by 2026.
Source: US Bureau of Labor Statistics
Industry Growth
Hiring Companies
Source: Indeed
YOUR PERSONAL CAREER SUCCESS TEAM
The PGP in Data Science and Data Management Systems is dedicated to ensuring the success of all participants, even beyond the lectures and curriculum. With this program, you will get access to GL Excelerate - a career support program, exclusive to our PG Program learners.
• Career Sessions - Personal interactions with industry professionals and access to Career Workshops to gain valuable insights and guidance.
• Resume & LinkedIn Profile Review - An expert will review your resume and LinkedIn profile, and help you build them so that you can achieve your career goals.
• Interview Preparation - Get an insider’s perspective to understand what recruiters look for when hiring for Data Engineering and Data Analytics roles.
Apart from this, you will also have access to a Career Workshop, where you will receive guidance on evaluating job opportunities, identifying your strengths and weaknesses and preparing your elevator pitch for prospective employers.
You will also get a chance to appear for a Mock Interview, where you will get an opportunity to understand the expectations of recruiters, and receive personalized feedback on your performance.
With these tools, the PG Program in Data Science & Data Management Systems enables you to take the right steps when it comes to your professional growth and career development.
CERTIFICATE
The University of Texas at Austin
Conferred to attest that JOHN SMITH has successfully completed the Post Graduate Program in Data Science and Data Management Systems
June 2020
Gaylen Paulson, Associate Dean and Executive Director, Texas Executive Education
Kumar Muthuraman, Ph.D., Faculty Director, Data Science and Data Management Systems, Texas Executive Education
All certificate images are for illustrative purposes only. The actual certificate may be subject to change at the discretion of the university.
Hands-on practice sessions using Popular Industry Tools
and more..
COURSE CURRICULUM
COURSE 0: PRE-WORK
You will learn all the essentials of Computer Science, Programming and Statistics to build a strong foundation before you start your learning journey.
ESSENTIALS OF COMPUTER SCIENCE
• Hardware
• OS
• Data Structures & Algorithms
• Programming
PYTHON FUNDAMENTALS
• Setup
• Variables
• Data Types
• Operators
• Functions
• Loops
• OOPS
LINUX FUNDAMENTALS
• Basics of OS
• Protocols and Networking
• Basic Linux Commands
VERSION CONTROL
• Introduction to Git
• Features of Git
• Basic Commands
• GitHub
DESCRIPTIVE STATISTICS
• Measures of Central Tendency
• Measures of Dispersion
MODULE 1
PYTHON & SQL FOR DATA MANAGEMENT (10 WEEKS)
COURSE 1: PYTHON FOR DATA SYSTEMS (6 WEEKS)
In this course, you will learn how to connect to, extract and aggregate data from various data sources, clean it, perform Exploratory Data Analysis, and derive meaningful insights using Python.
DATA PREPARATION
• Data Connection and Data Read
• Data Formatting
• Missing Value Treatment
• Dataframe Operations
EXPLORATORY DATA ANALYSIS
• Graphs and Plots
• Univariate and Bivariate Analysis
• Correlation
WRANGLING UNSTRUCTURED DATA
• Web Scraping
• Data Cleaning
• Exception Handling
PROJECT 1
COURSE 2: SQL AND DATABASES (4 WEEKS)
In this course, you will learn how to query data in RDBMS and NoSQL databases, design schemas and relationships between tables, and automate data transformations in a database using Stored Procedures.
INTRODUCTION TO DBMS AND FUNDAMENTALS OF SQL
• Querying on SQL
• Functions
• Window Functions
DATA MODELING AND ARCHITECTURE
• ER Diagrams
• Schema Models
• Stored Procedures
• Views
NOSQL DATABASES
• File Formats and Comparison
• Introduction to MongoDB
• SQL Operations
PROJECT 2
MODULE 2
DATA ANALYTICS & AUTOMATION (9 WEEKS)
COURSE 3: DATA ANALYTICS ON CLOUD (6 WEEKS)
In this course, you will learn how to orchestrate a data pipeline and navigate the AWS cloud infrastructure, leveraging big data services to define and solve a business problem end-to-end: from gathering data requirements, to identifying drivers by formulating hypotheses, to presenting the insights in a Markdown document.
INTRODUCTION TO CLOUD INFRASTRUCTURE
• Cloud9-IDE
• Cluster Compute Services
• Storage and Databases
AIRFLOW FOR DATA PIPELINE MANAGEMENT - PART 1
• Data Orchestration
• DAG
• Code Architecture
• UI
FOUNDATIONS OF STATISTICS
• Inferential Statistics
• Distributions
• Sampling
• CLT
• A/B Testing
HYPOTHESIS TESTING
• Interpreting p-values
• Errors
• Parametric Tests: t-Test and Chi-Square Test
PROJECT 3
COURSE 4: DATA VISUALIZATION USING TABLEAU (3 WEEKS)
In this course, you will learn how to tell stories using data and create stunning dashboards with relevant visualizations to meet business needs using Tableau.
DATA, STORIES AND DASHBOARDING
• Visual Analytics
• Design Principles
TABLEAU - A BI TOOL
• Architecture
• Data Preparation
• Calculations
• Actions
• Performance Optimization
PROJECT 4
MODULE 3
SPECIALIZATION (8 WEEKS)
COURSE 5 (4 WEEKS)
ELECTIVE A
BIG DATA ENGINEERING
Learn how to navigate and build solutions on the cloud by leveraging the Hadoop Ecosystem and use PySpark to compute huge volumes of data efficiently.
INTRO TO HADOOP AND BIG DATA ECOSYSTEM
• HDFS
• YARN
• SQOOP
• HIVE Fundamentals
DATA PROCESSING USING SPARK
• Hadoop vs Spark
• Spark Architecture
• Launch Modes
• RDDs
DATAFRAMES WITH SPARK SQL
• Dataframes
• Resource Allocation
• Partitioning
• Persistence
SPARK JOB OPTIMIZATION
• Memory Management
• Dynamic Allocation
• Compression
• Shuffle
PROJECT 5
ELECTIVE B
BUILDING DATA SCIENCE MODELS
Learn how to apply industry-relevant Data Science techniques such as Regression, Classification, Clustering, Dimensionality Reduction, etc. to solve real-world problems.
SUPERVISED LEARNING PT. 1
• SL vs USL
• Regression
• Evaluation Metrics
SUPERVISED LEARNING PT. 2
• Classification
• Linear vs Logistic
• Decision Trees
• Confusion Matrix
UNSUPERVISED LEARNING
• K-means
• K-modes
• K-prototype
• Elbow Curve
• Silhouette
MODEL TUNING
• Bias Variance Trade-off
• Underfitting vs Overfitting
• K-fold Validation
PROJECT 5
CAPSTONE PROJECT (4 WEEKS)
A comprehensive project that requires you to apply all the tools and techniques you have learnt as a part of this program. With expert assistance, you will learn how to solve and manage real-world Data Science problems.
This program also introduces you to advanced data science topics, which can be learnt at your own pace. These topics will bolster your understanding of Data Science, and will give you a competitive edge when applying for jobs and appearing for interviews.
COURSE 6: MODEL DEPLOYMENT (SELF-PACED)
In this course, you will learn model deployment techniques and make your model scalable, robust, and reproducible.
1. Model Deployment: Flask, Amazon SageMaker
2. Containerization using Docker: Productionalization
3. Container Orchestration: Kubernetes
COURSE 7: CORE STATISTICS (SELF-PACED)
In this course, you will learn how to perform a variety of statistical tests and the math behind them.
1. Tests for Normality: Shapiro-Wilk Test, Anderson-Darling Test, D’Agostino’s K² Test
2. Parametric Tests: ANOVA, ANCOVA, Paired Student’s t-Test
3. Non-Parametric Tests: Mann-Whitney U Test, Wilcoxon Signed-Rank Test, Kruskal-Wallis H Test
4. Tests for Correlation: Pearson’s Correlation Test, Spearman’s Rank Correlation Test
PROGRAM FEES
The Post Graduate Program in Data Science and Data Management Systems has a series of modules designed to help you build your Data Science career.
Each module builds on the previous one and gives you a deeper understanding of the technologies prevalent in Data Science. You cannot begin an advanced module without completing the previous modules.
You have the option to select between 3 learning paths, each designed to kick-start your career in Data Science. With Unit I - Data Management Fundamentals, this course gives you an option to start your learning journey at just USD 750. Get in touch with your Program Advisor to learn more about the Units and how you can benefit from each of them.
PROGRAM FEE - USD 2500
You need to successfully complete all 3 modules and the Capstone Project at the end of Module 3 to receive a Certificate from McCombs School of Business.
Unit I
Data Management Fundamentals
USD 750
Duration - 8 Weeks
Access to Career Workshops and Webinars
OR
Unit II
Advanced Data Management Systems
USD 1500 USD 1750
Duration - 17 Weeks
Career Workshops + Resume & LinkedIn Review by Professionals
OR
Program
Data Science and Data Management Systems
USD 2500 USD 3000
Duration - 26 Weeks
Career Workshops + Resume & LinkedIn Review + 1:1 Career Sessions by Industry Experts
COURSE PROJECTS
Here are a few sample projects to give you a glimpse into the program:
MOVIELENS DATA EXPLORATION
Industry: Entertainment
Summary: The GroupLens Research Project is a research group in the Department of Computer Science and Engineering at the University of Minnesota. In this project, you will perform exploratory data analysis to understand the popularity trends of movie genres and derive patterns in movie viewership.
Tools & Concepts
PERSONAL LOAN CAMPAIGN
Industry: Banking
Summary: You will build a model that helps to identify potential customers of a bank who have a higher probability of purchasing a loan.
Tools & Concepts
CALL DROP ANALYSIS
Industry: Telecom
Summary: This project involves identifying the major reasons for call drops at a telecom company. A large volume of call record data is analysed using big data technologies to identify the reasons and provide recommendations to improve the telecom services offered to customers.
Tools & Concepts
Supervised Learning, etc.
TAXI DEMAND PREDICTION
Industry: Transportation
Summary: Understanding taxi supply and commuter demand, especially the imbalance between the supply and the demand, would directly help to improve the quality of taxi service and eventually increase a city’s traffic system efficiency. As part of this project, you will use Python & Big Data tools to analyze the demand for taxis during specific times of the day and also under specific weather conditions.
Tools & Concepts
E-PORTFOLIO
SHOWCASE YOUR SKILLS WITH AN E-PORTFOLIO
The E-Portfolio summarizes all the projects you will undertake and tools you will learn during the program, helping you to stand out from other applicants in the highly competitive Data Science industry.
View Sample E-Portfolio here
PROGRAM FACULTY
Kumar Muthuraman, H. Timothy (Tim) Harkins Centennial Professor, Faculty Director, Center for Research and Analytics, McCombs, University of Texas at Austin, M.S. & Ph.D., Stanford University
Dan Mitchell, Assistant Professor, McCombs School of Business, Ph.D., The University of Texas at Austin
Ashish Agarwal, Assistant Professor, McCombs School of Business, Ph.D., Tepper School of Business, Carnegie Mellon University
MENTORS
BECOME INDUSTRY-READY WITH LIVE MENTORSHIP
Along with strong theoretical foundations, hands-on learning goes a long way in preparing you to solve real-world business problems. As you work on real-life projects, you will receive personalised live mentorship every weekend from industry experts in Data Engineering and Data Analytics domains.
Ali Soleymani - Lead Data Scientist at Task Resource Ltd.
Hossein Kalbasi - Data Engineer at Concured
Mohammad Amini - Data & Applied Scientist II at Microsoft
ADMISSION PROCESS
Admissions are conducted on a rolling basis, and the admission process closes once the requisite number of candidates has been enrolled in the program.
1. Fill out a simple online application form.
2. Wait for the admission committee & faculty panel to review your application.
3. If selected, you will receive a letter of admission for the upcoming cohort.
READY TO ADVANCE YOUR CAREER?
Have questions about the program or how it fits in with your career goals?
SPEAK TO A PROGRAM ADVISOR
+1 512 559 1644
email: [email protected]
APPLY NOW