Data Science Course
- 184 Hours of Intensive Classroom & Online Sessions
- 2 Capstone Live Projects
- Job Placement Assistance
Datamaze Certified Data Science Program. This Data Science course using Python and R follows the CRISP-DM project management methodology and includes all the preliminary introductions needed. Students will work with Plots, Inferential Statistics, and Probability Distributions in this course. The core modules commence with a focus on Hypothesis Testing and the four must-know hypothesis tests. Unsupervised Data Mining, Recommendation Engines, and Network Analytics, along with various Machine Learning algorithms, Text Mining, Natural Language Processing, Naive Bayes, the Perceptron, and the Multilayer Perceptron, are dealt with in detail in the course.
Apply Now
Data Science Certification Programme Overview
This Data Science course using Python and R follows the CRISP-DM project management methodology and opens with a preliminary introduction to it. Data Science is 90% statistical analysis, so it is only fair that the opening modules bear an introduction to Statistical Data Business Intelligence and Data Visualization techniques. Students will work with Plots, Inferential Statistics, and various Probability Distributions in these modules, with a brief exposition on Exploratory Data Analysis / Descriptive Analytics in between. The core modules commence with a focus on Hypothesis Testing and the four must-know hypothesis tests. Supervised Data Mining, using Linear Regression and Ordinary Least Squares (OLS), is covered in the succeeding modules, and the use of Multiple Linear Regression to build prediction models is elaborated. The theory behind Lasso and Ridge Regression, Logistic Regression, Multinomial Regression, and Advanced Regression for Count Data is discussed in the subsequent modules.
A separate module is devoted to Unsupervised Data Mining, where the techniques of Clustering, Dimension Reduction, and Association Rules are elaborated. The nitty-gritty of Recommendation Engines and Network Analytics is detailed in the following modules. Various Machine Learning algorithms follow next, such as the k-NN Classifier, Decision Tree and Random Forest, Ensemble Techniques, Bagging and Boosting, AdaBoost, and Extreme Gradient Boosting. Text Mining, Natural Language Processing, Naive Bayes, the Perceptron, and the Multilayer Perceptron are the focal points of the succeeding modules.
The fundamentals of Artificial Neural Networks (ANN) and Deep Learning black-box techniques such as CNN, RNN, and SVM also feature prominently. The concluding modules cover model-driven and data-driven algorithms for Forecasting and Time Series Analysis.
What is Data Science?
Data Science is an amalgam of methods drawn from Statistics, Data Analysis, and Machine Learning, used to extract and analyze huge volumes of structured and unstructured data.
Who is a Data Scientist?
A Data Scientist is a researcher who has to prepare huge volumes of big data for analysis, build complex quantitative algorithms to organize and synthesize the information, and present the findings with compelling visualizations to senior management.
A Data Scientist enhances business decision making by introducing greater speed and better direction to the entire process.
A Data Scientist must love playing with numbers and figures. A strong analytical mindset coupled with sound industry knowledge is the most desired skill set in a Data Scientist, along with above-average communication skills and the ability to explain technical concepts to non-technical people. Data Scientists need a strong foundation in Statistics, Mathematics, Linear Algebra, Computer Programming, Data Warehousing, Data Mining, and modeling to build winning algorithms.
They must be proficient in tools such as Python, R, R Studio, Hadoop, MapReduce, Apache Spark, Apache Pig, Java, NoSQL database, Cloud Computing, Tableau, and SAS.
Data Science Training Learning Outcomes
The Data Science course using Python and R commences with an introduction to Statistics, Probability, Python and R programming, and Exploratory Data Analysis. Participants will engage with the concepts of supervised Data Mining with Linear Regression and predictive modelling with Multiple Linear Regression techniques. Unsupervised Data Mining using Clustering, Dimension Reduction, and Association Rules is also dealt with in detail. A module is dedicated to scripting Machine Learning algorithms and enabling Deep Learning and Neural Networks with Black Box techniques and SVM. Learn to perform proactive Forecasting and Time Series Analysis with algorithms scripted in Python and R at the best data science training institute in India.
- Work with various data generation sources
- Perform Text Mining to generate Customer Sentiment Analysis
- Analyse structured and unstructured data using different tools and techniques
- Develop an understanding of Descriptive and Predictive Analytics
- Apply Data-driven, Machine Learning approaches for business decisions
- Build models for day-to-day applicability
- Perform Forecasting to take proactive business decisions
- Use Data Concepts to represent data for easy understanding
Data Science Certification Modules
- Introduction to Python Programming
- Installation of Python & Associated Packages
- Graphical User Interface
- Installation of Anaconda Python
- Setting Up Python Environment
- Data Types
- Operators in Python
- Arithmetic operators
- Relational operators
- Logical operators
- Assignment operators
- Bitwise operators
- Membership operators
- Identity operators
- Data structures
- Vectors
- Matrix
- Arrays
- Lists
- Tuple
- Sets
- String Representation
- Arithmetic Operators
- Boolean Values
- Dictionary
- Conditional Statements
- if statement
- if – else statement
- if – elif statement
- Nest if-else
- Multiple if
- Switch (match-case equivalent)
- Loops
- While loop
- For loop
- Range()
- Iterator and generator Introduction
- For – else
- Break
- Functions
- Purpose of a function
- Defining a function
- Calling a function
- Function parameter passing
- Formal arguments
- Actual arguments
- Positional arguments
- Keyword arguments
- Variable arguments
- Variable keyword arguments
- Use-Case *args, **kwargs
- Function call stack
- Locals()
- Globals()
- Stackframe
- Modules
- Python Code Files
- Importing functions from another file
- __name__: Preventing unwanted code execution
- Importing from a folder
- Folders Vs Packages
- __init__.py
- Namespace
- __all__
- Import *
- Recursive imports
- File Handling
- Exception Handling
- Regular expressions
- OOPs (Object-Oriented Programming) concepts
- Classes and Objects
- Inheritance and Polymorphism
- Multi-Threading
- What is a Database
- Types of Databases
- DBMS vs RDBMS
- DBMS Architecture
- Normalization & Denormalization
- Install PostgreSQL
- Install MySQL
- Data Models
- DBMS Language
- ACID Properties in DBMS
- What is SQL
- SQL Data Types
- SQL commands
- SQL Operators
- SQL Keys
- SQL Joins
- GROUP BY, HAVING, ORDER BY
- Subqueries with select, insert, update, delete statements
- Views in SQL
- SQL Set Operations and Types
- SQL functions
- SQL Triggers
- Introduction to NoSQL Concepts
- SQL vs NoSQL
- Database connection SQL to Python
- All About 360DigiTMG & Innodatatics Inc., USA
- Dos and Don’ts as a participant
- Introduction to Big Data Analytics
- Data and its uses – a case study (Grocery store)
- Interactive marketing using data & IoT – A case study
- Course outline, road map, and takeaways from the course
- Stages of Analytics – Descriptive, Predictive, Prescriptive, etc.
- Cross-Industry Standard Process for Data Mining
- Typecasting
- Handling Duplicates
- Outlier Analysis/Treatment
- Zero or Near Zero Variance Features
- Missing Values
- Discretization / Binning / Grouping
- Encoding: Dummy Variable Creation
- Transformation
- Scaling: Standardization / Normalization
In this module, you will learn how to deal with data after collection. Learn to extract meaningful information from data by performing uni-variate analysis, the preliminary step in churning the data; this task is also called Descriptive Analytics or Exploratory Data Analysis. You are also introduced to the statistical calculations used to derive this information, along with visualizations that present it in graphs and plots (see the short Python sketch after the topic list below).
- Machine Learning project management methodology
- Data Collection – Surveys and Design of Experiments
- Data Types, namely Continuous, Discrete, Categorical, Count, Qualitative, and Quantitative, and their identification and application
- Further classification of data in terms of Nominal, Ordinal, Interval & Ratio types
- Balanced versus Imbalanced datasets
- Cross Sectional versus Time Series vs Panel / Longitudinal Data
- Batch Processing vs Real Time Processing
- Structured versus Unstructured vs Semi-Structured Data
- Big vs Not-Big Data
- Data Cleaning / Preparation – Outlier Analysis, Missing Values Imputation Techniques, Transformations, Normalization / Standardization, Discretization
- Sampling techniques for handling Balanced vs. Imbalanced Datasets
- What is the Sampling Funnel, its application, and its components?
- Population
- Sampling frame
- Simple random sampling
- Sample
- Measures of Central Tendency & Dispersion
- Population
- Mean/Average, Median, Mode
- Variance, Standard Deviation, Range
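A minimal Python sketch of this kind of univariate exploration, assuming a pandas DataFrame with a hypothetical numeric column named `sales` (the file and column names are illustrative, not part of the course material):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales_data.csv")        # illustrative file name

# Measures of central tendency
print(df["sales"].mean(), df["sales"].median(), df["sales"].mode()[0])

# Measures of dispersion
print(df["sales"].var(), df["sales"].std(), df["sales"].max() - df["sales"].min())

# Quick univariate visualizations
df["sales"].hist()                        # distribution shape
plt.show()
df.boxplot(column="sales")                # spread and potential outliers
plt.show()
```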
The raw data collected from different sources may have different formats, values, shapes, or characteristics. Cleansing, also called Data Preparation, Data Munging, or Data Wrangling, is the next step in the data handling stage. The objective of this stage is to transform the data into an easily consumable format for the next stages of development; a short Python sketch follows the feature engineering topics below.
- Feature Engineering on Numeric / Non-numeric Data
- Feature Extraction
- Feature Selection
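A minimal Python sketch of the cleansing and preparation steps above, assuming a pandas DataFrame with a hypothetical numeric column `age` and categorical column `city` (all names are illustrative):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("raw_data.csv")                      # illustrative file name
df = df.drop_duplicates()                             # handle duplicates

# Missing value imputation: median for numeric, mode for categorical
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Outlier treatment by capping at the 5th and 95th percentiles
low, high = df["age"].quantile([0.05, 0.95])
df["age"] = df["age"].clip(low, high)

# Encoding: dummy variable creation for the categorical feature
df = pd.get_dummies(df, columns=["city"], drop_first=True)

# Scaling: standardization of the numeric feature
df[["age"]] = StandardScaler().fit_transform(df[["age"]])
```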
- What is Power BI?
- Introduction to Power BI
- Overview of Power BI
- Architecture of Power BI
- Power BI and Plans
- Installation and introduction to Power BI
- Transforming Data using Power BI Desktop
- Importing data
- Changing Database
- Data Types in Power BI
- Basic Transformations
- Managing Query Groups
- Splitting Columns
- Changing Data Types
- Working with Dates
- Removing and Reordering Columns
- Conditional Columns
- Custom columns
- Connecting to Files in a Folder
- Merge Queries
- Query Dependency View
- Transforming Less Structured Data
- Query Parameters
- Column profiling
- Query Performance Analytics
- M-Language
- Data Optimization
- Derivatives
- Linear Algebra
- Matrix Operations
- Clustering 101
- Distance Metrics
- Hierarchical Clustering
- Non-Hierarchical Clustering
- DBSCAN
- Clustering Evaluation metrics
- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
- Association rules mining 101
- Measurement Metrics
- Support
- Confidence
- Lift
- User Based Collaborative Filtering
- Similarity Metrics
- Item Based Collaborative Filtering
- Search Based Methods
- SVD Method
The study of a network with quantifiable values is known as Network Analytics. The vertices and edges are the nodes and connections of a network; learn about the statistics used to calculate the value of each node in the network. You will also learn about the Google PageRank algorithm as part of this module (see the short Python sketch after the topic list below).
- Entities of a Network
- Properties of the Components of a Network
- Measure the value of a Network
- Community Detection Algorithms
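A minimal Python sketch using the networkx package; the toy edges are illustrative, not taken from the course material:

```python
import networkx as nx

# A small directed network: vertices are nodes, edges are connections
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")])

print(nx.degree_centrality(G))        # value of a node by its number of connections
print(nx.betweenness_centrality(G))   # value of a node as a bridge between others
print(nx.pagerank(G))                 # Google-style PageRank score per node
```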
Learn to analyse unstructured textual data to derive meaningful insights. Understand the quirks of language in order to perform data cleansing, extract features using a bag of words, and construct the document-term matrix known as a DTM. Learn to understand the sentiment of customers from their feedback and take appropriate action. Advanced concepts of text mining that help interpret the context of raw text data are also discussed. Topic models using the LDA algorithm and emotion mining using lexicons are discussed as part of the NLP module; a short Python sketch follows the topic list below.
- Sources of data
- Bag of words
- Pre-processing, Corpus, Document Term Matrix (DTM) & TDM
- Word Clouds
- Corpus-level word clouds
- Sentiment Analysis
- Positive Word clouds
- Negative word clouds
- Unigram, Bigram, Trigram
- Semantic network
- Extract user reviews of products/services from Amazon and tweets from Twitter
- Install Libraries from Shell
- Extraction and text analytics in Python
- LDA / Latent Dirichlet Allocation
- Topic Modelling
- Sentiment Extraction
- Lexicons & Emotion Mining
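A minimal Python sketch of building a bag-of-words Document Term Matrix with scikit-learn; the sample reviews are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "The product quality is great",
    "Delivery was slow and the packaging was damaged",
    "Great value and fast delivery",
]

# Bag of words with unigrams and bigrams; each row is a document, each column a term
vectorizer = CountVectorizer(stop_words="english", ngram_range=(1, 2))
dtm = vectorizer.fit_transform(reviews)

print(vectorizer.get_feature_names_out())
print(dtm.toarray())        # the DTM; its transpose is the Term Document Matrix (TDM)
```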
- Machine Learning primer
- Difference between Regression and Classification
- Evaluation Strategies
- Hyper Parameters
- Metrics
- Overfitting and Underfitting
- Probability – Recap
- Bayes Rule
- Naïve Bayes Classifier
- Text Classification using Naive Bayes
- Checking for Underfitting and Overfitting in Naive Bayes
- Generalization and Regularization Techniques to avoid overfitting in Naive Bayes
- Deciding the K value
- Rule of thumb for choosing the K value
- Building a KNN model by splitting the data
- Checking for Underfitting and Overfitting in KNN
- Generalization and Regularization Techniques to avoid overfitting in KNN
- Probability & Probability Distribution
- Continuous Probability Distribution / Probability Density Function
- Discrete Probability Distribution / Probability Mass Function
- Normal Distribution
- Standard Normal Distribution / Z distribution
- Z scores and the Z table
- QQ Plot / Quantile – Quantile plot
- Sampling Variation
- Central Limit Theorem
- Sample size calculator
- Confidence interval – concept
- Confidence interval with sigma
- T-distribution Table / Student’s-t distribution / T table
- Confidence interval
- Population parameter with Standard deviation known
- Population parameter with Standard deviation not known
Learn to frame business statements by making assumptions. Understand how to test these assumptions to make decisions for business problems. Learn about the different types of hypothesis tests and their test statistics. You will learn the different conditions of the hypothesis table, namely the Null Hypothesis, the Alternative Hypothesis, Type I error, and Type II error. The prerequisites for conducting a hypothesis test and the interpretation of its results are discussed in this module (see the short Python sketch after the list of tests below).
- Formulating a Hypothesis
- Choosing Null and Alternative Hypotheses
- Type I or Alpha Error and Type II or Beta Error
- Confidence Level, Significance Level, Power of Test
- Comparative study of sample proportions using Hypothesis testing
- 2 Sample t-test
- ANOVA
- 2 Proportion test
- Chi-Square test
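A minimal Python sketch of a 2-sample t-test with scipy; the sample values are illustrative:

```python
from scipy import stats

# Illustrative samples, e.g. billing times recorded at two store branches
branch_a = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]
branch_b = [12.9, 13.4, 12.8, 13.6, 13.1, 13.3]

# H0: the two population means are equal; Ha: they differ
t_stat, p_value = stats.ttest_ind(branch_a, branch_b)
print(t_stat, p_value)

# Compare the p-value with the chosen significance level (alpha = 0.05)
if p_value < 0.05:
    print("Reject H0: the means differ significantly")
else:
    print("Fail to reject H0")
```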
Supervised Data Mining is all about making predictions for an unknown dependent variable using mathematical equations that explain its relationship with independent variables. Revisit school math with the equation of a straight line. Learn about the components of Linear Regression through the equation of the regression line. Get introduced to Linear Regression analysis with a use case for predicting a continuous dependent variable, and understand the Ordinary Least Squares technique; a short Python sketch follows the topic list below.
- Scatter diagram
- Correlation analysis
- Correlation coefficient
- Ordinary least squares
- Principles of regression
- Simple Linear Regression
- Exponential Regression, Logarithmic Regression, Quadratic or Polynomial Regression
- Confidence Interval versus Prediction Interval
- Heteroscedasticity / Unequal Variance
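A minimal Python sketch of simple linear regression with statsmodels, assuming a DataFrame with hypothetical columns `ad_spend` (predictor) and `sales` (response):

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sales_data.csv")            # illustrative file name

X = sm.add_constant(df["ad_spend"])           # adds the intercept term
model = sm.OLS(df["sales"], X).fit()          # ordinary least squares fit

print(model.summary())                        # coefficients, R-squared, p-values
print(model.predict(X)[:5])                   # fitted values from the regression line
```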
Continuing the study of regression analysis, you will learn how to deal with multiple independent variables affecting the dependent variable. Learn about the conditions and assumptions required to perform linear regression analysis and the workarounds used to satisfy them. Understand the steps required to evaluate the model and improve its prediction accuracy. You will also be introduced to the concepts of variance and bias (see the short Python sketch after the topic list below).
- LINE assumption
- Linearity
- Independence
- Normality
- Equal Variance / Homoscedasticity
- Collinearity (Variance Inflation Factor)
- Multiple Linear Regression
- Model Quality metrics
- Deletion Diagnostics
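A minimal Python sketch of multiple linear regression and a collinearity check with statsmodels; the column names are illustrative:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("housing.csv")                          # illustrative file name
X = sm.add_constant(df[["area", "bedrooms", "age"]])

# Variance Inflation Factor: values above ~10 flag problematic collinearity
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))

# Multiple linear regression with all predictors
model = sm.OLS(df["price"], X).fit()
print(model.rsquared, model.rsquared_adj)                # model quality metrics
```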
You have learned about predicting a continuous dependent variable. In this module, you continue with regression techniques applied to predict attribute data. Learn about the principles of the logistic regression model, understand the sigmoid curve, and see how a cut-off value is used to interpret the probable outcome of the model. Learn about the confusion matrix and its parameters for evaluating the prediction model, and about maximum likelihood estimation; a short Python sketch follows the topic list below.
- Principles of Logistic regression
- Types of Logistic regression
- Assumption & Steps in Logistic regression
- Analysis of Simple logistic regression results
- Multiple Logistic regression
- Confusion matrix
- False Positive, False Negative
- True Positive, True Negative
- Sensitivity, Recall, Specificity, F1
- Receiver operating characteristics curve (ROC curve)
- Precision Recall (P-R) curve
- Lift charts and Gain charts
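A minimal Python sketch of logistic regression and its evaluation with scikit-learn; the synthetic data is for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LogisticRegression().fit(X_train, y_train)

# Probabilities from the sigmoid; 0.5 is used here as the cut-off value
proba = clf.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)

print(confusion_matrix(y_test, pred))       # true/false positives and negatives
print(roc_auc_score(y_test, proba))         # area under the ROC curve
```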
Learn about overfitting and underfitting in prediction models. We need to strike the right balance between the two; learn about the L1-norm and L2-norm regularization techniques used to reduce these conditions. The Lasso and Ridge regression techniques are discussed in this module (see the short Python sketch after the topic list below).
- Understanding Overfitting (Variance) vs. Underfitting (Bias)
- Generalization error and Regularization techniques
- Different Error functions, Loss functions, or Cost functions
- Lasso Regression
- Ridge Regression
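A minimal Python sketch of L1 (Lasso) and L2 (Ridge) regularized regression with scikit-learn; the synthetic data is for illustration only:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=20, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lasso = Lasso(alpha=1.0).fit(X_train, y_train)   # L1 penalty: can shrink coefficients to exactly zero
ridge = Ridge(alpha=1.0).fit(X_train, y_train)   # L2 penalty: shrinks coefficients toward zero

print("Lasso R^2:", lasso.score(X_test, y_test), "non-zero coefficients:", (lasso.coef_ != 0).sum())
print("Ridge R^2:", ridge.score(X_test, y_test))
```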
As an extension of logistic regression, Multinomial and Ordinal Logistic Regression techniques are used to predict multiple categorical outcomes. Understand the concept of multi-logit equations, the baseline category, and making classifications using probability outcomes. Learn to handle multiple categories in output variables, covering nominal as well as ordinal data; a short Python sketch follows the topic list below.
- Logit and Log-Likelihood
- Category Baselining
- Modeling Nominal categorical data
- Handling Ordinal Categorical Data
- Interpreting the results of coefficient values
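A minimal Python sketch of multinomial classification with scikit-learn's LogisticRegression on a three-class dataset; the iris data is used purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)               # three output categories

# With the default lbfgs solver, recent scikit-learn versions fit a multinomial logit
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.classes_)                             # the three category labels
print(clf.predict_proba(X[:3]))                 # probability of each category per observation
print(clf.predict(X[:3]))                       # predicted category
```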
In this module, you learn further regression techniques used for predicting discrete data. These techniques analyse numeric data known as count data. Based on discrete probability distributions, namely the Poisson and Negative Binomial distributions, the regression models try to fit the data to these distributions. When excessive zeros exist in the dependent variable, zero-inflated models are preferred instead; you will learn the types of zero-inflated models used to fit data with excessive zeros (see the short Python sketch after the topic list below).
- Poisson Regression
- Poisson Regression with Offset
- Negative Binomial Regression
- Treatment of data with Excessive Zeros
- Zero-inflated Poisson
- Zero-inflated Negative Binomial
- Hurdle Model
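A minimal Python sketch of Poisson and Negative Binomial regression for count data with statsmodels, assuming a hypothetical count response `claims` and predictor `age`:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("insurance.csv")               # illustrative file name
X = sm.add_constant(df[["age"]])

# Poisson regression: coefficients are on the log scale
poisson_model = sm.GLM(df["claims"], X, family=sm.families.Poisson()).fit()
print(poisson_model.summary())

# Negative Binomial regression for over-dispersed counts
nb_model = sm.GLM(df["claims"], X, family=sm.families.NegativeBinomial()).fit()
print(nb_model.summary())
```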
Support Vector Machines / Large-Margin / Max-Margin Classifier
- Hyperplanes
- Best Fit “boundary”
- Linear Support Vector Machine using Maximum Margin
- SVM for Noisy Data
- Non-Linear Space Classification
- Non-Linear Kernel Tricks
- Linear Kernel
- Polynomial
- Sigmoid
- Gaussian RBF
- SVM for Multi-Class Classification
- One vs. All
- One vs. One
- Directed Acyclic Graph (DAG) SVM
Survival analysis is about analysing the duration of time before an event occurs; the Kaplan-Meier method and life tables are used to estimate this time to event. Real-time applications of survival analysis in customer churn, medical sciences, and other sectors are discussed as part of this module. Learn how survival analysis techniques can be used to understand the effect of features on the event using the Kaplan-Meier survival plot; a short Python sketch follows the topic list below.
- Examples of Survival Analysis
- Time to event
- Censoring
- Survival, Hazard, and Cumulative Hazard Functions
- Introduction to Parametric and non-parametric functions
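A minimal Python sketch of a Kaplan-Meier estimate, assuming the lifelines package is installed; the durations and event flags are illustrative:

```python
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# Illustrative data: months until churn, and whether the event was observed (0 = censored)
durations = [5, 8, 12, 12, 15, 20, 24, 24, 30]
event_observed = [1, 1, 1, 0, 1, 0, 1, 1, 0]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=event_observed)

print(kmf.survival_function_)       # estimated survival probability over time
kmf.plot_survival_function()        # the Kaplan-Meier survival plot
plt.show()
```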
Decision Tree models are among the most powerful classifier algorithms, based on classification rules. In this module, you will learn how to derive the rules for classifying the dependent variable by constructing the best tree, using statistical measures to capture the information from each of the attributes (see the short Python sketch after the topic list below).
- Elements of classification tree – Root node, Child Node, Leaf Node, etc.
- Greedy algorithm
- Measure of Entropy
- Attribute selection using Information gain
- Decision Tree C5.0 and understanding various arguments
- Checking for Underfitting and Overfitting in Decision Tree
- Pruning – Pre and Post Prune techniques
- Generalization and Regularization Techniques to avoid overfitting in Decision Tree
- Random Forest and understanding various arguments
- Checking for Underfitting and Overfitting in Random Forest
- Generalization and Regularization Techniques to avoid overfitting in Random Forest
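A minimal Python sketch of a decision tree (with a random forest for comparison) using scikit-learn; the course content uses C5.0, so this entropy-based tree only illustrates the idea:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Attribute selection via information gain (entropy); max_depth acts as a pre-pruning control
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X_train, y_train)
print("Tree   train/test accuracy:", tree.score(X_train, y_train), tree.score(X_test, y_test))

# Random Forest: an ensemble of trees built on bootstrap samples
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Forest train/test accuracy:", forest.score(X_train, y_train), forest.score(X_test, y_test))
```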
Learn about improving the reliability and accuracy of decision tree models using ensemble techniques. Bagging and Boosting are the go-to ensemble techniques; the parallel and sequential approaches they take are discussed in this module. Random Forest is another ensemble technique, constructed from multiple decision trees, with the outcome drawn by aggregating the results of these trees. The boosting algorithms AdaBoost and Extreme Gradient Boosting are discussed in this continuation module, and you will also learn about stacking methods. These algorithms provide unprecedented accuracy and have helped many aspiring data scientists win first place in competitions such as Kaggle, CrowdAnalytix, etc.; a short Python sketch follows the topic list below.
- Overfitting
- Underfitting
- Voting
- Stacking
- Bagging
- Random Forest
- Boosting
- AdaBoost / Adaptive Boosting Algorithm
- Checking for Underfitting and Overfitting in AdaBoost
- Generalization and Regularization Techniques to avoid overfitting in AdaBoost
- Gradient Boosting Algorithm
- Checking for Underfitting and Overfitting in Gradient Boosting
- Generalization and Regularization Techniques to avoid overfitting in Gradient Boosting
- Extreme Gradient Boosting (XGB) Algorithm
- Checking for Underfitting and Overfitting in XGB
- Generalization and Regularization Techniques to avoid overfitting in XGB
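A minimal Python sketch of bagging and boosting ensembles with scikit-learn; XGBoost itself is a separate library, so GradientBoostingClassifier is used here only to illustrate the boosting idea:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Bagging: many trees trained in parallel on bootstrap samples, predictions aggregated by voting
bag = BaggingClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)

# Boosting: trees trained sequentially, each focusing on the errors of the previous ones
ada = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)

for name, model in [("Bagging", bag), ("AdaBoost", ada), ("Gradient Boosting", gbm)]:
    print(name, model.score(X_test, y_test))
```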
Time series analysis is performed on data collected with respect to time, where the response variable is affected by time. Understand the time series components, Level, Trend, Seasonality, and Noise, and the methods to identify them in time series data. The different forecasting methods available to estimate the response variable, depending on whether or not the past resembles the future, are introduced in this module. In this first module on forecasting, you will learn the application of model-based forecasting techniques (see the short Python sketch after the topic list below).
- Introduction to time series data
- Steps to forecasting
- Components of time series data
- Scatter plot and Time Plot
- Lag Plot
- ACF – Auto-Correlation Function / Correlogram
- Visualization principles
- Naïve forecast methods
- Errors in the forecast and their metrics – ME, MAD, MSE, RMSE, MPE, MAPE
- Model-Based approaches
- Linear Model
- Exponential Model
- Quadratic Model
- Additive Seasonality
- Multiplicative Seasonality
- Model-Based approaches Continued
- AR (Auto-Regressive) model for errors
- Random walk
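A minimal Python sketch of a model-based (linear trend) forecast with statsmodels, assuming a monthly series stored in a hypothetical column `demand`:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("monthly_demand.csv")            # illustrative file name
df["t"] = np.arange(1, len(df) + 1)               # time index for the trend term

# Linear trend model: demand = b0 + b1 * t
X = sm.add_constant(df[["t"]])
linear = sm.OLS(df["demand"], X).fit()

# Error metric on the fitted period (RMSE)
rmse = np.sqrt(((df["demand"] - linear.predict(X)) ** 2).mean())
print("RMSE:", rmse)

# Forecast the next 12 periods by extending the time index
future_X = sm.add_constant(pd.DataFrame({"t": np.arange(len(df) + 1, len(df) + 13)}))
print(linear.predict(future_X))
```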
In this continuation module on forecasting, learn about data-driven forecasting techniques. Learn about the ARMA and ARIMA models, which combine model-based and data-driven techniques. Understand the smoothing techniques and their variations. Get introduced to the concepts of de-trending and de-seasonalizing the data to make it stationary. You will also learn about seasonal index calculations, which are used to re-seasonalize the results obtained from smoothing models; a short Python sketch follows the topic list below.
- ARMA (Auto-Regressive Moving Average), Order p and q
- ARIMA (Auto-Regressive Integrated Moving Average), Order p, d, and q
- A data-driven approach to forecasting
- Smoothing techniques
- Moving Average
- Exponential Smoothing
- Holt’s / Double Exponential Smoothing
- Winters / Holt-Winters
- De-seasonalizing and de-trending
- Seasonal Indexes
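A minimal Python sketch of data-driven forecasting with Holt-Winters smoothing and ARIMA in statsmodels; the series name and the seasonal period of 12 are illustrative assumptions:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("monthly_demand.csv")        # illustrative file name
series = df["demand"]

# Holt-Winters: level + trend + additive seasonality (period 12 assumes monthly data)
hw = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=12).fit()
print(hw.forecast(12))                        # forecasts for the next 12 periods

# ARIMA(p, d, q): auto-regressive, differencing, and moving-average orders
arima = ARIMA(series, order=(1, 1, 1)).fit()
print(arima.forecast(12))
```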
- Neurons of a Biological Brain
- Artificial Neuron
- Perceptron
- Perceptron Algorithm
- Use case to classify a linearly separable data
- Multilayer Perceptron to handle non-linear data
- Integration functions
- Activation functions
- Weights
- Bias
- Learning Rate (eta) – Shrinking Learning Rate, Decay Parameters
- Error functions – Entropy, Binary Cross Entropy, Categorical Cross Entropy, KL Divergence, etc.
- Artificial Neural Networks
- ANN Structure
- Error Surface
- Gradient Descent Algorithm
- Backward Propagation
- Network Topology
- Principles of Gradient Descent (Manual Calculation)
- Learning Rate (eta)
- Batch Gradient Descent
- Stochastic Gradient Descent
- Minibatch Stochastic Gradient Descent
- Optimization Methods: Adagrad, Adadelta, RMSprop, Adam
- Convolution Neural Network (CNN)
- ImageNet Challenge – Winning Architectures
- Parameter Explosion with MLPs
- Convolution Networks
- Recurrent Neural Network
- Language Models
- Traditional Language Model
- Disadvantages of MLP
- Back Propagation Through Time
- Long Short-Term Memory (LSTM)
- Gated Recurrent Network (GRU)
Tools Covered
Why should you take this program?
- The Certified Data Science Program is offered in association with Future Skills Prime, accredited by NASSCOM and approved by the Government of India
- The curriculum is developed keeping in mind the trending tools and techniques that will make the student stand out in the hiring process.
- The learner will be able to earn a Joint Co-Branded Certificate of Participation from 360DigiTMG and Future Skills Prime
- The course is divided into different modules and each module gives students a thorough insight into all the important techniques that will make the learning process seamless and effective.
- 300 plus hours of online classes with capstone live project and 60 plus hours of assignments.
- The learner is eligible for Government of India (GOI) incentives after successfully clearing the mandatory Future Skills Prime Assessment. For more details, please visit: https://futureskillsPrime.in/govt-of-India-incentives.
- Learners will get access to multiple resources like NASSCOM Career Fair, NASSCOM industry events, Bootcamps, Career guidance sessions, etc.
- Learners will be eligible to apply for jobs and get job placement assistance through the Talent Connect Portal of Future Skills Prime.