Business analytics: the science of data-driven decision making
Kumar, U Dinesh
Business analytics: the science of data-driven decision making - New Delhi Wiley India Pvt. Ltd. 2017 - xxi, 714 p.
Table of Contents
Preface
Acknowledgments
1. Introduction to Business Analytics
1.1 Introduction to Business Analytics
1.2 Why Analytics
1.3 Business Analytics: The Science of Data-Driven Decision Making
1.4 Descriptive Analytics
1.5 Predictive Analytics
1.6 Prescriptive Analytics
1.7 Descriptive, Predictive and Prescriptive Analytics Techniques
1.8 Big Data Analytics
1.9 Web and Social Media Analytics
1.10 Machine Learning Algorithms
1.11 Framework for Data-Driven Decision Making
1.12 Analytics Capability Building
1.13 Roadmap for Analytics Capability Building
1.14 Challenges in Data-Driven Decision Making and Future
1.15 Organization of the Book
2. Descriptive Analytics
2.1 Introduction to Descriptive Analytics
2.2 Data Types and Scales
2.3 Types of Data Measurement Scales
2.4 Population and Sample
2.6 Percentile, Decile and Quartile
2.7 Measures of Variation
2.8 Measures of Shape − Skewness and Kurtosis
2.9 Data Visualization
3. Introduction to Probability
3.1 Introduction to Probability Theory
3.2 Probability Theory – Terminology
3.3 Fundamental Concepts in Probability – Axioms of Probability
3.4 Application of Simple Probability Rules – Association Rule Learning
3.5 Bayes’ Theorem
3.6 Random Variables
3.7 Probability Density Function (PDF) and Cumulative Distribution Function (CDF) of a Continuous Random Variable
3.8 Binomial Distribution
3.9 Poisson Distribution
3.10 Geometric Distribution
3.11 Parameters of Continuous Distributions
3.12 Uniform Distribution
3.13 Exponential Distribution
3.15 Chi-Square Distribution
3.16 Student’s t-Distribution
3.17 F-Distribution
4. Sampling and Estimation
4.1 Introduction to Sampling
4.2 Population Parameters and Sample Statistic
4.3 Sampling
4.4 Probabilistic Sampling
4.5 Non-Probability Sampling
4.6 Sampling Distribution
4.7 Central Limit Theorem (CLT)
4.8 Sample Size Estimation for Mean of the Population
4.9 Estimation of Population Parameters
4.10 Method of Moments
4.11 Estimation of Parameters Using Method of Moments
4.12 Estimation of Parameters Using Maximum Likelihood Estimation
5. Confidence Intervals
5.1 Introduction to Confidence Interval
5.2 Confidence Interval for Population Mean
5.3 Confidence Interval for Population Proportion
5.4 Confidence Interval for Population Mean When Standard Deviation is Unknown
5.5 Confidence Interval for Population Variance
6. Hypothesis Testing
6.1 Introduction to Hypothesis Testing
6.2 Setting Up a Hypothesis Test
6.3 One-Tailed and Two-Tailed Test
6.4 Type I Error, Type II Error and Power of the Hypothesis Test
6.5 Hypothesis Testing for Population Mean with Known Variance: Z-Test
6.6 Hypothesis Testing for Population Proportion: Z-Test for Proportion
6.7 Hypothesis Test for Population Mean under Unknown Population Variance: t-Test
6.8 Paired Sample t-Test
6.9 Comparing Two Populations: Two-Sample Z- and t-Test
6.10 Hypothesis Test for Difference in Population Proportion under Large Samples: Two-Sample Z-Test for Proportions
6.11 Effect Size: Cohen’s D
6.12 Hypothesis Test for Equality of Population Variances
6.13 Non-Parametric Tests: Chi-Square Tests
7. Analysis of Variance
7.1 Introduction to Analysis of Variance (ANOVA)
7.2 Multiple t-Tests for Comparing Several Means
7.3 One-Way Analysis of Variance (ANOVA)
7.4 Two-Way Analysis of Variance (ANOVA)
8. Correlation Analysis
8.1 Introduction to Correlation
8.2 Pearson Correlation Coefficient
8.3 Spearman Rank Correlation
8.4 Point Bi-Serial Correlation
8.5 The Phi-coefficient
9. Simple Linear Regression
9.1 Introduction to Simple Linear Regression
9.2 History of Regression–Francis Galton’s Regression Model
9.3 Simple Linear Regression Model Building
9.4 Estimation of Parameters Using Ordinary Least Squares
9.5 Interpretation of Simple Linear Regression Coefficients
9.6 Validation of the Simple Linear Regression Model
9.7 Outlier Analysis
9.8 Confidence Interval for Regression Coefficients b0 and b
9.9 Confidence Interval for the Expected Value of Y for a Given X
9.10 Prediction Interval for the Value of Y for a Given X
10. Multiple Linear Regression
10.1 Introduction
10.2 Ordinary Least Squares Estimation for Multiple Linear Regression
10.3 Multiple Linear Regression Model Building
10.4 Part (Semi-Partial) Correlation and Regression Model Building
10.5 Interpretation of MLR Coefficients − Partial Regression Coefficient
10.6 Standardized Regression Co-efficient
10.8 Validation of Multiple Regression Model
10.9 Co-efficient of Multiple Determination (R-Square) and Adjusted R-Square
10.10 Statistical Significance of Individual Variables in MLR – t-Test
10.11 Validation of Overall Regression Model: F-Test
10.12 Validation of Portions of a MLR Model – Partial F-Test
10.13 Residual Analysis in Multiple Linear Regression
10.14 Multi-Collinearity and Variance Inflation Factor
10.15 Auto-correlation
10.16 Distance Measures and Outliers Diagnostics
10.17 Variable Selection in Regression Model Building (Forward, Backward, and Stepwise Regression)
10.18 Avoiding Overfitting: Mallows’s Cp
10.19 Transformations
11. Logistic Regression
11.1 Introduction – Classification Problems
11.2 Introduction to Binary Logistic Regression
11.3 Estimation of Parameters in Logistic Regression
11.4 Interpretation of Logistic Regression Parameters
11.5 Logistic Regression Model Diagnostics
11.6 Classification Table, Sensitivity, and Specificity
11.7 Optimal Cut-Off Probability
11.8 Variable Selection in Logistic Regression
11.9 Application of Logistic Regression in Credit Rating
11.10 Gain Chart and Lift Chart
12. Decision Trees
12.1 Decision Trees: Introduction
12.2 Chi-Square Automatic Interaction Detection (CHAID)
12.3 Classification and Regression Tree
12.4 Cost-Based Splitting Criteria
12.5 Ensemble Method
12.6 Random Forest
13. Forecasting Techniques
13.1 Introduction to Forecasting
13.2 Time-Series Data and Components of Time-Series Data
13.3 Forecasting Techniques and Forecasting Accuracy
13.4 Moving Average Method
13.5 Single Exponential Smoothing (ES)
13.6 Double Exponential Smoothing – Holt’s Method
13.7 Triple Exponential Smoothing (Holt-Winter Model)
13.8 Croston’s Forecasting Method for Intermittent Demand
13.9 Regression Model for Forecasting
13.10 Auto-Regressive (AR), Moving Average (MA) and ARMA Models
13.11 Auto-Regressive (AR) Models
13.12 Moving Average Process MA(q)
13.13 Auto-Regressive Moving Average (ARMA) Process
13.14 Auto-Regressive Integrated Moving Average (ARIMA) Process
13.15 Power of Forecasting Model: Theil’s Coefficient
14. Clustering
14.1 Introduction to Clustering
14.2 Distance and Dissimilarity Measures used in Clustering
14.3 Quality and Optimal Number of Clusters
14.4 Clustering Algorithms
14.5 K-Means Clustering
14.6 Hierarchical Clustering
15. Prescriptive Analytics
15.1 Introduction to Prescriptive Analytics
15.2 Linear Programming
15.3 Linear Programming (LP) Model Building
15.4 Linear Programming Problem (LPP) Terminologies
15.5 Assumptions of Linear Programming
15.6 Sensitivity Analysis in LPP
15.7 Solving a Linear Programming Problem using Graphical Method
15.8 Range of Optimality
15.9 Range of Shadow Price
15.10 Dual Linear Programming
15.11 Primal−Dual Relationships
15.12 Multi-Period (Stage) Models
15.13 Linear Integer Programming (ILP)
15.14 Multi-Criteria Decision-Making (MCDM) Problems
16. Stochastic Models
16.1 Introduction to Stochastic Process
16.2 Poisson Process
16.3 Compound Poisson Process
16.4 Markov Chains
16.5 Classification of States in a Markov Chain
16.6 Markov Chains with Absorbing States
16.7 Expected Duration to Reach a State from other States
16.8 Calculation of Retention Probability and Customer Lifetime Value using Markov Chains
16.9 Markov Decision Process (MDP)
16.10 Value Iteration Algorithm
17. Six Sigma
17.1 Introduction to Six Sigma
17.2 What is Six Sigma?
17.3 Origins of Six Sigma
17.4 Three-Sigma versus Six-Sigma Process
17.5 Cost of Poor Quality
17.6 Sigma Score
17.7 Industrial Applications of Six Sigma
17.8 Six Sigma Measures
17.9 Defects Per Million Opportunities (DPMO)
17.10 Yield
17.11 Sigma Score (or Sigma Quality Level)
17.12 DMAIC Methodology
17.13 Six Sigma Project Selection for DMAIC Implementation
17.14 DMAIC Methodology – Case of Armoured Vehicle
17.15 Six Sigma Toolbox
Summary
Multiple Choice Questions
Exercises
Case Study: Era of Quality at the Akshaya Patra Foundation
References
Appendix
Bibliography
Index
The book has 17 chapters and addresses all three components of analytics: descriptive, predictive and prescriptive. The first few chapters are dedicated to the foundations of business analytics. Chapter 1 introduces business analytics and its components, along with several applications. In Chapters 2 to 8, we discuss basic statistical concepts such as descriptive statistics, discrete and continuous random variables, confidence intervals, hypothesis testing, analysis of variance and correlation. Chapters 9 to 13 are dedicated to predictive analytics techniques such as multiple linear regression, logistic regression, decision tree learning and forecasting techniques. Clustering is discussed in Chapter 14. Chapter 15 is dedicated to prescriptive analytics, covering concepts such as linear programming, integer programming and goal programming. Stochastic models and Six Sigma are discussed in Chapters 16 and 17, respectively.
9788126568772
Mathematical statistics
Programming languages (Electronic computers)
Business logistics
Data mining
658.5 / KUM