000 03292nam a22002297a 4500
999 _c4499
_d4499
005 20230117112317.0
008 230117b ||||| |||| 00| 0 eng d
020 _a9780367609504
082 _a001.42
_bHUA
100 _aHuang, Shuai
_910502
245 _aData analytics:
_ba small data approach
260 _bCRC Press
_aBoca Raton
_c2021
300 _axiv, 257 p.
365 _aGBP
_b68.99
504 _aTable of Contents: 1. INTRODUCTION. Who will benefit from this book; Overview of a Data Analytics Pipeline; Topics in a Nutshell -- 2. ABSTRACTION. Regression & tree models: Overview; Regression Models; Tree Models; Remarks; Exercises -- 3. RECOGNITION. Logistic regression & ranking: Overview; Logistic Regression Model; A Ranking Problem by Pairwise Comparison; Statistical Process Control using Decision Tree; Remarks; Exercise -- 4. RESONANCE. Bootstrap & random forests: Overview; How Bootstrap Works; Random Forests; Remarks; Exercises -- 5. LEARNING (I). Cross validation & OOB: Overview; Cross-Validation; Out-of-bag error in Random Forest; Remarks; Exercises -- 6. DIAGNOSIS. Residuals & heterogeneity: Overview; Diagnosis in Regression; Diagnosis in Random Forests; Clustering; Remarks; Exercises -- 7. LEARNING (II). SVM & ensemble Learning: Overview; Support Vector Machine; Ensemble Learning; Remarks; Exercises -- 8. SCALABILITY. LASSO & PCA: Overview; LASSO; Principal Component Analysis; Remarks; Exercises -- 9. PRAGMATISM. Experience & experimental: Overview; Kernel Regression Model; Conditional Variance Regression Model; Remarks; Exercises -- 10. SYNTHESIS. Architecture & pipeline: Overview; Deep Learning; inTrees; Remarks; Exercises -- CONCLUSION -- APPENDIX: A BRIEF REVIEW OF BACKGROUND KNOWLEDGE. The normal distribution; Matrix operations; Optimization
520 _aData Analytics: A Small Data Approach is suitable for an introductory data analytics course to help students understand some main statistical learning models. It has many small datasets to guide students to work out pencil solutions of the models and then compare them with results obtained from established R packages. Also, as data science practice is a process that should be told as a story, this book includes many course materials about exploratory data analysis, residual analysis, and flowcharts to develop and validate models and data pipelines. The main models covered in this book include linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods including the support vector machine and kernel regression, and deep learning. Each chapter introduces two or three techniques. For each technique, the book highlights the intuition and rationale first, then shows how mathematics is used to articulate the intuition and formulate the learning problem. R is used to implement the techniques on both simulated and real-world datasets.
650 _aR (Computer program language)
_91512
650 _aPython (Computer program language)
_911358
650 _aQuantitative research
_95213
700 _aDeng, Houtao
_911359
942 _2ddc
_cBK