MARC View

000			03292nam a22002297a 4500
999			_c4499 _d4499
005			20230117112317.0
008			230117b ||||| |||| 00| 0 eng d
020			_a9780367609504
082			_a001.42 _bHUA
100			_aHuang, Shuai _910502
245			_aData analytics: _ba small data approach
260			_bCRC Press _aBoca Raton _c2021
300			_axiv, 257 p.
365			_aGBP _b68.99
504			_aTable of Contents: 1. INTRODUCTION. Who will benefit from this book; Overview of a Data Analytics Pipeline; Topics in a Nutshell -- 2. ABSTRACTION: Regression & tree models. Overview; Regression Models; Tree Models; Remarks; Exercises -- 3. RECOGNITION: Logistic regression & ranking. Overview; Logistic Regression Model; A Ranking Problem by Pairwise Comparison; Statistical Process Control using Decision Tree; Remarks; Exercises -- 4. RESONANCE: Bootstrap & random forests. Overview; How Bootstrap Works; Random Forests; Remarks; Exercises -- 5. LEARNING (I): Cross-validation & OOB. Overview; Cross-Validation; Out-of-bag Error in Random Forest; Remarks; Exercises -- 6. DIAGNOSIS: Residuals & heterogeneity. Overview; Diagnosis in Regression; Diagnosis in Random Forests; Clustering; Remarks; Exercises -- 7. LEARNING (II): SVM & ensemble learning. Overview; Support Vector Machine; Ensemble Learning; Remarks; Exercises -- 8. SCALABILITY: LASSO & PCA. Overview; LASSO; Principal Component Analysis; Remarks; Exercises -- 9. PRAGMATISM: Experience & experimental. Overview; Kernel Regression Model; Conditional Variance Regression Model; Remarks; Exercises -- 10. SYNTHESIS: Architecture & pipeline. Overview; Deep Learning; inTrees; Remarks; Exercises -- CONCLUSION -- APPENDIX: A BRIEF REVIEW OF BACKGROUND KNOWLEDGE. The normal distribution; Matrix operations; Optimization
520			_aData Analytics: A Small Data Approach is suitable for an introductory data analytics course, helping students understand some of the main statistical learning models. It uses many small datasets to guide students in working out pencil-and-paper solutions of the models and then comparing them with results obtained from established R packages. Because data science practice is a process that should be told as a story, the book also includes extensive course material on exploratory data analysis, residual analysis, and flowcharts for developing and validating models and data pipelines. The main models covered include linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods (including the support vector machine and kernel regression), and deep learning. Each chapter introduces two or three techniques. For each technique, the book first highlights the intuition and rationale, then shows how mathematics is used to articulate that intuition and formulate the learning problem. R is used to implement the techniques on both simulated and real-world datasets.
650			_aR (Computer program language) _91512
650			_aPython (Computer program language) _911358
650			_aQuantitative research _95213
700			_aDeng, Houtao _911359
942			_2ddc _cBK