000 09108nam a22002057a 4500
999 _c328
_d328
005 20190904105646.0
008 190904b ||||| |||| 00| 0 eng d
020 _a9788126546145
082 _a006.31
_bFOR
100 _aForeman, John W.
_91049
245 _aData smart: using data science to transform Information into insight
260 _aNew Delhi
_bWiley India Pvt. Ltd.
_c2018
300 _axx, 409 p.
365 _aINR
_b799.00
504 _aTABLE OF CONTENTS Introduction xiii 1 Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask 1 Some Sample Data 2 Moving Quickly with the Control Button 2 Copying Formulas and Data Quickly 4 Formatting Cells 5 Paste Special Values 7 Inserting Charts 8 Locating the Find and Replace Menus 9 Formulas for Locating and Pulling Values 10 Using VLOOKUP to Merge Data 12 Filtering and Sorting 13 Using PivotTables 16 Using Array Formulas 19 Solving Stuff with Solver 20 OpenSolver: I Wish We Didn’t Need This, but We Do 26 Wrapping Up 27 2 Cluster Analysis Part I: Using K-Means to Segment Your Customer Base 29 Girls Dance with Girls, Boys Scratch Their Elbows 30 Getting Real: K-Means Clustering Subscribers in E-mail Marketing 35 Joey Bag O’ Donuts Wholesale Wine Emporium 36 The Initial Dataset 36 Determining What to Measure 38 Start with Four Clusters 41 Euclidean Distance: Measuring Distances as the Crow Flies 41 Distances and Cluster Assignments for Everybody! 44 Solving for the Cluster Centers 46 Making Sense of the Results 49 Getting the Top Deals by Cluster 50 The Silhouette: A Good Way to Let Different K Values Duke It Out 53 How about Five Clusters? 60 Solving for Five Clusters 60 Getting the Top Deals for All Five Clusters 61 Computing the Silhouette for 5-Means Clustering 64 K-Medians Clustering and Asymmetric Distance Measurements 66 Using K-Medians Clustering 66 Getting a More Appropriate Distance Metric 67 Putting It All in Excel 69 The Top Deals for the 5-Medians Clusters 70 Wrapping Up 75 3 Naïve Bayes and the Incredible Lightness of Being an Idiot 77 When You Name a Product Mandrill, You’re Going to Get Some Signal and Some Noise 77 The World’s Fastest Intro to Probability Theory 79 Totaling Conditional Probabilities 80 Joint Probability, the Chain Rule, and Independence 80 What Happens in a Dependent Situation? 81 Bayes Rule 82 Using Bayes Rule to Create an AI Model 83 High-Level Class Probabilities Are Often Assumed to Be Equal 84 A Couple More Odds and Ends 85 Let’s Get This Excel Party Started 87 Removing Extraneous Punctuation 87 Splitting on Spaces 88 Counting Tokens and Calculating Probabilities 92 And We Have a Model! Let’s Use It 94 Wrapping Up 98 4 Optimization Modeling: Because That “Fresh Squeezed” Orange Juice Ain’t Gonna Blend Itself 101 Why Should Data Scientists Know Optimization? 102 Starting with a Simple Trade-Off f 103 Representing the Problem as a Polytope 103 Solving by Sliding the Level Set 105 The Simplex Method: Rooting around the Corners 106 Working in Excel 108 There’s a Monster at the End of This Chapter 117 Fresh from the Grove to Your Glasswith a Pit Stop Through a Blending Model 118 You Use a Blending Model 119 Let’s Start with Some Specs 119 Coming Back to Consistency 121 Putting the Data into Excel 121 Setting Up the Problem in Solver 124 Lowering Your Standards 126 Dead Squirrel Removal: The Minimax Formulation 131 If-Then and the “Big M” Constraint 133 Multiplying Variables: Cranking Up the Volume to 11 137 Modeling Risk 144 Normally Distributed Data 145 Wrapping Up 154 5 Cluster Analysis Part II: Network Graphs and Community Detection 155 What Is a Network Graph? 156 Visualizing a Simple Graph 157 Brief Introduction to Gephi 159 Gephi Installation and File Preparation 160 Laying Out the Graph 162 Node Degree 165 Pretty Printing 166 Touching the Graph Data 168 Building a Graph from the Wholesale Wine Data 170 Creating a Cosine Similarity Matrix 172 Producing an r-Neighborhood Graph 174 How Much Is an Edge Worth? Points and Penalties in Graph Modularity 178 What’s a Point and What’s a Penalty? 179 Setting Up the Score Sheet 183 Let’s Get Clustering! 185 Split Number 1 185 Split 2: Electric Boogaloo 190 And…Split 3: Split with a Vengeance 192 Encoding and Analyzing the Communities 193 There and Back Again: A Gephi Tale 197 Wrapping Up 202 6 The Granddaddy of Supervised Artificial Intelligence—Regression 205 Wait, What? You’re Pregnant? 205 Don’t Kid Yourself 206 Predicting Pregnant Customers at RetailMart Using Linear Regression 207 The Feature Set 207 Assembling the Training Data 209 Creating Dummy Variables 210 Let’s Bake Our Own Linear Regression 213 Linear Regression Statistics: R-Squared, F Tests, t Tests 221 Making Predictions on Some New Data and Measuring Performance 230 Predicting Pregnant Customers at RetailMart Using Logistic Regression 239 First You Need a Link Function 240 Hooking Up the Logistic Function and Reoptimizing 241 Baking an Actual Logistic Regression 244 Model Selection—Comparing the Performance of the Linear and Logistic Regressions 245 For More Information 248 Wrapping Up 249 7 Ensemble Models: A Whole Lot of Bad Pizza 251 Using the Data from Chapter 6 252 Bagging: Randomize, Train, Repeat 254 Decision Stump Is an Unsexy Term for a Stupid Predictor 254 Doesn’t Seem So Stupid to Me! 255 You Need More Power! 257 Let’s Train It 258 Evaluating the Bagged Model 267 Boosting: If You Get It Wrong, Just Boost and Try Again 272 Training the Model—Every Feature Gets a Shot 272 Evaluating the Boosted Model 280 Wrapping Up 283 8 Forecasting: Breathe Easy; You Can’t Win 285 The Sword Trade Is Hopping 286 Getting Acquainted with Time Series Data 286 Starting Slow with Simple Exponential Smoothing 288 Setting Up the Simple Exponential Smoothing Forecast 290 You Might Have a Trend 296 Holt’s Trend-Corrected Exponential Smoothing 299 Setting Up Holt’s Trend-Corrected Smoothing in a Spreadsheet 300 So Are You Done? Looking at Autocorrelations 306 Multiplicative Holt-Winters Exponential Smoothing 313 Setting the Initial Values for Level, Trend, and Seasonality 315 Getting Rolling on the Forecast 319 And Optimize! 324 Please Tell Me We’re Done Now!!! 326 Putting a Prediction Interval around the Forecast 327 Creating a Fan Chart for Effect 331 Wrapping Up 333 9 Outlier Detection: Just Because They’re Odd Doesn’t Mean They’re Unimportant 335 Outliers Are (Bad?) People, Too 335 The Fascinating Case of Hadlum v Hadlum 336 Tukey Fences 337 Applying Tukey Fences in a Spreadsheet 338 The Limitations of This Simple Approach 340 Terrible at Nothing, Bad at Everything 341 Preparing Data for Graphing 342 Creating a Graph 345 Getting the k Nearest Neighbors 347 Graph Outlier Detection Method 1: Just Use the Indegree 348 Graph Outlier Detection Method 2: Getting Nuanced with k-Distance 351 Graph Outlier Detection Method 3: Local Outlier Factors Are Where It’s At 353 Wrapping Up 358 10 Moving from Spreadsheets into R 361 Getting Up and Running with R 362 Some Simple Hand-Jamming 363 Reading Data into R 370 Doing Some Actual Data Science 372 Spherical K-Means on Wine Data in Just a Few Lines 372 Building AI Models on the Pregnancy Data 378 Forecasting in R 385 Looking at Outlier Detection 389 Wrapping Up 394 Conclusion 395 Where Am I? What Just Happened? 395 Before You Go-Go 395 Get to Know the Problem 396 We Need More Translators 397 Beware the Three-Headed Geek-Monster: Tools, Performance, and Mathematical Perfection 397 You Are Not the Most Important Function of Your Organization 400 Get Creative and Keep in Touch! 400 Index 401
520 _aDESCRIPTION The book provides nine tutorials on optimization, machine learning, data mining, and forecasting all within the confines of a spreadsheet. Each tutorial uses a real-world problem and the author guides the reader using query’s the reader might ask as how to craft a solution using the correct data science technique. Hosting these nine spreadsheets for download will be necessary so that the reader can work the problems along with the book. Important topics covered by the book: Linear and integer programming K-nearest neighbors graphs and clustering Logistic regression Demand forecasting with seasonal adjustments Price sensitivity, revenue optimization, and price-sensitive forecasting Naïve Bayes classification Outlier detection using graphs and Local Outlier Factors Multi-criteria decision analysis
650 _aData mining
_9365
650 _aWeb usage mining
_91050
942 _2ddc
_cBK