000 06415nam a22002297a 4500
999 _c4161
_d4161
005 20221130171811.0
008 221130b ||||| |||| 00| 0 eng d
020 _a9781119748007
082 _a005.73
_bFOW
100 _aFowler, Dave
_99360
245 _aInformed company:
_bhow to build modern agile data stacks that drive winning insights
260 _aNew Jersey
_bJohn Wiley & Sons, Inc.
_c2022
300 _axxxv, 220 p.
365 _aUSD
_b26.95
504 _aTABLE OF CONTENTS About This Book xiii -- Foreword xxi -- Introduction xxv -- Stage 1 Source (aka Siloed Data) 1 -- Chapter 1 Starting with Source Data 3 -- Common Options for Analyzing Source Data 4 -- Chapter 2 The Need to Replicate Source Data 11 -- Replicate Sources 12 -- Create Read-Only Access 14 -- Chapter 3 Source Data Best Practices 15 -- Keep a Complexity Wiki Page 15 -- Snippet Dictionary 16 -- Use a BI Product 17 -- Double Check Results 18 -- Keep Short Dashboards 19 -- Design Before Building 20 -- Stage 2 Data Lake (aka Data Combined) 23 -- Chapter 4 Why Build a Data Lake? 25 -- What Is a Data Lake? 26 -- Reasons to Build a Data Lake Summarized 27 -- Chapter 5 Choosing an Engine for the Data Lake 33 -- Modern Columnar Warehouse Engines 35 -- Modern Warehouse Engine Products 38 -- Database Engines 41 -- Recommendation 42 -- Chapter 6 Extract and Load (EL) Data 45 -- ETL versus ELT 46 -- EL/ETL Vendors 48 -- Extract Options 49 -- Load Options 51 -- Multiple Schemas 52 -- Other Extract and Load Routes 53 -- Chapter 7 Data Lake Security 55 -- Access in Central Place 56 -- Permission Tiers 57 -- Chapter 8 Data Lake Maintenance 59 -- Why SQL? 60 -- Data Sources 61 -- Performance 64 -- Upgrade Snippets to Views 68 -- Stage 3 Data Warehouse (aka the Single Source of Truth) 69 -- Chapter 9 The Power of Layers and Views 75 -- Make Readable Views 77 -- Layer Views on Views 78 -- Start with a Single View 81 -- Chapter 10 Staging Schemas 83 -- Orient to the Schemas 84 -- Pick a Table and Clean It 85 -- Other Staging Modeling Considerations 98 -- Building on Top of Staging Schemas 106 -- Chapter 11 Model Data with dbt 111 -- Version Control 111 -- Modularity and Reusability 112 -- Package Management 112 -- Organizing Files 113 -- Macros 113 -- Incremental Tables 114 -- Testing 115 -- Chapter 12 Deploy Modeling Code 119 -- Branch Using Version Control Software 119 -- Commit Message 120 -- Test Locally 120 -- Code Review 121 -- Schedule Runs 122 -- Chapter 13 Implementing the Data Warehouse 123 -- Manage Dependencies 124 -- Combine Tables Within Schemas 126 -- Combine Tables Across Schemas 128 -- Keep the Grain Consistent 130 -- Create Business Metrics 131 -- Keeping Accurate History 133 -- Chapter 14 Managing Data Access 135 -- How to Secure Sensitive Data in the Data Warehouse 137 -- How to Secure Sensitive Data in a BI Tool 140 -- Chapter 15 Maintaining the Source of Truth 143 -- Track New Metrics 144 -- Deprecate Old Metrics 147 -- Deprecate Old Schemas 149 -- Resolve Conflicting Numbers 150 -- Handling Ongoing Requests and Ongoing Feedback 151 -- Updating Modeling Code 152 -- Manage Access 153 -- Tuning to Optimize 156 -- Code Review All Modeling 157 -- Maintenance Checklist 158 -- Stage 4 Data Marts (aka Data Democratized) 161 -- Chapter 16 Data Mart Implementation 167 -- Views on the Data Warehouse 167 -- Segment Tables 168 -- Access Update 169 -- Chapter 17 Data Mart Maintenance 171 -- Educate Team 172 -- Identify Issues 172 -- Identify New Needs 176 -- Help Track Success 176 -- Chapter 18 Modern versus Traditional Data Stacks: What’s Changed? 177 -- What’s Changed? 177 -- Chapter 19 Row- versus Column-Oriented Database 181 -- Row-Oriented Databases 182 -- Column-Oriented Databases 184 -- Summary 190 -- Chapter 20 Style Guide Example 191 -- Simplify 192 -- Clean 194 -- Naming Conventions 195 -- Share It 197 -- Chapter 21 Building an SST Example 199 -- First Attempt—Same Tables with Prefixes 199 -- Second Attempt—Operational Schema (Source Agnostic) 205 -- Third Attempt—Application Separate, Other Sources Smashed 207 -- Less Planning, More Implementing 209 -- Acknowledgments and Contributions 211 -- Index 213
520 _aDESCRIPTION Learn how to manage a modern data stack and get the most out of data in your organization! Thanks to the emergence of new technologies and the explosion of data in recent years, we need new practices for managing and getting value out of data. In the modern, data-driven competitive landscape, the "best guess" approach (reading blog posts here and there and patching together data practices without any real visibility) is no longer going to hack it. The Informed Company provides definitive direction on how best to leverage the modern data stack, including cloud computing, columnar storage, cloud ETL tools, and cloud BI tools. You'll learn how to work with Agile methods and set up the processes that are right for your company to use your data as a key weapon for your success . . . You'll discover best practices for every stage, from querying production databases at a small startup all the way to setting up data marts for different business lines of an enterprise. In their work at Chartio, authors Fowler and David have learned that most businesspeople are almost completely self-taught when it comes to data. If they use any resources at all, those resources are outdated, so they're missing out on the latest cloud technologies and advances in data analytics. This book will firm up your understanding of data and bring you into the present with knowledge of what works and what doesn't.
Discover the data stack strategies that are working for today's successful small, medium, and enterprise companies. Learn the different Agile stages of data organization, and the right one for your team. Learn how to maintain Data Lakes and Data Warehouses for effective, accessible data storage. Gain the knowledge you need to architect Data Warehouses and Data Marts. Understand your business's level of data sophistication and the steps you can take to "level up" your data. The Informed Company is the definitive data book for anyone who wants to work faster and more nimbly, armed with actionable decision-making data.
650 _aCloud computing
_95564
650 _aBig data
_9212
650 _aData structures (Computer science)
_910377
700 _aDavid, Matthew C.
942 _2ddc
_cBK