Document Type: BL
Record Number: 860380
Main Entry: Wiktorski, Tomasz
Title & Author: Data-intensive systems : principles and fundamentals using Hadoop and Spark / Tomasz Wiktorski.
Publication Statement: Cham, Switzerland : Springer, [2019]
Series Statement: Advanced information and knowledge processing
Physical Description: 1 online resource
ISBN: 3030046036; 3030046044; 9783030046033; 9783030046040; 3030046028; 9783030046026
Bibliographies/Indexes: Includes bibliographical references.
Contents: Intro; Contents; List of Figures; List of Listings; 1 Preface; 1.1 Conventions Used in this Book; 1.2 Listed Code; 1.3 Terminology; 1.4 Examples and Exercises; 2 Introduction; 2.1 Growing Datasets; 2.2 Hardware Trends; 2.3 The V's of Big Data; 2.4 NOSQL; 2.5 Data as the Fourth Paradigm of Science; 2.6 Example Applications; 2.6.1 Data Hub; 2.6.2 Search and Recommendations; 2.6.3 Retail Optimization; 2.6.4 Healthcare; 2.6.5 Internet of Things; 2.7 Main Tools; 2.7.1 Hadoop; 2.7.2 Spark; 2.8 Exercises; References; 3 Hadoop 101 and Reference Scenario; 3.1 Reference Scenario; 3.2 Hadoop Setup; 3.3 Analyzing Unstructured Data; 3.4 Analyzing Structured Data; 3.5 Exercises; 4 Functional Abstraction; 4.1 Functional Programming Overview; 4.2 Functional Abstraction for Data Processing; 4.3 Functional Abstraction and Parallelism; 4.4 Lambda Architecture; 4.5 Exercises; Reference; 5 Introduction to MapReduce; 5.1 Reference Code; 5.2 Map Phase; 5.3 Combine Phase; 5.4 Shuffle Phase; 5.5 Reduce Phase; 5.6 Embarrassingly Parallel Problems; 5.7 Running MapReduce Programs; 5.8 Exercises; 6 Hadoop Architecture; 6.1 Architecture Overview; 6.2 Data Handling; 6.2.1 HDFS Architecture; 6.2.2 Read Flow; 6.2.3 Write Flow; 6.2.4 HDFS Failovers; 6.3 Job Handling; 6.3.1 Job Flow; 6.3.2 Data Locality; 6.3.3 Job and Task Failures; 6.4 Exercises; 7 MapReduce Algorithms and Patterns; 7.1 Counting, Summing, and Averaging; 7.2 Search Assist; 7.3 Random Sampling; 7.4 Multiline Input; 7.5 Inverted Index; 7.6 Exercises; References; 8 NOSQL Databases; 8.1 NOSQL Overview and Examples; 8.1.1 CAP and PACELC Theorem; 8.2 HBase Overview; 8.3 Data Model; 8.4 Architecture; 8.4.1 Regions; 8.4.2 HFile, HLog, and Memstore; 8.4.3 Region Server Failover; 8.5 MapReduce and HBase; 8.5.1 Loading Data; 8.5.2 Running Queries; 8.6 Exercises; References; 9 Spark; 9.1 Motivation; 9.2 Data Model; 9.2.1 Resilient Distributed Datasets and DataFrames; 9.2.2 Other Data Structures; 9.3 Programming Model; 9.3.1 Data Ingestion; 9.3.2 Basic Actions-Count, Take, and Collect; 9.3.3 Basic Transformations-Filter, Map, and reduceByKey; 9.3.4 Other Operations-flatMap and Reduce; 9.4 Architecture; 9.5 SparkSQL; 9.6 Exercises
Abstract: Data-intensive systems are a technological building block supporting Big Data and Data Science applications. This book familiarizes readers with core concepts that they should be aware of before continuing with independent work and the more advanced technical reference literature that dominates the current landscape. The material in the book is structured following a problem-based approach: the content in the chapters is focused on developing solutions to simplified, but still realistic, problems using data-intensive technologies and approaches. The reader follows one reference scenario through the whole book, which uses an open Apache dataset. The origins of this volume are in lectures from a master's course in Data-intensive Systems, given at the University of Stavanger. Some chapters were also a base for guest lectures at Purdue University and Lodz University of Technology.
Subject: Big data.
Subject: Databases.
Subject: Apache Hadoop.
Subject: Spark (Electronic resource : Apache Software Foundation)
Dewey Classification: 005.74
LC Classification: QA76.9.D32W55 2019