Document Type
|
:
|
BL
|
Record Number
|
:
|
601534
|
Doc. No
|
:
|
b430753
|
Main Entry
|
:
|
McCallum, Q. Ethan
|
Title & Author
|
:
|
Bad data handbook / Q. Ethan McCallum
|
Edition Statement
|
:
|
First edition
|
Page No.
|
:
|
xvi, 245 pages : illustrations ; 24 cm
|
ISBN
|
:
|
9781449321888 (softcover)
|
|
:
|
1449321887 (softcover)
|
Notes
|
:
|
"Mapping the world of data problems"--Cover
|
Bibliographies/Indexes
|
:
|
Includes bibliographical references and index
|
Contents
|
:
|
Setting the pace : what is bad data? -- Is it just me, or does this data smell funny? -- Data intended for human consumption, not machine consumption -- Bad data lurking in plain text -- (Re)organizing the web's data -- Detecting liars and the confused in contradictory online reviews -- Will the bad data please stand up? -- Blood, sweat, and urine -- When data and reality don't match -- Subtle sources of bias and error -- Don't let the perfect be the enemy of the good : is bad data really bad? -- When databases attack : a guide for when to stick to files -- Crouching table, hidden network -- Myths of cloud computing -- The dark side of data science -- How to feed and care for your machine-learning experts -- Data traceability -- Social media : erasable ink? -- Data quality analysis demystified : knowing when your data is good enough
|
Abstract
|
:
|
This practical handbook takes the reader through several real-world examples to demonstrate the theory and practice of working with and cleaning up dirty data. Because no single tool solves every problem well, the book takes a polyglot approach: most examples use R and Python, with the sed and awk utilities also appearing.
|
Subject
|
:
|
Database management, Handbooks, manuals, etc.
|
Subject
|
:
|
Electronic data processing, Handbooks, manuals, etc.
|
Subject
|
:
|
Data editing
|
Subject
|
:
|
Databases--Quality control
|
Dewey Classification
|
:
|
005.72
|
LC Classification
|
:
|
QA76.9.D3M33 2012
|