Data preparation

Data preparation is crucial to turn raw data into value.
Often, data preparation is more time-consuming than working with the prepared data.
Steps

Challenges

Typical challenges include:
One of the major challenges that data preparation addresses is the heterogeneity of data in all but very small organizations: data is stored in various formats in different data stores (databases, Excel, SAS, etc.) and needs to be merged into a format that is useful for further processing.
In order to merge the data, it must be possible to recognize the same entity (for example, a customer) when it is stored across different databases. In an ideal world, there would be a shared primary key. However, because data is stored in different formats for different purposes, there is usually no single primary key; rather, the same entities are identified differently in different data sources, especially when surrogate keys are used.
Another challenge I often observe is outliers, that is, data with unusual or (seemingly) unrealistic values. It is often not evident whether a value is in fact an outlier or whether an unrealistic value can be explained with more domain knowledge.
Yet another challenge is null values, because they can be interpreted in at least two ways: unknown and none.
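
The challenges above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources (crm, billing), their field names, the crosswalk that maps one key system to the other, and the choice of the interquartile-range rule for flagging outliers. None here stands for "unknown"; it is kept as is and left out of the statistics rather than being silently replaced by 0.

```python
import statistics

# Hypothetical records from two sources that identify customers differently.
crm = [
    {"crm_id": "C-1", "name": "Ada", "age": 36},
    {"crm_id": "C-2", "name": "Bob", "age": None},  # None: age is unknown
    {"crm_id": "C-3", "name": "Cy", "age": 212},    # seemingly unrealistic
]
billing = [
    {"account_no": 9001, "revenue": 120.0},
    {"account_no": 9002, "revenue": 95.5},
]
# Hypothetical crosswalk linking the CRM key to the billing key.
crosswalk = {"C-1": 9001, "C-2": 9002}


def merge_sources(crm_rows, billing_rows, key_map):
    """Left-join CRM rows to billing rows via the crosswalk."""
    by_account = {row["account_no"]: row for row in billing_rows}
    merged = []
    for row in crm_rows:
        account_no = key_map.get(row["crm_id"])  # None if no match is known
        bill = by_account.get(account_no, {})
        merged.append(
            {**row, "account_no": account_no, "revenue": bill.get("revenue")}
        )
    return merged


def flag_outliers(values, k=1.5):
    """Return known values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    known = [v for v in values if v is not None]  # ignore unknown values
    q1, _, q3 = statistics.quantiles(known, n=4)
    spread = k * (q3 - q1)
    return [v for v in known if v < q1 - spread or v > q3 + spread]


merged = merge_sources(crm, billing, crosswalk)
suspicious = flag_outliers([36, 41, 29, 33, 38, 212, None])
```

A flagged value is only a candidate for review: whether 212 is a data entry error or has a legitimate explanation still requires domain knowledge.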

Privacy violation

Data preparation might uncover confidential or private information, or expose data that identifies a specific person.
Therefore, data should be anonymized (aka de-identified) before or during data preparation.
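
One possible way to de-identify records during preparation is sketched below. It is only an illustration under stated assumptions: the field names (name, email), the customer_ref column, and the SECRET_KEY are hypothetical, and replacing identifiers with a keyed hash is strictly speaking pseudonymization, which is weaker than full anonymization because the remaining columns may still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored outside the dataset.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-dataset"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, direct_identifiers=("name", "email")) -> dict:
    """Drop direct identifiers and add a pseudonymous reference instead."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    # The same email always yields the same reference, so joins still work.
    cleaned["customer_ref"] = pseudonymize(record["email"])
    return cleaned


safe = deidentify({"name": "Ada", "email": "ada@example.com", "age": 36})
```

Because the hash is keyed and deterministic, records belonging to the same person can still be merged across sources without carrying the original identifier along.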

See also

data, Data cleaning
The Tableau product for data preparation seems to be Tableau Prep.
Power Query