Data warehousing - What is data cleaning? How can we do that?

What is data cleaning? How can we do that?

Data cleaning, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. The accuracy, consistency and integration of the data are checked during cleaning. Data cleaning can be applied to a single set of records or to multiple sets of data that need to be merged.

Data cleaning is performed by reading all records in a set and verifying their accuracy. Typos and spelling errors are rectified. Mislabeled data, if any, is relabeled and filed correctly. Incomplete or missing entries are completed. Unrecoverable records are purged so that they do not take up space or make operations inefficient.
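The record-by-record pass described above can be sketched in Python. This is only an illustration under stated assumptions: the field names, the TYPO_FIXES table, the DEFAULTS values and the rule that a record without its required fields is unrecoverable are all hypothetical, not part of any particular warehouse tool.

```python
# Hypothetical lookup tables; a real project would supply its own.
TYPO_FIXES = {"Nwe York": "New York", "Lodnon": "London"}
DEFAULTS = {"country": "unknown"}
REQUIRED = ("id", "city")  # assumed: records lacking these cannot be repaired


def is_missing(value):
    """Treat None and the empty string as a missing entry."""
    return value is None or value == ""


def clean_records(records):
    """Return cleaned copies of the records, dropping unrecoverable ones."""
    cleaned = []
    for record in records:
        row = dict(record)  # work on a copy so the input stays untouched
        # Purge: skip records whose required fields are missing.
        if any(is_missing(row.get(field)) for field in REQUIRED):
            continue
        # Rectify known typos and spelling errors.
        row["city"] = TYPO_FIXES.get(row["city"], row["city"])
        # Complete missing entries with a default value.
        for field, default in DEFAULTS.items():
            if is_missing(row.get(field)):
                row[field] = default
        cleaned.append(row)
    return cleaned


raw = [
    {"id": 1, "city": "Nwe York", "country": "US"},
    {"id": 2, "city": "London", "country": ""},
    {"id": None, "city": "Paris", "country": "FR"},
]
result = clean_records(raw)
```

Here the first record has its typo fixed, the second gets a default country, and the third is purged because its id is missing. In practice, whether a record counts as unrecoverable depends on the data set.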

Data cleaning is the process of identifying and correcting erroneous data. The data is checked for accuracy, consistency, typos and similar problems.

Methods:

Parsing - Detects syntax errors in the data.
Data transformation - Converts the input data into the expected format.
Duplicate elimination - Removes duplicate entries.
Statistical methods - Values such as the mean, standard deviation and range, or clustering algorithms, are used to find erroneous data.
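A minimal Python sketch of the four methods listed above, using only the standard library. The date pattern, the DD/MM/YYYY input format, the row layout and the threshold of 2 standard deviations are assumptions chosen for illustration; clustering is not shown.

```python
import re
from datetime import datetime
from statistics import mean, stdev

# Assumed expected syntax: ISO dates such as 2024-01-31.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def parse_ok(value):
    """Parsing: report whether the value matches the expected syntax."""
    return bool(DATE_PATTERN.match(value))


def to_iso_date(value):
    """Data transformation: convert DD/MM/YYYY into the expected YYYY-MM-DD."""
    return datetime.strptime(value, "%d/%m/%Y").strftime("%Y-%m-%d")


def drop_duplicates(rows):
    """Duplicate elimination: keep the first occurrence of each row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def outliers(values, threshold=2.0):
    """Statistical method: values far from the mean, in standard deviations."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs 2+ values
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

For example, parse_ok("31/01/2024") is False because the value is not in ISO form, while to_iso_date("31/01/2024") returns "2024-01-31". The threshold in outliers is a tuning choice: a lower value flags more records for review, a higher one flags fewer.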