Credits: 4 (3-0-2)

Prerequisites: COL106 and MTL106, or the instructor’s permission.

Description

Sampling and sketching (reservoir sampling, counting samples, graph sampling, count-min sketches, Flajolet sketches, graph sketches, heavy hitters, Johnson-Lindenstrauss lemma and dimensionality reduction techniques); data integration (schema alignment, information extraction, entity linkage, data fusion); data profiling and cleaning (rule-based data cleaning, outlier detection, data transformations, probabilistic data cleaning); issues in building large-scale machine learning models (noise and bias in data, model data management).

Prerequisite Tree

flowchart TD
AIL742-5[AIL742]
AIL742-5 --> COL106-5[COL106]
AIL742-5 --> MTL106-5[MTL106]
COL106-5 --> COL100-5[COL100]