TheWiz.Net
This dimensionality reduction algorithm tries to discard inputs that are very similar to others. In simple words, if your opinion is always the same as your boss's, one of you is not required. If two input parameters always have the same value, they represent the same entity. We do not need both of them; one is enough.
In technical terms, if two input variables are very highly correlated, they carry almost the same information, so we can usually drop one of them with little loss.
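To make the idea concrete, here is a minimal sketch that computes the Pearson correlation coefficient between two lists by hand. The function name `pearson` and the sample values are made up for illustration; they do not come from any dataset used in this article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sum-of-squares terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical inputs: the same heights in centimetres and in inches
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]
# An unrelated hypothetical input
other_feature = [3.0, 1.0, 4.0, 1.0, 5.0]

r_redundant = pearson(height_cm, height_in)      # very close to 1
r_unrelated = pearson(height_cm, other_feature)  # well below 0.9
```

The two height columns are the same quantity in different units, so their correlation is essentially 1 and one of them adds nothing. The unrelated column has a much weaker correlation and would be kept.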
Python Code
The corr() method of a pandas DataFrame computes the pairwise correlation between fields. Of course, before we start we have to select only the numeric fields, because corr() works only on numeric fields. Non-numeric fields can also be strongly related to each other, but this method cannot detect that.
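    # train is assumed to be an existing pandas DataFrame
    numeric = train[['Numeric_1', 'Numeric_2', 'Numeric_3', 'Numeric_4']]
    correlation = numeric.corr()
    numeric_columns = numeric.columns
    high_corr = []
    for c1 in numeric_columns:
        for c2 in numeric_columns:
            # Mark c1 if it is highly correlated with a column we are still keeping
            if c1 != c2 and c2 not in high_corr and correlation[c1][c2] > 0.9:
                high_corr.append(c1)
                break  # stop here so c1 is not added more than once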
This gives us a list of columns that can be dropped.
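As a rough end-to-end sketch, the loop above can be wrapped in a helper and run on a small made-up DataFrame. The function name `find_high_corr`, the column names, and the data are assumptions for illustration only; `select_dtypes`, `corr`, and `drop` are standard pandas methods. This variant compares absolute correlations, on the assumption that a strong negative correlation is just as redundant as a strong positive one.

```python
import pandas as pd

def find_high_corr(df, threshold=0.9):
    """Return numeric columns that are highly correlated with a kept column."""
    numeric = df.select_dtypes(include="number")  # corr() needs numeric data
    correlation = numeric.corr().abs()
    to_drop = []
    for c1 in numeric.columns:
        for c2 in numeric.columns:
            if c1 != c2 and c2 not in to_drop and correlation.loc[c1, c2] > threshold:
                to_drop.append(c1)
                break  # c1 is already marked, no need to check further
    return to_drop

# Made-up data: price_usd and price_eur move together, rooms does not
train = pd.DataFrame({
    "price_usd": [100.0, 150.0, 200.0, 250.0, 300.0],
    "price_eur": [92.0, 138.0, 184.0, 230.0, 276.0],
    "rooms": [3, 1, 4, 1, 5],
    "city": ["A", "B", "A", "C", "B"],
})

high_corr = find_high_corr(train)
reduced = train.drop(columns=high_corr)
```

With this data, `high_corr` contains only `price_usd`, so `reduced` keeps `price_eur`, `rooms`, and `city`. Which column of a correlated pair gets dropped depends only on column order, so it may be worth reordering the columns if one of the pair is easier to interpret.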
[1]:https://solegaonkar.github.io/ConceptHighCorrelationFilter.html



