开源众包 (Open Source Crowdsourcing)
Posted 925 days ago
Use any Python libraries, for example: Pandas, NumPy, StandardScaler, LabelEncoder, OneHotEncoder, train_test_split.

1. Load any DataFrame from Kaggle or the Internet, or generate a random DataFrame for a classification problem with your own settings (using make_classification()).
2. Print a concise summary of the DataFrame using df.info() (df is the name of the DataFrame).
3. Estimate the number of missing values. Create a Series that shows the total count of missing values per column. Create any count plot.
4. Remove the columns with more than 23% of gaps, or fill in the missing data. Revisit the DataFrame to check the result (using df.dropna() and df.fillna()).
5. Handle conflicting cases in the DataFrame.
6. Remove unnecessary or duplicated features (using df.duplicated()). Justify the decision to remove each feature.
7. Convert categorical string features to numeric values (using different label encoding methods).
8. Remove anomalous data (using Isolation Forest, Minimum Covariance Determinant, Local Outlier Factor, One-Class SVM, or another method of your choice).
9. Create a correlation heatmap. Additional step: correlation-based feature selection. Drop features with a correlation of more than 95%.
10. Create a train and test split of the DataFrame.
11. Perform data normalization.
12. Additional step: investigate techniques for handling imbalanced data.

Report: add a file with the program code in *.ipynb format and the input data (*.csv or *.xlsx format).
Output: screenshot.
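The steps above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper names `build_demo_frame` and `prepare`, the column names (`f0`, `f0_copy`, `city`, `sparse`, `target`), the injected defects, and the 2% contamination rate for Isolation Forest are all assumptions made for the example. Plots (the count plot and the heatmap) are left out, exact duplicate rows are dropped before conflicting cases are checked, and a "conflicting case" is assumed to mean rows with identical features but different labels.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def build_demo_frame(n_samples=500, seed=0):
    """Step 1: synthetic data with deliberately injected defects (all illustrative)."""
    X, y = make_classification(
        n_samples=n_samples, n_features=6, n_informative=4, n_redundant=0,
        weights=[0.8, 0.2], random_state=seed,
    )
    df = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])
    rng = np.random.default_rng(seed)
    df["f0_copy"] = df["f0"] * 2.0 + 1.0  # perfectly correlated with f0
    df["city"] = rng.choice(["Kyiv", "Lviv", "Odesa"], size=n_samples)  # string category
    df["sparse"] = np.where(rng.random(n_samples) < 0.5, np.nan, 1.0)  # about 50% gaps
    df.loc[rng.choice(n_samples, size=20, replace=False), "f1"] = np.nan  # 4% gaps
    df["target"] = y
    return pd.concat([df, df.iloc[:10]], ignore_index=True)  # append 10 duplicate rows


def prepare(df, gap_limit=0.23, corr_limit=0.95, seed=0):
    df.info()  # step 2: concise summary
    missing = df.isna().sum()  # step 3: Series of missing counts per column

    # Step 4: drop columns above the gap limit, fill the rest with column medians.
    df = df.loc[:, df.isna().mean() <= gap_limit].copy()
    num_cols = df.select_dtypes("number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Step 6: drop exact duplicate rows.
    df = df.drop_duplicates()
    # Step 5 (assumed meaning): drop rows with equal features but different labels.
    feature_cols = [c for c in df.columns if c != "target"]
    df = df[~df.duplicated(subset=feature_cols, keep=False)].copy()

    # Step 7: integer-encode the string column.
    df["city"] = LabelEncoder().fit_transform(df["city"])

    # Step 8: keep only rows that Isolation Forest labels as inliers (+1).
    X, y = df.drop(columns="target"), df["target"]
    forest = IsolationForest(contamination=0.02, random_state=seed)
    inlier = forest.fit_predict(X) == 1
    X, y = X.loc[inlier], y.loc[inlier]

    # Step 9: for each highly correlated pair, drop the later column.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    dropped = [c for c in upper.columns if (upper[c] > corr_limit).any()]
    X = X.drop(columns=dropped)

    # Step 10: stratified split keeps the class ratio in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Step 11: fit the scaler on the training part only, then apply to both.
    scaler = StandardScaler().fit(X_train)
    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns, index=X_train.index)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)

    return {
        "missing": missing, "dropped_corr": dropped,
        "X_train": X_train, "X_test": X_test, "y_train": y_train, "y_test": y_test,
        "class_share": y_train.value_counts(normalize=True),  # step 12: inspect imbalance
    }


result = prepare(build_demo_frame())
print(result["X_train"].shape, result["dropped_corr"])
print(result["class_share"])
```

`LabelEncoder` is used here because the task names it; scikit-learn documents it for target labels, so `OrdinalEncoder` or `OneHotEncoder` may be a better fit for input features. For step 12, the printed class shares only reveal the imbalance; possible remedies to investigate include `class_weight="balanced"` in many scikit-learn classifiers or resampling the training set.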