Data pre-processing with Python

Bidding mode: Data pre-processing with Python

Employer: mike
Posted: 2023-04-04
Category: Software development

¥200

Task details

Use any of the following Python libraries and tools: Pandas, NumPy, StandardScaler, LabelEncoder, OneHotEncoder, train_test_split.

1. Load any DataFrame from Kaggle or the Internet, or generate a random DataFrame for a classification problem with your own settings (using make_classification()).
2. Print a concise summary of the DataFrame using df.info() (df is the name of the DataFrame).
3. Estimate the number of missing values. Create a Series that displays the total count of missing values per column. Create any count plot.
4. Remove the columns with more than 23% missing values, or fill in the missing data. Revisit the DataFrame to check the result (using df.dropna() and df.fillna()).
5. Handle conflicting cases in the DataFrame.
6. Remove unnecessary or duplicated features (using df.duplicated()). Justify the decision to remove the features.
7. Convert categorical string features to numerical values (using different label encoding methods).
8. Remove anomalous data (using Isolation Forest, Minimum Covariance Determinant, Local Outlier Factor, One-Class SVM, or another method of your choice).
9. Create a correlation heatmap. Additional step: correlation-based feature selection. Drop features with a correlation above 95%.
10. Create a train and test split of the DataFrame.
11. Perform data normalization.
12. Additional step: investigate techniques for handling imbalanced data.

Report: add a file with the program code in *.ipynb format and the input data (*.csv or *.xlsx format).
Output: screenshot.
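As a rough illustration of how steps 1-4 and 6-11 might fit together, here is a minimal sketch using pandas and scikit-learn. It is not a complete solution to the task. The dataset size, the injected `color`, `f0_copy` and `sparse` columns, the median fill, the 5% contamination rate, and the 80/20 split are all assumptions made up for this example. The plots (steps 3 and 9), conflict handling (step 5), and imbalance handling (step 12) are left out.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)

# Step 1: generate a synthetic classification DataFrame (settings are illustrative).
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_redundant=1, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])
df["target"] = y

# Hypothetical defects injected so the cleaning steps have something to do.
df["color"] = rng.choice(["red", "green", "blue"], size=len(df))  # categorical feature
df["f0_copy"] = df["f0"]                                          # perfectly correlated feature
df["sparse"] = np.where(rng.random(len(df)) < 0.5, np.nan, 1.0)   # roughly half missing
df.loc[rng.choice(len(df), 20, replace=False), "f1"] = np.nan     # a few gaps in f1
df = pd.concat([df, df.iloc[:10]], ignore_index=True)             # 10 duplicated rows

# Step 2: concise summary.
df.info()

# Step 3: total count of missing values per column, as a Series.
missing = df.isna().sum()

# Step 4: drop columns with more than 23% gaps, then fill the remaining gaps.
missing_frac = df.isna().mean()
df = df.drop(columns=missing_frac[missing_frac > 0.23].index)
df["f1"] = df["f1"].fillna(df["f1"].median())  # median fill is an assumption

# Step 6: count and remove duplicated rows.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Step 7: encode the categorical string feature as integers.
df["color"] = LabelEncoder().fit_transform(df["color"])

# Step 8: remove anomalies with Isolation Forest (contamination rate is an assumption).
features = df.drop(columns="target")
is_inlier = IsolationForest(contamination=0.05, random_state=0).fit_predict(features) == 1
df = df[is_inlier].reset_index(drop=True)

# Step 9 (additional): drop one feature from each pair with |correlation| > 0.95.
corr = df.drop(columns="target").corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df = df.drop(columns=to_drop)

# Step 10: stratified train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"],
    test_size=0.2, stratify=df["target"], random_state=0)

# Step 11: standardize, fitting the scaler on the training data only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the preprocessing, which is why the split comes before normalization in the task list. The other detectors named in step 8 (for example `LocalOutlierFactor` or `OneClassSVM`) also expose `fit_predict`, so they could be tried in place of `IsolationForest` in this sketch.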

Task attachments (0)