Bidding mode: Data pre-processing with Python
  • Employer: mike
  • Posted: 2023-04-04
  • Category: Software development

¥ 200

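Disclaimer: This outsourcing request comes from an external platform. This site only displays some fields of the public information and provides a subscription service; see the disclaimer for more.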

Task details

Use any of the following Python libraries and tools: Pandas, NumPy, StandardScaler, LabelEncoder, OneHotEncoder, train_test_split.

1. Load any DataFrame from Kaggle or the Internet, or generate a random DataFrame for a classification problem with your own settings (using make_classification()).
2. Print a concise summary of the DataFrame using df.info() (df is the name of the DataFrame).
3. Estimate the number of missing values. Create a Series that displays the total count of missing values per column. Create any count plot.
4. Remove the columns with more than 23% gaps, or fill in any missing data. Revisit the DataFrame to check the result (using df.dropna() and df.fillna()).
5. Handle conflicting cases in the DataFrame.
6. Remove unnecessary or duplicated features (using df.duplicated()). Justify the decision to remove the features.
7. Convert categorical string features to numerical values (using different label encoding methods).
8. Remove anomalous data (using Isolation Forest, Minimum Covariance Determinant, Local Outlier Factor, One-Class SVM, or another method of your choice).
9. Create a correlation heatmap. Additional step: correlation-based feature selection. Drop data with a correlation above 95%.
10. Create a train and test split of the DataFrame.
11. Perform data normalization.
12. Additional step: investigate techniques for handling imbalanced data.

Report: add a file with the program code in *.ipynb format and the input data (*.csv or *.xlsx format).
Output: screenshot.
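The steps in the brief could be sketched roughly as follows. This is only one possible reading of the task, not a reference solution: the dataset is synthetic, and the column names (`color`, `sparse`, `f0_copy`), the contamination rate, and the split ratio are all assumptions made up for illustration. "Conflicting cases" is interpreted here as rows with identical features but different labels, and the two plots (count plot, heatmap) are left out to keep the sketch free of plotting dependencies.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)

# Step 1: synthetic classification data (every setting here is an assumption).
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_redundant=1, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])
df["target"] = y

# Inject problems on purpose so that each later step has something to fix.
df["color"] = rng.choice(["red", "green", "blue"], size=len(df))      # categorical
df["sparse"] = np.where(rng.random(len(df)) < 0.5, np.nan, 1.0)       # ~50% gaps
df.loc[rng.choice(len(df), size=20, replace=False), "f0"] = np.nan    # a few gaps
df["f0_copy"] = df["f0"] * 2.0                                        # redundant
df = pd.concat([df, df.iloc[:10]], ignore_index=True)                 # duplicates

# Step 2: concise summary.
df.info()

# Step 3: missing values per column, as a Series.
print(df.isna().sum())

# Step 4: drop columns with more than 23% gaps, fill the rest with the median.
missing_share = df.isna().mean()
df = df.drop(columns=missing_share[missing_share > 0.23].index)
df = df.fillna(df.median(numeric_only=True))

# Step 5 (assumed meaning): drop rows whose features match but labels differ.
feature_cols = [c for c in df.columns if c != "target"]
conflict = df.groupby(feature_cols)["target"].transform("nunique") > 1
df = df[~conflict]

# Step 6: remove exact duplicate rows.
print("duplicated rows:", df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Step 7: label-encode the string column.
df["color"] = LabelEncoder().fit_transform(df["color"])

# Step 8: remove anomalies with Isolation Forest (fit_predict returns 1 for inliers).
iso = IsolationForest(contamination=0.05, random_state=0)
inlier = iso.fit_predict(df.drop(columns="target")) == 1
df = df[inlier].reset_index(drop=True)

# Step 9: absolute correlations; drop one feature of each pair above 0.95.
# The matrix `corr` is what a heatmap would display.
corr = df.drop(columns="target").corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
df = df.drop(columns=to_drop)

# Step 10: stratified train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"],
    test_size=0.2, stratify=df["target"], random_state=0)

# Step 11: scale with statistics learned from the training set only.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 12: inspect the class balance before choosing a remedy.
print(y_train.value_counts(normalize=True))
```

Fitting the scaler on the training split only, and applying it unchanged to the test split, avoids leaking test-set statistics into the model. Note that StandardScaler standardizes to zero mean and unit variance; if the brief means min-max normalization, MinMaxScaler from the same module would be the closer fit. For step 12, common options include resampling the training set or passing class_weight="balanced" to estimators that support it.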

Task attachments (0)

No submissions yet.

Expected awards

Awarded

0


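  • Request posted: 2023.04.04
  • Service providers submit quotes
  • Select a service provider and place funds in escrow
  • Service provider works
  • Acceptance and payment
  • Review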