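I'm helping a friend with a data-science coursework assignment that analyzes YouTube video view data. Parts 1-3 are done, and I'm looking for someone who knows data analysis and machine learning to take on Part 4. The workload should be under one working day, and the budget is 800 yuan. If you're interested, please leave your contact details.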
- PART 1 – Defining the Problem and Questions (done)
- PART 2 – Cleaning the Data (done)
- PART 3 – Carrying out an Exploratory Analysis (done)
- PART 4 – Developing ML and DL Prediction Model (not done; this is the part that needs doing)
Sample of the completed code: https://colab.research.google.com/drive/1MkSpgV_XZVUcNIT1gI-b7Abd0ET8uT0W?usp=sharing
PART 4 detailed requirements
When your data is ready for modelling, you can start building your prediction model. As a starting point for your implementation, consider the following steps:
- Convert the Pandas dataframes into NumPy arrays that can be used by scikit-learn.
- Create an array that extracts only the feature data that you want to work with.
- Normalize your data as some ML models require the input data to be normalized.
- Split your data into train and test or use K-Fold cross validation.
- Create a decision tree classifier and fit it to your training data.
- Display the resulting decision tree.
- Measure the accuracy of the resulting decision tree model using your test data.
- Create a random forest classifier, fit it to your data and measure the accuracy.
- Create an SVM classifier with a linear kernel, fit it to your data and measure the accuracy.
- Create a KNN classifier, fit it to your data and measure the accuracy.
- Write a for loop to run KNN with K values ranging from 1 to 50 and see if K makes a substantial difference. Make a note of the best performance you could get out of KNN.
- Use Keras to set up a neural network with 1 binary output neuron (for binary classification only) and see how it performs. You can run a large number of epochs to train the model if necessary.
- Try different neural network topologies by adding additional layers and use Dropout at each step to prevent overfitting.
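
For anyone picking this up, the scikit-learn steps in the list above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the real cleaned dataframe, its column names and the target are not shown in this post, so `make_classification` generates a synthetic stand-in, and `feature_cols`, `label` and `fit_and_score` are hypothetical names. It also splits before scaling so the scaler is fitted on training data only, which is one common way to do the normalization step. The Keras steps are not covered here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# ASSUMPTION: synthetic stand-in for the cleaned YouTube dataframe.
# The real features and binary target are not given in the post.
X_raw, y_raw = make_classification(
    n_samples=400, n_features=5, n_informative=3, random_state=0
)
feature_cols = [f"feature_{i}" for i in range(5)]  # hypothetical column names
df = pd.DataFrame(X_raw, columns=feature_cols)
df["label"] = y_raw  # hypothetical target column

# Convert the dataframe to NumPy arrays, keeping only the chosen features.
X = df[feature_cols].to_numpy()
y = df["label"].to_numpy()

# Hold out a test set, then normalize using statistics from the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)


def fit_and_score(model):
    """Fit a classifier on the training data and return its test accuracy."""
    model.fit(X_train_s, y_train)
    return accuracy_score(y_test, model.predict(X_test_s))


# Decision tree: fit, measure accuracy, and display the tree as text.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = {"decision_tree": fit_and_score(tree)}
print(export_text(tree, feature_names=feature_cols))

# Random forest and linear-kernel SVM.
scores["random_forest"] = fit_and_score(
    RandomForestClassifier(n_estimators=100, random_state=0)
)
scores["svm_linear"] = fit_and_score(SVC(kernel="linear"))

# KNN for K = 1..50, keeping the best-performing K.
knn_scores = {
    k: fit_and_score(KNeighborsClassifier(n_neighbors=k)) for k in range(1, 51)
}
best_k = max(knn_scores, key=knn_scores.get)
scores["knn_best"] = knn_scores[best_k]

print(scores, "best K:", best_k)
```

To use it on the real data, replace the synthetic dataframe with the cleaned one from Part 2 and set `feature_cols` and the target column accordingly. `sklearn.tree.plot_tree` is an alternative to `export_text` if a graphical tree is wanted.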