Keywords:
Vegetation
Precipitation
Humidity
Forest & brush fires
Decision trees
Time series
Sensors
Computer science
Forestry
Meteorology
Abstract:
Every year, thousands of forest fires occur in Canada; these fires pose risks to people's lives, health, and property and incur huge fire suppression costs. Accurate prediction of fire occurrences at finer temporal and spatial resolutions could aid in deploying fire-fighting resources efficiently. Timely deployment of resources at precise locations could help control fires at an earlier stage, which would reduce damage. We propose an approach to predicting forest fire occurrences at hourly intervals in small rectangular regions across Saskatchewan. To make these predictions, we prepared a suitable dataset, trained three machine learning classification models, namely Decision Tree, K-Nearest Neighbor (KNN), and Hidden Markov Model (HMM), and evaluated them using six performance measures. The hourly weather, vegetation, and forest fire ignition datasets were integrated, cleaned, and transformed into time series classification format to form the forest fire occurrence (FFO) dataset. Any values missing from the instances were estimated using appropriate imputation techniques. To remove class imbalance from the FFO dataset, we applied a domain-specific undersampling approach. Predicting whether or not a fire will occur at the current hour at a given location is a binary classification problem: a model predicts a label, either fire or non-fire. The inputs to the model are the weather conditions for the previous 24 hours and the current hour, as well as the vegetation type at the location. Predicting fire occurrences is an imbalanced learning problem because the ratio of non-fires to fires is very high. Our experiments studied the performance of machine learning classifiers as this ratio was increased. The experiments used two training regimes. With imbalanced training (IMB), the ratio is identical for the training and testing sets. With balanced training (BAL), the ratio for the training set is fixed at 50:50 while the ratio for the testing set is varied. Our r