Initially, we performed validation using ANN and LSTM to choose the best model for our data. Both the ANN and LSTM models are trained and tested with data using one input and one output parameter. The R-square values obtained for temperature, snow cover and NDVI using ANN are 0.681, 0.66, 0.404 (Fig. 3a-c) and 0.89, 0.674, 0.48 using LSTM (Fig. 4a-c) respectively.

From the above results, we can conclude that deep learning provides a lot of advantages for time series forecasting, like automatic learning of temporal dependence and handling of temporal structures like features and seasonality. It can be applied for the multiple numbers of inputs used for a mapping function and thereby supporting multivariate time series forecasting. With its effective way of extracting patterns from the long sequences of input data, it can be perfectly applicable to deal with the massive amounts of data, complex parameters, and multi-step actions. To increase the accuracy of LSTM we tried two input and one output model with different hyperparameters.

### LSTM implementation

In this section, the LSTM was applied for different models to forecast time series data to find which model is best suited (which are having high r-square values) for individual environmental factors.

#### Using LSTM model with one input and one output feature

Here the LSTM model was implemented for forecasting one factor (example temperature) using the same factor as the input feature. Taking the data of each parameter individually from 2001 to 2015 for training the model and the next 2 years of data for testing the model. A dense layer is added at the end of the model to make it more robust. Since only a single value is to be predicted in the output, only one neuron will be set in the dense layer. We have used Mean squared error as a loss function and Adam optimizer is used to optimize the algorithm. The figures mentioned below show the testing results of the temperature, snow cover, and NDVI respectively with one feature. The R-square values of these results are 0.89 for temperature Fig. 4a, 0.674 for snow cover Fig. 4b, and 0.48 for NDVI Fig. 4c. If the R-square value is above 0.75 then it is considered a good model. But as mentioned it is only greater for temperature and not for snow cover and NDVI, so this model with one feature is best suited for temperature but not ideal for the other two parameters.

#### Using the LSTM model with two input and one output feature

LSTM can seamlessly encounter difficulties with multiple input parameters. All we’d like may be a 3D input vector that has to be fed into the input-form of the LSTM. So, if finding a way to change the input parameters to be represented in a 3D vector form, we can use LSTM. The parameters used ie, temperature, snow cover, NDVI that are interrelated to each other. In the above method, the model takes each parameter individually for input as well as output and forecasts but the R-square values obtained are not efficient that we cannot accept the model. To increase the accuracy of the model, we have implemented the LSTM model with multiple combinations of these parameters as input and one output feature and forecasted according to monthly, seasonal, and annual.

#### Using the LSTM model with one input and one output feature

In monthly forecasting, we will take the parameters based on monthly and take two parameters as input and one parameter as output. Here we are taking different combinations of parameters such as Temperature and Snow cover, Temperature and NDVI, Snow cover, and NDVI. To make all input features on a scale of 0 to 1, the input data is normalized using MinMaxScaler. As this is a problem of time series, if we need to forecast snow cover correctly then we need to consider temperature and snow cover from previous months as they are interrelated to each other accordingly for forecasting NDVI we consider these two combinations temperature-NDVI, snow cover-NDVI from previous months. A test dataset is created using 30% of the latest data along with the data created by going back to 60 months in the past for forecasting. This problem is a multivariate regression problem as it involves multiple input climate features. The model is compiled using Adam optimizer and the mean squared error is used as a loss function for this regression problem. To get the accurate values, we need to tune a set of hyperparameters ie, number of neurons, optimizer, epochs, etc. In this monthly case, we had tuned the neurons to find the point where the model performs well. The present investigation used dropout with different values such as 0.25, 0.5 and 0.75 to avoid the overfitting, and it was observed that a value of 0.5 was more suited for dropout layer for all models based on performance, stability and efficiency of the models. Changing the number of neurons may also occur underfitting and overfitting, so we need to choose a suitable number of neurons. The R-square values of these results are for temperature and snow cover Fig. 5a,b it is 0.85 previous 0.51, for temperature and NDVI Fig. 6a,b it is 0.79 previous 0.48 and for snow cover and NDVI Fig. 7a,b it is 0.83 previous 0.48. If the R-square value is above 0.75 then it is considered a good model.

#### Seasonal:

As we are dealing with temperature, snow cover, and vegetation which vary according to the season, we have divided a year into four seasons namely winter (December–February), pre-monsoon (March–May), monsoon (June–August) , and post-monsoon (September–November). We converted monthly data into seasonal by taking the average of the months of that specific season for example to find the value of the temperature of monsoon season we took the average of 3 months in monsoon season. In this seasonal case, we have tuned the different optimizers ie, ADAM, RMSProp, Adamax to get accurate results. In this method, the model best fits for ADAM optimizer as it combines the advantages of both AdaGrad and RMSProp and it achieves high R-square values for temperature and snow cover Fig. 8a it is 0.95, for temperature and NDVI Fig. 8b it is 0.89 and for snow cover and NDVI Fig. 8c, it is 0.92 (Table 1).

#### Annual:

The dataset consists of 17 years of data monthly-wise which we have averaged all months in that year to find annual data. The data is insufficient as it contains only 17 years of annual data. Compared to SVM, Decision tree and other classifiers, the neural networks may not categorize the data very well for less volume of the dataset. Therefore, the predicting model should be large enough to capture the interrelationships of the data along with features of the problem domain (eg, number of categories). Initial layers of the model are used for capturing interrelationships among various parts of the input (ie, patterns and edges). The features that can help classify the desired outputs are captured by the output layer to yield the final decision. If the number of feature parameters and the amount of data required for forecasting is large, the chances for problems to have high complexity increases proportionally. As a result, it causes overfitting of the model. The R-square values obtained for temperature and snow cover Fig. 9a is 0.95, for temperature and vegetation Fig. 9b it is 0.97 and for snow cover and NDVI Fig. 9c, it is 0.98. So, we are not considering the annual case of our model because of overfitting.

#### LSTM forecasting

Here we applied LSTM to forecast the data by taking one input and forecasting one output for the future seven years. The data is divided into 75% for training and 25% for testing the two-layered LSTM model along with a dense output layer, and then we forecast the output step by step for test data and the model feeds the current forecasted value back into the input window by moving it one step forward to make forecasting at the next time step and this is known as moving-forward window technique. The study uses a moving forward window of size 12. The 13th data point has been forecast using the first 12 data inputs. And similarly, the 14th data point is forecasted using a window between 1 and 13 data inputs and the same procedure continues. For moving the window, panda’s shift function is used that shifts the entire column by the specified number. Similarly, the column is shifted up by 1 and then combined with the actual data. After fixing the size of the window, the first 12 columns of the table become input × 1, × 2,…, × 12 features and the 13th column becomes the target y. It is like if we are using LSTM for the NLP task then it looks like a fixed sequence of length 12 of a sentence containing 12 words each and trying to predict the 13th word. By using this method, the environmental factors individually for seven years are forecasted. Figure 10a-c represent forecasted values of temperature, snow cover, and NDVI for the future seven years.