In recent years, the global energy sector has been undergoing a significant transformation, characterized by a shift towards data-driven operations and the widespread adoption of renewable energy sources such as solar photovoltaics (PV). This transition is largely motivated by the urgent need to address climate change and by the recognition that large-scale data collection and analysis can enhance energy efficiency and sustainability. As the energy landscape becomes more complex and interconnected, sophisticated energy forecasting techniques have grown in importance. These techniques are crucial for managing the variability and uncertainty inherent in renewable energy sources, such as wind and solar power, which fluctuate with weather and environmental conditions. Moreover, the integration of big data analytics into energy systems facilitates more accurate and timely predictions, thereby enabling more effective planning, operation, and maintenance of energy infrastructure. This dissertation introduces novel, data-driven methodologies to address key challenges in energy forecasting: predicting weather-induced power outages, forecasting net load, and estimating solar PV penetration.
In the first part of the study, a methodology is presented for forecasting weather-related power distribution outages one day ahead at hourly resolution. Data imbalance, in which only a small portion of the data represents the hours impacted by outages, is a key modeling challenge for small and rural electric utilities. To address it, a weighted logistic regression model is proposed, in which the weights for outage and non-outage hours are the reciprocals of their respective numbers of hours. To demonstrate the effectiveness of the proposed model, two case studies using data from a small electric utility company in the United States are presented. One case study analyzes weather-related outages aggregated to the city level. The other is conducted at the distribution substation level, which has rarely been tackled in the outage prediction literature. Compared with two variants of ordinary logistic regression with equal weights, the proposed model shows superior performance in terms of geometric mean.
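The weighting scheme described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the single scaled weather feature, the toy data, the gradient-descent fit, and all function names are hypothetical, and the geometric mean is assumed here to be the square root of sensitivity times specificity, a common choice for imbalanced classification.

```python
import math


def reciprocal_class_weights(labels):
    """Weight each class by the reciprocal of its number of hours."""
    n_outage = sum(1 for y in labels if y == 1)
    n_normal = sum(1 for y in labels if y == 0)
    if n_outage == 0 or n_normal == 0:
        raise ValueError("both outage and non-outage hours must be present")
    return {0: 1.0 / n_normal, 1: 1.0 / n_outage}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(X, y, class_weights, lr=0.5, epochs=5000):
    """Fit logistic regression by gradient descent on the weighted log-loss."""
    n_features = len(X[0])
    coef = [0.0] * n_features
    intercept = 0.0
    # Per-sample weights, normalised to sum to 1 so the step size is stable.
    sample_w = [class_weights[label] for label in y]
    total = sum(sample_w)
    sample_w = [w / total for w in sample_w]
    for _ in range(epochs):
        grad_c = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, wi in zip(X, y, sample_w):
            p = sigmoid(intercept + sum(c * v for c, v in zip(coef, xi)))
            err = wi * (p - yi)  # weighted residual
            grad_b += err
            for j, v in enumerate(xi):
                grad_c[j] += err * v
        intercept -= lr * grad_b
        coef = [c - lr * g for c, g in zip(coef, grad_c)]
    return coef, intercept


def predict(X, coef, intercept, threshold=0.5):
    """Label an hour as an outage when its predicted probability >= threshold."""
    return [
        1 if sigmoid(intercept + sum(c * v for c, v in zip(coef, xi))) >= threshold else 0
        for xi in X
    ]


def geometric_mean(y_true, y_pred):
    """Assumed definition: sqrt(sensitivity * specificity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: ten non-outage hours and two outage hours,
# described by one scaled weather feature.
X_toy = [[i / 10] for i in range(10)] + [[3.0], [3.4]]
y_toy = [0] * 10 + [1, 1]

weights = reciprocal_class_weights(y_toy)
coef, intercept = fit_weighted_logistic(X_toy, y_toy, weights)
print("class weights:", weights)
print("geometric mean:", geometric_mean(y_toy, predict(X_toy, coef, intercept)))
```

With reciprocal weights, each class contributes the same total weight to the loss regardless of how few outage hours there are, which is why a metric such as the geometric mean, rather than overall accuracy, is the natural yardstick.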
The dissertation then explores net load forecasting in the context of increasing behind-the-meter (BTM) solar PV system adoption. This adoption complicates grid management, especially with respect to net load, defined as the difference between demand and PV generation. The intermittent nature of PV generation, which varies with weather and time of day, adds to net load volatility and poses challenges to grid reliability. This dissertation presents a review of state-of-the-art net load forecasting with a focus on forecasting approaches, techniques, explanatory variables, and the impact of PV penetration on net load forecasting. Additionally, the study critically analyzes the existing literature to identify gaps in the field of net load forecasting and PV integration. To address some of these gaps, a benchmark net load forecasting model is proposed. The proposed model uses publicly available data from ISO New England. The case study demonstrates that the proposed net load forecasting model significantly outperforms the current benchmark load forecasting model in terms of forecasting accuracy, as measured by mean absolute percentage error (MAPE). Moreover, the case study demonstrates the effectiveness of the proposed model over a range of PV penetration levels, which is an important consideration as the use of solar energy continues to grow.
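As a small illustration of the two quantities named above, the following sketch computes net load as demand minus PV generation and scores a forecast with MAPE. The function names and the numbers in the usage lines are hypothetical and are not drawn from the ISO New England case study.

```python
def net_load(demand, pv_generation):
    """Net load per interval: demand minus behind-the-meter PV generation."""
    if len(demand) != len(pv_generation):
        raise ValueError("series must have the same length")
    return [d - g for d, g in zip(demand, pv_generation)]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical hourly values in MW.
demand_mw = [100.0, 120.0, 150.0]
pv_mw = [0.0, 20.0, 50.0]
actual_net = net_load(demand_mw, pv_mw)
forecast_net = [95.0, 105.0, 110.0]
print("net load:", actual_net)
print("MAPE (%):", mape(actual_net, forecast_net))
```

Because MAPE divides by the actual value, it grows unstable as net load approaches zero, which can occur at high PV penetration around midday.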
Furthermore, the dissertation addresses two critical questions regarding PV integration: (1) How much PV is there in the system? (2) Which meters have BTM PV? To address the challenge of estimating PV penetration, existing supervised and unsupervised methods are reviewed. The review reveals common limitations, especially when PV installation information is limited or completely unavailable. To overcome these limitations, a regression-based approach is developed that leverages the difference in performance between the benchmark load forecasting model and the net load forecasting model when both are used to forecast net load. The proposed framework is applied to real-world data from an ISO and a medium-sized utility in the United States. The results validate the effectiveness of the proposed method in accurately estimating PV penetration levels using only historical load data, without explicit PV installation data.
The final part of the study focuses on identifying meters with BTM PV installations. Again, by leveraging the performance disparities between load forecasting models and net load forecasting models, a methodology is devised to differentiate meters with and without PV installations. The effectiveness of the proposed framework is confirmed through an empirical case study at a medium-sized US utility using meter-level load data. The results show that meters with PV installations were identified accurately while maintaining a low rate of false identifications. This methodology provides valuable insights for utilities, helping them understand the adoption and impact of distributed solar energy within their service territories.
Overall, this study contributes significantly to the field of energy system forecasting by developing data-driven models that enhance the understanding and management of weather-induced outages, net load variability, and solar PV integration. These advancements enable utilities to make informed decisions for grid planning, capacity management, and service customization, paving the way for more resilient and efficient energy systems.