Predicting the defect probability of solar cells with the help of Zegami Machine Learning Suite

Solar panels are becoming an increasingly important energy source and represent a great opportunity for sustainable energy for future generations. However, their efficiency can be reduced by defects that occur during manufacturing or through use over time. It is therefore necessary to monitor the condition of solar modules and replace or repair defective units to ensure maximum efficiency.

The goal of this project was to develop a system that automatically detects damage in solar cells using a Deep Learning approach, predicting the defect probability of individual cells. The cells in the dataset were obtained by segmenting photovoltaic modules, as shown in Figure 1.

Figure 1: Original images of (a) monocrystalline and (b) polycrystalline photovoltaic modules. Source [1]

Analysing the collection

After creating a Zegami collection with these images, we started with some exploratory data analysis with the Zegami Machine Learning Suite to get familiar with the data.

Step 1: Identify clusters of similar images

Clustering with different pre-trained feature extraction models is a good way to understand the data and highlight imbalances in image features. By default, the Zegami ML Suite clusters images using a ResNet50-based model pretrained on ImageNet. Other model architectures, such as VGG16, can also be selected and compared. The ResNet50 and VGG16 clusterings are shown in Figures 2 and 3, respectively. We can see that the clusters mostly relate to solar module type rather than defect probability.

Figure 2: Image Similarity clustering with ResNet50. On the left, the images are coloured by defect probability. On the right, cells are coloured by solar module type.
Figure 3: Image Similarity clustering with VGG16. On the left, the images are coloured by defect probability. On the right, cells are coloured by solar module type.
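Under the hood, this kind of image-similarity clustering amounts to extracting one feature vector per image with a pretrained backbone and grouping the vectors. A minimal sketch outside Zegami, using scikit-learn; the random `features` array is a placeholder standing in for real ResNet50 embeddings (2048-dimensional penultimate-layer activations):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Placeholder for pretrained-CNN features (e.g. ResNet50 penultimate layer);
# in practice each row would be the embedding of one solar-cell image.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 2048))

# Project to 2-D for a scatter view, then cluster in the reduced space.
coords = PCA(n_components=2).fit_transform(features)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coords)
```

Colouring the resulting scatter plot by any metadata column (defect probability, module type) then gives views like Figures 2 and 3.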

Step 2: Understand dataset distribution

Using Zegami, we can easily check the distribution of the target variable, as shown in Figure 4.

Figure 4: Graph visualisation of defect probability distribution in the dataset.

There are many more images with a defect probability of 0 than images with higher probabilities. The second most numerous class (defect probability = 1) still has only half as many images as the most represented value. The distribution of defect probability for each solar module type, shown in Figure 5, is similar for both types.

Figure 5: Scatter plot of solar module type versus defect probability
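The same class distribution can be computed directly from the label column. A small sketch with hypothetical values; the real dataset uses the four defect probabilities 0, 0.33, 0.67 and 1:

```python
from collections import Counter

# Hypothetical ground-truth defect probabilities for a handful of cells.
defect_probability = [0.0, 0.0, 0.0, 1.0, 0.33, 0.0, 1.0, 0.67, 0.0, 0.0]

distribution = Counter(defect_probability)
# Sorted view mirrors the bar chart in Figure 4.
for value, count in sorted(distribution.items()):
    print(f"p={value}: {count} images")
```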

Step 3: Create balanced subsets of data

Zegami’s tagging tool was used to get representative subsets of the data for training, validation and test. The process, shown in Figure 6, has the following steps:

  1. Sort and colour by the target class column;
  2. Tag the training subset;
  3. Tag the validation subset;
  4. Colour by Tag and select the remaining images for the test subset.
Figure 6: Steps for tagging images to get the train, validation and test subsets
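The tagging steps above produce subsets whose class proportions match the full dataset. The same idea can be sketched programmatically with a stratified split; the label array and 60/20/20 ratios here are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels; `stratify` keeps each defect-probability value
# represented proportionally, mirroring the manual tagging steps above.
rng = np.random.default_rng(0)
labels = rng.choice([0.0, 0.33, 0.67, 1.0], size=200, p=[0.5, 0.1, 0.15, 0.25])
indices = np.arange(len(labels))

# First split off 60% for training, then halve the rest into val/test.
train_idx, rest_idx = train_test_split(
    indices, test_size=0.4, stratify=labels, random_state=0)
val_idx, test_idx = train_test_split(
    rest_idx, test_size=0.5, stratify=labels[rest_idx], random_state=0)
```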

Step 4: Build a deep learning model

To start building our deep learning model, we first fetched the data from Zegami into the training environment. We then trained and tested models with different architectures. When satisfied with the results, we uploaded the model predictions via the Python SDK, making the errors available as columns. Our first model was a simple Convolutional Neural Network (CNN) with three convolutional layers. Figure 7 shows that this model did not predict the intermediate probabilities well, probably due to the lack of examples in this group. One solution to this imbalance would be to use data augmentation to increase the number of examples in the underrepresented classes.

Figure 7: Graph View of predicted defect probabilities coloured by ground truth values (blue: 0, cyan: 0.33, orange: 0.67, red: 1).
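A minimal sketch of the kind of three-convolutional-layer CNN described above, in Keras. The layer widths, the 300x300 greyscale input, and the choice of MAE loss are assumptions, not the exact configuration used:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Three conv/pool stages, then a regression head; the sigmoid keeps the
# predicted defect probability in [0, 1].
model = tf.keras.Sequential([
    layers.Input(shape=(300, 300, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mae")
```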

To get a sense of where the model is looking in the images to predict the defect probability, we used Grad-CAM to generate heatmap visualisations of each image. We uploaded these images to the Zegami collection as a new source (see Figure 8) and evaluated them alongside the corresponding original images. Figure 9 shows examples of the explainability heatmaps on both types of cells. In some cases, the model found the broken parts. In other images, however, the model treats the normal black corners present in all cells from monocrystalline modules as defects. The majority of polycrystalline cells, on the other hand, have darker spots on the surface that can be confused with damage. These significant differences between the two cell types were also evident in the Image Similarity clusters above.

Figure 8: Explainability heatmaps uploaded as a new image source
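Grad-CAM weights the feature maps of the last convolutional layer by the gradient of the prediction with respect to them, producing a coarse heatmap of the regions driving the output. A hedged sketch for a single-output regression CNN; the tiny model here is only a stand-in for the trained network described above:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in model: one conv layer feeding a probability-style output.
inp = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(8, 3, activation="relu", name="last_conv")(inp)
pooled = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(1, activation="sigmoid")(pooled)
model = tf.keras.Model(inp, out)

def grad_cam(model, image, conv_layer="last_conv"):
    """Return a [0, 1] heatmap of the regions driving the prediction."""
    grad_model = tf.keras.Model(
        model.input, [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, pred = grad_model(image[None, ...])
        score = pred[:, 0]
    grads = tape.gradient(score, conv_out)
    # Weight each feature map by its average gradient, then ReLU and scale.
    weights = tf.reduce_mean(grads, axis=(1, 2))
    cam = tf.nn.relu(
        tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1))[0]
    cam = cam / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()

rng = np.random.default_rng(0)
heatmap = grad_cam(model, rng.normal(size=(64, 64, 1)).astype(np.float32))
```

In practice the heatmap is resized to the input resolution and overlaid on the original cell image before upload.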

Step 5: Training one model for each cell type

As a next step, we decided to train a separate model for each cell type and apply data augmentation (flipping variations) to images with a defect probability above zero. Additionally, we removed the borders of the monocrystalline cells and applied intensity normalisation to them. The selected model is a similar CNN with one additional convolutional layer. Looking at some heatmap examples from the test set (see Figure 10), we can observe that, for the majority of cells, the damage-related regions are highlighted while other normal details of the image are not. The model still fails in some cases (see Figure 11), which can partly be explained by the presence of dubious examples in the dataset, such as those in Figure 12, where there appear to be scratches in both images, yet the defect probability is 1 in one and 0 in the other.

Figure 9: Heatmaps on monocrystalline and polycrystalline cells
Figure 10: Heatmaps on monocrystalline cells from the test set
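The flip-based augmentation and the monocrystalline preprocessing described above can be sketched with NumPy; the border width of 10 pixels and the 100x100 image size are assumptions:

```python
import numpy as np

def augment(image):
    """Return the image plus its horizontal/vertical flip variations."""
    return [image, np.fliplr(image), np.flipud(image),
            np.flipud(np.fliplr(image))]

def preprocess_mono(image, border=10):
    """Crop the dark cell borders, then normalise intensities to [0, 1]."""
    cropped = image[border:-border, border:-border]
    lo, hi = cropped.min(), cropped.max()
    return (cropped - lo) / (hi - lo + 1e-8)

cell = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)
variants = augment(preprocess_mono(cell))
```

Applying the four flip variants only to cells with defect probability above zero quadruples the underrepresented classes without touching the dominant zero-probability class.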

A similar model was trained on the polycrystalline cells. The distribution of predicted defect probabilities versus the ground truth values (Figure 13) is similar to that of the monocrystalline results. For both cell types, the intermediate values, 0.33 and 0.67, are the most difficult to distinguish, as expected. Looking at the explainability maps in Figure 14, we see that the model is biased by non-damage dark spots on the cells. In fact, in some examples predicting a defect probability is hard even for a human (see Figure 15). Nonetheless, when the damage is sufficiently large, the model is able to find the expected regions.


Table 1 shows the Mean Absolute Error (MAE) for each model version. Although the MAE is lower for the first, global model, this can be explained by the fact that it used the dataset before data augmentation, so it had fewer intermediate-probability examples, which are the hardest to train on. By training each cell type separately, the models could have a deeper structure and produce better heatmaps (Figures 9, 10 and 14) without overfitting. These results also confirm that predicting defect probabilities was more successful on monocrystalline cells, as expected.

Model      | MAE   | MAE (test set)
-----------|-------|---------------
both types | 0.210 | 0.209
Table 1: Mean Absolute Error (MAE) for each model
Figure 11: Monocrystalline cells – Graph View of predicted defect probabilities coloured by ground truth values (blue: 0, cyan: 0.33, orange: 0.67, red: 1).
Figure 12: Dubious examples of monocrystalline cells in the dataset.
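For reference, the MAE metric reported in Table 1 is simply the mean of the absolute differences between predicted and ground-truth probabilities; the predictions below are hypothetical:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Mean of |truth - prediction|, as reported in Table 1."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical ground truths (the four dataset values) and predictions.
y_true = [0.0, 0.33, 0.67, 1.0]
y_pred = [0.1, 0.5, 0.5, 0.9]
mae = mean_absolute_error(y_true, y_pred)
```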


In this project, with the help of the Zegami Machine Learning Suite, we could better understand the data and quickly iterate on training a deep learning model to predict defect probability of solar cells. Further work could include the confirmation and understanding of the ground truth labels and removal of dubious examples from the dataset. To improve the performance on polycrystalline cells, we could also explore the application of image processing techniques, such as Gaussian filters, to reduce the noise.

Figure 13: Polycrystalline cells – Graph View of predicted defect probabilities coloured by ground truth values (blue: 0, cyan: 0.33, orange: 0.67, red: 1)
Figure 14: Heatmaps on polycrystalline cells from the test set
Figure 15: Dubious examples of polycrystalline cells in the dataset
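The Gaussian filtering suggested above for noisy polycrystalline cells is a one-liner with SciPy; `sigma=1` is an assumed starting point that would need tuning against the actual noise level:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic noisy cell image standing in for a polycrystalline sample.
rng = np.random.default_rng(0)
noisy_cell = rng.normal(loc=0.5, scale=0.1, size=(100, 100))

# Gaussian smoothing suppresses pixel-level noise while preserving
# larger structures such as cracks.
smoothed = gaussian_filter(noisy_cell, sigma=1)
```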

Zegami would like to acknowledge the creators of this dataset [1].


  1. Sergiu Deitsch et al. “Segmentation of photovoltaic module cells in uncalibrated electroluminescence images”. In: Machine Vision and Applications 32.4 (May 2021). doi: 10.1007/s00138-021-01191-9.