PV MODULE PERFORMANCE UNDER REAL-WORLD TEST CONDITIONS–A DATA ANALYTICS APPROACH by YANG HU Submitted in partial fulfillment of the requirements For the degree of Master of Science Thesis Adviser: Prof. Roger H. French Department of Materials Science and Engineering CASE WESTERN RESERVE UNIVERSITY May, 2014


PV MODULE PERFORMANCE UNDER REAL-WORLD TEST

CONDITIONS–A DATA ANALYTICS APPROACH

by

YANG HU

Submitted in partial fulfillment of the requirements

For the degree of Master of Science

Thesis Adviser: Prof. Roger H. French

Department of Materials Science and Engineering

CASE WESTERN RESERVE UNIVERSITY

May, 2014

PV Module Performance Under Real-world Test Conditions–A Data

Analytics Approach

Case Western Reserve University

Case School of Graduate Studies

We hereby approve the thesis¹ of

YANG HU

for the degree of

Master of Science

Prof. Roger French, Committee Chair, Adviser (11/21/2013)

Prof. David Matthiesen, Committee Member (11/21/2013)

Prof. Jennifer Carter, Committee Member (11/21/2013)

Prof. Jiayang Sun, Committee Member (11/21/2013)

Dr. Timothy Peshek, Committee Member (11/21/2013)

Dr. Yifan Xu, Committee Member (11/21/2013)

¹ We certify that written approval has been obtained for any proprietary material contained therein.

Dedicated to science and the pursuit of progress.

Table of Contents

List of Figures
Acknowledgements
Abstract
Chapter 1. Introduction
    Lifetime and degradation science approach
    Thesis overview
Chapter 2. Background and literature review
    Previous research on real world PV modules’ performance
    Standards
    Data science
    Clustering analysis
Chapter 3. Real-world Data Acquisition
    SDLE SunFarm design
    Global SunFarm network and Energy CRADLE
Chapter 4. Results: Real-world data analytics
    Overview
    Raw data validation
    Exploratory Data Analysis (EDA) on Integrated Data
    Clustering of AC Power Data
    Data Assembly
    Sub-sampling
    Clustering of Solar Noon Time Performance Ratio Data
Chapter 5. Discussion
    Data analytics
    Performance at different relative positions
    Performance of different brands
    Power time series data clustering
    Solar noon time performance ratio clustering
Chapter 6. Conclusions
Chapter 7. Future research
    Improved SunFarm data quality and redundancy
    Predictive model
Appendix A. List of 24 manufacturers and nameplate power
Appendix B. SunFarm network
    SDLE SunFarm design & characteristics
    Energy CRADLE SunFarm informatics
Appendix. Complete References

List of Figures

2.1 Pie chart of method used to determine Rd
2.2 PR subsetting
4.1 60 PV modules distribution
4.2 Baseline result of 20 brands
4.3 Power time series plot
4.4 Microinverter’s efficiency
4.5 Power curve comparison
4.6 Total power production of 20 brands
4.7 Normalized power production of 20 brands
4.8 Hierarchical cluster 1
4.9 Total within cluster sum of square 1
4.10 Power time series plot with clustering result
4.11 Normalized performance metrics
4.12 Noontime PR versus yI
4.13 PR in different climate condition
4.14 Pairs plot of PR 1
4.15 Pairs plot of PR 2
4.16 Total within cluster sum of square 2
4.17 Hierarchical cluster 2
4.18 PR time series plot with clustering result
5.1 Sensor cross check
5.2 Averaged performance ratio of 15 min around solar noon time
5.3 Comparison of normalized AC power in winter
5.4 Comparison of normalized AC power in summer
B.1 An overview of SDLE SunFarm
B.2 Sample tray and concentrator
B.3 Dual axis tracker
B.4 Tracker frame
B.5 SunFarms within Ohio
B.6 Architecture of NO-SQL Hadoop system
B.7 Architecture of Energy CRADLE’s user front end

Acknowledgements

I would like to express my deepest gratitude for the patience, diligence, and resourcefulness of the entire team of researchers in the Solar Durability and Lifetime Extension (SDLE) center at Case Western Reserve University, Department of Materials Science and Engineering, headed by Prof. Roger H. French. Special thanks go to Dr. Timothy Peshek and Mohammad A. Hossain, who helped build and maintain the SDLE SunFarm data acquisition system.

Thanks also go to the researchers at the Center for Statistical Research, Computing and Collaboration (SR2C), Department of Epidemiology & Biostatistics, for their coordinated efforts. The guidance of Prof. Jiayang Sun and Dr. Yifan Xu in statistics and data science was instrumental in this work.

Assistance and technical support from researchers in the Medical Informatics Division of EECS, especially Prof. G.Q. Zhang and his group members Yashwanth Reddy Gunapati and Tarun Jian, were extremely valuable in completing the data collection and Energy CRADLE parts of this work.

I would also like to acknowledge the funding for this work. The SDLE center was established with funding from the Ohio Third Frontier, Wright Project Program Award Tech 12-004. The PV module case study was supported by the Bay Area Photovoltaic Consortium, Prime Award No. DE-EE0004946, Subaward Agreement No. 60220829-51077-T.

Finally, I certify that there is no proprietary material in this thesis.

PV Module Performance Under Real-world Test Conditions–A Data

Analytics Approach

Abstract

by

YANG HU

In pursuit of a higher-fidelity understanding of the long-term degradation of long-lived technologies, such as photovoltaic (PV) systems, the framework of Lifetime and Degradation Science (L&DS) goes beyond initial qualification tests and investigates the underpinning mechanisms of degradation. L&DS concerns itself with the complex, multivariate signatures of the degradation process and with uncovering the fundamental physical mechanisms contributing to that degradation. In the case of PV modules, this effort requires extensive continuous monitoring of the modules’ power production and of climatic conditions. The responses of PV modules to real-world stressors are cross-correlated with the simulated and accelerated stressors placed on devices in a laboratory setting.

A unique, highly instrumented outdoor test facility for PV materials, components, and systems, the Solar Durability and Lifetime Extension (SDLE) center’s SunFarm, was built to better understand the power degradation mechanisms of PV modules and materials. The SDLE SunFarm provides an apparatus for collecting real-world time series data consisting of output power, weather, and insolation metrology. The SunFarm comprises 122 individual PV power plants: 120 module-level plants and two string-level plants of eight modules each. Output power is monitored through appropriate grid-tied inverters.

The metrology package developed at CWRU for the collection of time series data provides a model to be implemented at external sites around the globe. To expand the ability to monitor PV systems’ performance under different climatic conditions, a global SunFarm Network was implemented among nine outdoor test facilities around the world, in collaboration with academic institutions and industrial partners, including commercial power plants.

This thesis provides the initial data analytics on the first six months of data from 60 PV modules on the SDLE SunFarm, and serves as a model for the analytics of the full dataset from the global SunFarm Network. The data were first validated by characterization of the measurement apparatus, redundancy of measurement, and time-slewing according to minimization of the time cross-correlation function, using the free and open-source statistical software language and packages known as “R”. Clustering analysis in R (v3.0.1) [1], based on the unfiltered AC power time series, showed that the data fell into six clusters, which represented the six different electrical sites of the SDLE SunFarm.

The data were intelligently assembled and subsampled around solar noon. The PV performance ratio (PR), a measure of a PV module’s output at a given incident power from sunlight, was used as an indicator of the modules’ working effectiveness. Correlations among the filtered subset of solar noon PR data were discerned with hierarchical clustering analysis. K-means clustering was used to confirm the optimum number of clusters for the analytics. The clustering results differentiated modules on different physical sites and pointed out malfunctions of the PV mounting system and the underperformance of certain module brands. These results are useful for correlating different modules’ responses to stressors with those stressors’ effects on overall performance.

1 Introduction

Solar energy is becoming a more mature and mainstream source of electricity; the photovoltaic (PV) industry has experienced remarkable growth over the past decade. Worldwide, PV exceeded the 100 GW installed capacity mark in 2012 [2]. Germany led installations in 2012 with 7.6 GW, followed by China with between 3.5 and 4.5 GW [3,4]. In the US, 3.2 GW were installed during 2012, the fourth most in the world [2]. A solar project will be installed, on average, every four minutes in the US [5]. By the end of 2013, over 100,000 individual solar systems will be installed, exceeding 4.4 GW in capacity.

In the academic world, although much PV research still focuses on gaining higher efficiencies and introducing new technologies, interest in lifetime and degradation has risen. At the 2010 Department of Energy Science for Energy Technology workshop [6], the topic of PV lifetime and degradation science (L&DS) was made a research priority, and its importance was reconfirmed in the Mesoscale Science Report [7]. A quantification of power decline over time, known as the degradation rate (Rd), is as important as initial performance. For investigators and PV power plant owners in particular, degradation rates essentially determine the lifetime of a PV system. A well-known disaster in the PV industry was Carrizo Plains, which was once the largest PV power plant in the world [8]. The installation failed after four years of operation because it exhibited a power degradation rate of 10% per year. Commercial PV panels claim a degradation rate lower than 1% per year and usually come with a 25-year manufacturer warranty [9]. However, recent research sampling over 2000 degradation rates reported around the world suggests that some PV systems exhibit a power degradation rate (Rd) higher than 1% per year. Additionally, the study observes that Rd is highly dependent on the operating environment [10].

1.1 Lifetime and degradation science approach

To predict the performance and lifetime of PV modules, a better understanding of degradation mechanisms and of the influence of climate conditions is necessary. A performance and lifetime prediction (PLP) tool based on a reliability physics and prognostics approach has been proposed; it requires indoor accelerated studies of PV materials, components, and systems, together with real-world degradation and time series analysis of PV modules [11,12].

Real-world testing plays a critical role in researching degradation mechanisms, firstly because the real world is the typical operating environment for PV systems [13]. A real-world environment is a unique combination of different stressors that no indoor testing chamber is able to duplicate. Real-world stressors include, but are not limited to, solar irradiance, rain, snow, salt fog, and soiling. Isolating the influence of a single stressor or of several stressors requires precise and redundant monitoring of climate conditions. Secondly, outdoor testing is the only way to correlate indoor accelerated testing with real-world performance. By developing metrics, metrology, and tools to quantify, compare, and cross-correlate the response of PV modules and components to a variety of stressors in both accelerated and outdoor testing, it is possible to link observed responses to particular stressors and to determine quantitative rates of degradation.

1.2 Thesis overview

1.2.1 Background and literature review

This chapter reviews previous research on the performance of PV modules and PV power plants under real-world operating conditions, along with the data filtering methods applied in that research. Two IEC standards that were used for data monitoring and data cleaning in this study are also reviewed. Finally, some background information on data science is provided.

1.2.2 Real World Data Acquisition

SDLE SunFarm’s design and the data acquisition methods applied to the case study of

60 PV modules on SDLE SunFarm are explained.

1.2.3 Results: Real-world data analytics

Descriptive data analysis and data clustering results are presented in this section.

1.2.4 Discussion

This section discusses the data analytic procedures for outdoor test data, compares modules’ performance under different climate conditions and at different positions relative to the sunlight, and compares the initial indoor performance and the outdoor performance of the 20 brands.

1.2.5 Conclusions

Conclusions drawn from the data analysis are presented in this section.

1.2.6 Future research

An improved study protocol and a predictive model are planned for future research.

1.2.7 Appendix

A list of the 24 PV models studied in this thesis, the SunFarm design and characteristics, and the full references are presented in the appendices.

2 Background and literature review

2.1 Previous research on real world PV modules’ performance

2.1.1 PV module degradation

PV modules’ power output is known to decline over time; this phenomenon is quantified by the degradation rate (Rd) of a PV system. For investigators and power plant owners, knowing the degradation rates of PV modules is as important as knowing their initial efficiency. Jordan and Kurtz reviewed over 2000 degradation rate reports in 2011 [10]. All the reported degradation rates were determined using one of the four methods introduced below.

Current-voltage (I-V) curves, typically taken at discrete time intervals either indoors with a solar simulator or outdoors with a portable I-V curve tracer, are used for determining Rd [14]. To take an indoor I-V curve, the PV module needs to be removed from the array, which is inconvenient for PV system owners. Outdoor I-V curve tracing requires a very clear sky. Fig. 2.1 shows the methodologies used to determine Rd. The use of indoor I-V curve tracing increased after the year 2000 due to the widespread use of indoor flash solar simulators. Neither of these methods provides continuous measurements; in fact, it would take a large effort to acquire I-V curve measurements on every PV module in a real PV power plant [15]. As a result, a large portion of the Rd measurements (40 out of 58, around 70%) were determined using only two data points, or even one, which leads to low accuracy and high uncertainty [16].

Figure 2.1. Pie chart of the number of references deploying the indicated methods to determine degradation rates prior to and following the year 2000 [10].

Using continuous power data for Rd determination can improve the accuracy [10]. The Photovoltaics for Utility Scale Applications (PVUSA) [17] and performance ratio (PR) [18] methods of measuring Rd fall into the continuous data category. PVUSA is an AC rating method developed by engineers working on the PVUSA project. It provides an empirical relationship for the module’s AC output as a function of solar irradiance, ambient temperature, and wind speed. PR is the ratio of a module’s conversion efficiency in the field to the manufacturer-provided qualification test efficiency under standard test conditions (STC): 25 °C, 1 kW/m², and the AM 1.5 spectrum.

The degradation rate is then determined from the trend of the continuous data using time series analysis [19,20]. Both methods display strong seasonality that can affect reported rates and increase uncertainties. In practice, preferentially choosing data subsets, such as data for sunny days only, can reduce the noisiness of the data [21]; this process is referred to as data filtering. However, data filtering usually eliminates or disregards the impact of different climate conditions on modules’ Rd.
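As an illustration only, determining Rd from the trend of a continuous series can be sketched as an ordinary least-squares fit. Python is used here purely for illustration (the analysis in this thesis was carried out in R). The function name, the yearly sampling, and the convention of expressing the slope relative to the fitted value at time zero are assumptions made for this sketch and are not taken from the cited studies, which also address seasonality.

```python
def degradation_rate(times_years, values):
    """Return the linear trend of `values` in percent per year.

    The slope comes from an ordinary least-squares fit and is expressed
    relative to the fitted value at time zero (an assumed convention).
    """
    n = len(values)
    if n < 2 or n != len(times_years):
        raise ValueError("need two or more matching time/value pairs")
    t_mean = sum(times_years) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times_years)
    sxy = sum((t - t_mean) * (v - v_mean)
              for t, v in zip(times_years, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return 100.0 * slope / intercept


# Hypothetical yearly PR values that fall by 0.008 per year from 0.800.
rd = degradation_rate([0, 1, 2, 3], [0.800, 0.792, 0.784, 0.776])
print(f"Rd = {rd:.2f} %/year")
```

With only two data points the fit reduces to a straight line through both, which is one way to see why rates based on one or two measurements carry high uncertainty.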

2.1.2 Performance ratio filtering

Performance ratio (PR) reflects the PV system’s conversion efficiency in the field compared with that under qualification STC. Previous research reported that the typical PR of PV systems is about 70%-80%. A survey conducted by Nils Reich of the Fraunhofer Institute for Solar Energy Systems suggests that the PR for newly built PV systems in Germany has increased to 90% [22]. However, these reported PR values are all filtered and averaged with a certain methodology. Reich’s study only considered plane-of-array (POA) irradiance between 800 and 1000 W/m² and temperatures in either the 35-40 °C or the 40-45 °C bin. “Outliers” still remained after this first round of filtering, so all data points deviating by more than ±5% from the median of the annual PR were discarded. Fig. 2.2 shows how the annual PR was determined from the already filtered data set. There are obvious outliers exceeding 110% at the beginning of the study, and additional outliers lower than 40% during the study. This range was selected because “there is no physical reason apart from malfunctions or measurement uncertainty, why PR at selected irradiance and temperature conditions should differ that much” [23].

Figure 2.2. PR subsetting of an entire plant, keeping data within ±5% of the median of the annual PR [22].
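The two-stage filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited study: the record layout (`poa`, `temp`, `pr` keys), the function name, the half-open temperature bins, and the reading of ±5% as relative to the median of the selected points are all hypothetical choices.

```python
from statistics import median


def filter_pr(records, g_min=800.0, g_max=1000.0,
              temp_bins=((35.0, 40.0), (40.0, 45.0)), tol=0.05):
    """Keep PR records inside the irradiance/temperature window,
    then drop points farther than `tol` (relative) from the median PR.

    Each record is a dict with 'poa' (W/m^2), 'temp' (deg C), and 'pr'.
    """
    selected = [
        r for r in records
        if g_min <= r["poa"] <= g_max
        and any(lo <= r["temp"] < hi for lo, hi in temp_bins)
    ]
    if not selected:
        return []
    med = median(r["pr"] for r in selected)
    return [r for r in selected if abs(r["pr"] - med) <= tol * med]


# Hypothetical records: one outlier, one low-irradiance point, one cold point.
data = [
    {"poa": 900.0, "temp": 37.0, "pr": 0.80},
    {"poa": 900.0, "temp": 42.0, "pr": 0.82},
    {"poa": 900.0, "temp": 38.0, "pr": 1.10},
    {"poa": 500.0, "temp": 37.0, "pr": 0.80},
    {"poa": 900.0, "temp": 20.0, "pr": 0.81},
    {"poa": 950.0, "temp": 36.0, "pr": 0.81},
]
kept = filter_pr(data)
```

The sketch makes the cost of filtering visible: half of the hypothetical records are discarded before any PR statistic is computed.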

Another study, conducted by Jordan et al. at NREL, used three filtering steps [13]. POA irradiance was restricted to between 800 W/m² and 1200 W/m². Two further filters were applied, denoted as stability and outlier filters. The stability filter “eliminates data points when POA changes more than 20 W/m²/min and the module temperature more than 1 °C/min”. The outlier filter “uses DC/POA to eliminate snow days, partial shading conditions. Furthermore, the data for sunny days were selected by filtering for clearness index >0.5”. The clearness index of the sky is the ratio of the measured global irradiance to the extraterrestrial beam irradiance on a similarly tilted surface [24]. After filtering, PR shows good precision, which benefits degradation determination. However, the filtering keeps only data from consistently bright, sunny days and eliminates all other weather conditions. The filtered data were also averaged to eliminate seasonality, yet weather and season have important effects.
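A rough sketch of the irradiance window, the stability filter, and the clearness-index threshold is given below. The tuple layout and function name are hypothetical, and the sketch assumes that exceeding either rate limit marks a point as unstable; the cited study may combine the two conditions differently, and its DC/POA outlier step is not reproduced here.

```python
def stable_sunny_points(samples, max_dg=20.0, max_dtemp=1.0,
                        min_clearness=0.5, g_lo=800.0, g_hi=1200.0):
    """Return samples that pass irradiance, stability, and clearness filters.

    `samples` is a time-ordered list of tuples:
    (minute, poa_w_m2, module_temp_c, clearness_index).
    Rates of change are computed against the previous sample.
    """
    kept = []
    for prev, cur in zip(samples, samples[1:]):
        dt_min = cur[0] - prev[0]
        if dt_min <= 0:
            continue  # skip duplicated or out-of-order timestamps
        dg = abs(cur[1] - prev[1]) / dt_min        # W/m^2 per minute
        dtemp = abs(cur[2] - prev[2]) / dt_min     # deg C per minute
        if (g_lo <= cur[1] <= g_hi and dg <= max_dg
                and dtemp <= max_dtemp and cur[3] > min_clearness):
            kept.append(cur)
    return kept


# Hypothetical minute data: a passing cloud edge at minute 2 and a hazy
# reading at minute 3 are both rejected.
series = [
    (0, 900.0, 40.0, 0.7),
    (1, 905.0, 40.2, 0.7),
    (2, 1000.0, 40.5, 0.7),
    (3, 1005.0, 40.6, 0.4),
    (4, 1010.0, 40.7, 0.8),
]
kept = stable_sunny_points(series)
```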

Recently, Hasselbrink et al. from SunPower Corporation developed a unique approach using “3 million module-years of live site data” [25]. Instead of determining the yearly degradation rate from monthly averaged PR with a moving average method, which ignores seasonality by smoothing out the variation, they used the performance index on the same day of each year to determine a degradation rate for each day of the year. The yearly degradation rate was then determined from the distribution of the 365 Rd values. This method includes all climate conditions; however, isolating the influence of each climate stressor was not the focus of their study.
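The day-of-year idea can be sketched as follows. This is only an illustration under assumptions: the input layout, the use of a least-squares slope per calendar day, the normalization by the first year's value, and the choice of the median to summarize the distribution are all hypothetical and are not taken from the cited work.

```python
from statistics import median


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


def yearly_rd_from_daily(index_by_day):
    """Estimate a yearly Rd (percent per year) from per-day trends.

    `index_by_day` maps (year, day_of_year) to a performance index.
    One rate is fitted per calendar day across years, and the median
    of those rates is returned together with the full list.
    """
    by_day = {}
    for (year, day), value in index_by_day.items():
        by_day.setdefault(day, []).append((year, value))
    rates = []
    for points in by_day.values():
        if len(points) < 2:
            continue  # a single year gives no trend
        points.sort()
        years = [y for y, _ in points]
        values = [v for _, v in points]
        rates.append(100.0 * ols_slope(years, values) / values[0])
    if not rates:
        raise ValueError("no calendar day has data from two or more years")
    return median(rates), rates


# Hypothetical data: a winter day and a summer day, three years each.
data = {
    (2010, 1): 1.000, (2011, 1): 0.990, (2012, 1): 0.980,
    (2010, 180): 0.900, (2011, 180): 0.891, (2012, 180): 0.882,
}
rd, per_day = yearly_rd_from_daily(data)
```

Because each calendar day is only compared with itself, the seasonal level difference between the two hypothetical days does not bias the rate.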

2.1.3 Influence of weather stressors

A PV system’s operating environment is a combination of multiple weather stressors, including temperature, humidity, radiation, and soiling. Interest has risen in investigating the influence of one or more of these stressors. Faiman, Ye et al. conducted an experiment on three different types of modules: mono c-Si, micromorph Si, and single-junction a-Si. Their performance under two distinct monsoon seasons throughout the year was modeled [26]. The results show that module efficiency is highly correlated with temperature. However, as a result of Singapore’s low latitude, module efficiency at noon is not strongly correlated with spectral effects, which arise from changes in air mass. Another study, conducted by a group of researchers at the University of California San Diego, focused on the soiling losses of solar systems. They qualitatively modeled the losses caused by dust accumulating on module surfaces between two days of rain [27]. The research explicitly compared average soiling losses of modules mounted at tilt angles of 0-5, 6-19, and greater than 20 degrees. Sites with tilt angles shallower than 5° showed soiling losses five times those of the other sites.

Seasonal variation, which has usually been neglected in the process of determining Rd, contains information about the influence of climate stressors on PV modules’ performance and reliability. The research reported here aims to extract more information by performing exploratory data analysis and clustering analysis on the entire AC power time series before sub-sampling or “filtering”.

2.2 Standards

2.2.1 Photovoltaic system performance monitoring

IEC 61724 describes general guidelines for the monitoring and analysis of the electrical performance of photovoltaic systems [28].

Meteorology. For climate conditions monitoring, total irradiance in the plane of array ($G_I$) shall be measured in the same plane as the PV array by calibrated reference devices or pyranometers. Ambient air temperature ($T_{am}$) shall be measured at a location that can represent array conditions using temperature sensors that are shielded from direct solar radiation. Wind speed ($S_W$) shall be measured at a height that can represent array conditions.

Electrical parameters. PV system electrical parameters, including output voltage ($V_A$), output current ($I_A$), and output power ($P_A$), represent the DC electrical characteristics. Utility grid electrical parameters include utility voltage ($V_U$), current to the utility grid ($I_{TU}$), current from the grid ($I_{FU}$), and power to the utility grid ($P_{TU}$). The standard also points out that “AC voltage and current may not need to be monitored in every situation. DC power can either be calculated in real time as the product of sampled voltage and current quantities or measured directly using a power sensor. If DC power is calculated, the voltage and current quantities shall be sampled not averaged.” This explains why the microinverter used in this study provides instantaneous DC voltage and DC current and averaged AC power.

System performance indices. System performance indices are derived parameters, calculated from the recorded monitoring data, that relate to the system’s energy balance and performance. Performance indices normalize system performance, which makes PV systems of different configurations and at different locations comparable. These indices include yields, losses, and efficiencies. Yields are energy quantities normalized to rated array power. System efficiencies are normalized to array area. Losses are the differences between yields.

Daily mean yields. a) The array yield $Y_A$ is the daily array energy output per kW of installed PV array:

\[ Y_A = E_{A,d} / P_0 = \tau_r \times \left( \sum_{day} P_A \right) / P_0 \tag{2.1} \]

Here $\tau_r$ is the recording interval and the sum runs over all records of the day. This yield represents “the number of hours per day that the array would need to operate at its rated output power, $P_0$, to contribute the same daily array energy to the system as was monitored”.

b) The final PV system yield $Y_f$ is the portion of the daily net energy output of the entire PV plant that was supplied by the array, per kW of installed PV array:

\[ Y_f = Y_A \times \eta_{LOAD} \tag{2.2} \]

This yield represents the number of hours per day that the array would need to operate at its rated output power to equal the monitored net daily yield. $\eta_{LOAD}$ is the load efficiency.

c) The reference yield $Y_r$ is calculated by dividing the total daily in-plane irradiation by the module’s reference in-plane irradiance $G_{I,ref}$:

\[ Y_r = \tau_r \times \left( \sum_{day} G_I \right) / G_{I,ref} \tag{2.3} \]

This yield represents the number of hours per day during which the sun would need to be at the reference irradiance level in order to contribute the same incident energy as was measured in the field.

Normalized losses. By subtracting yields, normalized losses are calculated.

a) The “array capture” losses $L_c$ represent the losses due to array operation:

\[ L_c = Y_r - Y_A \tag{2.4} \]

b) The balance of system (BOS) losses $L_{BOS}$ represent the losses in the BOS components:

\[ L_{BOS} = Y_A \times (1 - \eta_{BOS}) \tag{2.5} \]

c) The PR indicates the overall effect of losses on the array’s rated output due to array temperature, incomplete utilization of the irradiation, and system component inefficiencies or failure:

\[ PR = Y_f / Y_r \tag{2.6} \]
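Equations 2.1 to 2.6 can be evaluated directly from sampled data, as in the sketch below. The function and argument names are hypothetical, power is assumed to be in kW and irradiance in kW/m², and the default efficiencies of 1.0 are placeholders for illustration rather than values from the standard.

```python
def daily_performance(p_array_kw, g_poa_kw_m2, tau_h, p0_kw,
                      eta_load=1.0, eta_bos=1.0, g_ref_kw_m2=1.0):
    """Evaluate the daily yields, losses, and PR of Eqs. 2.1 to 2.6.

    p_array_kw   : sampled array power P_A over one day (kW)
    g_poa_kw_m2  : sampled in-plane irradiance G_I over the same day (kW/m^2)
    tau_h        : recording interval tau_r (hours)
    p0_kw        : rated array power P_0 (kW)
    """
    y_a = tau_h * sum(p_array_kw) / p0_kw            # Eq. 2.1
    y_f = y_a * eta_load                             # Eq. 2.2
    y_r = tau_h * sum(g_poa_kw_m2) / g_ref_kw_m2     # Eq. 2.3
    return {
        "Y_A": y_a,
        "Y_f": y_f,
        "Y_r": y_r,
        "L_c": y_r - y_a,                            # Eq. 2.4
        "L_BOS": y_a * (1.0 - eta_bos),              # Eq. 2.5
        "PR": y_f / y_r,                             # Eq. 2.6
    }


# Hypothetical three-hour day for a 250 W module sampled hourly.
result = daily_performance([0.1, 0.2, 0.1], [0.5, 1.0, 0.5],
                           tau_h=1.0, p0_kw=0.25)
```

For this hypothetical input the array yield is 1.6 h against a reference yield of 2.0 h, so the capture loss is 0.4 h and the PR is 0.8.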

2.2.2 Procedures for temperature and irradiance corrections to measured current-voltage characteristics

IEC 60891 [18] introduces three correction procedures. For brevity, only the first procedure, which was used for the baseline data correction in this work, is described here. The second procedure is especially suited to large irradiance corrections (>20%). The third procedure is needed when the temperature coefficients of the PV device are unknown.

Correction procedure 1. The measured current-voltage characteristic shall be corrected to standard test conditions, which are 25 °C and 1000 W/m², by applying the following equations:

\[ I_2 = I_1 + I_{SC} \cdot (G_2/G_1 - 1) + \alpha \cdot (T_2 - T_1) \tag{2.7} \]

\[ V_2 = V_1 - R_S \cdot (I_2 - I_1) - \kappa \cdot I_2 \cdot (T_2 - T_1) + \beta \cdot (T_2 - T_1) \tag{2.8} \]

where $I_1$, $V_1$ are coordinates of points on the measured characteristic; $I_2$, $V_2$ are coordinates of the corresponding points on the corrected characteristic; $G_1$ is the irradiance measured with the reference device; $G_2$ is the irradiance at the standard or other desired irradiance; $T_1$ is the measured temperature of the test specimen; $T_2$ is the standard or other desired temperature; $I_{SC}$ is the measured short-circuit current of the test specimen at $G_1$ and $T_1$; $\alpha$ and $\beta$ are the current and voltage temperature coefficients of the test specimen in the standard or target irradiance for correction and within the temperature range of interest; $R_S$ is the internal series resistance of the test specimen; and $\kappa$ is a curve correction factor.
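Equations 2.7 and 2.8 translate into a few lines of code. The sketch below corrects a single I-V point; the function name and the numerical module parameters in the example are hypothetical and chosen only to exercise the formulas.

```python
def correct_iv_point(i1, v1, g1, t1, i_sc, alpha, beta, r_s, kappa,
                     g2=1000.0, t2=25.0):
    """Correct one measured I-V point to target conditions (Eqs. 2.7, 2.8).

    i1, v1      : measured current (A) and voltage (V)
    g1, t1      : measured irradiance (W/m^2) and temperature (deg C)
    i_sc        : measured short-circuit current at g1 and t1 (A)
    alpha, beta : current (A/K) and voltage (V/K) temperature coefficients
    r_s         : internal series resistance (ohm)
    kappa       : curve correction factor (ohm/K)
    g2, t2      : target irradiance and temperature, STC by default
    """
    i2 = i1 + i_sc * (g2 / g1 - 1.0) + alpha * (t2 - t1)
    v2 = (v1 - r_s * (i2 - i1) - kappa * i2 * (t2 - t1)
          + beta * (t2 - t1))
    return i2, v2


# Hypothetical point measured at 800 W/m^2 and 45 deg C.
i2, v2 = correct_iv_point(i1=5.0, v1=30.0, g1=800.0, t1=45.0, i_sc=8.0,
                          alpha=0.004, beta=-0.12, r_s=0.5, kappa=0.001)
```

Applying the function to every point of a measured curve yields the corrected characteristic; a point already at the target conditions is returned unchanged.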

2.3 Data science

2.3.1 Data validation

Data validation is the process of ensuring that data analysis is based on a clean, correct, and useful data set [29]. Data validation includes data type checks, for example, whether the data are the power production of a PV module or the irradiance in the PV module’s plane; file existence checks, which determine the days for which data files are available for analysis; and cross-system consistency checks, which compare data points for the same variable collected by different systems to ensure that they are consistent. In practice, data validation rules can be implemented through the automated facilities of a data dictionary [30], or by the inclusion of explicit application program validation logic [31].
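Two of these checks, file existence and cross-system consistency, can be sketched as explicit validation logic. The function names, the 5% relative tolerance, and the 50 W/m² floor below which readings are not compared are assumptions made for illustration and do not describe the validation rules actually used in this work.

```python
from datetime import date, timedelta


def missing_days(available, start, end):
    """List the dates in [start, end] for which no data file is available.

    `available` is a set of `date` objects, one per existing daily file.
    """
    span = (end - start).days
    return [start + timedelta(days=d) for d in range(span + 1)
            if start + timedelta(days=d) not in available]


def inconsistent_points(primary, secondary, rel_tol=0.05, floor=50.0):
    """Return indices where two sensors of the same variable disagree.

    Readings are compared only when the larger one reaches `floor`,
    so that near-dark values do not trigger spurious flags.
    """
    flagged = []
    for idx, (a, b) in enumerate(zip(primary, secondary)):
        reference = max(abs(a), abs(b))
        if reference < floor:
            continue
        if abs(a - b) > rel_tol * reference:
            flagged.append(idx)
    return flagged


# Hypothetical example: one missing day and one disagreeing reading.
gaps = missing_days({date(2013, 3, 1), date(2013, 3, 3)},
                    date(2013, 3, 1), date(2013, 3, 3))
bad = inconsistent_points([800.0, 500.0, 10.0], [810.0, 600.0, 30.0])
```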

2.3.2 Exploratory data analysis

In outdoor testing of PV systems, test conditions are not controllable; the best we can do is to collect as much data as possible so as to quantitatively evaluate climate stressors and the PV systems’ response. Exploratory data analysis (EDA) [32] encompasses and surpasses initial data analysis (IDA) [33]: while IDA focuses narrowly on hypothesis testing and checking assumptions, EDA encourages statisticians to explore the data, possibly formulating hypotheses that can guide further experiments and data collection. EDA usually summarizes the main characteristics of data by visual methods, including box plots, histograms, and multi-vari charts, which graphically display patterns of variation.

2.4 Clustering analysis

Data describe the characteristics of different PV systems. To understand the many kinds of responses and phenomena, one of the most important steps in data analysis is to classify or group data into a set of categories or clusters. Data objects that are classified into the same group or cluster should reflect similar properties based on some criteria. Classification processes can be supervised or unsupervised. Supervised classification maps data objects into predefined classes. Unsupervised classification is known as cluster analysis [34]. As described in the literature, “A direct reason for unsupervised clustering comes from the requirement of exploring the unknown natures of the data that are integrated with little or no prior information” [35]. The clustering algorithms discussed in this thesis are hierarchical clustering and k-means clustering.

Hierarchical clustering is a connectivity-based clustering algorithm. It is based on the core idea of “objects being more related to nearby objects than to objects farther away” [36]. To determine the similarity of two objects, the distance between them needs to be defined. Distance metrics include Euclidean distance, squared Euclidean distance, Manhattan distance, dynamic time warping, etc. Euclidean distance is the square root of the sum of squared differences between the coordinates of a pair of objects:

\[ D_{XY} = \sqrt{\sum_k (x_{ik} - x_{jk})^2} \tag{2.9} \]

The standard Euclidean distance can be squared in order to place progressively greater weight on objects that are farther apart:

\[ D^2_{XY} = \sum_k (x_{ik} - x_{jk})^2 \tag{2.10} \]

Manhattan distance, or city block distance, represents the distance between points in a city road grid. It is the sum of the absolute differences between the coordinates of a pair of objects:

\[ D_{XY} = \sum_k | x_{ik} - x_{jk} | \tag{2.11} \]
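The three metrics of Eqs. 2.9 to 2.11 can be written out as a short sketch. The function names are hypothetical, and each object is assumed to be a sequence of numeric coordinates of equal length, for example the samples of two power time series.

```python
import math


def squared_euclidean(x, y):
    """Eq. 2.10: sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    """Eq. 2.9: square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(x, y))


def manhattan(x, y):
    """Eq. 2.11: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two hypothetical objects with two coordinates each.
p, q = (0.0, 0.0), (3.0, 4.0)
distances = (euclidean(p, q), squared_euclidean(p, q), manhattan(p, q))
```

For the pair above the three metrics give 5, 25, and 7, which shows how squaring stretches larger separations relative to smaller ones.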

The linkage criterion determines whether two sets of objects can be joined into one by measuring the distances between pairs of objects drawn from the two sets.
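A naive agglomerative procedure makes the role of the linkage criterion concrete. The sketch below is illustrative only: the function names are hypothetical, Euclidean distance is assumed, and the brute-force search is far slower than the routines of a statistical package such as R.

```python
import math


def linkage_distance(set_a, set_b, method="complete"):
    """Distance between two sets of points under a linkage criterion."""
    dists = [math.dist(p, q) for p in set_a for q in set_b]
    if method == "single":      # closest pair
        return min(dists)
    if method == "complete":    # farthest pair
        return max(dists)
    if method == "average":     # mean over all pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, n_clusters, method="complete"):
    """Merge the two closest clusters until `n_clusters` remain."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage_distance(clusters[i], clusters[j], method)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Four hypothetical points forming two well-separated pairs.
groups = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)], n_clusters=2)
```

Recording the merge distance at each step, rather than stopping at a fixed number of clusters, would give the information shown in a dendrogram.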

K-means clustering is also known as centroid-based clustering37. It partitions objects so that each object is assigned to the cluster whose centroid (mean) is nearest. K-means clustering uses the Euclidean distance metric. The quality of a k-means clustering result can be evaluated by the within-cluster sum of squares (WCSS), which is the sum of squared distances between each object and the centroid of its cluster. The goal is to assign each object to a cluster such that the total WCSS is minimized.
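A minimal sketch of the WCSS calculation is given below, assuming each cluster is simply a list of coordinate lists. It does not implement the k-means iteration itself; it only shows how a candidate partition is scored. The two example partitions are invented for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def wcss(clusters):
    """Total squared Euclidean distance from each point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        for p in points:
            total += sum((p[d] - c[d]) ** 2 for d in range(len(c)))
    return total

# Four hypothetical objects partitioned in two different ways.
compact = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
spread = [[[0.0, 0.0], [10.0, 0.0]], [[0.0, 2.0], [10.0, 2.0]]]

print(wcss(compact))  # 4.0
print(wcss(spread))   # 100.0
```

The partition that groups nearby objects has the much smaller WCSS, which is the quantity k-means tries to minimize.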

3 Real-world Data Acquisition

3.1 SDLE SunFarm design

The SDLE SunFarm, located on the west campus of CWRU, is about one acre in size. Fourteen high-precision Feina SF20 dual-axis trackers and two sites of adjustable tilted racking comprise the 16 electrical sites of the SDLE SunFarm. The 122 individual PV power plants include 120 PV modules with microinverters and two sets of 8 PV modules connected in series with string inverters. Output power is monitored through the inverters and fed back to the grid through a reversing relay. The 120 modules with microinverters were evenly separated into two groups of 60 samples each (3 module samples from each of 20 brands). The two groups use two different microinverter models for comparison. The first 60 microinverters installed were Enphase model M215. Electrical data were reported by Enphase’s embedded Enlighten data acquisition system. The metrology platform (shown in Table 3.1) includes insolation and weather monitoring. Minute-by-minute global horizontal irradiance (GHI) data were monitored by a Kipp & Zonen CMP6 pyranometer positioned near the fixed racking. A second pyranometer, a Kipp & Zonen CMP11, was also set on the horizontal plane and connected to a Daystar multi-tracer. Two Vaisala WXD520 weather stations were placed on the SunFarm to record wind speed, wind direction, rainfall, rain intensity, rain duration, and humidity. An anemometer was

connected to the Master Control Unit of the trackers to monitor the wind load on the

trackers. T-type thermocouples were used for backsheet temperature monitoring.

| Instrument | Attributes |
| --- | --- |
| Enphase micro-inverter | AC power, DC current, DC voltage |
| Kipp & Zonen pyranometer | Solar irradiance |
| Vaisala WXD 520 weather station | Temperature, wind speed, wind direction, rainfall, hail, relative humidity |
| T-type thermocouple | Backsheet temperature |

Table 3.1. Parameters monitored using SDLE SunFarm metrology platform

The data acquisition system consists of 17 networked Campbell Scientific CR-1000 dataloggers, each connected to an AM 16-32 multiplexer that extends the capacity of the datalogger to 32 differential measurement channels. The Campbell dataloggers monitor thermocouple and sensor outputs. The Enphase micro-inverters use the Enphase Envoy Communications Gateway to connect each individual micro-inverter to the Enlighten monitoring and management software. Similarly, the Solectria string inverters use the Solrenview system to collect data. Minute-by-minute data can be downloaded from the Solrenview web servers.

3.2 Global SunFarm network and Energy CRADLE

Cleveland’s humid continental climate is not typical for PV degradation research. In order to study PV modules’ performance under different climatic conditions, a global SunFarm network was established among nine PV outdoor test beds across the world.

The purpose of the Energy Common Research Analytics and Data Lifecycle Environ-

ment (Energy CRADLE) is to create, for engineering, and in particular lifetime science,

the tools and protocols necessary to transform Big Data to information, which informs

scientific knowledge to guide further analysis30,38–40. Energy CRADLE is tightly focused

on serving the needs of handling and sharing data across the SunFarm network. Appen-

dix B provides further details.

4 Results: Real-world data analytics

4.1 Overview

This section presents the real-world performance of 60 crystalline silicon PV modules from 20 different manufacturers exposed from November 25, 2012 to May 31, 2013 on the SDLE SunFarm. The purpose of this case study is to interpret the information in the data collected during the first 6 months of the SDLE SunFarm’s operation and to develop a data cleaning and data munging procedure. This analytic procedure will be integrated within Energy CRADLE and will guide data processing on the cloud. This case study can also inform experimental design and motivate further research.

Fig. 4.1 is a blueprint of the SDLE SunFarm’s 16 electrical sites. The 60 modules studied are distributed on the sites marked with red boxes, specifically fixed rack Site 1 and tracker Sites 4, 6, 8, 12, and 14. All three modules from the same manufacturer are placed on the same site. On fixed rack Site 1, 18 modules from 6 brands are aligned horizontally, and modules of the same brand are placed adjacent to each other. On the trackers, which carry 6, 9, or 12 modules each, modules of the same brand are evenly distributed on the same tracker frame.

In this study, the manufacturer information of the modules is withheld, and each brand is referred to by a capital letter, A through T. A module’s location is represented by a lower case f or t, short for “fixed rack” or “tracker”, respectively, followed by the site number. Each module has a sample number starting with “sa”. For example, in Fig. 4.3 power data were recorded from “A.f1.sa18259.00”, which is a module of brand A mounted on fixed rack Site 1 with sample number “sa18259.00”.
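The naming convention can be decoded mechanically. The following Python sketch is one way to do so; the `ModuleId` type and the `parse_module_id` function are hypothetical helpers written for this illustration, and they assume every label follows the brand, location, sample pattern described above.

```python
from typing import NamedTuple

class ModuleId(NamedTuple):
    brand: str   # capital letter A through T
    mount: str   # "fixed rack" or "tracker"
    site: int    # electrical site number
    sample: str  # sample number starting with "sa"

def parse_module_id(label):
    """Split a label such as 'A.f1.sa18259.00' into its components."""
    # Split only on the first two dots; the sample number itself contains a dot.
    brand, location, sample = label.split(".", 2)
    mount = {"f": "fixed rack", "t": "tracker"}[location[0]]
    return ModuleId(brand, mount, int(location[1:]), sample)

parsed = parse_module_id("A.f1.sa18259.00")
print(parsed.brand, parsed.mount, parsed.site, parsed.sample)
```

Limiting the split to two dots keeps the sample number “sa18259.00” intact.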

Figure 4.1. The 60 PV modules studied in this section are distributed on 6 different electrical sites, shown in red boxes in the plot. Site 1, the long site along the bottom, is a fixed tilt rack site on which 18 modules are exposed. The rest of the modules are exposed on even-numbered tracker sites: 12 modules on Site 4, 6 modules each on Sites 6 and 8, and 9 modules each on Sites 12 and 14.

4.1.1 Analytical methods

R1, a free and open source programming language and software environment for statistical computing and graphics, was chosen as the data analysis tool. The data

analytical methods applied to this case study consist of raw data validation, exploratory

data analysis, data assembly, data subsampling, and clustering data analysis.

4.2 Raw data validation

4.2.1 Module baseline

All the modules studied on the SDLE SunFarm were brand new modules purchased on the open market. Before the modules were exposed to sunlight, the I-V characteristics of each module were recorded using a SPIRE SPI-4800 solar simulator and I-V curve tracer located at the Wright Center for Photovoltaic Innovation and Commercialization (PVIC) at the University of Toledo. To reduce the impact of instrument uncertainty, sixteen I-V curve measurements were acquired for each module. Additionally, the backsheet temperature and the irradiance intensity were recorded. Each measurement was corrected to standard test conditions (STC), specified as 25 °C and 1 kW/m2, according to IEC 6089118. The maximum power output (Pmax) was taken from the 16 corrected I-V curve measurements to represent the initial performance of each module under STC. For the 60 modules, the standard deviations of the 16 Pmax measurements fall between 0.04% and 0.9%, which supports the reliability of the baseline results.

To evaluate the initial performance of each brand, the mean Pmax of the three modules of each brand was calculated and normalized by dividing by the module’s nominal power output. Fig. 4.2 shows the normalized performance of each brand, with the deviation among the three module samples shown as error bars. The initial performance of most brands (all except H and Q) falls within the range 0.95-1.05, which means their initial performance met the common market expectation of ±5% of nominal power.

Figure 4.2. Cross-sectional comparison of crystalline silicon PV modules from 20 different manufacturers. The y-axis is normalized power. The x-axis shows the brands and locations. Brand names were replaced with letters A through T. The letters f and t represent fixed tilt rack and tracker. The maximum power output (Pmax) of three modules of each brand was measured. The bars show the averaged normalized power of each brand. The standard deviation is plotted as error bars.

4.2.2 Power data

As introduced in the previous chapter, the electricity generated by all 60 modules is reported by the microinverters’ data acquisition system, Enlighten. Enlighten reports DC

current, DC voltage, the microinverter’s internal temperature, and AC power. The data collection interval is 5 minutes. Prior to Energy CRADLE, power data were collected from Enlighten manually. Fig. 4.3 shows an example module’s AC power over six months. The daily variation of the power data is clearly visible in this figure. During the 180 days there are several gaps in the data, partly due to three trips of the interconnection relay. Of the 180 days of observation, 99 days have power data.

Figure 4.3. Power production versus time of one PV module.

4.2.3 Microinverter’s efficiency

A microinverter’s conversion efficiency is given by the ratio of AC power to DC power. DC power is calculated as the product of DC current and DC voltage; AC power is provided directly in the data. Both DC current and DC voltage are reported to two significant digits, while AC power is given as an integer. Fig. 4.4 shows the efficiency of the 60 microinverters. The majority of the efficiencies are between 95% and 99%, which is consistent with the efficiency stated by the manufacturer. However, there are 12 points that exceed 1.0, which is contrary to the laws of thermodynamics. Inspection of the raw data shows that when the PV module’s DC output is low (around 1 W), the reported AC power appears to "round up" the product of DC

current and voltage to an integer. This rounding behavior explains why abnormal efficiencies appear mostly on trackers 12 and 14: these two trackers did not track properly during the majority of the 180 days over which data were collected, so the modules on these trackers were exposed to low irradiance levels for longer than the other modules.
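The effect of integer AC power on the computed efficiency can be sketched with a small Python example. All numerical readings below are hypothetical values chosen to mimic the situation described (a high-power case and a roughly 1 W case); they are not taken from the SunFarm data set.

```python
def inverter_efficiency(ac_power_w, dc_current_a, dc_voltage_v):
    """AC power divided by DC power, with DC power = current * voltage."""
    dc_power_w = dc_current_a * dc_voltage_v
    if dc_power_w <= 0:
        return None  # efficiency is undefined without DC input
    return ac_power_w / dc_power_w

# Hypothetical reading at high irradiance: about 201 W DC, 192 W AC.
typical = inverter_efficiency(192, 6.7, 30.0)

# Hypothetical reading at low irradiance: 0.88 W DC reported as 1 W AC.
low_light = inverter_efficiency(1, 0.04, 22.0)

print(round(typical, 3), round(low_light, 3))
```

With these assumed numbers, the high-irradiance case stays within the 95% to 99% band, while the integer AC value at low power yields an apparent efficiency above 1.0.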

Figure 4.4. The efficiency of 60 microinverters from 99 days of data collection after exposure on the SDLE SunFarm on fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue).

4.2.4 Microinverter burst-mode

Further investigation of the “round up” effect shows that it arises because the Enphase microinverter operates in “burst-mode” at low DC input. When a PV module is working under low irradiance, the DC output of the module is low, and therefore the DC-to-AC conversion efficiency drops41. The microinverter scans the DC voltage at each AC cycle (1/60 second). When it detects that the DC input is lower than 30%, it charges a capacitor instead of converting DC to AC power. At the next cycle, the microinverter scans the PV module’s output again and adds it to the charge already stored in the capacitor bank from the previous cycle. If the combined power is high enough for a DC-to-AC conversion, the capacitor releases the charge. As a result, when the microinverter is “bursting” the stored-up charge, its AC output is higher than the instantaneous DC input would dictate. This explains why a microinverter appears to round up its AC power, which makes its apparent efficiency higher when it is working at a low irradiance level. However, it is also known that the AC power reported by Enlighten is an averaged value rather than an instantaneous measurement. The effect of “burst-mode” will be shown in the data subsampling part of this chapter.

4.2.5 Weather data

Insolation data. This study uses global horizontal irradiance (GHI) monitored by a Kipp & Zonen CMP6 pyranometer placed on a horizontal plane as the reference. The sampling rate for irradiance data is determined by the datalogger’s scan period, which is 1 minute for all the dataloggers on the SDLE SunFarm. Incident irradiance on a PV module’s plane

is different from that on the horizontal plane, so in order to convert GHI to plane of array (POA) irradiance, the assumption was made that all incident sunlight is direct sunlight.

Global horizontal irradiance (GHI) to plane of array (POA) irradiance conversion. In reality, global horizontal irradiance (GHI) consists of direct irradiance and diffuse irradiance. The direct irradiance on a plane is the direct normal irradiance (DNI) scaled by a sine function of the angle between the sun and the plane, while diffuse irradiance varies on different planes.

$$
\begin{aligned}
GHI &= DNI \times \sin\alpha + I_{dif} \\
POA_{tracker} &= DNI + I_{dif} \\
POA_{fixed} &= DNI \times \sin(\alpha + \beta) + I_{dif}
\end{aligned}
\tag{4.1}
$$

where $I_{dif}$ is the diffuse irradiance, $\alpha$ is the elevation angle of the sun, and $\beta$ is the tilt angle of the fixed rack; in this case $\beta$ equals 22.3°. The elevation angle is given as:

$$\alpha = 90^\circ - \theta + \delta \tag{4.2}$$

where θ is the latitude; and δ is the declination angle given as:

$$\delta = 23.45^\circ \sin\left[360^\circ \times (284 + d)/365\right] \tag{4.3}$$

where $d$ is the day of the year. Since neither DNI nor $I_{dif}$ data were available for the first 6 months, the assumption here is that all the incident light is direct sunlight, which simplifies the formulas to:

$$
\begin{aligned}
GHI &= DNI \times \sin\alpha \\
POA_{tracker} &= DNI \\
POA_{fixed} &= DNI \times \sin(\alpha + \beta)
\end{aligned}
\tag{4.4}
$$

The estimated POA is expected to be higher than the actual incident irradiance on a module’s surface, because diffuse light is treated as direct light and is therefore amplified when GHI is converted to POA. In the future, this systematic error can be removed by having an irradiance sensor set in the plane of array and a direct irradiance sensor for DNI. In this case study, since the modules’ performances are cross-compared, the systematic error can be ignored.
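The direct-only conversion of Equations 4.2 to 4.4 can be sketched in Python as follows. The function names are illustrative, and the example inputs (600 W/m2 GHI, a latitude of about 41.5° N for Cleveland, day 80 of the year) are assumptions made for this sketch rather than measured values. Note that Equation 4.2 as written gives the elevation angle at solar noon.

```python
import math

def declination_deg(day_of_year):
    """Declination angle delta in degrees (Eq. 4.3)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))

def elevation_deg(latitude_deg, day_of_year):
    """Elevation angle alpha in degrees (Eq. 4.2)."""
    return 90.0 - latitude_deg + declination_deg(day_of_year)

def ghi_to_poa_direct_only(ghi, latitude_deg, day_of_year, tilt_deg=22.3):
    """Return (POA_tracker, POA_fixed), treating all light as direct (Eq. 4.4)."""
    alpha = math.radians(elevation_deg(latitude_deg, day_of_year))
    if math.sin(alpha) <= 0.0:
        raise ValueError("sun is at or below the horizon")
    dni = ghi / math.sin(alpha)          # invert GHI = DNI * sin(alpha)
    poa_tracker = dni                    # tracker faces the sun directly
    poa_fixed = dni * math.sin(alpha + math.radians(tilt_deg))
    return poa_tracker, poa_fixed

# Hypothetical inputs: GHI in W/m^2, assumed latitude, day of year.
poa_tracker, poa_fixed = ghi_to_poa_direct_only(600.0, 41.5, 80)
print(round(poa_tracker), round(poa_fixed))
```

Because the whole of GHI is divided by sin α, any diffuse share of the measured GHI is scaled up along with the direct share, which is the source of the overestimate discussed above.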

Additional climate data. Additional climate data including special climate events and

the cloudiness levels were collected from online open source historical data, such as

Weather Underground (http://www.wunderground.com).

4.2.6 Data alignment

Data alignment is another important validation process for time series data. There are multiple data sources on the SunFarm, and different devices synchronize their clocks with different time sources. For example, the weather data used in this study were collected by dataloggers on the SDLE SunFarm. The time on these dataloggers was synchronized through controller software on a desktop computer in the SDLE lab. The power data were reported by the Enphase user interface, which synchronizes time with Enphase’s server. PV modules generate power almost instantaneously when sunlight hits the front surface; therefore, the time series of power and irradiance should be highly correlated.

Weather and power data were aligned using the sample cross-correlation function (ccf) in R. The ccf in R is defined as the set of sample correlations between time series X at time t + h (h = 0, ±1, ±2, ...) and time series Y at time t, where X is potentially a predictor of Y. If two time series are perfectly aligned, the correlation is highest when h = 0 and drops as the absolute value of h increases. However, if the maximum correlation appears when h is positive, then X lags Y. If correlation is

maximum when h is negative, then X leads Y. Using the ccf, it was determined that the weather data were leading the power data by 3 minutes before March 10, 2013. After daylight saving time began on March 10, 2013, the time shift became 63 minutes. The timestamps of the power data were found to be more trustworthy when compared with standard Greenwich Mean Time (GMT). The weather data were therefore separated into two parts, before and after March 10, and the timestamps of each part were shifted accordingly.
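The lag search can be illustrated with a small Python sketch. It is a simplified stand-in for R's `ccf`, not a reimplementation: it computes a plain Pearson correlation on the overlapping samples at each lag, whereas R normalizes with the full-series mean and variance. The synthetic series and the helper names `pearson` and `best_lag` are assumptions made for this illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_lag(x, y, max_lag):
    """Return (h, r) maximizing the correlation between x[t + h] and y[t]."""
    best_h, best_r = 0, -2.0
    for h in range(-max_lag, max_lag + 1):
        if h >= 0:
            xs, ys = x[h:], y[:len(y) - h]
        else:
            xs, ys = x[:len(x) + h], y[-h:]
        r = pearson(xs, ys)
        if r > best_r:
            best_h, best_r = h, r
    return best_h, best_r

# Synthetic one-sample-per-minute bump; irradiance shows it 3 samples early.
signal = [math.exp(-((t - 50) / 10.0) ** 2) for t in range(103)]
power = signal[:100]
irradiance = signal[3:103]

lag, corr = best_lag(irradiance, power, 10)
print(lag)  # -3: a negative lag means irradiance (X) leads power (Y)
```

Once the lag is known, the leading series can be shifted by that number of samples before the two data sets are merged.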

4.2.7 Malfunction of trackers

According to the maintenance record, tracker 4 did not experience any mechanical problems. All the other trackers malfunctioned to some degree during the observation time. To determine the days when trackers malfunctioned, the power data from an example module on each tracker were plotted versus time and compared to the example power data from the functioning tracker 4. An example of the power curve from a stopped tracker (tracker 8) and the power curve from a normally operating tracker (tracker 4) is shown in Fig. 4.5.

The power curve on tracker 8 is not symmetric, with the majority of the power generated in the afternoons, which indicates that the tracker was stopped facing west. By comparing the curves in this manner, the malfunctioning dates of each tracker were determined. Tracker 6 was stopped for 5 days in May. Tracker 8 was stopped for 10 days in April and May. Tracker 12 was not tracking until its gear counter was replaced. Tracker 14 was not tracking most of the time because of a gear stopper issue.

Figure 4.5. A comparison of AC power generated by a module on tracker 8 (top) and tracker 4 (bottom) from May 12 to May 18, 2013, when tracker 8 had stopped functioning and was facing west.

4.3 Exploratory Data Analysis (EDA) on Integrated Data

4.3.1 Total power production

A common way of evaluating a PV module’s performance is to compare total power production. A module’s power production, in this case, is affected not only by its nominal power rating but also by its mounting system. The averaged total power production of each brand is shown in Fig. 4.6. The highest power production in 99 days is 60.36 kWh, from brand G on tracker 4, and the lowest, 28.79 kWh, is from brand T on tracker 14. The four brands on tracker 4 (red) produced, on average, about 40% more power than the six brands on the fixed rack (blue). Modules on the other trackers produced less power than those on tracker 4 by varying degrees. Generally, modules on a tracker should produce more power than those on a fixed rack when the

tracker is operating correctly. However, except for the trackers on Sites 4 and 6, the modules on trackers produced less power on average than the modules on fixed rack Site 1. In order to compare the performance of different brands on the same site, total power needs to be normalized.

Figure 4.6. The bar graph shows the averaged total power production of each brand from fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue). The standard deviations are plotted as error bars.

4.3.2 Normalized power yield

Normalized power yield is defined as the ratio of the total power production to the product of nominal power and exposure days (i.e., 99 days)28. Normalized power yield is

equal to the equivalent time per day that the PV plant operates at nominal power output. Normalized power yield is an important factor in choosing PV modules. In the PV industry, a module’s price is quoted in dollars per watt, so modules with a higher normalized power yield are more cost effective. Fig. 4.7 shows the normalized power production of each brand. As the trackers experienced different problems, it is not valid to compare the performance of two brands on different sites. However, the ranking of brands within one site demonstrates the relative performance of different brands. For example, on the fixed rack, shown in blue, brand B’s total power production is the lowest but its normalized power production is the highest among the 6 brands. This indicates that under the same environment, including temperature and irradiance conditions, brand B performs better than the other brands on Site 1.
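As a worked illustration of the normalization, the sketch below divides total production by the product of nominal power and exposure days. The 60.36 kWh and 99 days are the brand G figures quoted in Section 4.3.1; the 0.24 kW nominal rating is a hypothetical value chosen only for this example, since module ratings are not listed here.

```python
def normalized_power_yield(total_energy_kwh, nominal_power_kw, exposure_days):
    """Equivalent hours per day at nominal power: kWh / (kW * days)."""
    return total_energy_kwh / (nominal_power_kw * exposure_days)

# 0.24 kW is an assumed nominal rating, not a value from the thesis.
yield_g = normalized_power_yield(60.36, 0.24, 99)
print(round(yield_g, 2))  # about 2.54 hours per day
```

Because the nominal rating appears in the denominator, a module with low total production can still rank highest on this metric if its rating is correspondingly small, as observed for brand B on the fixed rack.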

4.4 Clustering of AC Power Data

Section 4.3 showed that integrated performance measures, namely total power production and normalized power production, vary among the 20 different brands; however, brands on the same site perform similarly. To determine whether modules of the same brand always perform similarly, it is necessary to check the similarity of the 60 modules’ AC power time series. As mentioned in Chapter 2, a statistical way of checking the similarity of multiple observations is clustering analysis. A hierarchical clustering analysis (HCA) was conducted on all of the AC power time series from the 99 days of observed data. There are 9698 observations for each module. A dendrogram that uses the Euclidean distance metric and the average linkage criterion is shown in Fig. 4.8. The distance metric and linkage criteria will be discussed in Chapter 5. Red boxes in the plot show the result of dividing the modules into six groups. The grouping result reflects exactly the 6 physical sites on

Figure 4.7. The bar graph shows the averaged normalized power production of each brand from fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue). The standard deviations are plotted as error bars.

SDLE SunFarm. Although there are some exceptions, most modules of the same brand are close to each other in distance. In Fig. 4.8, from left to right, the 6 groups consist of modules from tracker 14, tracker 12, tracker 8, fixed rack 1, tracker 6, and tracker 4, respectively.

However, six is an arbitrary number chosen from experience with the data. To confirm that the result of HCA is valid, the k-means algorithm was used. K-means clustering partitions observations into k clusters so as to minimize the total within-cluster sum of squares (WCSS). In this case, each sample (PV module) has a set of 9698 observations,

Figure 4.8. Hierarchical cluster analysis of 60 modules based on all AC power time series data. The clusters were generated using “hclust” in the “stats” package in R (v3.0.1). The distance matrix is computed using the Euclidean method. The distance between sets of observations is defined with the average linkage method. When the dendrogram tree is divided into 6 groups, each group includes exactly the modules physically located on the same electrical site.

so each sample is treated as a 9698-dimensional vector. To determine the k value that gives the most reasonable result, a commonly used method is the elbow method42. The elbow method is applied to a plot of WCSS as a function of the number of clusters, k. The best clustering result occurs at the point beyond which adding another cluster no longer substantially improves the model of the data. This point should be chosen as the number of clusters, hence the “elbow criterion”. A survey of the WCSS as a function of k is plotted

in Fig. 4.9. The elbow point occurs at k = 6, which is marked with a red circle. The k-means clustering result with k equal to 6 is consistent with the result of HCA. In order to visually

Figure 4.9. Total within-cluster sum of squares (WCSS). The elbow point occurs when k is equal to 6.

confirm that the AC power time series that fall into each group are similar to each other, Fig. 4.10 plots the AC power output of the 60 PV modules over 99 days according to both the k-means and hierarchical clustering results. The 60 AC power time series were separated into 6 groups, and

modules from the same brand are shown in the same color. Groups 1 through 6 correspond to tracker 14, tracker 12, tracker 8, fixed rack 1, tracker 6, and tracker 4, respectively. Fig. 4.10 confirms that the shape and magnitude of the AC power time series in each cluster are similar.

Figure 4.10. AC power of 60 modules grouped by hierarchical clustering result. The color of each curve differentiates module brands.

4.5 Data Assembly

4.5.1 Performance metrics

Up to this point, the analysis was based on the 60 modules’ AC power data. However, in order to correlate the modules’ power output with climate conditions, climate data and power data were assembled in the following way. To compare the performance of 60 modules with different nominal powers and different mounting systems, a normalized analysis and presentation was introduced based on IEC 6172428 and the work of H. Haeberlin et al.43.

Normalized energy yields and losses. The definitions of six performance indices introduced in IEC 61724 were discussed in Chapter 2. The 60 modules being studied all work with individual microinverters instead of in a PV array, and the AC power generated is fed directly back to the grid, so each module is one PV plant. Data were collected on a minute basis instead of on a daily basis. Specifically, power data were collected every 5 minutes and weather data were collected every minute; therefore, it is necessary to modify the performance metrics. These new performance indices are normalized instantaneous quantities. Irradiance yield, $Y_I$, is the POA irradiance normalized to the reference irradiance of 1 kW/m2 (Equation 4.5).

$$Y_I = POA / G_0, \quad G_0 = 1\ \mathrm{kW/m^2} \tag{4.5}$$

DC yield, $Y_{DC}$, is the DC power normalized to a module’s nominal power (Equation 4.6). DC power was calculated by multiplying DC current by DC voltage.

$$Y_{DC} = P_{DC} / P_0 \tag{4.6}$$

AC yield, $Y_{AC}$, is the AC power normalized to a module’s nominal power (Equation 4.7).

$$Y_{AC} = P_{AC} / P_0 \tag{4.7}$$

Capture loss, $L_c$, is the part of the incident solar power not captured by the solar cell (Equation 4.8).

$$L_c = Y_I - Y_{DC} \tag{4.8}$$

System loss, $L_s$, is the DC-to-AC inverter conversion loss (Equation 4.9).

$$L_s = Y_{DC} - Y_{AC} \tag{4.9}$$

Performance ratio (PR) is the ratio of the useful energy fed back into the grid to the energy that would be generated by an ideal PV module with a cell temperature of 25 °C under the same irradiance (Equation 4.10).

$$PR = Y_{AC} / Y_I \tag{4.10}$$
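The six instantaneous indices of Equations 4.5 to 4.10 can be computed together for a single timestamp, as in the Python sketch below. The function name, the dictionary keys, and the example reading (800 W/m2 POA, 6.0 A, 30.0 V, 172 W AC, 240 W nominal) are hypothetical and serve only to show the arithmetic.

```python
def performance_metrics(poa_w_m2, dc_current_a, dc_voltage_v,
                        ac_power_w, nominal_power_w, g0_w_m2=1000.0):
    """Normalized instantaneous yields, losses, and PR (Eqs. 4.5 to 4.10)."""
    y_i = poa_w_m2 / g0_w_m2                               # Eq. 4.5
    y_dc = (dc_current_a * dc_voltage_v) / nominal_power_w  # Eq. 4.6
    y_ac = ac_power_w / nominal_power_w                     # Eq. 4.7
    return {
        "Y_I": y_i,
        "Y_DC": y_dc,
        "Y_AC": y_ac,
        "L_c": y_i - y_dc,                      # Eq. 4.8
        "L_s": y_dc - y_ac,                     # Eq. 4.9
        "PR": y_ac / y_i if y_i > 0 else None,  # Eq. 4.10, undefined in the dark
    }

# Hypothetical single reading for one module.
m = performance_metrics(800.0, 6.0, 30.0, 172.0, 240.0)
print({k: round(v, 3) for k, v in m.items()})
```

Note that the three yields and two losses satisfy $Y_I = Y_{AC} + L_c + L_s$ by construction, which is a convenient sanity check on assembled data.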

4.5.2 Solar time

Local noon is usually not the time when the sun is highest in the sky, because of the Earth’s orbit and human adjustments such as time zones and daylight saving time. Noon in local solar time (LST) is defined as the time when the sun is highest in the sky for a particular location, which is not necessarily local noon44. To better understand the modules’ performance in relation to solar motion, the timestamps of the data need to be converted from local time (LT) to LST. The local standard time meridian (LSTM) is a reference meridian used for a particular time zone and is similar to the Prime Meridian (longitude = 0°), which is used for Greenwich Mean Time (GMT)45. The formula for calculating LSTM is given by Equation 4.11:

$$LSTM = 15^\circ \times \Delta T_{GMT} \tag{4.11}$$

where $\Delta T_{GMT}$ is the difference of the local time from GMT in hours. $\Delta T_{GMT}$ equals −4 for eastern daylight time (EDT) and −5 for eastern standard time (EST). The equation of

time (EoT) corrects for the eccentricity of the Earth’s orbit and the Earth’s axial tilt (Equation 4.12).

$$EoT = 9.87\sin(2B) - 7.53\cos(B) - 1.5\sin(B) \tag{4.12}$$

where

$$B = 360^\circ (d - 81)/365$$

is in degrees and $d$ is the day of the year. The net time correction factor (TC), in minutes, accounts for the variation of LST within a given time zone (Equations 4.13 and 4.14).

$$TC = 4(\mathrm{Longitude} - LSTM) + EoT \tag{4.13}$$

$$LST = LT + TC/60 \tag{4.14}$$
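Equations 4.11 to 4.14 can be chained into a single conversion, sketched below in Python. The function names are illustrative, times are expressed in decimal hours, and west longitudes are taken as negative. The example uses a site located exactly on the EST meridian (−75°) so that only the equation of time contributes; this choice is an assumption for illustration and is not the SunFarm longitude.

```python
import math

def equation_of_time_min(day_of_year):
    """Equation of time in minutes (Eq. 4.12)."""
    b = math.radians(360.0 * (day_of_year - 81) / 365.0)
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)

def local_solar_time(local_time_h, longitude_deg, gmt_offset_h, day_of_year):
    """Convert local clock time to local solar time, both in decimal hours."""
    lstm = 15.0 * gmt_offset_h                                   # Eq. 4.11
    tc_min = 4.0 * (longitude_deg - lstm) + equation_of_time_min(day_of_year)  # Eq. 4.13
    return local_time_h + tc_min / 60.0                          # Eq. 4.14

# Hypothetical site on the EST meridian, clock noon on day 81.
lst = local_solar_time(12.0, -75.0, -5, 81)
print(round(lst, 4))
```

On day 81 the angle $B$ is zero, so the equation of time reduces to −7.53 minutes and solar time runs about seven and a half minutes behind clock time at the reference meridian.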

Six performance metric variables of one single module over one day in LST are shown in Fig. 4.11. The $Y_I$, $Y_{DC}$, $Y_{AC}$, $L_c$, $L_s$, and PR curves are plotted in black, green, blue, yellow, brown, and red, respectively. On a clear sunny day, both irradiance and PR show a dome-shaped curve. The PR curve has a comparatively flat top, which suggests that PR and POA irradiance are highly correlated and that PR is less sensitive to POA irradiance at high levels (over 750 W/m2).

4.6 Sub-sampling

4.6.1 Solar noon time performance ratio

From the EDA plot of performance metrics (Fig. 4.11), it is clear that PR is correlated with POA irradiance. PR can reach up to 0.85 on the 22.3° fixed rack and 0.90 on the tracker at solar noon, when the POA irradiance is high. To reduce the volume of data and reduce temporal fluctuations, the PR data were subset to ±15 minutes around solar noon. The sampling interval of the PR is 5 minutes, so there are about 7 data points within this

Figure 4.11. Normalized performance of one single module (D.fi.sa18286.00) on the fixed rack on December 12, 2012. The variables are PR (red), $Y_I$ (black), $Y_{DC}$ (green), $Y_{AC}$ (blue), $L_c$ (yellow), and $L_s$ (brown).

30 min window. During the 99 days, there are roughly 700 observations for each module,

which is still statistically sufficient for further analysis.
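The solar-noon subsetting can be sketched as a simple filter on timestamps already converted to LST. In the Python example below, records are hypothetical pairs of minutes since solar midnight and PR, sampled every 5 minutes with a constant placeholder PR; the function name and data layout are assumptions for illustration.

```python
SOLAR_NOON_MIN = 12 * 60  # solar noon expressed in minutes since solar midnight

def solar_noon_subset(records, half_window_min=15):
    """Keep (minute, pr) records within +/- half_window_min of solar noon."""
    return [(minute, pr) for minute, pr in records
            if abs(minute - SOLAR_NOON_MIN) <= half_window_min]

# One hypothetical day of 5-minute records with a placeholder PR value.
day = [(minute, 0.85) for minute in range(0, 24 * 60, 5)]
subset = solar_noon_subset(day)

print(len(subset))       # 7
print(len(subset) * 99)  # 693
```

Seven points per day over 99 days gives 693 points, in line with the roughly 700 observations per module quoted above.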

4.6.2 Snowy days

EDA on the solar noon PR subset was performed by plotting PR versus $Y_I$ for each module. An example PR versus $Y_I$ plot is shown in Fig. 4.12. Three groups of abnormal data points are marked with yellow circles in the plot.

Figure 4.12. Solar noon time PR of a module (C.f1.sa18328.00) on the fixed rack versus $Y_I$. The vertical blue line marks POA irradiance at 1200 W/m2, and the red horizontal line marks PR at 1.0. Group 1 contains points that have a PR greater than one. Group 2 contains points at irradiance higher than 1200 W/m2. Group 3 contains points that show zero PR.

Group 1. In theory, the PR can never exceed one. In the literature and standards22,28,43, PR is normally reported to be 0.8-0.85 on average. The abnormal points in Group 1 have a calculated PR greater than one. By examining the raw data, including AC power, DC power, and POA irradiance, at each of these data points, two potential causes were found. First, these data points appeared when the irradiance changed quickly. As discussed previously, power data were reported by the Enphase Envoy system, and its data acquisition method is unknown. It could be that Enphase does not report instantaneous power but an averaged value. Using averaged power data together with instantaneous POA measurements for the PR calculation introduces a systematic error. Another cause of the high apparent performance could be the microinverter working in burst-mode, as discussed earlier.

Group 2. Solar radiation outside the Earth’s atmosphere is 1.36 kW/m2, and the global irradiance on a tracker plane at noon in Denver, Colorado is less than 1.2 kW/m2. The abnormal data points in Group 2 show a POA irradiance on the fixed rack (22.3° tilt plane) that is higher than 1.2 kW/m2. This is a systematic error introduced by converting GHI to POA. Without direct global POA irradiance monitoring or direct sunlight monitoring, it is not possible to correct the error. However, since the same irradiance conversion method is used for all modules, it will not affect the cross-sectional comparison of the modules’ performance.

Group 3. The PR of the modules was small or equal to zero even when irradiance was not very low, which indicates that the module may have been covered. Moreover, this only appeared on certain days in December and January, and only on some modules, mainly on the fixed rack. Given the climate, this suggests that it was caused by snow coverage.

Snowy days. In order to document the relationship between low module performance and snowy weather, historical climate condition data was collected from a third-party web site. PR time series were plotted for each module, and data points on snowy days were highlighted in red and blue46. Fig. 4.13 shows the PR of six modules of two different brands (three modules from each brand). Low PR appears only during or after snow or fog-snow days, which supports the interpretation that the group 3 abnormal points were most likely caused by snow coverage. All snow-covered dates were determined by plotting PR versus time for all 60 modules, and the snowy-day data were assembled as a subgroup.


Figure 4.13. Solar noon time PR of six modules; data points when there was snow or fog-snow are highlighted in red and blue. The three modules on the top row are from brand A, placed on the fixed rack. The three modules on the bottom row are from brand G, placed on tracker 4. All three brand A modules showed low or zero performance during or after some snowy days, while the three modules on the tracker did not.

4.7 Clustering of Solar Noon Time Performance Ratio Data

As discussed in Section 4.6, the PV modules’ performance ratio data was subset to 15 minutes around solar noon time. In order to reduce data volume, the average of each day’s PR data was taken. After removing the snowy days, there are 75 days of PR data


left, thus 75 data points represent the solar noon time performance of each module. Since the relationships among the 60 modules’ noon time performance are not intuitive, an EDA can lead to a better understanding of the data. A pairs plot is commonly the first step of EDA.
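The subsetting and daily averaging described above can be sketched as follows. This is an illustrative Python stand-in rather than the code used in this work; the function name, the input layout, the half-window of 15 minutes, and the sample values are all assumptions, and the solar noon time of each day is taken as a given input.

```python
from datetime import date, datetime, timedelta
from statistics import mean

def daily_noon_mean_pr(samples, solar_noon_by_day, snowy_days, half_window_min=15):
    """Mean PR per day within a window around solar noon, skipping snowy days.

    samples: list of (datetime, pr) pairs for one module.
    solar_noon_by_day: dict mapping date -> datetime of solar noon (given).
    snowy_days: set of dates to exclude.
    """
    window = timedelta(minutes=half_window_min)
    by_day = {}
    for stamp, pr in samples:
        day = stamp.date()
        if day in snowy_days or day not in solar_noon_by_day:
            continue
        if abs(stamp - solar_noon_by_day[day]) <= window:
            by_day.setdefault(day, []).append(pr)
    return {day: mean(values) for day, values in by_day.items()}

# Hypothetical samples for one module.
samples = [
    (datetime(2013, 1, 10, 12, 20), 0.70),
    (datetime(2013, 1, 10, 12, 40), 0.80),
    (datetime(2013, 1, 10, 14, 0), 0.10),    # outside the noon window
    (datetime(2012, 12, 27, 12, 30), 0.00),  # snowy day, excluded
]
solar_noon_by_day = {
    date(2013, 1, 10): datetime(2013, 1, 10, 12, 30),
    date(2012, 12, 27): datetime(2012, 12, 27, 12, 28),
}
snowy_days = {date(2012, 12, 27)}

noon_pr = daily_noon_mean_pr(samples, solar_noon_by_day, snowy_days)
print(noon_pr)
```

Each module then contributes one value per retained day, which is the form of data used for the pairs plot and the clustering.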

Figure 4.14. A pairs plot of solar noon time PR of three modules of the same brand. In each row, the Y axis is the PR of one module. In each column, the X axis is the PR of one module. Each module’s sample number is shown in the diagonal boxes. The correlation coefficient of each X, Y pair is calculated and represented by varying shades of green. The darker the green, the higher the correlation coefficient.


The pairs plot takes the PR of one module as the X coordinate and the PR of another module as the Y coordinate. If the two modules under comparison performed the same at each observation time, then we expect to see all data points on a diagonal line. Fig. 4.14 is a pairs plot of the solar noon time PR of three modules of the same brand. In order to better visualize the correlation between the X and Y coordinates, the correlation coefficient of the two modules is represented by varying shades of green: the darker the green, the higher the correlation coefficient between the two PR series. In Fig. 4.14, the performance of module G.t4.sa18211.00 is over 99% correlated to that of G.t4.sa18210.00. Only the pairs plot of the first ten modules is shown in Fig. 4.15, as space is limited; however, a pairs plot of all 60 modules was studied. In the pairs plot of all 60 modules, a green and gray pattern helps visualize that the modules fall into different groups. Quantitatively grouping the modules requires a Pearson distance matrix, which uses the correlation coefficient to define the distance between different observations47. Also, since there is no domain knowledge suggesting the number of clusters, a k-means clustering analysis was used to determine the number of clusters. Fig. 4.16 shows the WCSS as a function of the number of clusters, k, and there is a clear “elbow point” when k equals 5. An HCA dendrogram of the solar noon time performance ratio using the Pearson distance matrix and the average linkage criterion is shown in Fig. 4.17. Modules are divided into 5 groups using the cutree function. The first group on the left consists of all the modules on the fixed rack. The second group contains all modules on trackers 4, 6, and 8 except for three modules of brand M. The third group contains all modules on tracker 14, and the fourth group contains all modules on tracker 12. The last group on the right contains only three modules of brand M. The time series of each module is plotted according to the HCA result (Fig. 4.18). There are several gaps since data was


Figure 4.15. Pairs plot of solar noon time PR of ten modules. In each row, the Y axis is the PR of one module. In each column, the X axis is the PR of one module. Each module’s sample number is shown in the diagonal boxes. The correlation coefficient of each X, Y pair is calculated and represented by color: the darker the green background, the higher the correlation coefficient. The first three modules showed strong correlations (over 99%) among themselves. They also showed fairly strong correlations (about 90%) with the next three modules at a different location. However, the first three modules showed low correlation with the last four modules, with correlation coefficients lower than 30%.

not continuous due to snow and noncontiguous AC power data. The variability of the curves in the same group is mainly caused by the noncontiguous nature of the data. The largest dispersion of data curves appears in the last group. Before mid-February, the data curves of the three modules are highly varied.
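The idea of a correlation-based distance can be sketched in a few lines. This is an illustrative Python stand-in, not the R code used in this work, and it assumes the common convention that the Pearson distance is one minus the correlation coefficient; the module names and PR values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def pearson_distance_matrix(series):
    """Distance 1 - r for every ordered pair of named series (assumed convention)."""
    names = sorted(series)
    return {(a, b): 1.0 - pearson_r(series[a], series[b]) for a in names for b in names}

# Hypothetical daily noon PR values for three modules.
pr_a = [0.70, 0.72, 0.68, 0.74]
series = {
    "a": pr_a,
    "b": [0.9 * v for v in pr_a],    # same shape, lower amplitude
    "c": [0.74, 0.68, 0.72, 0.70],   # different day-to-day pattern
}
dist = pearson_distance_matrix(series)
print(round(dist[("a", "b")], 6), round(dist[("a", "c")], 6))
```

Modules "a" and "b" end up at a distance of essentially zero even though their amplitudes differ, which is why this metric groups modules by how their PR varies over time rather than by its absolute level.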


Figure 4.16. Total within-cluster sum of squares (WCSS). The elbow point occurs when k equals 5.
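For reference, the quantity plotted in such an elbow curve can be computed as in the following minimal Python sketch. In practice the cluster labels would come from a k-means run for each k; here they are assigned by hand to hypothetical two-dimensional points so that the example stays deterministic.

```python
def wcss(points, labels):
    """Total within-cluster sum of squared Euclidean distances to each cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total

# Hypothetical points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
wcss_by_k = {
    1: wcss(points, [0, 0, 0, 0]),
    2: wcss(points, [0, 0, 1, 1]),
    3: wcss(points, [0, 0, 1, 2]),
    4: wcss(points, [0, 1, 2, 3]),
}
print(wcss_by_k)
```

In this toy example the WCSS drops sharply from k = 1 to k = 2 and only slowly afterwards, so the elbow is at k = 2, in the same way that the elbow in Fig. 4.16 is read off at k = 5.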


Figure 4.17. The hierarchical clustering of 60 modules based on solar noon time PR time series data. The distance matrix is computed using the Pearson method. The distance between sets of observations is defined with the average linkage method. From left to right, the first group includes all modules on the fixed rack; the second includes all modules from trackers 4 and 6, and brand N on tracker 8; the third includes all modules on tracker 14; the fourth includes all modules on tracker 12; the last group consists of three modules from brand M (on tracker 8).


Figure 4.18. Solar noon time PR of 60 modules grouped by hierarchical clustering result. The color of each curve differentiates samples.


5 Discussion

5.1 Data analytics

This section will focus on the problems found in the process of data cleaning, munging,

and exploratory data analysis (EDA).

5.1.1 Irradiance data crosscheck

In the data subsampling part (Section 4.6), due to snow coverage, some modules, especially those on the fixed rack, showed low performance during and after snowy days. Snow also has the potential to cover irradiance sensors on the SunFarm. Since all irradiance data used in this study were measured by a pyranometer mounted on top of an electrical cabinet near the fixed rack, it is necessary to evaluate the irradiance data quality.

There were two pyranometers working on the SDLE SunFarm during the observation time. The GHI data used in this work was collected by a CMP11 pyranometer mounted on an electrical cabinet. The other one was mounted horizontally and connected to a Daystar multi-tracer. The Daystar can trace real-time I-V curves of up to 32 modules. Unlike dataloggers, the Daystar does not collect irradiance measurements every minute; it records irradiance data only when an I-V curve is being taken. In the first several months, the Daystar took an I-V curve at 30-minute intervals. After proper data


Figure 5.1. Cross-comparison of global horizontal irradiance (GHI) from two irradiance sensors. One panel is shown per day from 2012-12-26 to 2012-12-31, with time of day on the horizontal axis; the panels for 2012-12-26, 2012-12-27, and 2012-12-28 are labeled fog-snow.

cleaning and data alignment, each day’s irradiance data collected by the two pyranometers was compared in the same figure. In Fig. 5.1, red points represent irradiance data collected by the Daystar, and the black curve shows irradiance collected by the datalogger. Because the two instruments have different sensitivities and their sampling times and rates are not the same, an efficient way of evaluating the data is visual comparison. In most of the plots, the red dots lie on the black curve. However, in the second plot on the first row, December 27th, 2012, the red points are far above the curve, which indicates that the irradiance measured by the Daystar pyranometer is much higher than that measured near the fixed rack. It is potentially caused by snow coverage. The weather condition data confirms that this


data was collected on a snowy day. A survey of the irradiance cross-check was done by plotting the two sensors’ data together. On the following dates, the fixed rack pyranometer’s reading was lower than the Daystar pyranometer’s: 2012-12-27, 2012-12-28, 2012-12-30, 2013-01-04, and 2013-01-25. These days’ data were eliminated.
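The cross-check in this work was done visually. As a possible automated complement, the following Python sketch pairs each sparse Daystar reading with the nearest datalogger reading in time and flags a day when the datalogger total falls well below the Daystar total. The function names, the 0.7 threshold, and the sample values are hypothetical and were not used in this study.

```python
from bisect import bisect_left

def nearest_value(times, values, t):
    """Value of the sample whose timestamp is closest to t (times sorted ascending)."""
    i = bisect_left(times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    best = min(candidates, key=lambda j: abs(times[j] - t))
    return values[best]

def sensor_looks_covered(logger, daystar, min_ratio=0.7):
    """True when the logger irradiance is much lower than the Daystar irradiance.

    logger, daystar: lists of (minute_of_day, ghi_w_m2), sorted by time.
    min_ratio is a hypothetical threshold, not a value from this work.
    """
    times = [t for t, _ in logger]
    values = [g for _, g in logger]
    logger_sum = sum(nearest_value(times, values, t) for t, _ in daystar)
    daystar_sum = sum(g for _, g in daystar)
    if daystar_sum <= 0:
        return False
    return logger_sum / daystar_sum < min_ratio

# Hypothetical data: one-minute logger samples, 30-minute Daystar samples.
daystar = [(m, 300.0) for m in range(480, 1021, 30)]
logger_clear = [(m, 300.0) for m in range(480, 1021)]
logger_snow = [(m, 60.0) for m in range(480, 1021)]

clear_flag = sensor_looks_covered(logger_clear, daystar)
snow_flag = sensor_looks_covered(logger_snow, daystar)
print(clear_flag, snow_flag)
```

Any day flagged this way would still need to be confirmed against the weather record, as was done for the dates listed above.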

5.1.2 Performance ratio filtering

Several methods of data filtering were tried; however, with limited data sources (direct POA irradiance, Direct Normal Irradiance (DNI), and the module’s temperature were not available for the case study), the PR data within ±15 min of solar noon was considered the best filtering method, once malfunctions such as snow-covered module surfaces and snow-covered irradiance sensor data were eliminated from the final dataset. The average PR over the 30 min around solar noon is plotted in Fig. 5.2. Modules on the fixed rack have the highest PR, around 0.75. Modules on trackers showed lower PR because the POA irradiance on the trackers was overestimated by the conversion, and because during the last two months of operation some microinverters on trackers saturated at noon time.

5.2 Performance at different relative positions

Nominally there are only two different positions of the PV modules relative to the sun (fixed rack tilted at 22.3° and tracker mounted); however, due to the trackers’ mechanical failures during parts of the exposure, there are actually six different positions of PV modules relative to the sun. This is because the relative positions of modules on different trackers are not identical in practice.


Figure 5.2. Average performance ratio of 60 PV modules 15 min around solar noon time. The Y axis shows the sample ID of each module. From left to right are modules on fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue), respectively.

5.2.1 Performance in sunny days

Modules mounted on the dual-axis trackers should always track the sun, which increases the amount of light incident on a module. Furthermore, as all the incident light is normal to the module’s surface, a larger portion of the sunlight can be absorbed. Intuitively, modules mounted on trackers should produce more electricity than the ones mounted on the fixed rack. Fig. 5.3 shows the normalized AC power production of one PV module on


Figure 5.3. Comparison of normalized AC power of G sa18211.00 (red) on tracker and D sa18286.00 (blue) on tilted rack on December 12th, 2012.

fixed rack (blue) and one PV module on a normally operating tracker (red) on December 12, 2012. The fixed rack is tilted at 22.3°, which tends to optimize the irradiance gain during the summer months of the year. Thus in December, even on a bright sunny day, the noon time peak power did not reach 70% of the nominal power. The total power production on this particular day of a specific module on a tracker (G.t4 sa18211.00) is 1.64 times that of D.f1 sa18286.00. However, in the summer time, as the sun’s elevation angle is higher at noon time, the performance of the modules on the fixed rack increases. Fig. 5.4 shows the AC power production of the same modules on May 13th, 2013. The module on the tracker gains more power in the morning and afternoon, but produces almost the same power as the module on the fixed rack around noon time. As discussed above, one factor is


Figure 5.4. Comparison of normalized AC power of G sa18211.00 (red) on tracker and D sa18286.00 (blue) on tilted rack on May 13th, 2013.

the sun’s elevation angle in May is closer to the fixed rack’s tilt angle at noon time. Another possible effect is the microinverter’s saturation. From the data provided by Enphase, the M215’s maximum output power is 225 W48. Around noon time, the AC power output of the brand G module increased to 225 W and the microinverter saturated, which forms the “flat head” of the curve.
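The saturation effect can be illustrated with a short Python sketch. The 225 W limit is the value quoted above; the unclipped power profile, the 1 W tolerance, and the function names are hypothetical and serve only to show how a flat-topped curve arises and how saturated samples might be located.

```python
M215_MAX_AC_W = 225.0  # maximum output power quoted in the text

def clip_to_inverter_limit(available_w, limit_w=M215_MAX_AC_W):
    """AC output when the inverter cannot exceed its maximum output power."""
    return [min(p, limit_w) for p in available_w]

def saturated_samples(ac_power_w, limit_w=M215_MAX_AC_W, tolerance_w=1.0):
    """Indices of samples at (or within a tolerance of) the inverter limit."""
    return [i for i, p in enumerate(ac_power_w) if p >= limit_w - tolerance_w]

# Hypothetical power that the module could deliver around noon (W).
available = [150.0, 200.0, 228.0, 235.0, 231.0, 210.0, 160.0]
ac_output = clip_to_inverter_limit(available)
flat_head = saturated_samples(ac_output)
print(ac_output, flat_head)
```

Samples flagged as saturated understate what the module could have produced, so a PR computed from them is biased low.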

5.2.2 Performance in snowy days

Snow coverage occurred on all of the modules mounted on the fixed rack. When a PV module is fully covered by snow, there is no light incident on the solar cells and the module’s power


production is zero. If part of the solar cell is exposed to sunlight, it will generate electricity. However, because of the bypass diodes, if only one cell is working then current cannot flow. This small amount of electricity will be dissipated as heat, raising the temperature of the PV module and melting the snow on the surface. As a result, snow coverage should not last long, but it will increase the thermal stresses on the solar cells and strings, which may lead to reliability issues in the future. In contrast, most of the modules on the trackers do not have the snow coverage problem. On a normally working tracker, the tracker frame is almost vertical during dawn and dusk, thus it mechanically avoids snow accumulation on the module surface.

5.3 Performance of different brands

PV modules are chosen for PV power plants based on their nominal power from a module’s data sheet. Fig. 4.2 shows that brands H and Q did not reach 95% of their nominal power, and therefore a power plant could lose more than 5% of the designed power output by using these two models of PV module. For a nominal 10 MW utility-scale power plant, that is over 500 kW of power “lost” due to the overestimation of the PV module’s power output. In comparison, power plants using module type A, J, or S may produce more power than specified, which is not necessarily good since it may overload the power grid or subsequent electricity storage instruments. Furthermore, a PV module’s price is also based on the nominal power. For example, while brands A and Q share the same nominal power, the initial selling price of brand A in 2011 was 0.82 $/W, while that of brand Q was 0.80 $/W, suggesting brand Q is cheaper. However, given the baseline results of the two brands, brand A’s initial performance is 101% of its nominal power, but brand Q’s initial performance is only 92.5%. Dividing each selling price by the measured fraction of nominal power, the final prices of brands A and Q are 0.812 $/W and


0.865 $/W, respectively. Thus, brand A is actually a more cost-effective choice. The deviation of power output within a brand may also be an important factor for the initial performance of a PV power plant. Since most power plants are implemented with string inverters, the power output of a string is determined by the lowest-performing module in the string. The lower the power output deviation of modules in the same string, the less electricity is dissipated as heat in the string. Internal thermal stress is also suspected as a factor that causes power degradation.
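The price comparison above amounts to a single division per brand. The following minimal Python sketch reproduces it from the values quoted in the text; the function name is chosen here only for illustration.

```python
def effective_price_per_watt(list_price_per_w, measured_fraction_of_nominal):
    """Price per watt actually delivered, given measured power as a fraction of nominal."""
    return list_price_per_w / measured_fraction_of_nominal

# Values quoted in the text: selling price ($/W) and baseline performance.
brand_a = effective_price_per_watt(0.82, 1.01)
brand_q = effective_price_per_watt(0.80, 0.925)
print(round(brand_a, 3), round(brand_q, 3))
```

The brand with the lower list price ends up with the higher price per delivered watt, which is the basis for the conclusion that brand A is the more cost-effective choice.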

5.4 Power time series data clustering

In the hierarchical clustering of power time series data, two distance methods were considered: Euclidean distance and dynamic time warping (DTW)49. Euclidean distance is the most commonly used method; it is the square root of the sum of squares of attribute differences. DTW is an algorithm often used in time series analysis for measuring the similarity of two temporal sequences that may vary in time or speed. However, in our application DTW was not used because all time series data were rigidly aligned. The speed of variation of AC power was determined by incident sunlight and a module’s internal characteristics, so the series should vary at the same pace. Therefore, dynamic time warping is not applicable and Euclidean distance is more appropriate.
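The difference between the two distance methods can be seen on a toy example. The sketch below is an illustrative Python stand-in, not the implementation used in this work; the DTW shown is the textbook dynamic-programming form with absolute difference as the local cost, and the two short series are hypothetical.

```python
from math import inf, sqrt

def euclidean_distance(a, b):
    """Square root of the sum of squared point-by-point differences."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(a, b):
    """Classic dynamic time warping cost with |x - y| as the local cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical power profiles: the second is the first delayed by one sample.
aligned = [0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0]

d_euclid = euclidean_distance(aligned, shifted)
d_dtw = dtw_distance(aligned, shifted)
print(d_euclid, d_dtw)
```

DTW largely forgives the one-sample delay, while Euclidean distance penalizes it. When all series share the same time base, a timing difference is a real difference in behavior, which is consistent with the choice of Euclidean distance here.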

Nominally there are only two different positions of the PV modules relative to the sun (fixed rack and tracker mounted); however, due to the mechanical failures of the trackers during parts of the exposure time, there are actually six different positions of PV modules relative to the sun. This is because the relative positions of modules on different trackers are not identical in practice. In order to find a proper linkage criterion, hierarchical clustering results were compared using three linkage criteria: complete, single, and average. In


complete-linkage clustering, or farthest neighbor clustering50, the distance between two clusters is equal to the distance between the two elements that are farthest from each other. In single-linkage clustering, or nearest neighbor clustering, the distance between two clusters is equal to the distance between the nearest pair of elements. Average-linkage clustering, or UPGMA (unweighted pair group method with arithmetic mean), defines the distance between two clusters as the average of all distances between pairs of elements. In the dendrograms using all three linkage criteria with the number of groups equal to six, the modules on the same site are always grouped in one cluster. When these 6 large groups are cut into 20 smaller clusters, modules from the same brand are more likely to form a cluster with average linkage than in the dendrograms using the other two criteria. Although k-means clustering itself is an independent clustering method, in this application it was used to determine the number of groups and to confirm the result of the hierarchical clustering. For power data clustering, the k-means clustering is consistent with the hierarchical clustering result when k equals six, which is also the elbow point. This result confirmed that a module’s location (the site on which it was mounted on the SDLE SunFarm) has the strongest influence on its power production over time. The analytical method of power time series data clustering gives a way of distinguishing PV systems mounted in different configurations. In this case study, it was already known from field observation that the six sites worked differently. For data shared through Energy CRADLE, which does not necessarily carry a maintenance log, this power time series clustering will be a good tool with which to start classifying the power data.
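The three linkage criteria defined above differ only in how the pairwise distances between two clusters are summarized. A minimal Python sketch, using hypothetical one-dimensional points and absolute difference as the element distance, is:

```python
from itertools import product

def pairwise_distances(cluster_a, cluster_b, dist):
    """All distances between one element of cluster_a and one of cluster_b."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]

def complete_linkage(cluster_a, cluster_b, dist):
    """Distance between the two farthest elements."""
    return max(pairwise_distances(cluster_a, cluster_b, dist))

def single_linkage(cluster_a, cluster_b, dist):
    """Distance between the two nearest elements."""
    return min(pairwise_distances(cluster_a, cluster_b, dist))

def average_linkage(cluster_a, cluster_b, dist):
    """Mean of all pairwise distances (UPGMA)."""
    distances = pairwise_distances(cluster_a, cluster_b, dist)
    return sum(distances) / len(distances)

def abs_diff(p, q):
    return abs(p - q)

# Hypothetical clusters of scalar observations.
left, right = [1.0, 2.0], [4.0, 7.0]
d_complete = complete_linkage(left, right, abs_diff)
d_single = single_linkage(left, right, abs_diff)
d_average = average_linkage(left, right, abs_diff)
print(d_complete, d_single, d_average)
```

In a full hierarchical clustering, the pair of clusters with the smallest linkage distance is merged at each step, so the choice of criterion changes the shape of the dendrogram even when the element distances are the same.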


5.5 Solar noon time performance ratio clustering

Solar noon time PR clustering is based on a subset of PR time series data. Snow-covered PV module data and snow-covered irradiance sensor data were removed. For the other 85 days, the average of the solar noon time PR was taken. In hierarchical clustering, the Pearson correlation coefficient was used as the distance metric; it is a measure of how similar two time series’ shapes are, in other words, how similarly two modules’ performances vary with time. As with power time series clustering, the average linkage criterion was used. The clustering results can be interpreted as a group of fixed rack modules, a group of functioning trackers, two groups of malfunctioning trackers, and a group of malfunctioning modules. Fig. 4.17 shows this clearly: the first cluster on the left contains all the modules on the fixed rack. Even at solar noon time, the performance of modules on the fixed rack is different from that of modules on trackers because they are tilted at 22.3°, which is much shallower than the trackers at solar noon time in the winter. The second cluster from the left consists of all modules from trackers 4 and 6, and one brand from tracker 8. These three trackers were tracking correctly most of the time; however, trackers 6 and 8 stopped for a short time. Taking the means of PR at solar noon time minimized the influence of outliers and made the difference between the fixed rack and the trackers more distinguishable. It is noteworthy that the “fixed rack group” is very close to the “normal tracker group”, which indicates that their performances are very similar. Fig. 4.18 shows that the shapes of the curves in group 1 and group 2 are similar, though the amplitudes are different because of the difference in angles.

Tracker 12 was off tracking most of the time, and tracker 14 was not in motion, so they each formed a group. The last group on the right isolated brand M from the other modules on the same tracker. The brand M modules were replaced during the experiment because the previous models were not compatible with the Enphase M215 microinverter. During


the first two months of operation errors were reported by Enlighten, so these modules were replaced on Feb. 15th. The previous clustering result did not reflect this change, yet using noon time PR data and the Pearson coefficient distance method distinguished these changes. The distance between this cluster and the other modules on the same tracker (brand N modules) is quite large, which indicates that the performance of brand M is unlikely to be related to that of brand N. In summary, the solar noon time mean PR clustering result using the Pearson correlation coefficient distance metric neglected diminutive differences among functioning trackers51, differentiated the fixed rack from the trackers, and identified malfunctioning trackers 12 and 14 and the malfunctioning module brand M.


6 Conclusions

Previous work on time series analysis (TSA) of a photovoltaic (PV) system’s real-world performance focused on determining precise and accurate Rd by means of using highly filtered data and neglecting seasonality. My research provides a higher fidelity data analytics approach to TSA. With a case study of the first six months of real-world performance of 60 PV modules and climate conditions, a data analytic procedure was developed, which includes the following parts.

Raw data was first validated by characterization of the measurement apparatus. The impact of microinverters working in burst mode and of the microinverters’ efficiency on AC power data was evaluated. Irradiance data was converted from the global horizontal plane (GHI) to the plane of array (POA), and the systematic error thereby introduced was discussed. Secondly, using the redundancy of measurements, snow-covered irradiance sensor data was first detected by visually cross-checking the daily irradiance profiles and then eliminated from the data set. Thirdly, data alignment was accomplished using a cross-correlation function in R to minimize the time lag between two time series.
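The alignment step can be sketched as a search for the lag that maximizes the correlation between two series. The following is an illustrative Python stand-in for the cross-correlation function used in R, not the code from this work; the function names, the lag range, and the two short series are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0

def best_lag(reference, other, max_lag):
    """Lag k (in samples) that maximizes corr(reference[t], other[t + k])."""
    best_k, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = reference[:len(reference) - lag], other[lag:]
        else:
            x, y = reference[-lag:], other[:len(other) + lag]
        n = min(len(x), len(y))
        r = pearson_r(x[:n], y[:n])
        if r > best_r:
            best_k, best_r = lag, r
    return best_k, best_r

# Hypothetical series: 'other' is 'reference' delayed by two samples.
reference = [0.0, 0.0, 1.0, 3.0, 6.0, 3.0, 1.0, 0.0, 0.0, 0.0]
other = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 6.0, 3.0, 1.0, 0.0]

lag, r = best_lag(reference, other, max_lag=3)
print(lag, round(r, 3))
```

Shifting the second series by the lag found in this way brings the two time bases into agreement before the power and irradiance data are combined.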

Exploratory data analysis (EDA) on integrated data indicates that the total energy harvest of each PV system varies widely. A clustering analysis on power time series data


found that PV systems located on the same mounting system performed similarly, and

the behavior of all the tracking systems was not identical.

Data was assembled based on IEC 61724, and performance ratio (PR) was used to evaluate the PV system’s performance. The PR data of the 60 modules were subsampled to solar noon time, and snow-covered days were eliminated. The solar noon time performance ratio clustering neglected diminutive differences between sites, strongly differentiated the fixed rack from the trackers, and identified malfunctioning trackers 12 and 14 and the malfunctioning module brand M.

This work leads to improvements to the SunFarm metrology platform and suggests that it is necessary to have redundant measurements. The clustering results provide guidance for future data modeling.


7 Future research

7.1 Improved SunFarm data quality and redundancy

Since the conversion of GHI to POA introduced systematic error, sufficient irradiance measurements are critical to improve data quality. Another pyranometer measuring global irradiance was mounted on tracker 7 in June 2013. A pyrheliometer, which measures direct irradiance, was also mounted on tracker 7 at the same time. Thus, direct measurements of POA irradiance and DNI (direct normal irradiance) are now available. By taking the ratio of DNI to POA irradiance, it is possible to determine the clearness (or cloudiness) of the sky. Redundant irradiance measurements will also be made to cross-check the sensors’ accuracy. On the Energy CRADLE’s user interface, a sensor cross-check page will enable real-time on-site monitoring.

7.2 Predictive model

The Global SunFarm Network has already come online, so a sufficient amount of PV performance data from different climatic conditions will be available for the next step. In the next phase of study, it would be interesting to build a mixed effect model52 of a PV module’s power output as a function of multiple climate stresses. A mixed effect model is a statistical model that contains both fixed effects, which are explanatory variables treated as non-random quantities, and random effects, which are treated as arising from random variation. Furthermore, by correlating to indoor test results, we hope to be able to predict a PV module’s degradation under climatic stresses. This kind of data can direct the improvement of PV module qualification testing, which will eventually lead to improved lifetime for PV modules.


Appendix A

List of 24 manufacturers and nameplate power

No.  Manufacturer   Nameplate Power (W)
1    AUO            240
2    Astronergy     235
3    Bosch          225
4    CSI            220
5    Conergy        220
6    ET Solar       235
7    EcoSolargy     230
8    Helios         240
9    Hyundai        230
10   Kyocera        240
11   LG             220
12   MX Solar       230
13   Mage           230
14   Perlight       240
15   REC            230
16   Sanyo          220
17   Schott         230
18   Schuco         240
19   Sharp          235
20   Siliken        220
21   Solar World    230
22   Trina          230
23   UpSolar        240
24   Yingli         230


Appendix B

SunFarm network

1 SDLE SunFarm design & characteristics

1.1 Overview

The SDLE SunFarm was established for outdoor testing of long-lived materials, components, and systems53,54. It is a highly instrumented outdoor test facility of a kind not commonly found in academic research. It is located on the CWRU west campus and is about one acre in size. There are 16 electrical sites on the SunFarm, including 14 high-precision dual-axis trackers and two sites of fixed tilt racking, shown in Fig. B.1. A total of 148 full-sized crystalline silicon modules, bought on the open market from 24 different manufacturers in sets of six or eight, were exposed on both trackers and fixed racking. On the trackers, 8000 PV material samples will be exposed under 1X, 2X, 4X, and 5X suns illumination with front surface mirror concentrators.

1.2 Samples

Samples being exposed on the SDLE SunFarm are divided into two major groups: PV

modules and PV material sample coupons.

Full-sized crystalline PV modules. In order to better understand power degradation mechanisms and determine power degradation rates (Rd), 148 full-sized crystalline silicon modules from 24 different manufacturers around the world are being exposed on the SunFarm to investigate their performance under real-world working conditions. The majority of the population are polycrystalline silicon modules; only two brands are monocrystalline silicon modules. The 24 manufacturers and their nameplate power are listed in Appendix A. The 60 crystalline PV module samples studied in this thesis are part of this


Figure B.1. The top half shows the blueprint of the SDLE SunFarm and the distribution of the 16 electrical sites. The bottom half shows an operating tracker and the electrical cabinets behind the trackers. On the tracker frame shown in the figure, the top half features six PV modules mounted horizontally, and the bottom half features 48 sample trays mounted in 12 by 4 rows.


Figure B.2. A mechanical drawing of a sample tray, on the left. A 3D drawing of a 5X front surface mirror concentrator, on the right.

total population. These 60 modules are from 20 different manufacturers with 3 samples

from each brand.

PV material samples. In order to better understand PV module degradation mechanisms, we need to know how each component of a PV module degrades over time. PV material and component samples are made and exposed on the SDLE SunFarm. Backsheet samples, front sheet samples, and transparent conducting oxide (TCO) samples are cut into 1 × 1.5 inch coupons and held in sample trays (Fig. B.2). With the use of a front surface concentrator (Fig. B.2), PV material samples can be exposed at 1X, 2X, 4X, and 5X sunlight intensities as well as real-world climate conditions. Material samples will be taken off periodically for optical characterization measurements.

1.3 PV mounting system

Two different PV mounting systems are being used on the SunFarm. Fourteen dual-axis trackers, which are commonly used for high-concentration photovoltaics (HCPV), can


keep the module plane normal to sunlight. Two sites are fixed tilt racks, which are commonly used for rooftop PV installations or utility power plants. The fixed racks face south and are tilted at 22.3°.

Fixed rack. Power produced by a PV array is roughly proportional to the direct sunlight it receives. Typically, fixed PV arrays are tilted at an angle equal to the latitude of the array's location, which points the array at the sun's noon position on the equinoxes, the midpoint of its yearly range of elevation. Here in Cleveland, since we usually have more cloudy days in winter, the fixed rack on the SDLE SunFarm was installed at a shallower angle that favors the higher summer sun: the latitude of the SDLE SunFarm is 41.5°, and the tilt angle of the fixed rack is 22.3°. The 30 meters of fixed-tilt rack are divided into two identical electrical sites. Eighteen PV modules are exposed on each site.
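To illustrate the tilt choice described above, the following sketch (not part of the original analysis) compares the angle between the sun and the module normal at solar noon for the installed 22.3° tilt and for a latitude tilt of 41.5°. It assumes a south-facing module in the northern hemisphere, for which the noon angle of incidence reduces to |latitude − declination − tilt|, and it uses ±23.45° as an approximate solstice declination. The function and constant names are hypothetical.

```python
import math

LATITUDE_DEG = 41.5          # SDLE SunFarm latitude, from the text
INSTALLED_TILT_DEG = 22.3    # fixed rack tilt, from the text
SOLSTICE_DECLINATION_DEG = 23.45  # assumed approximate solar declination at the solstices


def noon_incidence_deg(latitude_deg, tilt_deg, declination_deg):
    """Angle between the sun and the module normal at solar noon.

    Valid for a south-facing module in the northern hemisphere:
    noon solar elevation is 90 - latitude + declination, and the
    module normal points 90 - tilt above the southern horizon.
    """
    return abs(latitude_deg - declination_deg - tilt_deg)


if __name__ == "__main__":
    days = [
        ("summer solstice", SOLSTICE_DECLINATION_DEG),
        ("equinox", 0.0),
        ("winter solstice", -SOLSTICE_DECLINATION_DEG),
    ]
    for label, declination in days:
        for tilt in (INSTALLED_TILT_DEG, LATITUDE_DEG):
            theta = noon_incidence_deg(LATITUDE_DEG, tilt, declination)
            # cos(theta) is the fraction of the direct beam intercepted by the module plane
            beam_fraction = math.cos(math.radians(theta))
            print(f"{label:16s} tilt={tilt:5.1f} deg  incidence={theta:5.2f} deg  cos={beam_fraction:.3f}")
```

Under these assumptions the shallower tilt brings the module within about 4° of normal incidence at the summer solstice, at the cost of a larger incidence angle in winter, which is consistent with the reasoning given for the installation.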

Trackers. Dual-axis solar trackers orient PV modules normal to direct sunlight at all times. They are often seen in concentrated photovoltaic (CPV) applications, especially HCPV systems, where tracking enables the optical components of the concentration system to work. In flat-panel PV applications, trackers can maximize the performance of PV modules by minimizing the angle of incidence of sunlight on the module plane. The 14 dual-axis trackers on the SDLE SunFarm were manufactured by Feina Tracker, Spain. Each tracker consists of three parts: foundation pole, tracker head, and tracker panel, shown in Fig. B.3. The tracker head is 10 feet off the ground and is driven by two DC motors in the elevation and azimuth directions. The motion of the motors is controlled by a Tracker Control Unit (TCU) inside the 4 to 5 inch electrical box behind each tracker.

The tracker panels are 16' 4" (5 m) wide by 13' 1" (4 m) long (Fig. B.4) and can hold up to 12 PV modules in landscape orientation. To make it easier to test various modules, components, and materials, ten flexible unistrut were placed on the tracker panel to fasten modules and sample trays to the tracker.

Figure B.3. The photo was taken when the SunFarm was under construction. The relative position of the fixed rack and the trackers is shown in the image; the fixed rack is at the front of the SunFarm in order to avoid shading. The relative positions of the tracker head, the tracker foundation, and the two electrical cabinets behind each tracker are shown on the right-hand side.

Figure B.4. A mechanical drawing of the tracker frame is shown on the left-hand side. The distance between horizontal unistrut is 0.5 m in order to mount sample trays. On the right is a drawing of a tracker frame fully loaded with 12 full-size PV modules.

1.4 SunFarm electrical design

Two sites of fixed-tilt rack plus 14 dual-axis trackers form the 16 electrical sites of the SDLE SunFarm. Two electrical cabinets behind each site separate the power devices

and datalogger system (Fig. B.3). 110 V AC is connected to the power cabinets to power the tracking system as well as the Ethernet switches. In the data cabinet, a data logging system consisting of a Campbell CR1000 datalogger, a multiplexer, and a battery monitors the sensor readings. The SDLE SunFarm has 122 individual power plants: 120 individual PV modules, each connected to a microinverter, and 2 strings of 8 PV modules, each connected to a string inverter. These 122 power plants, which can generate about 32 kW of electricity at peak, are all tied to the grid through a reversing relay.
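The counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the implied average power per module is a derived figure, not one reported in the thesis, and the variable names are hypothetical.

```python
# Figures taken from the text
TRACKER_SITES = 14
FIXED_RACK_SITES = 2
MICROINVERTER_MODULES = 120   # one power plant per module
STRINGS = 2                   # one power plant per string
MODULES_PER_STRING = 8
PEAK_POWER_KW = 32.0          # "about 32 kW" at peak

electrical_sites = TRACKER_SITES + FIXED_RACK_SITES
power_plants = MICROINVERTER_MODULES + STRINGS
grid_tied_modules = MICROINVERTER_MODULES + STRINGS * MODULES_PER_STRING

# Derived, approximate: average peak power per grid-tied module
implied_watts_per_module = PEAK_POWER_KW * 1000.0 / grid_tied_modules

if __name__ == "__main__":
    print("electrical sites:", electrical_sites)
    print("power plants:", power_plants)
    print("grid-tied modules:", grid_tied_modules)
    print(f"implied average peak power per module: {implied_watts_per_module:.0f} W")
```

The totals reproduce the 16 electrical sites and 122 power plants given above, and the implied average of roughly 235 W per module is in the range expected of full-size modules.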

1.5 Metrology platform

Power data. A metrology platform was built for SDLE SunFarm data monitoring, including power, insolation, and weather monitoring. For power monitoring, either inverters or I-V curve tracers were used. Two trackers with eight full-sized modules each used Solectria PVI1800 string inverters. Another 10 trackers as well as the two tilt rack sites used two brands of micro-inverters, Enphase and Power-One. On the other two trackers, a Daystar multi-tracer was used to take I-V curves of full-sized modules and mini-modules at one-minute intervals. A portable I-V curve tracer was used on clear days to take I-V curves on demand.

Insolation data. Redundant insolation sensors were placed around the SDLE SunFarm in order to get accurate irradiance data and align the trackers. Four Kipp & Zonen pyranometers of three different models (CMP6, CMP11, CMP21) were placed on the horizontal, tilt rack, and tracker planes. A Kipp & Zonen pyrheliometer (CHP1) was used to measure direct irradiance. Multiple split-cell reference cells, LI-COR LI-200 pyranometers, and Apogee SP-212 full-spectrum irradiance sensors were placed in the tracker plane to help align the tracker frames' orientations. Another four Apogee SP-212 full-spectrum

irradiance sensors and Apogee SU-100 UV sensors were mounted on the sample trays to measure the concentrated solar irradiance.

Climate data. Two Vaisala WXT520 weather stations were placed on the SunFarm to

record wind speed, wind direction, rainfall, rain intensity, rain duration, and humidity.

An anemometer was connected to the Master Control Unit of trackers to monitor the

wind load on the trackers. A snow cup was used to measure the precipitation. T-type

thermocouples were used for backsheet temperature monitoring.

Data acquisition system. The data acquisition system consists of 17 networked Campbell Scientific CR1000 dataloggers, each connected to an AM16/32 multiplexer that extends the capacity of the datalogger to 32 differential measurement channels. The Campbell dataloggers monitor thermocouple and sensor outputs. Enphase micro-inverters use an Envoy unit to collect data from each individual micro-inverter. Similarly, Solectria string inverters use the SolrenView system to collect data. Minute-by-minute data can be downloaded from their web servers.

1.6 SunFarm Network

Cleveland's humid continental climate is not typical for PV degradation research. In order to study PV modules' performance under different climatic conditions, a global SunFarm network was established among nine PV outdoor test beds across the world. These test beds include four Ohio SunFarms: the SDLE SunFarm, Cleveland, Ohio; the Lakeview 1 MW power plant, Cleveland, Ohio; the Replex SunFarm, Mt. Vernon, Ohio; and the AEP Dolan test center, Columbus, Ohio (Fig. B.5).

Within the United States, we cooperate with two Q-Lab SunFarms in Arizona and Florida, which are in mid-latitude desert and humid subtropical climate areas,

Figure B.5. The upper left corner is the SDLE SunFarm, with the fixed-tilt rack in front and dual-axis trackers in back. Bottom right is the Replex SunFarm at Replex Plastics in Mount Vernon, with a fixed rack, a single-axis tracker, and dual-axis trackers. Bottom left is the AEP SunFarm at the Dolan Technology Center. Mirror Augmented PV (MAPV) systems are on the bottom half of the tilt racks, with flat back surface mirrors mounted tilted towards the modules. The top half of the tilt rack has a non-augmented PV system.

respectively. On an even larger scale, we established three SunFarms abroad with international collaborators: Underwriters Laboratories SunFarms in Taitung and Lujhu, Taiwan, and a SunFarm at the Indian Institute of Technology Gandhinagar (IITGN), Ahmedabad. These nine SunFarms span a large range of environmental conditions across the globe. Similar data collection methods were applied to each SunFarm. In order to better manipulate the Big Data that streams back daily, and to manage the sensors that go on each site, a data acquisition system, Energy CRADLE, was established.

2 Energy CRADLE SunFarm informatics

The purpose of the Energy Common Research Analytics and Data Lifecycle Environment (Energy CRADLE) is to create for engineering, and in particular lifetime science, the tools and protocols necessary to transform Big Data into information, which informs scientific knowledge to guide further analysis [30, 38, 39, 40]. Energy CRADLE is tightly focused on serving the needs of handling and sharing data among the SunFarm network researchers. Raw data collected from the SunFarms go through data pre-processing and semantic annotation and are stored in a NoSQL Hadoop system. With domain knowledge, Energy CRADLE can manage the organization and orchestration of the data, making data queries more efficient. The Energy CRADLE data integration environment has two features, shown in Fig. B.6. First, it can push all the raw data collected from the SunFarms onto a Hadoop Distributed File System (HDFS) and further map them to HBase, which is a distributed database. Second, through Thrift and REST servers, users can use a visual front end to interact with data stored in HBase.
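HBase keeps rows sorted lexicographically by row key, so the choice of key determines which queries are cheap. The thesis does not describe the key layout used by Energy CRADLE; the sketch below is only a hypothetical illustration in plain Python (no HBase client calls) of how a key built from site, system ID, and UTC time lets one system's time window be selected as a contiguous key range. The site and system names and the readings are made up.

```python
from datetime import datetime, timezone


def make_row_key(site, system_id, timestamp):
    """Hypothetical row key: site, system ID, then a fixed-width UTC timestamp."""
    stamp = timestamp.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{site}|{system_id}|{stamp}"


def select_window(table, site, system_id, start, stop):
    """Return rows with start <= time < stop for one system.

    'table' is an in-memory dict standing in for a sorted key-value store;
    the fixed-width timestamp makes string order match time order.
    """
    low = make_row_key(site, system_id, start)
    high = make_row_key(site, system_id, stop)
    return [(key, table[key]) for key in sorted(table) if low <= key < high]


if __name__ == "__main__":
    def utc(hour, minute):
        return datetime(2013, 6, 1, hour, minute, tzinfo=timezone.utc)

    # Made-up minute-by-minute readings for two hypothetical systems
    table = {
        make_row_key("SDLE", "sysA", utc(16, 0)): {"power_w": 201.4},
        make_row_key("SDLE", "sysA", utc(16, 1)): {"power_w": 202.0},
        make_row_key("SDLE", "sysB", utc(16, 0)): {"power_w": 188.7},
    }
    for key, row in select_window(table, "SDLE", "sysA", utc(16, 0), utc(16, 2)):
        print(key, row)
```

Putting the timestamp last keeps each system's readings adjacent and in time order, which suits queries by location and system ID over a time span; a different query pattern would call for a different key order.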

Figure B.6. Architecture of the NoSQL Hadoop system.

The front end of Energy CRADLE (Fig. B.7) consists of four web pages, named according to their functions: Data Inquiry, Equipment Registration, Maintenance, and Metrology Check. On the Data Inquiry page, data can be queried by location, system ID, local time, or local solar time. Invalid data points or NAs are filtered out. The Equipment Registration page was built for sensor and sample management. Location, serial number, and calibration coefficients can be registered remotely from any one of the SunFarms. Because the top of the tracker is more than 30 feet above the ground when it is operating, maintenance (e.g., changing samples) has to be done when the tracker is in birdbath mode. Whenever maintenance is needed, the operator can go to the Maintenance page to specify the location and duration. Data collected during maintenance will be flagged in the database, warning that the tracker is not in normal operation mode. The redundant insolation and weather sensors are cross-checked to ensure that sensors are working correctly. The Metrology Check page can plot the same variables collected by multiple sensors side by side, making the cross-checking easy.
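The three data-quality steps described above (dropping invalid points, flagging maintenance periods, and cross-checking redundant sensors) can be sketched as follows. This is a minimal illustration rather than the Energy CRADLE implementation: the record layout, the function names, and the 5% agreement tolerance are all assumptions, and the sample values are made up.

```python
import math
from datetime import datetime


def drop_invalid(records):
    """Remove records whose value is missing (None) or NaN."""
    return [r for r in records
            if r["value"] is not None and not math.isnan(r["value"])]


def flag_maintenance(records, windows):
    """Add a 'maintenance' flag to records inside a registered window for the same site."""
    flagged = []
    for r in records:
        in_window = any(
            w["site"] == r["site"] and w["start"] <= r["time"] < w["end"]
            for w in windows
        )
        flagged.append({**r, "maintenance": in_window})
    return flagged


def sensors_agree(a, b, tolerance=0.05):
    """True if two redundant readings differ by at most 'tolerance' (relative, assumed 5%)."""
    reference = max(abs(a), abs(b))
    if reference == 0:
        return True
    return abs(a - b) / reference <= tolerance


if __name__ == "__main__":
    records = [
        {"site": "tracker07", "time": datetime(2013, 6, 1, 12, 0), "value": 812.0},
        {"site": "tracker07", "time": datetime(2013, 6, 1, 12, 1), "value": float("nan")},
        {"site": "tracker07", "time": datetime(2013, 6, 1, 12, 2), "value": 15.0},
    ]
    windows = [{"site": "tracker07",
                "start": datetime(2013, 6, 1, 12, 2),
                "end": datetime(2013, 6, 1, 12, 30)}]
    for row in flag_maintenance(drop_invalid(records), windows):
        print(row)
    print(sensors_agree(1000.0, 980.0), sensors_agree(1000.0, 900.0))
```

Flagging rather than deleting the maintenance data keeps the raw record intact, so later analyses can decide whether to exclude it.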

Figure B.7. Architecture of Energy CRADLE’s user front end.

Complete References

[1] Kurt Hornik. The R FAQ, 2013.

[2] Gaëtan Masson, Marie Latour, Manoël Rekinger, Ioannis-Thomas Theologitis, and Myrto Papoutsi. Global market outlook for photovoltaics. European Photovoltaic Industry Association annual report, 2013.

[3] Sandra Enkhardt. Germany sets new PV installation record in 2012, January 2013.

[4] Becky Beetz. China 2012: 5 GW of PV installations predicted, October 2012.

[5] Stephen Lacey. A solar system is installed in the US every 4 minutes, August 2013.

[6] J. Hemminger, G. Crabtree, and A. Malozemoff. Science for energy technology: Strengthening the link between basic research and industry. A report from the Basic Energy Sciences Advisory Committee, US Department of Energy, 2010.

[7] John Hemminger. From quanta to the continuum: Opportunities for mesoscale science. Technical report, A Report from the Basic Energy Sciences Advisory Committee, 2012.

[8] Andrew L Rosenthal and Cary G Lane. Field test results for the 6 MW Carrizo solar photovoltaic power plant. Solar Cells, 30(1):563–571, 1991.

[9] John H Wohlgemuth, Daniel W Cunningham, Paul Monus, Jay Miller, and Andy Nguyen. Long term reliability of photovoltaic modules. In Photovoltaic Energy Conversion, Conference Record of the 2006 IEEE 4th World Conference on, volume 2, pages 2050–2053. IEEE, 2006.

[10] Dirk C Jordan and Sarah R Kurtz. Photovoltaic degradation rates - an analytical review. Progress in Photovoltaics: Research and Applications, 21(1):12–29, 2013.

[11] M.P. Murray, L.S. Bruckman, and R.H. French. Durability of acrylic: Stress and response characterization of materials for photovoltaics. In Energytech, 2012 IEEE, pages 1–6, May 2012.

[12] Myles P. Murray, Laura S. Bruckman, and Roger H. French. Photodegradation in a stress and response framework: Poly(methyl methacrylate) for solar mirrors and lens. Journal of Photonics for Energy, 2(1):022004–022004, 2012.

[13] D. C. Jordan and S. R. Kurtz. The dark horse of evaluating long-term field performance - data filtering. In PV Module Reliability Workshop, February 26–27, 2013, Golden, Colorado, 2012.

[14] Ryan M Smith and National Renewable Energy Laboratory (U.S.). Outdoor PV Module Degradation of Current-Voltage Parameters Preprint. Number 5200-53713 in NREL/CP. National Renewable Energy Laboratory, Golden, CO, 2012.

[15] Radhika Lad, John Wohlgemuth, and G TamizhMani. Outdoor energy ratings and spectral effects of photovoltaic modules. In Photovoltaic Specialists Conference (PVSC), 2010 35th IEEE, pages 002827–002832. IEEE, 2010.

[16] D Jordan and S Kurtz. Photovoltaic degradation risk. In World Renewable Energy Forum, Colorado, 2012.

[17] A. Kimber, T. Dierauf, L. Mitchell, C. Whitaker, T. Townsend, J. NewMiller, D. King, J. Granata, K. Emery, and C. Osterwald. Improved test method to verify the power rating of a photovoltaic (PV) project. In Photovoltaic Specialists Conference (PVSC), 2009 34th IEEE, pages 000316–000321, 2009.

[18] IEC 60891 ed2.0 - photovoltaic devices - procedures for temperature and irradiance corrections to measured I-V characteristics | IEC webstore | publication abstract, preview, scope.

[19] Sanford Weisberg. Applied linear regression, volume 528. John Wiley & Sons, 2005.

[20] John Fox and Sanford Weisberg. An R companion to applied regression. Sage, 2011.

[21] D. C. Jordan and S. R. Kurtz. Data filtering impact on PV degradation rates and uncertainty (poster). In PV Module Reliability Workshop, 28 February - 2 March 2012, Golden, Colorado, 2012.

[22] Nils H Reich, Alexander Goebel, Daniela Dirnberger, and Klaus Kiefer. System performance analysis and estimation of degradation rates based on 500 years of monitoring data. In Photovoltaic Specialists Conference (PVSC), 2012 38th IEEE, pages 001551–001555. IEEE, 2012.

[23] Klaus Kiefer and Daniela Dirnberger. A degradation analysis of PV power plants. In 25th EUPVSEC, 2012, Valencia, Spain, pages 005032–005037. EUPVSEC, 2010.

[24] Matthew J. Reno and Joshua Stein. Using cloud classification to model solar variability. Technical report, Sandia National Laboratories, 2013.

[25] Zach Campeau, Charlie Hasselbrink, Mike Anderson, Zoe Defreitas, Mark Mikofski, and Yu-Chen Shen. Validation of the PVLife model using 3 million module-years of live site data. In 39th IEEE Photovoltaic Specialists Conference, 2013.

[26] J. Ye, T. Reindl, and J. Luther. Seasonal variation of PV module performance in tropical regions. In Conference Record of the IEEE Photovoltaic Specialists Conference, pages 2406–2410, 2012.

[27] F.A. Mejia and J. Kleissl. Soiling losses for solar photovoltaic systems in California. Solar Energy, 95:357–363, 2013.

[28] IEC 61724 ed1.0 - photovoltaic system performance monitoring - guidelines for measurement, data exchange and analysis | IEC webstore | publication abstract, preview, scope.

[29] Werner Horn, Silvia Miksch, Gerhilde Egghart, Christian Popow, and Franz Paky. Effective data validation of high-frequency data: time-point-, time-interval-, and trend-based methods. Computers in Biology and Medicine, 27(5):389–409, 1997.

[30] G.Q. Zhang, T. Siegler, P. Saxman, N. Sandberg, R. Mueller, N. Johnson, D. Hunscher, and S. Arabandi. Visage: A query interface for clinical research. In Proceedings of the 2010 AMIA Clinical Research Informatics Summit; San Francisco. March 12–13; 2010, pages 76–80, March 2010.

[31] German Puebla, Francisco Bueno, and Manuel Hermenegildo. A generic preprocessor for program validation and debugging. In Analysis and Visualization Tools for Constraint Programming, pages 63–107. Springer, 2000.

[32] John W Tukey. Exploratory data analysis. Reading, Ma, 231, 1977.

[33] Matthew B Miles and A Michael Huberman. Qualitative data analysis: An expanded sourcebook. Sage, 1994.

[34] Michael R Anderberg. Cluster analysis for applications. Technical report, DTIC Document, 1973.

[35] Rui Xu and Don Wunsch. Clustering, volume 10. Wiley.com, 2008.

[36] Wikipedia. Hierarchical clustering, June 2013.

[37] James MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, page 14. California, USA, 1967.

[38] G.Q. Zhang, S. Arabandi, and S. Redline. Physio-MIMI lessons learned. Technical report, National Center for Research Resources (NCRR), 2011.

[39] Physio-MIMI homepage, 2012.

[40] R. Mueller, S. Sahoo, X. Dong, S. Redline, S. Arabandi, L. Luo, and G.Q. Zhang. Mapping multi-institution data sources to domain ontology for data federation: The PhysioMIMI approach. In AMIA Clinical Research Informatics Summit, March 2011.

[41] Enphase. "Burst-mode" makes Enphase micro-inverter systems the smarter choice. Enphase Whitepaper Series, 2010.

[42] Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schrödl. Constrained k-means clustering with background knowledge. In ICML, volume 1, pages 577–584, 2001.

[43] Heinrich Häberlin. Normalized representation of energy and power of PV systems. Photovoltaics: System Design and Practice, pages 487–506.

[44] Richard Perez, Pierre Ineichen, Robert Seals, Joseph Michalsky, and Ronald Stewart. Modeling daylight availability and irradiance components from direct and global irradiance. Solar Energy, 44(5):271–289, 1990.

[45] PVEducation.org. Solar time, June 2009.

[46] PVEducation.org. Weather data, June 2013.

[47] Kevin R. Coombes. oompaBase: Class unions and matrix operations for OOMPA, 2013. R package version 3.0.1.

[48] David Briggs, Dave Williams, Preston Steele, and Tefford Reed. Bigger is better: Sizing solar modules for microinverters. Enphase Whitepaper Series, 2010.

[49] Meinard Müller. Dynamic time warping. Information Retrieval for Music and Motion, pages 69–84, 2007.

[50] Charles J Krebs et al. Ecological methodology, volume 620. Benjamin/Cummings Menlo Park, California, 1999.

[51] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352):240–242, 1895.

[52] Alex J Sutton, Keith R Abrams, David R Jones, Trevor A Sheldon, and Fujian Song. Methods for meta-analysis in medical research. J. Wiley, 2000.

[53] Yang Hu, Mohammad A Hossain, Tarun Jain, Yashwanth R Gunapati, Lauren Elkin, GQ Zhang, and Roger H French. Global SunFarm data acquisition network, Energy CRADLE, and time series analysis. In Energytech, 2013 IEEE, pages 1–5. IEEE, 2013.

[54] Yang Hu, Dave Hollingshead, Mohammad A Hossain, Mark Schuetz, and Roger French. Comparison of multi-crystalline silicon PV modules' performance under augmented solar irradiation. In MRS Proceedings, volume 1493, pages mrsf12–1493. Cambridge Univ Press, 2013.