multirotors from takeoff to real-time full identiﬁcation using … · 1 day ago · kingston...

1

Multirotors from Takeoff to Real-Time FullIdentification Using the Modified Relay Feedback

Test and Deep Neural NetworksAbdulla Ayyad , Member, IEEE, Pedro Silva , Mohamad Chehadeh , Member, IEEE, Mohamad Wahbah ,

Oussama Abdul Hay , Igor Boiko , Senior Member, IEEE, and Yahya Zweiri , Member, IEEE

Abstract—Low cost real-time identification of multirotor un-manned aerial vehicle (UAV) dynamics is an active area ofresearch supported by the surge in demand and emergingapplication domains. Such real-time identification capabilitiesshorten development time and cost, making UAVs’ technologymore accessible, and enable a variety of advanced applications.In this paper, we present a novel comprehensive approach, calledDNN-MRFT, for real-time identification and tuning of multirotorUAVs using the Modified Relay Feedback Test (MRFT) andDeep Neural Networks (DNN). The first contribution is thedevelopment of a generalized framework for the application ofDNN-MRFT to higher-order systems. The second contributionis a method for the exact estimation of identified process gainwhich mitigates the inaccuracies introduced due to the use ofthe describing function method in approximating the response ofLure’s systems. The third contribution is a generalized controllerbased on DNN-MRFT that takes-off a UAV with unknown dy-namics and identifies the inner loops dynamics in-flight. Using thedeveloped generalized framework, DNN-MRFT is sequentiallyapplied to the outer translational loops of the UAV utilizingin-flight results obtained for the inner attitude loops. DNN-MRFT takes on average 15 seconds to get the full knowledge ofmultirotor UAV dynamics and was tested on multiple designs andsizes. The identification accuracy of DNN-MRFT is demonstratedby the ability of a UAV to pass through a vertical window withoutany further tuning, calibration, or feedforward terms. Suchdemonstrated accuracy, speed, and robustness of identificationpushes the limits of state-of-the-art in real-time identification ofUAVs.

Index Terms—System Identification, Unmanned Aerial Ve-hicles, Multirotors, Learning Systems, Sliding Mode Control,Process Control.

I. INTRODUCTION

ADAPTING controller parameters online to account forunknown or changing process parameters has always

This work was supported by Khalifa University Grants RC1-2018-KUCARS and CIRA-2020-082. (Corresponding autho: Abdulla Ayyad)

A. Ayyad, P. Silva, M. Chehadeh, M.Wahbah, O. Hay, I. Boiko are withthe Center for Autonomous Robotic Systems, Khalifa University, Abu Dhabi,United Arab Emirates (email: [email protected]; [email protected];[email protected]; [email protected]; [email protected]; [email protected]).

Yahya Zweiri is with the Faculty of Science, Engineering and Computing,Kingston University London, London, SW15 3DW, UK. He is also with theCenter for Autonomous Robotic Systems, Khalifa University, Abu Dhabi,United Arab Emirates (e-mail: [email protected]).

* Abdulla Ayyad, Pedro Silva, and Mohamad Chehadeh contributed equallyto this work.

been of interest in the controls community. Maintaining per-formance and safety are some of the challenges that areoften tackled in adaptive control research. A unified definitionfor adaptive control has always been a topic of discussionin the controls community, but we found the one from [1]suitable and covers most relevant research. In [1], an adaptivecontroller is "a controller with adjustable parameters anda mechanism for adjusting the parameters". In this sense,adaptive controllers are of many different types and can extendto very complex formulations. In this paper we build on anovel technique [2] that uses deep neural network (DNN) andthe Modified Relay Feedback Test (MRFT) [3] to identifyunknown process parameters. Specifically, this paper extendsthe approach suggested in [2] to identify side motion dynam-ics of a symmetric multirotor vertical take-off and landing(VTOL) unmanned aerial vehicle (UAV) (in this documentreferred to as multirotor UAVs) which is under-actuated andhas modeled process dynamics of relative degree five inaddition to time delay. We show that our two-stage adaptivescheme can identify process parameters in real-time with highaccuracy (first stage) and then suggest optimal controller gainsbased on the identified system parameters (second stage). Wedemonstrate that using our approach, a multirotor UAV cantake-off without any pre-tuned controller gains and find theoptimal controller parameters in-flight. To the best of ourknowledge, this is the first adaptive controller that is capableof performing a takeoff and reach optimal controllers withoutinitial stabilizing controller gains.

Such demonstrated capability can be a game changer in theUAV industry as it shortens development time and cost, andexpands the accessibility of UAV technology. For example, itbenefits both the hobbyists community and enterprises thatrequire custom UAV solutions by enabling safe and high-performance operation of custom built models in the shortestpossible time. Additionally, the presented take-off and self-tuning approaches can be used in more advanced applica-tions that requires real-time control gains adaptation whileguaranteeing stability limits. A video demonstration of thepresented approach applied to multiple multirotor UAVs thatshows robustness in identification phase and high performancein the control phase can be found in [4].

arX

iv:2

010.

0264

5v1

[ee

ss.S

Y]

6 O

ct 2

020

https://orcid.org/0000-0002-3006-2320

https://orcid.org/0000-0002-2598-7695

https://orcid.org/0000-0002-9430-3349

https://orcid.org/0000-0003-1647-8546

https://orcid.org/0000-0001-8299-6021

https://orcid.org/0000-0003-4978-614X

https://orcid.org/0000-0003-4331-7254

2

A. Relation to Existing Adaptive Control Approaches

Adaptive control approaches are broad in nature and study-ing the relation of this work to all of them is not feasible.Rather, we focus our study on adaptive approaches appliedto UAVs experimentally. Yet, for completeness, we chose toconsider in our literature system identification methods thatoutputs an identified model in a form suitable for controllerdesign as long as this identification was demonstrated exper-imentally. We further limit our scope of review by excludingadaptive methods that deal with very specific cases; e.g.adapting to thrust coefficient change due to ground effect,change of lift force due to a propulsion fault, weight imbalanceacross a single axis, etc. One of the earliest approacheswhich demonstrated great success is iterative learning control(ILC) [5]. ILC tunes a feedforward law that compensates forrepeatable model uncertainties. ILC requires a high numberof experimental iterations and hence cannot adapt in real-time. Additionally, the feedforward compensation techniquemight suffer from severely degraded performance under un-seen external disturbances or changes of model parameters.The approach presented in [6] utilizes Gaussian processes withBayesian optimization to learn feedback control parametersfor the transnational control channels. This approach requiresa lot of iterations to converge and hence cannot be appliedto real-time applications. Deep model-based reinforcementlearning (RL) was used to adapt a RL based control policyfrom experimentation [7]. This method requires excessiveexperimental data, is computationally expensive, and does notprovide stability guarantees. State space sampling explorationtechniques through deep learning [8], and apprenticeship learn-ing [9] were used to fine-tune and improve the performance ofouter loop controllers. These techniques require an abundanceof experimental data and offline computation.

Other tuning approaches based on relay methods have beenapplied in practice. Recent work in [10] shows near-optimalattitude loops controller tuning based on MRFT. Though thistuning method can run in real-time, its tuning performancedegrades in the presence of biases in the system [2]. Anotherrelay based tuning method that uses relay feedback test (RFT)in a cascaded arrangement is presented in [11]. The tuningassumes first order plus time delay (FOPTD) model and wasonly performed on a testbed. Heuristic and model-free ap-proaches were widely investigated in literature [12], [13], [14],[15] but their tuning time is generally large (a few minutes atleast) and there is no guarantee of optimality of the achievedcontroller gains. Few other methods of UAV tuning are basedon experimental system identification. In [16], frequency-domain identification using an adaptive genetic algorithmwas performed on an unmanned helicopter. The identificationmethod requires a fair amount of flight data, which in turnrequires a pre-processing stage that includes human expertise.Similar drawbacks are present in the approach used by [17]where UAV models based on fuzzified eigensystem realizationalgorithm were identified.

A common limitation of all reviewed approaches is that theyrequire a stabilized system to begin with. This is usually donethrough an extensive trial and error process or initial rough tun-

ing based on pre-measured physical parameters. This leads to aprolonged development time and increased cost especially forlarger UAVs. Also, most of the presented approaches can beexclusively used either to adapt attitude and attitude rate loopsgains (inner loops), or outer loops gains. The literature lacksa unified robust approach for tuning of the inner and outerloops of multirotor UAVs. Another limitation specific to data-driven approaches like ILC, state-space sampling approaches,and other identification methods widely adopted in literature[18], [19], [20] is that tuning performance is dependent onhow data is generated. Data generation for these adaptationtechniques has its own complexities and requires an experthuman to perform.

Our proposed approach uses MRFT, which can be con-sidered as an extension of the widely used RFT, to excitea certain system response. This system response is fed to aDNN that is able to infer system parameters. Therefore, werefer to the approach presented in this paper by DNN-MRFT.DNN-MRFT is the most appealing compared to the otherrelevant adaptive approaches described in literature due toits stability guarantees, its minimal data requirements, and itscomputational efficiency which enables it application in real-time. DNN-MRFT provides additional benefit in that it resultsin accurate identification of model parameters, permitting thedesign of controllers other than PID. Thus DNN-MRFT can bealso considered as a system identification method. The DNNis only trained on simulation data which greatly simplifies theidentification algorithm design process.

B. Contributions

DNN-MRFT provides a unified approach for the identifi-cation of a linear system parameters. It was first introducedand applied to a second order with integrator plus time delay(SOIPTD) system depicting multirotor UAV attitude dynamics[2]. The first contribution of the present paper is in theextension of the DNN-MRFT approach to multirotor UAV sidemotion dynamics. These dynamics have a relative degree offive with time delay, which requires a different treatment thanSOIPTD model presented in our previous work [2]. The exten-sion of DNN-MRFT to higher order dynamics in a hierarchicalfashion is presented in details, where inner loop dynamicsare considered in the identification of outer loop dynamics.This hierarchical approach to identification through DNN-MRFT can be repeated to generalize even for higher ordersystems. The second contribution is an exact identification ofopen-loop system gain utilizing simulated knowledge of theconsidered systems. This mitigates amplitude inaccuracy dueto the Describing Function (DF) method’s low pass filteringassumption. In literature, the Locus of the Perturbed RelaySystem (LPRS) [21] was suggested as an exact descriptionof discontinuous systems. The proposed amplitude scalingtechnique in this paper is simpler, and can be directly used incontroller tuning. The third contribution in this paper is thatwe have developed an algorithm based on the DNN-MRFThierarchical identification approach that allows a multirotorUAV to takeoff without any initial controller parameters andperform identification and tuning safely for all control loops.

3

This is tested experimentally on multiple sizes and designsof multirotor UAVs. The optimality of controller tuning isdemonstrated by achieving trajectory tracking performanceon a par with the state-of-the-art. Using the DNN-MRFTidentification results, the multirotor UAV can pass through avertical window without the need of any modification to thecontroller structure or gains. To the best of our knowledge,this is the only adaptive scheme that can take-off a multirotorUAV with zero initial gains and achieve a feedback controllerthat can perform such aggressive maneuvers.

C. Paper Outline

This paper is organized as follows: aspects related to dy-namics modeling and relevant assumptions are discussed inSection II. The design of MRFT parameters through findingthe distinguishing phase for inner and outer feedback loopswith a generalized identification approach is demonstrated inSection III. The process of discretizing the model parameterspace into a finite set of representative processes is described inSection IV. The DNN model development and the generationof training data through simulation is discussed in Section V.A modified method for finding exact system gain that mitigatesthe DF approximation is presented in Section VI. The designof an empirical controller that can perform safe take-off,identification, and tuning of optimal controllers is shown inSection VII. Finally, extensive simulation and experimentalresults which demonstrates state-of-the-art performance andadaptation robustness are presented in Section VIII.

II. MODELLING OF DYNAMICS

In this work, we define the inertial frame FI to be earth-fixed right-handed reference frame with zI(+) pointing up-wards. The right-handed body reference frame FB is attachedto the multirotor UAV center of mass, with zB(+) perpen-dicular to the body upper surface, and is always aligned withits attitude and heading angles. For convenience, we defineanother body attached reference frame FH that is always yaw-aligned with FI . A rotation matrix used to transform betweenreference frames is denoted by T

SR, where T is the targetreference frame and S is the source one.

A. Modeling of Attitude and Altitude Dynamics

The approach of DNN-MRFT was previously applied to aSOIPTD system and demonstrated in accurate identificationresults [2]. Attitude and altitude share the same model struc-ture but each have different model parameters. Attitude andaltitude loops are modeled as [10]:

Ginner(s) =Keqe

−τs

s(Tprops+ 1)(T1s+ 1)(1)

A more detailed representation of these dynamics with PDfeedback control can be seen in Fig. 1. The linear dynamicsin Eq. (1) relate motor commands sent by the flight controllerto the observed roll, pitch, or altitude. Note that the time delayin the numerator represents the overall time delay in the systemwhich consists of electronic speed controller (ESC), processor,

communication and sensor delays. The nonlinearity of thesystem is mainly exhibited by the change in the value of theparameter T1 as a function of rotational velocity (for attitudedynamics), or translational velocity (for altitude dynamics)representing nonlinear drag dynamics. The assumption thatsuch a drag effect, caused by air inflow, blade flapping, andbody drag, can be considered constant works well in practiceand was analyzed in detail by [22], [23]. Propulsion systems,consisting of electronic speed controllers ESCs and motors, areassumed to provide linear response of thrust function of ESCcommand; and hence, Keq can be considered constant. Frombench propulsion system tests similar to the ones performedin [24], it can be concluded that Tprop is constant acrossthe whole operating range except when the rotational speedof the motor is very low. In practice, we avoid operatingin this non-linear range by enforcing appropriate minimummotor command. Additionally, network communication andprocessing delays are almost constant (i.e. have small variancein delay value), permitting us to consider the time delay τ asa constant. The considered attitude and altitude dynamics aresubject to measurement noise ℵ and forced bias u0 due toexternal disturbances such as gravity, sensor bias, unmatchedpropulsion thrust, or model asymmetry.

The coupled dynamics of rotational motion due to thegyroscopic effect is assumed to be negligible. This is becausein the operational limits we are interested in, the torquesgenerated due to gyroscopic effects are considerably smallerthan the torques contributed by other dynamics of the system[10]. Thus the assumption of single input single output (SISO)system dynamics for every rotational control loop is valid.Coupling of rotational dynamics can also occur due to otherreasons like sensor misplacement, asymmetric center of mass,etc. Care was taken to minimize such effects when preparingthe experimental setup.

B. Modeling of Side Motion Dynamics

Multirotor UAVs are underactuated due to the fact thatmovements in the xB(+) and yB(+) due to actuator actionare not possible. But side movements in the inertial frame FIare possible and can be approximated to be linear for smallattitude angles. The linearized side motion dynamics are givenby:

Gouter(s) =Keqe

−τs

s2(Tprops+ 1)(T1s+ 1)(T2s+ 1)(2)

Here we assume a linear drag term T2 that describes airresistance acting on the body of the multirotor UAV frame dueto translational motion. The assumption of linearity works wellin practice for small angles due to the fact that translationaldrag terms remain similar, and altitude loss due to thrustvector change is negligible (i.e. assume a nominal value ofthrust provided by the motors). Overall system dynamics withfeedback control design for small angles are shown in Fig. 2.Note that the attitude (inner loop) model parameters directlyaffect the performance of the outer loop dynamics.

In cases where aggressive maneuvers require large rotationangles, we present a different treatment for the generation of

4

Fig. 1: The generic model used for attitude and altitude dynamics under PD feedback control. T1 represents generic drag term.

Fig. 2: The generic model used for side motion dynamics (xI and yI ) under PD feedback control. T2 represents drag termdue to translational velocity component.

reference angles and thrust. This is achieved by introducingan intermediate feedback linearization step. We keep theouter-loop controller structure and gains the same except thatwe scale the summed controller output by a factor γ. Thisscaling factor represents the ratio between the amplitude h inradians of the outer-loop MRFT (presented in Eq. (8)) and thehorizontal force generated at this angle; and can be formulatedas:

γ =arccos(h)Υavg

h(3)

Where Υavg is the average thrust generated by all motors ofa UAV during MRFT. After applying the scaling factor, weget ~Fdes instead of φdes, θdes, and Uz . On the other hand, thegenerated inertial forces at current system state are given by:

~F =

FxFyFz

= HBR

00Υ

−m0

0g

(4)

where Υ is the summed common-mode generated thrustproduced by all propellers in N . Note that Υ is unobservableand so we need to estimate it. We use Uz to estimate Υby simplifying the model in this case to neglect actuator

dynamics. It follows that we need to estimate φdes, θdes, andΥdes from ~Fdes found earlier. We solve by using:

Υ = ~Fdes · zB(+) (5)

and: φerrθerr0

= E(fIRAI R

T ) (6)

where fIR is the rotation matrix corresponding to ~Fdes pointingdirection, and E is a function that gets Euler angles fromrotation matrix while handling singularities appropriately. Notethat in Eq. (6) yaw error is constrained to zero for simplicityof the solution. This feedback linearization solution is approx-imate due to actuator dynamics simplifications and the use of aconstant value for γ. We found the performance of this partialsolution satisfactory in simulation and experimentation.

C. Yaw Dynamics

Rotation around zB axis results in change of the yaw angle.Yaw has second order dynamics and is given by:

Gyaw(s) =Keqe

−τs

s(Tprops+ 1)(7)

5

Because yaw controller is easy to tune due to the small delayvalue and the presence of full state measurements, we assumethat a controller with satisfactory performance exists prior tothe flight.

D. Bounds of Considered Model Parameters

The identification method presented in this paper requiresthe considered model parameters to be bounded. This wouldlimit the amount of data and labels to be handled by theDNN classifier presented later. In this work, we considercommonly used multirotor UAV designs ranging from smallracing quadrotors to larger multirotors with take-off weight ofup to approximately 50Kgs. The selection of the parametersdomain was based both on experimental findings of previouswork in the literature [25], [10], [23], [24], in addition tomodeling equations like those discussed in [10], [23], [22],[26]. It is worth noting that the identification performance isnot sensitive to the selection of the parameters’ range, rather,the selection of parameter bounds can be safely expandedto include UAV designs beyond the specified ranges. Suchexpansion would be at the cost of increased simulation andDNN training times. The selected bounds of model parametersfor the considered control loops can be found in Table I.

III. MRFT AND IDENTIFICATION APPROACH

A. The Modified Relay Feedback Test

DNN-MRFT relies on exciting certain system responseusing MRFT as a controller. MRFT is an algorithm that canexcite self-sustained oscillations at a specific phase ψ, and isrealized by the following equation [3]:

uM (t) =h : e(t) ≥ b1 ∨ (e(t) > −b2 ∧ uM (t−) = h)

−h : e(t) ≤ −b2 ∨ (e(t) < b1 ∧ uM (t−) = −h)

(8)

where b1 = −βemin and b2 = βemax. emax > 0 and emin <0 are respectively the last maximum and minimum values ofthe error signal after crossing the zero level; and uM (t−) =limε→0+uM (t− ε) is the previous control signal. Prior to thestart of MRFT, the maximum and minimum error values areset as: emax = emin = 0. β is a constant parameter thatdictates the phase of the excited oscillations as:

ϕ = arcsin (β) (9)

Using the DF method, it could be shown that the MRFTachieves oscillations at a specified phase angle by satisfyingthe Harmonic Balance (HB) equation [27]:

Nd(a0)G(jΩ0) = −1 (10)

The DF of MRFT is presented in [3] as:

Nd(a0) =4h

πa0(√

1− β2 − jβ) (11)

The DF method provides an approximate solution that isvalid only if G(s) has sufficient low pass filtering properties.It is worth mentioning that the MRFT control signal uM (t)has a phase lead relative to the error signal e(t) in the case ofβ < 0, and lags in the case of β > 0. The MRFT DF intersectsthe Nyquist plot in the second quadrant for β < 0; while thisintersection occurs in the third quadrant when β > 0. TheRelay Feedback Test (RFT) [28] could be thought of as aspecial case of the MRFT algorithm where β = 0.

B. The Distinguishing Phase

The idea of distinguishing phase is based on the suppositionthat the optimal phase angle at which the test oscillationsare generated and which is obtained through the design ofoptimal tuning rules [29], [3], [10] would reveal the mostdistinguishing characteristics of the considered processes do-main. In a previous work [2], we showed that for an LTIsystem G(s) with known model structure and unknown setof bounded model parameters D there exists a distinguishingphase ϕd at which the characteristics of the self-excitedoscillations induced by the MRFT can be used to identifythe corresponding processes in D. The distinguishing phaseϕd can be determined by the process of designing optimalnon-parametric tuning rules as outlined in [3], [29]. Note thatMRFT parameter β is related to the distinguishing phase byEq. (9). Algorithm 1 summarizes the steps taken to find thevalue of ϕd.

Algorithm 1: Finding distinguishing phase throughoptimal non-parametric tuning rules design

INPUT: (G(s),D) - Model Structure, ParametersDomainOUTPUT: ϕd - Distinguishing Phase

1: Discretize the desired parameters subspace D to obtainD;

2: Select phase margin or gain margin tuning specifications;3: Find the set of locally optimal tuning rules ∆ for every

process in D;4: Apply every optimal tuning rule in ∆ to all other

processes in D and get the set Σ corresponding to thevalue of the worst performance deterioration of everyprocess in D due to the application of the non-optimaltuning rule ;

5: Select the tuning rule from ∆ that corresponds to theleast worst deterioration value from Σ as the globallyoptimum tuning rule ∆∗;

6: Compute ϕd from β ∈ ∆∗;

C. Identification Approach

The MRFT can excite test oscillations only in the rangeof [-270 deg, -90 deg] of the plant phase response. This isdue to the fact that MRFT has a DF for which the negativereciprocal exists only in the second and the third quadrants ofthe complex plane. As a result, for high relative degree systems

6

TABLE I: The model parameters’ ranges for all the feedback loops considered in this paper

Feedback Loop Parameters Domain

Attitude Datt := (Tattprop, T

att1 , τatt) : 0.015 ≤ Tatt

prop ≤ 0.3, 0.2 ≤ Tatt1 ≤ 2, 0.0005 ≤ τatt ≤ 0.1

Altitude Dalt := (Taltprop, T

alt1 , τalt) : 0.015 ≤ Talt

prop ≤ 0.3, 0.2 ≤ Talt1 ≤ 2, 0.0005 ≤ τalt ≤ 0.1

Side Dside := (T2, τside) : 0.015 ≤ T2 ≤ 0.3, 0.0005 ≤ τside ≤ 0.1

the generated test oscillations have values of a0 and Ω0 fromEq. (10) that may not be practically useful. This is a reflectionof the expedience of using a cascade controller arrangement,which would in turn require to organize the MRFT testsin each loop separately to tune each controller. It wouldalso eliminate the indicated problem of the test oscillationspossibly being of low frequency and high amplitude. For thisreason, the considered high order LTI system must be splitinto a composition set GHO := G1, G2, ..., GM where G1

represents the sub-system with smallest relative degree withrespect to the control command. The iterative design methodrequired to generate the feedback structure and hence requiredset of distinguishing phases for higher order LTI processes canbe found in Algorithm 2. Note that Algorithm 2 assumes thatinner feedback loops are observable and controllable.

Algorithm 2: Generating cascaded feedback structureINPUT: (GHO(s), D) - Model Structure, ParametersDomainOUTPUT: GFB - Resulting Feedback Structure

1: Split GHO to G1, G2, ..., GM based on M observableoutputs with unity open loop gain

2: Gres ← 13: for i=1,...,M-1 do4: Gres ← GresGi5: Vi ← All processes at the vertices of D parameters in

Gres6: (a0,Ω0)←MRFT (Gi+1Vi, ϕd) from Eq. (8)7: if ReNd(a0)Gi+1(jΩ0)Vi(jΩ0) ≥ 0 or Impractical

a0 or Ω0 values then8: C∗ ← Find optimal controller of Gres9: Gres ← Feedback(C∗,Gres)

10: end if11: end for12: Gres ← GresGM13: C∗ ← Find optimal controller of Gres14: GFB ← Feedback(C∗,Gres)

For the particular case presented in this paper, we obtaintwo cascaded feedback loops by applying Algorithm 2 to theside motion model in Eq. (2). From Algorithm 2 line 1 weget:

GHO(s)→ G1, G2, G3, G4 =

e−τimus

(Tbodys+ 1)(Tprops+ 1),

1

s,

e−τposs

(Tsides+ 1),

1

s

(12)

Where G1 represents attitude rate dynamics, G1G2 represents

attitude dynamics given in Eq. (1), G1G2G3 represents sidemotion velocity dynamics, and Gtot = G1G2G3G4 representsside motion dynamics given in Eq. (2). The condition in Al-gorithm 2 at line 7 is met only when i = 2 for multirotor UAVside motion case which will result in two cascaded feedbackloops shown in Fig. 2. Also, this means that we will end upwith a set of distinguishing phases; one distinguishing phaseto reveal the inner loop attitude dynamics and a distinguishingphase for every process in Datt to reveal outer loop positiondynamics. Note that the distinguishing phase of the particularouter loop system depends on the inner closed-loop dynamics.Thus prior to outer loop identification, the parameters ofthe inner loop dynamics, and the optimal controller for theinner loop have to be identified first. The overall identificationscheme used in this paper is shown in Fig. 3. It is importantto note that this approach is generic and can be applied tohigher order LTI systems as long as the distinguishing phasecorresponds to second or third quadrants in the complex plane.

IV. GENERATING REPRESENTATIVE PROCESSES

This section describes the steps undertaken to discretizethe model parameter subspace D into a discretized set ofrepresentative processes D := G1, G2, ..., GN that capturethe main dynamics of the full range of parameters shown inTable I. The discretization of D enables tackling parametricidentification as a classification problem, where a classifiermaps a process under test to the most appropriate process inD. The recognition of representative processes serves severalobjectives in the DNN-MRFT approach. First, it alleviates theneed of an online real-time controller optimizer as optimalcontrollers are designed offline for all processes in D. Second,knowledge of the dynamics of D is exploited to identify exactprocess gains as later explained in section VI. The third objec-tive of discretizing D is providing a measure of discrepancybetween all representative processes that correspond to thecontroller auto-tuning objective. This measure of discrepancyis utilized for the training of DNN-classifiers as discussed indetail in section V. These advantages outweigh the marginalloss of accuracy resulting from the discretization process,which is shown to be negligible by the results obtained inTable V.

A proper criteria must be defined for the discretization ofD that establishes sufficient guarantees on the performance ofsystem identification and controller auto-tuning without sacri-ficing the distinguishability of the discretized processes. Forinstance, an equispaced discretization with a small partitioningdistance would generate an overly-discretized D with an im-balanced representation of the frequency response characteris-

7

Fig. 3: The identification scheme used for multirotor UAV side motion dynamics. Only inner part of the identification isapplicable to altitude dynamics identification. Note that there is only one switch S in this identification scheme. The identificationstarts with S at 1 (and at 1∗ for altitude). Once enough data are pre-processed from steady-state MRFT response of innerloops, system identification is performed by the inner loops DNN. Once the inner loop systems are identified, an appropriatecontroller for each control loop is designed and S switches to 2. Note that the outer DNN structure and weights, and theMRFT β parameter are all selected based on the identified inner loop model parameters and designed controller. Once enoughdata are pre-processed from steady-state MRFT response of outer loops, system identification is performed by the outer loopsDNN which is immediately followed by controller tuning. S switches to 3 and the system is controlled optimally.

tics of D; this in turn introduces undesirable biases in trainingthe DNN classifiers. Alternatively, a very large partitioningdistance generates substantial discretization errors and doesnot guarantee proper performance margins for the optimalcontrollers designed offline for D. As our objective is auto-tuning controller parameters, for the criterion of discretization,we adopt the concept of controller performance deteriorationused for the system identification approach presented in [2].Given a performance index Q that quantifies errors resultingfrom a closed loop application of controller C to a processG, the controller performance deterioration Jij between twodynamic processes Gi, Gj is defined as:

Jij =Q(C∗i , Gj)−Q(C∗j , Gj)

Q(C∗j , Gj)× 100% (13)

where Jij represents the relative degradation in performancein terms of Q(C∗j , Gj) when the optimal controller of Gjis replaced by that of Gi. It must be noted that the aboveformulation of the controller deterioration is non-commutative,that is Jij 6= Jji. Therefore, the joint cost function Jmax(ij) =maxJij , Jji is used as the discretization criteria in theremainder of this paper. Additionally, the design of optimalcontrollers is limited to a PD structure with a minimum phase

margin constraint imposed to the controller optimization prob-lem. The performance index Q used for controller synthesis isthe conventional ISE criterion applied to a unit step response,and is given by:

QISE(C,G) =1

Ts

∫ Ts

0

e(t)2dt (14)

Following the criterion in Eq. (13), discretization is per-formed such that adjacent processes in D achieve a targetjoint cost J∗ within an admissible tolerance value. We firstdiscretize the three-dimensional altitude and attitude parameterspaces. For computational efficiency, we follow the discretiza-tion procedure explained in [2]; where discretization is firstperformed on a hemispherical hyper-surface S of the modelparameter subspace, then followed by subsequent scalingthe discretized S. Based on the bounds in Table I and thespecifications in Table II, discretization of Dalt and Datt yielda total number of Nalt = 208 and Natt = 48 representativeprocesses respectively.

The discretization of Dside presents additional complexitiesas the outer-loop response does not only depend on the modelparameters in Dside, but also on the inner loop process andcontroller as shown in Fig. 2. To address this complexity,a recursive approach is implemented where a different set

8

TABLE II: Specifications for the process of discretizing theparameter space D

Target joint cost J∗ 10%

Admissible tolerance 3%

Minimum phase margin φmconstraint

20

Optimization algorithm forcontroller design

Nelder-Mean simplex algorithm

of discretized outer loop model parameters Dside is definedfor each inner loop representative process Gatt ∈ Datt. Thisrecursive discretization procedure is summarized in Algorithm3 and a visual illustration of this process is presented in Fig.4. The discretization of Dside is performed independently forT2 and τside; it must be noted however that the sensitivity ofJij with respect to one parameter depends on the value of theother. We found that changes in T2 caused larger changes inJij when τside is largest. Conversely, changes to τside causedlarger changes in Jij when T2 is smallest. Spacing of thediscretization was based on the most sensitive Jij to changesin the parameters space which resulted in a slightly over-discretized D but guarantees that the cost between adjacentprocesses does not exceed J∗.

Algorithm 3: Identifying key processes for side motiondynamics

INPUT: (Datt, Dside, J∗) - Key processes for inner

loop dynamics, parameters domain of outer loopdynamics, target joint costOUTPUT: (Dside, C

∗side) - Set of key processes for

outer loop dynamics, Lookup table of outer loopoptimal controller parameters

1: for all Gatt,i ∈ Datt do2: Identify inner loop optimal controller C∗att,i;3: Utilizing C∗att,i, Gatt,i, discretize Dside into Dside,i

based on J∗;4: ϕd,i ←

GetDistinguishingPhase(C∗att,i, Gatt,i, Dside) fromAlgorithm 1

5: for all Gside,ij ∈ Dside,i do6: Identify outer loop optimal controller C∗side,ij ;7: C∗side,i ← C∗side,i ∪ C∗side,ij ;8: end for9: Dside ← Dside ∪ Dside,i;

10: C∗side ← C∗side ∪ C∗side,i;11: end for

V. DATA GENERATION AND DEEP NEURAL NETWORKMODEL TRAINING

The deep neural network component of the DNN-MRFTapproach provides a mapping from the MRFT response of theunknown process to the best representative process in D. Thismapping is denoted by Γ : X → D; where X ∈ R2×ns is avector concatenating ns samples of the controller output and

process variable of the MRFT response. We have previouslydemonstrated the appropriateness of DNN for the systemidentification task in [2], where a single network was utilizedfor the identification of attitude and altitude model parameters.In this section, we build upon our previous results and presenta multi-network solution for the full identification of UAVdynamics.

The classification outputs generated in Section IV fall intothree sets of model parameters, which requires three differentmappings to be solved. We train a unique DNN classifierfor each of these mappings. One challenge however is thedependency of the outer loop system response on the innerloop dynamics, which results in multiple variations of Dside

as demonstrated in Section IV. Similarly, different inner-loopprocesses would result in a different distinguishing phase forthe outer-loop model parameters, which in turn alters thecriteria for generating the DNN input vector X . Changes inthe classifier’s input and output layer due to the inter-loopdependencies make the utilization of a single DNN network forthe identification of side-motion model parameters impractical.Rather, we employ Natt = 48 DNN networks for the outer-loop identification problem, each assuming a specific inner-loop process Gatt ∈ Datt. In total, 50 DNNs are trained: onefor altitude dynamics, one for attitude dynamics, and 48 forside-motion dynamics.

Training data for the classification problem was generated insimulation for all member processes in D. For each Gi ∈ D,the MRFT response with parameter β set to the correspondingdistinguishing phase was simulated 30 times with randomlyvaried measurement noise ℵ and input biases u0 to generatethe DNN training set. The incorporation of imperfectionslike u0 and ℵ prompts regularization and generalization tovaried experimental conditions during the training process[30]. The maximum value of u0 was constrained to half therelay amplitude h of the MRFT controller as a reasonablebias magnitude in practical settings. A validation set wasalso generated in a similar manner for hyper-parameter tuningand evaluation purposes. The validation set consist of 15simulations per candidate process. The DNN input vector Xis obtained by processing the MRFT response according to thefollowing steps: sampling adjustment, cropping, zero-padding,amplitude normalization, and concatenation. The size of theinput vector X is determined by the slowest MRFT responsewithin the corresponding parameter set D. Fig. 5 illustratesour full pipeline of UAV system identification and controllertuning using deep neural network.

All the developed DNN models follow the same architec-ture shown in Fig. 5, where sequences of fully-connectedlayers and activation functions are concatenated. Dropout andbatch normalization are applied to the outputs of each fully-connected layer to avoid over-fitting and accelerate the trainingprocess [31], [32]. After the final fully-connected layer, weutilize the cost-augmented soft-max formulation introducedin [2], which exhibited performance improvements over theconventional soft-max formulation for system identificationtasks due to introducing meaningful discrepancies to the cost

9

Fig. 4: The full discritization scheme for identifying key processes in the parameter space. (a) Inner loop dynamics are firstdiscretized into Datt according to the principle of controller performance deterioration. (b) For each member process of Datt,a different set of discretized outer loop model parameters is identified. The output of the process would be Natt sets of outerloop discrete processes.

of miss-classification. The augmented formulation is given by:

pi =e(1+JiT )·ai∑Nj=1 e

(1+JiT )·aj(15)

where the controller deterioration joint cost JiT is utilized asthe measure of discrepancy between the DNN prediction andthe ground truth model parameters GT . Cross-entropy is thenutilized as the loss function for training the DNN models.

We utilized the ADAM optimization algorithm for trainingas it is a well-established algorithm with proven advantages interms of convergence speeds and robustness to noisy gradients[33], [34]. We implemented an automated search approach todetermine the best network size and set of hyper-parametersfor each of the developed 50 DNN models. The variablesincluded in the search process along with their correspondingsearch space are shown in Table III. For each classificationtask, the network that performed best on the validation setwas selected as the preeminent DNN model.

TABLE III: Search space of DNN structure and hyper-parameter optimization process

Parameter Search Space

Number of layers 1, 2, 3

Neurons per layer 50, 100, 1000, 3000

Activation function ReLU, tanh

Base learning rate 0.005

Gradient decay factor 0.9

Gradient decay factor 0.999

VI. IDENTIFICATION OF EXACT PROCESS GAIN

The information contained in the MRFT response of thesystem are embedded within the generated oscillations in threeforms: frequency, amplitude, and shape. The DNN classifieronly utilizes the frequency and shape of the oscillation toidentify model parameters as the amplitude is normalized toone during pre-processing. The importance of the shape of the

oscillation emphasizes the fact that the MRFT excites multiplefrequencies of the linear system. The periodic components ofthe self-excited oscillation can be given by the Fourier series[27]:

y(f) =

∞∑n=0

ancos(nf) + bnsin(nf) (16)

where an and bn are Fourier series coefficients. For an oddsymmetric nonlinearity (note that MRFT switching at steadystate resembles a hysteresis relay), coefficients for even valuesof n and other odd harmonics exist. The DF solution, presentedin Eq. (11) for the MRFT, accounts for the first order harmoniconly. The amplitude of the harmonics depend on the lowpass filtering properties for every process in D. Therefore,if we have identified process parameters experimentally, wecan use such knowledge to predict exact system amplituderesponse. Exact analytical solution of Lure systems can beprovided by the LPRS method [21], [35], or Tsypkin’s method[36]. To achieve exactness and real-time capability, we simplyintroduce a scaling coefficient ζ that provides exact systemgain for every system in D and make these values availablein a look-up table. The values of ζ are found by simulatingMRFT control with each system in D. In simulation, weuse the same MRFT implementation used experimentally andmeasure the system steady-state response amplitude to findζ. During the DNN-MRFT identification phase, the proper ζvalue is selected from the look-up table and is used to scale theidentified controller parameters as shown in Fig. 5. The resultsin Table V show the improvement in controller performanceon a simulated test set due to the use of identified exact gainscale compared to amplitude reported by the DF method. Notethat the DF method uses ζ = 4

π for all processes.

VII. DESIGN OF TAKE-OFF CONTROLLER

Using the trained DNN and MRFT, the UAV can takeoffand immediately identify UAV dynamic parameters. MRFTparameter β corresponds to the distinguishing phase and wasfound in Section III-B. MRFT parameter h needs to bedesigned such that it provides adequate amplitude response

10

Fig. 5: The full DNN-MRFT pipeline showing the steps of obtaining and pre-processing the MRFT respponse followed bysystem identification and controller synthesis. (a) The process’s MRFT response is obtained at the distinguishing phase andthe sampling time is adjusted to be 1ms. (b) One cycle of the steady-state oscillation is selected, zero-padding is appliedelsewhere. (c) The response is zero-centered and scaled to a unity amplitude. (d) PV and u are concatenated to form the DNNinput vector X . (e) The DNN network corresponding the the proper control channel is selected and used to predict the modelParameters G(s). The DNN structure consists of a sequence of fully-connected layers and activation functions. (f) From alookup table, the gain-normalized optimal controller parameters and the exact gain scaling coefficient ζ are found. ζ, h and a0are then used to properly scale the controller parameters.

and robustness against sensor noise and model biases. Biascaused by the gravity makes identification of altitude dynamicsparticularly challenging. The elimination of the gravity biaswithout prior knowledge of a UAV’s total generated thrust andmass requires a generalized controller that can handle the take-off state. For that we use a cascaded switched PID controller asshown in Fig. 3 by adding switching position 1∗ to altitude. Inthe first stage (S is at 1∗), a PI controller is used for take-off:

uz(t) =

Kpez(t) +∫ T0Kiez(t)dt : zI ≤ g + δ

uM (ez(t), h) + uz0 : C(17)

where δ > 0 is a bias factor to compensate for increasedefficiency in take-off due to ground effect, uM is Laplacerepresentation of the MRFT algorithm presented in Eq. (8),uz0 is the last output of the PI controller, and C is a conditionthat is set permanently to true once the condition in the firstline is violated. The first line of Eq. (17) corresponds to 1∗

position of S in Fig. 3, while the second line correspondsto switch position 1. The switching condition aims at mini-mizing the value of the bias present in the MRFT switching,which is perfectly achieved when uz0 produces a thrust thatcauses the UAV to hover. It is not always possible to havea clean measurement of acceleration which was the case inour experimental setup, and therefore we were using position

measurement. Though position measurement is lagged by aphase of π, take-off can be slowed down and a value ofuz0 close to hover thrust can still be achieved. For positionmeasurement case, the condition in the first line of Eq. (17)would be zI ≤ δp instead of zI ≤ g + δ.

Because the presented controllers in Eq. (17) will be appliedfor all multirotor UAVs with the full model parameter’s rangein Table I, suitable values of the take-off controller parametersneed to be designed. The optimization decision variables arethe parameters Kp, Ki, δp, and h (MRFT amplitude) presentin Eq. (17). A cost function has been designed to address theoptimization of these values:

Jbias =

√( thth+tl

− tb0)2

t2b0

Jtime =

0 : tr < tr0√

(tr−tr0)2t2r0

: tr ≥ tr0

Jamp =

0 : ar < ar0√

(ar−ar0)2a2r0

: ar ≥ ar0Jtot =Jbias + Jtime + Jamp

(18)

where Jbias, Jtime, Jamp are the costs associated with biasin relay, system rise-time, and excited process amplituderespectively. th is the duration MRFT switches high, tl is the

11

duration MRFT switches low, and tb0 = 0.5 is a constant thatcorresponds to the case when MRFT switching is symmetric,i.e. th = tl → tb0 = th

th+tl. The value of tr is the time it

takes to reach 90% of desired altitude from take-off (take-offis defined as passing 2cm altitude), and tr0 corresponds to thedesired maximum rise time which we chose to be 5s. The valueof ar corresponds to the steady-state amplitude of the self-excited oscillation due to MRFT. The value ar0 correspondsto the desired maximum MRFT amplitude response and waschosen to be 0.3m. The selection of cost function inputparameters reflects essential practical requirements of the auto-tuner. We found that a severely biased relay might force motorsto function near their operational extremes. A long take-offtime is not desired and can be dangerous due to the fact thatat take-off, MRFT is also running on roll and pitch whererotor tips might hit the floor. The last risk accounted for isassociated with excessively large amplitudes of the response,which might lead to crashes or undesired aggressiveness. Thecollective responses of systems at the vertices of D (actuallyD resembles a cuboid in the system parameters space) wasused to find Jtot. We found that this optimization problem isnon-convex so that multiple initial points were tested. Nelder-Mead simplex algorithm realized by "fminsearch" functionin MATLAB® has been used. The resulted optimal decisionvariables are:

h = 0.10746, Kp = 9.4969× 10−2,

Ki = 9.8754× 10−3, δp = 0.11984 (19)

The responses of systems at the vertices of D to the take-off algorithm with optimal take-off parameters can be seenin Fig. ?? in the appendix, where it can be clearly seen thatall UAV variants are stable and operating within the physicallimits. Note that some systems take very long to start taking-off compared to tr0. This is due to the use of the positionmeasurement for δp instead of acceleration measurement.Though the values presented in Eq. (19) guarantee stability,tuning from take-off can be made faster and smoother withsmaller amplitude of the excited oscillations and faster take-off time. We find this possible with prior knowledge of thepeak-thrust to weight ratio CTW of the UAV (the consideredrange of CTW based on Table I is 1.5 to 5). The value ofCTW is easy to find (i.e. motor datasheet and a weighingscale) which makes the suggested auto-tuner still suitable fornon-experts. The optimization of the take-off parameters canbe run again while using systems in D which satisfies selectedCTW value.

Due to inherent system biases (e.g. weight imbalance, sensormiscalibration, etc.) and due to the fact that the take-offalgorithm can get affected by disturbances external to thesystem (e.g. ground effect, wind, etc.), the amplitude and biasof the system response might become excessive according tothe criteria presented in Eq. (18). Because this might affectidentification accuracy, we designed a simple algorithm thatsucceeds the take-off algorithm and adjusts MRFT amplitudeand bias based on previously excited stable oscillations.

Fig. 6: DNN-MRFT auto-tuning experiment for the inner andouter control loops. (a) MRFT is performed on the inner-loopuntil steady-state oscillations are acquired. The last cycle ispassed to a DNN that predicts model and controller parame-ters. (b) After tuning the inner loop, DNN-MRFT is repeatedfor the identification and tuning of side motion dynamics. (c)The DNN-MRFT identification phase is complete and the UAVis driven back to origin by the auto-tuned controllers.

VIII. SIMULATION AND EXPERIMENTAL RESULTS

This section presents the simulation and experimental eval-uation of the DNN-MRFT approach. Both evaluation methodsfollow the protocol demonstrated in Fig. 3. DNN-MRFT isfirst used to identify altitude and attitude model parame-ters Galt, Gatt and their corresponding optimal controllersC∗alt, C∗att. Then depending on the estimated inner-loopdynamics, outer-loop distinguishing phase and DNN classifiersare selected and applied immediately to side-motion dynamicsto identify Galt parameters and optimal controller C∗side gains.The analysis presented in this section assesses the DNN-MRFT approach for: accuracy and persistence of the paramet-ric identification, the capability of the take-off controller tosuccessfully lift and stabilize a UAV with no prior knowledgeof system dynamics, and the performance of the auto-tunedcontrollers in aggressive trajectory following maneuvers.

Experimental tests were conducted on a variety of UAVdesigns as shown in Table. IV. For clarity, we only presentresults obtained on the QDrone in this section while theremaining experimental results can be found in the appendixand are also presented in the companion video [4]. Optitrack’smotion capture system was used for UAVs localization [37].

A. Persistence and Accuracy of Identification

12

TABLE IV: Specifications of the UAV designs used in exper-imental analysis

QDrone [38] DJI F550 DJI F550 withextended arms

Dimensions(cm)

40× 40× 15 79× 72× 27 111×100×27

Mass (kg) 1.0 2.09 3.38

Moments ofInertia (kgm2)Jxx, Jyy , Jzz

0.010,0.008, 0.015

0.031,0.030, 0.052

0.093,0.089, 0.156

Number of pro-pellers

4 6 6

Processor Intel AeroComputeBoard

Raspberry Pi 3B+

Raspberry Pi 3B+

1) Simulation Results: The objective of simulation analysisis to evaluate the optimality of the DNN-MRFT auto-tunedcontrollers for the full parameters’ range presented in Table Iwith exact knowledge of the ground truth model. Five hundreddifferent model parameter sets were randomly sampled fromD to form a testing set Dtest. For each GT ∈ Dtest, theDNN-MRFT approach predicts a process Gp and a controllerC∗p for both the inner and outer control loops under randomlyvaried conditions of noise ℵ and bias u0. We utilize thecontroller deterioration criterion JpT from Eq. (13) to quantifythe accuracy of Gp estimation. Additionally, the phase marginφm of C∗p (s)GT (s) is presented to assess the robustness ofthe synthesized controller. Average and worst-case results onthe entire testing set are reported in Table V. The worst-casephase margin of the side-motion control loop is reported asthe average of the worst-case prediction of each of the 48outer-loop DNNs. Results are reported with two different gainscaling methods: the first method uses the DF approximationwith ζ = 4

pi to approximate the gain of the unknown system,and the second one uses the exact scaling method described inSection VI. In Table V, the case when the gain is normalized(does not include errors introduced by gain scaling) is alsopresented for comparison.

The results in Table V demonstrate the near-optimal perfor-mance of the DNN-MRFT approach for the full range of D.Average controller deterioration cost for both inner-loop andouter-loop dynamics are near zero, with the worst-case dete-rioration being 15.68% for the side-motion auto-tuning case.Furthermore, our proposed DF gain scaling approach resultsin a generally lower deterioration than the DF approximation,especially when considering worst-case results. These resultsshow an inherent feature of DNN-MRFT: it employs MRFTand DNN to obtain a linear description of the underlyingdynamics, which preserves the useful properties of linearsystems such as the measurable robustness and performancemargins.

2) Experimental Results: This section assess the DNN-MRFT’s experimental performance and persistence for syn-thesizing controller parameters for attitude and side-motiondynamics. Starting from a hovering state, Fig. 6 shows the twostages of the DNN-MRFT auto-tuning procedure. The identi-

fied model parameters were Gatt =: Tprop = 0.0150, T1 =0.2005, τ = 0.0250 and Gside =: T2 = 0.3812, τ = 0.1;with the corresponding ISE optimal PD controllers C∗att =:Kp = 1.72,Kd = 0.15 and C∗side =: Kp = 1.89,Kd =0.56. The online synthesized controllers stabilize the UAVand smoothly drive the UAV to origin point.

To evaluate the persistence of system identification, thesame experiment in Fig. 6 was repeated five times. In allexperiments, the DNN-MRFT approach identifies identicalmodel parameters, which transcribes into a Jcross = 0%controller deterioration joint cost across the identified modelparameters from all experiments. The auto-tuning experimentwas also conducted with an artificial delay of 0.025 secondsadded to the side-motion control loop; and the identifiedoptimal controller parameters being C∗side−delay =: Kp =1.58,Kd = 0.56. The identification results across five rep-etitions of the experiment with added calibrated delay werepersistent with Jcross = 0%. The controller performancedeterioration cost across the two sets of experiments (with andwithout the added delay) was also persistent at Jcross = 3%.

B. Takeoff and Full channel Identification

Following the formulation of the take-off controller inSection VII, we evaluate the capability of DNN-MRFT forfull channel identification and auto-tuning of UAV’s startingfrom a landing state with no prior knowledge of systemdynamics. The take-off and auto-tuning procedure is carriedout in two subsequent stages: inner-loops and outer-loopsidentification stages. In the first stage, DNN-MRFT with thetake-off controller designed in Section VII is applied to thealtitude channel, corresponds to S at position 1∗ in Fig. 3,while placing S at position 1 for all other channels. Once thecondition C in Eq. (17) is met switch S switches to position1 for altitude case as well. Once steady-state oscillationsare detected on each inner-loop, a DNN identifies optimalcontroller parameters that are instantaneously applied to thecorresponding control loop. Once inner-loop identification iscomplete, the switch S switches to position 2 such that DNN-MRFT is carried out in the same manner for the identificationof side-motion channels. Similarly, once steady-state oscil-lations are detected on outer-loops, the corresponding DNNnetwork identifies the dynamics and applies optimal controllerby switching S to position 3.

To avoid excessive sideways drifting during the inner-loopauto-tuning stage, a bounding controller is implemented for theside-motion control loop. This controller alters the referenceattitude as the UAV’s horizontal position exceeds predefinedthresholds ε1, ε2 as formulated in Eq. (20). The boundingbox controller is the summation of a relay with hysteresis(switches at ε1 with amplitude ho1) and a relay with dead-band nonlinearity (switches at ε2 with amplitude ho2). Toguarantee that inner-loop MRFT oscillations reach steady-state, the threshold on position must be large enough suchthat the switching frequency of the bounding controller isconsiderably lower than that of the inner-loop oscillations.

13

TABLE V: Simulation results of the DNN-MRFT approach on 500 randomly selected processes in D

Gain scaling

Criterion AverageJatt

MaximumJatt

Averageφm,att

Minimumφm,att

AverageJside

MaximumJside

Averageφm,side

AverageMinimumφm,side

Normalized gain 0.45% 5.10% 19.65 14.21 -0.19% 4.91% 19.70 17.18

DF gain approximation 1.45% 9.68% 20.04 14.20 2.04% 57.44% 19.46 16.11

Exact gain scaling -1.77% 7.14% 18.41 11.24 -1.73% 15.68% 18.41 14.23

TABLE VI: Full channel controller tuning results on theQDrone.

ControlChannel

Full ChannelIdentification

Single ChannelIdentification

kp kd kp kd

Roll 1.63 0.14 1.72 0.15

Pitch 1.16 0.13 1.36 0.13

Altitude 62.92 9.63 65.05 10.49

x 1.69 0.50 1.85 0.55

y 1.79 0.53 1.89 0.56

ubb(t) =

0 : |e(t)| ≤ ε1 ∧ ubb(t−) = 0

ho1sgn(e(t)) : |e(t)| ≥ ε1 ∧ ubb(t−) = 0

ho1sgn(e(t)) : |e(t)| ≤ ε2 ∧ ubb(t−) 6= 0

(ho1 + ho2)sgn(e(t)) : |e(t)| ≥ ε2(20)

Fig. 9-(a, b) shows the profile of all control channels duringthe full-channel auto-tuning experiment. The take-off con-troller successfully lifts the UAV while the MRFT controllerstabilizes all control-loops. Once the identification phase iscomplete, the synthesized controllers smoothly drive the UAVback to the origin point and hold it at hover. The resultingcontroller parameters for the full channel auto-tuning experi-ment are presented in Table VI along with those obtained fromsingle channel experiments. The similarity between the twosets of controller parameters further indicates the persistenceof DNN-MRFT and validates its capability for full-channelauto-tuning without prior knowledge of system dynamics.

C. Trajectory Tracking Performance

1) Figure Eight Trajectory: We assess the performanceof the auto-tuned controller parameters shown in Table VIon a figure-eight trajectory as a widely used benchmark inliterature. For this purpose, we use the concept of minimum-snap trajectory optimization [39], [40] to design a figure-eight trajectory with a period of 5.5 seconds. To provide areference for evaluation, we also simulate the UAV’s response

Tim

ePr

ofile

0 1 2 3 4 5 6

Time (s)

-1

-0.5

0

0.5

1

x (

m)

X

0 1 2 3 4 5 6

Time (s)

-1

-0.5

0

0.5

1

y (

m)

Y

reference

experiment

simulation

Top

Vie

w

-1 -0.5 0 0.5 1

x (m)

-1.5

-1

-0.5

0

0.5

1

1.5

y (

m)

Reference

experiment

simulation

Fig. 7: Simulation and Experimental results of the DNN-MRFT auto-tuned controller on a figure-eight trajectory.

utilizing the model and controller parameters identified byDNN-MRFT. The full simulation and experimental profile ofthe trajectory following maneuver can be observed in Fig.7 and a quantification of the resultant errors is providedin Table VII. The similarity between the simulation andexperimental profile indicates accuracy in terms of identifiedmodel parameters, which in turn implies the optimality of the

14

TABLE VII: Evaluation of errors in figure-eight trajectoryfollowing experiment.

‖e(t)‖ ‖e(t)‖larm

‖ec(t)‖ ‖ec(t)‖larm

DNN-MRFT (larm =0.28m)

0.36 1.20 0.08 0.28

DNN-MRFT(simulation)

0.31 - 0.12 -

Sim-to-Real (larm =0.09m) [41]

0.47 5.22 - -

auto-tuned controllers.We benchmark DNN-MRFT’s performance on figure-eight

trajectory tracking against the Sim-to-Real reinforcementlearning (S2R) approach [41]. We report S2R results on anidentical path as presented in [41]. Table. VII shows the aver-age euclidean position error ‖e(t)‖ and the average contouring(lateral) errors ‖ec(t)‖ during trajectory following. Due todifferences in the UAV dimensions and its subsequent effectson maneuverability, we also report these errors normalized bythe UAVs’ arm length. The DNN-MRFT approach achievesa lower overall position error of ‖e(t)‖ = 0.36m, andsubstantially reduces the normalized errors when comparedagainst the S2R approach. By comparing ‖e(t)‖ and ‖ec(t)‖,it is apparent that longitudinal errors constitute the maincomponents of the overall position error. Longitudinal errorsmight arise from delays in the system response or limitationof the UAV dynamics, especially since UAV dynamics werenot considered in the trajectory generation process.

Fig. 8: The vertical window passage maneuver: the UAVpasses through the window with a near 90 deg roll angle.

2) Vertical Window Passage: The performance of the DNN-MRFT synthesized controllers have been evaluated for an ag-gressive vertical-window passage maneuver as the one shown

in Fig. 8. We designed a minimum snap trajectory to passthrough a vertically aligned window with a roll angle of90°. The attitude constraint was implemented by enforcinga specific relative value between the x, y, and z accelerationcomponents of the designed trajectory. The target window isof size 0.3 × 0.6m, leaving only 0.1 m of vertical clearanceand 0.075m of horizontal clearance for the QDrone. Given thisclearance and the geometry of QDrone, the minimum possiblewindow passage velocity is 1.45 m/s. This constraint arisesfrom the under-actuated nature of multi-rotor UAVs; whichlimits the attainable vertical thrust to near-zero at high pitchingor rolling angles leaving the UAV at a state of free-falling. Wedesigned two different trajectories to pass through the windowat different speeds of 2.75 m/s and 1.75 m/s, with the secondnearing the geometrical limit of feasible trajectories.

Fig. 9 shows the full state profile during both verticalwindow passage trajectories proceeded by the DNN-MRFTtakeoff and auto-tuning phase. The auto-tuned controllerssuccessfully maneuver the UAV and achieve an almost 90 degrolling angle at the window location. The capability of theauto-tuned controllers to execute this maneuver despite therigid clearance constraints and the low speeds indicates thatthese controllers are indeed near-optimal, and constitute state-of-the-art in terms of real-time multirotor UAV auto-tuning. Tothe best of our knowledge, DNN-MRFT is the first UAV fullauto-tuning approach to successfully perform such aggressivemaneuvers without any prior knowledge of system dynamics.A video demonstration of the DNN-MRFT auto-tuning capa-bility for the three different UAV designs listed in Table IV,and vertical narrow window passage can be found in [4].

IX. CONCLUSION

This paper extended the capabilities of a novel real-timesystem identification approach presented in [2] referred to asDNN-MRFT. DNN-MRFT uses oscillations excited by MRFTat a specific phase, called the distinguishing phase, to identifydynamical model parameters using a DNN classifier. DNN-MRFT was extended for higher order dynamical systemsin a cascaded manner. Such extension was used to identifylinearized system dynamics of side translational movementhaving a relative degree of five with a time delay in real-time. Accuracy of system gain identification was improved byusing exact solutions of Lure’s systems from simulation data.DNN-MRFT precision and accuracy were validated both insimulation and experimentation. A take-off controller suitablefor a wide range of UAV designs demonstrated experimentalreal-time identification capability. The generalized take-offcontroller with DNN-MRFT was verified on three differentUAV designs. As a result of identification, aggressive ma-neuvers were possible without any form of hand tuning orbiasing. The results were benchmarked against state-of-the-art and showed outstanding performance as a result of usingDNN-MRFT.

For future work, we aim to evaluate DNN-MRFT forsystems with other dynamical properties. We see that DNN-MRFT can be extended to MIMO systems with strong cross-channel couplings. Also, DNN-MRFT might be successful

15

Fig. 9: Full channel DNN-MRFT auto-tuning experiment starting from a landing state without prior knowledge of systemdynamics followed by two aggressive maneuvers of vertical window passage at different speeds. (a) The UAV takes off andperforms MRFT on altitude and attitude control loops. Once steady-state behaviour is observed for each control-loop, DNN-MRFT identifies the optimal controller parameters which are directly applied to control the plant. (b) DNN-MRFT is performedon both side motion control channels and tunes controllers accordingly. (c) The online tuned controllers smoothly drive theUAV back to origin and hold it at hover. (d) Vertical Window maneuver at 2.75 m/s. (e) Vertical Window maneuver at 1.75m/s. A video of the full experiments can be seen in [4].

when applied to dynamical systems with large time delays.Such extensions require better understanding of the theoreticalbasis of information embedded on dynamical systems.

ACKNOWLEDGMENT

We would like to thank Quanser team for their generousand timely support. We also thank Eng. Yehya Farhoud forhis help in preparing the experimental setup.

REFERENCES

[1] K. J. Astrom and B. Wittenmark, Adaptive Control, 2nd ed. AddisonWesley, 1995.

[2] A. Ayyad, M. Chehadeh, M. I. Awad, and Y. Zweiri, “Real-time systemidentification using deep learning for linear processes with applicationto unmanned aerial vehicles,” IEEE Access, vol. 8, pp. 122 539–122 553,2020.

[3] I. Boiko, “Modified relay feedback test (mrft) and tuning of pidcontrollers,” in Non-parametric Tuning of PID Controllers: A ModifiedRelay-Feedback-Test Approach. London: Springer London, 2013, pp.25–79. [Online]. Available: https://doi.org/10.1007/978-1-4471-4465-6_3

[4] A. Ayyad, P. Silva, M. Chehadeh, M. Wahbah, O. AbdulHay,I. Boiko, and Y. Zweiri, Multirotors From Takeoff to Real-TimeFull Identification using MRFT and DNNs, 2020. [Online]. Available:https://youtu.be/NH58HskNqk4

[5] A. P. Schoellig, F. L. Mueller, ·. Raffaello D’andrea, A. P. Schoellig,F. L. Mueller, ·. R. D’andrea, and R. D’andrea, “Optimization-basediterative learning for precise quadrocopter trajectory tracking,” vol. 33,pp. 103–127, 2012. [Online]. Available: https://link.springer.com/article/10.1007/s10514-012-9283-2

[6] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safecontroller optimization for quadrotors with gaussian processes,”CoRR, vol. abs/1509.01066, 2015. [Online]. Available: http://arxiv.org/abs/1509.01066

[7] N. O. Lambert, D. S. Drew, J. Yaconelli, S. Levine, R. Calandra, andK. S. Pister, “Low-level control of a quadrotor with deep model-basedreinforcement learning,” IEEE Robotics and Automation Letters, vol. 4,no. 4, pp. 4224–4230, 2019.

[8] Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig,

https://doi.org/10.1007/978-1-4471-4465-6_3

https://doi.org/10.1007/978-1-4471-4465-6_3

https://youtu.be/NH58HskNqk4

https://link.springer.com/article/10.1007/s10514-012-9283-2

https://link.springer.com/article/10.1007/s10514-012-9283-2

http://arxiv.org/abs/1509.01066

http://arxiv.org/abs/1509.01066

16

“Deep neural networks for improved, impromptu trajectory tracking ofquadrotors,” in 2017 IEEE International Conference on Robotics andAutomation (ICRA). IEEE, 2017, pp. 5183–5189.

[9] P. Abbeel, A. Coates, and A. Y. Ng, “Autonomous helicopter aerobaticsthrough apprenticeship learning,” The International Journal of RoboticsResearch, vol. 29, no. 13, pp. 1608–1639, 2010.

[10] M. S. Chehadeh and I. Boiko, “Design of rules for in-flight non-parametric tuning of PID controllers for unmanned aerial vehicles,”Journal of the Franklin Institute, vol. 356, no. 1, pp. 474–491, jan2019. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0016003218306604

[11] P. Poksawat, L. Wang, and A. Mohamed, “Automatic tuning of attitudecontrol system for fixed-wing unmanned aerial vehicles,” IET ControlTheory & Applications, vol. 10, no. 17, pp. 2233–2242, 2016.

[12] W. Giernacki, “Iterative learning method for in-flight auto-tuning of uavcontrollers based on basic sensory information,” Applied Sciences, vol. 9,no. 4, p. 648, 2019.

[13] W. Giernacki, D. Horla, T. Báca, and M. Saska, “Real-time model-free minimum-seeking autotuning method for unmanned aerial vehiclecontrollers based on fibonacci-search algorithm,” Sensors, vol. 19, no. 2,p. 312, 2019.

[14] D. A. Tesch, D. Eckhard, and W. C. Guarienti, “Pitch and roll con-trol of a quadcopter using cascade iterative feedback tuning,” IFAC-PapersOnLine, vol. 49, no. 30, pp. 30–35, 2016.

[15] D. Howard, “A platform that directly evolves multirotor controllers,”IEEE Transactions on Evolutionary Computation, vol. 21, no. 6, pp.943–955, 2017.

[16] Y. Du, J. Fang, and C. Miao, “Frequency-domain system identificationof an unmanned helicopter based on an adaptive genetic algorithm,”IEEE Transactions on Industrial Electronics, vol. 61, no. 2, pp. 870–881, 2014.

[17] C. Lin, C. Huang, C. Lee, and M. Chao, “Identification of flightvehicle models using fuzzified eigensystem realization algorithm,” IEEETransactions on Industrial Electronics, vol. 58, no. 11, pp. 5206–5219,2011.

[18] J. K. McElveen, K. R. Lee, and J. E. Bennett, “Identification ofmultivariable linear systems from input/output measurements,” IEEETransactions on Industrial Electronics, vol. 39, no. 3, pp. 189–193, 1992.

[19] N. Nevaranta, S. Derammelaere, J. Parkkinen, B. Vervisch, T. Lindh,K. Stockman, M. Niemelä, O. Pyrhönen, and J. Pyrhönen, “Onlineidentification of a mechanical system in frequency domain using slidingdft,” IEEE Transactions on Industrial Electronics, vol. 63, no. 9, pp.5712–5723, 2016.

[20] D. Erdogan, S. Jakubek, C. Mayr, and C. Hametner, “Combustion enginetest bed system identification under the presence of cyclic disturbances,”IEEE Transactions on Industrial Electronics, pp. 1–1, 2020.

[21] I. Boiko, “Input-output analysis of limit cycling relay feedback controlsystems,” in Proceedings of the 1999 American Control Conference (Cat.No. 99CH36251), vol. 1. IEEE, 1999, pp. 542–546.

[22] G. Hoffmann, H. Huang, S. Waslander, and C. Tomlin, QuadrotorHelicopter Flight Dynamics and Control: Theory and Experiment,2007, p. 6461. [Online]. Available: https://arc.aiaa.org/doi/abs/10.2514/6.2007-6461

[23] P. Pounds, R. Mahony, and P. Corke, “Modelling and control of a largequadrotor robot,” Control Engineering Practice, vol. 18, no. 7, pp. 691–699, 2010.

[24] C. Cheron, A. Dennis, V. Semerjyan, and Y. Chen, “A multifunctionalHIL testbed for multirotor VTOL UAV actuator,” in Proceedingsof 2010 IEEE/ASME International Conference on Mechatronic andEmbedded Systems and Applications. IEEE, jul 2010, pp. 44–48.[Online]. Available: http://ieeexplore.ieee.org/document/5552032/

[25] S. Bouabdallah, A. Noth, and R. Siegwart, “PID vs LQ controltechniques applied to an indoor micro quadrotor,” in 2004 IEEE/RSJInternational Conference on Intelligent Robots and Systems (IROS)(IEEE Cat. No.04CH37566), vol. 3, 2004, pp. 2451–2456 vol.3.

[26] R. W. Prouty, Helicopter Performance, Stability, and Control. KriegerPublishing Company, 2002.

[27] D. Atherton, Nonlinear Control Engineering: Describing Function Anal-ysis and Design. London: Van Nostrand Reinhold, 9 1975.

[28] K. Åström and T. Hägglund, “Automatic tuning of simple regulatorswith specifications on phase and amplitude margins,” Automatica,vol. 20, no. 5, pp. 645–651, sep 1984. [Online]. Available:https://www.sciencedirect.com/science/article/pii/0005109884900141

[29] I. Boiko, “Design of non-parametric process-specific optimal tuningrules for PID control of flow loops,” Journal of the FranklinInstitute, vol. 351, no. 2, pp. 964–985, feb 2014. [Online]. Available:https://www.sciencedirect.com/science/article/pii/S0016003213003487

[30] H. Noh, T. You, J. Mun, and B. Han, “Regularizing deep neuralnetworks by noise: Its interpretation and optimization,” in Advancesin Neural Information Processing Systems 30, I. Guyon, U. V.Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, andR. Garnett, Eds. Curran Associates, Inc., 2017, pp. 5109–5118.[Online]. Available: http://papers.nips.cc/paper/7096-regularizing-deep-neural-networks-by-noise-its-interpretation-and-optimization.pdf

[31] Y. Li and Y. Yuan, “Convergence analysis of two-layerneural networks with relu activation,” in Advances in NeuralInformation Processing Systems 30, I. Guyon, U. V. Luxburg,S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, andR. Garnett, Eds. Curran Associates, Inc., 2017, pp. 597–607. [Online]. Available: http://papers.nips.cc/paper/6662-convergence-analysis-of-two-layer-neural-networks-with-relu-activation.pdf

[32] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deepnetwork training by reducing internal covariate shift,” 2015, pp.448–456. [Online]. Available: http://jmlr.org/proceedings/papers/v37/ioffe15.pdf

[33] A. Shrestha and A. Mahmood, “Review of deep learning algorithms andarchitectures,” IEEE Access, vol. 7, pp. 53 040–53 065, 2019.

[34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”arXiv preprint arXiv:1412.6980, 2014.

[35] I. Boiko, Discontinuous control systems: frequency-domain analysis anddesign. Birkhäuser Basel, 2008.

[36] I. Boiko, L. Fridman, A. Pisano, and E. Usai, “Analysis of chatteringin systems with second-order sliding modes,” IEEE Transactions onAutomatic Control, vol. 52, no. 11, pp. 2085–2102, 2007.

[37] Optitrack. (2020) Optitrack - prime 17w. Accessed: 2020-05-10.[Online]. Available: https://optitrack.com/products/prime-17w/

[38] Quanser. (2020) Qdrone - quanser. Accessed: 2020-05-06. [Online].Available: https://www.quanser.com/products/qdrone/

[39] D. Mellinger and V. Kumar, “Minimum snap trajectory generation andcontrol for quadrotors,” in 2011 IEEE International Conference onRobotics and Automation, 2011, pp. 2520–2525.

[40] C. Richter, A. Bry, and N. Roy, Polynomial Trajectory Planning forAggressive Quadrotor Flight in Dense Indoor Environments, 04 2016,pp. 649–666.

[41] A. Molchanov, T. Chen, W. Hönig, J. A. Preiss, N. Ayanian, and G. S.Sukhatme, “Sim-to-(multi)-real: Transfer of low-level robust controlpolicies to multiple quadrotors,” arXiv preprint arXiv:1903.04628, 2019.

https://linkinghub.elsevier.com/retrieve/pii/S0016003218306604

https://linkinghub.elsevier.com/retrieve/pii/S0016003218306604

https://arc.aiaa.org/doi/abs/10.2514/6.2007-6461

https://arc.aiaa.org/doi/abs/10.2514/6.2007-6461

http://ieeexplore.ieee.org/document/5552032/

https://www.sciencedirect.com/science/article/pii/0005109884900141

https://www.sciencedirect.com/science/article/pii/S0016003213003487

http://papers.nips.cc/paper/7096-regularizing-deep-neural-networks-by-noise-its-interpretation-and-optimization.pdf

http://papers.nips.cc/paper/7096-regularizing-deep-neural-networks-by-noise-its-interpretation-and-optimization.pdf

http://papers.nips.cc/paper/6662-convergence-analysis-of-two-layer-neural-networks-with-relu-activation.pdf

http://papers.nips.cc/paper/6662-convergence-analysis-of-two-layer-neural-networks-with-relu-activation.pdf

http://jmlr.org/proceedings/papers/v37/ioffe15.pdf

http://jmlr.org/proceedings/papers/v37/ioffe15.pdf

https://optitrack.com/products/prime-17w/

https://www.quanser.com/products/qdrone/

17

APPENDIX

Fig. 10: Response of processes at the vertices of Dalt for the takeoff controller with parameters in Eq. (19). The parametersat the top of each graph represents the vector [τ, Tprop, T1, CTW , Jtot]. Vertical dashed line shows tr0 and the horizontal onesshow ar0.

18

Fig. 11: Full channel DNN-MRFT auto-tuning experiment starting from a landing state without prior knowledge of systemdynamics applied to DJI F550 custom hexarotor UAV (left column) and DJI F550 custom hexarotor UAV with extendedarms (right column). In period (a) identification of inner loops parameters was performed and in period (b) identification wasperformed on outer loop parameters. After auto-tuning, the multirotor UAVs followed a trajectory resembling a square. Bothauto-tuning experiments are shown in the video in [4].

19

Abdulla Ayyad received his MSc. in ElectricalEngineering from The University of Tokyo in 2019where he conducted research in the Spacecraft Con-trol and Robotics laboratory. He is currently a Re-search Associate in Khalifa University Center forAutonomous Robotic Systems (KUCARS) workingon several robot autonomy projects. His currentresearch targets the application of AI in the fieldsof perception, navigation, and control.

Pedro Henrique Silva received the B.Sc. degree incontrol engineering from the Federal University ofItajubá - Brazil, in 2017, and is currently pursuinghis M.Sc. in AI from the University of Bath. Hestarted in the field of UAVs first as a participant oninternational competitions and later professionallyin the agriculture industry. He is currently a Re-search Associate with the Khalifa University Centerfor Autonomous Robotic Systems (KUCARS). Hisresearch interests include perception, control and AIapplied to autonomous vehicles.

Mohamad Chehadeh received his MSc. in Elec-trical Engineering from Khalifa University, AbuDhabi, UAE, in 2017. He is currently with KhalifaUniversity Center for Autonomous Robotic Systems(KUCARS). His research interest is mainly focusedon identification, perception, and control of complexdynamical systems utilizing the recent advancementsin the field of AI.

Mohamad Wahbah received his BSc. and MSc.in Electrical Engineering from Khalifa University,Abu Dhabi, UAE. He is currently working as a re-search associate at Khalifa University Center for Au-tonomous Robotic Systems (KUCARS). Currentlyhe is involved in perception and estimation solutionsfor autonomous robots.

Oussama Abdul Hay received his MSc. In Thermal Power and FluidsEngineering from The University of Manchester, UK. He is currently workingas a research associate at Khalifa University Center for Autonomous RoboticSystems (KUCARS). His research interests are in the fields of perception andvisual servoing for autonomous robots.

Igor Boiko received his MSc, PhD and DSc degreesfrom Tula State University and Higher AttestationCommission, Russia. His research interests includefrequency-domain methods of analysis and design ofnonlinear systems, discontinuous and sliding modecontrol systems, PID control, process control theoryand applications. Currently he is a Professor withKhalifa University, Abu Dhabi, UAE.

Yahya Zweiri received the Ph.D. degree from theKing’s College London in 2003. He is currentlythe School Director of the Research and Enterprise,Kingston University London, U.K. He is also an As-sociate Professor with the Department of Aerospace,Khalifa University, United Arab Emirates. He wasinvolved in defense and security research projects inthe last 20 years at the Defence Science and Tech-nology Laboratory, King’s College London, and theKing Abdullah II Design and Development Bureau,Jordan. His central research focus is interaction dy-

namics between unmanned systems and unknown environments by means ofdeep learning, machine intelligence, constrained optimization, and advancedcontrol. He has published over 100 refereed journal and conference papersand filed six patents in USA and U.K. in unmanned systems field.

multirotors from takeoff to real-time full identiﬁcation using … · 1 day ago · kingston...

Documents