

Automated Brick Pattern Generator for Robotic Assembly using Machine Learning and Images

Bárbara Andrade Zandavali 1, Manuel Jiménez García 2
1 University of East London  2 University College London
1 [email protected]  2 [email protected]

Brickwork is the oldest construction method still in use. Digital technologies, in turn, enabled new methods of representation and automation for bricklaying. While automation explored different approaches, representation was limited to declarative methods, such as parametric filling algorithms. Alternatively, this work proposes a framework for automated brickwork using a machine learning model based on image-to-image translation (Conditional Generative Adversarial Networks). The framework consists of creating a dataset, training a model for each bond, and converting the output images into vectorial data for robotic assembly. Criteria such as wall boundary accuracy, avoidance of unsupported bricks, and brick position accuracy were evaluated individually for each bond. The results demonstrate that the proposed framework fulfils boundary filling and respects overall bonding structural rules. Size accuracy showed inferior performance at the scale tested. Associating this method with 'self-calibrating' robots could overcome this problem and be easily implemented on-site.

INTRODUCTION
Robotic assembly for brickwork has been researched since the 1990s (Anliker, 1988), yet in the last decade a renewed interest in automation has emerged. The introduction of robotic assembly in brickwork started as a cost-effective alternative to traditional assembly. More recently, designers and builders have taken advantage of customization with minor cost impact using digital technologies (Bonwetsch, 2015). While picking-and-placing is a trivial task for automation, construction constraints are still a challenge. The impact of digital tools has also influenced how architects represent brickwork. Hence, the design representation for robotic bricklaying is also the 'instruction' for its automation.

Bonwetsch (2015) classifies representation for brickwork automation into two approaches: top-down and bottom-up. While top-down representation subdivides a global and well-defined structure (the wall boundary) into smaller units (bricks), bottom-up representation concentrates on the units' arrangement and their local relationships, ignoring overall constraints. Top-down approaches are excessively technical, removing the opportunity for innovation; disregarding the boundary constraint, in turn, reduces the applicability of bottom-up approaches to real-life scenarios. A more appropriate representation would combine both approaches, addressing a variety of shapes while still following a boundary.

Less declarative methods, such as machine learning image-to-image translation models, are suitable solutions. Machine learning models are able to generalize the content of the filling for any boundary condition based on image representations (Isola et al., 2016).

In parallel, machine vision technologies enlarge the robot's ability to handle variations in an ongoing task, enhancing its flexibility (Automata, 2018). For these reasons, this paper investigates an alternative method of representation for automated brickwork using images. This research suggests that using images as a representation method would benefit the integration of construction tasks into automation. Moreover, the application of less declarative methods such as artificial neural networks is an alternative that can be generic (handling different boundaries) while maintaining construction logic and pattern aesthetics. This paper, in turn, investigates whether the outcomes of image-to-image translation models are sufficiently accurate and generalized to be integrated into on-site robotic assembly.

BACKGROUND: MASONRY AUTOMATION AND MACHINE LEARNING
Masonry is an additive construction method consisting of layering units in courses from the lowest to the highest level. Although brickwork automation has been available for over 30 years, research started to explore its programmability potential, shifting from engineering-oriented practices towards a design-oriented approach. The major downsides of brickwork automation were related to its lack of adaptation to the site environment and to handling variations in the predefined task.

To enlarge the robot's adaptation to site conditions, previous research integrated mobility features and feedback systems into industrial robots. Alternatives for mobility include (a) developing a container to move the robot on site (Bärtschi et al., 2010), (b) adapting tracks to the robot (Arch Union, 2018), and (c) using an aerial structure to 'suspend' the robots (Gramazio & Kohler, 2016). Helm et al. (2012) and Dörfler et al. (2016) tackled both mobility and capturing information by equipping industrial robots with a mobile track system and 2D and 3D scanners that allowed the unit to detect objects or obstacles. Collaborative robots with mobility systems are also an alternative to enlarge the robot's site adaptation. Willmann et al. (2012) publicised a complete workflow for Aerial Robotic Construction tested with block assembly. One of the challenges faced concerned the accuracy and reliability of placing a brick in the 'correct position': the inconstancy of 'flying' conditions was reflected in the system's precision. These attempts focused on incorporating hardware devices (wheels, drones, cameras, scanners) to enlarge the robot's site adaptation, with minor efforts to increase its flexibility in the bricklaying task.

Two different trials focused on solving task adaptability with smart algorithmic solutions for a different purpose. These robots use systems to capture information from the environment and calibrate their accuracy based on the task to be performed. Nair et al. (2017) trained a robot to manipulate a rope using a self-supervised machine learning model and imitation. Robot EVA (Automata, 2018), in turn, is a small and inexpensive robot developed to be trained by imitation: the user first demonstrates the move and positioning, then the robot replicates the action, calibrating the system according to the environment. This paper suggests that the embodiment of automation can go beyond sensing the environment, and concentrates on creating a system that interprets the environment. It holds that the process of capturing the area to be filled, defining the boundaries and calculating each brick's position is an intuitive process that can be replicated in automation with the support of artificial intelligence.

Machine learning (ML) is a subset of artificial intelligence that uses statistical techniques to enable computer programs to progressively improve performance on a specific task via experience ('learning'), without being explicitly programmed (Samuel, 1959). According to Duda, Hart & Stork (1976), learning refers to how the algorithm reduces the error on a training set by adjusting its weights. Goodfellow, Bengio & Courville (2016) define the task of generating new examples based on the images in the training data as synthesis and sampling. As data representation is a key factor for machine learning efficiency (Goodfellow, Bengio & Courville, 2016), using images for machine learning has its challenges and specificities: ML models trained on image datasets calibrate their weights according to the value of every pixel (RGB).

Figure 1: Framework diagram.

Inspired by the behaviour of the visual cortex, Convolutional Neural Networks (CNNs) are powerful networks for image-processing tasks, where each convolution layer extracts a specific feature (filter) from the input image. Gatys, Ecker & Bethge (2015) investigated the CNN's potential for artistic purposes. Their model assumes that content and style representations are well separable in a CNN. As a result, it produces a new image combining content (a photo) and style (an artwork) from two sources. Although the network's results are innovative, the input image constrains the content of the outcome, which limits generalization to other images.

Along these lines, Goodfellow et al. (2014) argued that methods for generative models are less disseminated than discriminative ones due to their difficulty to 'converge'. They overcame this drawback by proposing the Generative Adversarial Network (GAN), whose architecture combines two models - generative and discriminative - in a competitive method to create images indistinguishable from the genuine dataset (Goodfellow et al., 2014). Isola et al. (2016) proposed an intermediary approach, the Conditional Generative Adversarial Network (cGAN), a general-purpose solution to image-to-image translation problems, publicly available as 'pix2pix' (Isola et al., 2016). Like the GAN architecture, pix2pix trains generative and discriminative networks simultaneously. Alternatively, it uses a pair of images for training: the input and the target. The pix2pix model learns the objective throughout training by comparing the defined inputs and outputs and inferring the objective. The cGAN was chosen for this paper due to its flexibility, the absence of labelling, and its reliability even with small datasets (Isola et al., 2016).
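For reference, the objective that pix2pix optimizes, as formulated by Isola et al. (2016), combines a conditional adversarial loss with an L1 reconstruction term. The formulation below restates their published equations (it is not specific to this paper's models), with G the generator, D the discriminator, x the input image, y the target image and z a noise vector:

    \mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x,z}[\log(1 - D(x,G(x,z)))]
    \mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x,z)\rVert_1]
    G^{*} = \arg\min_{G}\max_{D}\,\mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)

The 'GAN' and 'L1' generator losses reported in the experiments below correspond to the two terms of this combined objective.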

MATERIALS AND METHODS
This paper proposes a framework for automated brickwork using a machine learning model based on image-to-image translation (cGAN), able to replicate traditional bonding for any possible flat shape. The framework development requires (a) the development of a large dataset of brick walls, (b) the network training, and (c) the translation of the image outcomes into instructions for assembly. To create the dataset, a filling algorithm describes the brick patterns for a variety of wall geometries. The second step comprises training the cGAN (Isola et al., 2016) to generate brick patterns based on each bond's dataset, using image pairs of the input (wall boundary) and the target image (brick wall). Then, an image-processing algorithm converts the images into spatial positions for automatic assembly simulation. Fig. 1 illustrates the framework's major steps: recognizing a boundary, returning an image of the wall with a bond pattern and, finally, converting it into positions in space to be built.

Generating the Dataset
The goal of this algorithm is to fill a variety of wall shapes with predefined bond patterns. Each bond pattern has a rule set describing the brick arrangement. The chosen bonds use a single standard brick shape in two positions: stretchers and headers. The bonds differ from each other in the number of stretcher and header units, the bricks' arrangement, and how each row is aligned with the ones below and above. The algorithm scans the polygon area (wall), checking whether each brick is inside the boundary. The screen size used in the framework (500 x 500 pixels) corresponds to 500 x 500 cm. The filling algorithm developed for the brick patterns comprises four main methods: (a) wall shape generation, (b) brick pattern filling, (c) exporting images (png format - pixels), and (d) exporting positions (csv format - vectorial).

Firstly, the algorithm creates a bi-dimensional shape that represents the wall. The shape is described by a quad whose four vertices are each located in one of the four screen quadrants. The lower vertices are constrained to the same vertical value to guarantee that the first row is parallel to the floor. After the wall boundary is set, the algorithm calculates the vertical and horizontal minimum and maximum values, defining a bounding box around the polygon (Fig. 2). To describe the brick bond within the wall boundary, this paper uses a 'Scan Line Polygon Filling' algorithm (Kubitz, 1968; Reynolds, 1977; Lieberman, 1978).
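To make this step concrete, the following is a minimal sketch of the wall-shape generation in Python, assuming a 500 x 500 px screen with one random vertex per quadrant (in image coordinates y grows downward, so the 'lower' vertices carry the larger y values); the function names are illustrative, not the authors' implementation.

    import random

    SCREEN = 500  # 500 x 500 px, corresponding to 500 x 500 cm

    def make_wall():
        """Quad with one vertex per screen quadrant; the two lower
        vertices share the same y so the first course is parallel
        to the floor."""
        base_y = random.randint(300, 480)  # shared y of the lower vertices
        lower_left = (random.randint(20, 240), base_y)
        lower_right = (random.randint(260, 480), base_y)
        upper_right = (random.randint(260, 480), random.randint(20, 240))
        upper_left = (random.randint(20, 240), random.randint(20, 240))
        return [lower_left, lower_right, upper_right, upper_left]

    def bounding_box(polygon):
        """Vertical and horizontal min/max values: the scanning area."""
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        return min(xs), min(ys), max(xs), max(ys)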

The algorithm's goal is to define the polygon edges and scan its area pixel by pixel across the screen, using the edges as starting and stopping points. The brick pattern algorithm, in turn, iterates in steps of the brick size instead of checking every pixel. The minimum and maximum method from the wall-shape creation defines the scanning area, and the starting point for scanning combines the 'back' value for the x coordinate and the bottom value for the y coordinate. The algorithm then checks the pixel colour at the brick's centre and its four corners to ensure that the whole brick is comprised within the boundary (Fig. 2). The filling pattern varies for each bond; this paper uses four traditional bonding styles: (a) Stretcher, (b) Header, (c) English, (d) Flemish. After creating the brick pattern, the algorithm places sliced bricks to fill the remaining gaps. The filling algorithm outputs comprise two images, the wall boundary and the bond, and a table. The table summarises every brick's position (X, Y and Z coordinates), dimensions and type.
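A minimal sketch of the brick-placement test described above, assuming an inside(x, y) predicate is available from the rasterized boundary (e.g., a point-in-polygon or pixel-colour test) and using the standard 24 x 8 cm brick mentioned later; the names and the stretcher-only stagger are illustrative.

    BRICK_W, BRICK_H = 24, 8  # stretcher footprint in cm (= px at this scale)

    def brick_fits(x, y, w, h, inside):
        """Accept a brick only if its centre and four corners all fall
        inside the wall boundary."""
        probes = [(x + w / 2, y + h / 2),        # centre
                  (x, y), (x + w, y),            # upper corners
                  (x, y + h), (x + w, y + h)]    # lower corners
        return all(inside(px, py) for px, py in probes)

    def fill_stretcher(bbox, inside):
        """Scan the bounding box in brick-sized steps, offsetting every
        other course by half a brick (stretcher bond)."""
        x_min, y_min, x_max, y_max = bbox
        bricks, row = [], 0
        y = y_max - BRICK_H              # bottom course first
        while y >= y_min:
            x = x_min - (BRICK_W // 2 if row % 2 else 0)
            while x <= x_max:
                if brick_fits(x, y, BRICK_W, BRICK_H, inside):
                    bricks.append((x, y, BRICK_W, BRICK_H, 'stretcher'))
                x += BRICK_W
            y -= BRICK_H
            row += 1
        return bricks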

Figure 2: Scan Line Filling Algorithm diagram.

Machine Learning
The machine learning step comprises the dataset pre-processing, the model training and the testing. Dataset creation is a key factor for training the network. The chosen network, cGAN, requires two images for training: one image is the network input (A) and the other is the target image (B). In this research, input images describe the wall boundaries, while the target images describe the same shape filled with the brick pattern. The dataset is split into training and testing groups, with a proportion of 70% training to 30% testing. Training the model requires uploading the training portion of the dataset and specifying the training parameters. The training method requires the image pairs as inputs, the direction of training (from input to target) and the number of iterations (epochs) per image pair. Since the training is based on the relation between the dataset pairs, each bond has a unique model. After training the model, its evaluation uses the testing group to test the model's ability to generalize the solution.
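As an illustration of the pre-processing, the sketch below assembles the A (boundary) and B (filled) images into side-by-side pairs - a common pix2pix convention - and applies the 70/30 split; the directory names and the pairing layout are assumptions, not taken from the paper.

    import os
    import random
    from PIL import Image

    def make_pairs(input_dir, target_dir, out_dir, train_ratio=0.7):
        """Concatenate each boundary image (A) with its filled target (B)
        into one A|B image and split the pairs into train/test groups."""
        names = sorted(os.listdir(input_dir))
        random.shuffle(names)
        split = int(len(names) * train_ratio)
        for i, name in enumerate(names):
            a = Image.open(os.path.join(input_dir, name))   # wall boundary
            b = Image.open(os.path.join(target_dir, name))  # boundary + bond
            pair = Image.new('RGB', (a.width + b.width, a.height))
            pair.paste(a, (0, 0))
            pair.paste(b, (a.width, 0))
            group = 'train' if i < split else 'test'
            os.makedirs(os.path.join(out_dir, group), exist_ok=True)
            pair.save(os.path.join(out_dir, group, name))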


Converting Images into Instructions
As a final step, the framework converts the information from the network outcomes (images composed of pixels) into positions in space (x, y, z coordinates). The proposed method scans every pixel on the screen to find the bricks' extremities. A corner pixel differs from the other pixels by the colours of its neighbourhood pixels (Fig. 3). The algorithm scans the entire screen and finds the upper-left corners. The positions are then exported as values in a table, which can be used as instructions.
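A minimal sketch of this conversion, assuming the post-processed output uses a single distinct colour for brick fills (as the high-contrast style B facilitates) and that a brick pixel whose left and upper neighbours are not brick-coloured marks an upper-left corner; the colour value and helper names are hypothetical.

    import csv
    from PIL import Image

    BRICK = (255, 255, 255)  # assumed brick fill colour after post-processing

    def is_brick(img, x, y):
        return (0 <= x < img.width and 0 <= y < img.height
                and img.getpixel((x, y)) == BRICK)

    def find_corners(path, out_csv):
        """Scan every pixel; collect brick pixels whose left and upper
        neighbours are not brick-coloured (upper-left corners)."""
        img = Image.open(path).convert('RGB')
        corners = [(x, y)
                   for y in range(img.height) for x in range(img.width)
                   if is_brick(img, x, y)
                   and not is_brick(img, x - 1, y)
                   and not is_brick(img, x, y - 1)]
        with open(out_csv, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['x', 'y'])
            for x, y in corners:
                writer.writerow([x, img.height - y])  # flip y: image rows grow downward
        return corners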

Figure 3: Diagram of the method to convert the output into instructions.

EXPERIMENTS AND RESULTS
The experiments are divided into two approaches, aiming to (a) set up an ideal model for the network (offline), and (b) simulate hypothetical on-site situations (online). Offline experiments tested a variety of dataset representation styles, dataset sizes, numbers of iterations (epochs), and the inclusion of information in the target (Zandavali, 2018). Since dataset style and size were the most representative, those experiments and results are explained as follows. Online experiments compared the four bonds for the same testing sample. The criteria chosen to evaluate the models' performance comprised quantitative and qualitative features.

Evaluation Criteria and Sample
The quantitative aspect focuses on the performance indicators calculated by the cGAN model for its networks, and on the performance rate. Each model comprises three networks: one for the discriminator and two for the generator (GAN and L1), where the ideal trade-off is to compensate discriminator and generator simultaneously. The performance rate evaluates the resemblance between the output and the target by comparing the value of every pixel in both images. Alternatively, the qualitative criteria focus on automated-construction constraints: (a) filling accuracy of the input boundary ('boundary index'), (b) avoidance of unsupported bricks (overhangs larger than half the brick's size), (c) replication of the brick size.

The best model has the lowest discriminator and L1 losses and the highest GAN loss for its quantitative results. Performance rates range from 0, for a discrepant copy, to 1, for a perfect copy. Qualitative indicators take the target images as a baseline: the closer the output is to the target, the better its representation and the network performance. The boundary 'index' is the difference between the percentage area coverage of the filling for the target image (target percentage = target bricks area / boundary area * 100) and for the output (output percentage = output bricks area / boundary area * 100). Therefore, the smaller the boundary index, the more accurate the area coverage. Unsupported bricks are counted for each evaluation shape with obtuse base angles; since target images do not display unsupported bricks, the fewer unsupported bricks in the outputs, the better. The brick size 'index' calculates a percentage based on the difference in the number of rows between the target and output images (difference = target number of rows - output number of rows). This value is then divided by the baseline and multiplied by 100 (index = difference / target number of rows * 100).
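To make the two indices concrete, here is a minimal sketch of their computation, assuming areas are measured by counting pixels of the corresponding colours; the function names are illustrative.

    def boundary_index(target_brick_area, output_brick_area, boundary_area):
        """Difference between the target's and the output's percentage
        area coverage of the wall boundary."""
        target_pct = target_brick_area / boundary_area * 100
        output_pct = output_brick_area / boundary_area * 100
        return abs(target_pct - output_pct)

    def brick_size_index(target_rows, output_rows):
        """Percentage deviation in the number of brick courses."""
        return (target_rows - output_rows) / target_rows * 100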

Fifteen images were used as an evaluation sample for the qualitative criteria. The shapes vary in size (small, medium and large) and interior angles (square, acute, and obtuse angles - Fig. 4). The shapes are part of the testing set from the dataset, processed after the network training. The variety of shapes aims to evaluate discrepancies for different proportions, sizes and angles. Shapes with obtuse internal angles at the bottom vertices enable the evaluation of the unsupported bricks criterion.

Figure 4: Evaluation shapes.

Offline Experiments
Comparing Dataset Styles. The definition of style was tested concerning two main aspects: (a) whether the model could perform the 'filling task' well, and (b) whether the outcome could be easily transformed into vector data for the assembly. The tested datasets used 200 images for training and iterated 100 times (epochs) for the stretcher bond. The styles varied the filling and stroke colours for the wall boundary in the input image and for the header and stretcher bricks in the target. The four styles are illustrated in Fig. 5 with the respective RGB values for (a) background, (b) wall stroke, (c) wall filling, (d) header brick fill, (e) header brick stroke, (f) stretcher brick fill, (g) stretcher brick stroke. Greyscale values use a single value varying from 0 (black) to 255 (white).

Figure 5: Dataset style of input and target images and their corresponding filling and stroke colours.

The network was able to represent the overall stretcher bonding logic for every style (Fig. 6). The area coverage of the boundary was similar across the scenarios; however, styles A and C show a weaker boundary definition. The unsupported bricks index was the most representative result and demonstrated that using a distinct colour for the bricks (styles A and B) helped the network 'figure out' the structural logic (Table 1). The discrepancy in the bricks' size representation is similar for every dataset: the outputs represented the standard brick (24 x 8 cm) as 26 x 9 cm. Style B displayed a superior quantitative performance, since it has lower discriminator and L1 loss results and higher GAN loss results.

Table 1: Dataset style comparison results.

Comparing Dataset Size. The experiment concerning dataset size compared four scenarios, with datasets of 200, 500, 1000, and 2000 images. The dataset ratio (training/testing and regular/irregular shapes) is identical for every scenario (Table 2). The experiment compares each dataset size trained for 100 epochs with style D. Table 2 summarizes the results for each dataset-size scenario. Quantitative results demonstrated a significant increase in GAN loss and decrease in discriminator loss from scenario A' to B' (200 to 500 training images). In turn, from 500 to 1000 and from 1000 to 2000 images, the discriminator loss decreased by only 0.028 and 0.033 respectively.

Similarly, GAN loss had only a minor growth from 500 to 2000. Qualitative results were less expressive, except for the unsupported bricks: the major impact of enlarging the dataset was the decrease in unsupported bricks. While the evaluation shapes for the 1000-image training dataset displayed 24 unsupported bricks, the 2000-image dataset reduced this value to 4. This result suggests that the network's exposure to a larger dataset contributes to replicating an implicit feature, which is not proportionally related to the quantitative indicators.

Figure 6: Input and output images for the four dataset styles.

Table 2: Dataset size comparison results.

Final Model Set-up. To define a final model, this paper compared eight trained models varying style and dataset size, using 100 epochs for training. Their performances are plotted in Fig. 7, comparing quantitative and qualitative criteria. Each model is referred to in the graph by its style (A - D) and the number of images used in its training. Except for GAN loss, a model with a lower value has a superior performance. The two models that displayed a superior performance (style B - 500 and style D - 2000) are highlighted with a darker grey in the graphs. Except for GAN loss, both models performed similarly. However, the model in style B used one-quarter of the images for its training. This fact reflects proportionally on the time expenditure of its training, from 1 to 4 hours on a standard GPU computer. Another remarkable factor is the efficiency of style B in avoiding unsupported bricks. Moreover, the contrast within the colours of style B facilitates post-processing the images. For this reason, the set-up combining style B and 500 training images was used for further bonding tests, presented as follows.

Figure 7: Graph comparing the performance of the tested networks.

Online Experiments
Comparing bonding. Since the goal was to adapt to a task variation, this experiment aims to generate different bonding patterns for identical shapes, evaluate how each populates the boundary, and assess how the network performs for each pattern. Each bond has a specific model to train and test its capacity to fill the wall shapes. The English and Flemish bond configurations prevent the filling of polygons with obtuse angles at the base. For this reason, the wall shapes used to train and test these models have base angles within 90 degrees.

The results show (Table 3) that the network was able to describe each bond while maintaining the bricks' local relationships. The number of unsupported bricks was higher in the outcomes of the Header bond than in the Stretcher bond. The discrepancy in the size of the bricks was similar in every bond, reinforcing that this might be a limitation of the network architecture. While the Header and Flemish bond outcomes replicated the edges and brick strokes with a significant resolution, the strokes within the darker bricks were poorly defined in the English bond (Fig. 8). As a matter of curiosity, although the English and Flemish models were trained with constrained shapes, this paper also tested shapes with obtuse base angles; the outcomes are displayed in Fig. 8.

Table 3: Results for the bonding comparison.

Figure 8: Input and output images for the four bonding patterns: Stretcher, Header, English and Flemish.

CONCLUSION
The question posed at the beginning of this research addresses the extent to which machine learning models, using images as a method of representation, could generate outputs sufficiently accurate and generalized for automated brickwork construction. The outcomes demonstrated promising results in replicating the bonding pattern; in turn, the accuracy in describing brick size was questionable (10%). Moreover, even the final models occasionally 'placed' unsupported bricks. The relevance of dataset representation was evident and, at the same time, unpredictable. The inclusion of qualitative metrics was a key feature for evaluating the network's performance on this specific task, whereas the losses and performance rates lack 'meaningful' information. Better outcomes could benefit from increasing or decreasing the number of layers in the network, opting for different optimizers, or changing the activation and bias functions of the original network. Another limitation faced in this work was the resolution of the network output, which compromises the conversion of the bi-dimensional results into 3D models.

The method in this research combines top-down and bottom-up methods, gathering control and flexibility to address the wall's boundary. The outcomes demonstrated that the network was able to recognize the boundary and adapt to a variety of unknown wall shapes. Moreover, the outcomes demonstrated that the network also respected the local relationships inherent to every bond, establishing a negotiation between the overall shape and the emergent pattern. Although the results demonstrated a few limitations, this research contributes to a promising and growing field that associates 'smart' algorithms with automation. The potential of using machine learning resides in its level of abstraction, which could increase the system's adaptability and enlarge automation on site.

The results from this research unfolded possible paths for future work. One path envisions training the network for situations that go beyond panels, such as including openings and exterior corners. A second path focuses on testing the framework's outcomes in physical experiments, which requires adapting the solution to environment and machinery constraints. Finally, a third path comprises comparing the results presented in this research to other machine learning models in architecture, such as ones using other image-to-image translation architectures or using vector data.

REFERENCES
Anliker, F 1988 'Needs for robots and advanced machines at construction sites. Social aspects of robotics', 5th International Symposium on Automation and Robotics in Construction, Tokyo

Bonwetsch, T 2015, Robotically Assembled Brickwork: Manipulating Assembly Processes of Discrete Elements, Ph.D. Thesis, ETH Zurich

Duda, R, Hart, P and Stork, D 1976, 'Pattern Classification', Journal of Classification, 24(2), pp. 305-307

Dörfler, K, Sandy, T, Giftthaler, M, Gramazio, F, Kohler, M and Buchli, J 2016 'Mobile Robotic Brickwork', Robotic Fabrication in Architecture, Art and Design, pp. 204-217

Gatys, L, Ecker, A and Bethge, M 2015 'A Neural Algorithm of Artistic Style', IEEE Conference on Computer Vision and Pattern Recognition

Goodfellow, I, Bengio, Y and Courville, A 2016, Deep Learning, MIT Press

Goodfellow, I, Pouget-Abadie, J, Mirza, M, Xu, B, Warde-Farley, D, Ozair, S, Courville, A and Bengio, Y 2014 'Generative Adversarial Nets', Advances in Neural Information Processing Systems

Helm, V, Willmann, J, Gramazio, F and Kohler, M 2014 'In-Situ Robotic Fabrication: Advanced Digital Manufacturing Beyond the Laboratory', Gearing Up and Accelerating Cross-fertilization between Academic and Industrial Robotics Research in Europe: Technology Transfer Experiments from the ECHORD Project

Isola, P, Zhu, J, Zhou, T and Efros, A 2017 'Image-to-Image Translation with Conditional Adversarial Networks', IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Kubitz, W 1968, A Tricolor Cartograph, Ph.D. Thesis, University of Illinois

Lieberman, H 1978 'How to Color in a Coloring Book', ACM SIGGRAPH Computer Graphics

Nair, A, Chen, D, Agrawal, P, Isola, P, Abbeel, P, Malik, J and Levine, S 2017 'Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation', International Conference on Robotics and Automation (ICRA)

Reynolds, C 1977 'Filling Polygons', Architecture Machinations

Samuel, A 1959, 'Some Studies in Machine Learning Using the Game of Checkers', IBM Journal of Research and Development, 3(3), pp. 210-229

Willmann, J, Augugliaro, F, Cadalbert, T, D'Andrea, R, Gramazio, F and Kohler, M 2012, 'Aerial Robotic Construction towards a New Field of Architectural Research', International Journal of Architectural Computing, 10(3), pp. 439-459

Zandavali, B 2018, Automated Brick Pattern Generator for Robotic Assembly using Machine Learning and Images, Master's Thesis, University College London

[1] https://automata.tech/product.html
[2] http://www.archi-union.com/Homes/Projectshow/index/id/40
[3] http://gramaziokohler.arch.ethz.ch/web/e/projekte/135.html
