This may be the author's version of a work that was submitted/accepted for publication in the following source:

Moohialdin, Ammar, Lamari, Fiona, Miska, Marc, & Trigunarsyah, Bambang (2021) A real-time computer vision system for workers' PPE and posture detection in actual construction site environment. In Wang, Chien Ming, Kitipornchai, Sritawat, & Dao, Vinh (Eds.) EASEC16: Proceedings of The 16th East Asian-Pacific Conference on Structural Engineering and Construction, 2019. Springer, Singapore, pp. 2169-2181.

This file was downloaded from: https://eprints.qut.edu.au/197052/

© 2019 [Please consult the author]

This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to qut.copyright@qut.edu.au

Notice: Please note that this document may not be the Version of Record (i.e. published version) of the work. Author manuscript versions (as Submitted for peer review or as Accepted for publication after peer review) can be identified by an absence of publisher branding and/or typeset appearance. If there is any doubt, please refer to the published source.

https://doi.org/10.1007/978-981-15-8079-6_199
16th East Asia-Pacific Conference on Structural Engineering & Construction (EASEC16)
Edited by C.M. Wang, V. Dao and S. Kitipornchai
Brisbane, Australia, December 3-6, 2019
A REAL-TIME COMPUTER VISION SYSTEM FOR WORKERS’ PPE
AND POSTURE DETECTION IN ACTUAL CONSTRUCTION SITE
ENVIRONMENT
Moohialdin, Ammar*a; Lamari, Fionaa; Miska, Marca; Trigunarsyah, Bambangb
a: School of Civil Engineering and Built Environment, Science and Engineering Faculty, Queensland
University of Technology, Brisbane 4000, Australia.
b: Property, Construction and Project Management, Design and Social Context, RMIT University, Melbourne
3000, Australia.
Email: Ammar.moohialdin@hdr.qut.edu.au
Abstract. Real-time video detection remains challenging, especially for detecting construction site workers, their personal protective equipment (PPE, such as helmets and safety gear) and their postures, since the construction site environment involves multiple complications such as different illumination levels, shadows, complex activities, and a wide range of PPE designs and colours. This paper proposes a novel computer vision (CV) system to detect construction workers' PPE and postures in real time. Four recording sessions were carried out to build a dataset of 95 videos using a novel design of site cameras. The PPE detection covered eight different types of helmets and gear, and the posture detection comprised nine classes. A Python data-labelling tool was used to annotate the selected datasets, and the labelled datasets were used to build a detection model based on the TensorFlow environment. The proposed method consists of two layers of decision trees, which were tested and validated on two videos totalling 2000 frames. The proposed model achieves high identification and recall rates, above 83% and 95%, respectively. It also achieved posture classification accuracies above 72% and 64% in model testing and validation, respectively. The proposed model can promote potential improvements in the application of real-time video analysis in actual site conditions.
Keywords: Construction; Worker; Computer Vision; PPE; Posture; Detection; Real-time.
1. INTRODUCTION
Collecting accurate information from construction sites is essential as an input to decision-making processes. This information needs to be available in real time so it can support safety- and productivity-related decisions as well as proactive actions. Conventional data collection methods such as monitoring sensing technology are overly intrusive (Wong et al., 2014; Chan et al., 2012a; 2012b), costly, and require staff training (Zhou et al., 2013; Liang et al., 2011). It is therefore important to collect information from the construction site in ways that fit the actual site conditions (Gatti et al., 2013).
Video recording applications on construction sites can produce massive amounts of information (Dimitrov & Golparvar-Fard, 2014; Memarzadeh et al., 2012; Chi & Caldas, 2011) at minimal cost (Han & Lee, 2013). In the last decade, essential improvements have been made in computer vision analysis and its algorithms, which have been effectively employed in real industry applications such as detecting construction workers and their movements (Han & Lee, 2013; Memarzadeh et al., 2013). However, the CV application is still challenging when considering real-time operation and automated interpretation of the massive amount of video information. Moreover, there is a lack of structured data about site activities in a form that can be transcribed into logical algorithms (Seo, Han, Lee, & Kim, 2015). Besides, CV applications in actual site conditions should be able to process videos in less time and with a lower likelihood of error (Seo et al., 2015; Yang et al., 2015; Gong & Caldas, 2010).
Construction site environments also involve dust, direct sunlight, rain and the movement of heavy equipment, which poses a further challenge for off-the-shelf cameras. Therefore, this study aims to build a real-time CV system that can be effectively implemented in actual site conditions to detect site workers' PPE and postures. It also proposes a novel, practical design for a site camera system that suits the challenging construction site conditions. The results of implementing the proposed system show that it can be effectively used in actual site conditions for real-time data analysis purposes.
2. METHODOLOGY
The CV used in this research refers to the process of transferring knowledge of site workers and their postures onto a computer to interpret site videos and retrieve meaningful information. The CV system comprises three main parts: a data acquisition unit, a processing and understanding unit, and a reporting unit.
2.1. Data Acquisition and Structure
This research used a 2D camera designed for construction site applications rather than an off-the-shelf camera. The camera system included four main parts: a power unit, a processing unit, a data storage unit and a camera, as shown in Figure 1. Construction site environments include dust, direct sunlight, rain and the movement of heavy equipment; therefore, the cameras needed protection from damage. A plastic enclosure was used for this purpose and designed to sit on top of a tripod that can reach heights of up to three metres to ensure coverage of a wide area. The recording system included a 2D camera with a frame rate of 24 fps and a resolution of 1024 by 720. The camera recorded video clips 41 seconds long and also sent each frame to a cloud browser that displays one fps. The camera drew 5 V from the processing unit and sent the footage back to the same unit. The processing unit then saved the footage as videos on an external hard drive and sent images to the cloud browser.
Figure 1. The structure of the CVA measurement units.
It can be challenging to maintain a constant electrical power connection on a construction site, as additional arrangements are needed, and the position of the camera may need to change as the construction work progresses. Therefore, the camera was designed to run on solar power. The CVA system also included a small 12 V battery, whose output a transformer converted into 5 V power. The processing unit managed the power connection between the transformer and the camera. It also controlled the storage of the footage on the hard drive and in the cloud browser. Three cameras were deployed on three different construction sites at allocated locations. The orientation of the cameras at the top of the tripods helped to cover a flat area of up to 20 m² at ground level. The CVA system detected multiple objects, such as helmets, gear and workers, as well as the workers' postures. The distance between the camera and the targeted workstation was around 30 to 40 m.
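As an illustration of the recording pipeline just described, the sketch below records a 41-second clip at 24 fps and pushes one frame per second to the cloud browser. This is a minimal Python/OpenCV reconstruction, not the authors' implementation: the `upload_to_cloud()` helper and the capture wiring are assumptions.

```python
# A minimal sketch of the recording loop described above, assuming OpenCV
# for capture; upload_to_cloud() is a hypothetical stand-in for the
# one-frame-per-second cloud-browser feed.
import time
import cv2

CLIP_SECONDS = 41          # clip length used in this study
FPS = 24                   # camera frame rate
RESOLUTION = (1024, 720)   # (width, height)

def upload_to_cloud(frame):
    """Hypothetical placeholder for the cloud-browser upload."""
    pass

def record_clip(camera_index=0, out_path="clip.avi"):
    cap = cv2.VideoCapture(camera_index)
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"XVID"),
                             FPS, RESOLUTION)
    start = last_upload = time.time()
    while time.time() - start < CLIP_SECONDS:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, RESOLUTION)
        writer.write(frame)                    # footage to the hard drive
        if time.time() - last_upload >= 1.0:   # one frame per second to cloud
            upload_to_cloud(frame)
            last_upload = time.time()
    writer.release()
    cap.release()
```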
2.2. Requirements of Real-Time Data Analysis
A sensitivity analysis was conducted to determine the appropriate video resolution and frame
rate for a real-time data analysis. The analysis algorithm was the average number of the
processed frames per minute, which used to compare 138 different combinations of video
resolution versus framerate. The duration of the tested video was 15 seconds and it was
converted into six different resolutions: 4096 by 2160 – 4K; 2048 by 1080 – 2K; 1280 by 720
– 720P; 720 by 576 – 576P; 720 by 480 – 480P and 1960 by 1080. For each type of the
resolution, 23 different framerates between 8 fps and 30fps were tested.
A human detection code was created using MATLAB. The sensitivity analysis began by reading the video and then setting the values of the video resolution, frame rate and time counter. The code then converted the video into frames. In each frame, the detection algorithm identified any humans and plotted bounding boxes around them. The frames were then converted back into videos that include the detection results. Finally, the MATLAB video viewer showed the videos with the bounding boxes. The code outcomes also presented a summary of the processing time and the average number of frames processed per minute. Figure 2 shows the sensitivity analysis process in the form of pseudocode.

1. Initialization;
2. Input1: Request_To_Upload_A_Video_Record;
3. Get_Information_About_The_Video;
4. Output1: Show_Video_Duration, Resolution, FrameRate, Number_Of_Frames;
5. Input2: Request_To_Enter_The_Required<Video_Resolution>;
6. Input3: Request_To_Enter_The_Required<Frame_Rate>;
7. Set_Timer_Start_Point;
8. Create_Output_Folders<Resized_Frames, Segmented_Frames, Frames_With_Bounding_Box>;
9. For i = 1 : Number_Of_Frames;
10. Extract_Frame_From_Video;
11. Resize_Frame_As<Input2>;
12. Output2: Save_Step11_Outcomes_Into_Resized_Frames_Folder<Frame_Name_"frame"_sequential_number_of_three_digits>;
13. Segment_Moving_Objects_From_Each_Frame;
14. Output3: Save_Step13_Outcomes_Into_Segmented_Frames_Folder<Frame_Name_"frame"_sequential_number_of_three_digits>;
15. Set_Filters_For_Human_Detection;
16. Draw_Bounding_Box_Around_Moving_Objects;
17. Output4: Save_Step16_Outcomes_Into_Frames_With_Bounding_Box_Folder<Frame_Name_"frame"_sequential_number_of_three_digits>;
18. Count_Number_Of_Bounding_Boxes;
19. Output5: WriteVideo<For_All_Frames_In_Resized_Frames_Folder>; Video_Name<"Resized_Video">;
20. Output6: WriteVideo<For_All_Frames_In_Segmented_Frames_Folder>; Video_Name<"Segmented_Video">;
21. Output7: WriteVideo<For_All_Frames_In_Frames_With_Bounding_Box_Folder>; Video_Name<"Bounding_Box_Video">;
22. Display<Output5, Output6, Output7>;
23. Set_Timer_End_Point;
24. End;
25. Display<"Number_Of_Workers" == Number_Of_Bounding_Boxes>;
26. Display<"Processing_Time" == Timer_Counts>;
27. End

Figure 2. The Pseudocode of the Video Resolution and Frame Rate Sensitivity Analysis.
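For readers who prefer a runnable starting point, the following is a minimal Python/OpenCV sketch of the same measurement loop. The original analysis used MATLAB, so the stock OpenCV HOG people detector here is a stand-in for the paper's detector; the metric mirrors the pseudocode's timer.

```python
# A minimal Python/OpenCV sketch of the sensitivity measurement; the study's
# MATLAB detector is replaced here by OpenCV's stock HOG people detector.
import time
import cv2

def frames_per_minute(video_path, resolution):
    """Resize each frame, detect people with bounding boxes, and return the
    average number of processed frames per minute."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(video_path)
    processed, start = 0, time.time()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, resolution)
        rects, _ = hog.detectMultiScale(frame)        # detected humans
        for (x, y, w, h) in rects:                    # plot bounding boxes
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        processed += 1
    cap.release()
    return processed / ((time.time() - start) / 60.0)

# Compare combinations, e.g. 480P:
# print(frames_per_minute("site_video.mp4", (720, 480)))
```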
2.3. Video Frames Structure
The videos of the construction site consisted of a series of worker activities, including
walking, lifting, bending, carrying and kneeling. As a static frame of video cannot show changes
in the workers’ activities. A series of consecutive frames were extracted from the videos records
at the rate of 10 fps to support the real-time data analysis. These frames included dynamic
changes on the workers’ postures and activities, which were defined as foreground. Any static
objects in the frames, such as foundation and constructed work were defined as background.
The characteristics of the images include 2D RGB images defined in a colour intensity vector
versus the location of the pixels as:
$$ f(x, y) = \begin{bmatrix} r(x, y) \\ b(x, y) \\ g(x, y) \end{bmatrix} \qquad (1) $$
where f(x, y) is the image function and represents each pixel in the image; x and y are integer variables representing the location of each pixel; and r, b and g are the colour channels. The colour intensity range is an integer between 0 and 255. The images were also defined as a function of pixel location and time, f(x, y, t), where t denotes the time domain. The colour intensity of the moving pixels changes from the image at time t to time t + Δt. The RGB image function can then be formulated as:
$$ f(x, y, t) = \begin{bmatrix} r(x, y, t) \\ b(x, y, t) \\ g(x, y, t) \end{bmatrix} \qquad (2) $$
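A small NumPy illustration of equations (1) and (2): a frame is an array f[x, y] of three colour channels, and foreground pixels are those whose intensity changes between time t and t + Δt. The threshold value is an assumption for illustration; the paper does not specify its segmentation parameters.

```python
# A minimal numpy illustration of equations (1)-(2): foreground pixels are
# those whose colour intensity changed between consecutive frames.
# The threshold of 30 is an assumed value, not the paper's parameter.
import numpy as np

def foreground_mask(frame_t, frame_t_dt, threshold=30):
    """Mark pixels whose colour intensity changed between two frames."""
    diff = np.abs(frame_t.astype(np.int16) - frame_t_dt.astype(np.int16))
    return diff.max(axis=2) > threshold   # True where any channel moved

# Example with two synthetic 720x1024 RGB frames (uint8, 0..255):
f_t = np.random.randint(0, 256, (720, 1024, 3), dtype=np.uint8)
f_t_dt = f_t.copy()
f_t_dt[100:150, 200:260] = 255            # a "moving" region
print(foreground_mask(f_t, f_t_dt).sum()) # number of foreground pixels
```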
2.4. TensorFlow Model
TensorFlow offers multiple models with different processing speeds and accuracy levels, such as the SSD-MobileNet, R-CNN, Faster R-CNN and Mask R-CNN models. This research compared TensorFlow models to assess their ability to achieve the required rate of 10 fps. The methodology also set 0.30 s per 10 frames as a safety margin on the processing speed and 25 mAP (mean average precision) as the minimum accepted accuracy level. Only two models met the criteria: SSD-MobileNet-V1-FPN and Faster-RCNN-Inception-V2. In terms of processing speed, the two models are very similar, but Faster R-CNN has been shown to be the more accurate model (Huang et al., 2017). Therefore, Faster-RCNN-Inception-V2 was chosen to build the detection model.
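A minimal inference sketch for the chosen detector is shown below, assuming a Faster-RCNN-Inception-V2 model exported as a TensorFlow 1.x frozen graph. The paper does not publish its code, so the file path is hypothetical and the tensor names follow the standard TensorFlow Object Detection API export format.

```python
# Inference with a Faster-RCNN-Inception-V2 frozen graph; the path and tensor
# names are assumptions based on the standard Object Detection API export.
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style graph execution

GRAPH_PB = "faster_rcnn_inception_v2/frozen_inference_graph.pb"  # assumed

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(GRAPH_PB, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")
sess = tf.Session(graph=graph)

def detect(frame_rgb):
    """Return boxes, scores and class ids (e.g. helmet/gear/worker)."""
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": np.expand_dims(frame_rgb, axis=0)})
    return boxes[0], scores[0], classes[0]
```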
2.5. Applied Decision Tree of the CVA
The first stage of the video analysis involved detecting whether there were workers in each frame using the highly visible items, i.e., the safety helmets and high-visibility clothing (gear). The worker detection algorithm considered three main classes (helmet, gear and worker's body) to detect the site workers in a real environment. The outcome of this step is a set of bounding boxes around helmets and gear, as Figure 3-a illustrates.
[Figure 3 not reproduced. Panel (a) shows the worker detection flow: the video record is sampled at 10 fps; frames undergo cropping, size adjustment and colour enhancement; the safety helmet, safety gear and human body classes are checked against colour-intensity and shape criteria; bounding boxes are initiated for detected workers, with detection counts driving the cycle determination until all workers are found. Panel (b) shows the posture classification branch.]
Figure 3. The Proposed Decision Tree of the Task and Worker Detection Process.
The automated posture identification process provided information on a limited number of worker body postures. It included identifying the upper and lower parts of the workers' bodies based on the bounding boxes around the workers. As the workers perform different activities, their postures change and their upper and lower body parts form different lines and angles. In this research, the two parts were assumed to create only two lines, with one angle connecting them. Based on this simplified definition of the workers' body postures, the decision tree defined five different posture positions (see Figure 3-b). Initially, the proposed model was designed to detect five postures: standing, sitting, kneeling, bending and overhead. The model was then modified to use these five postures as a base for estimating four more: walking, pushing, carrying and climbing. The primary assumption of the posture detection still depends on the number of lines and angles created by the postures. For instance, when a worker is walking, the upper body part creates a similar pattern to the standing posture, while the lower body part creates two intersecting lines with an angle between 30° and 60°; a sketch of this rule follows.
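The following sketch illustrates the line-and-angle rule for distinguishing walking from standing. The paper does not give its exact feature extraction, so reducing each body part to a 2D direction vector is an assumption made for illustration.

```python
# A minimal sketch of the line-and-angle posture rule described above,
# assuming each body part is reduced to a 2D direction vector (illustrative;
# not the paper's exact feature extraction).
import numpy as np

def angle_deg(v1, v2):
    """Angle between two body-part line segments, in degrees."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def classify_standing_or_walking(lower_left, lower_right):
    """Upper body is near-vertical in both postures; legs intersecting at
    30-60 degrees indicate walking (the rule stated in the text)."""
    leg_angle = angle_deg(lower_left, lower_right)
    return "walking" if 30.0 <= leg_angle <= 60.0 else "standing"

# Example: legs spread ~45 degrees apart -> "walking".
print(classify_standing_or_walking(np.array([0.38, -0.92]),
                                   np.array([-0.38, -0.92])))
```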
3. RESULTS AND DISCUSSIONS
3.1. Real-Time Sensitivity Analysis
As described in Section 2.2, the sensitivity analysis compared 138 combinations of six video resolutions and 23 frame rates (8 fps to 30 fps) using the average number of processed frames per minute. The results show that three video resolutions produced the lowest numbers of processed frames per minute: 4K, 2K and 1960 by 1080, in that order. For the 4K resolution, there was no significant change in the average number of processed frames per minute as the frame rate changed, except at 26 fps, where the average was around seven frames per minute. The optimal averages were identified as about 1368 and 1224 processed frames per minute, achieved at 480P with 9 fps and 576P with 10 fps, respectively.
The analysis results were then restructured to identify whether the average number of processed frames can keep up with the frame rate of the processed video. The processing ratio (PR) indicates the ability of the CV system to process video records in real time, calculated as:

$$ PR = \frac{\text{Video Frame Rate in Frames per Minute}}{\text{Average Number of Processed Frames per Minute}} \qquad (3) $$
If the video frame rate equals the average number of processed frames per minute, the PR value is one. The appropriate frame rates for real-time video processing were found to be between 8 and 10 fps at the 576P and 480P resolutions. These combinations of video frame rate and resolution provide around 50% extra processing capacity in real time.
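As a worked example of equation (3), the sketch below computes PR for the reported optimum of 480P at 9 fps (540 frames per minute against 1368 processed frames per minute); a PR below one indicates spare real-time capacity.

```python
# Worked example of equation (3) for the reported optimum (480P at 9 fps).
def processing_ratio(video_fps, processed_frames_per_minute):
    """PR = video frame rate (frames/min) / processed frames per minute."""
    return (video_fps * 60.0) / processed_frames_per_minute

print(processing_ratio(9, 1368))  # 540 / 1368 ≈ 0.39: spare real-time capacity
```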
3.2. Model Training
This research utilises the TensorBoard to visualise the model training process and its
outcomes. It also presents the performance matrices that measures the accuracy of the model
compared to time and training iterations. The first performance indicator is the total loss that
describes the ability of the model to classifying each detected object into its assigned class and
to what extent this classification is accurate. As observed in the visualised graph (see Figure 4),
there are two continuous plots with different colours, dark and faded orange, indicating the
actual and smoothed total loss. Initially, the training process was set to include 120,000
iterations. The gradient update of the prediction accuracy increased, while the total loss rapidly
decreased and reached to around 0.4 at the 37,000 iterations. Thus, the training process was
stopped at 37,000 iterations as there is no further improvement in the total loss. The prediction
accuracy pattern also shows normal consistent changes from 20,000 iterations, which means
that around 20,000 and 30,000 iterations will also give the same detection accuracy level.
WANG et al.
6
Figure 4. Gradient Change in the Model’s Total Loss over Time.
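The logging side of this workflow can be sketched as follows; the study trained through the TensorFlow environment, so this standalone TensorBoard loss-logging loop, with dummy loss values, is illustrative only.

```python
# Illustrative TensorBoard logging of a total-loss curve; the dummy loss
# values below merely mimic the decreasing pattern of Figure 4.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

loss_ph = tf.placeholder(tf.float32, shape=())
summary_op = tf.summary.scalar("total_loss", loss_ph)
writer = tf.summary.FileWriter("logs/train")  # assumed log directory

with tf.Session() as sess:
    for step, loss in enumerate([2.1, 1.3, 0.8, 0.5, 0.4]):  # dummy losses
        writer.add_summary(sess.run(summary_op, {loss_ph: loss}), step)
writer.close()
# Inspect with: tensorboard --logdir logs/train
```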
3.3. PPE and Worker Detection
The first measures of the PPE and worker detection are the identification rate (IR) and recall rate (RR), which were calculated for each class: helmet (H), gear (G) and worker (W). Ten percent of the 3,000 frames were used to test the detection model. The training and testing datasets included a variety of helmet and gear colours and shapes. The most common helmet colour is white (93.8%); the dataset also contains other colours such as blue, yellow and green. Similarly, various gear colours and designs were included in the training and testing datasets. For instance, some workers wore yellow-blue gear with grey reflective tape, while others wore orange gear without reflective tape. The most common gear colour and design in the dataset under study is yellow-blue without reflective tape. The variety of helmet and gear colours and shapes helps to enhance the diversity of the PPE that the model can detect effectively in a real site environment. Table 1 illustrates some examples of the detection outcomes.
Table 1. Illustration Examples of the Detection Outcomes.
(The example images are not reproduced; ✓ marks a correct detection.)

Example      H   G   W   Posture
Example 1    ✓   ✓   ✓   Walking
Example 2    ✓   ✓   ✓   Climbing
Example 3    ✓   ✓   ✓   Bending
Example 4    ✓   ✓   ✓   Standing
3.3.1. Workers and PPE Detection Evaluation
To evaluate the performance of the detection model, a video of 1000 frames was used for testing and another video of 1000 frames for validation. IR and RR were calculated per the definition of each detection class: helmet, gear and worker. The validation dataset includes 200 frames from a different construction site and 100 frames from a different capturing angle, which introduce different site environments, video recording angles, illumination levels and occlusion cases. The results of the model testing and validation are summarised in Table 2. Four main measures were used to calculate the model performance indicators: (1) true positives (TP), cases in which the model correctly predicts and labels as positive objects that actually appear in the scene; (2) true negatives (TN), cases where the model correctly does not label as positive objects that do not appear in the scene; (3) false positives (FP), cases in which the model incorrectly labels as positive objects that do not appear in the scene; and (4) false negatives (FN), cases where the model fails to label as positive objects that do appear in the scene.
Table 2. The IR and RR Results of the Model Testing and Validation.

                           Testing                              Validation
Class   Predicted  Actual P  Actual N    IR       RR      Actual P  Actual N    IR       RR
Helmet      P         779       41     95.00%   89.03%      1740      280     86.14%   98.31%
            N          96        8                            30        0
Gear        P        1285       42     96.83%   98.47%      1941      473     80.41%   99.18%
            N          20        1                            16        0
Worker      P        1602       47     97.15%   99.38%      2084      438     82.63%   99.81%
            N          10        1                             4        0
The IR measures the ability of the model to correctly detect the number of objects in each frame and correctly assign them to one of the three classes: H, G and W. The testing results in Table 2 reveal that the proposed detection model achieved an IR of 95%, 96.83% and 97.15% for H, G and W, respectively. The IR validation results yielded 86.14%, 80.41% and 82.63% for H, G and W, respectively. Compared to the testing results, the validation results show on average a 13.27% decline in the IR. This decline can be explained in part by the significant difference in illumination levels, as different capturing angles were also included in the validation dataset. Nevertheless, the validation IR remained above 80% for all three classes.
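The paper does not state the IR and RR formulas explicitly, but the standard precision and recall definitions below reproduce the Table 2 figures exactly (e.g. helmet testing: TP = 779, FP = 41, FN = 96), so they are presumably the measures used.

```python
# Standard precision/recall definitions, which reproduce Table 2 exactly.
def identification_rate(tp, fp):
    """IR: share of predicted objects that are correct (precision)."""
    return tp / (tp + fp)

def recall_rate(tp, fn):
    """RR: share of actual objects that were detected (recall)."""
    return tp / (tp + fn)

print(f"{identification_rate(779, 41):.2%}")  # 95.00%, as in Table 2
print(f"{recall_rate(779, 96):.2%}")          # 89.03%, as in Table 2
```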
3.3.2. Postures Detection Evaluation
The average prediction accuracy measure was employed to examine the performance of the proposed model on two datasets composed of various posture sequences: a video of 1000 frames for model testing and another video of 1000 frames for model validation. Since nine postures were included, a multiclass comparison approach was adopted to construct the confusion matrix of the proposed model. A separate Python script was developed to perform the multiclass comparisons based on four main Python libraries: Seaborn, Pandas, Matplotlib and NumPy. The pseudocode in Figure 5-c describes the algorithm applied to build the confusion matrix.
Figure 5. The Pseudocode of Constructing the Confusion Matrix.
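A minimal sketch of such a multiclass confusion matrix, built with the libraries the paper names, is given below; the actual/predicted label lists are illustrative placeholders, not the study's data.

```python
# Multiclass confusion matrix with the paper's named libraries; the label
# lists are illustrative placeholders, not the study's data.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

actual    = ["standing", "walking", "walking", "bending", "kneeling"]
predicted = ["standing", "standing", "walking", "bending", "bending"]

matrix = pd.crosstab(pd.Series(actual, name="Actual"),
                     pd.Series(predicted, name="Predicted"),
                     normalize="index")        # row-normalised accuracies
sns.heatmap(matrix, annot=True, cmap="Blues")
plt.title("Posture confusion matrix (illustrative)")
plt.show()
```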
Figure 5 summarises the results of the confusion matrix for the model testing and validation. These results explain how the proposed model performs when similar postures are considered and identify the effects of misclassification among these postures. The results also provide insight into the misclassified detections and, more importantly, into which classes they were misclassified. Overall, the model testing results (Figure 5-a) show higher accuracy when classifying postures in which the upper and lower body parts are clearly visible. For instance, the model achieved higher accuracy (on average 0.89) in classifying the standing, walking, overhead, carrying and pushing postures.

In contrast, the sitting, kneeling, climbing and bending postures showed lower accuracy (on average 0.77) than the other classes. The results also revealed that the climbing, bending, kneeling and walking postures had crossed confusions with at least three different classes. The highest misclassification rates were identified between walking-standing (0.072) and standing-walking, due to strong similarities in the line and angle features extracted from the upper and lower body parts. Interestingly, the model had no misclassification cases in the pushing and sitting classes, which can be attributed to the wide distinctiveness of these postures compared to the other classes.
For the validation results (Figure 5-b), it can be observed that all classes have at least one confusion instance. It can also be observed that the model achieved a classification accuracy over 0.75 in the first seven classes, while achieving accuracies of 0.67 and 0.64 on the last two classes, respectively. An analysis of the validation dataset and confusion instances reveals that most of the misclassifications resulted from the strong walking-standing similarity, and from the 200 additional validation frames, which include more repeated occlusion instances due to the improper camera position and high illumination levels: the sunlight was directly hitting the camera's viewfinder, as the workstation was one metre higher than the surface on which the camera was mounted and 50 metres away from the camera. On top of this, the 200 frames contain different site activities (concreting activities) on which the model has not been trained. The dataset also includes 100 frames with a varying camera angle, in which the camera was mounted about 8 metres above the work area.
4. CONCLUSION
This paper presented a novel CV system for automated detection of workers' PPE and postures, equipped with a practical design for site cameras. Two layers of decision algorithms were developed to perform the detection in real time based on the TensorFlow environment. The proposed system was tested and validated in real site conditions, and the results show average identification rates of 90.57%, 88.62% and 89.89%, and recall rates of 93.67%, 98.83% and 99.60%, for H, G and W, respectively (averaged over the testing and validation results in Table 2). Meanwhile, the model confusion analysis reveals higher accuracy when classifying workers' postures in which the upper and lower body parts are clearly shown in the scene, such as standing, walking, overhead, carrying and pushing. In turn, the results show that the climbing, bending, kneeling and walking postures have higher misclassification rates than the other postures. The testing and validation results of the proposed CV system carry the promise of practical application on the construction site in real time. Ongoing future work involves applying the proposed system for construction site safety and productivity purposes. Future work also includes further analysis of various types of site occlusions and their effects on model performance, as well as methods that can be applied to overcome these effects.
ACKNOWLEDGEMENTS
The first author thanks the Queensland University of Technology (QUT) for the financial support of this research in the form of a PhD scholarship. The authors would also like to acknowledge Dr Miljenka Perovic and Mr Nathan Sianidis for their support in obtaining access to the construction site for data collection. The authors also acknowledge QUT's High-Performance Centre (HPC) for providing access to large data storage and computational resources.
REFERENCES
Chan, A. P. C., Yam, M. C. H., Chung, J. W. Y., & Yi, W. (2012a). Developing a heat stress model for
construction workers. Journal of Facilities Management, 10(1), 59–74.
https://doi.org/10.1108/14725961211200405
Chan, A. P. C., Yi, W., Wong, D. P., Yam, M. C. H., & Chan, D. W. M. (2012b). Determining an optimal
recovery time for construction rebar workers after working to exhaustion in a hot and humid
environment. Building and Environment, 58, 163–171.
https://doi.org/10.1016/j.buildenv.2012.07.006
Chi, S., & Caldas, C. H. (2011). Automated Object Identification Using Optical Video Cameras on
Construction Sites. Computer-Aided Civil and Infrastructure Engineering, 26(5), 368–380.
https://doi.org/10.1111/j.1467-8667.2010.00690.x
Dimitrov, A., & Golparvar-Fard, M. (2014). Vision-based material recognition for automated
monitoring of construction progress and generating building information modeling from unordered
site image collections. Advanced Engineering Informatics, 28(1), 37–49.
https://doi.org/10.1016/j.aei.2013.11.002
Gatti, U., Migliaccio, G., Bogus, S. M., Priyadarshini, S., & Scharrer, A. (2013). Using Workforce’s
Physiological Strain Monitoring to Enhance Social Sustainability of Construction. Journal of
Architectural Engineering, 19(3), 179–185. https://doi.org/10.1061/(ASCE)AE.1943-5568.0000110
Gong, J., & Caldas, C. H. (2010). Computer Vision-Based Video Interpretation Model for Automated
Productivity Analysis of Construction Operations. Journal of Computing in Civil Engineering,
24(3), 252–263. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000027
Han, S., & Lee, S. (2013). A vision-based motion capture and recognition framework for behavior-based
safety management. Automation in Construction, 35, 131–141.
https://doi.org/10.1016/j.autcon.2013.05.001
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., … Murphy, K. (2017).
Speed/accuracy trade-offs for modern convolutional object detectors. In IEEE conference on
computer vision and pattern recognition (pp. 7310–7311). Retrieved from
https://arxiv.org/pdf/1611.10012.pdf
Liang, C., Zheng, G., Zhu, N., Tian, Z., Lu, S., & Chen, Y. (2011). A new environmental heat stress
index for indoor hot and humid environments based on Cox regression. Building and Environment,
46(12), 2472–2479. https://doi.org/10.1016/j.buildenv.2011.06.013
Memarzadeh, M, Heydarian, A., Golparvar-Fard, M., & Niebles, J. C. (2012). Real-Time and
Automated Recognition and 2D Tracking of Construction Workers and Equipment from Site
Video Streams. In Computing in Civil Engineering (2012) (pp. 429–436). Reston, VA: American
Society of Civil Engineers. https://doi.org/10.1061/9780784412343.0054
Memarzadeh, Milad, Golparvar-Fard, M., & Niebles, J. C. (2013). Automated 2D detection of
construction equipment and workers from site video streams using histograms of oriented gradients
and colors. Automation in Construction, 32, 24–37. https://doi.org/10.1016/j.autcon.2012.12.002
Seo, J., Han, S., Lee, S., & Kim, H. (2015). Computer vision techniques for construction safety and
health monitoring. Advanced Engineering Informatics, 29, 239–251.
https://doi.org/10.1016/j.aei.2015.02.001
Wong, D. P., Chung, J. W., Chan, A. P.-C., Wong, F. K., & Yi, W. (2014). Comparing the physiological
and perceptual responses of construction workers (bar benders and bar fixers) in a hot environment.
Applied Ergonomics, 45(6), 1705–1711. https://doi.org/10.1016/j.apergo.2014.06.002
Yang, J., Park, M.-W., Vela, P. A., & Golparvar-Fard, M. (2015). Construction performance monitoring
via still images, time-lapse photos, and video streams: Now, tomorrow, and the future. Advanced
Engineering Informatics, 29(2), 211–224. https://doi.org/10.1016/j.aei.2015.01.011
Zhou, Z., Irizarry, J., & Li, Q. (2013). Applying advanced technology to improve safety management in
the construction industry: a literature review. Construction Management and Economics, 31(6),
606–622. https://doi.org/10.1080/01446193.2013.798423