Article of the Month - May 2019
Road Traffic Accident Black Spot Determination
by using Kernel Density Estimation Algorithm and Cluster Statistical
Significant Evaluation
Khanh Giang LE, Vietnam, Pei LIU, and
Liang-Tay LIN, Taiwan
This peer-reviewed paper was presented at the FIG Working Week 2019 in
Vietnam. The results show that the proposed approach to determining road
collision black spot locations was effective and accurate in identifying
road traffic accident black spots in Hanoi, Vietnam.
SUMMARY
Determining road collision black spot locations plays an
important role in significantly reducing the number of traffic
accidents. This article presents a new procedure that identifies
road traffic accident black spot locations using a GIS-based
kernel density estimation algorithm, evaluates the statistical
significance of the resulting collision clusters, and then ranks
them according to their significance. Road traffic accident data
for three years (2015-2017) in Hanoi, Vietnam were used to
analyze, test, and validate this approach. The results show that
the approach was effective and accurate in identifying road
traffic accident black spots in Hanoi, Vietnam, and that these
hot spots could be ranked according to their level of danger.
These outcomes will not only enable traffic authorities to
understand the causes behind each collision comprehensively, but
also help them treat hazardous areas in order of priority when
budgets are limited and allocate traffic safety resources
appropriately.
1. INTRODUCTION
Road traffic accidents (RTA) are a major issue worldwide. According to
the World Health Organization (WHO), RTA cause more than 1.24 million
deaths and about 50 million injuries worldwide every year (WHO, 2013).
To decrease the number of crashes significantly, it is crucial to
understand where and when traffic accidents happen frequently. Locations
with a high accident occurrence compared with other locations are known
as black spots. Past studies showed that the occurrences of RTA are
rarely random in space and time. In fact, these locations are determined
by several key factors such as geometric design, traffic volume, and
weather conditions (Chainey and Ratcliffe, 2013).
WHO reported that over a third of RTA deaths in low- and middle-income
nations involve vehicles, cyclists, and pedestrians (WHO, 2013). Vietnam
is a developing country, so RTA are also one of the greatest concerns of
its transportation authorities. The annual social cost of RTA in Hanoi,
the capital of Vietnam, in terms of medical treatment, deaths, and
property damage amounts to 2.9% of GDP (5-12 billion USD) (Mai, 2018).
In 2017, there were 20,000 traffic crashes, about 8,200 deaths, and
17,000 injuries on Vietnam’s road networks (Giang, 2018). Currently,
non-spatial modelling is used in Vietnam to identify RTA hot spots,
namely the Accident Frequency Method (AFM) (classification by level of
injury) over a one-year period (MOT, 2012). This is the oldest and
simplest method of identifying dangerous locations. However, it has many
limitations: it lacks visualization, does not connect space and time,
does not rank hot spots by priority, and does not take into
consideration traffic volume, which has a direct relationship with crash
frequency. Therefore, its results are biased toward high-volume
locations and suffer from regression-to-the-mean (RTM) bias (Li, 2006).
To date, no study has dealt with collision mapping in Vietnam.
A Geographic Information System (GIS) is a very powerful tool for
analyzing traffic safety. GIS can visualize the locations of accidents
and store their attributes, which makes it easier to find the reasons
behind each collision. The use of spatial data plays an important role
in traffic safety analysis. GIS enables us to collect, store,
manipulate, query, analyze, and visualize spatial data (Lloyd, 2010;
Satria and Castro, 2016).
Spatial analysis of RTA has been widely applied to explore hot spots
(Anderson, 2009). GIS has been applied as a management system for
accident analysis in combination with spatial statistical methods
(Shafabakhsh et al., 2017). In recent years, the combination of GIS and
statistical analysis has been increasingly used by researchers for
assessing road accidents (Yalcin and Duzgun, 2015; Benedek et al.,
2016). Kernel density estimation (KDE) is one of the most popular
density-based methods and has been widely used for detecting dangerous
road segments (Xie and Yan, 2013). However, the KDE method has a
drawback: the uncertainty about the exact location of a traffic
collision is expressed by the search bandwidth of the kernel (Anderson,
2009). Thus, KDE is better suited to visualization than to determining
RTA black spot locations (Plug et al., 2011). Xie and Yan (2008) pointed
out a related issue: the KDE method lacks an investigation of the
statistical significance of the high-density locations. So far, very few
studies have comprehensively investigated the statistical significance
of KDE results. It is therefore necessary to identify which clusters are
statistically significant.
Therefore, in our study, RTA black spot locations were first determined
using a GIS-based kernel density estimation algorithm; the statistical
significance of the resulting collision clusters was then evaluated, and
the clusters were ranked according to their significance. Finally, to
validate this approach, we compared the results with traffic accident
reports for three years (2015-2017) in Hanoi, Vietnam. The purpose of
this paper is to present an improved procedure for identifying RTA black
spots. The remainder of the article is arranged as follows. Section two
describes the proposed methodology. Section three presents the analysis
of the case study. Section four presents the check and validation of the
results of the proposed methodology. Finally, conclusions and discussion
are presented in section five.
2. METHODOLOGY
In this study, the KDE method was used to identify traffic accident
black spot locations. However, this method lacks an investigation of the
statistical significance of the high-density locations (Xie and Yan,
2008). Therefore, this article proposes a new procedure that aims to
improve the effectiveness and accuracy of the KDE method. Figure 1
presents the combination of the KDE method with a statistical
significance evaluation of the resulting clusters.
The proposed methodology was carried out in the following steps:
Firstly, collision locations were geocoded on the digital road
network. Secondly, the KDE method was applied to calculate and create an
RTA density map. Several RTAs may be reported at the same location.
Therefore, the Integrate and Collect Events tools were used to combine
crashes that occurred at the same location, which produced a collect
events map. Next, it is necessary to test whether the RTAs in a section
are distributed randomly. If they are, the process stops. If they are
not, it is necessary to determine the bandwidth at which
autocorrelation, or clustering, is maximized. To do this, we applied
Incremental Global Moran’s I and repeated the calculation at a series of
increasing distances. Beforehand, we need to find a starting distance at
which every point has at least one neighbor. After the optimal bandwidth
was obtained, Local Moran’s I was applied to generate a hotspots map,
with the optimal bandwidth used as the threshold distance input.
Finally, the RTA hotspots priority map was produced by combining the RTA
density map and the hotspots map.
Sections 2.1 and 2.2 explain in more detail the two methods applied in
this study: Kernel Density Estimation and Anselin Local Moran’s I.

Fig. 1 Flowchart of the process of RTA black spot determination and
statistical significance evaluation of the resulting clusters.
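The step that integrates and collects crashes reported at the same location can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation: the function name `collect_events` and the `snap` tolerance are assumptions made for illustration, and rounding coordinates to a regular grid is a simplification of how nearby points would be merged. The sketch returns one weighted point per location, with the number of crashes as its weight.

```python
from collections import Counter


def collect_events(points, snap=1.0):
    """Merge coincident crash points into weighted points (x, y, count).

    `snap` is a hypothetical tolerance in map units: coordinates are
    rounded to the nearest multiple of `snap` before counting.
    """
    counts = Counter(
        (round(x / snap) * snap, round(y / snap) * snap) for x, y in points
    )
    return [(x, y, c) for (x, y), c in sorted(counts.items())]


# Two crashes within the tolerance of each other and one isolated crash.
crashes = [(10.2, 5.1), (10.4, 4.9), (50.0, 50.0)]
weighted = collect_events(crashes, snap=1.0)
```

The count attached to each merged point is the kind of value that the spatial autocorrelation statistics in the later steps can analyze.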
2.1 Kernel Density Estimation (KDE)
There are several spatial analysis tools that enable us to understand
comprehensively how point patterns vary geographically. KDE is one of
the most effective methods for determining the spatial patterns of RTA
(Blazquez and Celis, 2013; Satria and Castro, 2016). The density of
events is calculated within a defined search radius in the study area to
create a smoothed surface. A kernel function assigns a weight to the
area surrounding each event according to its distance from the event.
The value is highest at the event location and decreases smoothly to
zero at the edge of the search circle (see Fig. 2). Finally, a smoothed
continuous density surface is generated by adding the individual kernels
in the study area (Anderson, 2009; Rahimi and Shad, 2017). The intensity
at a specific location is calculated by Eq. (1):

$$f(s) = \frac{1}{n h^{2}} \sum_{i=1}^{n} K\left(\frac{d_i}{h}\right) \qquad (1)$$

where $f(s)$ is the density estimate at the location $s$, $n$ is the
number of observations, $h$ is the bandwidth or kernel size, $K$ is the
kernel function, and $d_i$ is the distance between the location $s$ and
the location of the $i$th observation.

Fig. 2 Diagram of how the quadratic kernel density method works; this
is the basis of the density method used in this study (source: Bailey
and Gatrell, 1995).
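As a rough illustration of the density estimate described in this section, the following Python sketch evaluates a kernel sum of the form (1 / (n h^2)) * sum K(d_i / h) at a single location. The paper does not state the kernel function K explicitly, so the quartic kernel K(u) = (3/pi)(1 - u^2)^2 used here is an assumption, and the function names and coordinates are hypothetical.

```python
import math


def quartic_kernel(u):
    """Assumed kernel: (3/pi) * (1 - u^2)^2 inside the unit disc, else 0."""
    if u >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u * u) ** 2


def kde_at(s, points, h):
    """Density estimate at location s for bandwidth h (same units as s)."""
    n = len(points)
    total = 0.0
    for p in points:
        d = math.dist(s, p)  # distance between s and the observation
        total += quartic_kernel(d / h)
    return total / (n * h * h)


# Hypothetical crash coordinates in metres: three close together, one far.
crashes = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (5000.0, 5000.0)]
near = kde_at((30.0, 30.0), crashes, h=500.0)
far = kde_at((2500.0, 2500.0), crashes, h=500.0)
```

A location more than one bandwidth away from every crash receives a density of zero, which is the behaviour described for the edge of the search circle.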
2.2 Anselin Local Moran’s I
The Cluster and Outlier Analysis tool identifies spatial clusters of
features with high or low values. The tool also identifies spatial
outliers. To do this, the tool calculates a local Moran's I value, a
z-score, a pseudo p-value, and a code representing the cluster type for
each statistically significant feature. The z-scores and pseudo p-values
represent the statistical significance of the computed index values.
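The local Moran’s I (Anselin, 1995) is one of the most widely used
Local Indicators of Spatial Association (LISA) statistics (Satria and
Castro, 2016). It measures the statistical correlation between the
attribute at each location in a study area and the values (usually the
statistical mean) in the neighboring locations. It also tests the
significance of this similarity. Formally, the local Moran’s I can be
expressed as Eq. (2):

$$I_i = \frac{x_i - \bar{X}}{S_i^{2}} \sum_{j=1,\, j \neq i}^{n} w_{i,j}\,(x_j - \bar{X}) \qquad (2)$$

where $w_{i,j}$ is a measure of the spatial weight between
regions i and j, $\bar{X}$ is the mean value, and $x_i$ and $x_j$
are the values of the variable at locations i and j, and:

$$S_i^{2} = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{X})^{2}}{n - 1}$$

with n equating to the total number of features.
The $z_{I_i}$-score for the statistic is computed as:

$$z_{I_i} = \frac{I_i - E[I_i]}{\sqrt{V[I_i]}}$$

where:

$$E[I_i] = -\frac{\sum_{j=1,\, j \neq i}^{n} w_{i,j}}{n - 1}$$

$$V[I_i] = E[I_i^{2}] - E[I_i]^{2}$$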
In general, there are four types of correlation among neighbouring
values: high-high (H-H), low-low (L-L), high-low (H-L), and low-high
(L-H). (H-H) and (L-L) indicate that there is a positive
autocorrelation, while (H-L) and (L-H) show that there is a negative
autocorrelation (O’Sullivan and Unwin, 2010). The (H-H) areas are
relevant for hazardous location detection and show locations where a
high number of crashes are surrounded by high values (Xie and Yan,
2013).
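The local statistic and the four correlation types can be sketched in Python as follows. The sketch assumes the form I_i = ((x_i - mean) / S_i^2) * sum_j w_ij (x_j - mean), with S_i^2 computed over all other locations, and binary distance-band weights; the function names, coordinates, and crash counts are hypothetical, and no significance test is performed here.

```python
import math


def distance_band_weights(coords, threshold):
    """Binary weights: 1.0 if two distinct points lie within `threshold`."""
    n = len(coords)
    return [
        [
            1.0 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def local_morans_i(values, weights):
    """Return a list of (I_i, correlation type) for every location."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    results = []
    for i in range(n):
        # Variance of the other locations around the global mean.
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        # Spatially weighted sum of the neighbours' deviations.
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        i_local = dev[i] / s2 * lag
        # A deviation or lag of exactly zero is labelled "L" in this sketch.
        kind = ("H" if dev[i] > 0 else "L") + "-" + ("H" if lag > 0 else "L")
        results.append((i_local, kind))
    return results


# Hypothetical crash counts: a high-count group and a low-count group.
coords = [(0, 0), (100, 0), (200, 0), (2000, 0), (2100, 0), (2200, 0)]
counts = [9, 8, 9, 1, 1, 2]
lisa = local_morans_i(counts, distance_band_weights(coords, 150.0))
```

In this toy data set every location has a positive I_i, because high counts neighbour high counts and low counts neighbour low counts.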
3. ANALYSIS OF THE CASE STUDY
3.1. Data and Study Area
This study was carried out in Hanoi, Vietnam. Two different
datasets were used. First, a road network map was provided in
shapefile format, which includes road specifications such as
road length, road width, road type, and speed limits. Second, a
traffic accident database covering three years (2015-2017) was
provided by the Transport Police Department in Hanoi. Such a
time span is sufficient because it contains many records and the
characteristics of the RTA remain relatively unchanged (Elvik,
2008). A total of 1,132 crashes were recorded on Hanoi’s roads.
The collision database was provided in an Excel file and
contained significant accident parameters such as the date and
time of a crash, crash location, accident type, age and sex of
drivers, the number of the injured, etc.
Fig. 3. Study area with distribution of all collisions in
Hanoi (2015-2017).
3.2. Analysis Results
3.2.1. Kernel Density Estimation (KDE)
The output of the KDE method is a raster consisting of a grid of cells.
The two main parameters that influence KDE are cell size and bandwidth.
The choice of bandwidth is quite subjective (Anderson, 2009). Past
studies used values ranging from 20 to 1,000 m (Xie and Yan, 2013). In
our research, we tried ten values (100 m, 200 m, …, 1000 m) in order to
find the optimal bandwidth. Finally, we chose a bandwidth of 1000 m
because it enables us to visualize RTA black spot locations easily.
However, a large bandwidth is not always a good choice, because the RTA
black spot locations become less accurate. This agrees with Anderson
(2009), who notes that the uncertainty about the exact location of a
traffic collision is expressed by the search bandwidth of the kernel.
Fig. 4 shows RTA black spot locations for two bandwidth values, 500 m
and 1000 m.
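The effect of the two parameters can be seen in a small Python sketch that evaluates a density raster at the centre of each cell. It is only an illustration: the quartic kernel, the function names, and the three crash coordinates are assumptions, not values from the study.

```python
import math


def kde_grid(points, h, cell, xmin, ymin, ncols, nrows):
    """Density raster evaluated at cell centres (assumed quartic kernel)."""
    n = len(points)
    grid = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell
            cy = ymin + (r + 0.5) * cell
            total = 0.0
            for px, py in points:
                u = math.hypot(cx - px, cy - py) / h
                if u < 1.0:
                    total += (3.0 / math.pi) * (1.0 - u * u) ** 2
            row.append(total / (n * h * h))
        grid.append(row)
    return grid


def covered_cells(grid):
    """Number of cells with a non-zero density."""
    return sum(v > 0 for row in grid for v in row)


# Hypothetical cluster of three crashes on a 3 km x 3 km grid of 100 m cells.
points = [(1000.0, 1000.0), (1100.0, 1000.0), (1000.0, 1100.0)]
g500 = kde_grid(points, 500.0, 100.0, 0.0, 0.0, 30, 30)
g1000 = kde_grid(points, 1000.0, 100.0, 0.0, 0.0, 30, 30)
peak500 = max(max(row) for row in g500)
peak1000 = max(max(row) for row in g1000)
```

With the larger bandwidth the same crashes cover more cells but with a lower peak, which is why a large bandwidth is easier to read on a map yet less precise about where the black spot lies.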

From the raw positions of RTA in Fig. 3 alone, it is impossible to
identify RTA black spot locations, whereas the KDE method enables us to
visualize RTA black spots easily. Fig. 4 shows that the red areas are
RTA black spots in Hanoi (2015-2017), which are mainly concentrated on
sections of NH-1A such as Van Dien station, Cho Tia station, the Quang
Trung – Nguyen Trai intersection, and Ha Dong. Anderson (2009) regards
it as the main advantage of KDE over classical statistical clustering
methods that the risk of an accident is spread around its location; this
also means that the bandwidth of the kernel expresses uncertainty about
the exact position of each RTA. Therefore, it is necessary to
investigate the statistical significance of the resulting RTA clusters
and to find the most hazardous locations.
3.2.2. Statistical Significance Evaluation Process
At this point the common application of the KDE method usually ends.
The clusters that form the local maxima of the kernel function are
determined. Sometimes an arbitrary level of significance is identified
(Erdogan et al., 2008). However, we tried to identify the statistical
significance of a cluster more objectively. Our process started with
the null hypothesis: “The RTA in a section are distributed randomly”.
Statistical testing of the null hypothesis was based on the Spatial
Autocorrelation (Global Moran’s I) tool, a global statistic in ArcGIS
10.2.
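The test of this null hypothesis can be sketched in Python. Note that the sketch swaps in a permutation test: it shuffles the observed values over the locations to build a reference distribution, whereas the ArcGIS tool reports an analytical z-score. The function names, the binary distance-band weights, and the data are assumptions made for illustration.

```python
import math
import random


def distance_band_weights(coords, threshold):
    """Binary weights: 1.0 if two distinct points lie within `threshold`."""
    n = len(coords)
    return [
        [
            1.0 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def global_morans_i(values, weights):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


def permutation_test(values, weights, n_perm=999, seed=0):
    """Return (observed I, pseudo z-score, pseudo p-value)."""
    rng = random.Random(seed)
    observed = global_morans_i(values, weights)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of values to locations
        sims.append(global_morans_i(shuffled, weights))
    mean_sim = sum(sims) / n_perm
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sims) / (n_perm - 1))
    z_score = (observed - mean_sim) / sd_sim
    p_value = (sum(s >= observed for s in sims) + 1) / (n_perm + 1)
    return observed, z_score, p_value


# Hypothetical data: ten high-count locations, then ten low-count locations.
coords = [(i * 100.0, 0.0) for i in range(10)] + [
    (5000.0 + i * 100.0, 0.0) for i in range(10)
]
counts = [9, 8, 10, 9, 8, 10, 9, 8, 10, 9, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1]
weights = distance_band_weights(coords, 150.0)
observed_i, z_score, p_value = permutation_test(counts, weights)
```

A large positive z-score with a small p-value leads to rejecting the null hypothesis of a random distribution, which is the condition under which the procedure continues to the next step.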

From Fig. 5, we can conclude that there is less than a 1% likelihood
that this clustered pattern could be the result of random chance
(z-score of 9.99). Therefore, the next step is to identify the bandwidth
at which clustering is maximized. To do this, we applied Incremental
Global Moran’s I, repeating the calculation at a series of increasing
distances. First, however, we need a starting distance at which every
point has at least one neighbor. We used the Calculate Distance Band
from Neighbor Count tool, which gave a result of 500 m. We then ran
Incremental Spatial Autocorrelation; the results are shown in Fig. 6 and
Table 1.
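The starting distance can be understood as the largest nearest-neighbour distance in the data set: at that distance band, even the most isolated point has one neighbor. A minimal Python sketch, with a hypothetical function name and hypothetical coordinates, is:

```python
import math


def min_distance_for_one_neighbor(coords):
    """Smallest distance band at which every point has at least one neighbour."""
    return max(
        min(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        for i, p in enumerate(coords)
    )


# Hypothetical crash locations in metres; the last one is the most isolated.
coords = [(0.0, 0.0), (300.0, 0.0), (300.0, 400.0), (1000.0, 400.0)]
start_distance = min_distance_for_one_neighbor(coords)
```

Here the nearest-neighbour distances are 300 m, 300 m, 400 m, and 700 m, so the starting distance is 700 m.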

Fig. 6. Spatial Autocorrelation
by Distance.
Table 1. Global Moran’s I
Summary by Distance.

Fig. 6 shows that Moran’s I spatial autocorrelation was run at a
variety of distances, each of which yields a z-score indicating the
level of statistical significance. The pattern deviates from our
assumption of randomness, and when we compare the z-scores across the
distance increments, some are higher than others. There are a couple of
peaks where the z-score becomes very high, which indicates that
clustering is most pronounced at those distances; it is there that we
are most likely to find natural clustering within the data. In addition,
Table 1 shows that the maximum peak is at 2250 m, where clustering is
maximized; this is the distance we need for the next step.
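Picking the peaks from a series of (distance, z-score) pairs is straightforward to sketch in Python. The distances and z-scores below are made up purely for illustration and are not the values of Table 1; the function name is also hypothetical.

```python
def find_peaks(distances, z_scores):
    """Return (distance, z) pairs where z is higher than both neighbours."""
    return [
        (distances[k], z_scores[k])
        for k in range(1, len(z_scores) - 1)
        if z_scores[k - 1] < z_scores[k] > z_scores[k + 1]
    ]


# Illustrative values only (not taken from the study).
distances = [100, 200, 300, 400, 500, 600, 700]
z_scores = [2.1, 3.4, 3.0, 3.8, 4.6, 4.2, 3.9]
peaks = find_peaks(distances, z_scores)
first_peak = peaks[0]
max_peak = max(peaks, key=lambda p: p[1])
```

The distance of the maximum peak is the one that would be passed on as the threshold distance for the local analysis.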
In the next step, we applied Cluster and Outlier Analysis (Anselin
Local Moran’s I) to generate a map of hotspots (Fig. 7). In Fig. 7, the
gray points are points that did not show any significant clustering:
their z-scores are low. The red points show clusters where features with
similarly high values lie near each other; in this case, these are
high-priority RTA points close to each other, and the clusters are
statistically significant, meaning that they are very far from a random
arrangement. In contrast, green points represent areas where similarly
low values cluster near each other, so those points are low priority;
however, no such points appeared in this case. The orange and yellow
points are outliers: an orange point is a high RTA value surrounded by
low RTA values, and a yellow point is the converse. The red points
mainly occurred on NH-1A, such as at Van Dien station and Cho Tia
station, at the Thang Long Boulevard - Me Tri intersection, and on Pham
Van Dong road.

Fig. 7. Map of statistically significant RTA hot spots and cold
spots.
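How a point ends up as significant (coloured) or not significant (gray) can be sketched with a conditional permutation test in Python. This is an assumption-laden illustration rather than the ArcGIS procedure: the value at location i is held fixed, the remaining values are randomly reassigned to its neighbours, and a one-sided pseudo p-value is compared with a hypothetical threshold `alpha`. All names and data are invented for the example.

```python
import math
import random


def local_moran_labels(values, coords, threshold, n_perm=999, alpha=0.05, seed=0):
    """Label each location H-H, L-L, H-L, L-H, or NS (not significant)."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    labels = []
    for i in range(n):
        neighbours = [
            j for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) <= threshold
        ]
        if not neighbours:
            labels.append("NS")
            continue
        others = [dev[j] for j in range(n) if j != i]
        lag = sum(dev[j] for j in neighbours)
        # dev[i] and S_i^2 stay fixed during the shuffle, so comparing
        # dev[i] * lag is equivalent to comparing local Moran's I values.
        observed = dev[i] * lag
        extreme = 0
        for _ in range(n_perm):
            sim = dev[i] * sum(rng.sample(others, len(neighbours)))
            if (observed >= 0 and sim >= observed) or (observed < 0 and sim <= observed):
                extreme += 1
        p_value = (extreme + 1) / (n_perm + 1)
        if p_value > alpha:
            labels.append("NS")
        else:
            labels.append(
                ("H" if dev[i] > 0 else "L") + "-" + ("H" if lag > 0 else "L")
            )
    return labels


# Hypothetical data: ten high-count locations, then ten low-count locations.
coords = [(i * 100.0, 0.0) for i in range(10)] + [
    (5000.0 + i * 100.0, 0.0) for i in range(10)
]
counts = [9, 8, 10, 9, 8, 10, 9, 8, 10, 9, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1]
labels = local_moran_labels(counts, coords, threshold=450.0)
```

Points in the middle of each group have eight same-type neighbours and are labelled as significant clusters; points with few neighbours may remain "NS" even when their values are high.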
4. THE VALIDATION OF THE RESULTS OF THE PROPOSED METHODOLOGY
The results show that the proposed approach is appropriate for
overcoming the drawbacks of the KDE method. Figure 8 illustrates the
road traffic accident hotspot priority map. As described in Section
2.2, there are four types of correlation among neighbouring values:
high-high (H-H), low-low (L-L), high-low (H-L), and low-high (L-H).
(H-H) and (L-L) indicate positive autocorrelation, while (H-L) and (L-H)
indicate negative autocorrelation. The (H-H) areas are relevant for
hazardous location detection and show locations where a high number of
crashes is surrounded by high values. In this case, the red points
(circled in red) are (H-H) clusters where RTA occurred frequently; these
locations were mainly on NH-1A, such as at Van Dien station and Cho Tia
station, at the Thang Long Boulevard - Me Tri intersection, and on Pham
Van Dong road.

Fig. 8. Road traffic accident
Hotspot priority Map
In addition, this approach investigated the statistical significance
of the high-density locations. For instance, location 1 (Fig. 8) was
identified as a high-density point of RTA by the KDE method. However,
after the statistical significance of the high-density locations was
investigated, this location was identified as an (H-L) outlier, that
is, a location with a high RTA value surrounded by low RTA values.
Location 2 (Fig. 8) was identified as a high-density zone of RTA by the
KDE method, but the significance evaluation classified it as a not
significant point (grey color) (Fig. 7). The results of the proposed
methodology are consistent with real-world observations and the
reference data. The proposed methodology enables traffic authorities to
understand the situation more clearly and comprehensively.
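One possible way to combine the two maps into a priority list is sketched below in Python. The paper does not spell out the combination rule, so the rule used here (keep only significant H-H clusters, then order them by KDE density) is an assumption, and the records, field names, and numbers are hypothetical.

```python
def rank_hotspots(locations):
    """Keep significant H-H clusters and order them by KDE density."""
    hot = [loc for loc in locations if loc["cluster"] == "H-H"]
    return sorted(hot, key=lambda loc: loc["density"], reverse=True)


# Hypothetical records; the names and numbers are illustrative only.
locations = [
    {"name": "A", "density": 0.82, "cluster": "H-H"},
    {"name": "B", "density": 0.91, "cluster": "H-L"},
    {"name": "C", "density": 0.75, "cluster": "NS"},
    {"name": "D", "density": 0.88, "cluster": "H-H"},
]
priority = [loc["name"] for loc in rank_hotspots(locations)]
```

In this example, location B has the highest density but is an outlier and location C is not significant, so neither enters the priority list, mirroring the treatment of locations 1 and 2 in Fig. 8.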
5. CONCLUSION
The paper proposed a new procedure that determines road traffic
accident black spot locations using a GIS-based kernel density
estimation algorithm, evaluates the statistical significance of the
resulting collision clusters, and then ranks them according to their
significance. The results show that the approach was effective and
accurate in identifying road traffic accident black spots in Hanoi,
Vietnam. These outcomes will not only enable traffic authorities to
understand the causes behind each collision comprehensively, but also
help them treat hazardous areas in order of priority when budgets are
limited and allocate traffic safety resources appropriately.
The integration of the KDE method with a statistical significance
evaluation of the resulting RTA clusters helps to overcome the drawbacks
of the KDE method and improves the accuracy with which RTA black spot
locations are determined. The results show that RTA black spots mainly
occurred on NH-1A, namely at Van Dien station and Cho Tia station, and
at the Nguyen Trai - Quang Trung intersection, the Thang Long
Boulevard - Me Tri intersection, and Pham Van Dong road. This is also
the first study of this issue in Vietnam, so the article will help
traffic authorities address the problem not only in Hanoi; the approach
can also be applied to other cities.
However, within the scope of this paper, one limitation is that traffic
volume is not taken into account in identifying RTA hot spots. The
authors will address this issue in forthcoming studies. In addition,
the authors will deploy this application online, which will not only
help traffic authorities and police patrols update emergency
information easily but also provide citizens with a black spot map in an
up-to-date, accurate, and visual form.
REFERENCES
Anderson, T. K., 2009, Kernel density estimation and K-means
clustering to profile road accident hotspots, Accident Analysis and
Prevention, Elsevier.
Anselin, L, 1995, Local indicators of spatial association – LISA,
Geographical Analysis, Vol. 27, No. 2, Ohio State University Press.
Bailey, T. C., Gatrell, A. C., 1995, Interactive Spatial Data
Analysis. John Wiley and Sons, New York.
Benedek, J., Ciobanu, S. M., Man, T. C., 2016, Hotspots and social
background of urban traffic crashes: a case study in Cluj-Napoca
(Romania). Accident Analysis & Prevention, 87, 117-126.
Blazquez, C. A., Celis, M. S., 2013, A spatial and temporal analysis
of child pedestrian crashes in Santiago, Chile. Accident Analysis and
Prevention, 50, 304–311, Elsevier.
Chainey, S., Ratcliffe, J., 2013, GIS and Crime Mapping, John Wiley
and Sons, England.
O’Sullivan, D., Unwin, D., 2010, Geographic information analysis, John
Wiley and Sons, New York.
Elvik, R., 2008, A survey of operational definitions of hazardous road
locations in some European countries, Accident Analysis and Prevention,
40, 1830-1835, Elsevier.
Erdogan, S., Yilmaz, I., Baybura, T., Gullu, M., 2008, Geographical
information systems aided traffic accident analysis system case study:
city of Afyonkarahisar, Accident Analysis and Prevention, 40, 174-181,
Elsevier.
Ha Mai, 2018, Việt nam mất khoảng 130 tỉ usd chi phí cho tai nạn giao
thông trong 15 năm [Vietnam lost about 130 billion USD in traffic
accident costs over 15 years],
https://thanhnien.vn/thoi-su/viet-nam-mat-khoang-130-ti-usd-chi-phi-cho-tai-nan-giao-thong-trong-15-nam-954438.html
[accessed: 13:50, 25/09/2018].
Thu Giang, 2018, Ủy ban An toàn giao thông Quốc gia tổng kết công tác
năm 2017 [National Traffic Safety Committee reviews its work in 2017],
http://backantv.vn/tin-tuc-n17855/uy-ban-an-toan-giao-thong-quoc-gia-tong-ket-cong-tac-nam-2017.html
[accessed: 12:09, 25/09/2018].
Li, L, 2006, A GIS-based Bayesian approach for analyzing
spatial-temporal patterns of traffic crashes, Doctoral dissertation,
Texas A&M University.
Lloyd, C. D., 2010, Spatial data analysis: an introduction for GIS
users, Oxford University Press.
MOT, 2012, Thông tư 26/2012/TT-BGTVT, Quy định về việc xác định và xử
lý vị trí nguy hiểm trên đường bộ đang khai thác [Circular
26/2012/TT-BGTVT, Regulations on identifying and treating hazardous
locations on roads in operation], BGTVT, Vietnam.
Plug, C., Xia, J. C. and Caulfield, C., 2011, Spatial and temporal
visualisation techniques for crash analysis, Accident Analysis and
Prevention, Elsevier.
Rahimi, S., Shad, R., 2017, Identification of road crash black-sites
using geographical information system, International Journal for Traffic
and Transport Engineering (IJTTE) 7(3):368-380.
Satria, R., Castro, M., 2016, GIS tools for analyzing accidents and
road design: a review, Transportation Research Procedia 18: 242 – 247.
Shafabakhsh, G. A., Famili, A., Bahadori, M. S., 2017, GIS-based
spatial analysis of urban traffic accidents: Case study in Mashhad,
Iran, J. Traffic Transp. Eng. (Engl. Ed.) 4 (3): 290-299.
WHO, 2013, Global status report on road safety 2013. Supporting a
decade of action, World Health Organization, Department of Violence and
Injury Prevention and Disability, Geneva.
Xie, Z., Yan, J., 2013, Detecting traffic accident clusters with network
kernel density estimation and local spatial statistics: an integrated
approach, J. Transp. Geogr, Elsevier.
Yalcin, G., Duzgun, H. S., 2015, Spatial analysis of two-wheeled
vehicles traffic crashes: Osmaniye in Turkey. KSCE Journal of Civil
Engineering, 19(7): 2225-2232
BIOGRAPHICAL NOTES
Khanh Giang Le
Doctoral Candidate, Ph.D program of Civil and Hydraulic Engineering
(2017 – Now).
Institution: College of Construction and Development, Feng Chia
University, Taiwan.
Master Degree: Birmingham City University, United Kingdom (2013 –
2014)
Bachelor Degree: University of Transport and Communications, Hanoi,
Vietnam (2001-2006)
He is a lecturer in the Geodetic Division, Civil Engineering Faculty,
University of Transport and Communications, Hanoi, Vietnam (2006 – Now).
He is interested in applying GIS, GPS, spatial statistics, and
geospatial analysis in the transportation sector and urban studies.
Associate Professor Pei LIU
He is an Associate Professor at the College of Construction and
Development, Feng Chia University, Taichung, Taiwan.
He received his PhD from the Department of Civil, Environmental and
Geodetic Engineering, The Ohio State University, USA. He is an expert in
Highway Engineering, Artificial Intelligence Methods, Numerical Methods,
and Pavement Engineering.
Professor Liang-Tay LIN
He is Dean of the College of Construction and Development, Feng Chia
University, Taichung, Taiwan.
He received his PhD from the Department of Civil Engineering, National
Taiwan University, Taiwan. He is President of the Chinese Institute of
Transportation, Taiwan. He was Director General of the Transportation
Bureau, Taichung City Government. He is Director of the innovation
centre for Intelligent Transportation and Logistics. He is an expert in
Traffic Engineering, Traffic Control, Traffic Flow Theory, and Urban
Traffic Management.
CONTACTS
Khanh Giang LE
Doctoral Candidate, Ph.D program of Civil and Hydraulic Engineering.
College of Construction and Development, Feng Chia University.
Taichung
Taiwan
Tel. + 886 984267484
Email: khanhgiang298[at]gmail.com