Astrometry in Wide-Field Surveys
Author(s): András Pál and Gáspár Á. Bakos
Source: Publications of the Astronomical Society of the Pacific, Vol. 118, No. 848 (October 2006), pp. 1474-1483
Published by: The University of Chicago Press on behalf of the Astronomical Society of the Pacific
Stable URL: http://www.jstor.org/stable/10.1086/508573


Publications of the Astronomical Society of the Pacific, 118: 1474–1483, 2006 October
© 2006. The Astronomical Society of the Pacific. All rights reserved. Printed in U.S.A.

Astrometry in Wide-Field Surveys

András Pál¹

Department of Astronomy, Loránd Eötvös University, H-1117 Budapest, Hungary; [email protected]

and Gáspár Á. Bakos²

Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138

Received 2006 July 21; accepted 2006 August 22; published 2006 October 26

ABSTRACT. We present a robust and fast algorithm for performing astrometry and source cross-identification on lists of two-dimensional points, such as between a catalog and an astronomical image, or between two images. The method is based on minimal assumptions: the lists can be rotated, magnified, and inverted with respect to each other in an arbitrary way. The algorithm is tailored to work efficiently on wide fields with a large number of sources and significant nonlinear distortions, as long as the distortions can be approximated with linear transformations locally over the scale length of the average distance between the points. The procedure is based on symmetric point matching in a newly defined continuous triangle space that consists of triangles generated by extended Delaunay triangulation. Our software implementation performed at the 99.995% success rate on ∼260,000 frames taken by the HATNet project.

1. INTRODUCTION

Cross-matching two lists of two-dimensional points is a crucial step in astrometry and source identification. The task involves finding the appropriate geometric transformation that transforms one list into the reference frame of the other, and then finding the best matching point pairs. One of the lists usually contains the pixel coordinates of sources in an astronomical image (e.g., pointlike sources, such as stars), while the other list can either be a reference catalog with celestial coordinates, or it can also consist of pixel coordinates that originate from a different source of observation (another image). Throughout this paper, we denote the reference (list) as $R$, the image (list) as $I$, and the function that transforms the reference to the image as $F_{R \to I}$.

The difficulty of the problem is that in order to find matching pairs, one needs to know the transformation, and vice versa: to derive the transformation, one needs point pairs. Furthermore, the lists may not fully overlap in space and may have only a small fraction of sources in common.

By making simple assumptions on the properties of $F_{R \to I}$, however, the problem can be tackled. A very specific case is one in which there is only a simple translation between the lists, and one can use cross-correlation techniques (see Phillips & Davis 1995) to find the transformation. We note that the method proposed by Thiebaut et al. (2001) uses all of the image information to derive a transformation (translation and magnification).

¹ Visiting Astronomer, Harvard-Smithsonian Center for Astrophysics.
² Hubble Fellow.

A more general assumption typical of astronomical applications is that $F_{R \to I}$ is a similarity transformation (rotation, magnification, and inversion, without shear); i.e., $F_{R \to I}(\mathbf{r}) = \lambda A \mathbf{r} + \mathbf{b}$, where the matrix is a (nonzero) scalar $\lambda$ times the orthogonal matrix $A$, $\mathbf{b}$ is an arbitrary translation, and $\mathbf{r}$ is the spatial vector of points. Exploiting the fact that geometric patterns remain similar after the transformation, more general algorithms have been developed that are based on pattern matching (Groth 1986; Valdes et al. 1995). The idea is that the initial transformation is found with the aid of a specific set of patterns that are generated from a subset of the points on both $R$ and $I$. For example, the subset can be that of the brightest sources, and the patterns can be triangles. With the knowledge of this initial transformation, more points can be cross-matched, and the transformation between the lists can be iteratively refined. Some of these methods are implemented as an IMMATCH task (Phillips & Davis 1995) in IRAF.³

³ IRAF is distributed by the National Optical Astronomy Observatories, which are operated by the Association of Universities for Research in Astronomy, Inc., under cooperative agreement with the National Science Foundation.

The above pattern-matching methods perform well as long as the dominant term in the transformation is linear, such as for astrometry of narrow field-of-view (FOV) images, and as long as the number of sources is small (because of the large number of patterns that can be generated; see below). In the past decade of astronomy, with the development of large-format CCD cameras or mosaic imagers, many wide-field surveys have appeared, such as those looking for transient events (e.g., ROTSE; Akerlof et al. 2000), transiting planets (e.g., KELT, Pepper et al. 2004; TrES, Alonso et al. 2004; HATNet, Bakos et al. 2002, 2004; see Charbonneau et al. 2006 for further references), or all-sky variability (e.g., ASAS; Pojmanski 1997). There are nonnegligible, higher order distortion terms in the astrometric solution that are due, for instance, to the projection of celestial to pixel coordinates and the properties of the fast-focal-ratio optical systems. Furthermore, these images may contain $\sim 10^5$ sources, and pattern matching is nontrivial.

These surveys have necessitated a further generalization of the algorithm, which we present in this paper. More specifically, we were motivated by the astrometric requirements of the Hungarian-made Automated Telescope Network (HATNet). Each HAT telescope in the network consists of a 200 mm focal length, f/1.8 telephoto lens and a 2K × 2K CCD, yielding an 8° × 8° FOV. In our experience, we need at least fourth-order polynomial functions of the pixel coordinates in order to properly describe the distortion of the lens. With a typical exposure time of 5 minutes in the I band, in a moderately dense field ($b \approx 15°$), there are 30,000 stars brighter than $I = 13$ for which better than 10% photometry can be achieved. If we consider all 3σ detections, we have to deal with the identification of ∼100,000 sources.

The algorithm presented in this paper is based on and is a generalization of the above pattern-matching algorithms. It is very fast and works robustly for wide-field imaging, with minimal assumptions. Namely, we assume that (1) the distortions are nonnegligible but small compared to the linear term, (2) there exists a smooth transformation between the reference and image points, (3) the point lists have a considerable number of sources in common, and (4) the transformation is locally invertible. The paper is presented as follows. First, we describe symmetric point matching in § 2, followed by a discussion of finding the transformation (§ 3). The software implementation and its performance on a large and inhomogeneous data set is demonstrated in § 4. Finally, we draw conclusions in § 5.

2. SYMMETRIC POINT MATCHING

First, let us assume that $F_{R \to I}$ is known. To find point pairs between $R$ and $I$, one should first transform the reference points to the reference frame of the image: $R' = F_{R \to I}(R)$. Now it is possible to perform a simple symmetric point matching between $R'$ and $I$. One point ($R_1 \in R'$) from the first set and one point ($I_1 \in I$) from the second set are treated as a pair if the closest point to $R_1$ is $I_1$ and the closest point to $I_1$ is $R_1$. This requirement is symmetric by definition and excludes such cases in which, e.g., the closest point to $R_1$ is $I_1$ but there exists an $R_2$ that is even closer to $I_1$, etc.
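The pairing rule can be sketched in a few lines of Python. This is a minimal brute-force illustration and not the authors' implementation: the function names `nearest_index` and `symmetric_match` are hypothetical, and every nearest-neighbor lookup scans the whole list, so the cost grows as $O(N^2)$.

```python
import math

def nearest_index(point, candidates):
    """Index of the candidate closest to `point` (brute-force scan)."""
    return min(range(len(candidates)),
               key=lambda k: math.dist(point, candidates[k]))

def symmetric_match(ref_transformed, img):
    """Return (i, j) index pairs that are mutual nearest neighbors.

    `ref_transformed` plays the role of R' (reference points already
    mapped to the image frame) and `img` the role of I.
    """
    pairs = []
    for i, r in enumerate(ref_transformed):
        j = nearest_index(r, img)
        # Keep the pair only if the relation also holds the other way.
        if nearest_index(img[j], ref_transformed) == i:
            pairs.append((i, j))
    return pairs

# Reference point 0 is closest to image point 0, but image point 0 is
# even closer to reference point 2, so point 0 stays unpaired.
ref_t = [(0.0, 0.0), (10.0, 0.0), (0.4, 0.0)]
img = [(0.5, 0.0), (10.2, 0.0)]
pairs = symmetric_match(ref_t, img)
```

In this toy example, `pairs` contains the two mutual pairs and leaves reference point 0 out, which is the exclusion case described above.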

In one dimension, finding the point of a given list nearest to a specific point $x$ can be implemented as a binary search. Let us assume that the point list with $N$ points is ordered in ascending order. This has to be done only once at the beginning; using the QuickSort algorithm, for example, the required time scales, on average, as $O(N \log N)$. Then $x$ is compared to the median of the list: if it is less than the median, the search can be continued recursively in the first $N/2$ points; if it is greater than the median, the second $N/2$ half is used. In the end, only one comparison is needed to find out whether $x$ is closer to its left or right neighbor, so in total, $1 + \log_2 N$ comparisons are needed, which is an $O(\log N)$ function of $N$. Thus, the total time, including the initial sorting, also goes as $O(N \log N)$.

As regards a list of two-dimensional points, let us assume again that the points are given in ascending order by their x-coordinates [initial sorting $\sim O(N \log N)$], and that they are spread uniformly in a square of unit area. Finding the nearest point to an x-coordinate also requires $O(\log N)$ comparisons; however, the point that is found presumably will not be the nearest in Euclidean distance. The expectation value of the distance between two points is $1/\sqrt{N}$, and thus we have to compare points within a strip that has this width and unity height, meaning $O(\sqrt{N})$ comparisons. Therefore, a symmetric point matching between two catalogs in two dimensions requires $O(N^{3/2} \log N)$ time.
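A sketch of the two-dimensional search on an x-sorted list, written in Python with the hypothetical helper name `nearest_in_sorted`. It is a variant of the strip idea rather than the authors' code: instead of a fixed strip of width $1/\sqrt{N}$, it walks outward from the bisection point and stops as soon as the x-offset alone exceeds the best distance found so far.

```python
import bisect
import math

def nearest_in_sorted(points, xs, q):
    """Index of the point nearest to q.

    `points` must be sorted by x, and `xs` must hold the x-coordinates
    of `points` in the same order. Returns None for an empty list.
    """
    start = bisect.bisect_left(xs, q[0])  # O(log N) search along x
    best, best_d = None, math.inf
    # Walk to the right until the x-offset alone rules out any improvement.
    k = start
    while k < len(xs) and xs[k] - q[0] < best_d:
        d = math.dist(points[k], q)
        if d < best_d:
            best, best_d = k, d
        k += 1
    # Same walk to the left.
    k = start - 1
    while k >= 0 and q[0] - xs[k] < best_d:
        d = math.dist(points[k], q)
        if d < best_d:
            best, best_d = k, d
        k -= 1
    return best

points = sorted([(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.1, 0.1)])
xs = [p[0] for p in points]
nearest = points[nearest_in_sorted(points, xs, (0.45, 0.6))]
```

For uniformly spread points the walk typically inspects a strip comparable to the one described above, but the worst case (all points sharing nearly the same x) still degrades to a full scan.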

We note that finding the closest point within a given set of points is also known as the nearest neighbor problem (for a summary, see Gionis 2002 [unpublished] and references therein).⁴ It is possible to reduce the computation time in two dimensions to $O(N \log N)$ with the aid of Voronoi diagrams and cells, but we have not implemented such an algorithm in our matching codes.

⁴ See http://theory.stanford.edu/~nmishra/CS361-2002/lecture12-scribe.pdf.

3. FINDING THE TRANSFORMATION

Let us return to the task of finding the transformation between $R$ and $I$. The first and most crucial step of the algorithm is to find an initial "guess" $F^{(1)}_{R \to I}$ for the transformation, based on a variant of triangle matching. Using $F^{(1)}_{R \to I}$, $R$ is transformed to $I$, symmetric point matching is done, and the paired coordinates are used to further refine the transformation [leading to $F^{(i)}_{R \to I}$ in iteration $i$] and increase the number of matched points iteratively. A major part of this paper is devoted to finding the initial transformation.

3.1. Triangle Matching

It was proposed earlier by Groth (1986) and Stetson (1989), and recently by others (see Valdes et al. 1995), that triangle matching be used for the initial "guess" of the transformation. The total number of triangles that can be formed using $N$ points is $N(N - 1)(N - 2)/6$, an $O(N^3)$ function of $N$. As this can be an overwhelming number, one can resort to using a subset of the points for the vertices of the triangles that are to be generated. One can also limit the parameters of the triangles, such as excluding elongated or large (small) triangles.

As triangles are uniquely defined by three parameters, for example the lengths of the three sides, these parameters (or their appropriate combinations) naturally span a three-dimensional triangle space. Because our assumption is that $F_{R \to I}$ is dominated by the linear term, to first-order approximation there is a single scalar magnification between $R$ and $I$ (besides the rotation, chirality, and translation). It is possible to reduce the triangle space to a normalized, two-dimensional triangle space [$(T_x, T_y) \in T$], whereby the original size information is lost. Similar triangles (with or without taking into account a possible flip) can be represented by the same points in this space, which simplifies triangle matching between $R$ and $I$.

3.1.1. Triangle Spaces

There are multiple ways of deriving normalized triangle spaces. One can define a "mixed" normalized triangle space $T^{\mathrm{mix}}$ in which the coordinates are insensitive to inversion between the original coordinate lists; i.e., all similar triangles are represented by the same point, irrespective of their chirality:

$$T_x^{\mathrm{mix}} = p/a, \tag{1}$$

$$T_y^{\mathrm{mix}} = q/a, \tag{2}$$

(Valdes et al. 1995), where $a$, $p$, and $q$ are the sides of the triangle, in descending order. Triangles in this space are shown in the left panel of Figure 1. Coordinates in the mixed triangle space are continuous functions of the sides (and therefore of the spatial coordinates of the vertices of the original triangle), but the orientation information is lost. Because we assumed that $F_{R \to I}$ is smooth and bijective, no local inversions and flips can occur. In other words, $R$ and $I$ are either flipped or not with respect to each other, but chirality does not have a spatial dependence, and there are no "local spots" that are mirrored. Therefore, using mixed-triangle-space coordinates can yield false triangle matchings that can lead to an inaccurate initial transformation, or the match may even fail. Thus, for large sets of points and triangles, it is more reliable to fix the orientation of the transformation. For example, first assume the coordinates are not flipped, perform a triangle match, and if this match is unsatisfactory, repeat the fit with flipped triangles.

This leads to the definition of an alternative "chiral" triangle space:

$$T_x^{\mathrm{chir}} = b/a, \tag{3}$$

$$T_y^{\mathrm{chir}} = c/a, \tag{4}$$

where $a$, $b$, and $c$ are the sides in counterclockwise order, and $a$ is the longest side. In this space, similar triangles with different orientations have different coordinates. The shortcoming of $T^{\mathrm{chir}}$ is that it is not continuous: a small perturbation of an isosceles triangle can result in a new coordinate that is at the upper rightmost edge of the triangle space.
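The two parameterizations, and the discontinuity of the chiral one, can be illustrated with a short Python sketch. The function names are hypothetical, and the input is assumed to be the three side lengths (for `chiral_coords`, listed in counterclockwise order around the triangle).

```python
def mixed_coords(sides):
    """Eqs. (1)-(2): sides sorted in descending order, a >= p >= q."""
    a, p, q = sorted(sides, reverse=True)
    return p / a, q / a

def chiral_coords(ccw_sides):
    """Eqs. (3)-(4): sides in counterclockwise order, starting at the longest."""
    k = max(range(3), key=lambda i: ccw_sides[i])
    a, b, c = (ccw_sides[(k + j) % 3] for j in range(3))
    return b / a, c / a

# A 3-4-5 triangle and its mirror image traverse the sides in opposite order.
tri, mirrored = (5.0, 4.0, 3.0), (5.0, 3.0, 4.0)
same_in_mixed = mixed_coords(tri) == mixed_coords(mirrored)
differ_in_chiral = chiral_coords(tri) != chiral_coords(mirrored)

# Nearly isosceles triangle: a tiny change in which side is the longest
# moves the chiral coordinates across the triangle space.
before = chiral_coords((1.001, 1.0, 0.5))
after = chiral_coords((1.0, 1.001, 0.5))
```

Here `before` is close to (1, 0.5) while `after` is close to (0.5, 1), even though the two triangles differ by only a 0.1% change in two side lengths.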

In the following, we show that it is possible to define a parameterization that is both continuous and that preserves chirality. Flip the chiral triangle space in the right panel of Figure 1 along the $T_x + T_y = 1$ line. This transformation moves the equilateral triangle into the origin. Next, apply radial magnification of the whole space to move the $T_x + T_y = 1$ line to the $T_x^2 + T_y^2 = 1$ arc (the magnification factor is not constant: 1 along the x- and y-axis direction, and $\sqrt{2}$ along the $T_x = T_y$ line). Finally, apply an azimuthal slew by a factor of 4 to identify the $T_y = 0$, $T_x > 0$ and $T_x = 0$, $T_y > 0$ edges of the space. To be more specific, let us denote the sides as in $T^{\mathrm{chir}}$ as $a$, $b$, and $c$, in counterclockwise order, where $a$ is the longest, and define

$$\alpha = 1 - b/a, \tag{5}$$

$$\beta = 1 - c/a. \tag{6}$$

With these values, it is easy to prove that by using the definitions of the following variables,

$$x_1 = \frac{\alpha(\alpha + \beta)}{\sqrt{\alpha^2 + \beta^2}}, \tag{7}$$

$$y_1 = \frac{\beta(\alpha + \beta)}{\sqrt{\alpha^2 + \beta^2}}, \tag{8}$$

$$x_2 = x_1^2 - y_1^2, \tag{9}$$

$$y_2 = 2 x_1 y_1, \tag{10}$$

one can define the triangle space coordinates as

$$T_x^{\mathrm{cont}} = \frac{x_2^2 - y_2^2}{(\alpha + \beta)^3} = \frac{(\alpha + \beta)(\alpha^4 - 6\alpha^2\beta^2 + \beta^4)}{(\alpha^2 + \beta^2)^2}, \tag{11}$$

$$T_y^{\mathrm{cont}} = \frac{2 x_2 y_2}{(\alpha + \beta)^3} = \frac{4(\alpha + \beta)\alpha\beta(\alpha^2 - \beta^2)}{(\alpha^2 + \beta^2)^2}. \tag{12}$$

The continuous triangle space $T^{\mathrm{cont}}$ defined here has many advantages. It is a continuous function of the sides for all nonsingular triangles, and it also preserves chirality information. Furthermore, it spans a larger area, and misidentification of triangles (which may be very densely packed) is decreased. Some triangles in this space are shown in Figure 2.
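A minimal Python sketch of equations (5)–(12), starting from three vertex positions. The function name `continuous_triangle_coords` is hypothetical, and two details are assumptions made for this illustration: the vertices are reordered to counterclockwise using the sign of the cross product, and the equilateral limit (where $\alpha = \beta = 0$ and the formulas are 0/0) is mapped to the origin.

```python
import math

def continuous_triangle_coords(p0, p1, p2):
    """Continuous triangle-space coordinates (T_x, T_y) of a triangle."""
    # Twice the signed area; negative means the vertices run clockwise.
    cross = ((p1[0] - p0[0]) * (p2[1] - p0[1])
             - (p1[1] - p0[1]) * (p2[0] - p0[0]))
    if cross == 0:
        raise ValueError("degenerate (collinear) triangle")
    if cross < 0:
        p1, p2 = p2, p1  # enforce counterclockwise vertex order
    sides = [math.dist(p0, p1), math.dist(p1, p2), math.dist(p2, p0)]
    k = max(range(3), key=lambda i: sides[i])  # start at the longest side
    a, b, c = (sides[(k + j) % 3] for j in range(3))
    alpha, beta = 1.0 - b / a, 1.0 - c / a      # eqs. (5)-(6)
    r2 = alpha * alpha + beta * beta
    if r2 == 0.0:
        return 0.0, 0.0  # equilateral triangle sits at the origin
    s = alpha + beta
    tx = s * (alpha**4 - 6 * alpha**2 * beta**2 + beta**4) / r2**2  # eq. (11)
    ty = 4 * s * alpha * beta * (alpha**2 - beta**2) / r2**2        # eq. (12)
    return tx, ty

tx, ty = continuous_triangle_coords((0.0, 0.0), (4.0, 0.0), (1.0, 2.0))
```

Because only side ratios enter, the result is unchanged by translation, rotation, and scaling, while mirroring the triangle swaps $\alpha$ and $\beta$ and therefore flips the sign of $T_y$ only.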

3.1.2. Optimal Triangle Sets

As mentioned above, the total number of triangles that can be formed from $N$ points is $\approx N^3/6$. Wide-field images typically contain $O(10^4)$ points or more, and the total number of triangles that can be generated (a complete triangle list) is impractical, for the following reasons. First, storing and handling such a large number of triangles with typical computers is inconvenient. To give an example, a full triangulation of 10,000 points yields $\sim 1.7 \times 10^{11}$ triangles.

Fig. 1.—Position of triangles in mixed and chiral triangle spaces. The exact position of a given triangle is represented by its center of gravity. Note that in the mixed triangle space, some triangles that have identical side ratios but different orientation overlap. The dashed line shows the boundaries of the triangle space. The dot-dashed line represents the right triangles and separates obtuse and acute triangles.

Fig. 2.—Triangles in the continuous triangle space as defined by eqs. (11)–(12). We show the same triangles as those in Fig. 1 for the $T^{\mathrm{mix}}$ and $T^{\mathrm{chir}}$ triangle spaces. Equilateral triangles are centered on the origin. The dot-dashed line refers to the right triangles and divides the space into acute (inside) and obtuse (outside) triangles. Isosceles triangles are placed on the x-axis [where $T_y^{\mathrm{cont}} = 0$].

Second, this complete triangle list includes many triangles that are not optimal to use. For example, large triangles can be significantly distorted in $I$ with respect to $R$, and thus are represented by substantially different coordinates in the triangle space. The size of optimal triangles is governed by two factors: the distortion of large triangles and the uncertainty of triangle parameters for small triangles that are comparable in size to the astrometric errors of the vertices.

To make an estimate of the optimal size for triangles, let us use $D$ to denote the characteristic size of the image, $\delta$ for the astrometric error, and $L$ as the size of a selected triangle. For the sake of simplicity, let us ignore the distortion effects of a complex optical assembly and estimate the distortion factor $f_d$ in a wide-field imager as the difference between the orthographic and gnomonic projections:

$$f_d \approx |(\sin d - \tan d)/d| \approx |1 - \cos d| \tag{13}$$

(see Calabretta & Greisen 2002), where $d$ is the radial distance as measured from the center of the field. For the HATNet frames ($d = D \approx 6°$ to the corners), this estimate yields $f_d \approx 0.005$. The distortion effects yield an error of $f_d L/D$ in the triangle space; the bigger the triangle, the more significant the distortion. For the same triangle, astrometric errors cause an uncertainty of $\delta/L$ in the triangle space, which decreases with increasing $L$. Making the two errors equal,

$$\frac{f_d L}{D} = \frac{\delta}{L}, \tag{14}$$

an optimal triangle size can be estimated:

$$L_{\mathrm{opt}} = \sqrt{\frac{\delta D}{f_d}}. \tag{15}$$

In our case, $D = 2048$ pixels (or 6°), $f_d = 0.005$, and the centroid uncertainty for an $I = 11$ star is $\delta = 0.01$, so the optimal size of the triangles is $L_{\mathrm{opt}} \approx 60$–$70$ pixels.
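The arithmetic behind this estimate can be checked with a few lines of Python. The function names are hypothetical; the numbers are the ones quoted in the text ($D = 2048$ pixels, $d = 6°$, $\delta = 0.01$ pixels).

```python
import math

def distortion_factor(d_deg):
    """Right-hand estimate of eq. (13): |1 - cos d|, with d in degrees."""
    return abs(1.0 - math.cos(math.radians(d_deg)))

def optimal_triangle_size(D, delta, f_d):
    """Eq. (15): L_opt = sqrt(delta * D / f_d), in the units of D."""
    return math.sqrt(delta * D / f_d)

f_d = distortion_factor(6.0)                            # about 0.0055
L_rounded = optimal_triangle_size(2048, 0.01, 0.005)    # about 64 pixels
L_unrounded = optimal_triangle_size(2048, 0.01, f_d)    # about 61 pixels
```

Both the rounded value $f_d = 0.005$ and the unrounded $|1 - \cos 6°|$ give a size inside the quoted 60–70 pixel range.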

Third, dealing with many triangles may result in a triangle space that is oversaturated by the large number of points and may yield unexpected matchings of triangles. In all definitions in the previous subsection, the area of the triangle space is approximately unity. Given triangles with an error $\sigma$ in triangle space, assuming they have a uniform distribution with a $3\sigma$ spacing between them, and assuming $\sigma = \delta/L_{\mathrm{opt}}$, the number of triangles is delimited to

$$T_{\max} \approx \frac{1}{(3\sigma)^2} \approx \frac{1}{9}\left(\frac{L}{\delta}\right)^2 = \frac{D}{9 f_d \delta}. \tag{16}$$

In our case (see values of $D$, $f_d$, and $\delta$ above), the former equation yields $T_{\mathrm{opt}} \approx 2 \times 10^6$ triangles. Note that this is 5 orders of magnitude smaller than a complete triangulation [$O(10^{11})$].

3.1.3. The Extended Delaunay Triangulation

Delaunay triangulation (see Shewchuk 1996) is a fast and robust way of generating a triangle mesh on a point set. Delaunay triangles are disjointed triangles in which the circumcircle of any triangle contains no points from any other triangle. This is also equivalent to the most efficient exclusion of distorted triangles in a local triangulation. For a visual example of a Delaunay triangulation of a random set of points, see Figure 3 (left).

Fig. 3.—Triangulations of some randomly distributed points. Left: Delaunay triangulation (60 triangles in total); right: $\ell = 1$ extended triangulation (312 triangles) of the same point set.

Following Euler's theorem (also known as the polyhedron formula), one can calculate the number of triangles in a Delaunay triangulation of $N$ points:

$$T_D = 2N - 2 - C, \tag{17}$$

where $C$ is the number of edges on the convex hull of the point set. For large values of $N$, $T_D$ can be estimated as $2N$, as $2 + C$ is negligible. Therefore, if we select a subset of points (from $R$ or $I$) whose neighboring points are at a distance of $L_{\mathrm{opt}}$, we get a Delaunay triangulation with approximately $2D^2/L_{\mathrm{opt}}^2$ triangles. The $D$, $\delta$, and $f_d$ values for HAT images correspond to ≈6000 triangles (i.e., 3000 points). In our experience, this yields very fast matching, but it is not robust enough for general use, because of the following reasons.

Delaunay triangulation is very sensitive to the removal of a point from a star list. According to the polyhedron formula, on average, each point has six neighboring points and belongs to six triangles. Because of observational effects or unexpected events, the number of points fluctuates in the list. To mention a few examples, it is customary to build up $I$ from the brightest stars in an image, but stars may get saturated or fall on bad columns and thus disappear from the list. Star detection algorithms may find sources according to the changing FWHM of the frames. Transients, variable stars, or minor planets can lead to additional sources, on occasion. In general, if one point is removed, six Delaunay triangles are destroyed and four new ones form that are totally disjointed from the six original ones (and therefore are represented by substantially different points in the triangle space). Removing 1/3 of the generating points might completely change the triangulation.⁵

⁵ Imagine a honeycomb structure in which all central points of the hexagons are added or removed; these two constructions generate disjoint Delaunay triangulations.

Second, and more important, is that there is no guarantee that the spatial density of points in $R$ and $I$ is similar. For example, the reference catalog is retrieved for stars with magnitude limits that are different from those found on the image. If the number of points in common in $R$ and $I$ is only a small fraction of the total number of points, the triangulations on the reference and image have an inappropriate number of (or even no) common triangles.

Third, the number of triangles with Delaunay triangulation $T_D$ is definitely smaller than $T_{\mathrm{opt}}$; i.e., the triangle space could support more triangles without much confusion.

Therefore, it is beneficial to extend the Delaunay triangulation. A natural way to do this is as follows. Define a level $\ell$, and for any given point $P$ select all points from the set of $N$ points that can be connected to $P$ via maximum $\ell$ edges of the Delaunay triangulation. Following this, one can generate the full triangulation of this set and append the new triangles to the whole triangle set. This procedure can be repeated for all points in the point set at fixed $\ell$. For self-consistence, the $\ell = 0$ case is defined as the Delaunay triangulation itself. If all points have six neighbors, the number of "extended" triangles per data point is

$$T_\ell = (3\ell^2 + 3\ell + 1)(3\ell^2 + 3\ell)(3\ell^2 + 3\ell - 1)/6 \tag{18}$$

for $\ell > 0$; i.e., this extension introduces $O(\ell^6)$ new triangles. Because some of the extended triangles are repetitions of other triangles from the original Delaunay triangulation and from the extensions of other points, the final dependence only goes as $O(T_D \ell^2)$. We note that our software implementation is slightly different, and the expansion requires $O(N \ell^2)$ time and automatically results in a triangle set in which each triangle is unique. To give an example, for $N = 10{,}000$ points, the Delaunay triangulation gives 20,000 triangles, the $\ell = 1$ extended triangulation gives ∼115,000 triangles, $\ell = 2$ gives some ∼347,000 triangles, $\ell = 3$ gives 875,000 triangles, and $\ell = 4$ gives ∼1,841,000 triangles. The extended triangulation is advantageous not only because it provides more triangles and thus has a better chance for matching, but also because there is a bigger variety in size that enhances matching if the input and reference lists have different spatial densities.
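The extension procedure described above (not the authors' slightly different implementation) can be sketched in Python. The sketch assumes that a Delaunay triangulation is already available as a list of vertex-index triples; the function name `extended_triangles` is hypothetical, and the code works on indices only, so it does not filter out degenerate (collinear) triples.

```python
from collections import deque
from itertools import combinations

def extended_triangles(delaunay, level):
    """Unique triangles (sorted index triples) of the level-`level` extension."""
    if level == 0:
        return {tuple(sorted(t)) for t in delaunay}
    # Adjacency along the edges of the Delaunay triangulation.
    neighbors = {}
    for tri in delaunay:
        for u, v in combinations(tri, 2):
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    triangles = set()
    for start in neighbors:
        # Breadth-first search: points reachable via at most `level` edges.
        depth = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            if depth[u] == level:
                continue
            for v in neighbors[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        # Full triangulation of the selected points; the set removes repeats.
        triangles.update(combinations(sorted(depth), 3))
    return triangles

# One central point (index 0) surrounded by six neighbors (indices 1-6).
hexagon = [(0, k, k % 6 + 1) for k in range(1, 7)]
n_level0 = len(extended_triangles(hexagon, 0))
n_level1 = len(extended_triangles(hexagon, 1))
```

For this single hexagon, level 0 returns the six Delaunay triangles and level 1 returns the 35 triples predicted by equation (18) for $\ell = 1$; storing sorted index triples in a set is what keeps each triangle unique here.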

3.1.4. Matching the Triangles in Triangle Space

If the triangle sets for both the reference and input list are known, the triangles can be matched in the normalized triangle space (where they are represented by two-dimensional points) using the symmetric point matching as described in § 2.

In the next step, we create an $N_R \times N_I$ "vote" matrix $V$, where $N_R$ and $N_I$ are the number of points in the reference and input lists, respectively, that were used to generate the triangulations. The elements of this matrix have an initial value of zero. Each matched triangle corresponds to three points in the reference list (identified by $r_1$, $r_2$, and $r_3$) and three points in the input list ($i_1$, $i_2$, and $i_3$). Knowing these indices, the matrix elements $V_{r_1 i_1}$, $V_{r_2 i_2}$, and $V_{r_3 i_3}$ are incremented. The magnitude of this increment (the "vote") can depend on the distances of the matching triangles in the triangle space: the closer they are, the more votes these points get. In our implementation, if $N_T$ triangles are completely matched, the closest pair gets $N_T$ votes, the second closest pair gets $N_T - 1$ votes, and so on.

Having built up the vote matrix, we select the greatest elements of this matrix, and the appropriate points referring to these row and column indices are considered as matched sources. We note that not all of the positive matrix elements are selected, because elements with smaller numbers of votes are likely to be due to misidentifications. We found that in practice, the upper 40% of the matrix elements yield a robust match.
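A sketch of the voting step in Python, under stated assumptions: the function name `vote_for_pairs` is hypothetical; each matched triangle pair is given as a tuple of its triangle-space distance and two index triples whose vertices are listed in corresponding order; a sparse dictionary stands in for the $N_R \times N_I$ matrix; and "upper 40%" is interpreted here as the top 40% of the nonzero elements ranked by votes.

```python
from collections import defaultdict

def vote_for_pairs(matched, keep_fraction=0.4):
    """Select likely (reference index, image index) pairs by voting.

    `matched` is a list of (distance, ref_triangle, img_triangle) tuples.
    """
    votes = defaultdict(int)  # sparse stand-in for the vote matrix V
    ranked = sorted(matched, key=lambda m: m[0])
    n_t = len(ranked)
    for rank, (_, ref_tri, img_tri) in enumerate(ranked):
        # Closest triangle pair casts n_t votes, the next n_t - 1, and so on.
        for r, i in zip(ref_tri, img_tri):
            votes[(r, i)] += n_t - rank
    ordered = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, round(keep_fraction * len(ordered))) if ordered else 0
    return [pair for pair, _ in ordered[:n_keep]]

matched = [
    (0.01, (0, 1, 2), (5, 6, 7)),
    (0.02, (0, 1, 3), (5, 6, 8)),
    (0.50, (2, 3, 4), (8, 7, 9)),  # distant match, likely a misidentification
]
selected = vote_for_pairs(matched)
```

In this toy input, the pairs supported by the two close triangle matches collect the most votes, and the pairs that appear only in the distant match fall below the cut.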

3.2. The Unitarity of the Transformations

If an initial set of possible point pairs is known from triangle matching, one can fit a smooth function (e.g., a polynomial) that transforms the reference set to the input points. Prior to the transformation, our assumption is that the dominant term in the transformation is the similarity transformation, which implies that the homogeneous linear part of it should almost be a unitary operator.⁶ After the transformation is determined, it is useful to measure how much we diverge from this assumption. As mentioned above (§ 1), similarity transformations can be written as

$$\mathbf{r}' = \lambda A \mathbf{r} + \mathbf{b} \equiv \lambda \begin{pmatrix} a & c \\ b & d \end{pmatrix} \mathbf{r} + \mathbf{b}, \tag{19}$$

where $\lambda \neq 0$ and the $a$, $b$, $c$, and $d$ matrix components are the sine and cosine of a given rotational angle (i.e., $a = d$ and $b = -c$).

⁶ Here $A A^\dagger = I$, where $A^\dagger$ is the adjoint of $A$, and $I$ is the identity; i.e., $A$ is an orthogonal transformation with possible inversion and magnification.

If we separate the homogeneous linear part of the transformation, as described by a matrix similar to that in equation (19), it will be a combination of rotation and dilation with possible inversion if $|a| \approx |d|$ and $|c| \approx |b|$. We can define the unitarity of a 2 × 2 matrix as

$$\Lambda^2 \equiv \frac{(a \mp d)^2 + (b \pm c)^2}{a^2 + b^2 + c^2 + d^2}, \tag{20}$$

where the upper (lower) signs give the definition for regular (inverting) transformations, respectively. For a combination of rotation and dilation, $\Lambda$ is zero, and for a distorted transformation, $\Lambda \approx f_d \ll 1$.

The unitarity $\Lambda$ gives a good measure of how well the initial transformation was determined. It happens occasionally that the transformation is erroneous, and in our experience, in these cases $\Lambda$ is not just larger than the expected value of $f_d$, but is ≈1. This enables fine-tuning of the algorithm, such as changing chirality of the triangle space or adding further iterations until satisfactory $\Lambda$ is reached.
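Equation (20) translates directly into a short Python function. The name `unitarity` and the `inverting` flag are hypothetical; the arguments are the four elements of the homogeneous linear part, and the expression is unchanged if $b$ and $c$ are swapped, so the result does not depend on which off-diagonal element is called $b$.

```python
import math

def unitarity(a, b, c, d, inverting=False):
    """Lambda of eq. (20); upper signs for regular, lower for inverting."""
    sign = -1.0 if inverting else 1.0
    numerator = (a - sign * d) ** 2 + (b + sign * c) ** 2
    return math.sqrt(numerator / (a * a + b * b + c * c + d * d))

# Rotation by 0.3 rad combined with a magnification of 2.5 (a = d, b = -c).
lam, theta = 2.5, 0.3
a, b = lam * math.cos(theta), -lam * math.sin(theta)
c, d = lam * math.sin(theta), lam * math.cos(theta)
regular = unitarity(a, b, c, d)

# A pure mirror flip, tested against both sign conventions.
flip_as_regular = unitarity(1.0, 0.0, 0.0, -1.0)
flip_as_inverting = unitarity(1.0, 0.0, 0.0, -1.0, inverting=True)
```

The rotation-plus-dilation matrix gives a value of zero (up to rounding), while the mirror flip gives $\sqrt{2}$ under the regular convention and zero under the inverting one, consistent with the observation that a wrong chirality produces a value of order unity.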

3.3. Point Matching in Practice

In practice, matching points between the reference $R$ and image $I$ proceeds as follows:

1. Generate two triangle sets $T_R$ and $T_I$ on $R$ and $I$, respectively:
   a) In the first iteration, generate only Delaunay triangles.
   b) Later, if necessary, extended triangulation can be generated with increasing levels of $\ell$.
2. Match these two triangle sets in the triangle space, using symmetric point matching.
3. Select some possible point pairs, using a vote algorithm (yielding $N_0$ pairs).
4. Derive the initial smooth transformation $F^{(1)}_{R \to I}$, using a least-squares fit.
   a) Check the unitarity of $F^{(1)}_{R \to I}$.
   b) If it is greater than a given threshold $O(f_d)$, increase $\ell$ and go to step 1b. If the unitarity is less than this threshold, proceed to step 5.
   c) If the maximal allowed $\ell$ is reached, try the procedure with triangles that are flipped with respect to each other between the image and reference (i.e., switch chirality of the $T^{(\mathrm{cont})}$ triangle space).
5. Transform $R$ using this initial transformation to the reference frame of the image [$R' = F^{(1)}_{R \to I}(R)$].
6. Perform a symmetric point matching between $R'$ and $I$ (yielding $N_1 > N_0$ pairs).
7. Refine the transformation based on the greater number of pairs, yielding transformation $F^{(i)}_{R \to I}$, where $i$ is the iteration number.
8. If necessary, repeat points 5, 6, and 7 iteratively, increase the number of matched points, and refine the transformation.

For most astrometric transformations and distortions, it holds that locally they can be approximated with a similarity transformation. At a reasonable density of points on $R$ and $I$, the triangles generated by a (possibly extended) Delaunay triangulation are small enough not to be affected by the distortions. The crucial step is the initial triangle matching, and because local triangles are used, it proves to be a robust procedure. It should be emphasized that $F^{(i)}_{R \to I}$ can be any smooth transformation; for example, an affine transformation with small shear, or a polynomial transformation of any reasonable order. The optimal value of the order depends on the magnitude of the distortion. Detailed descriptions of fitting procedures for such models and functions can be found in various textbooks (see, e.g., Press et al. 1992, chapter 15). It is noteworthy that in step 7, one can perform a weighted fit with possible iterative rejection of $n\sigma$ outlier points.
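A weighted fit with iterative rejection, as mentioned for step 7, might look like the sketch below. It is an illustration under stated assumptions rather than the published implementation: the model is restricted to an affine transformation (a first-order polynomial), the scatter is taken as the rms of the residual distances, and all function names are hypothetical.

```python
import math

def solve3(m, v):
    """Solve the 3x3 system m*x = v by Gaussian elimination with pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x

def fit_affine(src, dst, weights):
    """Weighted least squares for dst ~ c0 + c1*x + c2*y, one row per output axis."""
    basis = [(1.0, x, y) for x, y in src]
    m = [[sum(w * b[i] * b[j] for w, b in zip(weights, basis)) for j in range(3)]
         for i in range(3)]
    coeffs = []
    for axis in (0, 1):
        v = [sum(w * b[i] * d[axis] for w, b, d in zip(weights, basis, dst))
             for i in range(3)]
        coeffs.append(solve3(m, v))
    return coeffs

def apply_affine(coeffs, point):
    x, y = point
    return tuple(c[0] + c[1] * x + c[2] * y for c in coeffs)

def fit_with_rejection(src, dst, weights, n_sigma=3.0, max_iter=5):
    """Refit after dropping pairs whose residual exceeds n_sigma times the rms."""
    def fit(indices):
        return fit_affine([src[i] for i in indices], [dst[i] for i in indices],
                          [weights[i] for i in indices])
    keep = list(range(len(src)))
    coeffs = fit(keep)
    for _ in range(max_iter):
        resid = {}
        for i in keep:
            px, py = apply_affine(coeffs, src[i])
            resid[i] = math.hypot(px - dst[i][0], py - dst[i][1])
        sigma = math.sqrt(sum(r * r for r in resid.values()) / len(keep))
        survivors = [i for i in keep if resid[i] <= n_sigma * sigma]
        if len(survivors) == len(keep) or len(survivors) < 3:
            break                      # converged, or too few points to refit
        keep = survivors
        coeffs = fit(keep)
    return coeffs, keep
```

A higher-order polynomial model would only change the basis functions and the size of the linear system; the rejection loop stays the same.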


4. SOFTWARE IMPLEMENTATION AND APPLICATIONS

4.1. Software Implementation

The coordinate matching and transformation algorithms are implemented in two stand-alone binary programs written in ANSI C. The program grmatch matches point sets, including triangle space generation, triangle matching, symmetric point matching, and polynomial fitting; that is, steps 1–4 in § 3.3. The other program, grtrans, transforms coordinate lists using the transformation coefficients that are output by grmatch. The grtrans code is also capable of fitting a general polynomial transformation between point pair lists if they are paired or matched manually, or by external software. We should note that in the case of degeneracy (e.g., when all points are on a perfect lattice), the match will fail.

Both programs are part of the FIHAT/HATpipe package that is under development for the massive data reduction of the HATNet data flow. They can be easily embedded into UNIX environments, as both of them parse a wide range of command-line arguments for defining the structure of the input data and fine-tuning the algorithm. The programs are also capable of redirecting their input and/or output to standard streams.

By combining grmatch and grtrans, one can easily derive the World Coordinate System (WCS) information for a FITS data file. The output of WCS keywords is now fully implemented in grtrans, following the conventions of the WCSTools package (see Mink 2002).⁷ Such information is very useful for manual analysis with well-known FITS viewers (e.g., DS9; see Joye & Mandel 2003). For a more detailed description of the WCS, see Calabretta & Greisen (2002), and for the representation of distortions, see Shupe et al. (2005).⁸
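As a rough illustration of how a fitted transformation relates to WCS keywords, the sketch below converts only the linear part of a hypothetical $(\xi, \eta) \to (X, Y)$ fit into FITS CRPIX/CD terms for an ARC projection. It is not the grtrans implementation: the function name and argument layout are invented here, $\xi$ is assumed to increase toward the east, the fitted pixel coordinates are assumed to follow the FITS one-based convention already, and all higher order distortion terms are ignored.

```python
def linear_wcs_from_affine(x_coeffs, y_coeffs, ra0_deg, dec0_deg):
    """Linear WCS terms from X = a0 + a1*xi + a2*eta, Y = b0 + b1*xi + b2*eta.

    (xi, eta) are projected coordinates in degrees about (ra0_deg, dec0_deg).
    The CD matrix is the inverse of [[a1, a2], [b1, b2]], in degrees per pixel.
    """
    a0, a1, a2 = x_coeffs
    b0, b1, b2 = y_coeffs
    det = a1 * b2 - a2 * b1
    if det == 0.0:
        raise ValueError("degenerate transformation: linear part is singular")
    return {
        "CTYPE1": "RA---ARC", "CTYPE2": "DEC--ARC",
        "CRVAL1": ra0_deg, "CRVAL2": dec0_deg,
        "CRPIX1": a0, "CRPIX2": b0,      # pixel position where xi = eta = 0
        "CD1_1": b2 / det, "CD1_2": -a2 / det,
        "CD2_1": -b1 / det, "CD2_2": a1 / det,
    }
```

For a 2K × 2K frame covering about 8°, the scale is roughly 256 pixels per degree, so the CD terms come out near 0.0039 degrees per pixel.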

The package containing the programs grmatch and grtrans and other related software is accessible online (after registration).⁹

4.2. Performance on Large Data Sets

We used grmatch and grtrans to perform astrometry and star identification on a large set of images taken by the HAT Network of telescopes (Bakos et al. 2004). The results presented in this paper are based on observations originating from the following HATNet telescopes: HAT-5, HAT-6, and HAT-7, located at the Fred Lawrence Whipple Observatory (FLWO), Arizona, plus HAT-8 and HAT-9 on the Smithsonian Submillimeter Array roof, atop Mauna Kea, Hawaii.¹⁰ Briefly, the survey telescopes have an identical setup: a 200 mm f/1.8 telephoto lens and a 2K × 2K CCD, yielding an 8° × 8° FOV. In order to test the method on different instruments, we also performed astrometry on data taken by the follow-up instrument TopHAT (FLWO). TopHAT is a 0.26 m diameter, f/5 Ritchey-Chrétien design telescope with a Baker wide-field corrector aided by a 2K × 2K Marconi chip, yielding a 1.3° FOV.

⁷ See http://tdc-www.harvard.edu/wcstools.
⁸ See also http://spider.ipac.caltech.edu/staff/shupe/distortion_v1.0.htm.
⁹ See http://www.hatnet.hu/software.
¹⁰ HAT-10 is also located at FLWO, but its data were not used in this paper.

The astrometry and identification steps were as follows. First, for all observed fields, reference star lists were generated using the Two Micron All Sky Survey catalog (2MASS; see Skrutskie et al. 2006) as reference. These reference lists include the source identifiers, the original celestial coordinates (R.A., decl.), an estimated I-band magnitude, and the projected coordinates $(\xi, \eta)$ of the stars. We used arc projection (see Calabretta & Greisen 2002) centered at the nominal center of a given field, and scaling of the projection was unity such that a star located at a distance of 1° from the center of the given celestial field had a unit distance in the $(\xi, \eta)$ plane from the origin in the reference list. The FOVs of the reference lists were a bit wider than the nominal FOVs of the HAT telescopes, to ensure a complete overlap between the two lists in spite of the small uncertainties in the positioning of the telescopes.
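The arc (zenithal equidistant) projection with unit scaling can be written down in a few lines. The sketch below is illustrative only: the function name is hypothetical, $\xi$ is assumed to increase toward the east and $\eta$ toward the north, and the antipode of the field center is not handled.

```python
import math

def arc_project(ra_deg, dec_deg, ra0_deg, dec0_deg):
    """Project (R.A., decl.) onto the plane about (ra0, dec0); output in degrees.

    The distance from the origin equals the angular distance from the field
    center, so a star 1 degree away lands at unit distance.
    """
    dec, dec0 = math.radians(dec_deg), math.radians(dec0_deg)
    dra = math.radians(ra_deg - ra0_deg)
    # Components of the star's unit vector in the frame of the field center.
    east = math.cos(dec) * math.sin(dra)
    north = (math.cos(dec0) * math.sin(dec)
             - math.sin(dec0) * math.cos(dec) * math.cos(dra))
    toward = (math.sin(dec0) * math.sin(dec)
              + math.cos(dec0) * math.cos(dec) * math.cos(dra))
    sin_c = math.hypot(east, north)          # sine of the angular distance
    if sin_c == 0.0:
        return 0.0, 0.0                      # star at the field center
    c_deg = math.degrees(math.atan2(sin_c, toward))
    return c_deg * east / sin_c, c_deg * north / sin_c
```

Using `atan2` rather than `acos` keeps the angular distance accurate for stars very close to the field center.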

Second, an input star list was generated for each image, using our star detection algorithm fistar (also part of FIHAT/HATpipe), which detects and fits starlike objects above a given signal-to-noise ratio threshold. This detection yields a set of input lists that include the pixel coordinates $(X, Y)$ of the stars, and other quantities (including the flux, FWHM, and the shape parameters).

Third, for each image, the input star list and the relevant reference star list were matched using the program grmatch. The match was performed between the projected reference coordinates $(\xi, \eta)$ and the detected pixel coordinates $(X, Y)$. The program outputs two files: the list of the matched lines (the “match” file) and a small file that includes the fitted polynomial transformation parameters and some statistical data (the “transformation” file). It should be emphasized that the match was not done directly using the original celestial coordinates, as they exhibit an unwanted curvature in the field.

Finally, the reference star list $(\xi, \eta)$ was transformed by the program grtrans into the $(X, Y)$ system of the image, using the “transformation” file. The transformed list shows where each star with a given identifier would fall on the image. The transformation can also be used to calculate the WCS information for a given image.
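Applying a fitted polynomial to a reference list amounts to evaluating two bivariate polynomials per star. The sketch below shows one way to do this; the term ordering, the function names, and the `(identifier, xi, eta)` tuple layout are assumptions made for illustration and do not describe the actual “transformation” file format.

```python
def poly_terms(order):
    """Exponent pairs (i, j) of xi**i * eta**j with i + j <= order."""
    return [(total - j, j) for total in range(order + 1) for j in range(total + 1)]

def eval_poly(coeffs, order, xi, eta):
    """Evaluate a bivariate polynomial whose coefficients follow poly_terms()."""
    terms = poly_terms(order)
    if len(coeffs) != len(terms):
        raise ValueError("expected %d coefficients" % len(terms))
    return sum(c * xi ** i * eta ** j for c, (i, j) in zip(coeffs, terms))

def transform_list(stars, x_coeffs, y_coeffs, order):
    """Map (identifier, xi, eta) entries to (identifier, X, Y)."""
    return [(ident,
             eval_poly(x_coeffs, order, xi, eta),
             eval_poly(y_coeffs, order, xi, eta))
            for ident, xi, eta in stars]
```

A polynomial of order $n$ has $(n+1)(n+2)/2$ coefficients per coordinate, so a sixth-order fit carries 28 coefficients for $X$ and 28 for $Y$.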

We note that the crucial part of the process is the third step. This can be fine-tuned by using many parameters, one of the most important being the polynomial order. For a small FOV (less than 1°) and small distortions, linear or second-order polynomials yield good results. For HAT images, we had to increase the order up to 6 to achieve the best results. Figure 4 exhibits two vector plots that show the difference between the transformed reference coordinates and the detected star coordinates using a second- and a fourth-order polynomial transformation for a typical HAT image. In the first case, by using a second-order fit, definite radial structures remain, and the stars located at the corners of the image are not evenly matched, due to the large distortions in the optics. However, using a fourth-order fit, all segments of the image are matched, and the residuals are also smaller. These small residuals can be better visualized if only the difference between one of the coordinates is shown in a gradient plot. Figure 5 illustrates the difference between the Y-coordinates for the same image, using a fourth- and sixth-order polynomial fit. While there is a definite residual structure in the fourth-order fit, it disappears using the sixth-order polynomial transformation.

Fig. 4. Vector plots of the difference between the transformed reference and the input star coordinates for a typical HAT field. Differences for the second-order (left) and fourth-order (right) polynomial fits.

Fig. 5. Differences between the Y-coordinates of the transformed reference and the input star coordinates for a typical HAT field, using fourth-order (left) and sixth-order (right) polynomial fits.

As regards statistics, we performed the astrometry and source identification for 243,447 HAT images that had been acquired between the beginning of 2003 and 2006 June. The wide-field telescopes observed 52 individual and almost nonoverlapping fields between the Galactic latitudes b = �30° and b = �74°.

We initiated the processing with the following parameters. For the triangulation, the 3000 brightest sources were used from both the reference catalog and the detected stars. The critical unitarity was set to 0.01; therefore, if the fitted initial transformation had a unitarity larger than this value, the level of the triangulation expansion was increased. The final transformation was determined using a weighted sixth-order polynomial fit. Because the astrometric errors of brighter stars are smaller, we weighted data points based on their magnitude during the fit. Finally, the maximal distance of matches was set to 1 pixel to reject false identifications.
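The role of the maximal match distance can be illustrated with a small sketch of symmetric point matching, taken here to mean mutual nearest neighbours (an assumption based on how the term is used in this section). The brute-force search and the function names are for illustration only; a production code would use a spatial index rather than an $O(N^2)$ scan.

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest(point, others):
    """Index of the element of `others` closest to `point` (brute force)."""
    return min(range(len(others)), key=lambda k: _dist(point, others[k]))

def symmetric_match(a, b, max_dist):
    """Pairs (i, j) where a[i] and b[j] are each other's nearest neighbour
    and lie no farther apart than max_dist."""
    pairs = []
    if not a or not b:
        return pairs
    for i, p in enumerate(a):
        j = nearest(p, b)
        if nearest(b[j], a) == i and _dist(p, b[j]) <= max_dist:
            pairs.append((i, j))
    return pairs
```

With `max_dist` set to 1 pixel, a transformed reference star is paired with a detection only if the two are mutual nearest neighbours and less than a pixel apart, which suppresses chance coincidences in crowded regions.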

Astrometry and cross-identification of sources was successful for 238,353 images. The remaining 5094 images were analyzed manually, and we found that only 13 of them were good enough to expect astrometry to succeed; the rest were cloudy or showed various other errors. Astrometry on these 13 images also succeeded by decreasing the number of stars for triangulation to 2000. This means that a completely automatic run yielded a 99.995% success rate, and the other images were also matched by applying small changes to the fine-tuned parameters.
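The quoted success rate follows from the image counts above if it is taken relative to the images that were good enough for astrometry to succeed (the automatically solved ones plus the 13 recovered later); that reading is an assumption, checked by the short calculation below.

```python
total_images = 243_447
automatic_ok = 238_353
recovered_after_tuning = 13

failed_first_pass = total_images - automatic_ok
usable_images = automatic_ok + recovered_after_tuning
automatic_rate = 100.0 * automatic_ok / usable_images

print(f"{failed_first_pass} images failed the automatic run")
print(f"automatic success rate on usable images: {automatic_rate:.3f}%")
```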

In order to test the algorithm with a different instrument, we also performed astrometry on 22,936 TopHAT images taken in 2005. The only difference in the procedure was that the polynomial transformation was only of second order. The success ratio was 93%, but 90% of the frames in which astrometry failed were cloudy, with virtually no stars. Astrometry also failed on very short exposure (10 s) V-band frames. Fine-tuning the parameters (number of triangles, input lists) resolved most of these cases.

The following statistics were compiled on the wide-field HAT frames. The median number of matched sources relative to the number of stars in the reference or the input list was 98.38% ± 0.31% (median deviation). The average CPU usage was 0.77 ± 0.22 s frame⁻¹ on a 64 bit AMD Opteron machine running at 2 GHz. Astrometry was successful on 96.78% of the images using Delaunay triangulation without extended triangles (CPU time: 0.73 s); 0.49% of the frames were processed at level $\ell = 1$ extended triangulation (CPU: 1.79 s), 0.06% at $\ell = 2$ (CPU: 2.22 s), 33 images at $\ell = 3$ (CPU: 3.67 s), and 1334 + 5094 images at $\ell = 4$ (CPU: 5.20 s). Here the number 5094 refers to those images for which astrometry failed even at $\ell = 4$, mostly because of bad data quality (see above). Delaunay triangulation without extension succeeds for 96% of the wide-field HAT frames because the HAT instruments perform homogeneous data acquisition and are very well characterized (zero points, saturation). Thus, the 2MASS reference catalogs can be retrieved for a given field in such a way that there are many sources in common. However, in general applications, when the saturation and faint magnitude limits of an image have only a crude estimate, extended triangulation is essential.
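As a plausibility check on these figures, the per-level percentages and counts can be tallied against the 243,447 processed images. The snippet below is only arithmetic on the numbers quoted above; the variable names are arbitrary.

```python
total_images = 243_447

# Levels 0-2 are quoted as fractions of all frames, levels 3-4 as counts.
by_fraction = {0: 0.9678, 1: 0.0049, 2: 0.0006}
by_count = {3: 33, 4: 1334 + 5094}

counts = {level: round(frac * total_images) for level, frac in by_fraction.items()}
counts.update(by_count)

tally = sum(counts.values())
print(counts)
print("tally:", tally, "difference from total:", total_images - tally)
```

The tally falls 39 images short of the total, which is about what the rounding of the quoted percentages to 0.01% would produce.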

Although the procedure is fast, we note that the most time-consuming part of the process is the triangulation generation and the triangle matching itself. On average, this required more than 60% of the total time, and at $\ell = 4$, 92% of the time. The median value of the fit residuals was 0.06 pixels, while the median of the unitarities was 0.0042. The latter is in quite good agreement with the expected value of the nonlinearity factor, $f_d \approx 0.005$.

4.3. Comparison with Other Implementations

We also compared the performance of the program grmatch with an existing implementation within IRAF, namely the IMAGES.IMMATCH package, using the related tasks xyxymatch, geomap, and geoxytran. The steps of the point matching were as described in § 3.3. First, an initial set of possible pairs was established using xyxymatch and the “triangles” option as the matching method. Because the triangle sets generated by xyxymatch are full triangulations, we limited our input lists to the brightest sources; otherwise, the $O(N^3)$ dependence of the number of triangles would have resulted in an unrealistically long matching time. Second, the initial transformation was fitted using geomap, followed by a transformation of the reference catalog to the frame of the input list using geoxytran and this fit. Third, the transformed reference and the original input list were also matched by xyxymatch, but this time using the “tolerance” matching method. Finally, this new list of point pairs was used again to refine the geometric transformation with geomap.
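The difference in scaling between a full triangulation and a Delaunay triangulation is easy to quantify. The sketch below counts all triangles on $N$ points and compares this with the standard upper bound of $2N - 5$ triangles for a planar Delaunay triangulation; the function names are illustrative, and the extended triangulation of § 3.1.3 is not modelled.

```python
import math

def full_triangle_count(n):
    """Number of distinct triangles on n points: n choose 3, i.e. O(n^3)."""
    return math.comb(n, 3)

def delaunay_triangle_bound(n):
    """Upper bound on the triangles in a planar Delaunay triangulation.

    A triangulation of n points with h hull vertices has 2n - 2 - h
    triangles; h >= 3 gives the bound 2n - 5.
    """
    return 2 * n - 5 if n >= 3 else 0

for n in (35, 3000):
    print(n, full_triangle_count(n), delaunay_triangle_bound(n))
```

With 35 sources a full triangulation has 6545 triangles, which is manageable; with the 3000 sources used for the HAT frames it would have about 4.5 billion, against fewer than 6000 for a Delaunay triangulation.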

The comparison between grmatch and the IRAF IMAGES.IMMATCH implementation was based on 950 individual images, all acquired by TopHAT from the same FOV. We note that we had to use the relatively narrow-field TopHAT for the comparison, as the triangle match on the original 8.2° HATNet frames is almost hopeless, given the spatial distortions, the large number of stars, and the difficulty of selecting the brightest stars and at the same time retaining a small total number of selected sources (in order to be able to cope with a full triangulation). On each image, there were approximately 800–900 detected stars, depending on the air mass or thin clouds. For the triangulation and the initial xyxymatch fit, we used the 35 brightest sources from both the reference catalog and the input star lists. We found that grmatch required ∼0.1 s CPU time on average, while the whole procedure using the IRAF-based tasks, as described above, required ∼5–7 s net CPU time for a single image. Both algorithms yielded the same transformation coefficients and found the same number of pairs. However, in three cases, the number of sources used for triangulation had to be increased manually to 40 or 45. It is noteworthy that although the IRAF version proved to be significantly slower, the time-consuming part was the first xyxymatch matching. All other tasks, including the second matching (with “tolerance” option), required only a fraction of a second per image.

5. SUMMARY

In this paper, we present a robust algorithm for cross-matching two lists of two-dimensional points. The task is twofold: finding the smooth spatial transformation between the lists, and cross-matching the points. These two steps are intertwined and are performed in an iterative way until a satisfactory transformation and matching rate are reached. We make only very basic assumptions that hold for almost all astronomical applications, including wide-field surveys with distorted fields and a large number of sources. Namely, the transformation between the point lists is largely a similarity transformation (arbitrary shift, rotation, magnification, inversion). A significant distortion term can be present, given that it can be linearized on the scale length of the average distance of neighboring points.

In § 2 we briefly describe symmetric point matching in one and two dimensions, because this tool is used throughout the astrometry procedure. Finding the initial transformation between the point lists is based on triangle matching. First, we define various normalized triangle spaces in § 3.1.1. The “mixed” triangle space of Valdes et al. (1995) is a continuous function of the triangle parameters, but flipped triangles are not distinguished. The “chiral” triangle space ensures that chirality information is preserved, but this space is not continuous. We show that it is possible to define a “continuous” triangle space that is both continuous and preserves chirality, and that furthermore spans a larger volume and diminishes the confusion of triangles with similar coordinates.

Taking into account the distortion of a field and the astrometric errors, we calculate both the optimal size and number of triangles. For the typical setup of a HATNet telescope (8° × 8° FOV, distortion factor $f_d \sim 0.005$), the optimal size is 0.2°, and the optimal number of triangles is less than $2 \times 10^6$. We use Delaunay triangulation for generating the triangles of the triangle space. This has the advantage of being fast, robust, and generating local triangles that are less prone to being distorted. However, we note in § 3.1.3 that Delaunay triangulation is sensitive to the removal or addition of points to the list, and thus is unstable. We introduce an extension of this triangulation that is parameterized by an $\ell$ level.

Having determined the transformation between the two lists, it is possible to check how well the initial assumption about the linearity of the dominant term holds. In § 3.2, we introduce the unitarity of the transformation, a simple scalar measure of this property. We describe the practical details of the algorithm in § 3.3 and the actual software implementation (grmatch, grtrans) in § 4.1.

Finally, we ran these programs on some 240,000 frames taken by the wide-angle cameras of HATNet, plus 20,000 frames acquired by the TopHAT telescope. The success rate was very close to 100%, and the routines handled the various pointing errors, defocusing, and sixth-order distortions in the wide fields. Both programs will become available from the authors upon request, in binary format and for a wide range of architectures.

A. P. would like to thank the hospitality of the Harvard-Smithsonian Center for Astrophysics, where this work was partially carried out. A. P. was also supported by Hungarian OTKA grant T-038437. The HATNet project is funded by NASA grant NNG 04-GN74G. G. Á. B. wishes to acknowledge funding from NASA Hubble Fellowship grant HST-HF-01170.01-A. Both authors would like to thank István Domsa for the early development of triangle and point-matching codes.


REFERENCES

Akerlof, C., et al. 2000, AJ, 119, 1901
Alonso, R., et al. 2004, ApJ, 613, L153
Bakos, G. A., Lazar, J., Papp, I., Sári, P., & Green, E. M. 2002, PASP, 114, 974
Bakos, G. A., Noyes, R. W., Kovács, G., Stanek, K. Z., Sasselov, D. D., & Domsa, I. 2004, PASP, 116, 266
Calabretta, M. R., & Greisen, E. W. 2002, A&A, 395, 1077
Charbonneau, D., Brown, T. M., Burrows, A., & Laughlin, G. 2006, in Protostars and Planets V, ed. B. Reipurth, D. Jewitt, & K. Keil (Tucson: Univ. Arizona Press), in press (astro-ph/0603376)
Groth, E. J. 1986, AJ, 91, 1244
Joye, W. A., & Mandel, E. 2003, in ASP Conf. Ser. 295, Astronomical Data Analysis Software and Systems XII, ed. H. E. Payne, R. I. Jedrzejewski, & R. N. Hook (San Francisco: ASP), 489
Mink, D. J. 2002, in ASP Conf. Ser. 281, Astronomical Data Analysis Software and Systems XI, ed. D. A. Bohlender, D. Durand, & T. H. Handley (San Francisco: ASP), 169
Pepper, J., Gould, A., & Depoy, D. L. 2004, in AIP Conf. Proc. 713, The Search for Other Worlds (New York: AIP), 185
Phillips, A. C., & Davis, L. E. 1995, in ASP Conf. Ser. 77, Astronomical Data Analysis Software and Systems IV, ed. R. A. Shaw, H. E. Payne, & J. J. E. Hayes (San Francisco: ASP), 297
Pojmanski, G. 1997, Acta Astron., 47, 467
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical Recipes in C: The Art of Scientific Computing (2nd ed.; Cambridge: Cambridge Univ. Press)
Shewchuk, R. J. 1996, in Applied Computational Geometry: Towards Geometric Engineering, ed. M. C. Lin & D. Manocha (Berlin: Springer), 1148, 203
Shupe, D. L., Moshir, M., Li, J., Makovoz, D., Narron, R., & Hook, R. N. 2005, in ASP Conf. Ser. 347, Astronomical Data Analysis Software and Systems XIV, ed. P. Shopbell, M. Britton, & R. Ebert (San Francisco: ASP), 491
Skrutskie, M. F., et al. 2006, AJ, 131, 1163
Stetson, P. B. 1989, in V Advanced School of Astrophysics, ed. B. Barbury et al. (São Paulo: Univ. São Paulo)
Thiebaut, C., & Boër, M. 2001, in ASP Conf. Ser. 238, Astronomical Data Analysis Software and Systems X, ed. F. R. Harnden, Jr., F. A. Primini, & H. E. Payne (San Francisco: ASP), 388
Valdes, F. G., Campusano, L. E., Velásquez, J. D., & Stetson, P. B. 1995, PASP, 107, 1119
