|
Abstract:
The Euclidean distance between two points $p_1,p_2$ (column vectors)
is
defined by ||p_1-p_2||=\sqrt{(p_1-p_2)^T(p_1-p_2)}. A matrix $D=[d_{ij}]$
is called a Euclidean distance matrix (EDM) if there are points
p_1,p_2,...,p_n such that d_{ij}=||p_i-p_j||^2 for i,j=1,2,...,n.
Apparently a EDM is symmetric with zero diagonal and all off-diagonal
elements non-negative, but not vice versa. A matrix A=[a_{ij}] is called
symmetric partial if there are unspecified entries, and a_{ij} is
specified and equal to a_{ji} whenever a_{ji} is specified. A matrix D is
called a EDM completion of A if D is a EDM. The Euclidean distance matrix
completion problem (EDMCP) is to find a EDM $D$ that completes a given
symmetric partial matrix.
Theoretical properties of EDMs have been well studied and numerical
optimization is widely used to tackle the EDMCPs. One prominent
application of the EDMCP is protein structure prediction. The distance
information comes from the structural interpretation of nuclear magnetic
resonance data. We transform the EDMCP into a global optimization problem
and use modified Newton methods to descent directions. To reach the global
minimum, we develop a dimensional relaxation method. This approach is not
new. To our knowledge, it was first suggested by Crippen (1982), and later
used by Purisima and Scheraga (1986) and Havel (1991).
We experimented on random problems and structure prediction of proteins
1BPI, 1CBNa, 1MBC, and 2GDM. For 55% unspecified entries or less, our
system found the global minimum in all cases by a local minimization
procedure. For unspecified entries 60%-80%, our modified Newton algorithms
generally find the solution with dimensional relaxation 1 or 2. For cases
with unspecified entries 85% or higher, even if a global minimizer is
found it is not reliable because insufficient distance information may
result in multiple solutions. This is joint work with Dianne P. O’Leary. |