Distance matrix

In mathematics, computer science and especially graph theory, a distance matrix is a square matrix (two-dimensional array) containing the distances, taken pairwise, between the elements of a set.^[1] Depending upon the application involved, the distance being used to define this matrix may or may not be a metric. If there are $N$ elements, this matrix will have size $N \times N$. In graph-theoretic applications, the elements are more often referred to as points, nodes or vertices.
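
A minimal sketch of the definition, assuming the elements are points in the plane and the distance is the Euclidean distance (Python's `math.dist`). The helper name `distance_matrix` and the sample points are illustrative only. Any two-argument function can be passed as `dist`, whether or not it is a metric.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def distance_matrix(items: Sequence[T], dist: Callable[[T, T], float]) -> list[list[float]]:
    """Return the N x N matrix whose (i, j) entry is dist(items[i], items[j])."""
    return [[dist(x, y) for y in items] for x in items]

points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
d = distance_matrix(points, math.dist)
for row in d:
    print(row)
# [0.0, 3.0, 4.0]
# [3.0, 0.0, 5.0]
# [4.0, 5.0, 0.0]
```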

Non-metric distance matrices

In general, a distance matrix is a weighted adjacency matrix of some graph. In a network, that is, a directed graph with weights assigned to the arcs, the distance between two nodes can be defined as the minimum, over all directed paths from the first node to the second, of the sum of the arc weights along the path.^[2] This distance function, while well defined, is not in general a metric. There need be no restrictions on the weights other than the need to be able to combine and compare them, so negative weights are used in some applications (although a cycle of negative total weight leaves the minimum undefined). Since paths are directed, symmetry cannot be guaranteed, and if cycles exist the distance matrix may not be hollow.
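
One standard way to compute this shortest-path distance for every pair of nodes is the Floyd-Warshall algorithm, which the text does not name; the sketch below is an illustration, not a reference implementation. It starts the diagonal at infinity instead of zero, so each diagonal entry becomes the weight of the shortest directed cycle through that node, which makes both the loss of symmetry and the possibly non-hollow diagonal visible. The example graph and the function name `shortest_walk_distances` are hypothetical, and the code assumes there is no cycle of negative total weight.

```python
import math

INF = math.inf

def shortest_walk_distances(n: int, arcs: dict[tuple[int, int], float]) -> list[list[float]]:
    """Floyd-Warshall variant over non-empty directed walks.

    `arcs` maps (tail, head) to an arc weight. Because the diagonal starts at
    infinity rather than zero, d[i][i] ends up as the weight of the shortest
    cycle through i (or stays infinite). Assumes no negative-weight cycle.
    """
    d = [[INF] * n for _ in range(n)]
    for (u, v), w in arcs.items():
        d[u][v] = min(d[u][v], w)
    for k in range(n):              # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Directed 3-cycle 0 -> 1 -> 2 -> 0 with arc weights 2, 3 and 1.
d = shortest_walk_distances(3, {(0, 1): 2, (1, 2): 3, (2, 0): 1})
print(d)  # [[6, 2, 5], [4, 6, 3], [1, 3, 6]]
```

Here `d[0][1]` is 2 but `d[1][0]` is 4, and every diagonal entry is 6, the weight of the cycle.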

An algebraic formulation of the above can be obtained by using the min-plus algebra. Matrix multiplication in this system is defined as follows: given two $n\times n$ matrices $A=(a_{ij})$ and $B=(b_{ij})$, their distance product $C=(c_{ij})=A\star B$ is defined as the $n\times n$ matrix such that $c_{ij}=\min_{k=1}^{n}\{a_{ik}+b_{kj}\}$. Note that off-diagonal entries for pairs of nodes that are not directly connected must be set to infinity (or a suitably large value) for the min-plus operations to work correctly; a zero in these locations would be incorrectly interpreted as an edge of zero distance or cost.
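
The distance product translates directly into code. This is a minimal sketch, assuming matrices are stored as nested Python lists and that `math.inf` plays the role of infinity; `distance_product` is a hypothetical name. The second call shows the pitfall described above: writing 0 instead of infinity for a missing arc produces a spurious zero-cost connection.

```python
import math

INF = math.inf

def distance_product(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Min-plus product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Arc weights of a small directed graph: 0 on the diagonal, INF where no arc exists.
w = [
    [0, 4, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
print(distance_product(w, w))  # [[0, 4, 5], [3, 0, 1], [2, 6, 0]]

# The same graph with missing arcs wrongly encoded as 0.
bad = [[0, 4, 0], [0, 0, 1], [2, 0, 0]]
print(distance_product(bad, bad)[0][2])  # 0, although the true distance is 5
```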

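If $W$ is an $n\times n$ matrix containing the edge weights of a graph, with zeros on the main diagonal, then $W^{k}$ (using this distance product) gives the distances between vertices using paths of length at most $k$ edges, and $W^{n}$ is the distance matrix of the graph. (Since a shortest path uses at most $n-1$ edges, $W^{n-1}$ already suffices when the graph has no negative cycles.)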
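
Because the distance product is associative, the powers $W^k$ can be formed by repeated squaring, so only about $\log_2 n$ products are needed. The sketch below assumes zeros on the diagonal of `w` and no negative cycles, in which case $W^k$ no longer changes once $k \ge n-1$ and it is safe to overshoot. The function names and the 4-cycle example are illustrative.

```python
import math

INF = math.inf

def distance_product(a, b):
    """Min-plus product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def distance_matrix_by_squaring(w):
    """Return the min-plus power W^m for the first power of two m >= n - 1.

    Assumes a zero diagonal and no negative cycles, so W^m equals the
    distance matrix for every m >= n - 1.
    """
    n = len(w)
    d, steps = w, 1
    while steps < n - 1:
        d = distance_product(d, d)  # W^steps -> W^(2 * steps)
        steps *= 2
    return d

# Directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0, every arc of weight 1.
w = [
    [0, 1, INF, INF],
    [INF, 0, 1, INF],
    [INF, INF, 0, 1],
    [1, INF, INF, 0],
]
d = distance_matrix_by_squaring(w)
print(d)  # [[0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1], [1, 2, 3, 0]]
```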

An arbitrary graph $G$ on $n$ vertices can be modeled as a weighted complete graph on $n$ vertices by assigning a weight of one to each edge of the complete graph that corresponds to an edge of $G$ and zero to all other edges. $W$ for this complete graph is the adjacency matrix of $G$. The distance matrix of $G$ can be computed from $W$ as above once the off-diagonal zeros have been replaced by infinity, as noted earlier. By contrast, $W^{n}$ calculated by the usual matrix multiplication only counts the walks of length exactly $n$ between any two vertices.
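
The contrast between the two products shows up on a small example. In this sketch (all function names are hypothetical), the ordinary square of the adjacency matrix of the path 0 - 1 - 2 counts walks of length exactly 2, while the min-plus square of the same matrix, after its off-diagonal zeros are replaced by infinity, gives the distances, since $n - 1 = 2$.

```python
import math

INF = math.inf

def mat_mul(a, b):
    """Ordinary matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def distance_product(a, b):
    """Min-plus (distance) product."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def adjacency_to_weights(adj):
    """Weight 1 for edges, 0 on the diagonal, INF for non-edges."""
    n = len(adj)
    return [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)] for i in range(n)]

# Path graph 0 - 1 - 2 (undirected, so the adjacency matrix is symmetric).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

walks2 = mat_mul(adj, adj)                 # number of walks of length exactly 2
w = adjacency_to_weights(adj)
dist = distance_product(w, w)              # W^2 = W^(n-1), the distance matrix for n = 3
print(walks2)  # [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
print(dist)    # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```

Note that `walks2[1][1]` is 2 (the walks 1-0-1 and 1-2-1), a count rather than a distance.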

Metric distance matrices

The value of a distance matrix formalism in many applications is in how the distance matrix can manifestly encode the metric axioms and in how it lends itself to the use of linear algebra techniques. That is, if $M = (x_{ij})$ with $1 \leq i, j \leq N$ is a distance matrix for a metric distance, then

the entries on the main diagonal are all zero (that is, the matrix is a hollow matrix), i.e. $x_{ii} = 0$ for all $1 \leq i \leq N$,
all the off-diagonal entries are positive ($x_{ij} > 0$ if $i \neq j$), so that $M$ is a non-negative matrix,
the matrix is a symmetric matrix ($x_{ij} = x_{ji}$), and
for any $i$ and $j$, $x_{ij} \leq x_{ik} + x_{kj}$ for all $k$ (the triangle inequality). This can be stated in terms of tropical matrix multiplication: since the diagonal is zero, the triangle inequality holds exactly when $M \star M = M$, where $\star$ is the min-plus distance product defined above.
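
The four conditions can be checked mechanically. This sketch (the name `is_metric_distance_matrix` and the tolerance are assumptions) tests the first three directly and the triangle inequality through the min-plus form: because the diagonal is zero, the minimum over $k$ of $x_{ik} + x_{kj}$ can never exceed $x_{ij}$, so the inequality holds exactly when that minimum equals $x_{ij}$.

```python
def is_metric_distance_matrix(m, tol=1e-9):
    """Check hollowness, positivity off the diagonal, symmetry and the triangle inequality.

    `tol` is an assumed tolerance that absorbs floating-point rounding in the
    triangle check.
    """
    n = len(m)
    if any(len(row) != n for row in m):
        return False                       # not square
    for i in range(n):
        if m[i][i] != 0:
            return False                   # not hollow
        for j in range(n):
            if i != j and not m[i][j] > 0:
                return False               # off-diagonal entry not positive
            if m[i][j] != m[j][i]:
                return False               # not symmetric
    for i in range(n):
        for j in range(n):
            # (M * M)[i][j] in the min-plus sense; it equals m[i][j] iff no k is a shortcut.
            if min(m[i][k] + m[k][j] for k in range(n)) < m[i][j] - tol:
                return False               # triangle inequality violated
    return True

pts = [0, 1, 3]
line = [[abs(p - q) for q in pts] for p in pts]   # [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
squared = [[x * x for x in row] for row in line]  # [[0, 1, 9], [1, 0, 4], [9, 4, 0]]
print(is_metric_distance_matrix(line), is_metric_distance_matrix(squared))  # True False
```

The squared distances satisfy the first three conditions but fail the fourth, since $9 > 1 + 4$.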

When a distance matrix satisfies the first three axioms (making it a semi-metric), it is sometimes referred to as a pre-distance matrix. A pre-distance matrix that can be embedded in a Euclidean space is called a Euclidean distance matrix.
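
Whether a pre-distance matrix can be embedded in a Euclidean space can be tested with a classical criterion usually attributed to Schoenberg: taking point 0 as the origin, the matrix $g_{ij} = (d_{0i}^2 + d_{0j}^2 - d_{ij}^2)/2$ for $i, j \ge 1$ must be positive semidefinite, since it would be the Gram matrix of the embedded points. The sketch below takes unsquared distances as input (many texts define Euclidean distance matrices with squared entries instead), checks semidefiniteness by symmetric Gaussian elimination with a fixed tolerance, and uses illustrative names and data. A more robust version would scale the tolerance to the data.

```python
import math

def gram_from_distances(d):
    """g[i][j] = (d[0][i]**2 + d[0][j]**2 - d[i][j]**2) / 2 for i, j >= 1."""
    n = len(d)
    return [[(d[0][i] ** 2 + d[0][j] ** 2 - d[i][j] ** 2) / 2 for j in range(1, n)]
            for i in range(1, n)]

def is_positive_semidefinite(g, tol=1e-9):
    """Symmetric Gaussian elimination (LDL^T without pivoting) with a fixed tolerance."""
    a = [row[:] for row in g]
    m = len(a)
    for k in range(m):
        p = a[k][k]
        if p < -tol:
            return False  # negative pivot
        if p <= tol:
            # Zero pivot: in a PSD matrix the rest of this row must vanish too.
            if any(abs(a[k][j]) > tol for j in range(k + 1, m)):
                return False
            continue
        for i in range(k + 1, m):
            f = a[i][k] / p
            for j in range(k + 1, m):
                a[i][j] -= f * a[k][j]  # Schur complement update
    return True

def is_euclidean(d, tol=1e-9):
    return is_positive_semidefinite(gram_from_distances(d), tol)

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
d_square = [[math.dist(p, q) for q in square] for p in square]

# Graph metric of a star: a centre (index 0) at distance 1 from three leaves,
# which are pairwise at distance 2. It is a metric, but not Euclidean.
star = [[0, 1, 1, 1], [1, 0, 2, 2], [1, 2, 0, 2], [1, 2, 2, 0]]

print(is_euclidean(d_square), is_euclidean(star))  # True False
```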

Another common example of a metric distance matrix arises in coding theory: in a block code, the elements are strings of fixed length over an alphabet, and the distance between them is given by the Hamming distance. The smallest non-zero entry in the distance matrix, the minimum distance of the code, measures its error-correcting and error-detecting capability.
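
A sketch of the coding-theory example, assuming codewords are given as equal-length strings; the helper names are hypothetical. It uses the standard fact that a code whose minimum distance is $d$ can detect up to $d - 1$ errors and correct up to $\lfloor (d-1)/2 \rfloor$ errors.

```python
from itertools import combinations

def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(u, v))

def hamming_matrix(code: list[str]) -> list[list[int]]:
    return [[hamming(u, v) for v in code] for u in code]

def capability(code: list[str]) -> tuple[int, int, int]:
    """(minimum distance d, errors detectable d - 1, errors correctable (d - 1) // 2).

    Assumes at least two distinct codewords.
    """
    d = min(hamming(u, v) for u, v in combinations(code, 2))
    return d, d - 1, (d - 1) // 2

parity = ["000", "011", "101", "110"]   # even-weight code of length 3
repetition = ["00000", "11111"]         # length-5 repetition code
print(hamming_matrix(parity))   # [[0, 2, 2, 2], [2, 0, 2, 2], [2, 2, 0, 2], [2, 2, 2, 0]]
print(capability(parity))       # (2, 1, 0)
print(capability(repetition))   # (5, 4, 2)
```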

Applications

Hierarchical clustering

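A distance matrix is necessary for hierarchical clustering: agglomerative methods repeatedly merge the two closest clusters, using a linkage criterion to derive the distance between clusters from the pairwise distances between their elements.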
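
As an illustration, here is a naive agglomerative clustering driven entirely by a distance matrix. Single linkage (the distance between two clusters is the distance between their closest pair of members) is an assumption made for this sketch; other criteria, such as complete or average linkage, can give different trees. The matrix is the one from the Examples section, and the function name is hypothetical.

```python
def single_linkage(d, labels):
    """Naive agglomerative clustering with single linkage.

    Returns the merges in order as (sorted member labels, merge distance).
    """
    clusters = [frozenset([i]) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                link = min(d[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or link < best[0]:
                    best = (link, x, y)
        link, x, y = best
        merged = clusters[x] | clusters[y]
        merges.append((sorted(labels[i] for i in merged), link))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

labels = "abcdef"
D = [
    [0, 184, 222, 177, 216, 231],
    [184, 0, 45, 123, 128, 200],
    [222, 45, 0, 129, 121, 203],
    [177, 123, 129, 0, 46, 83],
    [216, 128, 121, 46, 0, 83],
    [231, 200, 203, 83, 83, 0],
]
merges = single_linkage(D, labels)
for members, height in merges:
    print(height, members)
# 45 ['b', 'c']
# 46 ['d', 'e']
# 83 ['d', 'e', 'f']
# 121 ['b', 'c', 'd', 'e', 'f']
# 177 ['a', 'b', 'c', 'd', 'e', 'f']
```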

Phylogenetic analysis

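Distance matrices are used in phylogenetic analysis: distance-based methods such as UPGMA and neighbor joining build a tree from the matrix of pairwise distances between taxa.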

Other uses

In bioinformatics, distance matrices are used to represent protein structures in a coordinate-independent manner, as well as the pairwise distances between sequences in sequence space. They are used in structural and sequence alignment, and for the determination of protein structures from NMR or X-ray crystallography.
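
Coordinate independence means that the distance matrix does not change when the whole structure is rotated or translated. The sketch below checks this numerically for a handful of made-up 3D coordinates (not real protein data); the rotation axis, angle, shift and helper names are arbitrary choices.

```python
import math

def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]

def rigid_motion(coords, angle, shift):
    """Rotate about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical coordinates for four atoms (made-up numbers).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.4, 1.6, 0.9)]
moved = rigid_motion(atoms, angle=0.7, shift=(10.0, -3.0, 2.5))

# The coordinates differ, but the distance matrices agree up to rounding error.
print(max_abs_diff(distance_matrix(atoms), distance_matrix(moved)))
```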

Sometimes it is more convenient to express data as a similarity matrix.

Distance matrices are also used to define the distance correlation.
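
A sketch of the usual sample distance correlation, which is built from double-centred distance matrices: each pairwise distance matrix $a_{jk} = |x_j - x_k|$ has its row and column means subtracted and its grand mean added back, and the squared distance covariance is the mean of the entrywise product of two such matrices. It assumes one-dimensional numeric samples of equal length; the function names are hypothetical.

```python
import math

def double_centered(xs):
    """a[j][k] = |x_j - x_k|, minus row and column means, plus the grand mean."""
    n = len(xs)
    a = [[abs(x - y) for y in xs] for x in xs]
    means = [sum(row) / n for row in a]  # row means equal column means (a is symmetric)
    grand = sum(means) / n
    return [[a[j][k] - means[j] - means[k] + grand for k in range(n)] for j in range(n)]

def dcov2(A, B):
    """Squared sample distance covariance: mean of the entrywise product."""
    n = len(A)
    return sum(A[j][k] * B[j][k] for j in range(n) for k in range(n)) / n ** 2

def distance_correlation(xs, ys):
    A, B = double_centered(xs), double_centered(ys)
    denom = math.sqrt(dcov2(A, A) * dcov2(B, B))
    if denom == 0:
        return 0.0  # a constant sample: define the correlation as 0
    return math.sqrt(max(0.0, dcov2(A, B)) / denom)

xs = [-2, -1, 0, 1, 2]
print(distance_correlation(xs, [2 * x + 1 for x in xs]))  # 1.0, up to rounding
print(distance_correlation(xs, [x * x for x in xs]))      # roughly 0.52
```

For $y = x^2$ on this symmetric sample the Pearson correlation is exactly zero, yet the distance correlation is clearly positive, reflecting the dependence between the two variables.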

Examples

For example, suppose the following data are to be analyzed, with the Euclidean distance in pixels as the distance metric.

[Figure: Raw data]

The distance matrix would be:

	a	b	c	d	e	f
a	0	184	222	177	216	231
b	184	0	45	123	128	200
c	222	45	0	129	121	203
d	177	123	129	0	46	83
e	216	128	121	46	0	83
f	231	200	203	83	83	0

These data can then be viewed in graphic form as a heat map. In this image, black denotes a distance of 0 and white is maximal distance.
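
The heat-map rendering amounts to a linear rescaling of the matrix. As a sketch, assuming 8-bit gray levels (0 for black, 255 for white), which the text does not specify, and a hypothetical helper name:

```python
def to_gray_levels(d, white=255):
    """Scale distances linearly so 0 maps to 0 (black) and the largest distance to `white`."""
    top = max(max(row) for row in d)
    return [[round(white * x / top) for x in row] for row in d]

D = [
    [0, 184, 222, 177, 216, 231],
    [184, 0, 45, 123, 128, 200],
    [222, 45, 0, 129, 121, 203],
    [177, 123, 129, 0, 46, 83],
    [216, 128, 121, 46, 0, 83],
    [231, 200, 203, 83, 83, 0],
]
gray = to_gray_levels(D)
print(gray[0][5], gray[1][2], gray[3][3])  # 255 50 0
```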

[Figure: Graphical view]

References

[1] Weyenberg, G., & Yoshida, R. (2015). Reconstructing the phylogeny: Computational methods. In Algebraic and Discrete Mathematical Methods for Modern Biology (pp. 293–319). Academic Press.
[2] Harary, F., Norman, R. Z., & Cartwright, D. (1965). Structural Models: An Introduction to the Theory of Directed Graphs (pp. 134–138). John Wiley & Sons. MR0184874.

