Ethernet Wang - More Than Met the Eye

Is it possible to analyze a museum’s collection by the aesthetic and semantic content of the objects within it, rather than by historical metadata alone?

In this class project, we propose and implement a novel methodology for analyzing a collection of historic photographs. Each image is mapped to an embedding vector in a latent space by a ResNet-18 neural network pretrained on ImageNet image classification. The embeddings are then grouped by Ward hierarchical clustering into "conceptual clusters."
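As a rough illustration, the two stages might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes the embedding is the 512-dimensional pooled feature vector just before ResNet-18's classification layer (the layer used is not specified here), and it uses torchvision and SciPy as stand-ins for whatever libraries were used. The function names `embed_images` and `cluster_embeddings` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def embed_images(paths, batch_size=32):
    """Return one 512-d ResNet-18 feature vector per image path."""
    # Imported lazily so the clustering helper works without PyTorch installed.
    import torch
    from PIL import Image
    from torchvision.models import ResNet18_Weights, resnet18

    weights = ResNet18_Weights.IMAGENET1K_V1  # ImageNet-pretrained weights
    model = resnet18(weights=weights)
    # Assumption: use the pooled features and drop the 1000-way classifier.
    model.fc = torch.nn.Identity()
    model.eval()
    preprocess = weights.transforms()  # resize, crop, normalize as in pretraining

    features = []
    with torch.no_grad():
        for start in range(0, len(paths), batch_size):
            batch = torch.stack([
                preprocess(Image.open(p).convert("RGB"))
                for p in paths[start:start + batch_size]
            ])
            features.append(model(batch).numpy())
    return np.concatenate(features, axis=0)


def cluster_embeddings(embeddings, n_clusters):
    """Cut a Ward linkage tree into at most n_clusters flat clusters.

    Labels are integers starting at 1, one per row of `embeddings`.
    """
    tree = linkage(embeddings, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

With a list of image file paths in hand, `cluster_embeddings(embed_images(paths), n_clusters=20)` would return one cluster label per photograph. The value 20 is an arbitrary placeholder, not the number of clusters used in the project.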

We apply this methodology to the Metropolitan Museum of Art’s photograph collection, which primarily consists of 19th-century photos. Qualitatively, it performs surprisingly well at extracting clusters that represent specific aesthetic ideas, semantic subject types, and photographic technologies, especially given that neither the embedding network nor the clustering model has any prior knowledge of the subject matter.

We then analyze the contents of these clusters, measure how the clusters change over time, and speculate on further applications of the methodology.

In other words:

A neural network translates each photograph to a point in a latent conceptual space, where similar photographs are close together and dissimilar photographs are far apart.
A clustering algorithm groups these points into clusters, each representing a "genre" of photographs.
The contents of these clusters can then be inspected, interpreted, and analyzed to understand the collection as a whole.
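Once every photograph carries a cluster label, the last step can start from very simple summaries. The sketch below shows one possible way to track a cluster's prominence over time: the share of each decade's photographs that falls into each cluster. Binning by decade, the function name `cluster_share_by_decade`, and the use of a single integer year per photograph are all assumptions made for illustration; the measure used in the project may differ.

```python
from collections import Counter, defaultdict


def cluster_share_by_decade(years, labels):
    """Map each decade to {cluster label: fraction of that decade's photos}."""
    counts = defaultdict(Counter)
    for year, label in zip(years, labels):
        decade = (year // 10) * 10  # e.g. 1857 -> 1850
        counts[decade][label] += 1

    shares = {}
    for decade in sorted(counts):
        total = sum(counts[decade].values())
        shares[decade] = {
            label: n / total for label, n in counts[decade].items()
        }
    return shares
```

A cluster whose share rises or falls sharply between decades is a natural candidate for closer inspection, for example to see whether it corresponds to a photographic technology coming into or going out of use.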