
Both and Neither: in silico v1.0, Ecce Homology
Ecce Homology, a physically interactive new-media work, visualizes genetic data as calligraphic forms. A novel computer-vision user interface allows multiple participants, through their movement in the installation space, to select genes from the human genome for visualizing the Basic Local Alignment Search Tool (BLAST), a primary algorithm in comparative genomics. Ecce Homology was successfully installed in the UCLA Fowler Museum from 6 November 2003 to 4 January 2004.
Article Frontispiece. Calligraphic visualization of human NADH dehydrogenase (ubiquinone) 1 alpha amino acid sequence.
In Silico V1.0: Discovery-Based Collaboration
in silico v1.0 is a collaboration composed of eight artists and scientists representing bioinformatics, computer science, engineering, molecular biology, performance, proteomics and new media. We are motivated by several goals: to create works that contribute simultaneously to the realms of science and art while retaining discipline-specific rigor; to investigate the nature of interdisciplinary collaboration; and to explore how artistic practice and aesthetic experience can nurture scientific discovery. Through our collaboration we are developing Ecce Homology, an artistic exploration of the intersection of comparative genomics and immersive experience.
The sequencing of the human genome is perhaps the crowning achievement of the reductionist ethos of science in the last century. In contrast, genomic biology is moving toward discovery-based and predictive frameworks for scientific practice [1]. Unprecedented amounts of genomic data are being generated daily. To capitalize on this wealth of data, new tools must be developed. The need to build knowledge from data, or to find patterns within vast datasets, is driving the development and application of interdisciplinary and alternative approaches [2]. Ecce Homology is one such approach. The work fuses multi-user interaction with a compelling immersive aesthetic experience of genomic data sets and a depiction of a complex bioinformatics algorithm. It is also intended to engage the social and scientific importance of genomic biology and bioinformatics while aiming to foster awareness of the use of tools for making meaning and knowledge in science. Ecce Homology's novel calligraphic gene visualization incorporates intricate, non-alphabet forms for scientific visualization inspired by the traditions of Chinese and Sanskrit calligraphy, pictographic writing systems and the relationship between structure and function in biological molecules. In creating Ecce Homology we established a shared language that incorporates elements of each collaborator's area of expertise. These elements from distinct disciplines were continually reinterpreted and elaborated. The result is hybrid process and product.
Ecce Homology
Ecce Homology consists of an interactive installation that offers an aesthetic and meditative encounter, visualizes a primary algorithm in comparative genomics (the Basic Local Alignment Search Tool [BLAST]) [3] and implements a novel calligraphic visualization of genomic data. It premiered in Los Angeles in 2003 as part of the University of California at Los Angeles Fowler Museum of Cultural History and Hammer Museum's exhibition From the Verandah (Color Plate E No. 1).
Comparison of standard 3D and calligraphic visualizations of human and rice amylase protein. (left) Human amylase, alpha 1A; salivary protein, standard 3D structure. (middle) Human amylase, alpha 1A; salivary calligraphic protein stroke. (right) Rice alpha-amylase protein calligraphic stroke. Image orientation: The start (N-terminus) of each protein, including the standard 3D structure, is at the top left of each image/stroke. 3D Representation: A standard representation of the 3D structure of amylase based on the atomic coordinates deposited in the protein data bank. Image rendered with Pymol [13].
Elemental carbon, as a basis of life on Earth and a component of Asian ink, was among the various themes from Buddhist tradition that influenced the design of From the Verandah. This exhibition, in turn, was set within the context of The Art of Rice, an exhibition examining the relationship between rice and culture throughout Asia. Responding to these themes, Ecce Homology allows visitors to initiate BLAST analyses exploring evolutionary relationships between genes from human beings and the rice genome that comprise the metabolic pathways for cellular respiration (the process by which carbohydrate is broken down into carbon dioxide in order to release the energy necessary for life). These BLAST analyses are rendered for the viewer via five video projectors that produce changing imagery depicting both genetic homology between human and rice proteins and the traditionally unseen operation of BLAST, a fundamental black box of bioinformatics.
Brush model. (left) A sample brush profile curve. (right) Brush model interface.
Transformation of helical DNA strokes. (left) Helical DNA stroke for exon D10491. (right) Transformed DNA stroke for exon D10491.
Ecce Homology presents genomic data through a novel visualization composed of calligraphic forms or "characters" representing the protein products encoded by genes. The characters are created by using genomic and protein data to drive a virtual calligraphic brush. The form and visual structure of pictographic languages are directly connected to their meaning, in the same way that the protein structure specified by DNA reflects its function in an organism. The stroke placement, shape and brush quality for each gene character are determined by the physical and chemical properties of its nucleotide and amino acid sequences. As shown in Fig. 1, the left-hand profile of each gene's protein stroke is determined by a physical property of the amino acid, the proportion of its mass to its volume. The stroke's right side represents a chemical property along the gene's amino acids, hydrophobicity, or the amino acid's tendency to be buried inside the protein. Curvature of the stroke is determined by chemical properties of the sequence's ionizable amino acids, which are most commonly associated with interactions with other amino acids. The visualization is created from amino acid sequence chunks that are segmented by a "turn prediction" algorithm [4]. Each segment's corresponding calligraphic stroke is connected to its neighbor by a connection whose shape is based on a secondary structure property of the segment. In this way, strokes contain both secondary structure information as well as chemical and physical information about each position in the sequence. Genomic data selected from human and rice genomes maintained by the National Center for Biotechnology Information (NCBI), The Institute for Genomic Research (TIGR) and the Rice database at Gramene: A Comparative Mapping Resource for Grains (Cornell University) were used for the installation of Ecce Homology in the UCLA Fowler Museum of Cultural History. As metaphors, the rice and human characters for genes involved in cellular respiration capture the cycling of energy and the unity of life.
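To make the idea of data driving a stroke edge concrete, the following is a minimal sketch of how a per-residue hydrophobicity value might be turned into offsets for the right-hand edge of a protein stroke. It is an illustration only: the article does not say which hydrophobicity scale the installation used, so the Kyte-Doolittle scale is assumed here, and the function names, the normalization to a 0 to 1 range and the moving-average smoothing are all hypothetical choices. The mass-to-volume profile for the left edge is not attempted.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the article does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def right_edge_offsets(sequence, max_offset=1.0):
    """Map each residue's hydropathy to an edge offset in [0, max_offset].

    Hypothetical mapping: the most hydrophobic residue (I) pushes the edge
    out to max_offset, the least hydrophobic (R) leaves it at 0.
    """
    lo, hi = -4.5, 4.5  # extremes of the Kyte-Doolittle scale
    return [max_offset * (KYTE_DOOLITTLE[aa] - lo) / (hi - lo)
            for aa in sequence.upper()]


def smooth(values, window=3):
    """Centered moving average so the edge reads as a continuous contour."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

A stroke renderer could call `smooth(right_edge_offsets(segment))` for each segment of the sequence and use the result as the distance of the right edge from the stroke's center line.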
Relevance and BLAST
To raise awareness of the tools that generate scientific meaning and knowledge, particularly in the field of genomics, we explore the BLAST algorithm, one of the primary in silico (computational) analyses conducted worldwide as part of the Human Genome Project. For both ethical and technical reasons the function of each gene in the human genome cannot currently be ascertained directly from the human genome itself. Usually, in order to determine the function of a gene, scientists must rely on comparisons between our genes/genome and those of other organisms. BLAST allows researchers to compare DNA or protein sequences of unknown identity, function and structure with "knowns" from validated databases, providing a measure of similarity or homology (similarity attributed to descent from a common ancestor/evolution) among sequences.
BLAST analyses are conducted worldwide via web servers supported by major genome sequencing consortia in Europe, Japan and the U.S.A., as well as in local laboratories on individual computers. Daily, an average of 100,000 unique BLAST runs arising from 70,000 unique IP addresses are conducted on the U.S. National Center for Biotechnology Information's web servers [5]. BLAST is arguably the most widely used data-mining tool in history. Analyses are routinely performed on remote servers via batch processing, typically showing the user only the final result of the algorithm. Although the use of BLAST is ubiquitous, fairly few users may know how or why a particular match was found.
Ecce Homology overlays a real-time visualization of the BLAST algorithm's internal processes on the calligraphic forms. Revealing the operation of the normally invisible BLAST process is a central conceptual and aesthetic element of the work. Transformed into an experience that proceeds at the scale of human observational time, BLAST is both engine and subject of this physically interactive installation.
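For readers unfamiliar with what happens inside the black box, the sketch below illustrates the seed-and-extend idea at the heart of BLAST-style searches: the query is cut into short overlapping "words," exact word matches in a target sequence become seeds, and each seed is extended in both directions while the alignment score holds up. This is a deliberately simplified illustration, not the installation's C++ implementation and not the full published algorithm. In particular it assumes exact word matches and a +1/-1 score, whereas protein BLAST scores words and extensions with a substitution matrix. All function and parameter names are hypothetical.

```python
def words(seq, w=3):
    """All overlapping length-w words of seq, paired with their start index."""
    return [(seq[i:i + w], i) for i in range(len(seq) - w + 1)]


def seed_hits(query, target, w=3):
    """(query_pos, target_pos) pairs where a query word occurs in the target."""
    index = {}
    for word, i in words(query, w):
        index.setdefault(word, []).append(i)
    hits = []
    for word, j in words(target, w):
        for i in index.get(word, []):
            hits.append((i, j))
    return hits


def extend(query, target, q_pos, t_pos, w=3, match=1, mismatch=-1, drop=2):
    """Ungapped extension of one seed; returns (score, q_start, q_end).

    Extension in a direction stops once the running score falls more than
    `drop` below the best score seen so far. q_end is exclusive.
    """
    best = score = w * match
    q_end, i, j = q_pos + w, q_pos + w, t_pos + w
    while i < len(query) and j < len(target) and best - score <= drop:
        score += match if query[i] == target[j] else mismatch
        i += 1
        j += 1
        if score > best:
            best, q_end = score, i
    score = best
    q_start, i, j = q_pos, q_pos - 1, t_pos - 1
    while i >= 0 and j >= 0 and best - score <= drop:
        score += match if query[i] == target[j] else mismatch
        if score > best:
            best, q_start = score, i
        i -= 1
        j -= 1
    return best, q_start, q_end
```

With the made-up sequences `query = "MKVLAAGIW"` and `target = "QQKVLAAGQQ"`, `seed_hits` finds four overlapping seeds, and extending the first one recovers the shared segment `query[1:7]`, that is, `"KVLAAG"`.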
Characters: Calligraphic Gene Visualization
DNA and protein sequences within genomic databases are presented as long strings of letters, many of which span hundreds of thousands of characters. Internal organization, structural elements or features of biological interest, and patterns among sequences are difficult, if not impossible, to discern simply by reading these lengthy text strings. Presenting large genomic data sets in a meaningful and non-overwhelming manner, offering a multi-user physical interface to visualized data and animating complex algorithms operating on such data sets present further challenges. The calligraphic visualization explored in Ecce Homology forces group members to apply and join their individual disciplinary skills in the pursuit of a hybrid approach to these challenges.
DNA and protein strokes for human salivary Amylase, Alpha 1A, salivary gene. (left) Exon (E) and intron (I) strokes for the nucleic acid sequence of the human salivary Amylase, Alpha 1A, salivary gene. The sequence begins at the top left and ends at the bottom right. (right) The protein stroke for the corresponding amino acid sequence. The sequence begins at the top left and ends at the bottom right. Protein strokes include a combination of amino acid mass to volume ratio, amino acid hydrophobicity, ionizable amino acids, turn prediction [14] and secondary structure properties. DNA strokes include sequence length, percent guanine and cytosine residues [15] and DNA curvature prediction [16].
Viewer stillness triggers expansion of projection. (left) Installation in contracted state with "breathing" motion. (middle) Expansion in process. (right) Viewer creating gesture traces.
Choosing calligraphy as a visualization "platform," we confront a new set of dimensions on which to visualize characteristics of the gene. A calligrapher's individual style incorporates the use of materials, the turn and fold of the brush, the quantity of ink loaded onto it, and the motion, speed and force with which the ink is applied. The qualities that create personal calligraphic style support our objective of representing genes' actual data/structure and the characteristics of the proteins they encode in each "biological character." Rather than simply depicting the linear DNA or protein sequence, the characters communicate information about biological features of the molecules along each sequence. In our model, stroke curvature, brush width, pressure and brush profile are varied based on the gene being drawn. Genomic and protein data drive the virtual calligraphic brush so that each DNA and protein sequence expresses its characteristics in its own "written image." Databases and the scientific literature provide information about such biological features, yet multiple features are not routinely combined and explored as aesthetic/visual patterns. The spatially compact calligraphic gene visualization developed for Ecce Homology affords display of many genes simultaneously, enabling the recognition of similarity and homology via pattern recognition (Fig. 1). Conventional tools cannot easily obtain this holistic view.
To approach the beauty and dynamism of Chinese and Sanskrit calligraphy, we adopted a computational brush model in which ink is deposited in areas where the brush shape intersects a virtual textured paper surface height field [6] (Fig. 2). The brush shape is modeled by a height field defined by profile and cross-section curves, similar to the lofting procedure sometimes used in describing boat hulls. This allows a range of shape and pressure phenomena to be defined with only these two interactively drawn curves. Although it is not physically realistic [7], this model allows for more control and multidimensional expressiveness in mapping the gene data to the stroke image, offering an interesting challenge and opportunity for visualization. One such challenge arose in creating a suitable and unique luminous white-on-black aesthetic that captures the character of calligraphy (Article Frontispiece).
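The brush model described above can be illustrated with a small sketch. It assumes, as a simplification, that the lofted brush shape is the product of a profile curve and a cross-section curve sampled on a grid, and that ink is deposited wherever the lowered brush surface reaches or passes below the paper surface. The names `brush_height`, `paper_texture`, `stamp` and the `lift` parameter are hypothetical; the actual model in [6] may differ in every detail.

```python
import random


def brush_height(profile, cross_section):
    """Loft a brush footprint from two curves: h[u][v] = profile[u] * cross_section[v].

    Each value is how far the brush surface protrudes below the brush holder.
    """
    return [[p * c for c in cross_section] for p in profile]


def paper_texture(rows, cols, roughness=0.2, seed=0):
    """A random height field standing in for textured paper (assumed model)."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, roughness) for _ in range(cols)]
            for _ in range(rows)]


def stamp(brush, paper, lift):
    """Ink mask for one brush placement.

    `lift` is the height of the brush holder above the paper baseline, so
    lowering it corresponds to more pressure. A cell is inked when the brush
    surface (lift - brush height) is at or below the paper surface.
    """
    return [[(lift - b) <= p for b, p in zip(brush_row, paper_row)]
            for brush_row, paper_row in zip(brush, paper)]
```

Under this sketch, reducing `lift` widens the inked footprint, and a rougher paper field lets raised paper grain catch ink at the edges of the footprint, which is one plausible source of a dry-brush look.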
Aesthetic of slowness. (left) Slow motion results in continuous and sustained gesture traces. (right) Swift motion results in scattered, rapidly fading gestures.
Human gene character undergoing BLAST. The human gene (translated into protein) selected from the characters on the vertical axis is enlarged in the central area where the viewer's gesture traces had been. The collection of points at the upper left represents the query sequence being segmented into "words" that are compared to the target database sequences depicted on the horizontal axis.
Image resolution posed an additional challenge. Within the Fowler gallery the projection expanded from approximately 4 to 45 feet across. The number of characters presented in the display resulted in a 200 × 200 pixel limit per character despite the large display area. An ongoing challenge is the construction of combined DNA and protein characters. The qualities of DNA and protein strokes are determined by several algorithms, including BetaTPred2 [8] and BEND [9], and key characteristics of DNA such as G+C content [10]. The BEND algorithm, which predicts ideal DNA helix bending and curvature, is used to define the curvature for strokes representing DNA sequences. Its output results in helical strokes for each DNA sequence. Establishing a system in which each character is composed of numerous strokes given these limitations required extensive tuning of the brush model. One solution was to perform a mathematical transformation on the DNA data so that each stroke represents the difference from the predicted helix bending and curvature along a sequence produced by the BEND algorithm rather than its direct output. This approach achieves the desired calligraphic aesthetic while representing the underlying data (Figs 3 and 4). The details of the transformation and mapping of data to visual form lie outside the scope of this paper, but the group is continuing its research into devising a schema for the assembly of characters combining both DNA and protein strokes and hopes to make its findings available in further publications.
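Of the DNA properties named above, G+C content is the simplest to compute, and a short sketch may help show how a sequence statistic can become a brush parameter. The G+C calculation itself is standard; the choice to compute it over fixed windows and to map it linearly onto a brush width is a hypothetical illustration, since the article leaves the actual mapping unspecified. The BEND-based curvature transformation is not attempted here.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=4, step=4):
    """G+C fraction of successive windows along the sequence."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def stroke_widths(seq, min_width=1.0, max_width=5.0, window=4, step=4):
    """Hypothetical mapping: higher G+C content gives a wider brush."""
    return [min_width + (max_width - min_width) * g
            for g in windowed_gc(seq, window, step)]
```

For the made-up sequence `"GGCCATATGCAT"`, the three windows have G+C fractions of 1.0, 0.0 and 0.5, so the stroke would run wide, thin, then medium.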
An Aesthetic of Slowness
Ecce Homology's interactive experience stems from an aesthetic of slowness and encourages stillness, slow movement and contemplative presence. The installation is an open-ended experience. No knowledge of genomics is required, but additional information detailing the scientific underpinnings of the installation is provided outside the installation in an adjoining reading room. Visitors can learn more, then return to re-experience it with the benefit of that knowledge.
The work is projected on a long, blank, black wall. Characters along the vertical axis represent human genes, the "subject" database. The characters on the horizontal axis represent genes from other organisms (e.g. rice), the "target" database. When there is no viewer present, the installation remains in a contracted state, a small projection in the middle of the wall that exhibits a shimmering, "breathing" motion (Fig. 5). When a viewer stands still long enough in front of this display, the system initiates an expansion of the projection into a 45-ft-wide by 12-ft-high collection of calligraphic gene characters.
At the conclusion of the expansion, a central space remains at the intersection of the horizontal and vertical axes of characters, in which multiple users can interact simultaneously with the piece through a computer-vision and feature-extraction system in which hand movement (or hand-like movement) is detected and rendered in real time near the calligraphic forms. By moving in the space, users draw their own light-filled calligraphic characters. As one's motion slows, the light-filled gesture traces become more continuous and persistent. This dynamic, the inverse of what might be expected, is an aesthetic element intended to call awareness to the priority placed on the speed with which the BLAST algorithm operates and its accepted black box nature. It is also designed to evoke a sustained sense of presence and contemplation for visitors (Fig. 6).
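The inverse relationship between speed and persistence can be sketched in a few lines. The formula below is an assumption made for illustration: the article states only that slower motion yields more continuous and persistent traces, not how the installation computed this. The names `persistence`, `v_ref` and `max_seconds` are hypothetical.

```python
import math


def speed(p0, p1, dt):
    """Speed of a tracked point between two frames dt seconds apart."""
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / dt


def persistence(v, v_ref=1.0, max_seconds=8.0):
    """Lifetime of a gesture trace in seconds.

    Assumed model: a motionless hand leaves a trace lasting max_seconds,
    and the lifetime halves when the speed reaches v_ref.
    """
    return max_seconds / (1.0 + v / v_ref)


def opacity(age, lifetime):
    """Linear fade from 1.0 at creation to 0.0 once age reaches lifetime."""
    return max(0.0, 1.0 - age / lifetime)
```

Under these assumed values, a trace drawn at one unit per second lasts four seconds, while one drawn at three units per second fades within two.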
Pattern matching compares the user-drawn forms with those of human genes and provides a multi-user selection mechanism. When a sufficiently close match is found, a human gene character from the vertical axis is selected. This use of pattern matching as a selection process is a metaphor for BLAST. Upon selection, a human gene character is enlarged and displayed in the central space where the user's gesture traces had been, and the BLAST engine compares it to the genes from other organisms residing in the "target" database (Fig. 7). The algorithmic process and the homology it reveals are visualized in real time for viewers.
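A selection step of this kind can be sketched as nearest-template matching with a threshold. The sketch assumes that the gesture and each gene character have already been reduced to the same number of sample points, and it compares them after removing differences in position and size. The distance measure, the threshold value and all names are hypothetical; the article does not describe the matching method actually used.

```python
import math


def normalize(points):
    """Center a point list on its centroid and scale it to unit radius."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def distance(a, b):
    """Mean point-to-point distance between two normalized shapes."""
    a, b = normalize(a), normalize(b)
    return sum(math.hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def select_character(gesture, templates, threshold=0.25):
    """Name of the closest template, or None if nothing is close enough."""
    best_name, best_d = None, float("inf")
    for name, pts in templates.items():
        d = distance(gesture, pts)
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d <= threshold else None
```

Returning `None` when no template is within the threshold mirrors the behavior described above, in which a gene is selected only when a sufficiently close match is found.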
When a viewer is no longer present, the installation returns to its initial, contracted state and resumes a shimmering "breathing" motion. New subsets of gene characters are selected from the subject and target databases for the installation's vertical and horizontal display axes, and the installation awaits its subsequent viewers.
Modular approach. Due to the complexity and computational requirements of the installation it was implemented as a collection of independent software modules running on multiple machines and networked together over TCP/IP using the Java-based middleware framework Kolo [17].
The intuitive and immersive experience for visitors in the installation is created via the integration of a complex array of interrelated software modules running on a computer network using a Java-based middleware framework, Kolo, developed at the UCLA Hypermedia Studio [11] (Fig. 8). As users move in the installation space, position information generated by the computer vision module is forwarded to the pattern matching module and the graphics modules that render user movement. (Three modules collectively render the five projectors' individual images.) When a match between user-drawn and database forms is found, a BLAST run is triggered. The standard BLAST algorithm [12] is implemented in C++ code that outputs intermediate information about its progress to the graphics modules through Kolo. The graphics modules render the progress of BLAST's sequence segmentation and comparison using the calligraphic representation of the genomic data.
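The flow of messages between modules can be pictured with a toy publish/subscribe sketch. It runs in a single process and does not use or imitate Kolo's API, which is not described in this article; the `Bus` class, the topic names and the message shapes are all hypothetical. It is meant only to show how vision, pattern matching, BLAST and graphics modules might stay independent while reacting to one another's output.

```python
from collections import defaultdict


class Bus:
    """Minimal in-process stand-in for networked middleware (not Kolo)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


def wire(bus, log):
    """Connect hypothetical modules; `log` records what graphics would draw."""
    # Graphics: render every position update as a gesture trace.
    bus.subscribe("position", lambda p: log.append(("draw_trace", p)))

    # Pattern matching: announce a match when the vision data carries one.
    def on_position(p):
        if p.get("matches"):
            bus.publish("match", p["matches"])
    bus.subscribe("position", on_position)

    # BLAST: report intermediate progress for the selected gene.
    def on_match(gene):
        for step in ("segment_words", "compare", "report"):
            bus.publish("blast_progress", (gene, step))
    bus.subscribe("match", on_match)

    # Graphics: render each BLAST progress message.
    bus.subscribe("blast_progress", lambda m: log.append(("draw_blast", m)))
```

In the installation the modules ran on separate machines and communicated over TCP/IP, so a real implementation would replace the direct function calls here with network messages.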
Being Both and Neither
in silico v1.0 is a self-organizing collaboration that arose outside of institutional structures while functioning effectively and productively within an academic environment. We aspire to enhance and contribute to invention and discovery in multiple disciplines simultaneously. Several members came to the group with "hybrid" backgrounds (e.g. English literature and biology, engineering and performance, media art and computer science, media art and molecular biology) and others with discipline-specific backgrounds (bioinformatics, computer graphics and biochemistry). Our working process required significant effort in integrating multiple worldviews and values. Collective working values were established at the outset and evolved as the work progressed. The group established a fundamental priority that the science could not be compromised for the artistic goals of the project, yet the artistic/aesthetic goals were to be equal in value to the underlying science informing the work. Over and above their own disciplinary contributions, all team members were responsible for participating in the artistic process of the work's coming into being. Each member of the team wrestled with knowledge from other disciplines to understand the inner workings and develop an informed basis upon which to participate in the creation of the artwork. For example, the group read the original BLAST papers and consulted with bioinformatics experts. The resulting understanding then informed the aesthetic and artistic direction as well as the eventual implementation. Our study of calligraphy exemplifies group members' commitment to working outside of their own fields, and outside of an artistic context, in order to participate in a hybrid project. In addition to reading and experimentation, we received tutelage from Hirokazu Kosaka in the art of calligraphy. 
Such working practices allowed us as a group, and as individuals, to make coherent decisions throughout the development and implementation process. In essence, the process of creating Ecce Homology involved the making of creative decisions based on collaborators' areas of expertise and worldviews in a manner responsive to the evolving vision for the project. Due to our initial focus on integrating several disparate disciplines, the work may fall short when evaluated within any single discipline. Such an outcome is perhaps endemic to the hybrid products of interdisciplinary research. Yet Ecce Homology and our calligraphic visualization system hold unexamined potentials that require further development and study. In addition they offer insight into both the potential novelty of the contributions and inherent difficulties of interdisciplinary collaboration and its outcomes.
Additional information about Ecce Homology is available at <http://www.insilicov1.org>.
Ruth West is an artist with a background as a molecular genetics researcher. Working predominantly with computer-based media, West explores how artistic practice and aesthetic experience can nurture scientific discovery. West is director of Visual Analytics and Interactive Technologies for the UCSD National Center for Microscopy and Imaging Research and a research associate at the UCSD Center for Research in Computing and the Arts, where she is the first CalIT2 New Media Artist crossing over to the Digitally Enabled Genomic Medicine Layer.
Jeff Burke is a researcher and lecturer in the UCLA School of Theater, Film and Television. His current focus is on developing a new joint Center for Research in Media, Engineering and Performance (REMAP) with the Henry Samueli School of Engineering and Applied Science. He has co-authored, designed, programmed or produced performances and new genre installations exhibited in eight countries from 1999 to 2005, collaborating with diverse teams for their design and production.
Cheryl Kerfeld is a protein crystallographer in the UCLA-DOE Center for Genomics and Proteomics and an academic administrator in Honors and Undergraduate Programs in the College of Letters and Sciences. As an administrator Kerfeld, who also holds an M.A. in English literature, develops cross-disciplinary curricula for the life sciences.
Eitan Mendelowitz has collaborated on interactive art installations exhibited internationally at venues including SIGGRAPH, ArtFutura and Ars Electronica. Currently a researcher at the UCLA Hypermedia Studio, Eitan is co-developing Kolo, a ubiquitous computing framework for use in the performing and media arts. Eitan is pursuing his Ph.D. in artificial intelligence at UCLA and received an MFA (2002) from the Department of Design | Media Arts. His interests include the use of intelligent environments, sensor networks and behavior-based artificial intelligence in the arts.
Thomas Holton is a software developer in the UCLA-DOE Center for Genomics and Proteomics. He is also developing software for the TB Consortium, which will allow scientists to share information about TB proteins and their structures. He was an original member in the TEXTAL Project, a pattern recognition system for the automatic determination of protein structure from electron density maps. He received an MS in biochemistry from Texas A&M, a BS in chemistry and BA in music from University of Tennessee, Knoxville.
J.P. Lewis is a computer graphics researcher in the Computer Graphics and Immersive Technology lab at USC's Integrated Media Systems Center. Previously he has worked at think tanks including Interval Research (Palo Alto) and NEC Research (Princeton) and in the special effects industry. He has software development credits on Forrest Gump and several other films, and his algorithms have been incorporated in several leading graphics packages.
Ethan Drucker is currently pursuing his Ph.D. at the UCLA Computer Graphics laboratory.
Weihong Yan is the manager of the Bioinformatics User Facility at UCLA. She has degrees in electrical engineering and biology. Her research experience includes signal processing, cell biology and bioinformatics. She currently leads bioinformatics training workshops and provides consultation and system support for bioinformatic hardware and software.
Acknowledgments
Ecce Homology is sponsored by Intel Corporation, NEC Solutions America, Inc., Visual Systems Division, University of California at Los Angeles Technology Sandbox, UCLA Academic Technology Services, Computer Graphics and Immersive Technology Laboratory, University of Southern California Integrated Media Systems Center, University of California at San Diego Center for Research in Computing and the Arts, UCLA HyperMedia Studio, UCSD Sixth College: Culture, Art and Technology, and the University of California Institute for Research in the Arts. We also gratefully acknowledge the instruction in calligraphy offered us by Hirokazu Kosaka, Buddhist priest, calligrapher and director of the Japanese American Cultural and Community Center in Los Angeles. Special thanks to Neil R. Smalheiser, BetaTPred2 and the UCLA Fowler Museum.
References and Notes
Footnotes
Based on a paper presented at "Artists in Industry and the Academy," a special section of the 92nd Annual Conference of the College Art Association, Seattle, WA, 18-21 February 2004.