Bin Wang
  contact info:  
  languages: fluent in Chinese and English  
  Interest areas: biochemistry, structural biology and bioinformatics  
  March 8, 2002
Bin Wang is a PhD student in Biochemistry at the University of Utah and is currently finishing a dissertation on using Nuclear Magnetic Resonance and other biochemical techniques to study protein structure and protein ligand interaction, as part of a larger effort to understand the structure and functioning of the HIV Virus. Bin is also about to receive a Master's degree from Utah State University in Computer Science. His Master's thesis attempts to improve upon the implementation of an algorithm commonly used within biological science, by achieving a faster solution. For this project he used the Cell Matrix™ architecture, and he demonstrated his approach and obtained his results using the Cell Matrix Layout Editor™ and Cell Matrix Simulator™. This month, Bin will defend his thesis and become the the first person awarded an advanced degree for Cell Matrix research.
The problem Bin worked with is sequence alignment, which is the way in which a length of DNA, RNA or protein is matched up to another. Sequence alignment can be used to see the similarities between two materials, such as for instance the similarities between the genetic material of Neanderthals and homo sapiens. It can also be used probablistically, to identify a particular sequence, given the "noise" encountered in the real world of the experimentalist when isolating the material. It can also be used to assemble a long strand from smaller overlapping strand samples by finding the regions where they overlap.
     Many sequence alignment programs and packages have been written. Biologists can go to a web site to submit their sequence for comparison against a large database of known sequences; one site commonly used is NCBI's, which uses the BLAST search protocol for narrowing the search and then doing the comparison. Sequence alignment is generally done using the dynamic programming algorithm taught in computer science theory courses, as applied to sequence aligment. To find the place where, for example, two DNA sequences best line up with each other, one sequence is shifted relative to the other, and a score is obtained for how often the individual amino acid complexes A T G and C correspond. The algorithm also accounts for the possibility that there is some noise in the two sequences, and thus that there are missing sections, "typos" or other noise, by permitting each sequence to advance forward while the other is not advanced. A full description of the basic algorithm is well presented in Setubal and Meidanis' textbook on computational molecular biology, Introduction to Computational Molecular Biology. Robert Giegerich and David Wheeler, of the VSNS Biocomputing Division at the University of Bielefeld, have an online description and example of sequence alignment using dynamic programming, as well as further information about sequence alignment methods and variations with examples.
     Although the preceding mechanical description sounds very sequential, the algorithm is actually highly parallelizable in terms of data dependencies. Roughly, this is because the score obtained by one alignment is not needed to obtain a score for another. However, most implementations of this algorithm are sequential, because they were written to run on sequential CPU/memory machines, and often for machines with a single CPU. The sequential version of the algorithm requires n*m steps where the sequences are of length n and m. This is because sequence A of length n must be shifted n times and compared to the m elements of sequence B after each shift.
     Using the Cell Matrix architecture, and the DES encryption key cracker from a paper by Durbeck and Macias as an example, Bin was able to go back to first principles and design an ideal machine for the sequence alignment algorithm. Bin's version takes only O(n) steps to compare two sequences of maximum length n. His sequence alignment achieves the best possible performance, linear time. "The time required is linear with the sequence length; it's actually about 1.414n, which is the length of the diagonal" Bin reported, after running simulations with short sequences and comparing the runtimes. To our knowledge, this is the first implementation of sequence alignment that achieves linear performance, albeit by hardware requirements that grow quadratically: the 2D Cell Matrix needed to implement the dynamic programming matrix in parallel grows in 2D as the lengths of the two sequences increase.
     Once he publishes his work, you will be able to read it on our publications page. His layout is based upon, and looks much like a detailed version of, the illustrations of the algorithm for comparing two sequences from Section 3.3.2 of Introduction to Computational Molecular Biology. His design uses purely combinatorial logic -- no state information is generated or used, and no CPU processors are involved. "Several of the solutions within Bin's design are really clever and elegant," says Cell Matrix Corporation Director Nicholas Macias, who was a member of Bin's thesis committee.
"The Cell Matrix is a very interesting architecture; a lot of exciting things can be done on this platform," says Bin. He also sees a lot of potential for building a Cell Matrix using nanotechnology or molecular engineering. "I think Cell Matrix is a great architecture for nanotechnology because of its uniformity and the fact that it can be built from bottom up. Chemistry or biochemistry can provide the technique to easily build such a homogenous architecture. For instance, you can imagine a cell in a Cell Matrix being a single crystal lattice composed of chemical or biochemical molecules encoding the necessary circuit for its simple function. The whole crystal with interconneting crystal lattice is then the Cell Matrix architecture itself. It can then be configured or evolve into some useful circuit, just like Cell Matrices implemented using silicon technology."
     "It's just very exciting to imagine a Cell Matrix growing out of a solution like a crystal. Because it is hard to avoid having flaws in a crystal, the inherent fault tolerance of the Cell Matrix architecture will make this approach more practical."
After graduation, Bin intends to continue working as an experimentalist in biology, but hopes to incorporate his computer science skills as well. His future plans center around bioinformatics and structural biology, and he intends to work at developing algorithms and software for bioinformatics data analysis, and at solving protein structures and doing structure-based drug design.
  Selected Publications and Further Information:  
  B. Wang. Implementation of a dynamic programming algorithm for DNA sequence alignment on the Cell Matrix™ architecture. M.S. thesis, Utah State University, 2002. Available in Adobe PDF format.  
  Introduction to Computational Molecular Biology. Joao Carlos Setubal, Joao Meidanis  
  More about the lab where Bin is doing his doctoral work: Sundquist Lab