3 credits

**Prerequisites:**

a) Beginning programming skills:

- An understanding of file organization on a UNIX system and basic UNIX system commands and scripting.
- Use of a text editor such as Emacs or Vi.
- Experience solving problems by designing and implementing simple algorithms.
- Basic syntax and semantics of a high-level programming language such as Python, Unicon, C++, or Java. The course will use Python.
- A familiarity with the programming concepts of functions, control structures, types, recursive algorithms, and random number generation.
- Simple data structures such as arrays and trees.

b) A solid intuition for basic math and statistics, including distributions, probability, conditional probability, random variables, introductory calculus, matrices, binary numbers, bit operations, and graphs (nodes and arcs).

c) A willingness to think of a computer as both a toy to play with and a tool to experiment with.
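As an informal gauge of the programming level that item a) assumes, a student should be able to read and write something like the short Python sketch below without difficulty. It is only an illustration, not an official placement test; the function names and the DNA alphabet are chosen here for the example and are not taken from the course materials. It touches functions, control structures, a simple data structure, recursion, and random number generation.

```python
import random

BASES = "ACGT"  # illustrative alphabet; an assumption for this example


def random_sequence(length, seed=None):
    """Return a random string of bases drawn from a seeded generator."""
    rng = random.Random(seed)
    return "".join(rng.choice(BASES) for _ in range(length))


def count_bases(seq):
    """Count how often each base occurs, using a loop and a dictionary."""
    counts = {base: 0 for base in BASES}
    for base in seq:
        counts[base] += 1
    return counts


def reverse_recursive(seq):
    """Reverse a string recursively: last character, then the reversed rest."""
    if len(seq) <= 1:
        return seq
    return seq[-1] + reverse_recursive(seq[:-1])


if __name__ == "__main__":
    seq = random_sequence(20, seed=1)
    print(seq)
    print(count_bases(seq))
    print(reverse_recursive(seq))
```

Passing the same `seed` reproduces the same sequence, which is the habit to adopt when experimenting with randomized algorithms.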

**References:**

Under construction!

Editors

Linux Notes

- A Regular Expression Tutorial
- A simplified version of UNIX history.

- TBD

- AMS math guide
- LaTeX for Computer Scientists
- The Giant Book of Symbols
- A cool tool that lets you draw a LaTeX symbol and then looks it up by pattern matching. Try it! Don't always accept the first symbol it suggests.
- The Source for all things TeX and LaTeX

**Goal:**

To understand the major biological problems related to sequence analysis and the algorithms and data structures behind the major bioinformatics tools used to solve them. The course exposes students to a computer science perspective on the design and implementation of algorithms. It does not teach a programming language; it assumes the student has learned the language in advance. It also does not cover how to use the major analysis tools; that is a different course.

**Topics:**

- A very compact introduction to the biology of biological sequences
- Pairwise sequence alignment
- Sequence search with approximate matching
- Markov chains and hidden Markov models
- Identification of sequence families
- Multiple sequence alignment
- Deterministic phylogenetic analysis
- Bootstrapping phylogenies
- Probabilistic phylogenetic analysis
- Evolutionary computation applied to phylogenetic analysis
- Shotgun sequence assembly
- Grammatical approaches to RNA structure
- Epistatic analysis