Lambda the Ultimate

Computational biolinguistics
started 9/28/2002; 7:46:36 AM - last post 9/29/2002; 2:16:59 PM
Ehud Lamm - Computational biolinguistics
9/28/2002; 7:46:36 AM (reads: 797, responses: 6)
Computational biolinguistics
The National Science Foundation has made a $9 million, five-year grant to a collaboration of Carnegie Mellon University computer scientists, University of Pittsburgh and Massachusetts Institute of Technology biological chemists, and others from Boston University and the National Canadian Research Council to advance a new field called computational biolinguistics.

Computational biolinguistics, which combines the use of computational tools, including statistical language modeling, machine learning methods and high-level language processing, will allow scientists to better understand how proteins work inside cells.

I was always skeptical about this approach. It seems to me that we still have a lot to learn in the lab -- studying chemical and biological processes. Moving from genome sequences to what actually happens inside cells is much more complicated than it looks.


Posted to general by Ehud Lamm on 9/28/02; 7:48:11 AM

Ehud Lamm - Re: Computational biolinguistics
9/28/2002; 3:19:20 PM (reads: 792, responses: 0)
This is like trying to figure out a language when you know neither the syntax nor the semantics -- and you don't even know how the hardware works.

Would you try to read the source of an interpreter, if one was available? Sure you would. This means doing biological experiments. Statistical analysis just won't cut it.

By the way, a similar approach is used to try to understand neuro-signaling. I have the same doubts about that line of research too.

Michael Vanier - Re: Computational biolinguistics
9/28/2002; 8:19:09 PM (reads: 788, responses: 1)
I think the use of the word "language" is just hype. It's not really "language" in the sense that we use it here on LtU. Why not just call all of physics and chemistry "linguistics"? Sure, DNA is sort of a language, but I think it's stretching the analogy.

If you had as much experience as I have in the area of computational modeling of biological processes (which is what my Ph.D. work was on), you would realize that hype is the order of the day in almost all cases. That doesn't mean that the work is worthless; only that things are never as they seem.

Ehud Lamm - Re: Computational biolinguistics
9/28/2002; 11:27:20 PM (reads: 816, responses: 0)
I think we are in agreement.

jon fernquest - Re: Computational biolinguistics
9/29/2002; 2:37:54 AM (reads: 772, responses: 0)
I was reading up on the Aho-Corasick algorithm (quick matching of a set of strings against a text) last year in a book on *string-matching algorithms*, and most of the later part of the book was on genetic-biological applications. Today I just happened upon the homepage of Lloyd Allison, who has an extensive sub-page devoted to the topic that is worth checking out:

http://www.csse.monash.edu.au/~lloyd/tildeStrings/
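
For readers who have not seen Aho-Corasick, here is a minimal sketch of the idea in Python: build a trie of the patterns, add failure links with a breadth-first pass, then scan the text once. This is only an illustration and is not taken from the book or the page above; the function names build_automaton and find_all and the DNA-like sample strings are made up for the example.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists (hypothetical helper)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix that is in the trie
    out = [[]]    # out[node] -> patterns that end at this node

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)

    # Phase 2: breadth-first pass to fill in failure links.
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


if __name__ == "__main__":
    print(find_all("GATTACAGATC", ["GAT", "ATTA", "TAC", "CA"]))
```

The point of the failure links is that the text is never rescanned: after the automaton is built, matching costs time proportional to the text length plus the number of matches, regardless of how many patterns there are.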

Noel Welsh - Re: Computational biolinguistics
9/29/2002; 2:13:24 PM (reads: 750, responses: 1)

"Statistical analysis just won't cut it."

My understanding is that all these processes are error-prone; you don't get the same result every time. So it seems that statistical analysis is a valid method to use.

Ehud Lamm - Re: Computational biolinguistics
9/29/2002; 2:16:59 PM (reads: 783, responses: 0)
To clarify: the problem isn't statistics. Statistics are used throughout science. The problem is thinking that statistical inference algorithms taken from machine learning will be able to give you a clue as to the intricate workings of the cell. They can help some, but the problem is that there is (a) too much data, (b) too many dimensions, (c) not enough data about important things (e.g., how proteins fold), etc.