Review, Spring 2008

From InformaticsWiki


Please use this page to ask questions in review for the final exam, the "Practicum."

Dr. Byrne: In question 15 you ask us to “Find some primate and rodent sequences for the protein Lin28B.” Do you mean we have to get a few sequences from primates? Please give me a hint.

  It's Dr. Moss's question, so check back later.  As I read it, yes: you need sequences from multiple species
  for this protein.  Remember that TaxBrowser lets you work back up a taxonomy, so you could specify "Primates"
  and then look in the protein database for entries that fit that criterion.  --Dr. Byrne 22:38, 22 April 2008 (EDT)
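The "work back up a taxonomy" idea can be sketched in a few lines of Python: each taxon points to its parent, and a record belongs to a clade if that clade appears somewhere in its lineage. This is a minimal sketch, assuming a tiny hand-made parent map; the names `PARENT`, `lineage`, `in_clade` and the `hit1`-style IDs are hypothetical, and the real NCBI taxonomy has many intermediate ranks that this map skips.

```python
# Toy parent map standing in for a small slice of the NCBI taxonomy.
# Simplified and hypothetical: intermediate ranks (subfamilies, etc.) are omitted.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Macaca mulatta": "Macaca",
    "Macaca": "Cercopithecidae",
    "Cercopithecidae": "Primates",
    "Mus musculus": "Mus",
    "Mus": "Muridae",
    "Muridae": "Rodentia",
    "Primates": "Euarchontoglires",
    "Rodentia": "Euarchontoglires",
}

def lineage(taxon):
    """Walk up the parent map from taxon, returning every ancestor in order."""
    path = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path

def in_clade(taxon, clade):
    """True if taxon is the clade itself or lies somewhere beneath it."""
    return taxon == clade or clade in lineage(taxon)

# Hypothetical protein hits, each tagged with its source organism.
hits = [("hit1", "Homo sapiens"), ("hit2", "Mus musculus"), ("hit3", "Macaca mulatta")]
primate_hits = [acc for acc, org in hits if in_clade(org, "Primates")]
print(primate_hits)  # ['hit1', 'hit3']
```

Swapping "Primates" for "Rodentia" selects the rodent hits instead, which is the same move as choosing a higher node in TaxBrowser.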



For #15 I am using lalign to compare the two sequences. What do you mean by "Can you find whether any other mammals have these missing/extra sequences? What does your finding suggest about when these sequences arose during evolution?"

Thank you.

  Let's have Dr. Moss address that one.  --Dr. Byrne 22:38, 22 April 2008 (EDT)

Hello, this is what #8 in the Practice Review asks: "More recently, Boix-Chornet and colleagues described, in a J Biol Chem article in 2006, how global chromatin hypomethylation is a characteristic of cells undergoing apoptosis. Using the PubMed interface, what is the official symbol of the EntrezGene associated with this paper?" I have already found the paper through PubMed. The paper discusses Trimethylated Histone H4, so I thought I had to find the gene symbol for it through EntrezGene, which I did. However, I could not find a gene symbol for Trimethylated Histone H4! What am I doing wrong? Thank you.

  The best way is to use the "Links" feature to the right of the reference you are seeking
  in the PubMed interface.  When you click on that, you'll see a link to "Gene."  The
  paper discusses "Trimethylated Histone H4," but that is a post-translational
  modification of the histone protein, not a separate gene.  PubMed links to HIST4H4.
  --Dr. Byrne 11:32, 22 April 2008 (EDT)

Hello. Question #9 in the practice review is about dot matrix comparison. You mentioned using window size 3 and stringency 2. What do these mean? Thank you.

  These are two variables that you can set: 'window size' is the length of sequence compared at each
  position; 'stringency' is the minimum number of matches within that window before a dot is placed
  on the matrix.  --Dr. Byrne 11:32, 22 April 2008 (EDT)
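A minimal sketch of how window size and stringency interact, assuming simple identity matching (tools such as EMBOSS may instead score each window with a substitution matrix and a threshold). The function name `dot_plot` is hypothetical. With window=3 and stringency=2, a dot is placed at (i, j) when at least 2 of the 3 residue pairs starting at those positions are identical.

```python
def dot_plot(seq_a, seq_b, window=3, stringency=2):
    """Return (i, j) positions where the window starting at seq_a[i] and the
    window starting at seq_b[j] share at least `stringency` identical residues
    at matching offsets."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if matches >= stringency:
                dots.append((i, j))
    return dots

# A/X differ, but B and C match: 2 of 3 identical, so one dot at (0, 0).
print(dot_plot("ABC", "XBC"))  # [(0, 0)]
```

Raising stringency to 3 for the same pair removes the dot, which is why higher stringency gives a cleaner but sparser plot.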


Hello Dr. Byrne,

For question #8 I found the paper using PubMed, then found Trimethylated Histone H4, which is discussed throughout the paper, and then searched for its gene symbol in EntrezGene; however, I couldn't find the symbol. Can you please tell me whether I have done it the right way?

Thank you.

  See response, above.  --Dr. Byrne 11:32, 22 April 2008 (EDT)

Dr. Byrne

Please give us some guidance on the BLOSUM62 question.

 It's a long question which we talked about last Wednesday night.
 BLOSUM substitution matrices are built from ungapped alignments of conserved regions ("blocks")
 of distantly related proteins, so they are thought to mirror evolutionary reality.  Each score is a
 log-odds value: it compares how often a substitution is observed in those blocks with how often it
 would be expected by chance.  Sequences that are more than a certain percentage identical are
 clustered and counted as one; for BLOSUM62 the cutoff is 62% identity, so the matrix reflects
 substitutions between more distantly related sequences.  The BLOSUM matrix you use should suit the
 nature of the relationship you know you are dealing with.  In practice, most folks use BLOSUM62.
 While the value of each substitution may seem to reflect biochemistry, the score is not derived from
 chemical assumptions; it is based on observed biological substitutions, which appear to reflect both
 biochemistry and amino acid abundance.  --Dr. Byrne 11:32, 22 April 2008 (EDT)
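The log-odds idea behind BLOSUM scores can be sketched as follows. For a residue pair, compare the observed pair frequency q_ij in the aligned blocks with the frequency expected by chance from background frequencies (p_i * p_j for identical residues, 2 * p_i * p_j for an unordered pair of different residues), then express the ratio in half-bit units, the scale BLOSUM62 uses. The frequencies below are made up for illustration, not taken from the BLOSUM62 data, and the function name is hypothetical.

```python
import math

def log_odds_score(q_ij, p_i, p_j, same_residue):
    """Half-bit log-odds score in the style of BLOSUM.

    q_ij: observed frequency of the (unordered) pair i-j in aligned blocks
    p_i, p_j: background frequencies of residues i and j
    """
    # Chance expectation; a mixed pair can arise as i-j or j-i, hence the factor 2.
    expected = p_i * p_j if same_residue else 2 * p_i * p_j
    # 2 * log2(ratio) gives half-bits; BLOSUM rounds to the nearest integer.
    return round(2 * math.log2(q_ij / expected))

# Hypothetical frequencies: observed 8x more often than chance -> positive score.
print(log_odds_score(0.02, 0.05, 0.05, same_residue=True))    # 6
# Observed at a quarter of the chance rate -> negative score.
print(log_odds_score(0.001, 0.05, 0.04, same_residue=False))  # -4
```

Positive scores mark substitutions seen more often in conserved blocks than chance predicts; negative scores mark ones seen less often.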

Dr. Byrne

For question 11, can we change BLOSUM62 to a lower-numbered matrix (say, BLOSUM45)?

  Exactly so.  BTW, following Alex's suggestion, I'll post a key to WebCT shortly and email everyone with that 
  fact.  


After my presentation you suggested a website for looking up the number of citations. Please let me know which website it was.

  You may be referring to Web of Science.  Look at the Information Integration exercise.  --Dr. Byrne 21:46, 21 April 2008 (EDT)

For question no. 6, please give me a hint. I used PubMed and the Limits function but was not successful.

  Apologies, it was not a well-phrased question as it referred to a paper from 'last' semester.  Please ignore it.  --Dr. Byrne 11:32, 22 April 2008 (EDT)

Thank you.


Dr. Moss

Please let me know how to retrieve the sequence of a protein from a particular organism (e.g., bacteria, worm). I know it’s a simple question, but I got stuck several times, so I want to know the trick. Thank you.

 The simplest way is to search Entrez with the gene name and organism.--Dr. Moss 06:20, 22 April 2008 (EDT) 
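The same gene-plus-organism search can be written as an Entrez query with field tags, which can also be pasted into the Entrez search box. This is a sketch, assuming NCBI's E-utilities ESearch endpoint; it only builds the query string and URL and does not fetch anything. The helper names are hypothetical, and a higher taxon such as Rodentia or Primates in the [Organism] field should also match the species beneath it.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def entrez_query(gene, organism):
    """Combine a gene name and an organism using Entrez field tags."""
    return f"{gene}[Gene Name] AND {organism}[Organism]"

def esearch_url(db, term):
    """Build an ESearch URL; fetching it would return matching record IDs as XML."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})

term = entrez_query("LIN28B", "Rodentia")
print(term)  # LIN28B[Gene Name] AND Rodentia[Organism]
url = esearch_url("protein", term)
print(url)
```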

Could you post the answers today so we can double-check what we did?--Alex Shalman 12:21, 22 April 2008 (EDT)

 Working on it now --Dr. Byrne 15:01, 22 April 2008 (EDT)

Question #12: NCBI > Search Structure for Zinc gives me 4,545, but the answer you gave is 4,094. What am I missing?--Alex Shalman 10:06, 23 April 2008 (EDT)

 Good question.  When you just search the Structure database for 'zinc,'
 you are doing one of those unstructured text searches, which can return
 less specific answers.  If you go to the structure page for the protein
 in the question, 1TUP, you'll see the zinc ligands listed at the bottom.
 Click on those and you go to the 'structured' PubChem page for zinc.
 That's where you'll get a better figure.  --Dr. Byrne 10:21, 23 April 2008 (EDT)

Notes for final exam and review. April 16, 2008

These are the competencies we want to see demonstrated. In some cases that will mean phrasing a description or explanation in your own words. In other cases it will mean performing a search to retrieve data or using an EMBOSS tool (or a tool on another website) to evaluate data or sets of data. So where we use verbs like “understand” or “distinguish” below, that understanding will be tested by actually using the tool or database.

NCBI databases and tools

  • Find and evaluate data, especially nucleic acid, protein and structures
    • Distinguish the value and limitations of RefSeq
    • Understand the value of non-RefSeq entries
  • Appreciate the centrality of TaxBrowser in finding data
  • Appreciate EntrezGene as a concentrator of data from a variety of resources

BLAST

  • Evaluate output
  • Use taxonomy tools in output to select sequences from desired taxa
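Evaluating BLAST output mostly means reading the E-value: roughly, how many hits this good would be expected by chance in a search of this size. A minimal sketch of the Karlin-Altschul relationships, E = K*m*n*exp(-lambda*S) and its bit-score form E = m*n*2^(-S'), where S' = (lambda*S - ln K)/ln 2. Lambda and K depend on the scoring matrix and gap penalties, so the values in the example are placeholders, and real BLAST also corrects m and n for edge effects.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score to bits (Karlin-Altschul statistics)."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits` in an m x n search."""
    return query_len * db_len * 2 ** (-bits)

# Placeholder parameters, not the values BLAST uses for any particular matrix.
s_bits = bit_score(40, lam=0.3, k=0.1)
print(e_value(s_bits, 200, 10**6))
```

Because bits already absorb lambda and K, two hits with the same bit score against the same-sized database have the same E-value even if they were scored with different matrices.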

Comparisons

  • Understand when and where the following should be used
    • Local vs. global alignments
    • Multiple sequence alignments
    • Dot plots
  • Understand the value of the following types of substitution matrices
    • BLOSUM
    • PSI-BLAST matrix
  • What value do Vector Alignment Search Tool (VAST) searches add to traditional sequence searches and comparisons?
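A score-only sketch contrasting global (Needleman-Wunsch) and local (Smith-Waterman) alignment, the algorithms implemented by EMBOSS needle and water respectively. It assumes a simple linear gap penalty and fixed match/mismatch scores instead of a substitution matrix with affine gaps, so the numbers are illustrative only; the function name is hypothetical.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Score-only dynamic programming alignment.

    local=False -> global (Needleman-Wunsch): every residue of both sequences is aligned.
    local=True  -> local (Smith-Waterman): cell scores are floored at 0 and the best
                   cell anywhere in the matrix is returned.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps in the first row and column.
        for i in range(1, rows):
            h[i][0] = i * gap
        for j in range(1, cols):
            h[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if local:
                score = max(score, 0)
                best = max(best, score)
            h[i][j] = score
    return best if local else h[-1][-1]

# A short motif inside a longer sequence.
print(align_score("ACGT", "GGGACGTGGG"))              # -8: six forced end gaps
print(align_score("ACGT", "GGGACGTGGG", local=True))  # 4: the exact ACGT match
```

The global score is dragged down by the end gaps it must include, while the local score reports only the shared region, which is why local alignment suits finding a conserved domain or motif.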

Characterizations

  • What is the value of describing or understanding a protein domain?

Whole genome databases

  • What kinds of information are available in genome databases?
  • What is the difference between a genetic map and a physical/genomic map?
  • How can you visualize sequence conservation across genomes using databases?
  • How can you find alleles, mutations, chromosomal abnormalities associated with particular genes, phenotypes and diseases?

Literature

  • Effectively use PubMed
  • Describe NIH’s Open Access policy and its implications
  • Trace threads of literature forward in time
    • What does this mean?
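Tracing a thread forward means starting from one paper, finding the later papers that cite it (as citation indexes such as Web of Science let you do by hand), then the papers that cite those, and so on. A minimal breadth-first sketch, assuming a hypothetical "cited by" table; the paper names and the function name are made up for illustration.

```python
from collections import deque

# Hypothetical "cited by" links: each key is cited by the papers in its list.
CITED_BY = {
    "paperA": ["paperB", "paperC"],
    "paperB": ["paperD"],
    "paperC": ["paperD", "paperE"],
    "paperD": [],
    "paperE": ["paperF"],
    "paperF": [],
}

def trace_forward(start, cited_by, max_depth=2):
    """Breadth-first walk from a paper to later papers that cite it, then papers
    citing those, up to max_depth generations. Returns {paper: generation}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        depth = seen[paper]
        if depth == max_depth:
            continue  # do not expand beyond the requested number of generations
        for citer in cited_by.get(paper, []):
            if citer not in seen:
                seen[citer] = depth + 1
                queue.append(citer)
    del seen[start]
    return seen

print(trace_forward("paperA", CITED_BY))
```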
