MINE User's Manual



The User's Manual is under construction.  In the future we hope it will contain expanded instructions for all key functions in MINE (most scripts in MINE already have basic instructions) and numerous specific examples that illustrate how to use MINE to 'explore and data-mine' sequences.


How do I....

How do I Use the Search and Display Engine for exploration and data-mining:
How do I find all entries with the words glycoprotein AND kinase?
How do I find all entries with the words glycoprotein OR kinase?
How do I find all entries WITHOUT the words glycoprotein OR kinase?
The search engine won't accept more than one word.  How do I find "sus scrofa"?
How do I search all sequences for the motif "CAGTGGATAC"?
How do I search for all sequences containing microsatellites (repeats of motifs of 1-6 bp)?
How do I get the right information to plot in Excel the G+C content versus sequence length for all sequences in the database?
How do I get a subset of sequences into fasta format into a new file to put into PAUP?



How do I Use the Search and Display Engine for exploration and data-mining:
The Search and Display Engine allows users to search data files and display search results in a variety of formats.  The Search Engine can be used for a variety of practical applications (e.g. checking entry quality, making lists of entries and features for 'housekeeping' purposes or to feature at other sites within a MINE database) but more importantly, it has a variety of features that make it useful in exploring, data-mining, and testing basic hypotheses about the features of selected data sets.  Basic instructions for using the Search Engine are found on the Search Engine Page.   Here are some more specific examples:

How do I find all entries with the words glycoprotein AND kinase?
Use the SEARCH ENGINE with more than one search line.  Each additional search line acts as an AND condition:
1. wholefile contains glycoprotein
2. (click MORE button)
3. wholefile contains kinase
4. (click Search & Display)

How do I find all entries with the words glycoprotein OR kinase?
Use the SEARCH ENGINE and the "|" (OR) symbol in your pattern:
1. wholefile contains glycoprotein|kinase
2. (click Search & Display)

How do I find all entries WITHOUT the words glycoprotein OR kinase?
Use the SEARCH ENGINE and the "does not contain" comparison SEARCH option:
1. wholefile doesnotcontain glycoprotein|kinase
2. (click Search & Display)
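If it helps to picture what the AND, OR and "does not contain" searches above select, here is a minimal Python sketch. It does not call MINE; the entries and the contains() helper are invented for illustration, and the sketch assumes the "contains" comparison behaves like an ordinary regular expression search.

```python
import re

# Hypothetical entries standing in for MINE records (not real data).
entries = {
    "entry1": "membrane glycoprotein with kinase domain",
    "entry2": "serine/threonine kinase",
    "entry3": "surface glycoprotein",
    "entry4": "ribosomal protein L3",
}

def contains(pattern, text):
    """Assumed behaviour of 'contains': True if the pattern occurs anywhere."""
    return re.search(pattern, text) is not None

# AND: two stacked search lines, both must hold.
and_hits = sorted(name for name, text in entries.items()
                  if contains("glycoprotein", text) and contains("kinase", text))

# OR: a single pattern using the "|" symbol.
or_hits = sorted(name for name, text in entries.items()
                 if contains("glycoprotein|kinase", text))

# NOT: "doesnotcontain" with the same "|" pattern keeps entries with neither word.
not_hits = sorted(name for name, text in entries.items()
                  if not contains("glycoprotein|kinase", text))

print("AND:", and_hits)
print("OR: ", or_hits)
print("NOT:", not_hits)
```

Note that "doesnotcontain glycoprotein|kinase" excludes an entry if either word is present, so only entries with neither word remain.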

The search engine won't accept more than one word.  How do I find "sus scrofa"?
Use wildcards to match spaces.  The search engine only takes in the first "word" it sees and stops at any whitespace.  You can get around this by using the "." symbol, which matches any single character, including letters, numbers, and spaces. The + symbol means "1 or more" of the preceding item and the * symbol means "0 or more". Either of the following will match "sus scrofa".

1. wholefile contains sus(.)+scrofa (match one or more characters of any kind between sus and scrofa)
2. (click Search & Display)

1. wholefile contains sus(.)*scrofa (match zero or more characters of any kind between sus and scrofa, so "susscrofa" also matches)
2. (click Search & Display)
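The difference between the + and * patterns above can be checked outside MINE with a short Python sketch. It assumes the search engine's pattern syntax behaves like Python's re module for these symbols; the sample strings are made up.

```python
import re

plus_pattern = re.compile(r"sus(.)+scrofa")  # at least one character in between
star_pattern = re.compile(r"sus(.)*scrofa")  # zero or more characters in between

# Made-up sample strings, not MINE data.
samples = [
    "sus scrofa",
    "susscrofa",
    "sus_scrofa",
    "versus another scrofa record",
]

for text in samples:
    plus_hit = plus_pattern.search(text) is not None
    star_hit = star_pattern.search(text) is not None
    print(f"{text!r}: + matches={plus_hit}, * matches={star_hit}")
```

Because "." matches any character, both patterns also hit text such as "versus another scrofa record", where "sus" and "scrofa" are far apart. Expect a few extra entries in the results and check them by eye.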

How do I search all sequences for the motif "CAGTGGATAC"?
Use the SEARCH ENGINE with the "contains" SEARCH option:
1. seq contains CAGTGGATAC
2. (click Search & Display)
(For more advanced pattern matching with more useful output use the script Page_Regex.cgi.)
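If you have already saved a sequence and want to know where the motif sits in it, the following Python sketch lists every start position. It is an illustration only, not part of MINE: the function name motif_positions and the test sequence are invented, and it assumes the sequence and motif are in the same letter case.

```python
import re

def motif_positions(seq, motif):
    """Return 1-based start positions of every motif hit, overlapping ones included."""
    # A lookahead matches at each start position without consuming the motif,
    # so overlapping occurrences are reported too.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() + 1 for match in pattern.finditer(seq)]

motif = "CAGTGGATAC"
# Invented test sequence with the motif placed at positions 3 and 15.
seq = "TT" + motif + "GG" + motif + "A"

print(motif_positions(seq, motif))
```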

How do I search for all sequences containing microsatellites (repeats of motifs of 1-6 bp)?
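Use the SEARCH ENGINE with the "contains" SEARCH option and a regular expression as your pattern.  The expression below matches a single base repeated at least 8 times in a row, a 2-3 bp unit repeated at least 4 times, or a longer unit repeated at least 3 times.  As written, the last alternative accepts units of 4 to 100 characters; change {4,100} to {4,6} if you want to limit the search to motifs of at most 6 bp: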
1. seq contains (.)\1{7,}|(.{2,3}?)\2{3,}|(.{4,100}?)\3{2,}
2. (click Search & Display)
For more advanced pattern matching using regular expressions, or for searches for patterns that occur frequently, you'll get much more useful output with the PATTERN SEARCHING script, Page_Regex.cgi.  See 'All scripts' in the MINE menu for a link to this script.  It contains further instructions and a list of useful regular expressions.
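To see what the microsatellite pattern above accepts, you can try it on a few short strings in Python. This is a sketch under the assumption that the search engine's regular expressions behave like Python's re module. The MICROSAT_MAX6 variant and the find_repeats helper are not part of MINE; the variant is a hypothetical tightening that limits the last alternative to units of 4-6 bp.

```python
import re

# Pattern copied from the manual entry above.
MICROSAT = re.compile(r"(.)\1{7,}|(.{2,3}?)\2{3,}|(.{4,100}?)\3{2,}")

# Hypothetical stricter variant: the last alternative allows units of 4-6 bp only.
MICROSAT_MAX6 = re.compile(r"(.)\1{7,}|(.{2,3}?)\2{3,}|(.{4,6}?)\3{2,}")

def find_repeats(seq, pattern=MICROSAT):
    """Return (1-based start, matched text) for each non-overlapping repeat found."""
    return [(match.start() + 1, match.group(0)) for match in pattern.finditer(seq)]

print(find_repeats("GG" + "A" * 9 + "C"))     # mononucleotide run
print(find_repeats("ACACACAC"))               # AC repeated 4 times
print(find_repeats("ACACAC"))                 # only 3 copies: too short
print(find_repeats("ACGTTGC" * 3))            # 7 bp unit, accepted by the original
print(find_repeats("ACGTTGC" * 3, MICROSAT_MAX6))
```

The last two calls show the effect of the {4,100} range: a 7 bp unit repeated three times is reported by the original pattern but not by the stricter variant.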

How do I get the right information to plot in Excel the G+C content versus sequence length for all sequences in the database?
Use the SEARCH ENGINE with the "Report Format" DISPLAY option:
1. Select "Report Format" from the Display options.
2. Tick the gc and seq_len boxes
3. File_name contains db (will capture all files since they all end in db)
4. (click Search & Display)
5. Copy and paste the results into Excel, and import them using the ** symbols as column separators
6. Use Excel to graph the results
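If importing with the ** separators proves awkward, the copied report can first be converted to comma-separated text, which Excel opens directly. The Python sketch below shows one way to do this. The column layout (file name, gc, seq_len) and the sample lines are assumptions made for illustration; check them against what your report actually shows.

```python
import csv
import io

def report_to_rows(report_text, separator="**"):
    """Split each non-empty report line into trimmed fields."""
    rows = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append([field.strip() for field in line.split(separator)])
    return rows

def rows_to_csv(rows):
    """Render rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Hypothetical report lines (file name ** gc ** seq_len); real output may differ.
example = """
seqA.db ** 41.5 ** 1200
seqB.db ** 55.0 ** 860
"""

rows = report_to_rows(example)
print(rows_to_csv(rows))
```

Save the printed text in a file ending in .csv and open it in Excel; the gc and seq_len columns can then be selected for a scatter plot.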


How do I get a subset of sequences into fasta format into a new file to put into PAUP?
Use the SEARCH ENGINE with the fasta format DISPLAY option:
1. Select "Fasta format" from the Display options.
2. File_name contains (your pattern)
3. (click Search & Display)
4. Copy and paste your fasta formatted sequences into a new file, or save the results of your search using a new name.
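Before passing the new file to another program, it can be worth checking that the pasted text really is well-formed fasta. The Python sketch below reads fasta text into (header, sequence) pairs and writes it back with a fixed line width. The function names, the 60-column width and the sample records are illustrative choices, not part of MINE.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from fasta formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def format_fasta(records, width=60):
    """Render records as fasta text, wrapping sequence lines at the given width."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Invented sample standing in for text copied from the search results.
pasted = ">a desc\nACGT\nAC\n\n>b\nGG\n"
records = parse_fasta(pasted)
print(len(records), "sequences found")
print(format_fasta(records, width=4))
```

To save the result, write the output of format_fasta to a file of your choice, for example with open("subset.fasta", "w"), where subset.fasta is only a placeholder name.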