
MINE FAQ

CGI-MINE.pl


What is MINE?

MINE: Molecular INformation Explorer

MINE is an acronym for 'Molecular INformation Explorer', but the word MINE itself better conveys the goals of this project. 'Mine' can either be a place (noun; 'a mine') or an action (verb 'to mine'). The same holds true for MINE: it is designed to be a 'mine', or a place to store and organize sequence data online, but it is also a tool to allow one to manipulate and analyze this data, or 'mine' it, for clues about sequence function and evolution.

In practical terms, MINE is a set of integrated Perl-CGI scripts designed to facilitate the management and data-mining of large sets of primary sequence data in a simple way. (See the philosophy of MINE).


The Purpose of the MINE project and the PfSBW25 Encyclopaedia

MINE and its first database, the PfSBW25 Encyclopaedia, are being developed in a collaboration between Andrew Spiers (database curator) and Dawn Field (developer of the code in MINE). We began working on MINE to fulfill the need for a robust, but simple way of making the results of ``small-scale'' (non-genome scale) sequencing projects public.

The data produced by such projects are often short-run non-contiguous (SRNC) sequences generated from a number of independent experiments carried out by one or more research labs. MINE offers a unique way to store and organize such data in an informative, Internet-accessible database.

More importantly, MINE takes the first steps towards allowing users to conduct their own data-mining 'experiments' on the complete contents of the database. The current data-mining features of MINE include extensive access to blast searching (a full-featured blast interface) and the ability to query the complete contents of the database with a custom-written search engine. MINE is built to facilitate piping the contents of the database into other bioinformatic resources, and search results can be displayed and saved in a variety of formats (file names, fasta format, report form etc).

MINE is simultaneously being developed with the hopes of creating and making available some 'simple' code that facilitates the learning and teaching of Perl programming in bioinformatics. All the code in MINE has been written to be as simple and clear to read as possible and is fully annotated, and all the scripts in MINE have web links to their own source code. There are also directions for getting started writing your own Perl-CGI scripts using a simple template script.

If you have questions or comments specifically about the code in MINE or are interested in using MINE in any way please contact Dawn Field at dfield@molbiol.ox.ac.uk. If you have questions about the contents of the PfSBW25 Encyclopaedia please contact Andrew Spiers at andrew.spiers@plant-sciences.ox.ac.uk.


The Philosophy of MINE

The Philosophy of MINE is captured in three mottos (cliches)!

Simple is best! Don't reinvent the wheel. Feel the force, read the source.

MINE is simply meant to be a bit of easy-to-use Perl scripting that ties together some of the most common functions in DNA sequence analysis (e.g. blasting). Wherever possible, MINE chooses the simple over the complicated and strongly opts to pipe things to existing programs instead of reinventing the wheel. MINE also subscribes strongly to the belief that bioinformatics and simple programming should be demystified for the interested biologist, and that simple scripts and basic tools should be extremely accessible for those who want to move into the field. If you understand the 'source' (code), you not only understand better what you are doing, you also have the power to change it!


What hardware and software is needed to run MINE?

System Requirements

EndUser: Just a web browser is needed to use/access MINE once it is installed on a server running unix. We prefer Netscape Navigator, because unlike IE, it comes with a free html editor! :)

WebMaster: MINE is best installed in a unix environment if you wish to use bioperl and blastall. (If you have an academic e-mail account, you probably have access to what you need. Ask your system administrator about installing Perl-CGI scripts.) MINE was written on a Sun Solaris unix server. Although there has been no effort to make MINE portable, it should also run on a PC or Macintosh running Perl 5.0 or higher and server software (e.g. a Mac running MacPerl). This is just a benefit of using Perl. It's smart. If you would like to test it on a PC, feel free and report back what you find. Of course, if you don't have the bioperl modules (only for unix) or the blastall suite of programs (binary for unix only), you won't be able to use the scripts that rely on these 'outside' applications.

For installation you need a working version of Perl (use 5.0 or higher). You will absolutely need the module CGI.pm ( http://stein.cshl.org/WWW/CGI/ ), but this should be included in any standard distribution of Perl. With these two things, you are ready to use MINE. If you want MINE to do local blast searches for you, you will need to install the free blastall suite of programs (NCBI). If you want to use the script Page_Change_Format.cgi (see Change Format on the menu) you will need to install the BIOPERL package of modules ( http://bio.perl.org ). If you don't have blastall or BIOPERL installed, don't worry: everything except the blast script and the change formats script will work fine! Just follow the instructions for installing.

MINE itself only takes up about 700 kb of space (including copies of the source code, or '_source' files that can be seen with a browser), but you will need to factor in how much disk space your database will take. The biggest resource hog is of course dependent on your plans for running blast searches. TblastX searches are computationally intensive, and of course large blast databases take up lots of space (e.g. the fasta file download of genbank is going on 5 Gb).


What are the core functionalities of MINE?

MINE is essentially built around three core scripts. These core scripts (and their associated support scripts) provide a web-based environment in which to 1) INPUT DATA (as name=value fields) 2) do unlimited BLAST searches and store hyperlinked blast report files with their associated database entries, and 3) use the SEARCH ENGINE to create sophisticated queries that can retrieve any subset of the data (e.g. all sequences greater than 400 nucleotides, with G+C content greater than 60%, with at least one mononucleotide run of G's longer than 7 nucleotides). Search results can be displayed and saved in a variety of formats (file names, fasta format, report form etc). This assures that SRNC data can easily be exported into spreadsheet applications (e.g. Excel accepts the column-based report format) and a wide variety of more specialized bioinformatic tools (e.g. GCG uses lists of files and software for sequence alignment and phylogenetic analysis accepts fasta formatted sequences).
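
To make the example query concrete, here is a minimal shell sketch of the same three tests applied to a fasta file that has already been exported from the database. It is not the MINE search engine itself; the function name filter_fasta is made up for this illustration, and it assumes plain fasta input with the sequence on one or more lines below each ">" header.

```shell
# Hypothetical helper (not part of MINE): print the names of fasta records
# that are longer than 400 nt, have G+C content above 60%, and contain
# a run of G's longer than 7 (i.e. at least 8 in a row).
filter_fasta() {
    awk '
        function report() {
            if (name == "") return
            len = length(seq)
            gc = gsub(/[GCgc]/, "&", seq)   # gsub returns the number of G/C characters
            if (len > 400 && gc / len > 0.60 && seq ~ /[Gg][Gg][Gg][Gg][Gg][Gg][Gg][Gg]/)
                print name
        }
        /^>/ { report(); name = substr($1, 2); seq = ""; next }   # new record starts
        { seq = seq $0 }                                           # collect sequence lines
        END { report() }                                           # do not forget the last record
    ' "$1"
}
```

Calling filter_fasta on a saved search result prints one matching entry name per line, which is the "list of file names" style of output described above.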

In addition to these scripts there are also scripts that allow users to: 1) view all files, 2) calculate basic statistics, 3) change the format of any entry (GCG, fasta etc) 4) summarize blast reports, 5) make blast databases, 6) do batch analysis (e.g. G+C content), 7) log comments (user/collaborators), 8) log hyper-links to important search results and supplementary information, 9) log visitors. Also included in MINE (Version 1.0) are simple command line utility scripts (e.g. an FTP tool to automatically retrieve remote documents) and very basic, fully annotated Perl-CGI scripts that are made to demonstrate simple tasks for beginners who are interested in learning how Perl code (and MINE) works.

It is envisioned that the database will be used by multiple people and that these people might be in different labs. It is therefore essential to communicate about bugs, suggestions, changes to the data or the database etc. The Page_Comments.cgi script provides a simple bulletin-board CGI-script. This script is simple to customize to related purposes. You could clone it and change it subtly to have a second more public comment board for EndUsers (people that visit the site from the web). The script Page_Results.cgi (Results option in menu) was created by slightly altering the Page_Comments.cgi script.


Why are there COMMAND LINE UTILITIES included?

These are utilities useful for doing extra things with MINE (click on 'All scripts' in the MINE menu and scroll down the list). Mostly, they are small pieces of command line code that do something extra. They may also provide a starting point for writing new scripts that do similar things.


How do I convert file formats (MINE, html, fasta, genbank etc)?

There are currently two main scripts that help you convert file formats in MINE. The ``Search'' script (select ``Search'' in the MINE menu) allows you to display files as 1) a list of file names, 2) complete content, 3) specific fields only, 4) fasta format and 5) report form (column-based output, e.g. for export to Excel). The ``Format'' script (select ``Format'' in the MINE menu) currently allows you to see the contents of a file as html, fasta, reverse complement, or in translation.

If you can get a file into fasta format you can go pretty easily to other formats. In the future, perhaps a larger file converter script (written on top of the SeqIO.pm module from BIOPERL) will be added.

There are three ways in MINE to get .db files into fasta format.

1) To format all the ``.db'' files in the database as individual fasta files: select the option ``extended menu'' from the MINE menu and scroll down to the script ``Page_Make_Source_ADMIN.cgi''. Click on this script and you will see a button on the new web page that says ``Update the fasta files...''. If you click on this, all your ``.db'' files will automatically get copied to fasta formatted files named with the extension ``.db.fasta''. If you haven't already, put the extension ``db.fasta'' into the MINE_Customization_File where it says ``extensions to show in database log''. You will then see a list of .fasta files in a new column in your Database Log.

        pico -w MINE_Customization_File                                 # open the file in an editor (e.g. pico)
        # list all the extensions used in the database so the database log can be generated
        @extensions = (
                "QBR.....db",
                "QBR.....db.fasta",                                     # edit the list of extensions to include ".db.fasta"
                "db_blastn.blast.html",
                "db_TblastX.blast.html",
                "blast_database"
                );

2) Or you can go to the search page and choose to display files in fasta format. For instance, if you select ``display as fasta'' and do a search for all file names containing ``.db'', you will get a concatenated list of all your ``.db'' files in fasta format. You then have the option to copy this information to a permanent file.

3) If you only want to change a single file (and not make a 'hard-copy' of it) go to the menu and select the option ``Format''. Type in the file you want to convert and select ``fasta''. The fasta formatted information will appear on screen and you can copy and paste it if you like.
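
If you have shell access to the MINE directory, the ``.db.fasta'' files produced by option 1 can also be joined into a single fasta file from the command line. This is only a sketch: the function name concat_fasta is made up, and the output file should be given a name that does not itself end in ``.db.fasta''.

```shell
# Hypothetical helper (not part of MINE): concatenate every *.db.fasta file
# in a directory into one fasta file and report how many records it holds.
concat_fasta() {
    dir=$1
    out=$2
    : > "$out"                          # start with an empty output file
    for f in "$dir"/*.db.fasta; do
        [ -f "$f" ] || continue         # skip if the pattern matched nothing
        cat "$f" >> "$out"
    done
    grep -c '^>' "$out"                 # number of fasta headers written
}
```

For example, `concat_fasta . all_entries.fasta` run inside the MINE directory would collect every entry into all_entries.fasta, ready to be passed to other tools.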


How fast will MINE work?

If you are running MINE on your own machine, you'll know how many jobs are running at once. But if, like most people, you are running MINE on a large, multi-user UNIX machine, sometimes a task (e.g. a TblastX search) will finish quickly, and other times it will seem to take ages. The speed of all tasks undertaken within MINE is a function of the current load on your machine.


How to check the CPU load

If you are curious about the current load on the machine (or about who exactly is hogging all the CPU), here is the way to check. Pull up a telnet window and type the command 'top'. The top command will produce a list of all the 'top' CPU-consuming processes (jobs) running on your machine, along with a lot of statistics.

For example, right now it's VERY FULL - See the 'CPU states: 0.0% idle' below? That means no CPU left for new processes!

        last pid: 12096;  load averages: 15.38, 13.37, 13.62                   12:36:06
        445 processes: 421 sleeping, 3 running, 11 zombie, 2 stopped, 8 on cpu
        CPU states:  0.0% idle, 92.6% user,  7.4% kernel,  0.0% iowait,  0.0% swap
        Memory: 8192M real, 484M free, 2048M swap in use, 208K swap free
          PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
        11995 apache    12   0   19   69M   45M cpu1    1:01 15.04% blastall
         6730 haem       8   0   19   45M   16M cpu13  63.7H 15.02% fasta33_t
        10145 haem       1   0    0  135M  128M cpu8   46:30  8.61% stash
        23835 manager    1   0    0  140M  128M run   119:12  8.50% srsbuild
        23836 manager    1   0    0  134M  117M cpu9  118:58  8.28% srsbuild
        12464 haem       1   0    0  170M  156M cpu5   63:25  8.27% stash
          818 haem       1   0    0 1438M   57M run    39.5H  8.26% sim4
         2771 haem       1  10    0 1438M  137M cpu12  39.1H  8.16% sim4
        23834 manager    1   0    0  116M   97M run   117:42  8.13% srsbuild
        11980 manager    1  60    0 1216K  880K sleep   0:21  4.42% gunzip
        11634 manager    1  37    0 1592K 1360K sleep   0:04  0.60% ftp
        12073 dfield     1   0    0 1600K 1416K cpu0    0:01  0.57% top
        12036 dfield     1  48    0 2360K 2184K sleep   0:00  0.15% tcsh
        12070 lpeiser    1  50    0 2104K 1944K sleep   0:00  0.11% tcsh
        12022 jbath      1  28    0 2216K 2072K sleep   0:00  0.08% tcsh
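
If you want to pull just the idle figure out of a listing like the one above (for example, to decide whether to start a big blast job), a small filter will do. This is a sketch that assumes the 'CPU states:' line looks like the one shown here; other versions of top lay this line out differently, and the name cpu_idle is made up.

```shell
# Hypothetical helper: read top output on standard input and print the
# idle percentage from a line of the form "CPU states:  0.0% idle, ...".
cpu_idle() {
    sed -n 's/^ *CPU states: *\([0-9.]*\)% idle.*/\1/p'
}
```

Save a copy of the top output to a file and run `cpu_idle < saved_top_output` to get the number on its own.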


How and Why to set up Cron (automatic) jobs within MINE

Cron is the word used to describe tasks that are done by your computer automatically. It's very easy to set up a unix machine to do tasks for you even if you are not there to start them. By filling a file called the 'crontab' file with information about 1) each task you would like done and 2) when you would like it done, you can automate certain chores. For example, you might wish to automate your blast searches so that they are updated monthly.

Here are the directions for setting up a cron job for tasks that use the CGI scripts in MINE.

To learn more about the command crontab, read the man page by typing:

        dfield@enterprise [dfield] man crontab

You'll see there are two basic commands, list and edit.

To add a new cron job, you'll have to edit this file. Before you do, make sure the EDITOR environment variable is set to your favorite editor (the default is ed).

        dfield@enterprise [dfield] printenv                     # print all your variables to check
        dfield@enterprise [dfield] setenv EDITOR pico           # set the default editor to pico
        dfield@enterprise [dfield] crontab -e

Add in a new job according to the rules outlined in the man page:

        15 3 * * * helloworld                                   # will run helloworld at 3:15 each day
        15 3 * * 1-5  calculate >results                        # will run calculate at 3:15 each weekday
                                                                # and redirect the results to the file results

When finished, close the editor and you can now see the cron jobs in your crontab file with the command:

        dfield@enterprise [dfield] crontab -l

To test if it will work, select a script that runs a very simple task and set the time to 1 minute from now:

        dfield@enterprise [dfield] date                         # check the exact time
        Thursday August 17 13:59:46 BST 2000
        dfield@enterprise [dfield] crontab -e                   # edit crontab file
        0 14 * * * helloworld                                    # will run helloworld in one minute!

If you are happy, add more cronjobs. Now your job will run at the specified time!

If your cron job generates an error message, you should receive it in an e-mail. One tip to remember when trying to get cron jobs to work is to use 'absolute paths to scripts you are trying to run' and to 'data you are trying to access'.
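
The absolute-path tip is easy to forget, so here is a small sketch that refuses to build a crontab line unless both the script and its output file are given as absolute paths. The function name cron_line and the paths in the usage note are made up for illustration.

```shell
# Hypothetical helper: print a daily crontab entry, insisting on absolute paths.
# Arguments: minute hour /absolute/path/to/script /absolute/path/to/output
cron_line() {
    minute=$1
    hour=$2
    script=$3
    log=$4
    case $script in
        /*) ;;
        *) echo "script path must be absolute: $script" >&2; return 1 ;;
    esac
    case $log in
        /*) ;;
        *) echo "output path must be absolute: $log" >&2; return 1 ;;
    esac
    # five time fields (minute hour day-of-month month day-of-week), then the command
    printf '%s %s * * * %s > %s 2>&1\n' "$minute" "$hour" "$script" "$log"
}
```

For example, `cron_line 15 3 /home/dfield/bin/helloworld /home/dfield/helloworld.log` prints a line you can paste into `crontab -e`. Note that once output and errors are redirected to a file like this, cron has nothing left to e-mail, so check the file instead.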

Setting up a cron job that runs cgi scripts is a bit more difficult (because cgi scripts inherit so few environmental variables). Instructions to follow.


Does MINE log visitors?

A simple web-logging function is built into MINE. If you look in the library CGI-MINE.pl you will see the function

 #############################
 # FUNCTION: LOG
 # keeps a log of who is accesses
 # the pages of MINE
 #############################
 # Usage: &log();

You will also notice these lines of code near the top of each script:

        # Each time a script is invoked for the first time, log the visit in the custom MINE server log (see CGI-MINE.pl)
        $action = $query->param('action');
        if ($action eq undef) {&log();}

Each time a script is invoked for the first time, the date, IP, machine name, and script visited is logged in the file log_archive_file. This file is parsed by the script Page_Web_Log_Reader (accessible only through the See all Scripts option in the menu) to show you how many times each script has been visited and how many users have visited the site.
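
The same kind of tally can be sketched from the command line. The exact layout of log_archive_file is not described here, so the example below ASSUMES one visit per line with tab-separated fields in the order date, IP, machine name, script; check your own log file and adjust the field number before relying on it. The function name count_visits is made up.

```shell
# Hypothetical helper: count visits per script in a log file.
# ASSUMPTION: tab-separated fields, with the script name in field 4.
count_visits() {
    awk -F '\t' '{ n[$4]++ } END { for (s in n) print n[s], s }' "$1" |
        sort -k1,1nr -k2,2              # most visited first, ties by name
}
```

Running `count_visits log_archive_file` would then print one line per script, with the visit count in front of the script name.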


General Notes on Using MINE and a Browser Interface

The best advice for getting started with MINE is to use it! MINE has been written so that if you root around a bit, it should be self-explanatory. You can always click on MINE in the menu while using MINE to come back to this page. There are some tricks and easy solutions for manipulating your data that we list below.

One of the most important factors in whether a MINE database is successful or not is the amount of thought put into naming files and defining the fields in each database submission (any .db file).

Using a browser as the Graphic User Interface (GUI) to an application offers special advantages and disadvantages. Keeping the following concepts in mind is sure to make using MINE easier and more self-explanatory.


Installing MINE

To use this set of scripts simply install them (MINE) on your unix server (PC or Mac running server software).

UNIX INSTRUCTIONS for INSTALLING MINE: Get the MINE.tar file onto your computer. You must place it inside your public_html/cgi-bin/ directory.

If you do not already have a working cgi-bin, no worries. These are the briefest instructions, but you really should talk to your system administrator before you set up a cgi-bin and start serving scripts from it, especially since cgi-scripts can be a major security risk if run improperly and some administrators severely restrict their use.

    Log on to your unix account and do the following:
    (all the statements following the # sign are comments)
    If you don't already have a public_html folder (you NEED one to serve up webpages from a unix account)
    mkdir public_html               # make the directory
    chmod 755 public_html           # set the proper permission
    If you don't already have a cgi-bin (you NEED one to run .cgi scripts using perl)
    cd public_html                  # enter your public_html directory (or make it, permission should be 755)
    mkdir cgi-bin                   # make the cgi-bin directory, it must be inside your public_html directory 
                                    # since you are serving webpages
    chmod 775 cgi-bin               # read, write and execute in your cgi-bin (set apache as your group)
    cd cgi-bin                      # enter your cgi-bin and ftp the file

FTP: You can directly download the file MINE.tar to your computer by clicking here: ftp://molbiol.ox.ac.uk/pub/Calypso_ftp_files/MINE.tar

Or you can use this ftp address ftp.molbiol.ox.ac.uk to log on with an ftp program like fetch for the mac or ftp on unix.

    Sample ftp session from a unix account:
    [cgi-bin]                       ftp ftp.molbiol.ox.ac.uk        # log onto the ftp site
    [cgi-bin]                       anonymous                       # enter anonymous as your user name
    [cgi-bin]                       dfield@imm.edu                  # enter your e-mail as a password
    ftp>                            cd /pub/Calypso_ftp_files/      # change to the right directory
    ftp>                            get MINE.tar                    # get the file MINE.tar
    ftp>                            quit                            # end your ftp session
    Once you have the file in your cgi-bin:
    [cgi-bin] ls        # display all the files in your cgi-bin directory
    MINE.tar        other_files
    you just need to untar MINE.tar and set the proper permissions and you are ready to use the scripts in MINE.
    [cgi-bin] tar -xvf MINE.tar             # untar the MINE.tar file
    [cgi-bin] chmod 775 MINE                # the directory MINE will be created, change permission to 775
    [cgi-bin] cd MINE                       # go into the MINE directory to set proper permissions for all files
    [MINE]  chmod 775 tmp *cgi                          # set the directory /tmp and all .cgi scripts to executable
    [MINE]  chmod 744 *driver *utility *wrapper         # set all perl scripts to run for user and read only for web
    [MINE]  chmod 644 CGI* MINE* *LINK_MINE make*       # set all text files to read permission
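
If a script later refuses to run, a missing execute permission is a common cause. The sketch below lists any .cgi file in a directory that is not executable by you; it is an illustration only, and the function name check_cgi_exec is made up.

```shell
# Hypothetical helper: report .cgi files in a directory that lack execute permission.
check_cgi_exec() {
    dir=$1
    bad=0
    for f in "$dir"/*.cgi; do
        [ -e "$f" ] || continue         # skip if the pattern matched nothing
        if [ ! -x "$f" ]; then
            echo "not executable: $f"
            bad=1
        fi
    done
    return $bad                         # 0 when every script is executable
}
```

Run `check_cgi_exec .` from inside the MINE directory after the chmod commands above; no output means every script is executable for your account. The web server runs as a different user, so also look at the group and other permission columns in `ls -l`.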

There is one more thing to do - copy the default customization file to a working copy (one without the extension ``.default''). This file is not automatically installed so that it can't overwrite your existing customization file if you've already set up one (e.g. if you are updating your version of MINE). If you already have a MINE_Customization file, make sure there are no new updates that came in the .default version. If there are differences, simply add them to your existing file to update it.

    [MINE] cp MINE_Customization_File.default MINE_Customization_File   # copy the file from the default
    [MINE] chmod 644 MINE_Customization_File            # set the permission to read
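
When updating, the copy step above must not be repeated blindly, or it would replace your settings. A minimal sketch of a safer routine follows: it copies the default only when no working copy exists, and otherwise shows the differences so you can merge them by hand. The function name install_customization is made up.

```shell
# Hypothetical helper: install the default customization file only if
# no working copy exists; otherwise show what differs from the default.
install_customization() {
    dir=$1
    default="$dir/MINE_Customization_File.default"
    working="$dir/MINE_Customization_File"
    if [ ! -e "$working" ]; then
        cp "$default" "$working" && chmod 644 "$working"
        echo "installed"
    elif diff "$default" "$working" > /dev/null; then
        echo "up to date"
    else
        echo "differs"
        diff "$default" "$working" || true   # diff exits non-zero when files differ
    fi
}
```

Lines marked "<" in the diff output come from the new default file and lines marked ">" from your own copy. Many of the differences will simply be your own customizations.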

Use the directions found in the MINE_Customization_File to customize your version of MINE. The MINE_Customization_File contains lots of global variables used by the scripts included in MINE. You can take your time doing this, since everything will run even if you don't touch this file. Customize it as you learn more about MINE, and as you develop ideas for how you want your MINE database to operate and evolve.

    Now direct your browser to a URL that goes to the script Page_MINE.cgi using the following information:
    http://your_machine_name/~your_username/cgi-bin/MINE/Page_MINE.cgi
    For example, a working URL should look similar to this sample URL:
    http://enterprise.molbiol.ox.ac.uk/~dfield/cgi-bin/MINE/Page_MINE.cgi

At this point you should have a working version of MINE and you can set about customizing it as you wish. Some of the scripts will need some customization to run, so see below for how to customize MINE to suit your machine.

If you want to have all the source code show up in the automatic links at the bottom of each script, just go to ``ADMIN'' in the menu and select the button ``make source code''. It will properly copy all your code into text files that can be seen by the browser. If you don't care about seeing the source, or if you want to make your site more secure, don't bother with this.

If you decide to update your installation of MINE (reinstall a newer version of the MINE.tar), just follow these instructions again. When you untar the MINE.tar file it WON'T touch any files in the MINE directory that aren't part of the MINE.tar file. Existing database entries and any other files, like results files etc., won't be affected and will stay safe! Files that ARE part of MINE.tar (the scripts themselves) are replaced by the new versions, so keep a copy of any script you have edited by hand before updating.
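
Before untarring over an existing installation, it can be reassuring to see which of your current files share a name with something in the archive, since by default tar replaces those and leaves everything else alone. The sketch below only lists them and extracts nothing; the function name preview_update is made up.

```shell
# Hypothetical helper: list files under a target directory that have the
# same name as a member of the archive (and so would be replaced on extraction).
preview_update() {
    archive=$1
    target=$2
    tar -tf "$archive" | while IFS= read -r member; do
        if [ -f "$target/$member" ]; then
            echo "will be overwritten: $member"
        fi
    done
}
```

From inside your cgi-bin, `preview_update MINE.tar .` prints one line per file that the update would replace. Your .db entries and results files should not appear in the list.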

Once MINE is downloaded you will see exactly what you saw at the PfSBW25 site, but you won't have any of the PfSBW25 data. There are only a few customizations that you will need to make and these are listed under the heading Customizing MINE once installed below.


Trouble-Shooting an Installation

Trouble-Shooting: Your installation of Perl: First, do you have Perl installed? Type 'perl -v' to see the version and check if it's there. Second, none of these scripts will work if you don't have the (free) standard module CGI.pm installed (it should come with your standard Perl installation). Third, make sure that you have permission to run cgi scripts in your cgi-bin (check your PATH environmental variable and that all the file permissions are set properly). If all of this is in place and your scripts still don't work, the first thing to check is the path to perl found on the first line of each script (anything named Page*cgi). You will find that the path on all these scripts is

        #!/usr/local/bin/perl

You may find that the path you need on your machine is different. An equally common path is

        #!/usr/bin/perl

Try typing whereis perl on the unix command line to find out the path to your installation of perl, or ask your trusty system administrator.
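
The check can also be scripted. The sketch below reads the first line of a script, extracts the interpreter path after the "#!", and reports whether a program actually exists at that path on your machine. It is an illustration only; the function name check_shebang is made up.

```shell
# Hypothetical helper: check that the interpreter named on the first line
# of a script exists and is executable on this machine.
check_shebang() {
    first=$(head -n 1 "$1")
    case $first in
        '#!'*) ;;
        *) echo "no #! line: $1"; return 1 ;;
    esac
    interp=${first#'#!'}                # drop the leading "#!"
    interp=${interp# }                  # drop one optional space after it
    interp=${interp%% *}                # drop any options such as " -w"
    if [ -x "$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing: $interp"
        return 1
    fi
}
```

Running `check_shebang Page_MINE.cgi` reports "missing: /usr/local/bin/perl" if the default path does not exist on your server, in which case the first line of each script needs to be edited to match your own path to perl.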


Installing a single MINE script

If you like you can take, by the highly technical 'copy and paste' method, only the source code for one or a few MINE scripts instead of installing everything. MINE has been built to be modular and in general all the scripts run with complete independence from any other code in MINE - with one major exception! All the scripts in MINE need the CGI-MINE.pl library and many make use of the global variables set in the MINE_Customization_File.

So, if you copy a single script here is what you need to do to make it run.

1. Let's say you would like to run the 'Page_SeqIO_converter.cgi' script on your own machine. First go to the script on the web and scroll to the bottom of the screen (same for any MINE script). Click on the link to the source code for the script in 'See the source code for this script: source.'

2. Select all the text (source code) on this page and paste it into a new file on your machine - inside your cgi-bin! You can call this new script anything you like.

 pico -w Page_SeqIO_converter.cgi       # make a new file and paste in the code
                                        # save the file
 chmod 755 Page_SeqIO_converter.cgi     # make the file executable so it runs as a script (a plain text file would be 644)

3. Now you need to do the same for the CGI-MINE.pl file, but give it the permission 'chmod 644 CGI-MINE.pl' and do not change the name. Each script in MINE looks inside the CGI-MINE.pl file for common functions. If you try to run a script without it having access to this file (in the same directory) you will get a fatal error message.

4. Now you have a choice with the MINE_Customization_File. You must make a file on your machine with this name and give it a permission of read only ('chmod 644 MINE_Customization_File'), but you don't have to fill up the contents! In fact, at the minimum all you have to do is put a single line inside the file

 1;

If you don't already know why this works, don't worry about it too much. You just need to know that the ``1'' lets the script 'Page_SeqIO_converter.cgi' know that the preference file is returning a value of 'true' (it exists). If you do want the ability to set some of the preferences for this file (many of the scripts won't run correctly without them), you do need to paste in, and modify, the contents of the 'MINE_Customization_File.default'.

If you want to remove the MINE menu from the top of the script simply 'comment out' the code that produces it with a hash symbol ('#'):

        #######
        # START THE WEBPAGE
        #######
        # print the MINE menu
        &footer;

Comment out '&footer' by making it #&footer;

        # print the MINE menu
        # &footer;
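
If you have copied several scripts, the same edit can be made with sed instead of by hand. This sketch prints the edited script to standard output and leaves the original file alone; it assumes the call appears on a line of its own as shown above, and the function name hide_menu is made up.

```shell
# Hypothetical helper: print a copy of a script with the "&footer;" call commented out.
# In the sed replacement, "\&" is a literal ampersand and "\1" restores the indentation.
hide_menu() {
    sed 's/^\([[:space:]]*\)&footer;/\1# \&footer;/' "$1"
}
```

For example, `hide_menu Page_SeqIO_converter.cgi > Page_SeqIO_converter_nomenu.cgi` writes the edited version to a new file (the second file name is just an example). Remember to chmod the new file so that it is executable.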


Customizing MINE once installed

MINE will run after being installed properly, but there are some things that you will want to customize for your database. In addition, the scripts that use Bioperl Modules will not work when you click on them if you do not have Bioperl installed (Page_Format.cgi in the main menu). Here is a (full!) list of things you will eventually want to consider working with to best suit your needs. If you find anything else that does not work properly after these 'adjustments' are made please e-mail me: dfield@molbiol.ox.ac.uk.


General Notes on Security

To run MINE locally on your computer you will need to think about security issues. CGI scripts are inherently insecure because they allow strangers (anyone with a web browser and your URL) to access your computer. There are several options for selecting the level of security you would like to run MINE under. When you install MINE, all the code that allows writing to your disk is disabled (see use of the $write_permission variable). The instructions explain these lines of code, and you can select which 'write options' you would like to enable.

All server software can be customized to allow you to restrict who can log on to certain web pages. This allows you to build a layer of security into your database, or to stop it from being 'public' before publication of the data contained within it. This is especially useful if several people want access to private data over the web, but don't want to make the site ``public''. Along with restricting access, you should set up a robots.txt file or ask for your server's robots.txt file to be amended to stop web robots from secretly coming in and indexing your site for their public search engines.


Restricting Access to URL's Using the Apache Server (.htaccess files)

Free (terrific) Apache software runs most of the unix servers around, so forgive the fact that this description is Apache restricted. If you do not have an Apache server, it is likely that your server will provide similar access restriction features.

You can choose to restrict access to the entire MINE directory using an ``.htaccess'' file. This tells the Apache server who can access the scripts in your cgi-bin. Yes, there is a ``dot'' in front of htaccess. You have to type ``ls -a'' to see ``dot'' files on a unix machine. These are files that you don't want to overwrite so they are ``protected''.

Into your MINE directory (or any directory that you wish to restrict) place a .htaccess file with information about who to restrict and who to allow. You'll have to read up in the Apache documentation on the exact commands you need to include, but it might look similar to the contents below (see your system Administrator).

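        pico -w .htaccess               # make a ".htaccess" file
        (add contents using editor)
        AuthType Basic
        AuthName "Gameplayers restricted"
        AuthUserFile /path/to/.htpasswd
        require user pokemon pichachu
        (the AuthUserFile path is a placeholder: point it at the password file your administrator sets up)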

Unless you are running the server yourself (you have root access to the files that determine how your server works) you will need to discuss your needs for restricting access with your system administrator. The system administrator will need to add information about the directories you wish to restrict into the Apache configuration. It is the httpd.conf file that needs to be altered. Only the web manager can alter this file.

When you load a URL in a directory that is controlled by an .htaccess file, you will get a prompt for password. Once you log in, you will have access to that directory until you QUIT your browser. This means that you should shut down the browser you were working with when finished to make sure no one ``unwanted'' views the restricted URL without your knowing it.


Restricting web robots using a robots.txt file

If you are logging visits to your site you will see that eventually a web robot will visit your site. Web robots are run by search engines looking for new html pages to index for their ``search the web'' sites. There are many search engines deploying web robots to hunt out any and all potential URLs! Because of this, robots.txt files have been invented. A robots.txt file asks robots to stay out of a site; robots that honor it will not index your pages, but it is only a request, so use access restrictions (see above) for anything that must stay truly private.

To fully restrict any web robots perusing your ``territory'' (any file viewable as a URL on the web) you need to make a robots.txt file and place it in the first URL a web robot will look for (you need to provide ``/robots.txt'' in the top level of your URL space). You will probably need to have your web manager do this for you since the file needs to go into a place like

        /usr/local/etc/httpd/htdocs/robots.txt

and you probably won't have access. Then you need to place information about the directories you want to restrict into this main file. There can only be ONE robots.txt file on a server.

So the proper way to restrict the site http://www.w3.org/ is to use a URL like http://www.w3.org/robots.txt . Placing the robots.txt file anywhere else in the system will not have any effect! (See: http://info.webcrawler.com/mak/projects/robots/exclusion-admin.html for the details on where to put your robots.txt file).

The file to make is simply ``robots.txt'' and you can fill it with things like this:

        pico -w robots.txt                      # make a new robots.txt file in the editor pico

        (add these contents to the robots.txt file)
        # go away
        User-agent: *                           # exclude all robots from the entire server
        Disallow: /

        chmod 644 robots.txt                    # make sure the robot has permission to read the file

You will probably need more information to create a robots.txt file that best suits your needs. For all the info you need about robots.txt files and creating them go to: http://info.webcrawler.com/mak/projects/robots/norobots.html

Here are some more hints taken from that site:

The ``/robots.txt'' file usually contains a record looking like this:

       User-agent: *
       Disallow: /cgi-bin/
       Disallow: /tmp/
       Disallow: /~joe/

In this example, three directories are excluded.

To exclude all robots from the entire server

       User-agent: *
       Disallow: /

To allow all robots complete access

       User-agent: *
       Disallow:
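The prefix rule behind these records can be sketched in a few lines of Perl. This is only an illustration of how a polite robot might read the file, not the code any particular search engine uses: it looks only at ``User-agent: *'' records and does plain prefix matching. The subroutine names and the test paths are made up for this example.

```perl
use strict;
use warnings;

# Collect the Disallow prefixes from records that apply to all robots.
sub disallowed_prefixes {
    my ($text) = @_;
    my @prefixes;
    my $applies = 0;                        # true inside a "User-agent: *" record
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/;           # comment-only lines are ignored
        $line =~ s/\s*#.*$//;               # drop trailing comments
        $line =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
        if ($line eq '') {                  # a blank line ends the record
            $applies = 0;
        }
        elsif ($line =~ /^User-agent:\s*(.*)$/i) {
            $applies = 1 if $1 eq '*';
        }
        elsif ($applies && $line =~ /^Disallow:\s*(.*)$/i) {
            push @prefixes, $1 if length $1;    # an empty Disallow excludes nothing
        }
    }
    return @prefixes;
}

# True if $path starts with any of the disallowed prefixes.
sub is_blocked {
    my ($path, @prefixes) = @_;
    return scalar(grep { index($path, $_) == 0 } @prefixes);
}

# The three-directory example record from above.
my $robots_txt = <<'END';
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
END

my @rules = disallowed_prefixes($robots_txt);
for my $path ('/cgi-bin/Page_View.cgi', '/index.html') {    # hypothetical paths
    printf "%-25s %s\n", $path, is_blocked($path, @rules) ? 'blocked' : 'allowed';
}
```

With the record shown, anything under /cgi-bin/ is reported as blocked while /index.html is allowed.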


Why is MINE written in PERL?

Perl was selected for this project because it is the ideal computer language for implementing the goals of this type of bioinformatic project. Perl's two best features are its ability to co-operate with other applications and its exceptional ability to manipulate text and large numbers of files with ease and efficiency. In practice, this means that Perl code is used in this project to glue together existing software into a network of tools, and to create new code that fills in any gaps in existing bioinformatic tools.

Perl as Glue. Perl is a great way to glue together the input and output of multiple bioinformatic resources. This means that pipelines, and more importantly, networks, of integrated tools can easily be constructed. In this way, Perl can be used to build 'power-applications' (or meta-applications) that combine several pieces of existing, specialized code (e.g. the bioperl modules) and full-blown applications (e.g. blastall). This goes a long way towards assuring that ``the wheel is never reinvented''. If you find a good tool, use it! Perl will allow you to glue together almost anything, if not everything, that runs using a command line in a unix environment.
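As a rough sketch of the glue idea, the code below runs an external command, captures its output, and passes it to a second, Perl-only filtering step. To keep the example self-contained the ``external tool'' is just the perl interpreter itself printing some made-up hits; in real use the command would be something like blastall. The subroutine name run_tool and the data are assumptions for illustration and are not taken from the MINE scripts.

```perl
use strict;
use warnings;

# Run an external command and return its output as a list of lines.
# The list form of open() avoids passing the command through a shell.
sub run_tool {
    my (@command) = @_;
    open(my $pipe, '-|', @command) or die "Cannot run $command[0]: $!";
    chomp(my @lines = <$pipe>);
    close($pipe) or die "$command[0] exited with an error\n";
    return @lines;
}

# Stand-in for a real tool: the running perl ($^X) prints three
# tab-separated name/score pairs (hypothetical data).
my @hits = run_tool($^X, '-e', 'print "geneA\t98\n", "geneB\t45\n", "geneC\t77\n"');

# Second stage of the pipeline: keep only the hits scoring at least 70.
my @good = grep { (split /\t/)[1] >= 70 } @hits;
print "$_\n" for @good;
```

The same pattern extends to longer chains: each stage is either another external program or a few lines of Perl that reshape the previous stage's output.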

Perl as a Workhorse. Perl is currently the best programming language for manipulating text and large numbers of files. This makes Perl ideal for web programming and also for processing large amounts of sequence data. Even a few lines of the simplest perl code can make quite an efficient and useful script. More powerful scripts can then be made from combinations of smaller pieces of code.
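To give a feel for how little code such text handling takes, here is a minimal sketch that splits fasta-formatted text into records and reports the length and G+C content of each. The subroutine names and the two sequences are invented for the example; this is not the code MINE itself uses.

```perl
use strict;
use warnings;

# Split fasta-formatted text into a hash of identifier => sequence.
sub parse_fasta {
    my ($text) = @_;
    my (%seqs, $id);
    for my $line (split /\n/, $text) {
        $line =~ s/\s+$//;                  # strip trailing whitespace
        if ($line =~ /^>(\S+)/) {           # header line: remember the identifier
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {               # sequence line: append to the current record
            $seqs{$id} .= $line;
        }
    }
    return %seqs;
}

# Percentage of G and C characters in a sequence (0 for an empty sequence).
sub gc_percent {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ($seq =~ tr/GCgc//);           # tr with no replacement only counts
    return 100 * $gc / length $seq;
}

my $fasta = ">seq1 example\nATGC\nGGCC\n>seq2\nATAT\n";    # hypothetical data
my %seqs  = parse_fasta($fasta);
for my $id (sort keys %seqs) {
    printf "%s\t%d bp\t%.1f%% G+C\n", $id, length $seqs{$id}, gc_percent($seqs{$id});
}
```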


Learning more about Perl

There is tons of stuff out there about perl - too little time, too much to do, to account for it all here. This will get you started, though.

First, if you want to use perl, you need to make sure you have access to it. On a unix system type

        perl -v

to see if it's installed, get the version you have installed, and see some suggestions on where to go for more info.

        field@enterprise [BIOPERL] perl -v
        This is perl, v5.6.0 built for sun4-solaris
        Copyright 1987-2000, Larry Wall
        Perl may be copied only under the terms of either the Artistic License or the
        GNU General Public License, which may be found in the Perl 5.0 source kit.
        Complete documentation for Perl, including FAQ lists, should be found on
        this system using `man perl' or `perldoc perl'.  If you have access to the
        Internet, point your browser at http://www.perl.com/, the Perl Home Page.

Perl has built in documentation (an excellent resource, although not the best place to start if you are a beginner) that can be found by typing 'man perl'. Here is what you get when you invoke the man page for perl. Listed are all the parts of the Perl manual:

        dfield@enterprise [MINE] man perl
 Perl Programmers Reference Guide                          PERL(1)
 NAME
      perl - Practical Extraction and Report Language
 SYNOPSIS
     perl [ -sTuU ] [ -hv ] [ -V[:configvar] ]
         [ -cw ] [ -d[:debugger] ] [ -D[number/list] ]
         [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal] ]
         [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ]
         [ -P ] [ -S ] [ -x[dir] ]
         [ -i[extension] ] [ -e 'command' ] [ --
      ] [ programfile ] [ argument ]...
     For ease of access, the Perl manual has been split up into
     several sections:
         perl                Perl overview (this section)
         perldelta           Perl changes since previous version
         perl5005delta       Perl changes in version 5.005
         perl5004delta       Perl changes in version 5.004
         perlfaq             Perl frequently asked questions
         perltoc             Perl documentation table of contents
         perldata            Perl data structures
         perlsyn             Perl syntax
         perlop              Perl operators and precedence
         perlre              Perl regular expressions
         perlrun             Perl execution and options
         perlfunc            Perl builtin functions
         perlopentut         Perl open() tutorial
         perlvar             Perl predefined variables
         perlsub             Perl subroutines
         perlmod             Perl modules: how they work
         perlmodlib          Perl modules: how to write and use
         perlmodinstall      Perl modules: how to install from CPAN
         perlform            Perl formats
         perlunicode         Perl unicode support
         perllocale          Perl locale support

An annotated rundown of books about Perl (personal website): http://www.netaxs.com/~joc/perlbooks.html


Learning some Perl using MINE

Perl is easy enough to learn that you may want to start thinking about writing your own scripts or, more simply, editing some of these scripts. The goal of MINE is to provide a useful, functioning bioinformatic tool that also serves as a gateway to the process of learning how to use perl in bioinformatics.

All of the source code in MINE is freely available for modification. MINE is designed so that access to source code is super easy: use the ``All Scripts'' option in the menu, or follow the source link at the bottom of each page.


The simplest perl script ever!

 pico -w myscript               #  open a new text file (here, called 'myscript') in the editor pico (use pico, xemacs or your favorite editor)
        #!/usr/bin/perl
        print "Hello!\n";       # type these two lines of code into the new file
                                # exit (the pico editor) saving your file
 chmod 744 myscript             # set the (user) permission to executable (1) as well as readable (4) and writable (2) (remember 1+2+4 = 7)
 myscript                       # run your new script and it will print Hello! to screen

In order for this to work, the directory you are working in must be in your PATH environment variable (otherwise, run the script as ``./myscript''). To see if it is, use the ``printenv'' command to check whether the path to where you want to write scripts is included (a path to your ``~user/bin'' directory should be included):

 printenv PATH

Or, for very short scripts, just use the perl interpreter (the 'program' that 'runs' Perl code/scripts) on the command line:

 perl -e 'print "hello world\n";'

This is a tiny script that will print ``hello world'' when run.


Starting a New Perl Script

You can write your own code that builds on the basic collection of MINE scripts, by either modifying existing scripts or by adding completely new scripts.

To get you started writing new scripts there is a script called Page_Template.cgi. You can find this script in the list of scripts when you select the See all Scripts option from the menu. This script is the prototype for all the CGI scripts used in MINE. It will run, and it's just up to you to start modifying it!

To get started you will want to copy this file and then make it executable.

        [cgi-bin] cp Page_Template.cgi Page_yourdescription.cgi # copy the template script to your new script name
        [cgi-bin] chmod 755 Page_yourdescription.cgi            # give it permission to be executable
        [cgi-bin] Page_yourdescription.cgi                      # run your new script

Use the naming convention of Page_*.cgi only if you want the script to be a part of MINE.


Why write lots of small, independent scripts?

1. If one script breaks, every other script still works!

2. Why invoke a huge script only to have it perform one small task? Perl parses and compiles all the code in a script before running it, so time is wasted compiling code that is never used.

3. It's much easier to understand the code of a small uncomplicated script when it comes time to expand, change, or debug a script.

4. Flexibility.

5. Small scripts are easy to bundle up into subroutines that can be added to libraries and modules.

6. Common tasks can be placed in a library or module so you don't duplicate any code by using lots of small scripts.

7. It's easy to link together several small scripts to make new tools (instead of picking apart a larger script to get the pieces of code you want).
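Points 5 and 6 can be sketched as follows: a common task is written once as a subroutine in a package, and every small script calls it instead of carrying its own copy. The package name SeqUtils and the subroutine wrap_sequence are hypothetical and do not come from CGI-MINE.pl; in practice the package would live in its own file rather than at the top of the script as it does here.

```perl
use strict;
use warnings;

# Normally this package would sit in its own file (a hypothetical
# SeqUtils.pm) and be loaded by each script with "use SeqUtils;".
package SeqUtils;

# Break a sequence into lines of at most $width characters (default 60).
sub wrap_sequence {
    my ($seq, $width) = @_;
    $width ||= 60;
    my @lines = $seq =~ /(.{1,$width})/g;   # successive chunks of up to $width characters
    return join("\n", @lines);
}

package main;

# Any small script can now reuse the shared subroutine.
print SeqUtils::wrap_sequence('ATGCATGCATGC', 5), "\n";
```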


How to read and edit the code in the MINE scripts

No two projects will have identical requirements for the type of code they need. It is understood that if you have a bit of programming knowledge you will want to expand/change different aspects of MINE.

MINE is designed to be flexible.

As much as possible, the code in MINE is annotated line by line. There is no POD documentation in the individual scripts in this version of MINE, Version 1.0 (it's harder for beginners!), but there will be ample POD documentation in future versions of MINE. All POD documentation is found at the end of the CGI-MINE.pl library (text file). This POD is extracted to make the MINE manual you are looking at now.

If you scroll through a few scripts you will come across some common functions found in the CGI-MINE.pl library that are used repeatedly. These include:

        $action = $query->param('action');      # Each time a script is invoked for the first time,
        if ($action eq undef) {&log();}         # log the visit in the custom MINE server log (see CGI-MINE.pl)
        &footer();                              # print the MINE menu
        &table_top();
        &table_bottom();                        # these two make the basic gray background boxes seen in MINE
        &mine_cp();                             # write the MINE copyright to the page

You will also see a link to a copy of each script's source code written at the bottom of each page. Here is the common code for writing that link:

        # make a url link to the source for this script
        $source = "<font color=\"blue\"><i><a href=\"Page_Comments_source\">source</a></i></font>";
        print "<p> See the source code for this script:  $source";

Any of the code in MINE can be edited; all you need is a text editor and a new idea.

All of these scripts have been written using xemacs, a menu-based version of emacs that makes it easy to write code with good style. xemacs will help you indent code at the proper places. All the code in MINE conforms to the Perl Style Guide found here:

http://www.perl.com/pub/doc/manual/html/pod/perlstyle.html at the official perl site: http://www.perl.com/


List of external websites and tools that provide key functionalities

Libraries and modules used

CGI.pm: fantastic library written and maintained by Lincoln Stein for creating CGI pages (MINE uses this extensively!) http://stein.cshl.org/WWW/CGI/

BoulderIO: great library for handling files from genbank and other major databases (Lincoln Stein) http://stein.cshl.org/software/boulder/

BoulderIO::Genbank: used by MINE to retrieve genbank documents from Entrez http://stein.cshl.org/software/boulder/docs/Boulder/Genbank.html

BIOPERL: a collection of bioinformatics modules written in perl http://bio.perl.org/

Documentation for all the BIOPERL modules: http://bioperl.org/Core/Latest/modules.html

The script Change_Format.cgi demonstrates two of the core modules found in BIOPERL, Seq.pm and SeqIO.pm.

Blasting

DbWatcher: set up daily checks against a blast database for novel hits http://enterprise.molbiol.ox.ac.uk/help/dbwatcher.html

Unix Queues: the command to run blastall is submitted to a batch queue http://www.molbiol.ox.ac.uk/help/batchqueue.htm

Genbank Databases

Entrez manual: understanding how genbank works and how to access info ftp://ncbi.nlm.nih.gov/entrez/docs/entrzdoc.hqx

How to make WWW Links to Entrez: http://www.ncbi.nlm.nih.gov:80/entrez/query/static/linking.html

Running Entrez in your unix account: very useful for getting data into your account http://www.ncbi.nlm.nih.gov/Entrez/Network/nentrez.unix.html

Data Formats

Genbank Docs: read all about the format ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt

GFF file format: a simple file format for representing features of a sequence http://www.sanger.ac.uk/Software/formats/GFF/

Viewing Data graphically

ARTEMIS: the sanger center's microbial genome annotation tool (Kim Rutherford) http://www.sanger.ac.uk/Software/Artemis/

With this java tool you can graphically examine any genbank or embl document (also accepts and displays gff formatted information). The real purpose of this great tool is genome annotation. There are lots of built in tools ranging from blasting options to nucleotide bias plots (e.g. G+C sliding window analysis) and one can easily edit/create genbank and embl documents.

SEQUIN: the stand alone program for creating genbank documents for submission to genbank (can view any genbank document graphically) http://www.ncbi.nlm.nih.gov/Sequin/download.html


List of Changes

June, 2001

Added a new function to CGI-MINE that writes a link to the source code of each script only if the variable $show_source is set to a non-zero number in the MINE_Customization_File.

January, 2001

Completely changed (simplified format) menu and added more links.

A SELF_blast_database is now created automatically and formatted (using formatdb) using cron.

Changed manual into a FAQ.

October, November, 2000

Added script ``Page_SeqIO_converter.cgi'' which makes use of bioperl modules to do file format conversions (genbank, embl, fasta etc).

Added the option ``search whole file'' to the search engine.

Put the formatdb option into a crontab file so that the SBW_blast_database is reformatted for blasting every day. Added directions for setting up cron jobs in the MINE manual.

September, 2000

Added ``List Extension Descriptions'' to Database Log.

Added script Page_Alleles.cgi that allows the automated creation and editing of database files (specifically created to add allelic data, but could be used for any type of new information).

August, 2000

Added two small utilities for merging and querying information in GFF formatted files.

Created the MINE_Customization_File which is now the only file that needs customization in a new install. Use arrays of extensions to customize how many of the scripts act on information in the database.

Added scripts to include genbank DNA, peptide and genome documents.

Updated the Database_Log.cgi script so that the columns to be displayed can be selected by user.

Created Page_View.cgi which takes an .db file and displays it as nice html.

Started work on an extended menu that can be accessed from some of the experimental scripts.

Altered look of search and display page.

Changed Format script: added option to view as html and wrapped the sequence and the add_links field; added option to translate three or six frames.

Updated the Database Statistics page: it now tabulates the number of files and total characters for subsets of files and presents the list in table format.

July, 2000

Blast interface now makes a /tmp directory if one doesn't already exist and cleans up this directory (deletes all files) before every new blast search.

Added option on search page to save results as text or html.

Updated the MINE menu. Now it better reflects the division between MINE and the database it holds: added more links to curated pages. Also emphasized the division into three parts (DATA INPUT, BLAST/ANALYSIS, SEARCH). Did this while preparing short manuscript describing MINE and the Pf Encyclopedia.

Added peptide composition script and several utility scripts.

June, 2000

Added information to manual.

Added the function 'add links to database entries' to the Analysis page (links can be seen using search engine display options 'defined entries only' or 'report format'.)

Added the function 'seen' to the search engine.

Changed all instances of @files = <*.db>; to code that uses Perl's readdir function. This was needed because someone started using MINE to generate thousands of files. Unix limits the number of files that can be captured by the '<files>' method (at around 1000), but readdir can go way beyond that. Successfully worked with 10,000 files in one directory!
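The readdir approach might look roughly like the sketch below. It is a minimal illustration and not the actual code from the MINE scripts; the subroutine name list_db_files is invented for the example.

```perl
use strict;
use warnings;

# Return the sorted names of all plain files in $dir that end in ".db".
sub list_db_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @files = sort grep { /\.db$/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return @files;
}

# List the .db files in the current directory.
my @files = list_db_files('.');
print scalar(@files), " .db files found\n";
```

Because readdir reads the directory entries directly rather than expanding a file name pattern, the number of files it can return is not capped the way the '<files>' method was.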

Feb-May, 2000

This list started about three months after the project started and after all the basic scripts were written.


Modifying MINE & Copyright *

Modifications and the writing of novel scripts are encouraged! Since the purpose of MINE is to demonstrate the use of PERL, all scripts should strive to be as easy to read and as extensively annotated as possible.

MINE: Molecular INformation Explorer. Copyright 2000 Dawn Field. All rights reserved. The CGI-PERL scripts belonging to MINE may be used and modified freely, but this copyright notice must remain attached to all MINE source code. If you make modifications please do not distribute unless you fully document the modifications.


Support for MINE

I hope to continue to develop and support MINE for the indefinite future. Please send your comments, suggestions etc. to dfield@molbiol.ox.ac.uk


run by MINE: Molecular INformation Explorer. Copyright 2000-2001 Dawn Field. All rights reserved.