July 1999

Serving 3000 Web Pages with Apache/iX

Get yourself configured with the new Web server

By Andreas Schmidt

On these pages I will:

• Explain a little bit about the three main configuration files of Apache/iX,
• Introduce Server Side Includes (SSI),
• Show some special effects you can accomplish using SSI directives and variables,
• Get Apache/iX server statistics and configuration,
• Show how to secure areas within your Web projects,
• Explain how to access files outside of the APACHE account, and
• Give some hints for Web publishing to an Apache/iX server.

In my examples, Apache/iX is installed as delivered, which basically means Server Side Includes are not enabled, and no document directory is secured. You may already have heard about SSI in Netscape FastTrack Server, which is bundled with HP-UX nowadays. There it is called “server-parsed HTML.” It took me a while to find this in the Web-based configuration interface there.

The configuration files

The following three files in the /APACHE/PUB/conf/ directory are important for configuring the Apache/iX server: httpd.conf, srm.conf and access.conf. For each, I will explain a little bit and show some special options and effects for your Web project, especially using SSI. For further information about these files, refer to www.apache.org or literature about the NCSA-based server. I used a German translation of the book “Managing Internet Information Services,” written by Russ Jones.

httpd.conf is the main configuration file of the server. It controls the service as a whole, but not the details of the individual files and areas of your Web projects. It is mainly used to define the user and group of the server, the e-mail ID of the Webmaster, the location of the server binaries, the log files, and more. You can use the default entries:

ServerType: standalone
Port: 80
User: MGR.APACHE
Group: APACHE

ServerAdmin is the mail ID where the Webmaster is reachable. This information will be inserted into all pages the server generates when a problem occurs. (The mail program of your browser pops up.
You can send e-mail only if your browser is configured to do so. You may also enable your Apache/iX server for Sendmail/iX, and use the same server as the e-mail server of your browser.)

ServerRoot /APACHE/PUB: Tells the server where to find the program httpd and, relative to this directory, the configuration files in conf/.

The log files, ErrorLog logs/error_log and TransferLog logs/transfer_log, are stored in the directory named here, relative to the ServerRoot directory. From time to time you should check these files in /APACHE/PUB/logs/ for size and problems.

ServerName is your_apache_server. In most cases this is the same as the CPU name, but you can specify another name.

Note: I didn’t test starting Apache/iX out of inetd. (If you make this choice, the ServerType must be changed to inetd, and Port, User, and Group are ignored. These are then configured in the inetd configuration files.)

srm.conf configures the resources of the server. Here you have to define where the server will find the documents and scripts. The main keywords are:

DocumentRoot /APACHE/PUB/htdocs: This is the absolute path name of the place where the documents will be stored. Other directories may be referenced via Alias or by links in this directory.

DirectoryIndex index.html: Default name of the document that is shown if only the name of a directory is browsed to.

Alias /icons/ /APACHE/PUB/icons/: Place to look if only /icons/ is used in a link.

ScriptAlias /cgi-bin/ /APACHE/PUB/cgi-bin/: This is where the server looks for all scripts referenced with the prefix /cgi-bin/.

AddHandler server-parsed .shtml: This is important to activate SSI. With this default entry, SSI will only work for pages having the suffix .shtml. I changed this to AddHandler server-parsed .html to enable the Server Side Includes for all pages. We’ll see in the Server Side Includes section that follows what implications this may have.
AccessFileName .htaccess: This is the default name of the file in which the access information of a document directory is stored.

/APACHE/PUB/conf/access.conf is the global access control file (ACF). It defines how browser clients may access the whole Web server or dedicated directories. The default entry is <Directory name_of_DocumentRoot> (so in my example, it would be <Directory /APACHE/PUB/htdocs>). I recommend removing Indexes for script directories and restricting AllowOverride, so that a .htaccess file cannot switch on options you did not intend. Be aware that AllowOverride None makes the server ignore .htaccess files altogether; the directory protection described later needs at least AllowOverride AuthConfig.

The Server Side Includes (SSIs)

To enable the Server Side Includes, the <Directory> entry for documents in access.conf must be changed to Options Includes ExecCGI Indexes FollowSymLinks. Together with the entry in srm.conf, AddHandler server-parsed suffix, these documents are now parsed for the following SSI directives:

• config: modifies various aspects of SSI
• echo: inserts the value of CGI or SSI environment variables
• exec: executes external programs and inserts their output in the current document
• include: inserts the text of a document into the current file
• fsize: inserts the size of a specified file
• flastmod: inserts the last modification date and time of a specified file.

You can enable the execution of scripts while loading a document using <!--#exec cmd="script_name"-->, or include another file’s content using <!--#include file="file_name"-->. This is a great feature, but it has its disadvantages:

• It can be quite costly for a server to continually parse documents before sending them to the client.
• It may create a security risk.

But if used cautiously, it can be a very powerful tool.

Besides the CGI environment variables like SERVER_NAME, QUERY_STRING, REMOTE_HOST and some others which can be used from every CGI script, there are additional SSI environment variables such as DOCUMENT_NAME and LAST_MODIFIED. They can easily be inserted in a document using <!--#echo var="SSI_variable"-->.
To get an overview of all CGI variables, you may write a little script to execute the env command in the Posix shell, as shown below:
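A minimal sketch of such a script might look like the following. It assumes the Posix shell is reachable as /bin/sh and that the script sits in the cgi-bin directory; the function name print_cgi_env is made up for illustration.

```shell
#!/bin/sh
# Sketch of a CGI script that lists every environment variable the
# server hands over. print_cgi_env is a hypothetical name.
print_cgi_env() {
  # A CGI response starts with a header, followed by an empty line.
  echo "Content-type: text/plain"
  echo ""
  # env prints NAME=value pairs; sort only makes the list easier to read.
  env | sort
}

print_cgi_env
```

Called through a /cgi-bin/ URL, the browser shows one NAME=value pair per line.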
To get an overview of all SSI variables, you must know their names and use the directive <!--#echo var="variable_name"--> once per variable. This method also works for the CGI variables!

Quite simple, isn’t it? But to do more, read the following section as well.

What can be done now, having enabled the SSI? If you have this enabled for all documents (.html and not only for .shtml), you may use it to show the same header and footer, current time, file name, file update time, and more on all pages. You can use SSI pre-defined words or you can execute CGI scripts for this. Here are some examples with HTML code and CGI scripts.

To show the file name and file update time your basic HTML code will look like:
DOCUMENT_NAME is an SSI variable for the current Web source file. The directive <!--#config timefmt=...--> declares the format for displaying dates and times, here in the European style. You can use format masks such as %a (day of week, short: Sun, Mon, ...). LAST_MODIFIED is a pre-defined variable holding the file’s modification date.

The same can also be achieved without enabling SSI, using a little Javascript alert which you have to activate with a click:
To display the current date at the top of a page:
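One way to do this, sketched here under assumptions, is a small helper script run through the exec directive, for example <!--#exec cmd="/APACHE/PUB/cgi-bin/today.sh"-->. The script name today.sh, the function name print_today and the day.month.year format are all made up for illustration.

```shell
#!/bin/sh
# Sketch of a helper for "#exec cmd": prints today's date in a
# European day.month.year layout. print_today is a hypothetical name.
print_today() {
  # The % masks are the usual strftime masks understood by date.
  date "+%d.%m.%Y"
}

print_today
```

Output produced by exec cmd is inserted into the page as it is, so the script prints no Content-type header.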
To establish a page counter: A little script is needed for this, and one file per page which contains the current number of hits. I decided to name the counter file PageName.ct; the owner of this file is SERVER.APACHE, with mode 640. The script looks like:
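A counter of this kind could be sketched as follows. The function name bump_counter and the convention of passing the counter file as the first argument are assumptions for illustration; the sketch does no file locking, so two simultaneous hits may be counted as one.

```shell
#!/bin/sh
# Sketch of a page counter: reads the number stored in a .ct file,
# adds one, writes it back and prints the new value.
# bump_counter is a hypothetical name.
bump_counter() {
  counter_file=$1
  count=0
  if [ -r "$counter_file" ]; then
    count=$(cat "$counter_file")
  fi
  # Start again from zero if the file is empty or holds anything but digits.
  case "$count" in
    ''|*[!0-9]*) count=0 ;;
  esac
  count=$((count + 1))
  echo "$count" > "$counter_file"
  echo "$count"
}

# When run as a script, the counter file is expected as first argument.
if [ $# -gt 0 ]; then
  bump_counter "$1"
fi
```

The printed number is what the exec directive inserts into the page.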
It is executed via SSI #exec, here for the main page:
Here is a small script presenting the page access statistics:
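Such a statistics script could be sketched like this, assuming all counter files follow the PageName.ct naming and live in one directory. The function name show_stats and the "page: hits" output layout are made up for illustration.

```shell
#!/bin/sh
# Sketch of a statistics script: prints "page: hits" for every
# PageName.ct counter file in a directory.
# show_stats is a hypothetical name.
show_stats() {
  dir=$1
  for f in "$dir"/*.ct; do
    # If no counter file exists, the pattern stays unexpanded; skip it.
    [ -f "$f" ] || continue
    page=$(basename "$f" .ct)
    echo "$page: $(cat "$f")"
  done
}

# When run as a script, the counter directory is expected as first argument.
if [ $# -gt 0 ]; then
  show_stats "$1"
fi
```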
You can display the same header and footer for all pages. I combined the three sections of code above to do this. These are nice effects which can enrich your Web pages. For more information about SSI you may refer to special literature. I based my coding on Shishir Gundavaram’s CGI Programming on the World Wide Web, especially pages 87 onward.

Getting the server statistics and configuration is an easy one, and does not depend on having enabled SSI. Insert into access.conf the following: The handler server-info will show you the whole configuration “on a click.” It is especially worthwhile for the Webmaster to check the server status information.
information. If you have big Web projects, you may need to hide some areas from public viewing. Three components comprise Apache/iX document security: • The program (or
unix-like: the binary)
/APACHE/PUB/apache_1.2.5_mpe/support/htpasswd Here is how to implement the access security for a document directory. It is not possible to secure single pages — if you want to achieve this you must keep such a page in its own directory. Let’s assume the
following: Billing information should be made available for some
persons via the intranet. First, you create a separate directory
under /APACHE/PUB/htdocs named billing. In this directory you will
create a file named .htaccess or another name you defined in srm.conf
under “AccessFileName.” This file sets the authentication type, names the password file, and lists the users who are required for access.

For AuthType there is currently only Basic implemented. The Limit directive may be GET or POST or both. This directive describes what is needed to access the documents stored in the current directory (in this example, /APACHE/PUB/htdocs/billing). Here, dedicated user names are required. The passwords of these users are stored in /APACHE/PUB/security/.htpasswd.

Having created the .htaccess file, you must define the users and the passwords using the program htpasswd. The syntax for the program is:

htpasswd [-c] passwordfile username

The -c flag creates a new file. In our configuration, the passwordfile is /APACHE/PUB/security/.htpasswd, and the usernames are andreas, robert, deniro. For each, you must invoke the program like this:

:/APACHE/PUB/apache_1.2.5_mpe/support/htpasswd &
:"/APACHE/PUB/security/.htpasswd robert"

The dialog is:

Adding user robert
New password:donots
Re-type new password:donots

Bingo! The user and password are defined, and .htaccess ensures that only the required users will have access to the documents in billing. Typing the URL http://your_apache_web_server/billing will result in the display of the user/password box in the browser. The user robert has to type in “robert” and the password “donots” before he will see something.

It’s as easy as it sounds. The only thing you need to plan for is the use of forms and CGI scripts out of a secured document directory. In that case, other users can still call the CGI script directly if they know the parameters or, in CGI Web terms, the QUERY_STRING which is handed over by the client to the server. This is not prevented by this method.
But it’s not probable that a user will be able to guess a URL like this:

http://your_apache_web_server/cgi-bin/… …bill_data.sh?box=alpha&period=9903&detail=yes&type=CPU

Linking to documents and scripts outside the APACHE account

If you want to allow other people and groups you can trust to publish their documents outside of the APACHE account, you must pay attention to security, of course. The HP 3000 Web security never bypasses the standard MPE security. So everything outside the APACHE account must be visible and probably executable by ANY ... or must be secured via ACDs which explicitly allow the access/execution for user SERVER.APACHE. This is very important, especially if you will allow the server to run CGI scripts stored outside APACHE, and so out of the control of your Webmaster, who acts as MGR and SERVER.APACHE. But if you have confidence in those people and groups, this lets you protect the APACHE account from unwanted changes, which is hard to do once too many people have been granted direct access into APACHE.

The easy and non-risky way is to allow .html documents to be stored outside of APACHE. This can easily be established using a link to the group these documents are saved in. For example, your OpenDesk Administrator wants to share some information on the intranet. They create their own group, WEB.HPOFFICE, having at least R:ANY so that the APACHE server can read it (or appropriate ACDs on each file). The Webmaster now only has to create a link in /APACHE/PUB/htdocs/ to this group:

:NEWLINK /APACHE/PUB/htdocs/edi-docs;to=/HPOFFICE/WEB

“edi-docs” is only a name; you may prefer edi-html or od or ... it’s just a name! All references pointing to edi-docs, e.g. http://your_apache_server/edi-docs/introduction.html, will show the files in the group WEB.HPOFFICE, in this example the file /HPOFFICE/WEB/introduction.html. No change in any config file is needed!
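The effect of such a link can be tried out safely in the Posix shell. The sketch below assumes that ln -s creates the same kind of symbolic link as NEWLINK; it works on temporary stand-in directories rather than the real accounts, and the function name publish_group is made up for illustration.

```shell
#!/bin/sh
# Sketch: make a directory outside the document root reachable through
# a symbolic link inside it. publish_group is a hypothetical name.
publish_group() {
  # $1 = directory holding the documents, $2 = link name in the document root
  ln -s "$1" "$2"
}

# Demo with temporary stand-ins for /HPOFFICE/WEB and /APACHE/PUB/htdocs.
demo=$(mktemp -d)
mkdir -p "$demo/HPOFFICE/WEB" "$demo/APACHE/PUB/htdocs"
echo "<html>Introduction</html>" > "$demo/HPOFFICE/WEB/introduction.html"
publish_group "$demo/HPOFFICE/WEB" "$demo/APACHE/PUB/htdocs/edi-docs"

# The file is now readable through the link inside the document root.
cat "$demo/APACHE/PUB/htdocs/edi-docs/introduction.html"
```

The file itself stays in its own group; only the name edi-docs appears in the document directory.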
The execution of CGI scripts stored outside the default directory, /APACHE/PUB/cgi-bin/, must be explicitly configured and allowed in the file /APACHE/PUB/conf/srm.conf. In our example, if the OpenDesk Administrator wants to have all his Web stuff in the same group, the following entry is needed:

ScriptAlias /edi-bin/ /HPOFFICE/WEB/

All references to scripts with the prefix edi-bin/ will be searched in WEB.HPOFFICE. For example, http://your_apache_server/edi-bin/statistics.sh will execute a Posix script named /HPOFFICE/WEB/statistics.sh. But again, you must trust the creator of the scripts stored here, and the scripts must have either R,X:ANY (not recommended) or appropriate ACDs that grant READ and EXECUTE access to SERVER.APACHE. The latter is much better security-wise, but requires special attention when you replace those files after editing them.

Editing and publishing your Web project

The first thought I had upon hearing about Samba/iX was: Wow, now I can edit my HTML code on the PC using a well-known editor (not MS Word) and directly publish on the server into the right place. And indeed, it is a nice option to connect to the Web server via Samba/iX, at least to edit your files under /APACHE/PUB/htdocs or /APACHE/PUB/cgi-bin. This will enable the PC-based editor of your choice to access, edit, and save your Web files. So you may work with PC Qedit, or MS Word, or NetFusion, or any other specialized Web page editor. I prefer the freeware program PFE.

If you do not want to “Samba” your Web project, you may use VI.HPBIN.SYS or Robelle’s Qedit. Both are able to keep files in HFS file format. TDP.PUB.SYS will NOT work; it is not able to keep HFS file names.

The third option is to work on your PC and to transfer the Web files via FTP to the right place on the HP 3000. But I think that is a really old-fashioned method, and there are better options.

Summary

This article described some of my experiences with the Apache/iX server running on MPE/iX 5.5.
One warning: There are still some unstable areas in STREAMS, the interface between MPE/iX and Posix. But a lot of patches are available to avoid unwanted effects like System Aborts because of this (we hit one using simple piping of sh commands).

Apache/iX is stable and reliable. I want to encourage all of you to make use of it to provide Internet capability from our beloved HP 3000s, and especially not to hide yourselves behind the Unix/Internet gurus in your companies! The HP 3000 can do as much as any platform, and Apache/iX is the right step to keep the HP 3000s in the market, together with all the other ported Unix tools like Java/iX, Perl/iX, Samba/iX, Sendmail/iX, Bind/iX and more. Maybe it will help if Hewlett-Packard educates its salespeople that there is another reality beside NT, Unix and Linux, called MPE/iX!

Andreas Schmidt is a Computer Technology Specialist for Computer Science Corp. in Bad Homburg, Germany.

Copyright The 3000 NewsWire. All rights reserved.