Apache mod_rewrite module guide - part 1
1. What is mod_rewrite?
mod_rewrite is an Apache module that allows for server-side manipulation of requested URLs. Incoming URLs are checked against a series of rules. The rules contain a regular expression to detect a particular pattern. If the pattern is found in the URL, and the proper conditions are met, the pattern is replaced with a provided substitution string or action. This process continues until there are no more rules left or the process is explicitly told to stop.
This is summarized in these three points:
- There is a list of rules that are processed in order.
- If a rule matches, it checks the conditions for that rule.
- If everything is a go, it makes a substitution or action.
2. Basic rules
- In the previous lesson we learned the terms used to describe directives and Apache configuration. The most important term from that lesson is URL-PATH.
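- As always, anything that you can put in a .htaccess file can also be placed inside the global configuration file. With mod_rewrite, there are small differences depending on which of the two holds the rule. Most notably, in a .htaccess file the path a rule is matched against has no leading slash, while in the server or virtual host configuration it does.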
This is something to keep in mind if you see examples online or if you’re trying an example yourself: beware of the leading slash. I will attempt to clarify this below when we work through some examples together.
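As a minimal sketch of the leading-slash difference, here is the same rule written once for the server configuration and once for a .htaccess file. The file names old.html and new.html are made up for illustration, and the .htaccess file is assumed to sit in the document root.

```apache
# Server or <VirtualHost> context: the matched URL-path starts with a slash.
RewriteEngine on
RewriteRule ^/old\.html$ /new.html

# .htaccess in the document root: the leading slash has already been stripped.
RewriteEngine on
RewriteRule ^old\.html$ /new.html
```

If an example copied from the web silently does nothing, a pattern written for the other context is one of the first things worth checking.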
3. Enabling mod_rewrite on the Server
So, let's create a symlink:
~] cd /etc/apache2/mods-enabled/
~] ln -s ../mods-available/rewrite.load .
~] ls -alFh
...
lrwxrwxrwx 1 root root 30 Jan 8 13:54 rewrite.load -> ../mods-available/rewrite.load
...
Reload the Apache configuration with /etc/init.d/apache2 reload or with the systemd command systemctl reload apache2.service. We can check that the mod_rewrite module is loaded into the Apache server configuration with this command:
~] apachectl -t -D DUMP_MODULES
rewrite_module (shared)
setenvif_module (shared)
ssl_module (shared)
status_module (shared)
After loading the mod_rewrite module, we can enable mod_rewrite directives in a .htaccess file or in the configuration file of our (sub)domain with this directive:
# Enable Rewriting
RewriteEngine on
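For the configuration-file variant, a virtual host could look roughly like the sketch below. The server name and document root are placeholders, not values from this guide. Rewrite configuration is not inherited by virtual hosts by default, so each virtual host that uses rewriting needs its own RewriteEngine on.

```apache
<VirtualHost *:80>
    # Placeholder values; adjust to your own (sub)domain.
    ServerName example.com
    DocumentRoot /var/www/example

    # Enable rewriting for this virtual host.
    RewriteEngine on
</VirtualHost>
```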
4. Regular Expressions
This tutorial does not intend to teach you regular expressions. Since Apache 2.0, mod_rewrite uses Perl Compatible Regular Expressions (PCRE).
5. General form of RewriteRule
RewriteRule has the following syntax:
RewriteEngine on
RewriteRule what-client-ask what-client-really-get [optional-parameters]
- The first line, RewriteEngine on, is the magic that turns mod_rewrite on.
- RewriteRule:
  - What the client asks for (the first parameter). It is a URL-PATH, e.g. /index.html or /archive/my_file.php, and it is written as a regular expression. When the server sees that the client wants a page that matches the first parameter, it starts doing something.
  - The page address that the user actually receives (the second parameter) is either an absolute address (starting with http:// or https://) or a relative one. A relative address is resolved either from the current directory or, if the what-client-really-get entry begins with a slash, from the root of the site. For example, https://myredlinux.com/file.html is an absolute address and /file.html is a relative address. This second parameter, unlike the first, is not written as a regular expression, so, for example, it is not necessary to escape dots with a backslash (although escaping them does no harm).
  - As for [optional-parameters], I will introduce them as they come up in the following examples and at the end of this text.
5.1 What is matched - WHAT-CLIENT-ASK
- In VirtualHost context, the Pattern will initially be matched against the part of the URL after the hostname and port, and before the query string (e.g. "/app1/index.html"). This is the (%-decoded) URL-PATH.
- In per-directory context (Directory and .htaccess), the Pattern is matched against only a partial path, for example a request of "/app1/index.html" may result in comparison against "app1/index.html" or "index.html" depending on where the RewriteRule is defined.
- The directory path where the rule is defined is stripped from the currently mapped filesystem path before comparison (up to and including a trailing slash). The net result of this per-directory prefix stripping is that rules in this context only match against the portion of the currently mapped filesystem path "below" where the rule is defined.
- Directives such as DocumentRoot and Alias, or even the result of previous RewriteRule substitutions, determine the currently mapped filesystem path.
- If you wish to match against the hostname, port, or query string, use a RewriteCond with the %{HTTP_HOST}, %{SERVER_PORT}, or %{QUERY_STRING} variables respectively.
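To illustrate the last point, here is a small sketch in virtual host context that only rewrites when both the host name and the query string fit. The host www.example.com, the path /article and the script name are assumptions made up for this example.

```apache
RewriteEngine on
# Only for requests sent to www.example.com (NC = case-insensitive) ...
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# ... whose query string carries a numeric id parameter.
RewriteCond %{QUERY_STRING} (^|&)id=[0-9]+(&|$)
# The pattern itself only ever sees the URL-path, never the host or query string.
RewriteRule ^/article$ /script.php
```

Because the substitution contains no question mark, the original query string is passed along unchanged.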
5.1.1 Per-directory Rewrites (Directory directive or .htaccess file inside directory)
- The rewrite engine may be used in .htaccess files and in <Directory> sections, with some additional complexity.
- To enable the rewrite engine in this context, you need to set RewriteEngine On, and Options FollowSymLinks must be enabled. If your administrator has disabled override of FollowSymLinks for a user's directory, then you cannot use the rewrite engine. This restriction is required for security reasons.
- See the RewriteBase directive for more information regarding what prefix will be added back to relative substitutions.
- If you wish to match against the full URL-path in a per-directory (htaccess) RewriteRule, use the %{REQUEST_URI} variable in a RewriteCond.
- The removed prefix always ends with a slash, meaning the matching occurs against a string which never has a leading slash. Therefore, a Pattern with ^/ never matches in per-directory context.
- Although rewrite rules are syntactically permitted in <Location> and <Files> sections (including their regular expression counterparts), this should never be necessary and is unsupported. A likely feature to break in these contexts is relative substitutions.
5.2 WHAT-CLIENT-REALLY-GET - SUBSTITUTIONS
The what-client-really-get of a rewrite rule is the string that replaces the original URL-PATH that was matched by what-client-ask. The what-client-really-get may be a:
- file-system path \
  Designates the location on the file-system of the resource to be delivered to the client. What-client-really-get strings are only treated as a file-system path when the rule is configured in server (virtualhost) context and the first component of the path in the substitution exists in the file-system.
- URL-PATH \
  A DocumentRoot-relative path to the resource to be served. Note that mod_rewrite tries to guess whether you have specified a file-system path or a URL-path by checking to see if the first segment of the path exists at the root of the file-system. For example, if you specify a what-client-really-get string of /www/file.html, then this will be treated as a URL-path unless a directory named www exists at the root of your file-system (or, in the case of using rewrites in a .htaccess file, relative to your document root), in which case it will be treated as a file-system path. If you wish other URL-mapping directives (such as Alias) to be applied to the resulting URL-path, use the [PT] flag as described below.
- Absolute URL \
  If an absolute URL is specified, mod_rewrite checks to see whether the hostname matches the current host. If it does, the scheme and hostname are stripped out and the resulting path is treated as a URL-path. Otherwise, an external redirect is performed for the given URL. To force an external redirect back to the current host, see the [R] flag below.
- - (dash) \
  A dash indicates that no substitution should be performed (the existing path is passed through untouched). This is used when a flag (see below) needs to be applied without changing the path.
In addition to plain text, the what-client-really-get string can include
- back-references ($N) to the RewriteRule pattern
- back-references (%N) to the last matched RewriteCond pattern
- server-variables as in rule condition test-strings (%{VARNAME})
- mapping-function calls (${mapname:key|default})
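A short sketch of two of these forms in virtual host context follows. The host pattern and the /private/ and /sites/ paths are invented for illustration.

```apache
RewriteEngine on
# A dash leaves the path untouched; only the flag acts ([F] answers 403 Forbidden).
RewriteRule ^/private/ - [F]

# %1 is the group captured by the RewriteCond, $1 the group captured by the rule.
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com$ [NC]
RewriteRule ^/(.*)$ /sites/%1/$1
```

Under these assumptions, a request for /about.html on shop.example.com would be mapped to /sites/shop/about.html.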
5.3 Notes
In mod_rewrite, the NOT character ('!') is also available as a possible pattern prefix. This enables you to negate a pattern; to say, for instance: if the current URL does NOT match this pattern. This can be used for exceptional cases, where it is easier to match the negative pattern, or as a last default rule.
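A minimal sketch of a negated pattern used as a last default rule, assuming server or virtual host context; the /assets/ prefix and /index.php are hypothetical names.

```apache
RewriteEngine on
# Everything that does NOT start with /assets/ is handed to /index.php.
RewriteRule !^/assets/ /index.php [L]
```

A negated pattern cannot capture groups, so $1 and friends are not available in the substitution of such a rule.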
6. How mod_rewrite rules are processed
The rules of the mod_rewrite Apache module are processed in the order in which they appear. Note that each RewriteRule acts on the URL-PATH. When a rule makes a substitution, the modified URL-PATH is handed to the next rule. This means that the URL a rule is processing may have been edited by a previous rule! The URL is continually updated by each rule that it matches. This is important to remember!
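The hand-over from one rule to the next can be seen in a two-rule sketch (virtual host context, file names made up):

```apache
RewriteEngine on
# A request for /a.html is rewritten to /b.html ...
RewriteRule ^/a\.html$ /b.html
# ... and this rule then sees /b.html, not /a.html, so the request ends at /c.html.
RewriteRule ^/b\.html$ /c.html
```

Adding the [L] flag to the first rule would stop processing there and leave the request at /b.html.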
6.1 Flow Chart
Here is a flow chart that tries to visualize the generic flow of execution across multiple rules in an Apache config file or .htaccess file. Note that, at the top of the flow chart, the value going into the rewrite rules is the "URL Part", and if a substitution is made, the modified part proceeds into the next rule.
I referred to rewriting conditions earlier, but didn't go into detail. One or more RewriteCond directives can be associated with a single RewriteRule. The conditions appear before the rule they are associated with, but only get evaluated if the rule's pattern matched. As the flow chart illustrates, if a rewrite rule's pattern matches, then Apache will check to see if there are any conditions for that rule. If there aren't, then it will make the substitution and continue. If there are conditions, on the other hand, then it will only make the substitution if all of the conditions are true. Let's visualize this in a concrete example.
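One possible concrete example is the sketch below, meant for a .htaccess file; it assumes a file index.php exists in the same directory. The comments mark the order of evaluation.

```apache
RewriteEngine on
# Checked second: these run only once the pattern below has matched.
# Both must be true: the request is neither an existing file nor a directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Checked first: the pattern ^(.*)$ matches any path.
RewriteRule ^(.*)$ index.php [L]
```

Requests for files that really exist (images, style sheets, index.php itself) fail the first condition and are served untouched.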
7. Redirect vs Remapping
The crucial thing with the mod_rewrite Apache module is to understand what redirection is and what remapping is.
7.1 Redirect
A redirect happens when I add the flag [R=301] in square brackets to the end of the RewriteRule line, as follows:
RewriteEngine on
RewriteRule (.*) /result.html [R=301]
- RewriteEngine on - turns mod_rewrite on
- (.*) - a regular expression that matches any characters in the URL-PATH - it is what-client-ask
- When I enter a full URL, e.g. http://example.com/directory/question.html
- then I'll see a different address in the browser's address bar: http://example.com/result.html (the substitution starts with a slash, so it is resolved from the root of the site), because the server redirects me (and the browser accepts it)
- [R=301] - redirect with status 301.
- A redirect with status 301 means that the resource (page) has moved permanently to a new location. The client/browser should not attempt to request the original location but use the new location from now on.
- A redirect with status 302 means that the resource is temporarily located somewhere else, and the client/browser should continue requesting the original URL. 302 is the default status for a redirect.
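One caveat with the rule above: (.*) also matches result.html itself, so wherever the rule applies to the target as well, the browser would be redirected again and again. A hedged sketch of a loop-safe variant, using the same made-up file name:

```apache
RewriteEngine on
# Skip the target itself so that /result.html is not redirected again.
RewriteCond %{REQUEST_URI} !^/result\.html$
RewriteRule (.*) /result.html [R=301,L]
```

Replacing R=301 with R=302, or with a bare R, gives the temporary redirect described above.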
7.2 Remapping
I will explain remapping in the following example:
RewriteEngine on
RewriteRule question-url-path\.html remapping-url-path.html
In this example, no redirection is performed, only remapping (there is no [R] flag). This means that the user still sees the address they entered (or clicked on), but the server replaces the content of question-url-path.html with the content of remapping-url-path.html. Note that this time there are no square brackets - remapping is the default behavior of mod_rewrite.
- I enter the URL http://example.com/directory/question-url-path.html in the web browser
- I get the content from the URL http://example.com/directory/remapping-url-path.html
- but I still see the original address in the browser: http://example.com/directory/question-url-path.html
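The pattern in the example is unanchored, so it would also match a longer name that merely contains question-url-path.html. A slightly stricter sketch, assuming the rule lives in a .htaccess file inside /directory/:

```apache
RewriteEngine on
# ^ and $ anchor the match, so only exactly this file name is remapped.
RewriteRule ^question-url-path\.html$ remapping-url-path.html [L]
```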
Remapping is the default behavior of mod_rewrite.
7.3 When is it a redirect and when is it remapping
The default behavior of the mod_rewrite module of the Apache web server is remapping. In which cases does a redirect happen?
- there is a clear instruction to redirect (e.g. [R])
- or the request cannot be remapped - these are cases where the new address starts with http:// or https:// and names a different host. The server cannot remap to a page on another server, so it redirects. (If the hostname matches the current host, the scheme and hostname are stripped and the path is remapped, as described in 5.2.) The following rule redirects even though it has no [R]:
RewriteRule (.*) https://www.mybluelinux.com
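Both cases side by side, as a sketch in virtual host context; blog.example.org and the /blog/, /old/ and /new/ paths are hypothetical.

```apache
RewriteEngine on
# Different host in the substitution: an external redirect, even without [R].
RewriteRule ^/blog/(.*)$ https://blog.example.org/$1
# Same host: only an explicit [R] turns the rewrite into a redirect.
RewriteRule ^/old/(.*)$ /new/$1 [R=302,L]
```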
8. Variables from regular expressions
Redirecting or remapping one file to another is not very useful. It's much better to find something in the requested URL and use it to call something else. That "something" will be a variable.
For example, I can find an article number and use it to call a hidden URL. The following example assumes that the pages are written in PHP and their addresses normally have a question mark (?). Articles need this URL:
example.com/script.php?id=234
But I would like to refer to this page and write it without a question mark, for example
example.com/page-234
I will do this with a mod_rewrite rule that finds the article number as a variable (it will be named $1) and uses this variable to define the replacement. The rule looks like this:
RewriteRule ^page-(.*) script.php?id=$1
Explanation
- the user asks for the url-path page-543
- mod_rewrite sees it and notices that it matches the regular expression ^page-(.*). The group (.*) matches any number of characters, and therefore matches the string 543. The captured text is stored in the variable $1 (the first, because it is the first pair of parentheses).
- mod_rewrite then remaps the request to script.php?id=$1, which now corresponds to script.php?id=543 because $1 equals 543
- the Apache web server sends the output of script.php?id=543 as the response
- the user never sees this intricate address with a question mark and parameters; it is a hidden URL, even though it is functional
- the user still sees page-543 at the end of the address in the web browser (this is remapping)
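Since (.*) accepts any text, page-abc would be passed to the script as well. A somewhat stricter sketch of the same rule, assuming a .htaccess file and numeric article ids:

```apache
RewriteEngine on
# Accept digits only, anchor both ends, and keep any extra query parameters (QSA).
RewriteRule ^page-([0-9]+)$ script.php?id=$1 [QSA,L]
```

With QSA, a request such as page-543?lang=en would reach the script with both id=543 and lang=en.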