(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix
This is a very widespread type of Web site abuse, and it is very difficult to eradicate, so blocking such requests is a futile exercise. It is better to detect that a request has been converted to lower case and redirect it to some page.
Taken together, robots that submit requests with all directory names converted to lower case are probably the most frequent "users" of your Web site. I wonder whether this is a case of authors of brain-dead Windows-based robots trying to debug something on this site.
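One way to implement the "detect and redirect" idea on a case-sensitive Unix server is sketched below in Python. The sketch assumes (hypothetically) that the set of real, mixed-case paths is available, for example from a walk of the document root. A request that misses exactly but matches case-insensitively is answered with a 301 to the correctly cased path. Redirecting to a fixed page instead would only change the returned location. The function names and sample paths are illustrative and are not part of any server API.

```python
from typing import AbstractSet, Dict, Iterable, Optional, Tuple


def build_case_index(real_paths: Iterable[str]) -> Dict[str, str]:
    """Map each lowercased path to a real (case-sensitive) path on disk."""
    index: Dict[str, str] = {}
    for path in real_paths:
        # If two real paths differ only in case, the first one seen wins.
        index.setdefault(path.lower(), path)
    return index


def resolve(request_path: str,
            real_paths: AbstractSet[str],
            index: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location) for a request on a case-sensitive server."""
    if request_path in real_paths:
        return 200, None                      # exact match: serve the file
    canonical = index.get(request_path.lower())
    if canonical is not None:
        return 301, canonical                 # lowercased (or mis-cased) request
    return 404, None                          # nothing matches in any case


# Hypothetical mixed-case site layout, for illustration only.
site = {"/Solaris/Security/rbac.shtml", "/Tools/uniq.shtml"}
idx = build_case_index(site)
print(resolve("/solaris/security/rbac.shtml", site, idx))
# prints (301, '/Solaris/Security/rbac.shtml')
```

The dictionary is built once, so each lookup costs a single hash probe. That matters if, as suggested above, these robots account for a large share of all hits.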
On Windows this abomination is not a big problem. For example, the IIS 7 URL Rewrite module offers an "Enforce lowercase URLs" rule (see the Stack Overflow question "IIS7 and Enforce lowercase URLs").
Unix is, and always was, case sensitive for both directory and file names. You can get a general impression of this activity using something like:
cut -d ' ' -f 7 http_logs120903_0530.log | grep -E "/[a-z]+/" | sort | uniq
For example:
/utilities/teraterm.shtml
/authentication/kerberos.shtml
/links/russian/culture/music/female_singers/valentina_tolkunova.shtml
/links/russian/culture/music/romances/yesenin_romances.shtml
/solaris/security/rbac.shtml
/history/multix.shtml
/solaris/processes_and_memory/swap_space_management.shtml
/social/toxic_managers/communication/negative_politeness.shtml
/links/russian/culture/music/russian_duets.shtml
/links/russian/culture/music/female_singers/lyudmila_gurchenko.shtml
/people/gurtyak/programs/keyrus/keyrus73/keyrus.txt
/tools/cat.shtml
/lang/pl1.shtml
/algorithms/des.shtml
/tools/uniq.shtml
Sometimes they refer to non-existent directories:
/upc/share/jdk1.2/docs/api/java/security/package-summary.html
/upc/share/jdk1.2/docs/api/java/sql/package-summary.html
/upc/share/jdk1.2/docs/api/java/util/package-summary.html
/upc/share/jdk1.2/docs/api/javax/accessibility/package-summary.html
/upc/share/jdk1.2/docs/api/overview-summary.html
/cdrom/jrl/index.htm
/upc/share/jdk1.2/docs/index.html
/Afs/rpi.edu/home/34/floydb1/html/composit.html
Most commonly the entire request is in lower case. For example:
/admin/tivoli/tec
/bookshelf/classic.shtml
/tools/tail.shtml
/office/open_office.shtml
/social/toxic_managers/bullies.shtml
/scripting/perlorama/modules/expect.shtml
/links/russian/ice_skating/ice_age/navka_basharov.shtml
/links/russian/culture/music/russian_waltzs.shtml
/links/russian/culture/music/male_singers/vyacheslav_dobryinin.shtml
/links/russian/culture/music/female_singers/tamara_miansarova.shtml
Sometimes only one directory (this is typical for PHP probes) or a couple of directories are in lower case:
/Access_control/admin/categories.php/login.php?cPath=&action=new_product_preview
/Admin/tips.shtml//wp-content/themes/MyApp/timthumb.php?src=http://wordpress.com.suppaddleboard.com/eva.php
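The pipeline above only lists paths that contain an all-lowercase directory name. The Python sketch below is a little more discriminating. It assumes the same space-separated log format, where field 7 is the request path, and counts requests per rough category: fully lowercase, partly lowercase (as in the PHP probes above), and everything else. The category names are ad hoc. "All lowercase" also includes legitimately lowercase URLs, so the counts only mean something on a site whose directories use mixed case.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Same pattern as the grep -E filter: a directory name made only of lowercase letters.
LOWER_DIR = re.compile(r"/[a-z]+/")


def request_path(log_line: str) -> Optional[str]:
    """Return field 7 of a space-separated access-log line (the field cut -d ' ' -f 7 extracts)."""
    fields = log_line.split(" ")
    return fields[6] if len(fields) > 6 else None


def classify(path: str) -> str:
    """Very rough classification of a request path by letter case."""
    path = path.split("?", 1)[0]              # ignore any query string
    if path == path.lower():
        return "all lowercase"
    if LOWER_DIR.search(path):
        return "partly lowercase"
    return "other"


def summarize(lines: Iterable[str]) -> Counter:
    """Count log lines per case category, skipping lines too short to have field 7."""
    counts: Counter = Counter()
    for line in lines:
        path = request_path(line.rstrip("\n"))
        if path is not None:
            counts[classify(path)] += 1
    return counts
```

Passing an open log file, for example summarize(open("http_logs120903_0530.log")), returns a Counter with one entry per category.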
WebmasterWorld
tedster
The mess is created by Windows servers. In their default configuration they are not case sensitive, but most of the other operating systems are, including those used by Google and other search engines.
There is some spidering evidence of Google trying to discover which sites are on non-case-sensitive servers, but that's a crazy job and I would not depend on Google or any other search engine getting it accurately sorted.
Help them out: if you can make all URLs lower case, that is the best practice. If you can configure your server to be case sensitive, that's another best practice. If you have a URL that is already well ranked and it uses some uppercase, then know that changing those letters to lowercase does create an alternative URL.
It is a rare thing to acquire a duplicate "penalty", but when the same content appears on technically different URLs, that kind of duplication has negative effects: backlink influence gets split up, one or more of the URL versions gets filtered out of search results, and so on.
This is not a true penalty, as in a black mark against your domain. However, the ranking and traffic problems that are generated can feel like one.
Onders
From personal experience: we have been trying to move mixed-case URLs over to lowercase ones for a while.
Although from a ranking perspective both can do equally well (on our site at least!), the thinking is that from a user perspective, URLs that are all of one type are much more memorable (we also have some hyphens and some underscores).
With the mixed-case URLs that have backlinks to them we've been more reluctant. With the ones where there is no external influence, we've just changed the URL and put a 301 on the old URL, with no problem. If there are backlinks, I'd think about trying to get them changed to the new URL and then putting a 301 on the old page...
But as was mentioned, if it ain't broke.... Are you really sure you want to be tampering with it?!
Jan 20, 2011 | Stack Overflow
I was just reading an article on writing rules from Scott Gu: "Tip/Trick: Fix Common SEO Problems Using the URL Rewrite Extension".
He talks about the issue of excluding static files (.jpeg, .jpg, .gif, etc.) from the lowercase rewrite, and shows how you can add conditions to exclude files. Another article is where I found a condition that excludes more than just Scott's example:
Mike's Umbraco blog - URL Rewriting and SEO
He adds the condition:
<add input="{URL}" pattern="^.*\.(axd|css|js|jpg|jpeg|png|gif)$" negate="true" ignoreCase="true" />
I hope this helps you in future rewrites.
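The combination of the lowercase rule and the extra condition can be modelled in a few lines of Python. This sketches the rule's logic only, not how IIS evaluates it. The function name is hypothetical, and the input is assumed to be the URL path without the query string. Such a redirect only makes sense where the server itself ignores case, as IIS does by default. On a Unix server with mixed-case directories it would send visitors to paths that do not exist.

```python
import re
from typing import Optional

# Negated condition quoted above: leave .axd, .css, .js and image files alone.
STATIC_FILE = re.compile(r"^.*\.(axd|css|js|jpg|jpeg|png|gif)$", re.IGNORECASE)
# Match pattern of the IIS "Convert to lower case" rule (case-sensitive on purpose).
HAS_UPPER = re.compile(r".*[A-Z].*")


def lowercase_redirect(url_path: str) -> Optional[str]:
    """Return the 301 target for a mixed-case URL path, or None if no redirect applies."""
    if STATIC_FILE.match(url_path):
        return None                    # static files keep their original case
    if HAS_UPPER.match(url_path):
        return url_path.lower()        # what {ToLower:{R:0}} produces
    return None
```

Checking the exclusion first mirrors the rule's intent: static resources are served under their original names, and only page URLs are canonicalized.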
This post describes some of the tips and tricks that one may find useful when solving URL-based problems for their web server or web site. Each tip/trick has a description of a problem and then an example of how it can be solved with IIS 7 URL Rewrite Module.
- Add or Remove Trailing Slash
- Enforce Lower Case URLs
- Canonical Hostnames
- Redirect to HTTPS
- Return HTTP 503 Status Code in Response
- Prevent Image Hotlinking
- Reverse Proxy to Another Site/Server
- Preserve Protocol Prefix in Reverse Proxy
- Rewrite/Redirect Based on Query String Parameters
- Avoid Rewriting of Requests for ASP.NET Web Resources
1. Add or Remove Trailing Slash
Many web applications use "virtual URLs", that is, URLs that do not map directly to the file and directory layout of the web server's file system. An example of such an application may be an ASP.NET MVC application with a URL format similar to this: http://stackoverflow.com/questions/60857/modrewrite-equivalent-for-iis-7-0 or a PHP application with a URL format that looks like this: http://ruslany.net/2008/11/url-rewrite-module-release-to-web/. If you request these URLs with or without a trailing slash, you will still get the same page. That is OK for human visitors, but may be a problem for search engine crawlers as well as for web analytics services. Different URLs for the same page may cause crawlers to treat the same page as different pages, thus affecting the page ranking. They will also cause the Web Analytics statistics for this page to be split up.
This problem is very easy to fix with a rewrite rule. Having or not having a trailing slash in the URL is a matter of taste, but once you've made a choice you can enforce the canonical URL format by using one of these rewrite rules:
To always remove trailing slash from the URL:
<rule name="Remove trailing slash" stopProcessing="true">
  <match url="(.*)/$" />
  <conditions>
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
  </conditions>
  <action type="Redirect" redirectType="Permanent" url="{R:1}" />
</rule>
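Both trailing-slash rules (the "remove" rule above and the "add" rule that follows) make the same kind of decision, sketched here in Python. The on_disk set is a hypothetical stand-in for the IsFile/IsDirectory conditions. Paths are assumed to include the leading slash, so the site root "/" is deliberately left alone.

```python
from typing import AbstractSet, Optional


def trailing_slash_redirect(path: str, mode: str,
                            on_disk: AbstractSet[str]) -> Optional[str]:
    """Return the canonical path for a 301 redirect, or None if the path is already canonical.

    mode is "remove" or "add"; on_disk stands in for the IsFile/IsDirectory conditions.
    """
    if mode not in ("remove", "add"):
        raise ValueError(f"unknown mode: {mode!r}")
    if path in on_disk:
        return None                  # real files and directories are never touched
    if mode == "remove" and len(path) > 1 and path.endswith("/"):
        return path[:-1]             # like (.*)/$ -> {R:1}: drop one trailing slash
    if mode == "add" and not path.endswith("/"):
        return path + "/"            # like (.*[^/])$ -> {R:1}/
    return None
```

Whichever mode is chosen, it has to be applied consistently across the site. Otherwise the two forms keep competing as separate URLs.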
To always add trailing slash to the URL:
<rule name="Add trailing slash" stopProcessing="true">
  <match url="(.*[^/])$" />
  <conditions>
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
  </conditions>
  <action type="Redirect" redirectType="Permanent" url="{R:1}/" />
</rule>
2. Enforce Lower Case URLs
A problem similar to the trailing slash problem may happen when somebody links to your web page by using different casing, e.g. http://ruslany.net/2008/07/IISNET-Uses-Url-Rewrite-Module/ vs. http://ruslany.net/2008/07/iisnet-uses-url-rewrite-module/. In this case again the search crawlers will treat the same page as two different pages and two different statistics sets will show up in Web Analytics reports.
What you want to do is to ensure that if somebody comes to your web site by using a non-canonical link, then you redirect them to the canonical URL that uses only lowercase characters:
<rule name="Convert to lower case" stopProcessing="true">
  <match url=".*[A-Z].*" ignoreCase="false" />
  <action type="Redirect" url="{ToLower:{R:0}}" redirectType="Permanent" />
</rule>
3. Canonical Hostnames
Very often you may have one IIS web site that uses several different host names. The most common example is when a site can be accessed via http://www.yoursitename.com and via http://yoursitename.com. Or, perhaps, you have recently changed your domain name from oldsitename.com to newsitename.com and you want your visitors to use the new domain name when bookmarking links to your site. A very simple redirect rule will take care of that:
<rule name="Canonical Host Name" stopProcessing="true">
  <match url="(.*)" />
  <conditions>
    <add input="{HTTP_HOST}" negate="true" pattern="^ruslany\.net$" />
  </conditions>
  <action type="Redirect" url="http://ruslany.net/{R:1}" redirectType="Permanent" />
</rule>
To see an example of how that works try browsing to http://www.ruslany.net/2008/10/aspnet-postbacks-and-url-rewriting/. You will see in the browser's address bar that "www" is removed from the domain name.
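In Python terms the canonical host name rule reduces to the following sketch. It assumes host is the value of the Host header and path is the part of the URL after the first slash (the {R:1} capture). Query string handling is not modelled, and the function name is hypothetical.

```python
import re
from typing import Optional

CANONICAL_HOST = "ruslany.net"
# Condition pattern from the rule; IIS conditions ignore case by default.
CANONICAL_HOST_RE = re.compile(r"^ruslany\.net$", re.IGNORECASE)


def canonical_host_redirect(host: str, path: str) -> Optional[str]:
    """Return the 301 target on the canonical host, or None if the host is already canonical."""
    if CANONICAL_HOST_RE.match(host):
        return None
    return f"http://{CANONICAL_HOST}/{path}"
```

Because the condition is negated, every host name other than the canonical one is redirected. That covers both the "www" variant and an old domain name pointed at the same site.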
4. Redirect to HTTPS
When a site that requires SSL is accessed via a non-secure HTTP connection, IIS responds with an HTTP 403 (Forbidden) status code. This may be fine if you always expect your site visitors to type "https://…" in the browser's address bar. But if you want your site to be easily discoverable and more user friendly, you probably would not want to return a 403 response to visitors who came over an insecure HTTP connection. Instead you would want to redirect them to the secure equivalent of the URL they requested. A typical example is this URL: http://www.paypal.com. If you follow it, you will see that the browser gets redirected to https://www.paypal.com.
With URL Rewrite Module you can perform this kind of redirection by using the following rule:
<rule name="Redirect to HTTPS" stopProcessing="true">
  <match url="(.*)" />
  <conditions>
    <add input="{HTTPS}" pattern="^OFF$" />
  </conditions>
  <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Permanent" />
</rule>
Note that for this rule to work within the same web site, you will need to clear the "Require SSL" checkbox for the web site. If you do not want to do that, you can create two web sites in IIS, one with an http binding and another with an https binding, and then add this rule to the web.config file of the site with the http binding.
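The HTTPS rule can be modelled the same way, as a hypothetical Python function. Here https_var stands in for the {HTTPS} server variable, which is "on" for SSL requests and "off" otherwise. Path is the {R:1} capture without the leading slash.

```python
from typing import Optional


def https_redirect(https_var: str, host: str, path: str) -> Optional[str]:
    """Return the https:// URL to redirect to, or None if the request is already secure."""
    if https_var.lower() == "off":     # pattern ^OFF$, matched case-insensitively
        return f"https://{host}/{path}"
    return None
```

Reusing the incoming Host header keeps the redirect on whatever name the visitor used. Combining it with the canonical host rule gives a single canonical https:// URL.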
5. Return HTTP 503 Status Code in Response
HTTP status code 503 means that the server is currently unable to handle the request due to temporary overloading or maintenance. This status code implies that the outage is temporary, so when a search engine crawler gets an HTTP 503 response from your site, it will know not to index this response, but to come back later instead.
When you stop the IIS application pool for your web site, IIS will return HTTP 503 for all requests to that site. But what if you are doing maintenance to a certain location of the web site and you do not want to shut down the entire site because of that? With URL Rewrite Module you can return 503 response only when HTTP requests are made to a specific URL path:
<rule name="Return 503" stopProcessing="true">
  <match url="^products/sale/.*" />
  <action type="CustomResponse" statusCode="503"
          subStatusCode="0"
          statusReason="Site is unavailable"
          statusDescription="Site is down for maintenance" />
</rule>
6. Prevent Image Hotlinking
Image hotlinking is the use of an image from one site in a web page belonging to a second site. Unauthorized image hotlinking from your site increases bandwidth use, even though the site is not being viewed as intended. There are other concerns with image hotlinking as well, for example copyright or the use of images in an inappropriate context.
With URL Rewrite Module, it is very easy to prevent image hotlinking. For example the following rewrite rule prevents hotlinking to all images on a web site http://ruslany.net:
<rule name="Prevent image hotlinking">
  <match url=".*\.(gif|jpg|png)$" />
  <conditions>
    <add input="{HTTP_REFERER}" pattern="^$" negate="true" />
    <add input="{HTTP_REFERER}" pattern="^http://ruslany\.net/.*$" negate="true" />
  </conditions>
  <action type="Rewrite" url="/images/say_no_to_hotlinking.jpg" />
</rule>
This rule will rewrite a request for any image file to /images/say_no_to_hotlinking.jpg only if the HTTP Referer header on the request is not empty and does not start with the site's own URL (http://ruslany.net/).
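Here is a sketch of the hotlinking rule's decision in Python, using the same patterns (case-insensitive, as IIS matches them by default). The function name is hypothetical. Like the original rule, it treats an https:// referer from the same site as foreign, because the pattern only allows http://ruslany.net/.

```python
import re
from typing import Optional

IMAGE = re.compile(r".*\.(gif|jpg|png)$", re.IGNORECASE)
OWN_SITE = re.compile(r"^http://ruslany\.net/.*$", re.IGNORECASE)
PLACEHOLDER = "/images/say_no_to_hotlinking.jpg"


def hotlink_rewrite(path: str, referer: str) -> Optional[str]:
    """Return the internal rewrite target for a hotlinked image, or None to serve it as-is."""
    if not IMAGE.match(path):
        return None                    # only image requests are checked
    if referer == "" or OWN_SITE.match(referer):
        return None                    # no Referer, or a page on our own site
    return PLACEHOLDER
```

Empty referers are allowed on purpose. Browsers and proxies often omit the header, and blocking those requests would also break direct visits to an image.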
7. Reverse Proxy to Another Site/Server
By using URL Rewrite Module together with the Application Request Routing module, you can have IIS 7 act as a reverse proxy. For example, suppose you have an intranet web server and you want to expose its content over the Internet. To enable that, you will need to perform the following configuration steps on the server that will act as a proxy:
Step 1: Check the "Enable proxy" checkbox located in the Application Request Routing feature view in IIS Manager.
Step 2: Add the following rule to the web site that will be used to proxy HTTP requests:
<rule name="Proxy">
  <match url="(.*)" />
  <action type="Rewrite" url="http://internalserver/{R:1}" />
</rule>
Note the http:// prefix in the rewrite rule action. That is what indicates that this request must be proxied instead of being rewritten. When a rule has a "Rewrite" action with a URL that contains the protocol prefix, URL Rewrite Module does not perform its standard URL rewriting logic. Instead it passes the request to the Application Request Routing module, which proxies that request to the URL specified in the rule.
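The following Python sketch shows how the forwarded URL is built by this rule and by the scheme-preserving variant described in the next tip. cache_url stands in for the {CACHE_URL} server variable and path for {R:1}. The function name is hypothetical, and the actual proxying is done by ARR, not by this code.

```python
import re
from typing import Optional

INTERNAL_HOST = "internalserver"      # host name used in the rules
SCHEME_RE = re.compile(r"^(https?)://", re.IGNORECASE)


def proxy_target(cache_url: str, path: str, preserve_scheme: bool = False) -> Optional[str]:
    """Return the URL handed to ARR, or None if the rule's condition would not match."""
    if not preserve_scheme:
        return f"http://{INTERNAL_HOST}/{path}"      # tip 7: always plain HTTP
    m = SCHEME_RE.match(cache_url)
    if m is None:
        return None                                   # condition failed: no proxying
    return f"{m.group(1)}://{INTERNAL_HOST}/{path}"   # tip 8: reuse the captured {C:1}
```

The only difference between the two rules is where the scheme comes from: a constant in the first case, a capture from the incoming URL in the second.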
8. Preserve Protocol Prefix in Reverse Proxy
The rule in the previous tip always uses a non-secure connection to the internal content server. Even if the request came to the proxy server over HTTPS, the proxy server will pass that request to the content server over HTTP. In many cases this may be exactly what you want. But sometimes it may be necessary to preserve the secure connection all the way to the content server. In other words, if the client connects to the server over HTTPS, then the proxy should use the "https://" prefix when making requests to the content server. Similarly, if the client connected over HTTP, then the proxy should use an "http://" connection to the content server.
This logic can be easily expressed by this rewrite rule:
<rule name="Proxy">
  <match url="(.*)" />
  <conditions>
    <add input="{CACHE_URL}" pattern="^(https?)://" />
  </conditions>
  <action type="Rewrite" url="{C:1}://internalserver/{R:1}" />
</rule>
9. Rewrite/Redirect Based on Query String Parameters
When rewriting/redirection decisions are made using values extracted from the query string, one very often cannot rely on the query string parameters always being listed in exactly the same order. So the rewrite rule must be written in such a way that it can extract the query string parameters independently of their relative order in the query string.
The following rule shows an example of how two different query string parameters are extracted from the query string and then used in the rewritten URL:
<rule name="Query String Rewrite">
  <match url="page\.asp$" />
  <conditions>
    <add input="{QUERY_STRING}" pattern="p1=(\d+)" />
    <add input="##{C:1}##_{QUERY_STRING}" pattern="##([^#]+)##_.*p2=(\d+)" />
  </conditions>
  <action type="Rewrite" url="newpage.aspx?param1={C:1}&amp;param2={C:2}" appendQueryString="false" />
</rule>
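With this rule, when a request is made to page.asp?p2=321&p1=123, it will be rewritten to newpage.aspx?param1=123&param2=321. Parameters p1 and p2 can be in any order in the original query string. The second condition prepends the value captured by the first condition (between ## markers) to the query string, so that both values are available as {C:1} and {C:2} from the last matched condition.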
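The ##{C:1}## trick works around the fact that each condition is matched by a separate regular expression. Outside IIS the same rewrite is simpler with a real query-string parser. The Python sketch below (hypothetical function name) uses urllib.parse.parse_qs. It is slightly stricter than the rule: p1 and p2 must consist entirely of digits, whereas the pattern p1=(\d+) also accepts a value like 123abc and extracts 123.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlencode

DIGITS = re.compile(r"[0-9]+")


def rewrite_page_asp(path: str, query: str) -> Optional[str]:
    """Rewrite .../page.asp?p1=N&p2=M to newpage.aspx, whatever the parameter order."""
    if not path.endswith("page.asp"):       # like the match pattern page\.asp$
        return None
    params = parse_qs(query)                # order-independent: {"p1": [...], "p2": [...]}
    p1 = params.get("p1", [""])[0]
    p2 = params.get("p2", [""])[0]
    if not (DIGITS.fullmatch(p1) and DIGITS.fullmatch(p2)):
        return None                         # both values must be numeric
    return "newpage.aspx?" + urlencode({"param1": p1, "param2": p2})
```

For example, rewrite_page_asp("page.asp", "p2=321&p1=123") returns "newpage.aspx?param1=123&param2=321", the same result the rule produces.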
10. Avoid Rewriting of Requests for ASP.NET Web Resources
ASP.NET-based web applications very often make requests to the WebResource.axd file to retrieve assembly resources and serve them to the Web browser. No such file exists on the server, because ASP.NET generates the content dynamically when WebResource.axd is requested. So if you have a URL rewrite rule that rewrites or redirects only when the requested URL does not correspond to a file or a folder on the web server's file system, that rule may accidentally rewrite requests made to WebResource.axd and thus break your application.
This problem can be easily prevented if you add one extra condition to the rewrite rule:
<rule name="RewriteUserFriendlyURL1" stopProcessing="true">
  <match url="^([^/]+)/?$" />
  <conditions>
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
    <!-- The following condition prevents rule from rewriting requests to .axd files -->
    <add input="{URL}" negate="true" pattern="\.axd$" />
  </conditions>
  <action type="Rewrite" url="article.aspx?p={R:1}" />
</rule>
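The three conditions of this rule can be read as a simple Python predicate. In the sketch below, the on_disk set is a hypothetical stand-in for the IsFile/IsDirectory checks. url is the request path without the leading slash, as the match pattern sees it. The function name is hypothetical.

```python
import re
from typing import AbstractSet, Optional

FRIENDLY = re.compile(r"^([^/]+)/?$")   # match pattern of the rule
AXD = re.compile(r"\.axd$", re.IGNORECASE)


def friendly_url_rewrite(url: str, on_disk: AbstractSet[str]) -> Optional[str]:
    """Return the internal rewrite target, or None to leave the request alone."""
    m = FRIENDLY.match(url)
    if m is None:
        return None                     # more than one path segment: rule does not match
    if url in on_disk:
        return None                     # a real file or directory exists
    if AXD.search(url):
        return None                     # generated by ASP.NET, not a file on disk
    return f"article.aspx?p={m.group(1)}"
```

Without the AXD check, "WebResource.axd" would pass both "not on disk" conditions and be rewritten to article.aspx?p=WebResource.axd, which is exactly the breakage described above.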
The Last but not Least: Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~Archibald Putt, Ph.D.
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.
Last modified: March 29, 2020