For decades cultural critics have warned against a future in which a giant centralized organization (the government or a huge corporation) spies on all its citizens, creating an effectively totalitarian society. It is such fears that gave rise to the adjective "Orwellian," after George Orwell's dystopia 1984. Those fears can now be called "Googlean"...
Like any large company, Google is burdened by mediocre middle managers, egomaniac founders and shady connections, including connections to the NSA and other three-letter agencies. Over time, Google's focus in search shifted from search quality to revenue. Its workforce is mostly young and thus malleable (The Independent).
“Unfortunately, in spite of the common belief, I think the average level of Google engineers is mediocre. With a lot of arrogance, too. Everybody believes he (males dominate) is better than his neighbor.”
... ... ...
"If you look at Google products, you see tons of clutter, useless features, lack of simplicity/elegance, and unwarranted focus on technical complexity."
... ... ...
“I’d say the relentless daily mediocre thinking of middle management types who are completely focused on metrics to the exclusion of all other factors. They don't want to rock the boat, they don't know how to inspire their workforce, and they rely far too much on the Google name and reputation to do that for them.”
We all know more or less how Google works: links act as votes, and the more votes a page has, the higher its PageRank. On top of that, an unknown number of human Google Web slaves manually rank pages whose PageRank is above a certain level. We also know that by monopolizing search, Google became the eye that watches what you are doing on the Internet. And if somebody knows what searches I performed on Google, he knows quite a bit about me. Google's practice of storing searches is a threat to privacy of similar magnitude to somebody knowing which books I read (hello Amazon ;-), or what items I bought (hello Visa and Mastercard).
The links to a page are the key to this system: they, along with the rank of the "quoting" site, determine relevance. This is similar to the way references in academic journals are valued: what matters most is how many researchers cite the given paper (citation index). Not rocket science. Of course, Google does not rank SOLELY based on links to a page; like every other search engine, it uses a combination of the number of links to the page, the text on the page, its position in search results (previous PageRank), etc. In other words, PageRank is evolving and now takes other variables into account.
But the scale on which Google performs this is amazing. Thousands if not millions of computers are involved, consuming a tremendous amount of electricity. I suspect that Google caches most of the Web content. And here quantity turns into quality.
Google's second role as an advertising agency spoils its efforts to build the best search engine: it does a decent job of locating the authoritative source only for topics that are not popular or politically, socially or technically important. As soon as a topic is important and a place among the first dozen results for popular Google search phrases can be monetized, Google becomes a very weak search engine, typically defeated by those who put enough effort into subverting PageRank for their purposes. And since a click on a link can be monetized, Google as a company carries within itself the weapon for destroying the search results of its own engine. The links, which are the key to this system, no longer determine the relevance of a page; they mainly reflect how much money can be earned on the topic. I think it's fairly obvious that for this reason Google is tampering with the basic PageRank framework, though we don't know in which specific areas it is tweaked to cut the most blatant abuse.
One can exploit this system by creating "fake" or "pointer" sites that serve only to drive up the rankings of another site. With good coordination among a sizable group of people, or a single person with lots of free time and some money, Google can be fooled. This practice is known as Google bombing, and it is not only possible but quite profitable. Google released the Google API, a SOAP interface that allows developers to query Google and retrieve results without using the normal HTML form interface. This enabled new forms of Google bombing. As a result, many Google search results have "junk" among the top findings. Previously, many malware sites ranked high in Google search results, which made Google the top malware distribution mechanism in the world. Malware authors also regularly purchased "Google words" to direct searchers to their sites. Now Google seems to have started paying some attention to its role as a malware propagator ;-).
Still, this page-ranking algorithm allows Google to do a more or less decent job of locating the authoritative source for any particular topic. It is not perfect, and it's not the best in all cases. See Pitfalls of Google as a Unix Information Search Engine.
For example, Bing often produces a better set of findings for queries about Windows, and Yandex has good search capabilities for LiveJournal. But Google search is well implemented (unlike Gmail ;-) and is frankly quite useful. While other search engines have their own strong points, Google is the dominant search engine by a wide margin.
Another interesting feature of Google (probably created in close cooperation with the NSA) is that it can attribute with quite amazing precision. In this particular business it has no equals.
There has been some debate about the degree to which weblogs affect Google's rankings through their "lemmings" effect, which leads large numbers of people to link simultaneously to the latest and greatest trends, fads, memes, or news bites on the Internet. I do not know the answer, but it looks like it does distort Google results considerably.
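The link-voting idea above can be sketched as the textbook power-iteration form of PageRank. This is a minimal sketch under stated assumptions: the damping factor 0.85 is the value suggested in Brin and Page's original paper, the toy link graph and page names are hypothetical, and Google's production ranking, which (as noted above) mixes in many other signals, is not public.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank.

    links maps each page to the set of pages it links to (its "votes").
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets the same small "random jump" share...
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for page in pages:
            # A page with no outgoing links spreads its vote over every page.
            targets = links.get(page) or pages
            for t in targets:
                # ...plus an equal slice of the rank of each page that links to it.
                new_rank[t] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy web: a, b and c all "cite" hub; c and hub cite a.
web = {
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "a"},
    "hub": {"a"},
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:>4} {score:.3f}")
```

As with a citation index, hub (three incoming links) comes out on top. The rank of the voter matters too: a has only two incoming links but scores far above b and c, because one of its voters is hub itself.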
In many ways Google became the Microsoft of Web search and, due to its size, represents a danger for firms competing in the same space:
"The Thomson Gale publishing group has put together a comprehensive review of Google Scholar, and they find it highly lacking compared with similar offerings from Highwire Press, Scopus, and The Web of Science. Will Google's overhyped offerings drive these superior services out of the market?"
The perks of being a Google employee are legendary. If you manage to fight your way through the notoriously obtuse interview questions then you're rewarded with corporate nirvana: a sprawling Promised Land of deconstructed office space and free amenities.
Despite this, Google has the fourth-highest employee turnover of any Fortune 500 company (Amazon is number two), so why can't it hang on to its employees?*
In a thread on Quora we can find at least some answers. The question asks "What's the worst part about working at Google?" and self-proclaimed Googlers ('Xooglers' as they're known - though this lot are mostly anonymous) have populated the thread to carp about life at the search giant.
We've hand-picked some of the choicest extracts below, or you can check out the thread in full here.
*We're being a little unfair here: high turnover isn't necessarily a bad thing and is at least partly a reflection of a fast-moving industry full of very, very employable people.
Google moved away from its start-up roots a long time ago
"Google was not a start-up environment by the time I left. The same office politics. It was easy to get promoted if you worked on the right projects and projected your work in the right way."
And because it's so well-tuned, there are quite a few boring jobs
"Google is an incredible machine that prints money thanks to AdWords. Unless you are an amazingly talented engineer who gets to create something new, chances are you're simply a guy/girl with an oil can greasing the cogs of that machine."
Basically, everyone's over-qualified because everyone's amazing
"I worked at one of the larger non-MV [Mountain View, Google's HQ] campuses, and the only intellectual stimulation I encountered in my time there was the interview process."
"The other thing is that its very hard to have *huge* impact at Google. Most of the large exciting problems were already solved, so you probably will end up working in the smallest meaningless tiny feature nobody cares about."
And that means people can be arrogant too
"Unfortunately, in spite of the common belief, I think the average level of Google engineers is mediocre. With a lot of arrogance, too. Everybody believes he (males dominate) is better than his neighbor."
The dominating engineering culture can hurt productivity
"There is not enough focus on product and visual design. This has led to many aborted/semi-successful products, like Wave, Google Video, Buzz, Dodgeball, Orkut, Knol, and Friend Connect. There is probably too much focus on pure engineering"
"If you look at Google products, you see tons of clutter, useless features, lack of simplicity/elegance, and unwarranted focus on technical complexity."
And this can also make for bad managers
"I'd say the relentless daily mediocre thinking of middle management types who are completely focused on metrics to the exclusion of all other factors. They don't want to rock the boat, they don't know how to inspire their workforce, and they rely far too much on the Google name and reputation to do that for them."
With all those amenities, there's actually less private space
"It's not uncommon to see 3-4 employees in a single cube, or several managers sharing an office. With all the open areas for food, games, TV, tech talks, etc, it can be surprisingly hard to find a quiet, private place to think."
And discipline can also be a problem
"There was no discipline in the offices. People chatted about random things on the emailing lists, often insulting each other. I once emailed a very big team asking a genuine question (as an external customer of their product). The response was sarcastic. If you try to do that at a company like Amazon, you will be immediately reprimanded (or so I think)."
Basically it seems that what makes Google so appealing can also be a problem for some
"I've always said that Google is hands-down the best corporate in the world. You get to work with incredible products, inspiring people, enjoy amazing perks, have unforgettable experiences, and get paid very well. It's all so incredibly easy. And that's the best part and the worst part about working at Google."
A Google malware researcher gave a rare peek inside the company's massive anti-malware and anti-phishing efforts at the SecTor conference here, and the data the company has gathered shows that the attackers who make it their business to infect sites and exploit users are adapting their tactics very quickly and creatively to combat the efforts of Google and others. While Google is still a relative newcomer to the public security scene, the company has deployed a number of services and technologies recently that are designed to identify phishing sites, as well as sites serving malware, and prevent users from finding them. The tools include the Google SafeBrowsing API and a handful of services that are available to help site owners and network administrators find and eliminate malware and the attendant bugs from their sites. Fabrice Jaubert, of Google's anti-malware team, said the company has had good luck identifying and weeding out malicious sites of late. Still, as much as 1.5 percent of all search result pages on Google include links to at least one malware-distribution site, he said.
The research, titled "The Ghost in the Browser: Analysis of Web-Based Malware," reported that an adversary who can successfully compromise a victim's browser can gain access to banking and medical records, authorization passwords, and personal communication records.
Google said that in its analysis of several billion URLs and an in-depth look at 4.5 million Web sites over a 12-month period, it discovered 450,000 sites were successfully launching drive-by-downloads of malware code.
Graham Cluley, a senior analyst with security firm Sophos, said researchers at his firm agree with Google's findings. "Everybody needs to learn to protect themselves better from these kind of attacks," he said. "More and more businesses are recognizing the need to scan their Web gateway just as they do their e-mail gateway to keep abreast of emerging threats."
Google also concluded that average computer users have no way to protect against these threats. "Their browser can be compromised just by visiting a page and become the vehicle for installing multitudes of malware on their systems," the nine-page report announced.
Google discovered that some of the most common malware sites were those that contained advertising. Sites that offer up user-generated content, such as blogs and forums, and those that offer third-party widgets, such as free traffic counters, are also commonly used by attackers looking to install code that makes victims of visitors.
As many antivirus engines rely on creating signatures from malware samples, adversaries can prevent detection by changing their code more frequently than antivirus engines are updated with new signatures, according to the Google study.
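The signature-evasion point above can be illustrated with the simplest possible kind of signature, an exact SHA-256 hash of a sample. This is a hypothetical sketch (real antivirus engines also use pattern and heuristic signatures, and the byte strings here are harmless placeholders), but it shows why changing a single byte is enough to slip past a hash-based blocklist until the signatures are updated:

```python
import hashlib

def signature(sample: bytes) -> str:
    """Exact-match 'signature': the SHA-256 digest of the whole sample."""
    return hashlib.sha256(sample).hexdigest()

# Hypothetical blocklist built from a previously captured sample.
original = b"placeholder payload, version 1"
known_bad = {signature(original)}

# The attacker changes one byte; the new hash matches nothing on the list.
variant = b"placeholder payload, version 2"
print(signature(original) in known_bad)  # True
print(signature(variant) in known_bad)   # False
```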
Although Cluley agreed with Google's research, he said it's important to clarify the threat. Some news headlines, he noted, have declared that Google's research revealed one in 10 Web sites are infected. But, he added, that's not accurate. The one-to-10 ratio is only true of the pages that Google already decided were worthy of further investigation, he clarified.
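Cluley's correction follows from the figures quoted above: the one-in-ten ratio is computed within the 4.5 million sites Google had already selected for an in-depth look, not within the several billion URLs of the initial scan:

```latex
\frac{450\,000\ \text{sites with drive-by downloads}}{4\,500\,000\ \text{sites examined in depth}} = \frac{1}{10}
```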
In its own research, Sophos discovers an average of 8,193 new malicious Web pages each day. What's most worrying, Cluley argued, is that 70 percent of these infected Web pages are on legitimate Web sites. In other words, the offending pages are often on sites that have been hacked or had malware planted on them without the owner of the Web page necessarily knowing.
"The Web is the new battleground between the good guys and the bad guys - if you have not already defended yourself then there is no time to lose," Cluley said. "Defense can come in the form of multilayered protection, such as desktop, e-mail, and Web gateways, but should be combined with security updates for your browsers and client firewalls."
Source: http://news.yahoo.com/s/nf/20070521/bs_nf/52404;_ylt=AvT1YODt6VBMRgslilBE5v8jtBAF
January 29, 2007 | 104 comments
Written by Charles S. Knight, SEO, and edited by Richard MacManus. The Top 100 is listed at the end of the analysis.
Ask anyone which search engine they use to find information on the Internet and they will almost certainly reply: "Google." Look a little further, and market research shows that people actually use four main search engines for 99.99% of their searches: Google, Yahoo!, MSN, and Ask.com (in that order). But in my travels as a Search Engine Optimizer (SEO), I have discovered that in that .01% lies a vast multitude of the most innovative and creative search engines you have never seen. So many, in fact, that I have had to limit my list of the very best ones to a mere 100.
But it's not just the sheer number of them that makes them worthy of attention; each one of these search engines has that standard "About Us" link at the bottom of the homepage. I call it the "why we're better than Google" page. And after reading dozens and dozens of these pages, I have come to the conclusion that, taken as a whole, they are right!
The Search Homepage
In order to address their claims systematically, it helps to group them into categories and then compare them to their Google counterparts. For example, let's look at the first thing that almost everyone sees when they go to search the Internet - the ubiquitous Google homepage. That famously sparse, clean sheet of paper with the colorful Google logo is the most popular Web page in the entire World Wide Web. For millions and millions of Internet users, that Spartan white page IS the Internet.
Google has successfully made their site the front door through which everyone passes in order to access the Internet. But staring at an almost blank sheet of paper has become, well, boring. Take Ms. Dewey for example. While some may object to her sultry demeanor, it's pretty hard to deny that interfacing with her is far more visually appealing than with an inert white screen.
A second example comes from Simply Google. Instead of squeezing through the keyhole in order to reach Google's 37 search options, Simply Google places all of those choices and many, many more all on the very first page; neatly arranged in columns.
A second arena is sometimes referred to as Natural Language Processing (NLP), or Artificial Intelligence (AI). It is the desire we all have of wanting to ask a search engine questions in everyday sentences, and receive a human-like answer (remember "Good Morning, HAL"?). Many of us remember Ask Jeeves, the famous butler, which was an early attempt in this direction - that unfortunately failed.
Google's approach, Google Answers, was to enlist a cadre of "experts." The concept was that you would pose a question to one of these experts, negotiate a price for an answer, and then pay up when it was found and delivered. It was such a failure, Google had to cancel the whole program. Enter ChaCha. With ChaCha, you can pose any question that you wish, click on the "Search With Guide" button, and a ChaCha Guide appears in a Chat box and dialogues with you until you find what you are looking for. There's no time limit, and no fee.
Perhaps Google's most glaring and egregious shortcoming is their insistence on displaying the outcome of a search in an impossibly long, one-dimensional list of results. We all intuitively know that the World Wide Web is just that, a three dimensional (or "3-D") web of interconnected Web pages. Several search engines, known as clustering engines, routinely present their search results on a two-dimensional map that one can navigate through in search of the best answer. Search engines like KartOO and Quintura are excellent examples.
Recommendation Search Engines
Another promising category is the recommendation search engines. While Google essentially helps you to find what you already know (you just can't find it), recommendation engines show you a whole world of things that you didn't even know existed. Check out What to Rent, Music Map, or the stunning Live Plasma display. When you input a favorite movie, book, or artist, they recommend to you a world of titles or similar artists that you may never have heard of, but would most likely enjoy.
Next we come to the metasearch engines. When you perform a search on Google, the results that you get are all from, well, Google! But metasearch engines have been around for years. They allow you to search not only Google, but a variety of other search engines too, in one fell swoop. There are many search engines that can do this. Dogpile, for instance, searches all of the "big four" mentioned above (Google, Yahoo!, MSN, and Ask) simultaneously. You could also try Zuula or PlanetSearch, which plows through 16 search engines at a time for you. A very interesting site to watch is GoshMe. Instead of searching an incredible number of Web pages, like conventional search engines, GoshMe searches for search engines (or databases) that each tap into an incredible number of Web pages. As I perceive it, GoshMe is a meta-metasearch engine (still in Beta)!
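Dogpile and the other metasearch engines don't document how they merge results, so the sketch below swaps in reciprocal rank fusion, a well-known published merging technique, purely as an illustration. The engine result lists are hypothetical; a URL returned by several engines is counted once, with its scores summed, so agreement between engines pushes it up the merged list:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked URL lists into one (k=60 is the constant usually quoted for RRF)."""
    scores = defaultdict(float)
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            # Higher positions contribute more; duplicates across engines accumulate.
            scores[url] += 1.0 / (k + position)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical results from three engines for the same query.
engine_a = ["example.org/a", "example.org/b", "example.org/c"]
engine_b = ["example.org/b", "example.org/d"]
engine_c = ["example.org/b", "example.org/a"]
merged = reciprocal_rank_fusion([engine_a, engine_b, engine_c])
print(merged)
```

Rank-based fusion like this avoids comparing the engines' raw relevance scores, which are not on a common scale.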
Other Alt Search Engines
And so it goes, feature after feature after feature. TheFind is a better shopping experience than Google's Froogle, IMHO. Like is a true visual search engine, unlike Google's Images, which just matches your keywords into images that have been tagged with those same keywords. Coming soon is Mobot (see the Demo at www.mobot.com). Google Mobile does let you perform a search on your mobile phone, but check out the Slifter Mobile Demo when you get a chance!
Finally, almost prophetically, Google is silent. Silent! At least Speeglebot talks to you, and Nayio listens! But of course, why should Google worry about these upstarts (all 100 of them)? Aren't they just like flies buzzing around an elephant? Can't Google just ignore them, as their share of the search market continues to creep upwards towards 100%, or perhaps just buy them? Perhaps.
The Last Question
Isaac Asimov, the preeminent science fiction writer of our time, once said that his favorite story, by far, was The Last Question. The question, for those who have not read it, is "Can Entropy Be Reversed?" That is, can the ultimate running down of all things, the burning out of all stars (or their collapse) be stopped - or is it hopelessly inevitable?
The question for this age, I submit, is… "Can Google Be Defeated"? Or is Google's mission "to organize the world's information and make it universally accessible and useful" a fait accompli?
Perhaps the place to start is by reading (or re-reading) Asimov's "The Last Question." I won't give it away, but it does suggest The Answer….
Charles Knight is the Principal of Charles Knight SEO, a Search Engine Optimization company in Charlottesville, VA.
The Top 100
For an Excel spreadsheet of the entire Top 100 Alternative Search Engines, go to: http://charlesknightseo.com/list.aspx or email the author at Charles@CharlesKnightSEO.com.
This list is in alphabetical order. Feel free to share this list, but please retain Charles' name and email.
Update: Thanks Sanjeev Narang for providing a hyperlinked version of the list.
Update, 5 February 2007: Charles Knight has left a detailed comment (#94) in response to all the great feedback in the comments to this post. He also notes:
"...while it looks like a very simple, almost crude list of 100 names, it has taken countless hours to try and do it properly and fairly. The list will be updated all year long, and the Top 100 can only get better and better until the Best of 2007 are announced on 12/31/07."
- Decipho
- Digg Labs Swarm
- Gnod
- Ms. Dewey
- OiHoi Search
- Retrevo (gamma)
- Rollyo
- Simply Google
- Singing Fish
- SRCHR
- Web 2.0
- What to Rent?
- Yahoo! Mindset
Posted by: Yakov | January 29, 2007 6:01 AM
Interesting list. I guess this list is the top 100 AFTER Google, Ask, MSN and Yahoo.
I would have to list Vivisimo above some of the others you have listed here.
I too don't have the time to go through all of them but I'd like to know if any of your top 100 are vertically focussed. I've been following one vertical in particular, health, and I don't see any of the ones I found to be useful.
Harvest: A Distributed Search System
Noted-L (1 of 3) [Noted] Google Adds Wildcards to Phrases
If you're a regular ResearchBuzz reader you already know how to search for phrases in Google using "wildcard words": you just use the word "the," which Google always considers a stopword. So, search for "three the mice" in Google and you'll find three green mice, three blue mice, three blind mice, etc.
Google has made using "the" unnecessary by adding a word-sized asterisk to its search syntax. What is a word-sized asterisk? It's an asterisk you can use in place of a word; "three * mice" will find three green mice, three blind mice, etc. This asterisk CANNOT be used for part of a word. If you try to search for "three bl* mice" you'll get no results. Thanks to Gary Price for this tip.
J C Lawrence
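The wildcard behavior described above (a standalone asterisk stands in for exactly one whole word, while "bl*" finds nothing) can be mimicked with a regular expression. This is a hypothetical sketch of the described semantics, not how Google implements its query syntax; the phrase_pattern helper and the sample text are made up:

```python
import re

def phrase_pattern(query):
    """Compile a phrase query in which a standalone '*' matches exactly one whole word."""
    parts = []
    for token in query.split():
        if token == "*":
            parts.append(r"\w+")  # word-sized wildcard: one whole word
        else:
            # Literal word; a partial-word asterisk like 'bl*' stays a literal '*'.
            parts.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

text = "Three blind mice, three green mice and three very small mice."
print(phrase_pattern("three * mice").findall(text))
print(phrase_pattern("three bl* mice").findall(text))
```

"three very small mice" is not matched because the asterisk covers only one word, and "three bl* mice" returns nothing, just as the tip describes.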
Fortunato - May 20th 1998, 07:05 EST
Isearch is software for indexing and searching text documents. It supports full text and field based search, relevance ranked results, Boolean queries, and heterogeneous databases. Isearch can parse many kinds of documents "out of the box," including HTML, mail folders, list digests, SGML-style tagged data, and USMARC. It can be extended to support other formats by creating descendant classes in C++ that define the document structure. It is pretty easy to customize in this way, provided that you know some C++ (and you will need to ftp the source code). A CGI interface is also included for web based searching.
ftp://ftp.redhat.com/pub/contrib/hurricane/i386/Isearch-1.41-1.i386.rpm
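Isearch's own C++ classes aren't shown here, but the general technique it describes (index the full text, answer Boolean queries, rank the hits) can be sketched in a few lines. This is a hypothetical toy: the document names are made up, and raw term frequency stands in for a real relevance formula.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: map each term to {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search_and(index, query):
    """Boolean AND query, ranked by total term frequency (a crude relevance score)."""
    terms = tokenize(query)
    if not terms:
        return []
    matching = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matching &= set(index.get(term, {}))  # keep docs containing every term
    score = {d: sum(index[t][d] for t in terms) for d in matching}
    return sorted(matching, key=lambda d: (-score[d], d))

docs = {
    "faq.html": "Isearch indexes HTML and mail folders. HTML search is fast.",
    "mail.txt": "Mail folders and list digests; each mail message is indexed.",
    "usmarc.rec": "USMARC records describe library catalog data.",
}
index = build_index(docs)
print(search_and(index, "mail folders"))
```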
Anyone who's ever used a search engine knows about broken links. Lousy interfaces. 10,000 returns with no meaningful results. Missing pages.
Essentially, a search engine is a type of software that creates indexes of databases or Web sites based on content. When you submit a search term, it goes out and "reads" its indexes and returns applicable results. Excite, HotBot and Lycos are popular examples.
A recent report by Nielsen/NetRatings Inc. shows search engine popularity is slipping. And a study conducted at the NEC Research Institute confirms search engines can't keep up with the Web's rapid growth. NECRI discovered:
- The Web has an estimated 800 million searchable pages
- It takes more than six months for a new page to show up on a search engine
- Even the best engine, Northern Light, only searches one-sixth of the Net's pages
Metasearch sites are a timesaver because they query multiple engines simultaneously. Examples include:
These hubs weed through the information glut with topic guides featuring recommended sites. They'll also recommend sites based on your search terms. Theoretically, Yahoo is a directory. But I prefer sites that put an emphasis on quality rather than quantity. Examples:
The right utility does wonders for streamlining searching. Here are three of my favorites, and you'll find more at the ZDNet Software Library.
Alexa got my Natural Born Killer's nod. This free browser add-on acts as a helpful backseat driver when you surf, offering stats on sites, including ratings by Alexa's thousands of users.
BullsEye lets you search and store info. Its searches are highly customizable; for instance, it will search for industry-specific business news.
Copernic has an easy learning curve for the novice (or frustrated searcher). It searches multiple sites (as narrowly or widely as you'd like), validates links and features good custom sorting and multi-threading options.
The "Google Boxes" you mention won't allow you to increase your site's ranking, unless it's already up there in the top ten. Very different from Google bombing, which can bring an unknown site to the top.
As you pointed out and can be read about originally here, each site A which links to a site B increases B's ranking by a small amount, but doesn't affect A directly.
A's own ranking can only be affected positively if there are loops of the form A->B->some other sites->A, and the smaller the loop, the higher the effect on A. That's because all sites within the loop are affected, with decreasing benefits. The most affected site will be B, followed by B's successor, followed by B's successor's successor, etc, up to A, who gets a very small boost if the loop is large.
The total increase in A's ranking is a result of adding the small increases for all possible loops.
Now suppose you're an unknown site. Nobody links to you, but you decide to link to the top ten. Since these don't link back to you, or in a very very roundabout way, you'll get zero benefit from your Google Box.
Now suppose you're a top ten site, and half the top ten sites link back to you directly. If you link to each of them, you'll get a relatively large boost to your own ranking back from each of them, hence a noticeable increase. But if you had already linked to all of them before, you won't get any benefit, since I believe Google counts multiple links to the same page as one.
You'll note that the bit which allows you to increase your site's ranking requires lots of links, just like in Google bombing. If you've read so far, I think you'll agree that Google bombing is the better technique (which, for the record, I don't condone).
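The commenter's two main claims (linking out to the top sites does nothing for an unknown site, while a link back that closes a loop does help) can be checked with the same textbook PageRank used as a stand-in for Google's real, unpublished ranking. The two toy graphs are hypothetical. Links are stored as sets, so a repeated link to the same page counts once, mirroring the commenter's belief about Google, which is an assumption.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank; links maps page -> set of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for page in pages:
            # A page with no outgoing links spreads its vote over every page.
            targets = links.get(page) or pages
            for t in targets:
                new_rank[t] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Two hypothetical "top" sites that link to each other, plus an unknown site.
no_backlink = {
    "top1": {"top2"},
    "top2": {"top1"},
    "unknown": {"top1", "top2"},  # the "Google Box": links out, nobody links in
}
with_backlink = {
    "top1": {"top2", "unknown"},  # top1 links back, closing the loop unknown -> top1 -> unknown
    "top2": {"top1"},
    "unknown": {"top1", "top2"},
}
before = pagerank(no_backlink)["unknown"]
after = pagerank(with_backlink)["unknown"]
print(f"unknown site, no backlink:   {before:.4f}")
print(f"unknown site, with backlink: {after:.4f}")
```

Without a backlink the unknown site is stuck at the baseline (1 - 0.85)/3 that every page receives; once top1 links back, part of the rank it passed out returns through the loop.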
I think it's fairly obvious that Google are tampering with the basic PageRank framework, though we (I?) don't know for sure. For example they allow you to search by language, or with more or less pr0n. The fact that they can identify/classify documents this way means they have a framework in place.
So it should be relatively easy for them to include document weights if they wanted to which take the classifications into account, as rusty and jsled already proposed. Simple document weighting is really easy to do, but the trick of course is to end up with useful weights.
Having said all this, your particular example can be explained without resorting to advanced PageRank modifications.
Other people have reported similar phenomena (too lazy to find the links), with the following explanation: sometimes, people publish web server access logs (maybe inadvertently) which the search engine crawlers find. On those logs, there's a whole lot of information, including the IP of the client's machine, and perhaps the referral address (i.e., the last web address visited before the server was queried). Google may use these addresses as if they were a direct link to your site. This might have occurred if you have ever fired up your browser, browsed your own site, and then browsed some other site in the same session.
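To see what such a crawler would find, here is a minimal sketch that pulls the referrer field out of access-log lines, assuming the log uses Apache's standard "combined" format (host, ident, user, time, request, status, size, referrer, user agent). The log lines are made up, and whether Google actually treats such referrers as links is only the commenter's speculation.

```python
import re

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# (this simple pattern does not handle escaped quotes inside fields)
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referrers(log_lines):
    """Yield the referring URL from each combined-format line that has one."""
    for line in log_lines:
        m = COMBINED.match(line)
        if m and m.group("referrer") not in ("", "-"):
            yield m.group("referrer")

log_lines = [
    '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 '
    '"http://example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    '198.51.100.2 - - [10/Oct/2000:13:56:01 -0700] "GET /robots.txt HTTP/1.0" 404 209 "-" "ExampleBot/1.0"',
]
print(list(referrers(log_lines)))
```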
All Eyes on Google
Google was launched less than four years ago by two graduate students in computer science: one, a Russian émigré named Sergey Brin, now 29; the other, a Michigan-reared engineer named Larry E. Page, now all of 30. As a gateway to 3 billion Web pages, Google is a strangely unadorned site: 37 words, four tabs and a blank space where you type in a query of up to 10 words. Google's more than 10,000 networked computers crawl through an index to those 3 billion pages, rank them with an equation that includes 500 million variables and spit out up to a few thousand listings. The ranking takes 500 milliseconds; the computers can handle a peak rate equal to 7 million queries per hour.
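A back-of-the-envelope reading of the figures above: the peak rate converts to roughly two thousand queries per second, and if every query spent the full 500 ms being ranked (an assumption, applied via Little's law), close to a thousand queries would be in flight at once:

```latex
\frac{7\,000\,000\ \text{queries/hour}}{3600\ \text{s/hour}} \approx 1944\ \text{queries/s},
\qquad
1944\ \text{queries/s} \times 0.5\ \text{s} \approx 972\ \text{queries in flight}
```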
But Google has become much more than merely a search service. It is a daily tool and main entry point for millions of users, stealing the spotlight from the browser (Explorer or whatever) and Internet portals like Yahoo. It is a labor of love for programmers, who have built applications off of Google and posted them like trophies on the Web. One does a "smackdown," comparing the Internet ubiquity of two words ("love" beats "money," but not by much); another creates poems (see boxes).
For Wall Street and Silicon Valley, Google is the great bright hope for an initial public offering that might revive moribund tech stocks. And Google has become its own meme, the stuff of New Yorker cartoons and a brand, like Kleenex and Band-Aid, that is in danger of becoming a part of the English language. You don't search for something on the Web anymore. You Google it.
Google now can be queried in 36 languages, with more to come. At the posh Hotel Bel Air, in Los Angeles, manager Lisa Hagen makes a point of Googling all guests before arrival, searching out better ways to spoil them. "If we find out they like to jog early in the day, we make sure they get a room with morning sun," she says. In Boston, Mark Kini manages a small limousine service that spends 80% of its ad budget on Google and other search sites. Says he: "It's how we survive the recession." In Westport, Conn. consultant Elena Amboyan's kids use Google daily; even when they research something at the library, they say they're Googling it.
It is all much more than Brin and Page ever had in mind when they started. "Sure, I'm surprised by the success," says Brin, unassuming, rumpled and wiry, his sneakers scuffing the upholstery of a conference-room chair. Users love Google, he says, because they find things there when they are desperate to know an answer. Keep offering better results and you hold their loyalty forever--and sell them stuff. Page adds that Google has become "like a person to them, helping them and giving them intelligence any hour of the day."
The passion and success igniting Google, and its emergence as a new interface for the Internet, have made it a rich, fat target for rivals. Yahoo is taking aim. So is the biggest search outfit, Overture, a little-known billion-dollar vendor that provides unbranded search services for other Web sites and has sued Google, alleging patent infringement. A gaggle of some 200 Web sites in China is reportedly going after Google, too.
And now Google faces the most lethal threat of all: Microsoft, aroused, is taking aim at the popular site. This bears an eerie resemblance to the rise--and calamitous fall--of Netscape, the first commercially successful Web browser.
Will Google be the next victim of a Windows that swallows everything? To help ensure a future, Brin and Page brought in a grown-up as chief executive, Valley veteran Eric Schmidt, 48. Fittingly, Schmidt had abundant experience struggling against Microsoft in his two previous jobs: He was chief technology officer at Sun Microsystems, then chief executive of Novell, two companies that thought, wrongly, they had Microsoft licked. Google's founders credit Schmidt with successfully managing their company's most intense period of growth.
To survive and succeed will require lots of talent, lots of acquisitions and lots more money. More important, Google will need to quell the hubris that is much in abundance at the jubilant company these days. To be at Google is to bask in your own public relations. The hallways of the company's four buildings in Mountain View, Calif. are decorated with articles from around the world praising the company. One current job posting includes duties as Google's company historian. Over 70 of the 800 employees have Ph.D.s. Google's head of engineering admits his big-brained staff is in awe of itself; he hopes the simplicity of the Google page masks that from the outside world.
In some ways Google feels like the giddy dot-coms of the stock-market bubble, circa 1999. Informal to a fault, Google offices are littered with party-colored lava lamps, bins of free Coke and candy and giant plastic balls that invoke Google's multicolored logo. The cafeteria serves free lunch to the workaholic ranks (and dinners, too; there's lots of code to write). When pizza gets delivered at one o'clock in the morning, plenty of people are on hand to devour it. Every day a thousand more résumés arrive from people hoping to join this work party.
But the dot-com parallels end when you look at the finances. The dot-bombs burned through tons of other people's money. Google makes a pile of cash on its own. After it went live in September 1999--six months before the Internet bubble finally popped--Google took in perhaps $25 million in 2000. Then it leaped fourfold to approach $100 million in 2001 and tripled to $300 million last year. Its gross could more than double this year to $700 million, estimates Safa Rashtchy of U.S. Bancorp Piper Jaffray.
Google, privately held--and determinedly so, for now--won't talk numbers, but it does brag that it just logged its ninth consecutive profitable quarter. Its revenue flows include ads (the bulk); search services for Yahoo, America Online and other sites (perhaps $100 million there); and custom-tailored, bright yellow servers for corporate accounts.
"Cheesy as it may sound," says cofounder Brin about the company's early days, "we never thought in terms of revenue streams." Now he must, for the next year or two could determine whether Google delivers on the high hopes it inspires in so many quarters or instead falters, glorying in its early success while others plot its doom.
Google traces back to 1995, when Sergey Brin and Larry Page, whose fathers taught college math, met at Stanford. The sons saw search as an interesting problem in organizing very large datasets.
At the time, users typed in a few words and got a list of thousands of Web sites using those words, but most of the results were irrelevant. Brin and Page quelled users' frustrations by adding order to this randomness. They judged a listed site's prominence by how many other Web sites valued it enough to have links to it. They gave sites a resulting "PageRank" (for Larry, not Web pages). This cliquey if democratic approach was later augmented by other algorithms that weight sites by other variables--news sites get a higher ranking than a 16-year-old's personal Web log.
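The link-as-vote idea can be sketched as the textbook power-iteration form of PageRank. This is a minimal illustration, not Google's production ranking (which, as noted above, was augmented by other algorithms): the `pagerank` function, the damping factor of 0.85 and the toy link graph are assumptions chosen for the example. Each page splits its current score evenly among the pages it links to, and a page with no outgoing links spreads its score over every page.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sums to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Every page receives a small "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                # A link acts as a vote: split this page's score among its targets.
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: spread its score evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph (hypothetical): "c" is linked to by three pages, "d" by none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

With these links, "c" comes out on top because it collects votes from three pages, while "d", which nothing links to, keeps only the baseline (1 - 0.85) / 4 share.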
The two grad students soon found their results were a step above any other kind of search. They had dubbed this system Back Rub, after the "back links" that pointed to a site. They adopted the name Google in early 1997, in a tribute to scale, a play on the number known as a googol--a one followed by a hundred zeros. The universe does not contain a googol atoms. The denizens of the company headquarters breezily refer to it as the Googleplex, after the googolplex, the unimaginably large number defined as a one followed by a googol of zeros.
Brin and Page introduced Google to the world in a paper they presented at the World Wide Web Conference in April 1998. Naively, they were downright hostile to advertising, calling it "insidious … because it is not clear who 'deserves' to be there, and who is willing to pay money to be listed." A few hundred million in revenues later, Brin has changed his mind. On a Google results page, he says, "There are eight spots for ads and ten search results. It's a lot of room for diversity."
Soon after, the pair began trying to sell their technology to Web sites, including Infoseek, Excite and Yahoo. They found no takers; one chief executive told them that if his site could search only 80% as well as everyone else's, that was okay by him. "That company is now out of business," Page says. Then their faculty adviser invited them to a breakfast with Sun Microsystems cofounder Andreas Bechtolsheim on the Stanford campus. Midway through the demo, Bechtolsheim stopped them and wrote a check for $100,000 to Google Inc.
This presented a problem, as Google didn't yet have a bank account. There wasn't even a "Google Inc."--they hadn't yet decided to form a company. The check sat in a drawer for several weeks, and then they got serious.
By June 1999 Google had raised almost $30 million from venture firms Sequoia Capital and Kleiner Perkins Caufield & Byers, plus Stanford and individual investors. Three months later the Google site officially blasted off. It could scan 30 million Web pages. Today it culls 100 times as many, and still taps only half the Internet; the rest lies behind corporate firewalls or in isolated islands unlinked to anything else.
As Google began to thrive, the Web world was crashing, and this, too, proved lucky for the pair of founders. As dot-coms collapsed, Google took over cheap office space, barely used Aeron chairs, dozens of servers and platoons of out-of-work programmers. By mid-2001 Google was profitable, employed several hundred people and was seeing traffic grow 20% every month. Thriving despite the surrounding downturn, Google went shopping for a seasoned chief executive. "My job was to impose a little order," Schmidt says now. "I made it clear that I wasn't coming in to get rid of the founders." Sergey Brin gave up his chairman mantle and assumed the title of president of technology; Page, who had been chief executive, is product president.
While the two techies concentrated on improving their search formulas, Schmidt focused more on building a better business model. Google had run ads with its search results for a while, but on a fixed-fee basis. Its main rival, Overture, publicly held and with $668 million in sales last year (it projects $1 billion in revenues this year), had already gone a step further. It exacted higher fees from advertisers by selling them rights to given keywords so their ads pop up first when those words are entered in a query. Sponsors paid on a cost-per-click basis instead of the usual cost-per-thousand-visitors.
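The difference between the two pricing schemes is simple arithmetic: a price per thousand impressions (CPM) implies a cost per click that depends on how often viewers actually click. The sketch below assumes a hypothetical $10 CPM and uses the 0.3% banner click rate the article cites later; the function names are illustrative.

```python
def cpm_to_cpc(cpm_dollars: float, click_through_rate: float) -> float:
    """Effective cost per click implied by a cost-per-thousand-impressions price."""
    clicks_per_thousand = 1000 * click_through_rate
    return cpm_dollars / clicks_per_thousand


def cost_under_cpm(impressions: int, cpm_dollars: float) -> float:
    """Advertiser pays for every thousand impressions, clicked or not."""
    return impressions / 1000 * cpm_dollars


def cost_under_cpc(impressions: int, click_through_rate: float, cpc_dollars: float) -> float:
    """Advertiser pays only for the clicks that actually happen."""
    return impressions * click_through_rate * cpc_dollars


# Hypothetical $10 CPM banner with a 0.3% click-through rate.
print(round(cpm_to_cpc(10.0, 0.003), 2))                # about $3.33 per click
print(cost_under_cpm(100_000, 10.0))                    # 1000.0
print(round(cost_under_cpc(100_000, 0.003, 0.05), 2))   # 300 clicks at 5 cents: 15.0
```

Under per-click pricing, an ad that draws few clicks earns the search site little, so its revenue depends on the click rate as much as on the bid.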
At one point in 2001, Google officials even met with Overture to compare notes, Overture officials say. In December 2001 Google started a similar test on its Usenet section, unveiling a service called Adwords. The response was so enthusiastic that, by February 2002, Adwords had been extended to all Google listings. It grew to 100,000 bidders in ten months, and thousands more advertisers are still signing up. Total Web advertising fell about 5% last year, to $6.5 billion, while search ads almost tripled to $1.4 billion and could hit $7 billion in five years, says Piper Jaffray's Rashtchy. (Google itself advertises very little, instead relying on word of mouth.)
"Some companies have purchased thousands of keywords, and they use them to test multiple products against multiple words," says Sheryl Sandberg, director of the wildly successful Adwords program. Noting most all Adwords bidders are U.S.-based, while half of Google searches are by users overseas, Sandberg sees huge growth in foreign markets. "The monetization should follow. This is a global bid," she says. Ads are sold in 11 languages.
Schmidt calls the success of Adwords "a total accident--when we went off fixed pricing, my only directive was 'Just don't let revenues drop.'" His foes at Overture instead allege patent infringement; they sued Google in April of last year. One month later America Online dropped Overture in favor of Google Adwords. The case is likely to drag on for a long time.
In Adwords, businesses use an auction system on the Google site to bid for the most popularly searched words and phrases. Google gets paid every time someone clicks on the ad itself. Bids start at 5 cents per click but can go to $15 or more for high-end products like helicopter parts. Critically, Google demotes a sponsor to a lower rung on its page if its response rate is too low, elevating a rival's ad for getting more clicks. This imposes a built-in pressure on businesses. They're even asked to revamp wording if less than 0.5% of viewers click on their ads. By contrast, many traditional banner ads get click rates of just 0.3%.
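The demotion rule can be illustrated with a small ranking function. The article does not give Google's formula, so this sketch assumes ads are ordered by bid multiplied by click-through rate (expected revenue per impression), and it flags ads below the 0.5% threshold quoted above. The `Ad` class, advertiser names and figures are hypothetical.

```python
from dataclasses import dataclass

REVAMP_THRESHOLD = 0.005  # 0.5% click-through rate, the figure quoted in the article


@dataclass
class Ad:
    advertiser: str
    bid_per_click: float  # dollars
    clicks: int
    impressions: int

    @property
    def ctr(self) -> float:
        """Observed click-through rate (0.0 if the ad has not been shown yet)."""
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def expected_revenue_per_impression(self) -> float:
        return self.bid_per_click * self.ctr


def rank_ads(ads):
    """Order ads by bid x CTR (assumed rule), so a low-response ad sinks."""
    return sorted(ads, key=lambda ad: ad.expected_revenue_per_impression, reverse=True)


def needs_revamp(ad: Ad) -> bool:
    """True if the ad's wording should be reworked under the 0.5% rule."""
    return ad.ctr < REVAMP_THRESHOLD


ads = [
    Ad("HighBidder", bid_per_click=2.00, clicks=3, impressions=1000),   # CTR 0.3%
    Ad("BetterCopy", bid_per_click=1.00, clicks=12, impressions=1000),  # CTR 1.2%
]
for ad in rank_ads(ads):
    print(ad.advertiser, f"{ad.ctr:.1%}", "revamp" if needs_revamp(ad) else "ok")
```

Under this assumed rule, "BetterCopy" outranks "HighBidder" despite bidding half as much, because it is expected to earn $0.012 per impression against $0.006.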
This could transform the $193 billion business of direct marketing. Junk mailers constantly work on narrowing the recipient list to the people most likely to respond and on jazzing up the envelopes to trick them into looking inside. Google ends the guesswork. People directly declare what interests them, and Google feeds them an appropriate ad. The ad's few pitch words are critical. For big corporate accounts like Dow Chemical, Google account executives continually recraft the message, like a haiku of commerce, aiming to maximize the click-through.
Google's long-term dream is to index all of the world's public information and make it searchable--everything from driver records to radio shows and films--and reap profits from it. This is scarier than it sounds. Google holds an archive of 800 million postings to Internet newsgroups, from alt.sex.bondage to alt.humanities.classics, most of which it bought on the cheap just before Dejanews.com went out of business in 2000.
It is a strange bazaar of information and a repository of embarrassment for people who were forthright (or shortsighted) enough to forgo anonymity in their postings. Google easily unearths the Web's first mention of Microsoft; and Sergey Brin's 1992 complaint about selling his car; and the musings of a married midwestern academic who posted a plea on alt.sex.fetish.tickling. Ours for the ages, unless he follows Google's somewhat obscure directions--located in the "Groups Help" section--on removing work from the archive. Even posts like that one trigger a precision-targeted ad: One offers "Discount 14-K Gold Anklets." Like much of the Web, Google also makes good money on porn.
While Google wants to own the world, Microsoft is going after Google. It now has 70 engineers working on search technology, and by some accounts it could triple that staff. Its new best friend is Overture, which already provides search services for Microsoft's MSN online service. Overture scientists frequently visit Microsoft in Redmond to plan next-generation features. Microsoft also could acquire a search company this year; one likely candidate would be San Francisco-based Looksmart. Neither Overture, with a market capitalization of $669 million, nor Looksmart, at $328 million, would be more than a bagatelle for Microsoft, which has $38 billion in cash.
The Google guys profess to be unfazed. They have assiduously avoided the sins of Netscape, which belligerently jeered at Microsoft's efforts to build a Web browser. "Netscape mooned the giant," says one Google exec, noting Google welcomes Microsoft ads on its site. Plenty of other threats abound. Yahoo, despite investing in Google and paying for its service, in December paid $235 million in cash to acquire faded search firm Inktomi. Overture recently spent $177 million for the Web-search assets of Fast Search & Transfer and AltaVista, while Ask Jeeves (2002 revenues $74 million, net loss of $15 million) put up $3.8 million for Teoma. Even Google's engineers admit Fast and Teoma deliver results comparable to theirs.
Google has bought some prizes of its own, including personalization technology that "learns" what you are interested in based on previous searches; and a company called Blogger, which helps people set up their own Web-based diaries, or Web logs. More "blogs" mean more content, yielding more pages on which to run ads and more links to other pages. The more links, the better Google's results. Most recently Google scored a company called Applied Semantics, whose content-scanning techniques can be used to tailor ads not just based on the words a user searches, but also on the actual pages he reads on the Internet. That buy was a double score for Google--Applied Semantics had been selling those services to Overture. In the week following the purchase, Overture's shares fell about 30%.
The need to acquire more tech could add to the pressure for Google to go public, so it could use its stock as currency. Both Brin and Page are daunted by the prospect of baring Google's secret financials and losing focus in the drive to boost profits every quarter. "I fear we'll grow shortsighted and lose the wider potential applications of our company," says Brin. "The biggest thing we'd lose is the opportunity cost of what we could do if we didn't go public."
But Google's growing ranks want it, Wall Street bankers yearn for it and clues hint that all of them will get it. In the overcrowded office of Sheryl Sandberg, the 33-year-old Adwords chief, sits a crimson lava lamp given to her by investment bankers at Morgan Stanley. Very hip, very Google-geist. The former U.S. Treasury official says with a laugh, "They have high hopes for us."
Downstairs, past a Google grand piano and a few big plastic balls, Chief Executive Schmidt convenes a meeting of two dozen managers for a project they refer to as "Keeping Eric Out of Jail." They are altering Google's billing and accounting systems to comply with the new Sarbanes-Oxley Act--a law that applies to all public companies but no private ones. It may take until October to comply, but Schmidt's urgency is palpable.
Every Friday he holds a companywide meeting, preaching to a cocky flock. Along with Brin and Page he talks business, technology--and attitude. He reminds these whiz kids to count on nothing. Remember the Netscapes, he exhorts, the high-tech stars that gained fans, made paper millionaires of the early staff and then burned up in the heat of competition. Just about everybody, save Google's massing rivals, hopes they're listening.
Google "Reveals Index Secrets": Charts Indexing of Your Site Over Time
Jul 25, 2012 at 1:43pm ET by Vanessa Fox
Yesterday, Google webmaster tools launched Index Status (available under Health), which charts the number of indexed pages for your site over the last year. Total Indexed Count: Google says that this count is accurate (unlike the site: search operator) and is post-canonicalization. In other words, if your site includes a lot of duplicate URLs (due [...]
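Counting pages "post-canonicalization" means that URL variants pointing at the same content are collapsed before they are counted. Google has not published its rules, so the normalizations below (lower-case host, default port and fragment dropped, query parameters sorted, `utm_` tracking parameters removed) are illustrative assumptions, as are the function names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Collapse common URL variants into one form (assumed rules, not Google's)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit lower-cases hostname; any user:password@ is dropped
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop tracking parameters and sort the rest so parameter order does not matter.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if not k.startswith("utm_")]
    query = urlencode(sorted(params))
    return urlunsplit((scheme, host, path, query, ""))  # fragment discarded


def indexed_count(urls) -> int:
    """Number of distinct pages once duplicates are canonicalized."""
    return len({canonicalize(u) for u in urls})


crawled = [
    "http://Example.com/page?b=2&a=1",
    "http://example.com:80/page?a=1&b=2#top",
    "http://example.com/page?a=1&b=2&utm_source=feed",
    "http://example.com/other",
]
print(indexed_count(crawled))  # 2
```

Four crawled URLs reduce to two pages, which is why a post-canonicalization count can be lower than a raw tally of URLs.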
Firefox 14 Now Encrypts Google Searches, But Search Terms Still Will "Leak" Out
Jul 17, 2012 at 3:43pm ET by Danny Sullivan
Firefox 14 has officially launched today, which means all Google searches are encrypted by default. However, due to a Google loophole, the encryption will not prevent things you search for from "leaking" out to Google's advertisers nor potentially showing up as search suggestions or in data reported to web sites through Google Webmaster Central. The Firefox [...]
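One way a search term can travel to a destination site is the HTTP Referer header: if the referring URL still carries the query in its `q` parameter (the convention Google search URLs use), the receiving server can read it back out. This hypothetical helper shows how little that takes; it does not model the specific Google loophole the excerpt refers to.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referrer(referrer: str) -> Optional[str]:
    """Return the search query carried in a referring URL's `q` parameter, if any."""
    values = parse_qs(urlsplit(referrer).query).get("q")
    return values[0] if values else None


# A referrer that still contains the query "leaks" it to the site being visited.
print(search_terms_from_referrer("https://www.google.com/search?q=helicopter+parts&ie=utf-8"))  # helicopter parts
# A stripped referrer reveals nothing.
print(search_terms_from_referrer("https://www.google.com/"))  # None
```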
The following search engines consist of just one or two Perl scripts, and are suitable for small to medium sites.
- Simple Search. A very small, elegant search engine. Easy to install, modify and customize. Does not use an index file, so the search is always up-to-date. Suitable for sites containing a few hundred files (on this site, Simple Search takes about 14 seconds to search 5MB of text in about 1000 documents). Matt Wright.
- ICE. An indexing search engine. Easy to install, very fast, and produces a compact index file (on this site, the index file is 12% of the size of the original files, and searches are nearly instantaneous). Does not search the titles of html documents, although this is easily fixed if you know Perl. Excellent installation documentation is provided by webreference.internet.com, although this has not been updated for Version 2. Christian Neuss.
- SWAT. Search With Authority Tool. An indexing search engine. Still in beta on 2 April 1998, so its capabilities are hard to evaluate. Requires extension modules from the Perl/CPAN archive. Chris Nite, ChrisCrawler.
- Xavatoria Search. An indexing search engine. Easy to install. Its unusually powerful search syntax is modelled on AltaVista's simple search, and includes phrases, wildcards, requires/excludes and so on. As with AltaVista, output of results is paged and includes a 2-line summary of each document. On the downside, it is slower than ICE and produces a relatively large index file (on this site, the index file is 49% of the size of the original files and searches take less than 2 seconds). The Guide to the Web for Statisticians uses a much modified version of Xavatoria which reduces the index file to about 33% of the size of the original files. Fluid Dynamics.
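The list contrasts an engine that scans every file on each query (Simple Search) with engines that build an index file in advance (ICE, Xavatoria). The indexing approach can be sketched as an inverted index: a map from each word to the documents containing it. This is a minimal in-memory Python sketch, not the Perl code of any engine listed; the query syntax loosely imitates the required/excluded terms and trailing-`*` wildcards described for Xavatoria, phrase search is left out, and all names are hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(docs):
    """Inverted index: lower-cased word -> set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def lookup(index, term):
    """Documents containing term; a trailing '*' matches every word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, names in index.items():
            if word.startswith(prefix):
                hits |= names
        return hits
    return set(index.get(term, ()))


def search(index, query):
    """+word must appear, -word must not appear, bare words are alternatives (OR)."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)

    if required:
        # Start from every document and intersect with each required term;
        # bare words are ignored here (a real engine might use them for ranking).
        result = set().union(*index.values())
        for term in required:
            result &= lookup(index, term)
    else:
        result = set()
        for term in optional:
            result |= lookup(index, term)

    for term in excluded:
        result -= lookup(index, term)
    return result


docs = {
    "perl.html": "Perl CGI scripts for searching",
    "c.html": "Indexing engines written in C",
    "faq.html": "Perl FAQ about indexing",
}
index = build_index(docs)
print(sorted(search(index, "+perl -faq")))   # ['perl.html']
print(sorted(search(index, "index*")))       # ['c.html', 'faq.html']
print(sorted(search(index, "cgi engines")))  # ['c.html', 'perl.html']
```

Scanning needs no index and is always up to date but reads every file per query; an index trades disk space (the 12% to 49% index sizes reported above) for near-instant lookups.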
The search engines below are written in C with Web interfaces in Perl. Installation is more involved than for the engines above, but indexing and searching are generally faster. These engines are typically server orientated. That is, they are designed for installation by Web server administrators, with individual users able to configure their own index files.
- Excite for Web Servers. Perhaps the slickest of the freely available search engines. Features a fuzzy search style, and the ability to find "more like this". For literal key word searches, Glimpse or SWISH for example may be more suitable. Index file is about 15% of the size of the original files. Comes ready compiled, and with very complete installation assistance. However you do need to have root privileges on your Web server to install it. The Byte Magazine site is a good example of Excite in use.
- Glimpse. A powerful indexing and query system. Returns not only document names but the lines of each document in which keywords were found. With WebGlimpse, provides search capabilities for a Web site. WebGlimpse automatically adds a search box to every html file in your site, and allows searches of the "neighbourhood" of each file. Index file can be very small or moderately large depending on the options you choose. The HIV InfoWeb is a nice example of Glimpse (not WebGlimpse) in action. Udi Manber, Sun Wu and Burra Gopal, University of Arizona.
- ht://Dig. A popular search engine developed at San Diego State University and used by a number of North American universities. Includes its own spider and can index files through the http server, allowing you to index pages produced dynamically by CGI programs. Like Glimpse, can display the context of matched keywords. Andrew Scherpbier.
- SWISH-E. Simple Web Indexing System for Humans - Enhanced. One of the most popular web site search engines. Originally by Kevin Hughes, EIT, now enhanced by a team at the Berkeley Digital Library SunSITE, University of California. Of the search engines in this section, this is the easiest to install if you want to index only your own files rather than an entire server. Well documented, and even has its own mailing list. Can limit searches to specified html tags, such as meta tags, titles or headings. Claims the index file is 2-5% of the size of the original files, but on this site the index is 25% of the size of the original files.
- SWISH++. A new version of SWISH written in C++, still in beta version in March 1998. Testing by the author suggests that it will be substantially faster than any of the other search engines listed here. Current documentation is terse but adequate. Installation requires an up-to-date C++ compiler. Paul Lucas.
- Swish-Web. An alternative web interface for the SWISH search engine. Rod Clark, Small Hours.
- Webinator. A new indexing search engine, free for the first 10,000 documents. Already has a good number of high profile users. You need root privileges to install it. Thunderstone.
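SWISH-E's ability to limit a search to particular HTML tags can be approximated by recording, at indexing time, which tag each piece of text came from. This sketch uses Python's standard `html.parser`; the field names (`title`, `h1` to `h3`, `meta:<name>`, `body`) are a hypothetical convention, not SWISH-E's own syntax, and script or style text is not filtered out.

```python
import re
from html.parser import HTMLParser

FIELD_TAGS = {"title", "h1", "h2", "h3"}
WORD = re.compile(r"[a-z0-9]+")


class FieldExtractor(HTMLParser):
    """Collect text by field: title, headings, named meta tags, and everything else as body."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._open = []  # stack of currently open field tags

    def handle_starttag(self, tag, attrs):
        if tag in FIELD_TAGS:
            self._open.append(tag)
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            content = attributes.get("content")
            if name and content:
                self.fields.setdefault(f"meta:{name}", []).append(content)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        field = self._open[-1] if self._open else "body"
        self.fields.setdefault(field, []).append(data)


def extract_fields(html):
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {field: " ".join(parts) for field, parts in parser.fields.items()}


def field_search(pages, word, field):
    """Names of pages whose given field contains the word (case-insensitive)."""
    word = word.lower()
    return sorted(name for name, html in pages.items()
                  if word in WORD.findall(extract_fields(html).get(field, "").lower()))


pages = {
    "a.html": "<html><head><title>Perl Search Engines</title>"
              "<meta name='keywords' content='perl, swish'></head>"
              "<body><p>Notes on indexing.</p></body></html>",
    "b.html": "<html><head><title>Site Notes</title></head>"
              "<body><h1>Overview</h1><p>Perl appears only in the body.</p></body></html>",
}
print(field_search(pages, "perl", "title"))           # ['a.html']
print(field_search(pages, "perl", "body"))            # ['b.html']
print(field_search(pages, "swish", "meta:keywords"))  # ['a.html']
```

The same word can match different pages depending on the field searched, which is what makes title- or meta-restricted searches useful on sites where a term appears incidentally in body text.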
For high end commercial search engines capable of handling very large sites, see the reviews by the US Department of Education and Network Computing Magazine.
- Ultraseek Server. Seems to be the current winner amongst the commercial search engines. Easier to install and with more features than any of the search engines above. On this site Ultraseek writes an index file of about 11% of the size of the original files, and uses another 9MB of disk space for program and documentation files. Ultraseek can be seen in action on the NewsWorks, Sun Microsystems and CNN Interactive sites.
Appindex is a perl script designed to retrieve information from freshmeat's application index. It searches for a program (perl regexp accepted), retrieves available info on that program, then optionally launches a browser to view the homepage.
Appindex will now read the config and application index file from the user's homedir if it is available. Category searching is also implemented, along with a '-u' option that will update the local copy of the application index.
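Appindex's behaviour as described (a regular-expression search over a local copy of an application index, preferring a copy in the user's home directory, with optional category searches) can be sketched in Python. The freshmeat index format is not described here, so the tab-separated record layout, the file names and every function name below are hypothetical, and Python's `re` syntax differs from Perl's in some details.

```python
import re
from pathlib import Path

FIELDS = ("name", "category", "homepage", "description")


def locate_index(filename, system_dir):
    """Prefer a copy of the index in the user's home directory, else the system copy."""
    personal = Path.home() / filename
    return personal if personal.is_file() else Path(system_dir) / filename


def parse_index(lines):
    """Parse hypothetical tab-separated records: name, category, homepage, description."""
    records = []
    for line in lines:
        values = line.rstrip("\n").split("\t")
        if len(values) == len(FIELDS):
            records.append(dict(zip(FIELDS, values)))
    return records


def search_apps(records, pattern, field="name"):
    """Records whose chosen field matches the regular expression (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [record for record in records if regex.search(record[field])]


sample = [
    "swish-e\tsearch\thttp://example.org/swish-e\tWeb site indexing\n",
    "glimpse\tsearch\thttp://example.org/glimpse\tText indexing and query\n",
    "xv\tgraphics\thttp://example.org/xv\tImage viewer\n",
]
records = parse_index(sample)
print([r["name"] for r in search_apps(records, r"^sw")])                      # ['swish-e']
print([r["name"] for r in search_apps(records, "search", field="category")])  # ['swish-e', 'glimpse']
```

To read a real file, `parse_index` can be given the open file returned by `locate_index(...).open()`; opening a result's homepage, as Appindex optionally does, could use the standard library's `webbrowser.open(record["homepage"])`.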
*** Inference Find! -- Server 46 -- not bad
* Infoseek ksh tutorial
Astalavista -- daily updated search engine monitoring hundreds of sites with hack & crack stuff.
Mister Driver -- search engine for device drivers.
News Hunt -- links to free newspaper archives and a searchable database of searchable newspapers.
***** The Spider's Apprentice -- a public service site that offers help on searching the Web. They also analyze and rate the major search engines.
Verity Internet Virtual Library Search -- searchable index of documents of interest to those using and developing the world-wide web and its related technologies
*** Sprockets and Cogs - Truly Targeted Technical Search Tool -- not impressive, but useful
Search the Web version 1.0
Perl script that performs a search on Altavista, Excite, Infoseek, Lycos, Yahoo, and Webcrawler.
Language: Perl Platform: Unix, Windows
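A metasearch script like this one has to merge several ranked result lists into one. How Search the Web does it is not described here; this sketch uses reciprocal rank fusion, one standard merging method, in which a URL's score is the sum of 1/(k + rank) over the engines that return it, so results found by several engines rise. The result lists are made up, fetching from the engines is omitted, and k = 60 is simply a commonly used default.

```python
from collections import defaultdict


def fuse(result_lists, k=60):
    """Reciprocal rank fusion: score(url) = sum over engines of 1 / (k + rank)."""
    scores = defaultdict(float)
    for results in result_lists.values():
        seen = set()
        for rank, url in enumerate(results, start=1):
            if url in seen:  # count each URL once per engine
                continue
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results from three engines.
results = {
    "AltaVista": ["http://example.com/1", "http://example.com/2", "http://example.com/3"],
    "Lycos": ["http://example.com/2", "http://example.com/4"],
    "Excite": ["http://example.com/2", "http://example.com/1"],
}
for url in fuse(results):
    print(url)
```

Page "2" comes first because all three engines return it, while page "3", returned by a single engine at rank 3, comes last.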
The Last but not Least
Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt, Ph.D.
Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright in the original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: March, 12, 2019