Softpanorama

May the source be with you, but remember the KISS principle ;-)
Bigger doesn't imply better. Bigger often is a sign of obesity, of lost control, of overcomplexity, of cancerous cells

Slightly Skeptical Notes on Search Engines and Google

For decades cultural critics have warned against a future in which a giant centralized organization (the government, or a huge corporation) applies total surveillance to its citizens to create an effectively totalitarian society: it is such fears that have given rise to the adjective "Orwellian", after George Orwell's dystopia 1984. Those fears can now be called "Googlean"...

Like any large company, Google is burdened by mediocre middle managers, egomaniac founders and shady connections, including connections to the NSA and other three-letter agencies. With time, Google's focus in search shifted from search quality to revenue. The workforce is mostly young and thus malleable (The Independent).

“Unfortunately, in spite of the common belief, I think the average level of Google engineers is mediocre. With a lot of arrogance, too. Everybody believes he (males dominate) is better than his neighbor.”

... ... ...

"If you look at Google products, you see tons of clutter, useless features, lack of simplicity/elegance, and unwarranted focus on technical complexity."

... ... ...

“I’d say the relentless daily mediocre thinking of middle management types who are completely focused on metrics to the exclusion of all other factors. They don't want to rock the boat, they don't know how to inspire their workforce, and they rely far too much on the Google name and reputation to do that for them.”

We all know more or less how Google works: links act as votes, and the more votes a page has, the higher its PageRank. In addition, an unknown number of human Google "Web slaves" manually rank the pages that have PageRank above a certain level. We also know that by monopolizing search Google became the eye that watches what you are doing on the Internet. And if somebody knows what searches I performed on Google, he knows quite a bit about me. Google's practice of storing searches is a threat to privacy, a threat of the same magnitude as somebody knowing what books I read (hello Amazon ;-), or what items I bought (hello Visa and Mastercard).

The links to the page are the key to this system: along with the rank of the "quoting" site, they determine relevance. This is similar to the way references in academic journals are valued: what matters most is how many researchers refer to the given paper (citation index). Not rocket science. Of course Google does not rank SOLELY based on links to a page; it uses a combination of the number of links to the page, the text on the page, its position in search results (previous PageRank), etc., just like every other search engine. In other words, PageRank is evolving and now takes other variables into account.
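The "links as votes" idea can be sketched in a few lines of Python. This is only a minimal illustration of the textbook PageRank iteration, not Google's actual ranking code: the damping factor of 0.85, the iteration count, the function name `pagerank` and the toy four-page web are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by repeatedly passing rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline rank regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages "vote" for c, and c votes for a.
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web the page with the most incoming links (c) ends up on top, and the page it "quotes" (a) inherits most of that weight, which is the citation-index logic described above.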

But the scale on which Google performs this is amazing. Thousands if not millions of computers are involved, consuming a tremendous amount of electricity. I suspect that Google caches most of the Web content. And here quantity turns into quality.

Google's second role as an advertising agency spoils its efforts to make the best search engine and permits it to do a decent job only in locating the authoritative source for topics that are not popular or politically, socially or technically important. As soon as a topic is important, and being among the first dozen results for a popular Google search phrase can be monetized, Google becomes a very weak search engine that is typically defeated by those who can put in enough effort to subvert PageRank for their purposes. And since a click on a link can be monetized, Google as a company contains within itself the weapon for the destruction of its own engine's search results. The links that are the key to this system no longer determine the relevance of the page; they mainly reflect the degree of the author's desire to earn money on the topic. I think it's fairly obvious that for this reason Google is tampering with the basic PageRank framework, though we don't know the specific areas in which it is tweaked to cut the most blatant abuse.

One can exploit this system by creating "fake" or "pointer" sites that serve only to drive up the rankings of another site. With good coordination among a sizable group of people, or with a single person who has lots of free time and some money, Google can be fooled. Such a practice is known as Google bombing, and it is not only possible but quite profitable. Google also released the Google API, a SOAP interface that allows developers to query Google and retrieve results without having to use the normal HTML form interface. This unleashed new forms of Google bombing. As a result, many Google search results have "junk" among the top findings. Previously a lot of malware sites ranked high in Google search results, which made Google the top malware distribution mechanism in the world. Malware authors also regularly purchased "Google words" to direct searches to their sites. Now Google seems to have started paying some attention to its role as a malware propagator ;-).

Still, this page ranking algorithm allows Google to do a more or less decent job of locating the authoritative source for any particular topic. It is not perfect, and it is not the best in all cases. See Pitfalls of Google as a Unix information Search Engine.

For example, Bing often produces a better set of findings for queries about Windows, and Yandex has good search capabilities for LiveJournal. But Google is well implemented (unlike Gmail ;-) and is frankly quite useful. While other search engines have their own strong points, Google is the dominant search engine by a wide margin.

Another interesting feature of Google (probably created in close cooperation with the NSA) is that it can perform attribution with quite amazing precision. In this particular business it has no equals.

There has been some debate about the degree to which weblogs affect Google's rankings due to their "lemmings" effect: they make it easy for large numbers of people to link simultaneously to the latest and greatest trends, fads, memes, or news bites on the Internet. I do not know the answer, but it looks like it does distort Google results considerably.

In many ways Google became the Microsoft of Web search and represents a danger for firms competing in the same space due to its size:

"The Thomson Gale publishing group has put together a comprehensive review of Google Scholar, and they find it highly lacking compared with similar offerings from Highwire Press, Scopus, and The Web of Science. Will Google's overhyped offerings drive these superior services out of the market?"

Old News ;-)

[Nov 12, 2013] Anonymous employees reveal the worst thing about working for Google

The Independent

The perks of being a Google employee are legendary. If you manage to fight your way through the notoriously obtuse interview questions then you're rewarded with corporate nirvana: a sprawling Promised Land of deconstructed office space and free amenities.

Despite this Google has the fourth highest employee turnover of any Fortune 500 company (Amazon is number two) so why can't it hang on to its employees?*

In a thread on Quora we can find at least some answers. The question asks "What's the worst part about working at Google?" and self-proclaimed Googlers ('Xooglers' as they're known - though this lot are mostly anonymous) have populated the thread to carp about life at the search giant.

We've hand-picked some of the choicest extracts below, or you can check out the thread in full here.

*We're being a little unfair here: high turnover isn't necessarily a bad thing and is at least partly a reflection of a fast-moving industry full of very, very employable people.

Google moved away from its start-up roots a long time ago

"Google was not a start-up environment by the time I left. The same office politics. It was easy to get promoted if you worked on the right projects and projected your work in the right way."

And because it's so well-tuned, there are quite a few boring jobs

"Google is an incredible machine that prints money thanks to AdWords. Unless you are an amazingly talented engineer who gets to create something new, chances are you're simply a guy/girl with an oil can greasing the cogs of that machine."

Basically, everyone's over-qualified because everyone's amazing

"I worked at one of the larger non-MV [Mountain View, Google's HQ] campuses, and the only intellectual stimulation I encountered in my time there was the interview process."

"The other thing is that its very hard to have *huge* impact at Google. Most of the large exciting problems were already solved, so you probably will end up working in the smallest meaningless tiny feature nobody cares about."

And that means people can be arrogant too

"Unfortunately, in spite of the common belief, I think the average level of Google engineers is mediocre. With a lot of arrogance, too. Everybody believes he (males dominate) is better than his neighbor."

The dominating engineering culture can hurt productivity

"There is not enough focus on product and visual design. This has led to many aborted/semi-successful products, like Wave, Google Video, Buzz, Dodgeball, Orkut, Knol, and Friend Connect. There is probably too much focus on pure engineering"

"If you look at Google products, you see tons of clutter, useless features, lack of simplicity/elegance, and unwarranted focus on technical complexity."

And this can also make for bad managers

"I'd say the relentless daily mediocre thinking of middle management types who are completely focused on metrics to the exclusion of all other factors. They don't want to rock the boat, they don't know how to inspire their workforce, and they rely far too much on the Google name and reputation to do that for them."

With all those amenities, there's actually less private space

"It's not uncommon to see 3-4 employees in a single cube, or several managers sharing an office. With all the open areas for food, games, TV, tech talks, etc, it can be surprisingly hard to find a quiet, private place to think."

And discipline can also be a problem

"There was no discipline in the offices. People chatted about random things on the emailing lists, often insulting each other. I once emailed a very big team asking a genuine question (as an external customer of their product). The response was sarcastic. If you try to do that at a company like Amazon, you will be immediately reprimanded (or so I think)."

Basically it seems that what makes Google so appealing can also be a problem for some

"I've always said that Google is hands-down the best corporate in the world. You get to work with incredible products, inspiring people, enjoy amazing perks, have unforgettable experiences, and get paid very well. It's all so incredibly easy. And that's the best part and the worst part about working at Google."

Inside Google's Anti-Malware Operation

Slashdot

"A Google malware researcher gave a rare peek inside the company's massive anti-malware and anti-phishing efforts at the SecTor conference here, and the data the company has gathered shows that the attackers who make it their business to infect sites and exploit users are adapting their tactics very quickly and creatively to combat the efforts of Google and others. While Google is still a relative newcomer to the public security scene, the company has deployed a number of services and technologies recently that are designed to identify phishing sites, as well as sites serving malware, and prevent users from finding them. The tools include the Google SafeBrowsing API and a handful of services that are available to help site owners and network administrators find and eliminate malware and the attendant bugs from their sites. Fabrice Jaubert, of Google's anti-malware team, said the company has had good luck identifying and weeding out malicious sites of late. Still, as much as 1.5 percent of all search result pages on Google include links to at least one malware-distribution site, he said."

Google Malware Runs Rampant on the Web

Circuit Diagram Wiring

The research, titled "The Ghost in the Browser: Analysis of Web-Based Malware," reported that an adversary who can successfully compromise a victim's browser can gain access to banking and medical records, authorization passwords, and personal communication records.

Google said that in its analysis of several billion URLs and an in-depth look at 4.5 million Web sites over a 12-month period, it discovered 450,000 sites were successfully launching drive-by-downloads of malware code.

Graham Cluley, a senior analyst with security firm Sophos, said researchers at his firm agree with Google's findings. "Everybody needs to learn to protect themselves better from these kind of attacks," he said. "More and more businesses are recognizing the need to scan their Web gateway just as they do their e-mail gateway to keep abreast of emerging threats."

Sitting Ducks?

Google also concluded that average computer users have no way to protect against these threats. "Their browser can be compromised just by visiting a page and become the vehicle for installing multitudes of malware on their systems," the nine-page report announced.

Google discovered that some of the most common malware sites were those that contained advertising. Sites that offer up user-generated content, such as blogs and forums, and those that offer third-party widgets, such as free traffic counters, are also commonly used by attackers looking to install code that makes victims of visitors.

As many antivirus engines rely on creating signatures from malware samples, adversaries can prevent detection by changing their code more frequently than antivirus engines are updated with new signatures, according to the Google study.

Threat Clarified

Although Cluley agreed with Google's research, he said it's important to clarify the threat. Some news headlines, he noted, have declared that Google's research revealed one in 10 Web sites are infected. But, he added, that's not accurate. The one-to-10 ratio is only true of the pages that Google already decided were worthy of further investigation, he clarified.

In its own research, Sophos discovers an average of 8,193 new malicious Web pages each day. What's most worrying, Cluley argued, is that 70 percent of these infected Web pages are on legitimate Web sites. In other words, the offending pages are often on sites that have been hacked or had malware planted on them without the owner of the Web page necessarily knowing.

"The Web is the new battleground between the good guys and the bad guys - if you have not already defended yourself then there is no time to lose," Cluley said. "Defense can come in the form of multilayered protection, such as desktop, e-mail, and Web gateways, but should be combined with security updates for your browsers and client firewalls."

Source : http://news.yahoo.com/s/nf/20070521/bs_nf/52404;_ylt=AvT1YODt6VBMRgslilBE5v8jtBAF

[Jan 29, 2007] The Top 100 Alternative Search Engines by Charles Knight

January 29, 2007 | 104 comments

Written by Charles S. Knight, SEO, and edited by Richard MacManus. The Top 100 is listed at the end of the analysis.

Ask anyone which search engine they use to find information on the Internet and they will almost certainly reply: "Google." Look a little further, and market research shows that people actually use four main search engines for 99.99% of their searches: Google, Yahoo!, MSN, and Ask.com (in that order). But in my travels as a Search Engine Optimizer (SEO), I have discovered that in that .01% lies a vast multitude of the most innovative and creative search engines you have never seen. So many, in fact, that I have had to limit my list of the very best ones to a mere 100.

But it's not just the sheer number of them that makes them worthy of attention; each one of these search engines has that standard "About Us" link at the bottom of the homepage. I call it the "why we're better than Google" page. And after reading dozens and dozens of these pages, I have come to the conclusion that, taken as a whole, they are right!

The Search Homepage

In order to address their claims systematically, it helps to group them into categories and then compare them to their Google counterparts. For example, let's look at the first thing that almost everyone sees when they go to search the Internet - the ubiquitous Google homepage. That famously sparse, clean sheet of paper with the colorful Google logo is the most popular Web page in the entire World Wide Web. For millions and millions of Internet users, that Spartan white page IS the Internet.

Google has successfully made their site the front door through which everyone passes in order to access the Internet. But staring at an almost blank sheet of paper has become, well, boring. Take Ms. Dewey for example. While some may object to her sultry demeanor, it's pretty hard to deny that interfacing with her is far more visually appealing than with an inert white screen.

A second example comes from Simply Google. Instead of squeezing through the keyhole in order to reach Google's 37 search options, Simply Google places all of those choices and many, many more all on the very first page; neatly arranged in columns.

Artificial Intelligence

A second arena is sometimes referred to as Natural Language Processing (NLP), or Artificial Intelligence (AI). It is the desire we all have of wanting to ask a search engine questions in everyday sentences, and receive a human-like answer (remember "Good Morning, HAL"?). Many of us remember Ask Jeeves, the famous butler, which was an early attempt in this direction - that unfortunately failed.

Google's approach, Google Answers, was to enlist a cadre of "experts." The concept was that you would pose a question to one of these experts, negotiate a price for an answer, and then pay up when it was found and delivered. It was such a failure, Google had to cancel the whole program. Enter ChaCha. With ChaCha, you can pose any question that you wish, click on the "Search With Guide" button, and a ChaCha Guide appears in a Chat box and dialogues with you until you find what you are looking for. There's no time limit, and no fee.

Clustering Engines

Perhaps Google's most glaring and egregious shortcoming is their insistence on displaying the outcome of a search in an impossibly long, one-dimensional list of results. We all intuitively know that the World Wide Web is just that, a three dimensional (or "3-D") web of interconnected Web pages. Several search engines, known as clustering engines, routinely present their search results on a two-dimensional map that one can navigate through in search of the best answer. Search engines like KartOO and Quintura are excellent examples.

Recommendation Search Engines

Another promising category is the recommendation search engines. While Google essentially helps you to find what you already know (you just can't find it), recommendation engines show you a whole world of things that you didn't even know existed. Check out What to Rent, Music Map, or the stunning Live Plasma display. When you input a favorite movie, book, or artist, they recommend to you a world of titles or similar artists that you may never have heard of, but would most likely enjoy.

Metasearch Engines

Next we come to the metasearch engines. When you perform a search on Google, the results that you get are all from, well, Google! But metasearch engines have been around for years. They allow you to search not only Google, but a variety of other search engines too - in one fell swoop. There are many search engines that can do this, Dogpile, for instance, searches all of the "big four" mentioned above (Google, Yahoo!, MSN, and Ask) simultaneously. You could also try Zuula or PlanetSearch - which plows through 16 search engines at a time for you. A very interesting site to watch is GoshMe. Instead of searching an incredible number of Web pages, like conventional search engines, GoshMe searches for search engines (or databases) that each tap into an incredible number of Web pages. As I perceive it, GoshMe is a meta-metasearch engine (still in Beta)!
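What a metasearch engine does with the answers it collects can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how Dogpile or any other engine named above actually works: the function name `merge_results`, the 1/rank scoring rule and the made-up result lists are all hypothetical, and no real search engine is queried.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Merge several ranked URL lists into one list, best first.

    Each URL earns 1/position in every list that contains it; the summed
    score decides the merged order (ties are broken alphabetically).
    """
    scores = defaultdict(float)
    for results in result_lists:
        seen = set()
        for position, url in enumerate(results, start=1):
            if url in seen:
                continue  # count a URL only once per engine
            seen.add(url)
            scores[url] += 1.0 / position
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up result lists standing in for the answers of three engines.
engine_a = ["x.example/1", "y.example/2", "z.example/3"]
engine_b = ["y.example/2", "x.example/1"]
engine_c = ["y.example/2", "w.example/4"]

merged = merge_results([engine_a, engine_b, engine_c])
print(merged)
```

A page that several engines agree on rises to the top of the merged list, while a page that only one engine returns near the bottom of its list sinks.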

Other Alt Search Engines

And so it goes, feature after feature after feature. TheFind is a better shopping experience than Google's Froogle, IMHO. Like is a true visual search engine, unlike Google's Images, which just matches your keywords into images that have been tagged with those same keywords. Coming soon is Mobot (see the Demo at www.mobot.com). Google Mobile does let you perform a search on your mobile phone, but check out the Slifter Mobile Demo when you get a chance!

Finally, almost prophetically, Google is silent. Silent! At least Speeglebot talks to you, and Nayio listens! But of course, why should Google worry about these upstarts (all 100 of them)? Aren't they just like flies buzzing around an elephant? Can't Google just ignore them, as their share of the search market continues to creep upwards towards 100%, or perhaps just buy them? Perhaps.

The Last Question

Isaac Asimov, the preeminent science fiction writer of our time, once said that his favorite story, by far, was The Last Question. The question, for those who have not read it, is "Can Entropy Be Reversed?" That is, can the ultimate running down of all things, the burning out of all stars (or their collapse) be stopped - or is it hopelessly inevitable?

The question for this age, I submit, is… "Can Google Be Defeated"? Or is Google's mission "to organize the world's information and make it universally accessible and useful" a fait accompli?

Perhaps the place to start is by reading (or re-reading) Asimov's "The Last Question." I won't give it away, but it does suggest The Answer….

Charles Knight is the Principal of Charles Knight SEO, a Search Engine Optimization company in Charlottesville, VA.

The Top 100

For an Excel spreadsheet of the entire Top 100 Alternative Search Engines, go to: http://charlesknightseo.com/list.aspx or email the author at Charles@CharlesKnightSEO.com.

This list is in alphabetical order. Feel free to share this list, but please retain Charles' name and email.

Update: Thanks Sanjeev Narang for providing a hyperlinked version of the list.

Update, 5 February 2007: Charles Knight has left a detailed comment (#94) in response to all the great feedback in the comments to this post. He also notes:

"...while it looks like a very simple, almost crude list of 100 names, it has taken countless hours to try and do it properly and fairly. The list will be updated all year long, and the Top 100 can only get better and better until the Best of 2007 are announced on 12/31/07."

Posted by: Yakov | January 29, 2007 6:01 AM

Interesting list. I guess this list is the top 100 AFTER Google, Ask, MSN and Yahoo.
I would have to list Vivisimo above some of the others you have listed here.
I too don't have the time to go through all of them but I'd like to know if any of your top 100 are vertically focussed. I've been following one vertical in particular, health, and I don't see any of the ones I found to be useful.

Slashdot News for nerds, stuff that matters

"The Thomson Gale publishing group has put together a comprehensive review of Google Scholar, and they find it highly lacking compared with similar offerings from Highwire Press, Scopus, and The Web of Science. Will Google's overhyped offerings drive these superior services out of the market?"

Harvest A Distributed Search System

Noted-L (1 of 3) [Noted] Google Adds Wildcards to Phrases

http://www.researchbuzz.com/news/2002/jan03jan0902.html#googleadds

--<cut>--
If you're a regular ResearchBuzz reader you already know how to
search for phrases in Google using "wildcard words" -- you just use
the word "the," which Google always considers a stopword. So, search
for "three the mice" in Google and you'll find three green mice,
three blue mice, three blind mice, etc.

Google has made using "the" unnecessary by adding a word-sized
asterisk to its search syntax. What is a word-sized asterisk? It's
an asterisk you can use in place of a word; "three * mice" will find
three green mice, three blind mice, etc. This asterisk CANNOT be
used for part of a word. If you try to search for "three bl* mice"
you'll get no results. Thanks to Gary Price for this tip.
--<cut>--

--
J C Lawrence
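The behaviour of the word-sized asterisk described in this post can be imitated with a regular expression. The sketch below only mimics the semantics quoted above on a local string; it says nothing about how Google implements the feature, and the helper names `phrase_to_regex` and `phrase_matches` are made up for the example.

```python
import re


def phrase_to_regex(phrase):
    """Turn a phrase with word-sized '*' wildcards into a compiled regex."""
    parts = []
    for token in phrase.split():
        if token == "*":
            parts.append(r"\w+")  # a lone asterisk stands for exactly one word
        else:
            # Anything else is literal, so "bl*" does NOT act as a prefix match.
            parts.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def phrase_matches(phrase, text):
    """Return True if the wildcard phrase occurs somewhere in text."""
    return phrase_to_regex(phrase).search(text) is not None


print(phrase_matches("three * mice", "Three blind mice, see how they run"))
print(phrase_matches("three bl* mice", "Three blind mice, see how they run"))
```

The first call finds the phrase because the asterisk replaces a whole word; the second finds nothing because an asterisk glued to part of a word is not treated as a wildcard.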

HOWTO search the WEB

Isearch
Fortunato - May 20th 1998, 07:05 EST

Homepage: http://www.etymon.com/Isearch/

Isearch is software for indexing and searching text documents. It supports full text and field based search, relevance ranked results, Boolean queries, and heterogeneous databases. Isearch can parse many kinds of documents "out of the box," including HTML, mail folders, list digests, SGML-style tagged data, and USMARC. It can be extended to support other formats by creating descendant classes in C++ that define the document structure. It is pretty easy to customize in this way, provided that you know some C++ (and you will need to ftp the source code). A CGI interface is also included for web based searching.
ftp://ftp.redhat.com/pub/contrib/hurricane/i386/Isearch-1.41-1.i386.rpm

[July 8, 1999] Story Search Stinks! But You Don't Have to Take it Anymore

Anyone who's ever used a search engine knows about broken links. Lousy interfaces. 10,000 returns with no meaningful results. Missing pages.

Essentially, a search engine is a type of software that creates indexes of databases or Web sites based on content. When you submit a search term, it goes out and "reads" its indexes and returns applicable results. Excite, HotBot and Lycos are popular examples.
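The "index, then read the index" idea can be shown with a toy inverted index. This is a deliberately small sketch, not the design of Excite, HotBot or Lycos: the function names `build_index` and `search`, the tokenizing rule and the three sample documents are assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical mini "Web" of three documents.
documents = {
    "doc1": "Isearch indexes text documents",
    "doc2": "Harvest is a distributed search system",
    "doc3": "Search engines index Web documents",
}
index = build_index(documents)
print(sorted(search(index, "documents")))
print(sorted(search(index, "search documents")))
```

The index is built once, and each query then only looks up a few word entries instead of rereading every document, which is why a stale index produces the broken links and missing pages complained about above.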

A recent report by Nielsen/NetRatings Inc. shows search engine popularity is slipping. And a study conducted at the NEC Research Institute confirms search engines can't keep up with the Web's rapid growth. NECRI discovered:

METASEARCHES
Metasearch sites are a timesaver because they query multiple engines simultaneously. Examples include:

DIRECTORIES
These hubs weed through the information glut with topic guides featuring recommended sites. They'll also recommend sites based on your search terms. Theoretically, Yahoo is a directory. But I prefer sites that put an emphasis on quality rather than quantity. Examples:

TOOLS
The right utility does wonders for streamlining searching. Here are three of my favorites, and you'll find more at the ZDNet Software Library.

Alexa got my Natural Born Killer's nod. This free browser add-on acts as a helpful backseat driver when you surf, offering stats on sites, including ratings by Alexa's thousands of users.

BullsEye lets you search and store info. The searches are highly customizable. For instance, it will search for industry-specific business news.

Copernic has an easy learning curve for the novice (or frustrated searcher). It searches multiple sites (as narrowly or widely as you'd like), validates links and features good custom sorting and multi-threading options.

kuro5hin.org Comments Google and Recursion

The "Google Boxes" you mention won't allow you to increase your site's ranking, unless it's already up there in the top ten. Very different from Google bombing, which can bring an unknown site to the top.

As you pointed out and can be read about originally here, each site A which links to a site B increases B's ranking by a small amount, but doesn't affect A directly.

A's own ranking can only be affected positively if there are loops of the form A->B->some other sites->A, and the smaller the loop, the higher the effect on A. That's because all sites within the loop are affected, with decreasing benefits. The most affected site will be B, followed by B's successor, followed by B's successor's successor, etc, up to A, who gets a very small boost if the loop is large.

The total increase in A's ranking is a result of adding the small increases for all possible loops.
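The loop argument can be checked on a toy graph. The sketch below uses the textbook PageRank iteration as a stand-in, not Google's real formula; the damping factor of 0.85, the page names and both link graphs are assumptions made for the example. In the first graph A links to a popular page but nothing links back; in the second there is a loop A->top->hub->A.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank iteration; every page here has an outgoing link."""
    pages = sorted(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page passes an equal share of its rank to its targets.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# "a" links to "top", but no page links back to "a".
no_loop = {"a": ["top"], "top": ["hub"], "hub": ["top"]}
# Same graph plus one link hub -> a, closing the loop a -> top -> hub -> a.
with_loop = {"a": ["top"], "top": ["hub"], "hub": ["top", "a"]}

rank_no_loop = pagerank(no_loop)
rank_with_loop = pagerank(with_loop)
print(f"a without a loop: {rank_no_loop['a']:.3f}")
print(f"a inside a loop:  {rank_with_loop['a']:.3f}")
```

Without the loop, A stays at the baseline every page receives no matter how many popular pages it links to; only the link that closes the loop lifts A's own rank.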

Now suppose you're an unknown site. Nobody links to you, but you decide to link to the top ten. Since these don't link back to you, or in a very very roundabout way, you'll get zero benefit from your Google Box.

Now suppose you're a top ten site, and half the top ten sites link back to you directly. If you link to each of them, you'll get a relatively large boost to your own ranking back from each of them, hence a noticeable increase. But if you had already linked to all of them before, you won't get any benefit, since I believe Google counts multiple links to the same page as one.

You'll note that the bit which allows you to increase your site's ranking requires lots of links, just like in Google bombing. If you've read so far, I think you'll agree that Google bombing is the better technique (which, for the record, I don't condone).

I think it's fairly obvious that Google are tampering with the basic PageRank framework, though we (I?) don't know for sure. For example they allow you to search by language, or with more or less pr0n. The fact that they can identify/classify documents this way means they have a framework in place.

So it should be relatively easy for them to include document weights if they wanted to which take the classifications into account, as rusty and jsled already proposed. Simple document weighting is really easy to do, but the trick of course is to end up with useful weights.

Having said all this, your particular example can be explained without resorting to advanced PageRank modifications.

Other people have reported similar phenomena (too lazy to find the links), with the following explanation: sometimes, people publish web server access logs (maybe inadvertently) which the search engine crawlers find. On those logs, there's a whole lot of information, including the IP of the client's machine, and perhaps the referral address (ie the last web address visited before the server was queried). Google may use these addresses as if they were a direct link to your site. This might have occurred if you have ever fired up your browser, browsed your own site, and then browsed some other site in the same session.
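To see what a crawler could pick up from a published access log, here is a rough sketch that pulls referrer addresses out of lines in the common "combined" log format. Whether Google really treats such entries as links is the commenter's guess, not something this code shows; the regular expression, the function name `referrer_links` and the two sample lines are assumptions made for illustration.

```python
import re

# Combined log format: client, identity, user, [time], "request", status,
# size, "referrer", "user agent".
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrer_links(log_lines):
    """Yield (referrer, requested_path) pairs found in access-log lines."""
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if not match:
            continue  # not a combined-format line
        referrer = match.group("referrer")
        parts = match.group("request").split()
        if referrer in ("", "-") or len(parts) < 2:
            continue  # no referrer recorded, or a malformed request
        yield referrer, parts[1]


# Made-up log lines; the addresses are documentation examples.
sample_log = [
    '192.0.2.1 - - [10/Oct/2002:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "http://other.example/page.html" "Mozilla/4.08"',
    '192.0.2.2 - - [10/Oct/2002:13:56:01 -0700] "GET /about.html HTTP/1.0" '
    '200 512 "-" "Mozilla/4.08"',
]
pairs = list(referrer_links(sample_log))
print(pairs)
```

Each pair reads like a link from the referring page to the requested page, which is exactly the kind of accidental "link" the explanation above relies on.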

All Eyes on Google

Google was launched less than four years ago by two graduate students in computer science: one, a Russian émigré named Sergey Brin, now 29; the other, a Michigan-reared engineer named Larry E. Page, now all of 30. As a gateway to 3 billion Web pages, Google is a strangely unadorned site: 37 words, four tabs and a blank space where you type in a query of up to 10 words. Google's over 10,000 networked Google computers crawl through an index to those 3 billion pages, rank them with an equation that includes 500 million variables and spit out up to a few thousand listings. The ranking takes 500 milliseconds; the computers can handle a peak rate equal to 7 million queries per hour.

But Google has become much more than merely a search service. It is a daily tool and main entry point for millions of users, stealing the spotlight from the browser (Explorer or whatever) and Internet portals like Yahoo. It is a labor of love for programmers, who have built applications off of Google and posted them like trophies on the Web. One does a "smackdown," comparing the Internet ubiquity of two words ("love" beats "money," but not by much); another creates poems (see boxes).

For Wall Street and Silicon Valley, Google is the great bright hope for an initial public offering that might revive moribund tech stocks. And Google has become its own meme, the stuff of New Yorker cartoons and a brand, like Kleenex and Band-Aid, that is in danger of becoming a part of the English language. You don't search for something on the Web anymore. You Google it.

Google now can be queried in 36 languages, with more to come. At the posh Hotel Bel Air, in Los Angeles, manager Lisa Hagen makes a point of Googling all guests before arrival, searching out better ways to spoil them. "If we find out they like to jog early in the day, we make sure they get a room with morning sun," she says. In Boston, Mark Kini manages a small limousine service that spends 80% of its ad budget on Google and other search sites. Says he: "It's how we survive the recession." In Westport, Conn. consultant Elena Amboyan's kids use Google daily; even when they research something at the library, they say they're Googling it.

It is all much more than Brin and Page ever had in mind when they started. "Sure, I'm surprised by the success," says Brin, unassuming, rumpled and wiry, his sneakers scuffing the upholstery of a conference-room chair. Users love Google, he says, because they find things there when they are desperate to know an answer. Keep offering better results and you hold their loyalty forever--and sell them stuff. Page adds that Google has become "like a person to them, helping them and giving them intelligence any hour of the day."

The passion and success igniting Google, and its emergence as a new interface for the Internet, have made it a rich, fat target for rivals. Yahoo is taking aim. So is the biggest search outfit, Overture, a little-known billion-dollar vendor that provides unbranded search services for other Web sites and has sued Google, alleging patent infringement. A gaggle of some 200 Web sites in China is reportedly going after Google, too.

And now Google faces the most lethal threat of all: Microsoft, aroused, is taking aim at the popular site. This bears an eerie resemblance to the rise--and calamitous fall--of Netscape, the first commercially successful Web browser.

Will Google be the next victim of a Windows that swallows everything? To help ensure a future, Brin and Page brought in a grown-up as chief executive, Valley veteran Eric Schmidt, 48. Fittingly, Schmidt had abundant experience struggling against Microsoft in his two previous jobs: He was chief technology officer at Sun Microsystems, then chief executive of Novell, two companies that thought, wrongly, they had Microsoft licked. Google's founders credit Schmidt with successfully managing their company's most intense period of growth.

To survive and succeed will require lots of talent, lots of acquisitions and lots more money. More important, Google will need to quell the hubris that is much in abundance at the jubilant company these days. To be at Google is to bask in your own public relations. The hallways of the company's four buildings in Mountain View, Calif. are decorated with articles from around the world praising the company. One current job posting includes duties as Google's company historian. Over 70 of the 800 employees have Ph.D.s. Google's head of engineering admits his big-brained staff is in awe of itself; he hopes the simplicity of the Google page masks that from the outside world.

In some ways Google feels like the giddy dot-coms of the stock-market bubble, circa 1999. Informal to a fault, Google offices are littered with party-colored lava lamps, bins of free Coke and candy and giant plastic balls that invoke Google's multicolored logo. The cafeteria serves free lunch to the workaholic ranks (and dinners, too; there's lots of code to write). When pizza gets delivered at one o'clock in the morning, plenty of people are on hand to devour it. Every day a thousand more résumés arrive from people hoping to join this work party.

But the dot-com parallels end when you look at the finances. The dot-bombs burned through tons of other people's money. Google makes a pile of cash on its own. After it went live in September 1999--six months before the Internet bubble finally popped--Google took in perhaps $25 million in 2000. Then it leaped fourfold to approach $100 million in 2001 and tripled to $300 million last year. Its gross could more than double this year to $700 million, estimates Safa Rashtchy of U.S. Bancorp Piper Jaffray.

Google, privately held--and determinedly so, for now--won't talk numbers, but it does brag that it just logged its ninth consecutive profitable quarter. Its revenue flows include ads (the bulk); search services for Yahoo, America Online and other sites (perhaps $100 million there); and custom-tailored, bright yellow servers for corporate accounts.

"Cheesy as it may sound," says cofounder Brin about the company's early days, "we never thought in terms of revenue streams." Now he must, for the next year or two could determine whether Google delivers on the high hopes it inspires in so many quarters or instead falters, glorying in its early success while others plot its doom.

Google traces back to 1995, when Sergey Brin and Larry Page, whose fathers taught college math, met at Stanford. The sons saw search as an interesting problem in organizing very large datasets.

At the time, users typed in a few words and got a list of thousands of Web sites using those words, but most of the results were irrelevant. Brin and Page quelled users' frustrations by adding order to this randomness. They judged a listed site's prominence by how many other Web sites valued it enough to have links to it. They gave sites a resulting "PageRank" (for Larry, not Web pages). This cliquey if democratic approach was later augmented by other algorithms that weight sites by other variables--news sites get a higher ranking than a 16-year-old's personal Web log.
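The link-counting idea can be illustrated with a small Python sketch of the textbook PageRank iteration. This is a simplified version under stated assumptions, not Google's production ranking: the damping factor of 0.85 is the value suggested in Brin and Page's published paper, while the three-page graph, the function name and the fixed iteration count are invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c"; "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the page with two inbound links comes out on top and the page nobody links to comes out last, which is the "cliquey if democratic" behavior the article describes.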

The two grad students soon found their results were a step above any other kind of search. They had dubbed this system Back Rub, after the "back links" that pointed to a site. They adopted the name Google in early 1997, in a tribute to scale, a play on the number known as a googol--a one followed by a hundred zeros. The universe does not contain a googol atoms. The denizens of the company headquarters breezily refer to it as the Googleplex, a play on googolplex, the word for the unimaginably large number defined as a one followed by a googol of zeros.

Brin and Page introduced Google to the world in a paper they presented at the World Wide Web Conference in April 1998. Naively, they were downright hostile to advertising, calling it "insidious … because it is not clear who 'deserves' to be there, and who is willing to pay money to be listed." A few hundred million in revenues later, Brin has changed his mind. On a Google results page, he says, "There are eight spots for ads and ten search results. It's a lot of room for diversity."

Soon after, the pair began trying to sell their technology to Web sites, including Infoseek, Excite and Yahoo. They found no takers; one chief executive told them that if his site could search only 80% as well as everyone else's, that was okay by him. "That company is now out of business," Page says. Then their faculty adviser invited them to a breakfast with Sun Microsystems cofounder Andreas Bechtolsheim on the Stanford campus. Midway through the demo, Bechtolsheim stopped them and wrote a check for $100,000 to Google Inc.

This presented a problem, as Google didn't yet have a bank account. There wasn't even a "Google Inc."--they hadn't yet decided to form a company. The check sat in a drawer for several weeks, and then they got serious.

By June 1999 Google had raised almost $30 million from venture firms Sequoia Capital and Kleiner Perkins Caufield & Byers, plus Stanford and individual investors. Three months later the Google site officially blasted off. It could scan 30 million Web pages. Today it culls 100 times as many, and still taps only half the Internet; the rest lies behind corporate firewalls or in isolated islands unlinked to anything else.

As Google began to thrive, the Web world was crashing, and this, too, proved lucky for the pair of founders. As dot-coms collapsed, Google took over cheap office space, barely used Aeron chairs, dozens of servers and platoons of out-of-work programmers. By mid-2001 Google was profitable, employed several hundred people and was seeing traffic grow 20% every month. Thriving despite the surrounding downturn, Google went shopping for a seasoned chief executive. "My job was to impose a little order," Schmidt says now. "I made it clear that I wasn't coming in to get rid of the founders." Sergey Brin gave up his chairman mantle and assumed the title of president of technology; Page, who had been chief executive, is product president.

While the two techies concentrated on improving their search formulas, Schmidt focused more on building a better business model. Google had run ads with its search results for a while, but on a fixed-fee basis. Its main rival, Overture, publicly held and with $668 million in sales last year (it projects $1 billion in revenues this year), had already gone a step further. It exacted higher fees from advertisers by selling them rights to given keywords so their ads pop up first when those words are entered in a query. Sponsors paid on a cost-per-click basis instead of the usual cost-per-thousand-visitors.

At one point in 2001, Google officials even met with Overture to compare notes, Overture officials say. In December 2001 Google started a similar test on its Usenet section, unveiling a service called Adwords. The response was so enthusiastic that, by February 2002, Adwords had been extended to all Google listings. It grew to 100,000 bidders in ten months, and thousands more advertisers are still signing up. Total Web advertising fell about 5% last year, to $6.5 billion, while search ads almost tripled to $1.4 billion and could hit $7 billion in five years, says Piper Jaffray's Rashtchy. (Google itself advertises very little, instead relying on word of mouth.)

"Some companies have purchased thousands of keywords, and they use them to test multiple products against multiple words," says Sheryl Sandberg, director of the wildly successful Adwords program. Noting that almost all Adwords bidders are U.S.-based, while half of Google searches are by users overseas, Sandberg sees huge growth in foreign markets. "The monetization should follow. This is a global bid," she says. Ads are sold in 11 languages.

Schmidt calls the success of Adwords "a total accident--when we went off fixed pricing, my only directive was 'Just don't let revenues drop.'" His foes at Overture allege instead patent infringement, suing Google in April of last year; one month later America Online dropped Overture in favor of Google Adwords. The case is likely to drag on for a long time.

In Adwords, businesses use an auction system on the Google site to bid for the most popularly searched words and phrases. Google gets paid every time someone clicks on the ad itself. Bids start at 5 cents per click but can go to $15 or more for high-end products like helicopter parts. Critically, Google demotes a sponsor to a lower rung on its page if its response rate is too low, elevating a rival's ad for getting more clicks. This imposes a built-in pressure on businesses. They're even asked to revamp wording if less than 0.5% of viewers click on their ads. By contrast, many traditional banner ads get click rates of just 0.3%.
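The interplay of bid and response rate can be sketched in a few lines of Python. The article does not give Google's formula, so the score used here (bid multiplied by click-through rate) is an assumption chosen only to show how a poorly clicked ad can lose its position to a cheaper rival; the Ad class, the sponsor names and the numbers are likewise invented. The 0.5% threshold is the one quoted above.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    sponsor: str
    bid: float        # dollars offered per click
    impressions: int  # times the ad was shown
    clicks: int       # times it was clicked

    @property
    def ctr(self):
        """Click-through rate as a fraction; 0.0 if the ad was never shown."""
        return self.clicks / self.impressions if self.impressions else 0.0


MIN_CTR = 0.005  # the 0.5% threshold mentioned in the text


def rank_ads(ads):
    # Assumption: order ads by bid * click-through rate, highest first.
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)


def needs_rewording(ad):
    """True if the ad falls below the 0.5% response rate."""
    return ad.ctr < MIN_CTR


ads = [
    Ad("high_bidder", bid=2.00, impressions=1000, clicks=3),
    Ad("low_bidder", bid=0.50, impressions=1000, clicks=40),
]
for ad in rank_ads(ads):
    print(ad.sponsor, ad.ctr, needs_rewording(ad))
```

Under this assumed score the 50-cent ad outranks the 2-dollar ad because it is clicked far more often, and the 2-dollar ad is flagged for rewording.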

This could transform the $193 billion business of direct marketing. Junk mailers constantly work on narrowing the recipient list to the people most likely to respond and on jazzing up the envelopes to trick them into looking inside. Google ends the guesswork. People directly declare what interests them, and Google feeds them an appropriate ad. The ad's few pitch words are critical. For big corporate accounts like Dow Chemical, Google account executives continually recraft the message, like a haiku of commerce, aiming to maximize the click-through.

Google's long-term dream is to index all of the world's public information and make it searchable--everything from driver records to radio shows and films--and reap profits from it. This is scarier than it sounds. Google holds an archive of 800 million postings to Internet newsgroups, from alt.sex.bondage to alt.humanities.classics, most of which it bought on the cheap just before Dejanews.com went out of business in 2000.

It is a strange bazaar of information and a repository of embarrassment for people who were forthright (or shortsighted) enough to forgo anonymity in their postings. Google easily unearths the Web's first mention of Microsoft; and Sergey Brin's 1992 complaint about selling his car; and the musings of a married midwestern academic who posted a plea on alt.sex.fetish.tickling. Ours for the ages, unless he follows Google's somewhat obscure directions--located in the "Groups Help" section--on removing work from the archive. Even posts like that one trigger a precision-targeted ad: One offers "Discount 14-K Gold Anklets." Like much of the Web, Google also makes good money on porn.

While Google wants to own the world, Microsoft is going after Google. It now has 70 engineers working on search technology, and by some accounts it could triple that staff. Its new best friend is Overture, which already provides search services for Microsoft's MSN online service. Overture scientists frequently visit Microsoft in Redmond to plan next-generation features. Microsoft also could acquire a search company this year; one likely candidate would be San Francisco-based Looksmart. Neither Overture, with a market capitalization of $669 million, nor Looksmart, at $328 million, would be more than a bagatelle for Microsoft, which has $38 billion in cash.

The Google guys profess to be unfazed. They have assiduously avoided the sins of Netscape, which belligerently jeered at Microsoft's efforts to build a Web browser. "Netscape mooned the giant," says one Google exec, noting Google welcomes Microsoft ads on its site. Plenty of other threats abound. Yahoo, despite investing in Google and paying for its service, in December paid $235 million in cash to acquire faded search firm Inktomi. Overture recently spent $177 million for the Web-search assets of Fast Search & Transfer and AltaVista, while Ask Jeeves (2002 revenues $74 million, net loss of $15 million) put up $3.8 million for Teoma. Even Google's engineers admit Fast and Teoma deliver results comparable to theirs.

Google has bought some prizes of its own, including personalization technology that "learns" what you are interested in based on previous searches; and a company called Blogger, which helps people set up their own Web-based diaries, or Web logs. More "blogs" mean more content, yielding more pages on which to run ads and more links to other pages. The more links, the better Google's results. Most recently Google scored a company called Applied Semantics, whose content-scanning techniques can be used to tailor ads not just based on the words a user searches, but also on the actual pages he reads on the Internet. That buy was a double score for Google--Applied Semantics had been selling those services to Overture. In the week following the purchase, Overture's shares fell about 30%.

The need to acquire more tech could add to the pressure for Google to go public, so it could use its stock as currency. Both Brin and Page are daunted by the prospect of baring Google's secret financials and losing focus in the drive to boost profits every quarter. "I fear we'll grow shortsighted and lose the wider potential applications of our company," says Brin. "The biggest thing we'd lose is the opportunity cost of what we could do if we didn't go public."

But Google's growing ranks want it, Wall Street bankers yearn for it and clues hint that all of them will get it. In the overcrowded office of Sheryl Sandberg, the 33-year-old Adwords chief, sits a crimson lava lamp given to her by investment bankers at Morgan Stanley. Very hip, very Google-geist. The former U.S. Treasury official says with a laugh, "They have high hopes for us."

Downstairs, past a Google grand piano and a few big plastic balls, Chief Executive Schmidt convenes a meeting of two dozen managers for a project they refer to as "Keeping Eric Out of Jail." They are altering Google's billing and accounting systems to comply with the new Sarbanes-Oxley Act--a law that applies to all public companies but no private ones. It may take until October to comply, but Schmidt's urgency is palpable.

Every Friday he holds a companywide meeting, preaching to a cocky flock. Along with Brin and Page he talks business, technology--and attitude. He reminds these whiz kids to count on nothing. Remember the Netscapes, he exhorts, the high-tech stars that gained fans, made paper millionaires of the early staff and then burned up in the heat of competition. Just about everybody, save Google's massing rivals, hopes they're listening.

Google "Reveals Index Secrets": Charts Indexing of Your Site Over Time

Jul 25, 2012 at 1:43pm ET by Vanessa Fox

Yesterday, Google Webmaster Tools launched Index Status (available under Health) that charts the number of indexed pages for your site over the last year. Total Indexed Count: Google says that this count is accurate (unlike the site: search operator) and is post-canonicalization. In other words, if your site includes a lot of duplicate URLs (due [...]

Firefox 14 Now Encrypts Google Searches, But Search Terms Still Will "Leak" Out

Jul 17, 2012 at 3:43pm ET by Danny Sullivan

Firefox 14 has officially launched today, which means all Google searches are encrypted by default. However, due to a Google loophole, the encryption will not prevent things you search for from "leaking" out to Google's advertisers nor potentially showing up as search suggestions or in data reported to web sites through Google Webmaster Central. The Firefox [...]

Recommended Links

Seven Sisters

Papers

Sites

Site Search Engines

The following search engines consist of just one or two Perl scripts, and are suitable for small to medium sites.

The search engines below are written in C with Web interfaces in Perl. Installation is more involved than for the engines above, but indexing and searching are generally faster. These engines are typically server-oriented. That is, they are designed for installation by Web server administrators, with individual users able to configure their own index files.

For high end commercial search engines capable of handling very large sites, see the reviews by the US Department of Education and Network Computing Magazine.

appindex 0.04

Appindex is a Perl script designed to retrieve information from freshmeat's application index. It searches for a program (Perl regexp accepted), retrieves available info on that program, then optionally launches a browser to view the homepage.

Appindex will now read the config and application index file from the user's homedir if it is available. Category searching is also implemented, along with a '-u' option that will update the local copy of the application index.

whee @ 12/24/98 - 04:41 EST

*** Inference Find! -- Server 46 -- not bad

* Infoseek ksh tutorial

Astalavista -- daily updated search engine monitoring hundreds of sites with hack & crack stuff.

Mister Driver -- search engine for device drivers.

News Hunt -- links to free newspaper archives and a searchable database of newspapers.

***** The Spider's Apprentice -- a public service site that offers help on searching the Web. They also analyze and rate the major search engines.

Verity Internet Virtual Library Search -- searchable index of documents of interest to those using and developing the world-wide web and its related technologies


*** Sprockets and Cogs - Truly Targeted Technical Search Tool -- not impressive, but useful


Search the Web version 1.0
Perl script that performs a search on Altavista, Excite, Infoseek, Lycos, Yahoo, and Webcrawler.

Language: Perl Platform: Unix, Windows

Complete source code: 0.010M bytes. Files:
index.html, 3152 bytes
top-search.htm, 727 bytes
search-logo.gif, 3742 bytes
search.gif, 1348 bytes
rlaj-search.gif, 2905 bytes
main-search.htm, 1250 bytes
index_bg.gif, 1280 bytes
web-search.cgi, 3308 bytes



The Last but not Least: Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt, Ph.D.


Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright of original materials belongs to the respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and helping to speed up access. In case softpanorama.org is down you can use softpanorama.info

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

The site uses AdSense so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: September 12, 2017