Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Pitfalls of Google as a Unix Information Search Engine


Google tried to be a universal search engine and overextended its functionality, becoming mediocre in a key area: Unix/Linux information search. It does not do even elementary things, such as providing the registration date of a particular domain. Now many users, out of privacy concerns, are switching to alternative search engines such as duckduckgo.com. See Google Search - Wikipedia.

Google bombing flourishes in the Unix search space

In theory, one can defeat the practice of creating "fake" or "pointer" sites that exist only to drive up the ranking of another site by assigning each site an additional variable (say, relevance). But as long as Google pays money for displaying advertisements on such pages, the practice (known as Google bombing) can adapt, although it now requires a good deal more coordination among a group of people (for example, a "top linkers" conspiracy) or a single person with lots of free time and a drive for Google income (which in some third-world countries is enough for a pretty decent living).

There has been some debate about the degree to which blogs can affect Google's rankings by enabling large numbers of people to link simultaneously to the latest and greatest trends, fads, memes, or news bites on the Internet. As soon as Google made available the Google API, a SOAP interface that allows developers to query Google and retrieve results without using the normal HTML form interface, a new, more efficient form of Google bombing became possible: in effect, bombing with feedback.

Google as a malware propagation engine

A more sinister variant of Google bombing is its use by criminal companies to get high rankings for specific searches and then use such websites to infect computers with spyware or other types of malware.

For general information please visit HOWTO search the WEB. For information on other search engines, see Search Engine Watch.

Note: search is not the only overhyped Google offering:

"The Thomson Gale publishing group has put together a comprehensive review of Google Scholar, and they find it highly lacking compared with similar offerings from Highwire Press, Scopus, and The Web of Science. Will Google's overhyped offerings drive these superior services out of the market?"



Old News ;-)

[Nov 12, 2013] Anonymous employees reveal the worst thing about working for Google

The Independent

The perks of being a Google employee are legendary. If you manage to fight your way through the notoriously obtuse interview questions then you're rewarded with corporate nirvana: a sprawling Promised Land of deconstructed office space and free amenities.

Despite this, Google has the fourth-highest employee turnover of any Fortune 500 company (Amazon is number two), so why can't it hang on to its employees?*

In a thread on Quora we can find at least some answers. The question asks "What's the worst part about working at Google?" and self-proclaimed Googlers ('Xooglers' as they're known - though this lot are mostly anonymous) have populated the thread to carp about life at the search giant.

We've hand-picked some of the choicest extracts below, or you can check out the thread in full here.

*We're being a little unfair here: high turnover isn't necessarily a bad thing and is at least partly a reflection of a fast-moving industry full of very, very employable people.

Google moved away from its start-up roots a long time ago

"Google was not a start-up environment by the time I left. The same office politics. It was easy to get promoted if you worked on the right projects and projected your work in the right way."

And because it's so well-tuned, there are quite a few boring jobs

"Google is an incredible machine that prints money thanks to AdWords. Unless you are an amazingly talented engineer who gets to create something new, chances are you're simply a guy/girl with an oil can greasing the cogs of that machine."

Basically, everyone's over-qualified because everyone's amazing

"I worked at one of the larger non-MV [Mountain View, Google's HQ] campuses, and the only intellectual stimulation I encountered in my time there was the interview process."

"The other thing is that it's very hard to have *huge* impact at Google. Most of the large exciting problems were already solved, so you probably will end up working in the smallest meaningless tiny feature nobody cares about."

And that means people can be arrogant too

"Unfortunately, in spite of the common belief, I think the average level of Google engineers is mediocre. With a lot of arrogance, too. Everybody believes he (males dominate) is better than his neighbor."

The dominating engineering culture can hurt productivity

"There is not enough focus on product and visual design. This has led to many aborted/semi-successful products, like Wave, Google Video, Buzz, Dodgeball, Orkut, Knol, and Friend Connect. There is probably too much focus on pure engineering"

"If you look at Google products, you see tons of clutter, useless features, lack of simplicity/elegance, and unwarranted focus on technical complexity."

And this can also make for bad managers

"I'd say the relentless daily mediocre thinking of middle management types who are completely focused on metrics to the exclusion of all other factors. They don't want to rock the boat, they don't know how to inspire their workforce, and they rely far too much on the Google name and reputation to do that for them."

With all those amenities, there's actually less private space

"It's not uncommon to see 3-4 employees in a single cube, or several managers sharing an office. With all the open areas for food, games, TV, tech talks, etc, it can be surprisingly hard to find a quiet, private place to think."

And discipline can also be a problem

"There was no discipline in the offices. People chatted about random things on the emailing lists, often insulting each other. I once emailed a very big team asking a genuine question (as an external customer of their product). The response was sarcastic. If you try to do that at a company like Amazon, you will be immediately reprimanded (or so I think)."

Basically it seems that what makes Google so appealing can also be a problem for some

"I've always said that Google is hands-down the best corporate in the world. You get to work with incredible products, inspiring people, enjoy amazing perks, have unforgettable experiences, and get paid very well. It's all so incredibly easy. And that's the best part and the worst part about working at Google."

Inside Google's Anti-Malware Operation

Slashdot

A Google malware researcher gave a rare peek inside the company's massive anti-malware and anti-phishing efforts at the SecTor conference here, and the data the company has gathered shows that the attackers who make it their business to infect sites and exploit users are adapting their tactics very quickly and creatively to combat the efforts of Google and others. While Google is still a relative newcomer to the public security scene, the company has deployed a number of services and technologies recently that are designed to identify phishing sites, as well as sites serving malware, and prevent users from finding them. The tools include the Google SafeBrowsing API and a handful of services that are available to help site owners and network administrators find and eliminate malware and the attendant bugs from their sites. Fabrice Jaubert, of Google's anti-malware team, said the company has had good luck identifying and weeding out malicious sites of late. Still, as much as 1.5 percent of all search result pages on Google include links to at least one malware-distribution site, he said.

Google Malware Runs Rampant on the Web

Circuit Diagram Wiring

The research, titled "The Ghost in the Browser: Analysis of Web-Based Malware," reported that an adversary who can successfully compromise a victim's browser can gain access to banking and medical records, authorization passwords, and personal communication records.

Google said that in its analysis of several billion URLs and an in-depth look at 4.5 million Web sites over a 12-month period, it discovered 450,000 sites were successfully launching drive-by-downloads of malware code.

Graham Cluley, a senior analyst with security firm Sophos, said researchers at his firm agree with Google's findings. "Everybody needs to learn to protect themselves better from these kind of attacks," he said. "More and more businesses are recognizing the need to scan their Web gateway just as they do their e-mail gateway to keep abreast of emerging threats."

Sitting Ducks?

Google also concluded that average computer users have no way to protect against these threats. "Their browser can be compromised just by visiting a page and become the vehicle for installing multitudes of malware on their systems," the nine-page report announced.

Google discovered that some of the most common malware sites were those that contained advertising. Sites that offer up user-generated content, such as blogs and forums, and those that offer third-party widgets, such as free traffic counters, are also commonly used by attackers looking to install code that makes victims of visitors.

As many antivirus engines rely on creating signatures from malware samples, adversaries can prevent detection by changing their code more frequently than antivirus engines are updated with new signatures, according to the Google study.
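To see why frequent code changes defeat signature matching, here is a minimal sketch that assumes the simplest possible "signature": a SHA-256 hash of the whole sample. Real antivirus engines use richer signatures and heuristics; the payload bytes and the `known_bad` set below are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Hex digest used here as a naive whole-file "signature"."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical signature database built from one captured sample.
known_bad = {sha256(b"evil payload v1")}


def is_flagged(sample: bytes) -> bool:
    """True only if the sample is byte-for-byte identical to a known sample."""
    return sha256(sample) in known_bad


print(is_flagged(b"evil payload v1"))  # → True
print(is_flagged(b"evil payload v2"))  # → False: one changed byte evades the check
```

Any change to the payload produces an unrelated hash, so the attacker only has to re-release faster than the signature database is updated.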

Threat Clarified

Although Cluley agreed with Google's research, he said it's important to clarify the threat. Some news headlines, he noted, have declared that Google's research revealed one in 10 Web sites are infected. But, he added, that's not accurate. The one-in-10 ratio is true only of the pages that Google had already decided were worthy of further investigation, he clarified.

In its own research, Sophos discovers an average of 8,193 new malicious Web pages each day. What's most worrying, Cluley argued, is that 70 percent of these infected Web pages are on legitimate Web sites. In other words, the offending pages are often on sites that have been hacked or had malware planted on them without the owner of the Web page necessarily knowing.

"The Web is the new battleground between the good guys and the bad guys - if you have not already defended yourself then there is no time to lose," Cluley said. "Defense can come in the form of multilayered protection, such as desktop, e-mail, and Web gateways, but should be combined with security updates for your browsers and client firewalls."

Source: http://news.yahoo.com/s/nf/20070521/bs_nf/52404;_ylt=AvT1YODt6VBMRgslilBE5v8jtBAF

kuro5hin.org Comments Google and Recursion

The "Google Boxes" you mention won't allow you to increase your site's ranking, unless it's already up there in the top ten. Very different from Google bombing, which can bring an unknown site to the top.

As you pointed out and can be read about originally here, each site A which links to a site B increases B's ranking by a small amount, but doesn't affect A directly.

A's own ranking can only be affected positively if there are loops of the form A->B->some other sites->A, and the smaller the loop, the higher the effect on A. That's because all sites within the loop are affected, with decreasing benefits. The most affected site will be B, followed by B's successor, followed by B's successor's successor, etc, up to A, who gets a very small boost if the loop is large.

The total increase in A's ranking is a result of adding the small increases for all possible loops.

Now suppose you're an unknown site. Nobody links to you, but you decide to link to the top ten. Since these don't link back to you, or in a very very roundabout way, you'll get zero benefit from your Google Box.

Now suppose you're a top ten site, and half the top ten sites link back to you directly. If you link to each of them, you'll get a relatively large boost to your own ranking back from each of them, hence a noticeable increase. But if you had already linked to all of them before, you won't get any benefit, since I believe Google counts multiple links to the same page as one.
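The loop argument can be checked against the classic PageRank formula from the original academic papers (not Google's current, unpublished ranking). The sketch below assumes a damping factor of 0.85 and a hypothetical five-page graph: four "top" sites that all link to one another, plus a newcomer that links to all four. With no inbound links the newcomer gets only the baseline (1 - d)/N; once one top site links back, the newcomer → a → newcomer loop lifts its score.

```python
def pagerank(links, d=0.85, iterations=100):
    """Classic PageRank by power iteration (textbook form, not Google's
    production ranking).

    links maps each page to the pages it links to. Repeated links from one
    page to the same target are counted once. Rank held by pages with no
    outgoing links is spread evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    out = {p: sorted(set(links.get(p, []))) for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not out[p])
        # Every page gets the random-jump share plus its cut of dangling rank.
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            if out[p]:
                share = d * rank[p] / len(out[p])
                for q in out[p]:
                    new[q] += share
        rank = new
    return rank


# Hypothetical graph: four "top" sites all linking to one another.
top = ["a", "b", "c", "d"]
base = {t: [u for u in top if u != t] for t in top}
one_way = dict(base, newcomer=top)                   # newcomer links out only
two_way = dict(one_way, a=base["a"] + ["newcomer"])  # site "a" links back

print(round(pagerank(one_way)["newcomer"], 4))  # → 0.03, i.e. (1 - d) / N
print(round(pagerank(two_way)["newcomer"], 4))  # larger: the loop feeds it
```

Linking out costs the newcomer nothing and helps the targets, but only an inbound link (here, closing a loop) raises the newcomer's own rank, which matches the commenter's point about Google Boxes.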

You'll note that the bit which allows you to increase your site's ranking requires lots of links, just like in Google bombing. If you've read so far, I think you'll agree that Google bombing is the better technique (which, for the record, I don't condone).

I think it's fairly obvious that Google are tampering with the basic PageRank framework, though we (I?) don't know for sure. For example they allow you to search by language. The fact that they can identify/classify documents this way means they have a framework in place.

So it should be relatively easy for them, if they wanted to, to include document weights that take the classifications into account, as rusty and jsled already proposed. Simple document weighting is really easy to do, but the trick, of course, is to end up with useful weights.

Having said all this, your particular example can be explained without resorting to advanced PageRank modifications.

Other people have reported similar phenomena (too lazy to find the links), with the following explanation: sometimes people publish web server access logs (maybe inadvertently), which search engine crawlers then find. Those logs contain a whole lot of information, including the IP address of the client's machine and perhaps the referral address (i.e., the last web address visited before the server was queried). Google may treat these addresses as if they were direct links to your site.
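To make the referral-address explanation concrete, here is a minimal sketch that pulls external referrer URLs out of access-log lines. It assumes Apache's "combined" log format; the sample lines and host names are hypothetical, and whether Google actually treats such referrers as links is the commenter's speculation, not something the code demonstrates.

```python
import re
from urllib.parse import urlparse

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrers(lines, own_host):
    """Yield referrer URLs pointing from other hosts, as a published log exposes them."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line
        ref = m.group("referer")
        if ref in ("", "-"):
            continue  # client sent no Referer header
        host = urlparse(ref).hostname
        if host and host != own_host:
            yield ref


# Hypothetical log lines for a site served as www.example.com.
sample = [
    '192.0.2.7 - - [20/Jul/2011:17:51:00 +0000] "GET /index.html HTTP/1.1" 200 5120 '
    '"http://example.org/links.html" "Mozilla/5.0"',
    '192.0.2.8 - - [20/Jul/2011:17:52:00 +0000] "GET /a.html HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0"',
]
print(list(referrers(sample, "www.example.com")))
# → ['http://example.org/links.html']
```

A crawler that indexes such a log sees a page on your site that mentions `http://example.org/links.html`, even though no HTML link was ever written.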

Re:20% Time? by zget (2395308)

July 20 | #36828322

As someone else here commented, Google has been changing rapidly recently: http://linux.slashdot.org/comments.pl?sid=2339084&cid=36825878 [slashdot.org]

I was also able to meet with some (middle management) people at Google, and their attitude reminded me very strongly of MS's behavior 15 years ago: they don't listen to what others say, and what they say often implies: "We're the smartest people on the planet, the world revolves around us, if you don't want to work with us and use our stuff, you're just an idiot." So I think I can conclude that Google sees itself as "winning" the way that MS saw itself winning in the late '90s.

You can see the same change in all the "privacy is not important" talk and the recent Google+ product. I think we are really seeing a turning point here. Google has finally, after a long time, accepted that it's not the small geeky company it once was and is now just driving for profits. The scary thing is that it has gotten into a great position to exploit that now.

Just a few more nails, and I can bury "The Cloud". (Score:2) by VortexCortex (1117377) writes:

July 20, @05:51PM (#36828876)

I've always been wary of "cloud computing", esp. when it's powered by a hybrid "thick-client" connected to a remote data repository... Applications anyone? At least with a client side service (eg: mail reader app) I can continue to use the features I like (such as gestures, goggles, nibbles, etc.) beyond the external "support" lifetime -- Without wondering if a feature will disappear tomorrow.

As an avid Google Labs user, I find their lack of support disturbing.

Furthermore, my plotter does not work with Windows 7. The manufacturer no longer supports it, so they won't recompile the driver or give out the source so that I may do so. XP's EOL is 993 days from now. This tells me not only that I will be using G'Linux / FLOS software in the near future and insist on hardware driver source code, but also that "The Cloud" I use must be built from my own servers, or not at all.

I think I'll call my globally accessible private personal network "The Closet"; I suspect many will identify with this terminology in terms of privacy for multiple reasons.

[Jan 29, 2007] The Top 100 Alternative Search Engines by Charles Knight

January 29, 2007 / 104 comments

Written by Charles S. Knight, SEO, and edited by Richard MacManus. The Top 100 is listed at the end of the analysis.

Ask anyone which search engine they use to find information on the Internet and they will almost certainly reply: "Google." Look a little further, and market research shows that people actually use four main search engines for 99.99% of their searches: Google, Yahoo!, MSN, and Ask.com (in that order). But in my travels as a Search Engine Optimizer (SEO), I have discovered that in that .01% lies a vast multitude of the most innovative and creative search engines you have never seen. So many, in fact, that I have had to limit my list of the very best ones to a mere 100.

But it's not just the sheer number of them that makes them worthy of attention; each one of these search engines has that standard "About Us" link at the bottom of the homepage. I call it the "why we're better than Google" page. And after reading dozens and dozens of these pages, I have come to the conclusion that, taken as a whole, they are right!

The Search Homepage

In order to address their claims systematically, it helps to group them into categories and then compare them to their Google counterparts. For example, let's look at the first thing that almost everyone sees when they go to search the Internet - the ubiquitous Google homepage. That famously sparse, clean sheet of paper with the colorful Google logo is the most popular Web page in the entire World Wide Web. For millions and millions of Internet users, that Spartan white page IS the Internet.

Google has successfully made their site the front door through which everyone passes in order to access the Internet. But staring at an almost blank sheet of paper has become, well, boring. Take Ms. Dewey for example. While some may object to her sultry demeanor, it's pretty hard to deny that interfacing with her is far more visually appealing than with an inert white screen.

A second example comes from Simply Google. Instead of squeezing through the keyhole in order to reach Google's 37 search options, Simply Google places all of those choices and many, many more all on the very first page; neatly arranged in columns.

Artificial Intelligence

A second arena is sometimes referred to as Natural Language Processing (NLP), or Artificial Intelligence (AI). It is the desire we all have of wanting to ask a search engine questions in everyday sentences, and receive a human-like answer (remember "Good Morning, HAL"?). Many of us remember Ask Jeeves, the famous butler, which was an early attempt in this direction - that unfortunately failed.

Google's approach, Google Answers, was to enlist a cadre of "experts." The concept was that you would pose a question to one of these experts, negotiate a price for an answer, and then pay up when it was found and delivered. It was such a failure, Google had to cancel the whole program. Enter ChaCha. With ChaCha, you can pose any question that you wish, click on the "Search With Guide" button, and a ChaCha Guide appears in a Chat box and dialogues with you until you find what you are looking for. There's no time limit, and no fee.

Clustering Engines

Perhaps Google's most glaring and egregious shortcoming is their insistence on displaying the outcome of a search in an impossibly long, one-dimensional list of results. We all intuitively know that the World Wide Web is just that, a three dimensional (or "3-D") web of interconnected Web pages. Several search engines, known as clustering engines, routinely present their search results on a two-dimensional map that one can navigate through in search of the best answer. Search engines like KartOO and Quintura are excellent examples.

Recommendation Search Engines

Another promising category is the recommendation search engines. While Google essentially helps you to find what you already know (you just can't find it), recommendation engines show you a whole world of things that you didn't even know existed. Check out What to Rent, Music Map, or the stunning Live Plasma display. When you input a favorite movie, book, or artist, they recommend to you a world of titles or similar artists that you may never have heard of, but would most likely enjoy.

Metasearch Engines

Next we come to the metasearch engines. When you perform a search on Google, the results that you get are all from, well, Google! But metasearch engines have been around for years. They allow you to search not only Google, but a variety of other search engines too, in one fell swoop. There are many search engines that can do this; Dogpile, for instance, searches all of the "big four" mentioned above (Google, Yahoo!, MSN, and Ask) simultaneously. You could also try Zuula or PlanetSearch, which plows through 16 search engines at a time for you. A very interesting site to watch is GoshMe. Instead of searching an incredible number of Web pages, like conventional search engines, GoshMe searches for search engines (or databases) that each tap into an incredible number of Web pages. As I perceive it, GoshMe is a meta-metasearch engine (still in Beta)!

Other Alt Search Engines

And so it goes, feature after feature after feature. TheFind is a better shopping experience than Google's Froogle, IMHO. Like is a true visual search engine, unlike Google's Images, which just matches your keywords into images that have been tagged with those same keywords. Coming soon is Mobot (see the Demo at www.mobot.com). Google Mobile does let you perform a search on your mobile phone, but check out the Slifter Mobile Demo when you get a chance!

Finally, almost prophetically, Google is silent. Silent! At least Speeglebot talks to you, and Nayio listens! But of course, why should Google worry about these upstarts (all 100 of them)? Aren't they just like flies buzzing around an elephant? Can't Google just ignore them, as their share of the search market continues to creep upwards towards 100%, or perhaps just buy them? Perhaps.

The Last Question

Isaac Asimov, the preeminent science fiction writer of our time, once said that his favorite story, by far, was The Last Question. The question, for those who have not read it, is "Can Entropy Be Reversed?" That is, can the ultimate running down of all things, the burning out of all stars (or their collapse) be stopped, or is it hopelessly inevitable?

The question for this age, I submit, is… "Can Google Be Defeated"? Or is Google's mission "to organize the world's information and make it universally accessible and useful" a fait accompli?

Perhaps the place to start is by reading (or re-reading) Asimov's "The Last Question." I won't give it away, but it does suggest The Answer….

Charles Knight is the Principal of Charles Knight SEO, a Search Engine Optimization company in Charlottesville, VA.

The Top 100

For an Excel spreadsheet of the entire Top 100 Alternative Search Engines, go to: http://charlesknightseo.com/list.aspx or email the author at [email protected].

This list is in alphabetical order. Feel free to share this list, but please retain Charles' name and email.

Update: Thanks Sanjeev Narang for providing a hyperlinked version of the list.

Update, 5 February 2007: Charles Knight has left a detailed comment (#94) in response to all the great feedback in the comments to this post. He also notes:

"...while it looks like a very simple, almost crude list of 100 names, it has taken countless hours to try and do it properly and fairly. The list will be updated all year long, and the Top 100 can only get better and better until the Best of 2007 are announced on 12/31/07."

Charles, keep up the good work! I plan to showcase a new user interface for our visual search (www.quintura.com) at the Future of Web Applications (FOWA) event in London on February 20 - 22. Stay tuned to our developments! PS I have a question, though. Why is Quintura for Kids not on the list and only a runner-up? :) The service has started being used in some elementary schools after only one month since a beta release.

Posted by: Yakov | January 29, 2007 6:01 AM

Interesting list. I guess this list is the top 100 AFTER Google, Ask, MSN and Yahoo.
I would have to list Vivisimo above some of the others you have listed here.
I too don't have the time to go through all of them but I'd like to know if any of your top 100 are vertically focussed. I've been following one vertical in particular, health, and I don't see any of the ones I found to be useful.

Slashdot News for nerds, stuff that matters

"The Thomson Gale publishing group has put together a comprehensive review of Google Scholar, and they find it highly lacking compared with similar offerings from Highwire Press, Scopus, and The Web of Science. Will Google's overhyped offerings drive these superior services out of the market?"

Harvest A Distributed Search System

Noted-L (1 of 3) [Noted] Google Adds Wildcards to Phrases

http://www.researchbuzz.com/news/2002/jan03jan0902.html#googleadds

--<cut>--
If you're a regular ResearchBuzz reader you already know how to search for phrases in Google using "wildcard words" -- you just use the word "the," which Google always considers a stopword. So, search for "three the mice" in Google and you'll find three green mice, three blue mice, three blind mice, etc. Google has made using "the" unnecessary by adding a word-sized asterisk to its search syntax.

What is a word-sized asterisk? It's an asterisk you can use in place of a word; "three * mice" will find three green mice, three blind mice, etc. This asterisk CANNOT be used for part of a word. If you try to search for "three bl* mice" you'll get no results. Thanks to Gary Price for this tip.
--<cut>--

--
J C Lawrence
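How a word-sized asterisk behaves can be imitated locally with a regular expression. This is a sketch, not Google's implementation: `phrase_pattern` is a hypothetical helper in which a standalone `*` becomes exactly one word (`\w+`), while an asterisk glued to other characters, as in `bl*`, is matched literally, so that query finds nothing, as the note above describes.

```python
import re


def phrase_pattern(query):
    """Compile a phrase query where a standalone '*' stands for exactly one word.

    Tokens merely containing '*' (e.g. 'bl*') are escaped and matched
    literally, so a partial-word wildcard finds nothing.
    """
    parts = [r"\w+" if token == "*" else re.escape(token) for token in query.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


text = "Three blind mice, three green mice and three blue mice."
print(phrase_pattern("three * mice").findall(text))
# → ['Three blind mice', 'three green mice', 'three blue mice']
print(phrase_pattern("three bl* mice").findall(text))  # → []
```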

[Sep 13, 2000] Google Propping Up Yahoo In Search Results

Slashdot
The Easy Answer To This Question (Score:4, Insightful)
by Phrogman (atho@<Remove this>omphalos.net) on Wednesday September 13, @01:59PM EDT (#66)
(User #80473 Info) http://www.omphalos.net
First of all, Google does not rank SOLELY based on links to a page; they use a combination of the number of links to the page, the text on the page, its position, etc., just like every other search engine. They also use the number of links from the page, and the text for 50 or so characters on either side of a link that links to the page. It's a wonderfully complex set of formulas that are being used to determine relevancy. While I have read the early papers on the methodology that Google is employing (from when it was an academic project), it has obviously undergone a lot of improvements and refinements over time. They do not release the ranking criteria they are using to the general public (this is normal for search engine companies, who guard their criteria closely and periodically change them without notice).

What seems most likely to me is simply that, since Google has partnered with Yahoo, they have shared details on their ranking system or have assisted Yahoo staff in positioning Yahoo pages in the Google database. As a result, the ranking position of Yahoo pages is rising simply because they have some inside information or help. That is why the pages have risen slowly over time, rather than simply popping to the top of the charts, as they might if Google had simply rewritten their formulas to make an exception when a Yahoo page is concerned.

With the work that has gone into creating Google, I do not think they would want to do any screwing around with their formulas that would result in major changes like people are suggesting here. They can help their partners rank better though.

Not a problem... (Score:4, Informative)
by Parity on Wednesday September 13, @01:52PM EDT (#52)
(User #12797 Info)
For people who have actually read the article... it seems to me that what's going on here is that Google merged its database with Yahoo's, and naturally, everyone that uses Yahoo as a major resource will have links into Yahoo in their pages, and so Google's rankings have been shifted, not by 'conscious policy' but by a change in the contents of the database.
Yahoo's rise will stop when all the newly added directories have been fully spidered and statted and cross-ranked, and it'll probably fall as Google's database grows with non-Yahoo-database links being added.
Not that I have direct access to Google's database or algorithm, but, this seems more likely than a covert ranking-adjustment plan within Google.
yahoo users affect google (Duh!) (Score:4, Interesting)
by obtuse (innocenti@@hotmail) on Wednesday September 13, @01:54PM EDT (#57)
(User #79208 Info)
Once Yahoo links to Google:
Yahoo users significantly increase their use of Google, and submit URLs. These URLs will be Yahoo biased, because after all, these are Yahoo users. This bias changes Google's ratings, without any other intervention.

Hell, Google was probably good largely because it was popular with geeks. Like the Net at large, it will become diluted by pr0n surfers & greed. I hope Google and Yahoo both are looking at other methods of automatic category building, since there are lots of interesting approaches to that problem.

Don't want Yahoo results in Google search? (Score:2, Informative)
by Earthling ([email protected]) on Wednesday September 13, @03:19PM EDT (#117)
(User #146872 Info)
It's a fact: Yahoo URLs come up more and more often in Google. Whether it's from malicious code from the Google folks or simply Yahoo adapting their pages to get better results, I don't know, so I'll leave the speculation to the experts. =)

However, I do know that if I'm making a search in Google, I don't care to find a list of links on Yahoo as a result (else I would have used Yahoo, don't you think?).

So, to get rid of Yahoo results, simply make your search on the Google Advanced Search page (http://www.google.com/advanced_search.html) and put "yahoo.com" in the "exclude" field.
Et voilà! No more Yahoo pages showing up in Google.
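The same exclusion can be applied to a list of result URLs after the fact. This is a minimal sketch with hypothetical URLs and a hypothetical helper name; it matches on the parsed host name rather than on a substring, so `dir.yahoo.com` is dropped while an unrelated host such as `notyahoo.com` is kept.

```python
from urllib.parse import urlparse


def exclude_domain(urls, domain):
    """Drop URLs hosted on `domain` or any of its subdomains."""
    domain = domain.lower()
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            continue  # yahoo.com itself or a subdomain such as dir.yahoo.com
        kept.append(url)
    return kept


results = [
    "http://dir.yahoo.com/Computers_and_Internet/",
    "http://www.yahoo.com/",
    "http://www.omphalos.net/",
    "http://notyahoo.com/page",
]
print(exclude_domain(results, "yahoo.com"))
# → ['http://www.omphalos.net/', 'http://notyahoo.com/page']
```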

Search Engines - Skewed Results - Doing Research (Score:1)
by mwdib on Wednesday September 13, @04:34PM EDT (#133)
(User #56263 Info)
I've been following the wonderful world of search engines for several years in my role as web educator and maintainer for a university library. Skewed results seem to be an inevitable part of commercial engines: AltaVista et al. were doing it long before Google burst on the scene. One of the great weaknesses of the Internet is the inadequacy of search engines and directories in support of serious research. While librarians seem to think that they could nicely organize the whole thing, I have my doubts that Dublin Core metadata or some extension of MARC into site classification will ever solve the problem. That said, Google is still probably the best general, "comprehensive" search tool available today. Expecting dispassionate morality from a business entity, however, is naive -- so naive that I'm a bit surprised that SlashDot's cynical staffers find it noteworthy. If you'd like to dip into the sordid world of internet search tools, check out Search Engine Watch -- it's a good starting point to find out about business relationships as well as characteristics and performance of the various engines.
"When I grow up, I'll be stable."
a non-cynical point of view (Score:1)
by Y2K is bogus on Wednesday September 13, @05:04PM EDT (#135)
(User #7647 Info)
It seems to me that there is a cynical bias on this issue. I personally have a theory that is based not on a misappropriation of Google results, but on pure technical know-how. Since Yahoo partnered with Google, Yahoo needs to have its information indexed. It makes sense that Google has most of it indexed, but it's unlikely that every page Yahoo has generated is in Google's index. So Google needs to index every page they have. It only makes sense that Google would have more accurate results by thoroughly indexing Yahoo. The issue people are having is that the Yahoo index isn't sectioned off into its own little cluster, but is part of the larger index in Google. It's simply that Google has a more complete index of the Yahoo pages, not that Yahoo is paying for search results.
It's a feature ;) (Score:1)
by sciasbat on Wednesday September 13, @05:07PM EDT (#137)
I'm not joking: the high presence of Yahoo links in Google's search results may depend on their scoring system. Google uses page rank (basically, the more a page is linked, the higher it scores) and the text of referring links. Yahoo pages are heavily linked from outside, since most pages of links point to the corresponding Yahoo category, which gives them a high score. In fact, even in Google's early days (I've been using Google since '98) Yahoo links were very common in the results, so this is nothing new.
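The scoring idea described in this comment (a page scores higher the more it is linked, particularly from pages that themselves score well) can be illustrated with a simplified PageRank computed by power iteration. This is a minimal sketch, not Google's production ranking: the link graph and page names are hypothetical, the damping factor 0.85 is the value commonly cited in the PageRank literature, and pages without outgoing links simply spread their rank evenly over all pages. Anchor text, which the comment also mentions, is ignored.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: several "pages of links" all point at one directory category.
graph = {
    "page_a": ["yahoo_category"],
    "page_b": ["yahoo_category"],
    "page_c": ["yahoo_category", "small_site"],
    "small_site": ["page_a"],
    "yahoo_category": ["page_a", "page_b", "page_c"],
}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:15s} {score:.3f}")
```

In this toy graph the page that every link page points to, yahoo_category, ends up with the highest rank, which is the effect the commenter describes.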
But. . . Does it still work? (Score:1)
by Marcus Erroneous on Wednesday September 13, @07:16PM EDT (#148)
As one who only occasionally dabbles in conspiracies as a passing hobby, I object to the improved ranking only as a knee-jerk reaction. I was one of the last to know about Yahoo listing results for money, and that was what really caused me to move to Google. The results were worth the move. Although I had noticed more Yahoo pages showing up, I'm still finding what I want faster than before. I'm still mostly finding stuff on the first link or first page. As long as I keep finding what I'm looking for quickly, I'll keep using Google. When I start finding advertising pushed before content (like Yahoo previously), I'll move. The early Google served a need that others had forgotten. If they forget, someone will reinvent the wheel. And I'll ride that one.
Hating Micro$oft is not a crime. Neither is loving it a virtue.

HOWTO search the WEB

Isearch
Fortunato - May 20th 1998, 07:05 EST

Homepage: http://www.etymon.com/Isearch/

Isearch is software for indexing and searching text documents. It supports full text and field based search, relevance ranked results, Boolean queries, and heterogeneous databases. Isearch can parse many kinds of documents "out of the box," including HTML, mail folders, list digests, SGML-style tagged data, and USMARC. It can be extended to support other formats by creating descendant classes in C++ that define the document structure. It is pretty easy to customize in this way, provided that you know some C++ (and you will need to ftp the source code). A CGI interface is also included for web based searching.
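Isearch itself is a C++ package, and none of its classes are reproduced here. As a rough illustration of what full text search with Boolean queries and ranked results involves, the following Python sketch builds a small inverted index and answers AND/OR queries. The function names, the sample documents, and the ranking rule (summed raw term frequency, a deliberately crude stand-in for real relevance ranking) are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def search(index, query_words, mode="and"):
    """Boolean AND/OR query; hits are ranked by summed term frequency."""
    terms = tokenize(" ".join(query_words))
    if not terms:
        return []
    # The set of documents containing each query term (its "posting list").
    postings = [set(index.get(term, {})) for term in terms]
    hits = set.intersection(*postings) if mode == "and" else set.union(*postings)

    def score(doc_id):
        return sum(index.get(term, {}).get(doc_id, 0) for term in terms)

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda doc_id: (-score(doc_id), doc_id))


# Hypothetical sample documents.
docs = {
    "grep.html": "grep searches files for lines matching a regular expression",
    "find.html": "find searches a directory tree for files",
    "awk.html": "awk processes lines and fields; awk patterns are regular expressions",
}
index = build_index(docs)
print(search(index, ["searches", "files"]))        # ['find.html', 'grep.html']
print(search(index, ["lines", "awk"], mode="or"))  # ['awk.html', 'grep.html']
```

A real engine such as Isearch adds field-based search, document-format parsers and a proper relevance formula on top of this basic structure.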

ftp://ftp.redhat.com/pub/contrib/hurricane/i386/Isearch-1.41-1.i386.rpm

[July 8, 1999] Story Search Stinks! But You Don't Have to Take it Anymore

Anyone who's ever used a search engine knows about broken links. Lousy interfaces. 10,000 returns with no meaningful results. Missing pages.

Essentially, a search engine is a type of software that creates indexes of databases or Web sites based on content. When you submit a search term, it goes out and "reads" its indexes and returns applicable results. Excite, HotBot and Lycos are popular examples.

A recent report by Nielsen/NetRatings Inc. shows search engine popularity is slipping. And a study conducted at the NEC Research Institute confirms search engines can't keep up with the Web's rapid growth. NECRI discovered:

METASEARCHES

Metasearch sites are a timesaver because they query multiple engines simultaneously. Examples include:
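The core of a metasearch site, sending one query to several engines at once and merging their answers, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any of these sites actually worked: engine_one and engine_two are hypothetical stubs standing in for real engines, and the merge step uses reciprocal rank fusion with the conventional constant k=60, a technique chosen here for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real engines; each returns a ranked list of URLs.
def engine_one(query):
    return ["http://a.example/", "http://b.example/", "http://c.example/"]


def engine_two(query):
    return ["http://b.example/", "http://d.example/"]


def metasearch(query, engines, k=60):
    """Send the query to every engine concurrently, then merge the rankings.

    Merging uses reciprocal rank fusion: a URL scores sum(1 / (k + rank))
    over all engines that returned it, so agreement between engines wins.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = metasearch("unix search", [engine_one, engine_two])
print(merged)
```

Here http://b.example/ comes first because both engines returned it, even though neither ranked it first everywhere.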

DIRECTORIES

These hubs weed through the information glut with topic guides featuring recommended sites. They'll also recommend sites based on your search terms. Theoretically, Yahoo is a directory. But I prefer sites that put an emphasis on quality rather than quantity. Examples:

TOOLS

The right utility does wonders for streamlining searching. Here are three of my favorites, and you'll find more at the ZDNet Software Library.

Alexa got my Natural Born Killer's nod. This free browser add-on acts as a helpful backseat driver when you surf, offering stats on sites, including ratings by Alexa's thousands of users.

BullsEye lets you search and store info. The searches are highly customizable. For instance, it will search for industry-specific business news.

Copernic has an easy learning curve for the novice (or frustrated searcher). It searches multiple sites (as narrowly or widely as you'd like), validates links and features good custom sorting and multi-threading options.

Recommended Links


Softpanorama Recommended

Top articles

Sites

...

Papers

**** Scientific American Feature Article: Hypersearching the Web, June 1999

Your Keyword Density


Seven Sisters

www.snap.com -- Snap is a human-compiled directory of web sites, supplemented by search results from Inktomi. Snap launched in late 1997 and is backed by Cnet and NBC. Competitor to Yahoo...

www.google.com -- probably the most promising

Northern Light Search Softpanorama -- many broken links. Looks like it is based on Alta Vista.

Looksmart -- old references.

Ask Jeeves!

Alta Vista -- many broken links. Troubles at Alta Vista led to a deterioration in quality.

Yahoo -- below average quality of results; no broken links. Proprietary + Inktomi matches; not bad, but nothing special...


Google

The PageRank Citation Ranking: Bringing Order to the Web - Page, Brin, Motwani, Winograd (ResearchIndex)



Site Search Engines

The following search engines consist of just one or two Perl scripts, and are suitable for small to medium sites.

The search engines below are written in C with Web interfaces in Perl. Installation is more involved than for the engines above, but indexing and searching are generally faster. These engines are typically server-oriented; that is, they are designed for installation by Web server administrators, with individual users able to configure their own index files.

For high end commercial search engines capable of handling very large sites, see the reviews by the US Department of Education and Network Computing Magazine.


Random Findings

Appindex is a Perl script designed to retrieve information from freshmeat's application index. It searches for a program (Perl regexps accepted), retrieves available info on that program, then optionally launches a browser to view its homepage.

Appindex will now read the config and application index files from the user's home directory if they are available. Category searching is also implemented, along with a '-u' option that updates the local copy of the application index.

*** Inference Find! -- Server 46 -- not bad

* Infoseek ksh tutorial

Astalavista -- daily updated search engine monitoring hundreds of sites with hack & crack stuff.

Mister Driver -- search engine for device drivers.

News Hunt -- links to free newspaper archives and a searchable database of searchable newspapers.

***** The Spider's Apprentice -- a public service site that offers help on searching the Web. They also analyze and rate the major search engines.

Verity Internet Virtual Library Search -- searchable index of documents of interest to those using and developing the world-wide web and its related technologies



Etc

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith : Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS : Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least: Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~Archibald Putt, Ph.D.


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March 12, 2019