The web is enormous - every day tens if not hundreds of thousands of new sites are added, and even more are updated. How do you find the proverbial needle in this haystack? We are great fans of search engines - they have made it much easier to find things on the web since 1993 and have democratised the web enormously. But have they also added to the confusion?
Searching is a great pastime if you've got time to waste - you never know where you are going. It's an adventure. Internet searching by employees - sometimes not for work purposes - is one of the main occupations these days. And how much time do you spend at home doing it? For some people, even more than watching TV. Perhaps we should go to the gym.
All this is processed by enormous arrays of computers that regularly spider - or look at - every web page they can find. Each web page is parsed - geek-speak for decoded - to extract the important words, and proprietary programs then sort all this information. Many people are employed to devise these algorithms and keep them up to date. It is one of the main growth areas; online shopping, for example, grew by 38% in 2008 - not the easiest business year.
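To make the parsing step a little more concrete, here is a minimal Python sketch of how the important words might be pulled out of a single page. It is purely illustrative: the stop-word list, the function name important_words and the sample page are all hypothetical, and real search engines use far more elaborate (and secret) methods. It strips the HTML tags, ignores scripts and style sheets, drops common filler words and counts what is left.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# A tiny, hypothetical stop-word list; real systems use far larger ones.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def important_words(html: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the most frequent non-stop-words on the page with their counts."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    words = re.findall(r"[a-z0-9']+", text)
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Hypothetical sample page.
page = """<html><head><title>Tea guide</title><style>p {color: red}</style></head>
<body><h1>Green tea</h1><p>Green tea and black tea are both made from the tea plant.</p>
<script>var tea = 1;</script></body></html>"""

print(important_words(page, 3))
```

On the sample page, "tea" comes out on top with five mentions, followed by "green" with two; the words inside the script and style sheet are never counted.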
In simple terms, the calculation combines various complex approaches, taking into account all the words found on each web page plus its interactions with all the other websites in the world - links to and from the page. This massive calculation is repeated every so often. When you enter a search enquiry, the associated probabilities for each of your search terms are calculated and added up. The pages displayed for you correspond to those with the largest resulting totals. You can find more information on Wikipedia.
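The "add up the probabilities" idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real search engine scores pages: each page's probability for a term is simply that term's share of the words on the page, and the link bonus (LINK_WEIGHT) and the three example pages are invented for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: page name -> (words on the page, number of inbound links).
PAGES = {
    "teashop.example": ("buy green tea and black tea online".split(), 3),
    "kettles.example": ("electric kettle boils water for tea".split(), 10),
    "gardening.example": ("grow green beans in your garden".split(), 1),
}

LINK_WEIGHT = 0.01  # assumed, arbitrary bonus per inbound link


def term_probabilities(words: list[str]) -> dict[str, float]:
    """Estimate P(term | page) as the term's share of the words on the page."""
    counts = Counter(words)
    total = len(words)
    return {term: n / total for term, n in counts.items()}


def rank(query: str, top_n: int = 10) -> list[tuple[str, float]]:
    """Add up each query term's probability per page, plus a small link bonus."""
    results = []
    for page, (words, inbound_links) in PAGES.items():
        probs = term_probabilities(words)
        score = sum(probs.get(term, 0.0) for term in query.lower().split())
        if score > 0:  # only pages containing at least one query term
            score += LINK_WEIGHT * inbound_links
            results.append((page, score))
    results.sort(key=lambda item: item[1], reverse=True)  # largest totals first
    return results[:top_n]


for page, score in rank("green tea"):
    print(f"{page}: {score:.3f}")
```

For the query "green tea", the tea shop page scores highest because both words appear on it, while the kettle page climbs above the gardening page thanks to its larger (made-up) number of inbound links.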
This becomes a cat-and-mouse game between search engines and website owners, because getting a website into the top 10 makes a big difference to the number of times it is viewed - and therefore to the number of sales.
Search engines change their parameters every now and again to stop site owners guaranteeing themselves top spots.
Back to your search. When you get your results, you know what happens.
Firstly, there are all those pages that you don't want. They may appear because your enquiry wasn't well worded (whatever that means), or because owners have promoted their sites into the top 10 at the expense of the site you need - but don't know exists.
You have to scroll down, checking each entry to see whether it is relevant to your needs - most aren't. When you find something that may be appropriate, you would like to know whether there are any competitors that may not have been so well promoted but would otherwise be competitive. How do you find these?
So search engines also adopt a number of commercial approaches, because they need to pay for the computers, the people and all that electricity, and still make a profit. There are sponsored links, where advertisers 'buy' positions in some way, typically paying each time their link is shown for, or clicked after, a given keyword, whether they make a sale or not. Then there is contextual advertising, where small ads are placed on individual websites according to the information on the page. This is done 'automatically', but again someone has to derive the algorithms and select the advertisers.
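Contextual advertising can be pictured with an equally simple sketch: pick the ad whose chosen keywords overlap most with the words on the page. The advertisers, keywords and the function name choose_ad are hypothetical, and real ad networks weigh many more signals (and bids) than a plain word count.

```python
import re
from typing import Optional

# Hypothetical advertisers and the keywords they have chosen.
ADS = {
    "Cheap kettles": {"kettle", "tea", "water"},
    "Garden centre": {"garden", "plants", "seeds"},
    "Gym membership": {"gym", "fitness", "exercise"},
}


def choose_ad(page_text: str) -> Optional[str]:
    """Pick the ad whose keywords overlap most with the words on the page."""
    page_words = set(re.findall(r"[a-z']+", page_text.lower()))
    best_ad, best_overlap = None, 0
    for ad, keywords in ADS.items():
        overlap = len(keywords & page_words)  # number of shared words
        if overlap > best_overlap:
            best_ad, best_overlap = ad, overlap
    return best_ad  # None if no ad shares any word with the page


print(choose_ad("How to make tea without a kettle"))
```

A page about making tea without a kettle gets the kettle ad, while a page sharing no keywords with any advertiser gets no ad at all.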
Internet searching is one of the most prolific users of high-performance computing and international bandwidth. The power consumed is enormous. Someone calculated that two Google enquiries consume about the same energy as making a cup of tea. Did you know that 2% of our national electricity consumption is used in such data centres - where racks of computers spend their time processing information? That does not include your machines at home or in the office. Think about it - the same is true for all industrialised countries, of course. Just how much CO2 is generated by those innocent enquiries?
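To get a feel for the scale of the tea comparison, here is the back-of-the-envelope arithmetic in Python. It only works out what the claim implies; it does not check it. We assume a 250 ml cup of water heated from 20 °C to boiling, ignore kettle losses and use the standard specific heat of water (about 4,186 J per kg per kelvin).

```python
SPECIFIC_HEAT_WATER = 4186.0  # J per kg per kelvin
CUP_MASS_KG = 0.25            # assumed: a 250 ml cup of water
TEMP_RISE_K = 100.0 - 20.0    # assumed: from 20 °C tap water to boiling

# Energy needed to heat one cup of water, ignoring kettle losses.
cup_energy_j = SPECIFIC_HEAT_WATER * CUP_MASS_KG * TEMP_RISE_K
# The claim: two searches use about as much energy as one cup of tea.
per_search_j = cup_energy_j / 2

J_PER_KWH = 3.6e6  # joules in one kilowatt-hour
print(f"One cup: {cup_energy_j:.0f} J = {cup_energy_j / J_PER_KWH:.4f} kWh")
print(f"Implied per search: {per_search_j:.0f} J = {per_search_j / J_PER_KWH:.4f} kWh")
```

That comes to roughly 84 kJ (about 0.023 kWh) per cup, so the claim implies about 42 kJ (roughly 0.012 kWh) per search. Turning that into CO2 would need a further assumption about how the electricity is generated.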