Unicode nearing 50% of the web

Google logoAbout 18 months ago, we published a graph showing that Unicode on the web had just exceeded all other encodings of text on the web. The growth since then has been even more dramatic.

Web pages can use a variety of different character encodings, like ASCII, Latin-1, or Windows 1252 or Unicode. Most encodings can only represent a few languages, but Unicode can represent thousands: from Arabic to Chinese to Zulu. We have long used Unicode as the internal format for all the text we search: any other encoding is first converted to Unicode for processing.

This graph is from Google internal data, based on our indexing of web pages, and thus may vary somewhat from what other search engines find. However, the trends are pretty clear, and the continued rise in use of Unicode makes it even easier to do the processing for the many languages that we cover.

Searching for "nancials"?

Unicode is growing both in usage and in character coverage. We recently upgraded to the latest version of Unicode, version 5.2 (via ICU and CLDR). This adds over 6,600 new characters: some of mostly academic interest, such as Egyptian Hieroglyphs, but many others for living languages.

We're constantly improving our handling of existing characters. For example, the characters "fi" can either be represented as two characters ("f" and "i"), or a special display form "fi". A Google search for [financials] or [office] used to not see these as equivalent — to the software they would just look like *nancials and of*ce. There are thousands of characters like this, and they occur in surprisingly many pages on the web, especially generated PDF documents.

But no longer — after extensive testing, we just recently turned on support for these and thousands of other characters; your searches will now also find these documents. Further steps in our mission to organize the world's information and make it universally accessible and useful.

And we're angling for a party when Unicode hits 50%!

Source: googleblog

Tags: Google, Internet

Comments
Add comment

Your name:
Sign in with:
or
Your comment:


Enter code:

E-mail (not required)
E-mail will not be disclosed to the third party


Last news

 
Consumer group recommends iPhone 8 over anniversary model
 
LTE connections wherever you go and instant waking should come to regular PCs, too
 
That fiction is slowly becoming a reality
 
The Snapdragon 845 octa-core SoC includes the Snapdragon X20 LTE modem
 
Human moderators can help make YouTube a safer place for everyone
 
Google says Progressive Web Apps are the future of app-like webpages
 
All 2018 models to sport the 'notch'
 
The biggest exchange in South Korea, where the BTC/KRW pair is at $14,700 now
The Samsung Galaxy A5 (2017) Review
The evolution of the successful smartphone, now with a waterproof body and USB Type-C
February 7, 2017 /
Samsung Galaxy TabPro S - a tablet with the Windows-keyboard
The first Windows-tablet with the 12-inch display Super AMOLED
June 7, 2016 /
Keyboards for iOS
Ten iOS keyboards review
July 18, 2015 /
Samsung E1200 Mobile Phone Review
A cheap phone with a good screen
March 8, 2015 / 4
Creative Sound Blaster Z sound card review
Good sound for those who are not satisfied with the onboard solution
September 25, 2014 / 2
Samsung Galaxy Gear: Smartwatch at High Price
The first smartwatch from Samsung - almost a smartphone with a small body
December 19, 2013 /
 
 

News Archive

 
 
SuMoTuWeThFrSa
     12
3456789
10111213141516
17181920212223
24252627282930
31      




Poll

Do you use microSD card with your phone?
or leave your own version in comments (4)