How Computers See the World? | Part 1

Maheboob Patel
Apr 21, 2022
Source: https://nealanalytics.com/blog/deep-learning-in-computer-vision-starts-with-data/

Computer vision is one of the hottest AI topics these days. Self-driving cars, keyless and cashless hotels, detecting the risk of cardiac arrest, monitoring weight gain in pigs: the list goes on, and computer vision plays a pivotal role in all of them.

Mobileye’s self-driving car on the streets of Jerusalem
Alibaba’s keyless and cashless hotel!

“If 80 percent of our work is data preparation, then ensuring data quality is the important work of a machine learning team.”

Andrew Ng

For computer vision, images are the data. So you see where I am going! Cool. Where do we start? Deep learning, OpenCV, PIL, CNNs, etc.?

Or should we start with the most fundamental element: the pixel!

The word “pixel” comes from the words “picture element” (pix = picture, el = element).

Pixel:

The smallest unit of information in an image. We mainly deal with two types of pixels: a) pixels in a digital image and b) pixels of a physical display.

For simplicity, let’s assume we are talking about grayscale images throughout this article.

Resolution:

Resolution tells us how many pixels there are in an image or on a physical display such as a phone or TV.

Source: https://www.researchgate.net

It’s a rectangular grid of X rows and Y columns, and each of those little boxes represents a pixel.

Example:
Here is our first tiny image: a handwritten number 6.

Image of 6 in 28 by 28 pixels

This image is 28 by 28 pixels, i.e. it has 28 rows and 28 columns of pixels. That’s a total of 28 × 28 = 784 pixels.
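To make the arithmetic concrete, here is a minimal Python sketch. It assumes the image is stored as a plain list of rows, a layout chosen only for illustration (libraries such as NumPy or PIL use their own types), and the pixel values are placeholders rather than the actual digit. The `resolution` helper is hypothetical.

```python
# A placeholder 28 by 28 grayscale image stored as a list of rows.
# Every pixel is 0 here; the real "6" would have different values.
ROWS, COLS = 28, 28
image = [[0] * COLS for _ in range(ROWS)]

def resolution(img):
    """Return (rows, columns) of an image stored as a list of rows."""
    return len(img), len(img[0])

rows, cols = resolution(image)
total_pixels = rows * cols  # one value per cell of the grid
print(rows, cols, total_pixels)  # 28 28 784
```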

As this is a grayscale image, each pixel can take a value from 0 to 255, i.e. 2⁸ = 256 possible levels. That’s the range that fits in 1 byte (8 bits).

0=black, 255=white

Here 0 means black, 255 means white, and every value in between is a shade of grey, as shown above.
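The 8-bit range can be sketched the same way. The `brightness` helper below is hypothetical and uses a simple linear mapping; real displays apply gamma correction, so a value of 128 does not emit exactly half the light of 255.

```python
BITS_PER_PIXEL = 8
LEVELS = 2 ** BITS_PER_PIXEL          # 256 distinct values
MIN_VALUE, MAX_VALUE = 0, LEVELS - 1  # 0 (black) .. 255 (white)

def brightness(value):
    """Map an 8-bit pixel value linearly to 0.0 (black) .. 1.0 (white)."""
    if not MIN_VALUE <= value <= MAX_VALUE:
        raise ValueError(f"8-bit pixel value must be in 0..255, got {value}")
    return value / MAX_VALUE

print(LEVELS, MAX_VALUE)                # 256 255
print(brightness(0), brightness(255))   # 0.0 1.0
# At 1 byte per pixel, a 28 by 28 image holds 784 bytes of raw pixel data.
print(28 * 28 * BITS_PER_PIXEL // 8)    # 784
```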

Show me some pixels, please!

Sure. This is how the pixels are stored in an image. This is a 28 by 28 grid in which each cell represents a pixel and shows that pixel’s value.

Pixel values on 28 by 28 grid
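A grid like this can be printed with a few lines of Python. The 5 by 5 image here is made up for illustration (it is not the digit 6), and `format_grid` is a hypothetical helper.

```python
# Hypothetical 5 by 5 grayscale image: a bright vertical stroke on black.
tiny = [
    [0,  0, 255,  0, 0],
    [0,  0, 255,  0, 0],
    [0, 64, 255, 64, 0],
    [0,  0, 255,  0, 0],
    [0,  0, 255,  0, 0],
]

def format_grid(img):
    """Return pixel values as text: one image row per line, each value 3 characters wide."""
    return "\n".join(" ".join(f"{v:3d}" for v in row) for row in img)

print(format_grid(tiny))
```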

Hopefully you now have an idea of what pixels and resolution mean for a digital image. Remember, so far it’s only about digital images. But what’s the point of images if we can’t see them?

We view these images on digital displays, e.g. mobile phones, TVs, etc.
Let’s explore how the resolution of physical displays works.

Resolution of Physical Displays

Just like digital images, physical devices have pixels and resolutions too. The difference is that these pixels are physical, i.e. you can touch and feel them :)

Here is a simplified example of a physical display. Each white dot represents a physical pixel that can be illuminated independently.

That is, its brightness can be varied from 0 to 255. Remember: 0 = black and 255 = white.

32×16 pixels
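A toy model of such a display, assuming one brightness value per physical pixel kept in a list of rows (a simplified “framebuffer”; real display hardware and drivers are considerably more involved). The names `framebuffer` and `set_pixel` are hypothetical, and the caption’s 32×16 is read as 32 columns by 16 rows.

```python
WIDTH, HEIGHT = 32, 16  # assumed: 32 columns by 16 rows

# One brightness value (0 = black/off .. 255 = full white) per physical pixel.
framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]

def set_pixel(fb, row, col, value):
    """Set the brightness of one physical pixel without touching any other."""
    if not 0 <= value <= 255:
        raise ValueError("brightness must be in 0..255")
    fb[row][col] = value

set_pixel(framebuffer, 0, 0, 255)    # top-left pixel fully lit
set_pixel(framebuffer, 15, 31, 128)  # bottom-right pixel at a mid-grey level
print(WIDTH * HEIGHT)  # 512 physical pixels
```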

Below is a list of pixel resolutions for a few common mobile phones and laptops.

Examples of display resolutions

Now the question is: how is a digital pixel related to a physical one?

Each digital pixel contains a piece of ‘information’ about the overall image; for grayscale images, it is a value between 0 and 255. Just pause for a second and think about it: EVERY black-and-white image is made of these tiny dots!

So how are digital pixels mapped to physical ones? This is a complicated topic, but let’s try to understand the high-level concepts.

Our tiny 28 by 28 image contains a total of 784 pieces of ‘information’. Let’s explore a few scenarios.
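As a rough sketch, the three scenarios below come down to comparing two resolutions. The hypothetical `scenario` function assumes both sizes are given as (rows, columns) and ignores aspect ratio and letterboxing, which real software also has to handle.

```python
def scenario(image_size, display_size):
    """Classify how an image fits a display; both sizes are (rows, cols)."""
    img_rows, img_cols = image_size
    disp_rows, disp_cols = display_size
    if img_rows == disp_rows and img_cols == disp_cols:
        return "same resolution: one digital pixel per physical pixel"
    if img_rows <= disp_rows and img_cols <= disp_cols:
        return "image smaller: zoom (replication or interpolation)"
    if img_rows >= disp_rows and img_cols >= disp_cols:
        return "image larger: subsample"
    # Taller but narrower (or vice versa): each axis needs different treatment.
    return "mixed: zoom one axis, subsample the other"

print(scenario((28, 28), (28, 28)))
print(scenario((28, 28), (1080, 1920)))
print(scenario((4000, 6000), (1080, 1920)))
```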

Scenario 1: Image and Physical Display Have the Same Resolution.

This is rare, but if it is the case, each digital pixel is represented by the corresponding physical pixel on the display.

Scenario 2: Image Has Lower Resolution than Physical Display.

The technique used here is generally known as ‘zooming’. Two simple approaches to zooming are pixel replication and interpolation.

Zooming in this context means we have fewer digital pixels than physical pixels on the display, so new pixel values have to be created. As a result, the zoomed picture may not have the same quality as the original.

This is not the same as zooming into a high-resolution image to focus on a specific part of it. There, the original image may have far more pixels than the display, so up to a certain point the quality of the picture shouldn’t suffer.
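Focusing on part of a large image is essentially cropping: every value in the window is copied from an original pixel, so no new values need to be invented as long as the region still has at least as many pixels as the display. A minimal sketch with a hypothetical `crop` helper and a made-up 8 by 8 image:

```python
def crop(img, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    if top < 0 or left < 0 or top + height > len(img) or left + width > len(img[0]):
        raise ValueError("crop window falls outside the image")
    return [row[left:left + width] for row in img[top:top + height]]

# Hypothetical 8 by 8 image whose pixel values encode position (row * 8 + col).
big = [[r * 8 + c for c in range(8)] for r in range(8)]
window = crop(big, 2, 2, 4, 4)  # a 4 by 4 region, e.g. for a 4 by 4 display
print(window[0])  # [18, 19, 20, 21]
```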

Source: https://homepages.inf.ed.ac.uk/rbf/HIPR2/scale.htm

The original 2 by 2 pixel image is zoomed to 4 by 4 pixels, i.e. each pixel becomes 4 pixels (a 2 by 2 block). The result can then be displayed on a 4 by 4 screen.
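The simplest way to go from 2 by 2 to 4 by 4 is pixel replication: copy each pixel into a 2 by 2 block. Here is a minimal sketch with a hypothetical `replicate` helper and made-up pixel values; for whole-number factors like this, replication gives the same result as nearest-neighbour resizing.

```python
def replicate(img, factor):
    """Zoom by pixel replication: each pixel becomes a factor x factor block of copies."""
    zoomed = []
    for row in img:
        # Repeat each value horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        for _ in range(factor):
            zoomed.append(list(wide_row))
    return zoomed

small = [[10, 200],
         [90, 40]]
for row in replicate(small, 2):
    print(row)
# [10, 10, 200, 200]
# [10, 10, 200, 200]
# [90, 90, 40, 40]
# [90, 90, 40, 40]
```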

Replication: Same pixel value is copied into new pixels.

Interpolation: New pixel values are estimated from neighbouring pixels, e.g. by taking a (weighted) average of them.

Zooming (4x): replication vs interpolation
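Interpolation can be sketched with bilinear interpolation, one common choice (the article does not name a specific method): each new pixel is a weighted average of its four nearest original pixels, weighted by distance. The hypothetical `bilinear` function assumes a corner-aligned sampling grid, so the corner pixels keep their original values; libraries such as OpenCV and PIL may use a different pixel-centre convention, so their results can differ slightly.

```python
def bilinear(img, out_rows, out_cols):
    """Zoom a grayscale image (list of rows) by bilinear interpolation, corner-aligned."""
    in_rows, in_cols = len(img), len(img[0])

    def src_coord(i, n_out, n_in):
        # Map output index i to a fractional input coordinate so that the first
        # and last output pixels line up with the first and last input pixels.
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for r in range(out_rows):
        y = src_coord(r, out_rows, in_rows)
        y0 = int(y)
        y1 = min(y0 + 1, in_rows - 1)
        fy = y - y0
        row = []
        for c in range(out_cols):
            x = src_coord(c, out_cols, in_cols)
            x0 = int(x)
            x1 = min(x0 + 1, in_cols - 1)
            fx = x - x0
            # Blend horizontally on the two neighbouring rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))  # nearest integer value
        out.append(row)
    return out

small = [[10, 200],
         [90, 40]]
for row in bilinear(small, 4, 4):
    print(row)
```

Unlike replication, the in-between pixels take new values (for example 73 and 137 between 10 and 200 on the first row), which smooths the blocky look at the cost of some blur.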

Scenario 3: Image Has Higher Resolution than Physical Display.

Here we create fewer pixels than the original image has, and the technique is known as subsampling.

Source: https://homepages.inf.ed.ac.uk/rbf/HIPR2/scale.htm
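A minimal sketch of two ways to subsample, using hypothetical helpers and a made-up 4 by 4 image. `subsample` simply keeps every `step`-th pixel in each direction; `block_average` instead replaces each block with its mean, a common alternative that reduces aliasing (jagged or shimmering artefacts) from discarding pixels outright.

```python
def subsample(img, step):
    """Keep every step-th pixel in each direction and drop the rest."""
    return [row[::step] for row in img[::step]]

def block_average(img, step):
    """Replace each step x step block with the integer mean (rounded down) of its pixels."""
    out = []
    for r in range(0, len(img) - step + 1, step):
        row = []
        for c in range(0, len(img[0]) - step + 1, step):
            block = [img[r + dr][c + dc] for dr in range(step) for dc in range(step)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

big = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
print(subsample(big, 2))      # [[0, 2], [8, 10]]
print(block_average(big, 2))  # [[2, 4], [10, 12]]
```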

That’s all for now. In part 2, we will explore how computers detect and identify objects, all based on these tiny pixels!

I appreciate your suggestions and comments. Thank you!

Meanwhile, you may check this out.

References

Energy metabolism of the visual system
Geometric Scaling
