Introduction to machine learning for data science – from scratch


Welcome, everybody, to the first article of my series “Introduction to machine learning for data science”.

In this article, I will start with the core concepts before diving into the world of data science and machine learning. The first concept I want to talk about is computer science.

Wikipedia’s definition of computer science:

“Computer science is the study of the theory, experimentation, and engineering that form the basis for the design and use of computers. It is the scientific and practical approach to computation and its applications and the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures (or algorithms) that underlie the acquisition, representation, processing, storage, communication of, and access to information. An alternate, more succinct definition of computer science is the study of automating algorithmic processes that scale. A computer scientist specializes in the theory of computation and the design of computational systems.”

This is the long, boring definition that I don’t like and that we don’t need. So we have to find a better way to define computer science, one that helps us understand the basics we are working with.

To better understand what computer science is, let us first ask this important question:

What is data? Or, what is information?

Data: anything that can be measured and described as a sequence of numbers.

What does this mean?

Let’s take this example: the word “Hello”

H E L L O: how can we represent this in numbers?

Suppose we have the following table, which associates a number with each letter.

[Figure: letter-to-number table]

Then we can say H: 8, E: 5, L: 12, O: 15.

Then HELLO will be represented as 85121215. This process is called encoding.
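
The letter-to-number mapping above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the table numbers the letters A to Z from 1 to 26 (which matches H: 8, E: 5, L: 12, O: 15). The names `LETTER_TO_NUMBER` and `encode_letters` are made up for this example.

```python
import string

# Assumed table: A -> 1, B -> 2, ..., Z -> 26
LETTER_TO_NUMBER = {
    letter: index
    for index, letter in enumerate(string.ascii_uppercase, start=1)
}


def encode_letters(word):
    """Return the list of numbers for the letters of `word`."""
    return [LETTER_TO_NUMBER[letter] for letter in word.upper()]


codes = encode_letters("Hello")
print(codes)  # [8, 5, 12, 12, 15]

# Writing the numbers together gives the form used in the text
joined = "".join(str(code) for code in codes)
print(joined)  # 85121215
```

Keeping the numbers in a list is usually safer than gluing them together: in 85121215 there is no way to tell whether "12" means L or A followed by B.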

Another way to map the word to numbers is ASCII.

[Figure: decimal ASCII]
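
Python exposes these codes through the built-in `ord` and `chr` functions, so the ASCII mapping can be tried directly. A small sketch, using the word from the example above:

```python
word = "HELLO"

# ord() gives the numeric code of a character
ascii_codes = [ord(ch) for ch in word]
print(ascii_codes)  # [72, 69, 76, 76, 79]

# chr() goes the other way, from a code back to a character
decoded = "".join(chr(code) for code in ascii_codes)
print(decoded)  # HELLO
```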

Or, as you may know, you can represent it in binary digits (1s and 0s), which is the only thing a computer really understands.

[Figure: binary representation]
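
As a rough sketch, the same word can be turned into binary by formatting each ASCII code as 8 bits. The 8-bit width is an assumption made here for readability (one byte per character).

```python
word = "HELLO"

# "08b" means: binary, padded with zeros to 8 digits
binary_codes = [format(ord(ch), "08b") for ch in word]
print(binary_codes)
# ['01001000', '01000101', '01001100', '01001100', '01001111']

# int(..., 2) reads a binary string back into a number
restored = "".join(chr(int(bits, 2)) for bits in binary_codes)
print(restored)  # HELLO
```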


Colors can also be represented as numbers. Each color is determined by the intensity of red, green, and blue.

[Figure: colors as numbers]
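
To make this concrete, here is a small sketch that stores a few colors as (red, green, blue) triples. It assumes the common convention of one intensity from 0 to 255 per channel; the helper name `to_hex` is invented for this example.

```python
# Each color is three intensities: (red, green, blue), each from 0 to 255
colors = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),  # red + green
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def to_hex(rgb):
    """Format an (R, G, B) triple as the hex notation used on the web."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


print(to_hex(colors["yellow"]))  # #FFFF00
```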

Likewise, any picture or icon can be represented by numbers, since it is built from pixels and each pixel is a combination of colors. The same goes for videos, which are sequences of images. Sound is represented as a waveform, “a graph that represents a certain group of points”. All of these are examples of data.
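
The idea that pictures and sounds are just numbers can be sketched as follows. The 2x2 image and the 8 sample points are made-up toy values, chosen only to keep the example short.

```python
import math

# A tiny 2x2 "image": every pixel is an (R, G, B) triple
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Flatten the image into one plain sequence of numbers
flat_pixels = [value for row in image for pixel in row for value in pixel]
print(len(flat_pixels))  # 12 numbers: 4 pixels x 3 channels

# A "sound": one cycle of a sine wave measured at 8 evenly spaced points
samples = [round(math.sin(2 * math.pi * i / 8), 3) for i in range(8)]
print(samples)
```

A real image or audio file works the same way, only with far more pixels and samples.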

Unstructured vs structured data

Now that we understand what data is, let’s look at the difference between structured and unstructured data.

Let us understand the concept through some examples.

Any document, video, sound clip, or email that represents its own content and is not related to other data in the form of a structure is called unstructured data.

Databases, spreadsheets, and form documents, on the other hand, are structured data: anywhere the data follows a certain structure, so that it can be indexed and is easy to search.
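
The difference shows up as soon as you try to search the data. In the sketch below, the email text and the invoice records are invented purely for illustration.

```python
# Unstructured: free text, the meaning is buried in the sentence
email_text = "Hi Sara, the invoice for March is 250 dollars. Thanks, Omar"

# Structured: the same kind of information stored in named fields
invoices = [
    {"customer": "Sara", "month": "March", "amount": 250},
    {"customer": "Omar", "month": "April", "amount": 400},
]

# Structured data can be filtered precisely by field
march_invoices = [row for row in invoices if row["month"] == "March"]
print(march_invoices)

# Summing a field is just as direct
total = sum(row["amount"] for row in invoices)
print(total)  # 650

# With unstructured text, a plain substring search is about all we get
mentions_march = "March" in email_text
print(mentions_march)  # True
```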

What is computer science?

Let’s go back to the main goal of this article, which is defining computer science for the purpose of our main topic.

The definition: the study and implementation of computer technology to store, retrieve, and process data.

What does it mean to process data, and how do we process it?

In general, computer science is all about data and how to work with it through C.R.U.D. operations (Create, Retrieve, Update, Delete).
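
The four operations can be sketched with nothing more than a Python dictionary standing in for a database. The `users` store and its contents are hypothetical and exist only for this example.

```python
# A dictionary used as a tiny in-memory "database", keyed by user id
users = {}

# Create: add a new record
users[1] = {"name": "Ada", "age": 36}

# Retrieve: read the record back
retrieved_name = users[1]["name"]
print(retrieved_name)  # Ada

# Update: change a field of an existing record
users[1]["age"] = 37
updated_age = users[1]["age"]
print(updated_age)  # 37

# Delete: remove the record
del users[1]
print(users)  # {}
```

Real databases add persistence, indexing, and concurrent access, but the four basic operations stay the same.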

And programs are the recipes that tell the computer what to do with data, in other words, how to process data.

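Here is a sample application in Python that shows simple data processing.

# Start counting from zero
counter = 0
# Read a whole number from the user
num = int(input("enter a number : "))
# Repeat while the counter is less than the number entered
while counter < num:
    print(counter, " Hi")
    counter = counter + 1
print("Program continue")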



enter a number :  5
0  Hi
1  Hi
2  Hi
3  Hi
4  Hi
Program continue

If you are familiar with Python or any other programming language, you will have noticed that this sample application gets a number from the user and loops while the counter is less than that number, printing the counter and “Hi” on each pass.

This is a simple example of data-processing code.

Alright, now that we have introduced computer science and data, we are ready to move on to the second core concept, which is big data.

