Introduction to Machine Learning

Before jumping directly into what Machine Learning is, let's start with the meaning of the individual words, i.e. what is a Machine and what is Learning?

  • A machine is a tool containing one or more parts that transform energy. Machines are usually powered by chemical, thermal, or electrical means, and are often motorized.
  • Learning is the ability to improve behaviour based on Experience.

What is Machine Learning? :-

According to Tom Mitchell, Machine Learning is:
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Here,

  • Task T is what the machine is seeking to improve. It can be Prediction, Classification, Clustering, etc.
  • Experience E can be the training data or input data through which the machine tries to learn.
  • Performance P can be a factor like improvement in accuracy, or new skills that the machine was previously unaware of, etc.

For example, for a spam filter: T is classifying emails as spam or not spam, E is a corpus of emails labelled by users, and P is the fraction of emails classified correctly.

[Figure: the two components of Machine Learning, the Learner and the Reasoner]

Machine Learning itself contains two main components: the Learner and the Reasoner.

  • Input/Experience is given to the Learner, which learns some new skills from it.
  • Background Knowledge can also be given to the Learner for better learning.
  • With the help of the Input and the Background Knowledge, the Learner generates the Model.
  • The Model contains the information about what has been learnt from the Input and Experience.
  • Now the Problem/Task is given to the Reasoner; it can be Prediction, Classification, etc.
  • With the help of the trained Model, the Reasoner tries to generate the Solution, as sketched below.
  • The Solution/Answer can be improved by adding additional Input/Experience or Background Knowledge.
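To make the flow concrete, here is a minimal, hypothetical Scala sketch of the Learner/Reasoner split. The names and the toy averaging rule are made up for illustration; they are not taken from any ML library.

// Hypothetical sketch of the Learner/Reasoner components described above.
// The "learning" rule (averaging) is a toy stand-in for a real algorithm.
case class Model(estimate: Double)

object Learner {
  // Input/Experience: observed values; Background Knowledge: a prior guess.
  def learn(experience: Seq[Double], background: Model = Model(0.0)): Model =
    Model((experience.sum + background.estimate) / (experience.size + 1))
}

object Reasoner {
  // The Reasoner uses the trained Model to solve the task (here: prediction).
  def predict(model: Model): Double = model.estimate
}

object Demo extends App {
  val model = Learner.learn(Seq(2.0, 4.0, 6.0)) // Experience -> Model
  println(Reasoner.predict(model))              // Model -> Solution: 3.0
}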

How is Machine Learning different from a Standard Program? :-

In Machine Learning, you feed the computer the following things:

  • Input [Experience]
  • Output [outputs corresponding to the inputs]

And you get a Model/Program as the output. With the help of this program, you can perform some tasks.

Whereas in a Standard Program, you feed the computer the following things:

  • Input
  • Program [how to process the input]

And you get the Output. A simple example is verifying whether a number is prime or not, as sketched below.
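The contrast can be shown in a few lines of Scala. This is only a toy: the "model" just memorises the example pairs, standing in for a real learning algorithm.

object MlVsProgram extends App {
  // Standard program: we hand the computer the rule (the program) ourselves.
  def isPrime(n: Int): Boolean =
    n > 1 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)

  // Machine-learning style (toy): we supply inputs and their outputs and get
  // a model back. Memorising the pairs stands in for a real learning step.
  def fit(examples: Map[Int, Boolean]): Int => Option[Boolean] = examples.get

  val model = fit(Map(2 -> true, 4 -> false, 7 -> true, 9 -> false))
  println(isPrime(7)) // rule-based answer: true
  println(model(7))   // learned answer: Some(true)
}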

[Figure: Machine Learning vs. a Standard Program]

REFERENCES :- Machine Learning by Tom Mitchell




Cassandra Internals :- Writing Process

What is Apache Cassandra?

Apache Cassandra is a massively scalable open-source non-relational database that offers continuous availability, linearly scalable performance, operational simplicity, and easy data distribution across multiple data centres and cloud availability zones. Cassandra was originally developed at Facebook; the main reason it was developed was to solve the Inbox-search problem. To read more about Cassandra, you can refer to this blog.

Why should you go for Cassandra over a Relational Database? :-

Relational Database                                      | Cassandra
---------------------------------------------------------|----------------------------------------------
Handles moderate incoming data velocity                  | Handles high incoming data velocity
Data arriving from one/few locations                     | Data arriving from many locations
Manages primarily structured data                        | Manages all types of data
Supports complex/nested transactions                     | Supports simple transactions
Single points of failure with failover                   | No single points of failure; constant uptime
Supports moderate data volumes                           | Supports very high data volumes
Centralized deployments                                  | Decentralized deployments
Data written in mostly one location                      | Data written in many locations
Supports read scalability (with consistency sacrifices)  | Supports read and write scalability
Deployed in vertical scale-up fashion                    | Deployed in horizontal scale-out fashion

How the Write Request works in Cassandra :-

  • The client sends a write request to a single, random Cassandra node. The node that receives the request acts as a proxy and writes the data to the cluster.
  • The cluster of nodes is organised as a “ring” of nodes, and writes are replicated to N nodes using a replication placement strategy.
  • With the RackAwareStrategy, Cassandra will determine the “distance” from the current node for reliability and availability purposes.
  • This “distance” is broken into three buckets: the same rack as the current node, the same data centre as the current node, or a different data centre.
  • You configure Cassandra to write data to N nodes for redundancy, and it will write the first copy to the primary node for that data, the second copy to the next node in the ring in another data centre, and the rest of the copies to machines in the same data centre as the proxy.
  • This ensures that a single failure does not take down the entire cluster, and the cluster stays available even if an entire data centre goes offline.

In short, the write request goes from your client to a single random node, which sends the write to N different nodes according to the replication placement strategy. The node then waits for the N successes and returns success to the client.
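As a rough client-side sketch of where N and the placement strategy come from, here is what this could look like from Scala with the DataStax Java driver (3.x-style API); the contact point, keyspace name, and replication counts are made-up examples.

import com.datastax.driver.core.Cluster

object WriteDemo extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  // The keyspace's replication settings decide the "N nodes" above.
  // NetworkTopologyStrategy spreads replicas across data centres/racks,
  // in the spirit of the RackAwareStrategy described in this post.
  session.execute(
    """CREATE KEYSPACE IF NOT EXISTS demo
      |WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 1}""".stripMargin)

  session.execute(
    "CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)")

  // The coordinator (proxy) node forwards this single write to N = 3
  // replicas according to the placement strategy, then acks the client.
  session.execute("INSERT INTO demo.users (id, name) VALUES (1, 'alice')")

  cluster.close()
}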

Each of those N nodes gets that write request in the form of a “RowMutation” message. The node performs two actions for this message:

  • Append the mutation to the commit log for transactional purposes
  • Update an in-memory Memtable structure with the change

Cassandra does not update data in-place on disk, nor update indices, so there are no intensive synchronous disk operations to block the write.
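The two actions can be sketched as follows; this is illustrative Scala pseudocode, not Cassandra's actual classes.

import scala.collection.mutable

// Illustrative only: a replica's handling of a RowMutation message.
final case class RowMutation(key: String, column: String, value: String)

class Replica {
  // Sequential append-only log: cheap, and enough to replay after a crash.
  private val commitLog = mutable.ArrayBuffer.empty[RowMutation]
  // In-memory write-back structure (kept sorted in real Cassandra).
  private val memtable = mutable.Map.empty[(String, String), String]

  def apply(mutation: RowMutation): Unit = {
    commitLog += mutation                                       // 1. commit log
    memtable((mutation.key, mutation.column)) = mutation.value  // 2. Memtable
  }
}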

There are several asynchronous operations which occur regularly:

  • A “full” Memtable structure is written to a disk-based structure called an SSTable, so we don’t keep too much data in memory only.
  • The set of temporary SSTables that exist for a given ColumnFamily is merged into one large SSTable. At this point, the old temporary SSTables are obsolete and can be garbage-collected at some point in the future, as sketched below.
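Both steps, sketched conceptually in Scala (again, this is not Cassandra's actual flush or compaction code):

object Storage {
  type Key = (String, String) // (row key, column name)

  // Flush: a full Memtable becomes an immutable, sorted "SSTable".
  def flush(memtable: Map[Key, String]): Vector[(Key, String)] =
    memtable.toVector.sortBy(_._1)

  // Compaction: merge SSTables (ordered oldest-to-newest) into one,
  // keeping the newest value for each key.
  def compact(sstables: Seq[Vector[(Key, String)]]): Vector[(Key, String)] =
    sstables.flatten
      .groupBy(_._1)
      .map { case (key, versions) => key -> versions.last._2 }
      .toVector
      .sortBy(_._1)
}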

That’s how the Writing process works in Cassandra internally.

References :- A Brief Introduction to Apache Cassandra and Cassandra Internals

Unit Testing Of Kafka


Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and enables you to pass messages from one end-point to another.

Generally, data is published to a topic via the Producer API, and consumers consume data from subscribed topics via the Consumer API.

In this blog, we will see how to do unit testing of Kafka.

Unit testing your Kafka code is incredibly important; it’s transporting your most important data. Until now, we had to explicitly run ZooKeeper and a Kafka server to test a producer and a consumer.

Now there is also an alternative: testing Kafka without running ZooKeeper and a Kafka broker.

Wondering how? EmbeddedKafka is there for you.

Embedded Kafka is a library that provides an in-memory Kafka broker to run your ScalaTest specs against. It uses Kafka 0.10.2.1 and ZooKeeper 3.4.8.

It will start ZooKeeper and the Kafka broker for you.
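Here is a minimal spec along those lines; the artefact coordinates and version in the comment are assumptions from around that release, so check the library's README for the current ones.

// build.sbt (coordinates/version are assumptions; check the library README):
// libraryDependencies += "net.manub" %% "scalatest-embedded-kafka" % "0.14.0" % "test"

import net.manub.embeddedkafka.EmbeddedKafka
import org.scalatest.{Matchers, WordSpec}

class KafkaRoundTripSpec extends WordSpec with Matchers with EmbeddedKafka {

  "a Kafka producer and consumer" should {
    "round-trip a message through the in-memory broker" in {
      // Spins up ZooKeeper and a Kafka broker in-memory, runs the body,
      // and shuts everything down afterwards.
      withRunningKafka {
        publishStringMessageToKafka("test-topic", "hello kafka")
        consumeFirstStringMessageFrom("test-topic") shouldBe "hello kafka"
      }
    }
  }
}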


Ingress to Monix-Execution: Scheduler


In the previous blog, you learnt about Monix Task. Now let’s dive into the Monix Scheduler. Akka’s Scheduler comes with a pretty heavy dependency, whereas the Monix Scheduler provides a common API that is reusable.

For the Monix Scheduler to work, we first need to include the dependency:

libraryDependencies += "io.monix" % "monix-execution_2.11" % "2.0-RC10"

Make sure the Scala version suffix in the dependency (_2.11 above) matches your project’s Scala version. The Monix Scheduler can be a replacement for Scala’s ExecutionContext, as sketched below:-
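A small sketch of that replacement, assuming the Monix 2.x artefact pulled in above:

import monix.execution.Scheduler.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._

object SchedulerDemo extends App {
  // A Scheduler is also an ExecutionContext, so plain Futures can use it:
  Future(21 + 21).foreach(println) // prints 42

  // Unlike a bare ExecutionContext, a Scheduler can also delay tasks:
  val cancelable = global.scheduleOnce(1.second) {
    println("runs after one second")
  }
  // cancelable.cancel() would abort the task before it fires.
  Thread.sleep(1500) // keep the demo JVM alive long enough to see it
}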


First interaction with an Artificial Neural Network


I hated biology in my school days and loved mathematics. After a long time, I got to learn something that combines both mathematics and biology: the Artificial Neural Network, ANN for short, which is inspired by the biological neural network. Though you might find it weird, that is how I like to define the artificial neural network. When we say biology here, it is basically the study of the brain, or rather the nervous system: artificial intelligence just mimics how the nervous system works. Neural networks are gaining huge popularity nowadays, with big data by their side. In fact, a newly joined colleague said you cannot do artificial neural networks, or any other machine learning algorithm, without big data; of course, I didn’t believe him and decided to try it myself. So the rest of this blog is from my first interaction with…


Resolving the Failure Issue of NameNode


In the previous blog, “Smattering of HDFS”, we learnt that “the NameNode is a Single Point of Failure for the HDFS Cluster”. Each cluster had a single NameNode, and if that machine became unavailable, the whole cluster would be unavailable until the NameNode was restarted or brought up on a different machine. Now in this blog, we will learn about resolving the failure issue of the NameNode.

Issues that arise when the NameNode fails/crashes:
The metadata for HDFS, like namespace information, block information, etc., needs to be in main memory when in use, but for persistent storage it has to be kept on disk. The NameNode stores two types of information:
1. in-memory fsimage – the latest, up-to-date snapshot of the Hadoop filesystem namespace.
2. editLogs – the sequence of changes made to the filesystem after the NameNode started (see the sketch below).
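Conceptually, the NameNode’s in-memory state is the last fsimage snapshot with the editLogs replayed on top. The following Scala sketch is purely illustrative (not Hadoop’s code); the types are made up:

object NameNodeState {
  // Illustrative only: namespace state = fsimage snapshot + replayed edits.
  final case class FsImage(namespace: Map[String, Long]) // path -> file size

  sealed trait Edit
  final case class Create(path: String, size: Long) extends Edit
  final case class Delete(path: String) extends Edit

  // On (re)start the NameNode rebuilds its in-memory view like this, which
  // is why losing the single NameNode's fsimage and editLogs is fatal.
  def replay(snapshot: FsImage, edits: Seq[Edit]): FsImage =
    edits.foldLeft(snapshot) {
      case (img, Create(p, s)) => FsImage(img.namespace + (p -> s))
      case (img, Delete(p))    => FsImage(img.namespace - p)
    }
}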

The total availability of HDFS…


Multiple Feeds at one place: MultiFeed App


Okay, what if I tell you there is an app? :D Ever felt like having an app where you can add all your interesting blogs’ feeds?

Here it is. Maybe there are many others available in the Play Store, but this one is simple, actually very simple: just add your interesting blog’s feed URL, get the top 20 feeds as a simple list, click a post, read it in the app, and close the app. Done. You’ve earned the skill.

Here is a simple animation showing this MultiFeed app at work.

Can’t wait? OK, just download it for free from here: Play Store

[Animation: the MultiFeeds app in action]

Keep Learning Keep Sharing 🙂
