Supervised Learning
Supervised Learning is a type of Machine Learning in which a model is trained on labeled data to learn a mapping from inputs to outputs. During training, the model receives input features together with the correct output label for each example. The objective is to predict the correct output for new, unseen data. Supervised Learning is usually divided into two categories: regression and classification.
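As a minimal sketch of this workflow, the snippet below trains on labeled examples and then predicts on data the model has not seen. The data (labels generated as y = 3x + 1), the 25% hold-out size, and the choice of `LinearRegression` are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: every input x comes with a known label y = 3x + 1.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0

# Hold out 25% of the labeled examples to stand in for "new, unseen data".
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Learn the input-to-output mapping from the training examples only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out inputs and compare against their true labels.
y_pred = model.predict(X_test)
max_error = float(np.max(np.abs(y_pred - y_test)))
print(f"Largest error on held-out data: {max_error:.2e}")
```

Because the labels here follow an exact linear rule, the held-out error is essentially zero; on real data the gap between training and held-out performance is what indicates how well the model generalizes.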
Regression Algorithms
Regression algorithms are used when the target variable is continuous and the goal is to predict a numeric value, such as a price or a temperature. Here are some popular regression algorithms available in Python:
- Linear Regression: A simple regression algorithm that fits a linear relationship between the input features and the target variable.
- Polynomial Regression: Extends linear regression by adding polynomial features to capture more complex relationships between the variables.
- Support Vector Regression (SVR): Applies the support vector machine approach to regression tasks. With a non-linear kernel such as RBF, it can fit non-linear relationships.
- Decision Tree Regression: A non-parametric regression algorithm that predicts the target variable by following a tree-like structure of decisions on the feature values.
- Random Forest Regression: An ensemble technique that combines multiple decision trees to improve the accuracy and reduce overfitting.
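To illustrate what "adding polynomial features" means in the list above, here is a small sketch using Scikit-learn's `PolynomialFeatures`. The two input values are made up for illustration. Each input x is expanded into the columns 1, x, and x², and an ordinary linear regression can then be fitted on those expanded columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two hypothetical one-feature samples: x = 2 and x = 3.
X_demo = np.array([[2.0], [3.0]])

# degree=2 expands each x into [1, x, x^2] (the leading 1 is the bias column).
expanded = PolynomialFeatures(degree=2).fit_transform(X_demo)
print(expanded)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

The model stays linear in its coefficients; only the features are non-linear in x.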
Let’s explore how to implement some of these regression algorithms using Scikit-learn in Python:
```python
# Importing the required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])

# Linear Regression
linear_regressor = LinearRegression()
linear_regressor.fit(X, y)

# Polynomial Regression
polynomial_features = PolynomialFeatures(degree=2)
X_poly = polynomial_features.fit_transform(X)
polynomial_regressor = LinearRegression()
polynomial_regressor.fit(X_poly, y)

# SVR
svr_regressor = SVR(kernel='rbf')
svr_regressor.fit(X, y)

# Decision Tree Regression
decision_tree_regressor = DecisionTreeRegressor()
decision_tree_regressor.fit(X, y)

# Random Forest Regression
random_forest_regressor = RandomForestRegressor()
random_forest_regressor.fit(X, y)

# Plotting the results
plt.scatter(X, y, color='red', label='Data Points')
plt.plot(X, linear_regressor.predict(X), label='Linear Regression')
plt.plot(X, polynomial_regressor.predict(X_poly), label='Polynomial Regression')
plt.plot(X, svr_regressor.predict(X), label='SVR')
plt.plot(X, decision_tree_regressor.predict(X), label='Decision Tree Regression')
plt.plot(X, random_forest_regressor.predict(X), label='Random Forest Regression')
plt.legend()
plt.xlabel('X')
plt.ylabel('y')
plt.title('Regression Algorithms in Python')
plt.show()
```
In the code above, we use a simple set of sample data and implement Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, and Random Forest Regression. The plot shows the fitted lines for each regression algorithm.
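The plot only covers the training inputs, so it can hide how differently these models behave on new inputs. The sketch below reuses the same five sample points, which lie exactly on y = 2x, and queries three of the models at x = 6, a value outside the training range. The query point and the `random_state` values are assumptions added for illustration and reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Same sample data as above: the points lie exactly on y = 2x.
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])

# A new input outside the training range (illustrative choice).
X_new = np.array([[6]])

linear_pred = LinearRegression().fit(X, y).predict(X_new)[0]
tree_pred = DecisionTreeRegressor(random_state=0).fit(X, y).predict(X_new)[0]
forest_pred = RandomForestRegressor(random_state=0).fit(X, y).predict(X_new)[0]

print(f"Linear Regression: {linear_pred:.2f}")  # extends the line: 12.00
print(f"Decision Tree:     {tree_pred:.2f}")    # stays at the last leaf: 10.00
print(f"Random Forest:     {forest_pred:.2f}")  # an average of tree outputs, at most 10
```

Linear regression continues the fitted line beyond the data, while tree-based models can only return values seen in their leaves, so they do not extrapolate past the largest training target.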
Classification Algorithms
Classification algorithms are used when the target variable is categorical and the goal is to assign input data points to predefined classes or categories. Here are some popular classification algorithms implemented in Python:
- Logistic Regression: A linear classification algorithm used for binary and multiclass classification tasks.
- K-Nearest Neighbors (KNN): A simple algorithm that assigns a data point the majority class among its k nearest neighbors in the training data.
- Decision Tree Classification: A tree-like model that makes decisions based on feature values.
- Random Forest Classification: An ensemble of decision trees that improves classification accuracy.
- Support Vector Machines (SVM): A powerful algorithm that finds the maximum-margin hyperplane separating the classes, and remains effective in high-dimensional spaces.
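To make the K-Nearest Neighbors rule from the list above concrete, here is a from-scratch sketch in NumPy. The helper name `knn_predict`, the Euclidean distance metric, and the toy points are all hypothetical choices for illustration. This is not how Scikit-learn implements the algorithm internally.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Return the majority class among the k training points nearest to x_query."""
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: class 0 clusters near the origin, class 1 near (5, 5).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

label_a = knn_predict(X_train, y_train, np.array([0.5, 0.5]))
label_b = knn_predict(X_train, y_train, np.array([5.5, 5.5]))
print(label_a, label_b)  # 0 1
```

There is no training step beyond storing the data; all the work happens at prediction time, which is why KNN can become slow on large datasets.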
Let’s see how to implement some of these classification algorithms using Scikit-learn in Python:
```python
# Importing the required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Sample data
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, n_classes=3)

# Logistic Regression
logistic_classifier = LogisticRegression()
logistic_classifier.fit(X, y)

# K-Nearest Neighbors
knn_classifier = KNeighborsClassifier()
knn_classifier.fit(X, y)

# Decision Tree Classification
decision_tree_classifier = DecisionTreeClassifier()
decision_tree_classifier.fit(X, y)

# Random Forest Classification
random_forest_classifier = RandomForestClassifier()
random_forest_classifier.fit(X, y)

# Support Vector Machines
svm_classifier = SVC()
svm_classifier.fit(X, y)

# Plotting the results
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='rainbow', edgecolors='k', label='Data Points')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Classification Algorithms in Python')
plt.legend()
plt.show()
```
In the code above, we generate sample data using Scikit-learn’s make_classification function. Then, we implement Logistic Regression, K-Nearest Neighbors, Decision Tree Classification, Random Forest Classification, and Support Vector Machines for classifying the data points into different classes. The scatter plot shows the data points colored according to their classes.
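The scatter plot shows the data but not how well each classifier performs. One possible way to compare them, sketched below, is to hold out part of the data and report accuracy on it. The 70/30 split, the `random_state` values, and the use of accuracy as the metric are assumptions added here. Exact scores depend on the generated data, so none are quoted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same kind of sample data as above, with a fixed seed for reproducibility.
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, n_classes=3, random_state=42)

# Keep 30% of the points aside; stratify so each class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

classifiers = {
    "Logistic Regression": LogisticRegression(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Support Vector Machine": SVC(),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, classifier in classifiers.items():
    classifier.fit(X_train, y_train)
    accuracies[name] = classifier.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.2f}")
```

Accuracy is a reasonable first metric here because the generated classes are roughly balanced; for imbalanced data, metrics such as precision, recall, or F1 are usually more informative.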