Python Interview Questions

Question 25

How do you use NumPy and Pandas for data manipulation?

Answer:

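NumPy and Pandas are the two core Python libraries for data manipulation: NumPy provides fast numerical arrays, and Pandas adds labeled, tabular data structures on top of them. The sections below walk through the most common operations in each.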

NumPy for Data Manipulation

NumPy is the fundamental library for numerical computing in Python. It provides an N-dimensional array type (ndarray), which also covers matrices, and mathematical functions that operate on whole arrays at once.

Creating Arrays

import numpy as np

# Creating a 1D array
array_1d = np.array([1, 2, 3, 4, 5])

# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Creating arrays with specific values
zeros_array = np.zeros((3, 3))  # 3x3 array of zeros
ones_array = np.ones((2, 4))    # 2x4 array of ones
range_array = np.arange(10)     # Array of values from 0 to 9
linspace_array = np.linspace(0, 1, 5)  # 5 values evenly spaced between 0 and 1

Array Operations

# Element-wise operations
array = np.array([1, 2, 3, 4])
print(array + 2)  # Output: [3 4 5 6]
print(array * 2)  # Output: [2 4 6 8]

# Mathematical functions
print(np.sqrt(array))  # Output: [1.         1.41421356 1.73205081 2.        ]
print(np.exp(array))   # Output: [ 2.71828183  7.3890561  20.08553692 54.59815003]

# Statistical operations
print(np.mean(array))  # Output: 2.5
print(np.sum(array))   # Output: 10
print(np.std(array))   # Output: 1.118033988749895

Indexing and Slicing

array = np.array([1, 2, 3, 4, 5])

# Indexing
print(array[0])  # Output: 1

# Slicing
print(array[1:4])  # Output: [2 3 4]
print(array[:3])   # Output: [1 2 3]
print(array[::2])  # Output: [1 3 5]

# Boolean indexing
print(array[array > 2])  # Output: [3 4 5]
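
The same indexing ideas extend to 2D arrays, where a comma separates the row selector from the column selector. The following is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

element = matrix[1, 2]            # row 1, column 2 -> 6
first_column = matrix[:, 0]       # every row, column 0 -> [1 4]
last_row = matrix[-1]             # negative indices count from the end -> [4 5 6]
evens = matrix[matrix % 2 == 0]   # a boolean mask returns a flat 1D result -> [2 4 6]

print(element, first_column, last_row, evens)
```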

Reshaping and Aggregation

array = np.arange(12).reshape((3, 4))

# Reshape
print(array)
# Output:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Aggregation functions
print(np.sum(array, axis=0))  # Sum by columns
# Output: [12 15 18 21]

print(np.sum(array, axis=1))  # Sum by rows
# Output: [ 6 22 38]
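
The axis argument names the dimension that gets collapsed, which is why axis=0 yields one value per column and axis=1 one value per row. A short sketch of the same rule with other aggregations (the variable names are illustrative):

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))

col_means = grid.mean(axis=0)   # collapse the rows: one mean per column
row_max = grid.max(axis=1)      # collapse the columns: one maximum per row
flat = grid.reshape(-1)         # -1 asks NumPy to infer the length (12 here)

print(col_means)   # [4. 5. 6. 7.]
print(row_max)     # [ 3  7 11]
print(flat.shape)  # (12,)
```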

Pandas for Data Manipulation

Pandas is a high-level data manipulation tool built on top of NumPy. It provides powerful data structures like DataFrame and Series.
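
To make the relationship between the two structures concrete, the sketch below builds a Series (a labeled 1D array) and then a DataFrame whose columns are Series sharing one index. The labels and the AgeNextYear column are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a 1D array of values with an index of labels
ages = pd.Series([25, 30, 35], index=['alice', 'bob', 'charlie'], name='Age')

# The underlying values are available as a NumPy array
values = ages.to_numpy()

# A DataFrame is a collection of Series that share the same index
frame = pd.DataFrame({'Age': ages, 'AgeNextYear': ages + 1})

print(ages['bob'])                 # look up by label
print(type(values))                # numpy.ndarray
print(frame.loc['bob', 'AgeNextYear'])
```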

Creating DataFrames

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)

# Creating a DataFrame from a CSV file
# (stored under a separate name so the df built above is not overwritten)
df_from_csv = pd.read_csv('data.csv')

Basic DataFrame Operations

# Viewing data
print(df.head())    # View the first 5 rows
print(df.tail())    # View the last 5 rows
df.info()           # Summary of the DataFrame (prints directly and returns None)
print(df.describe())  # Statistical summary

# Selecting columns
print(df['Name'])   # Select a single column
print(df[['Name', 'Age']])  # Select multiple columns

# Selecting rows
print(df.iloc[0])    # Select the first row by integer position
print(df.loc[0])     # Select the row whose index label is 0

# Slicing rows
print(df.iloc[1:3])  # Positions 1 and 2 (end position excluded)
print(df.loc[1:3])   # Labels 1 through 3 (end label included)

# Boolean indexing
print(df[df['Age'] > 30])  # Filter rows based on a condition
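
With the default 0, 1, 2, ... index, iloc and loc look interchangeable. The difference shows up once the index labels differ from the positions. The sketch below uses a made-up index and an extra hypothetical row ('Dana') to show it:

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'Dana'], 'Age': [25, 30, 35, 40]},
    index=[10, 20, 30, 40],  # labels deliberately differ from positions
)

by_position = people.iloc[1:3]   # positions 1 and 2; the end is excluded
by_label = people.loc[20:40]     # labels 20, 30 and 40; the end is included

# loc also accepts a boolean mask plus a column label
older_names = people.loc[people['Age'] > 30, 'Name']

print(by_position['Name'].tolist())  # ['Bob', 'Charlie']
print(by_label['Name'].tolist())     # ['Bob', 'Charlie', 'Dana']
print(older_names.tolist())          # ['Charlie', 'Dana']
```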

Modifying DataFrames

# Adding a new column
df['Salary'] = [70000, 80000, 90000]

# Modifying existing columns
df['Age'] = df['Age'] + 1

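# Dropping columns (returns a new DataFrame; df itself keeps 'City')
df_without_city = df.drop(columns='City')

# Renaming columns (also returns a new DataFrame; df keeps its original names)
df_renamed = df.rename(columns={'Name': 'Full Name', 'Age': 'Years'})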

Handling Missing Data

# Detecting missing data
print(df.isnull())

# Option 1: drop every row that contains a missing value
df_dropped = df.dropna()

# Option 2: fill every missing value with a constant
df_filled = df.fillna(0)

# Option 3: fill missing values in one column with that column's mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
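
The snippets above only have an effect when the data actually contains gaps. Here is a self-contained sketch with an invented DataFrame that has one missing age and one missing city; the 'Unknown' placeholder is an arbitrary choice.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, np.nan, 35.0],
    'City': ['New York', None, 'Los Angeles'],
})

# Count the missing values in each column
missing_per_column = df_missing.isnull().sum()

# Drop any row that has at least one missing value
dropped = df_missing.dropna()

# Fill each column with its own replacement value
filled = df_missing.fillna({'Age': df_missing['Age'].mean(), 'City': 'Unknown'})

print(missing_per_column)
print(len(dropped))   # 2
print(filled)
```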

Grouping and Aggregation

# Group by a column and compute aggregate statistics
grouped = df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})

# Pivot tables (this example assumes df also has a 'Gender' column)
pivot_table = df.pivot_table(values='Salary', index='City', columns='Gender', aggfunc='mean')
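
Because the grouping calls above depend on 'City', 'Salary', and 'Gender' columns being present, a runnable version needs its own data. The sketch below uses an invented four-row DataFrame to show what each call returns:

```python
import pandas as pd

staff = pd.DataFrame({
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Gender': ['F', 'M', 'F', 'M'],
    'Age': [25, 35, 30, 40],
    'Salary': [70000, 90000, 80000, 100000],
})

# One row per city: mean age and total salary
grouped = staff.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})

# Rows are cities, columns are genders, cells hold the mean salary
pivot = staff.pivot_table(values='Salary', index='City', columns='Gender', aggfunc='mean')

print(grouped)
print(pivot)
```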

Combining DataFrames

# Concatenating DataFrames (stacks rows; the original index labels 0, 1 repeat
# unless you pass ignore_index=True)
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = pd.concat([df1, df2])

# Merging DataFrames
df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value2': [4, 5, 6]})
result = pd.merge(df1, df2, on='Key', how='inner')
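
The how argument decides which keys survive a merge. As a sketch using the same two tables (renamed left and right here for clarity), the example below compares inner, left, and outer joins, and uses indicator=True to record where each row came from:

```python
import pandas as pd

left = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]})
right = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value2': [4, 5, 6]})

inner = pd.merge(left, right, on='Key', how='inner')   # keys present in both
left_join = pd.merge(left, right, on='Key', how='left')  # every key from left
outer = pd.merge(left, right, on='Key', how='outer', indicator=True)  # all keys

# ignore_index=True renumbers the rows after concatenation
stacked = pd.concat([left, left], ignore_index=True)

print(inner)
print(left_join)   # Value2 is NaN for key 'C'
print(outer)       # the _merge column reports both / left_only / right_only
print(stacked.index.tolist())
```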

Summary

NumPy: Ideal for numerical computations, array manipulations, mathematical operations, and handling large datasets efficiently.
Pandas: Built on top of NumPy, provides powerful and flexible data structures like DataFrame and Series for data manipulation, analysis, and handling heterogeneous data.

These libraries, used together, offer a robust toolkit for data manipulation and analysis in Python, enabling efficient and effective data processing workflows.
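
As one small sketch of the two libraries working together, NumPy functions can be applied directly to DataFrame columns. The AgeGroup and AgeZ column names and the 30-year threshold are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# np.where evaluates the condition element-wise over the whole column
df['AgeGroup'] = np.where(df['Age'] >= 30, '30+', 'under 30')

# Standardize the column; ddof=0 matches np.std's default,
# whereas Series.std() defaults to the sample estimate (ddof=1)
df['AgeZ'] = (df['Age'] - df['Age'].mean()) / df['Age'].std(ddof=0)

print(df)
```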