
Python Interview Questions


Question 25

How do you use NumPy and Pandas for data manipulation?

Answer:

NumPy and Pandas are the two core Python libraries for data manipulation: NumPy handles numerical arrays, and Pandas handles labeled, tabular data. Here's a guide to the most common operations in each:

NumPy for Data Manipulation

NumPy is a fundamental library for numerical computing in Python. It provides support for arrays, matrices, and many mathematical functions.

Creating Arrays

import numpy as np

# Creating a 1D array
array_1d = np.array([1, 2, 3, 4, 5])

# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Creating arrays with specific values
zeros_array = np.zeros((3, 3))  # 3x3 array of zeros
ones_array = np.ones((2, 4))    # 2x4 array of ones
range_array = np.arange(10)     # Array of values from 0 to 9
linspace_array = np.linspace(0, 1, 5)  # 5 values evenly spaced between 0 and 1
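
To confirm what each constructor produced, it can help to inspect an array's shape and dtype. The following is a minimal sketch; the variable `evens` is introduced here only for illustration.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])
zeros_array = np.zeros((3, 3))
linspace_array = np.linspace(0, 1, 5)

print(array_2d.shape)     # (2, 3): 2 rows, 3 columns
print(array_2d.ndim)      # 2
print(zeros_array.dtype)  # float64 (np.zeros produces floats by default)
print(linspace_array)     # 5 values from 0 to 1, both endpoints included

# arange also accepts start, stop, and step; the stop value is excluded
evens = np.arange(0, 10, 2)
print(evens)              # [0 2 4 6 8]
```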

Array Operations

# Element-wise operations
array = np.array([1, 2, 3, 4])
print(array + 2)  # Output: [3 4 5 6]
print(array * 2)  # Output: [2 4 6 8]

# Mathematical functions
print(np.sqrt(array))  # Output: [1. 1.41421356 1.73205081 2. ]
print(np.exp(array))   # Output: [2.71828183 7.3890561  20.08553692 54.59815003]

# Statistical operations
print(np.mean(array))  # Output: 2.5
print(np.sum(array))   # Output: 10
print(np.std(array))   # Output: 1.118033988749895
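
Element-wise operations also apply between two arrays of the same shape, not only between an array and a scalar. The sketch below illustrates this, and also shows that `np.std` computes the population standard deviation unless `ddof=1` is passed. The arrays `a` and `b` are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Arithmetic between equal-shaped arrays pairs up elements by position
elementwise_sum = a + b        # values: 11, 22, 33, 44
elementwise_product = a * b    # values: 10, 40, 90, 160

# np.std divides by N by default (population standard deviation);
# ddof=1 divides by N - 1 instead (sample standard deviation)
population_std = np.std(a)
sample_std = np.std(a, ddof=1)

print(elementwise_sum, elementwise_product)
print(population_std, sample_std)
```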

Indexing and Slicing

array = np.array([1, 2, 3, 4, 5])

# Indexing
print(array[0])  # Output: 1

# Slicing
print(array[1:4])  # Output: [2 3 4]
print(array[:3])   # Output: [1 2 3]
print(array[::2])  # Output: [1 3 5]

# Boolean indexing
print(array[array > 2])  # Output: [3 4 5]
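
One detail worth knowing when slicing: a basic slice is a view onto the original array, while boolean indexing returns a copy. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

# A slice shares memory with the original array
view = array[1:4]
view[0] = 99
print(array)      # [ 1 99  3  4  5]

# Boolean indexing creates an independent copy
filtered = array[array > 2]
filtered[0] = -1
print(array)      # unchanged by the edit to `filtered`

# Use .copy() when a slice must not affect the original
safe = array[1:4].copy()
safe[0] = 0
```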

Reshaping and Aggregation

array = np.arange(12).reshape((3, 4))

# Reshape
print(array)
# Output:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Aggregation functions
print(np.sum(array, axis=0))  # Sum by columns
# Output: [12 15 18 21]

print(np.sum(array, axis=1))  # Sum by rows
# Output: [ 6 22 38]
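
The `axis` argument names the dimension that gets collapsed, which is easiest to see from the shape of the result. A brief sketch under the same 3x4 setup; the names `col_sums`, `row_means`, and `reshaped` are illustrative:

```python
import numpy as np

array = np.arange(12).reshape((3, 4))

# axis=0 collapses the rows, leaving one value per column
col_sums = array.sum(axis=0)
print(col_sums.shape)   # (4,)

# axis=1 collapses the columns, leaving one value per row
row_means = array.mean(axis=1)
print(row_means)        # [1.5 5.5 9.5]

# -1 asks NumPy to infer that dimension from the total size
reshaped = array.reshape(-1, 6)
print(reshaped.shape)   # (2, 6)
```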

Pandas for Data Manipulation

Pandas is a high-level data manipulation library built on top of NumPy. It provides two main data structures: Series (a labeled one-dimensional array) and DataFrame (a labeled two-dimensional table).

Creating DataFrames

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)

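# Creating a DataFrame from a CSV file ('data.csv' is a placeholder path)
# Stored under a separate name so the dictionary-based df above is kept
csv_df = pd.read_csv('data.csv')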

Basic DataFrame Operations

# Viewing data
print(df.head())    # View the first 5 rows
print(df.tail())    # View the last 5 rows
df.info()           # Prints a summary of the DataFrame (returns None, so no print() is needed)
print(df.describe())  # Statistical summary

# Selecting columns
print(df['Name'])   # Select a single column
print(df[['Name', 'Age']])  # Select multiple columns

# Selecting rows
print(df.iloc[0])    # Select the first row by integer position
print(df.loc[0])     # Select the row whose index label is 0

# Slicing rows
print(df.iloc[1:3])  # Rows at positions 1 and 2 (end position is exclusive)
print(df.loc[1:3])   # Rows with labels 1 through 3 (end label is inclusive)

# Boolean indexing
print(df[df['Age'] > 30])  # Filter rows based on a condition
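
With the default 0, 1, 2, ... index, `iloc` and `loc` often look interchangeable. The sketch below uses a custom index (the labels 10, 20, 30 are made up for illustration) to show where they differ.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical labels, chosen to differ from positions
)

first_by_position = df.iloc[0]   # row at position 0 (Alice)
first_by_label = df.loc[10]      # row labeled 10 (also Alice)

# iloc slices by position and excludes the end position
position_slice = df.iloc[1:3]    # Bob and Charlie

# loc slices by label and includes the end label
label_slice = df.loc[10:20]      # Alice and Bob

print(position_slice)
print(label_slice)
```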

Modifying DataFrames

# Adding a new column
df['Salary'] = [70000, 80000, 90000]

# Modifying existing columns
df['Age'] = df['Age'] + 1

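# Dropping columns (drop returns a new DataFrame; df itself keeps 'City')
df_without_city = df.drop('City', axis=1)

# Renaming columns (rename also returns a new DataFrame)
df_renamed = df.rename(columns={'Name': 'Full Name', 'Age': 'Years'})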
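
Note that `drop` and `rename` return new DataFrames rather than changing the original in place, so the original columns remain available unless the result is assigned back. A small self-contained sketch; the names `trimmed`, `renamed`, and `with_salary` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Each call returns a new DataFrame and leaves df untouched
trimmed = df.drop(columns=['City'])
renamed = trimmed.rename(columns={'Name': 'Full Name', 'Age': 'Years'})

print(list(df.columns))       # ['Name', 'Age', 'City']
print(list(renamed.columns))  # ['Full Name', 'Years']

# assign() adds a column the same way, without modifying df
with_salary = df.assign(Salary=[70000, 80000, 90000])
print(list(with_salary.columns))
```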

Handling Missing Data

# Detecting missing data
print(df.isnull())

# Option 1: dropping rows that contain missing data
df_dropped = df.dropna()

# Option 2: filling every missing value with a constant
df_filled = df.fillna(0)

# Option 3: filling one column's missing values with that column's mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
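
The sample data earlier contains no missing values, so the calls above have no visible effect on it. The sketch below uses a small made-up DataFrame with one missing age to show what each approach does.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, np.nan, 35.0],
})

# Count missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Dropping removes Bob's row entirely
dropped = df.dropna()

# Filling with the mean keeps the row; mean() skips NaN, so it is 30.0 here
filled = df.copy()
filled['Age'] = filled['Age'].fillna(filled['Age'].mean())
print(filled)
```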

Grouping and Aggregation

# Group by a column and compute aggregate statistics
grouped = df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})

# Pivot tables (this call assumes df also has a 'Gender' column, which the sample data above does not define)
pivot_table = df.pivot_table(values='Salary', index='City', columns='Gender', aggfunc='mean')
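
Both calls above need 'City', 'Salary', and 'Gender' columns to be present. As a runnable sketch, the following builds a small made-up DataFrame with those columns (all values are invented for illustration) and applies the same grouping and pivot.

```python
import pandas as pd

# Hypothetical data containing every column the calls rely on
df = pd.DataFrame({
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Gender': ['F', 'M', 'F', 'M'],
    'Age': [25, 35, 30, 40],
    'Salary': [70000, 80000, 90000, 60000],
})

# One row per city: mean age and total salary
grouped = df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})
print(grouped)

# Cities as rows, genders as columns, mean salary in each cell
pivot = df.pivot_table(values='Salary', index='City',
                       columns='Gender', aggfunc='mean')
print(pivot)
```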

Combining DataFrames

# Concatenating DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = pd.concat([df1, df2])

# Merging DataFrames
df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value2': [4, 5, 6]})
result = pd.merge(df1, df2, on='Key', how='inner')
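
The `how` argument controls which keys survive a merge, and `ignore_index=True` gives a concatenated result a fresh index. A short sketch comparing the options on the same kind of frames; the result names are illustrative:

```python
import pandas as pd

left_df = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]})
right_df = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value2': [4, 5, 6]})

# inner: only keys present in both frames (A, B)
inner = pd.merge(left_df, right_df, on='Key', how='inner')

# left: every key from left_df; Value2 is NaN for C
left = pd.merge(left_df, right_df, on='Key', how='left')

# outer: every key from either frame; gaps are filled with NaN
outer = pd.merge(left_df, right_df, on='Key', how='outer')
print(outer)

# Without ignore_index, concat keeps each frame's own labels (0, 1, 0, 1)
part1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
part2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
stacked = pd.concat([part1, part2], ignore_index=True)
print(stacked)
```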

Summary

  • NumPy: Ideal for numerical computations, array manipulations, mathematical operations, and handling large datasets efficiently.
  • Pandas: Built on top of NumPy, provides powerful and flexible data structures like DataFrame and Series for data manipulation, analysis, and handling heterogeneous data.

Used together, the two libraries cover most data manipulation and analysis work in Python: NumPy supplies fast array arithmetic, and Pandas adds labels, missing-data handling, grouping, and joins on top of it.
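
As one illustration of using the two together, NumPy functions can operate directly on DataFrame columns. This is a minimal sketch; the column names 'Age_zscore' and 'Group' and the age threshold are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Pull a column out as a NumPy array and standardize it
ages = df['Age'].to_numpy()
df['Age_zscore'] = (ages - ages.mean()) / ages.std()

# np.where picks a value per row based on a boolean condition
df['Group'] = np.where(df['Age'] > 30, 'senior', 'junior')

print(df)
```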
