In Python, especially for data science, there are several fundamental data types you'll encounter. Let's break down the most common ones and how they relate to data manipulation with libraries like Pandas and NumPy.

Built-in Python Data Types

These are the basic data types that come with Python itself.

Numeric Types:
int (Integer): Whole numbers, positive or negative, without a decimal point. Example: 10, -5.
float (Floating-Point Number): Numbers with a decimal point. Example: 3.14, -0.5.
complex (Complex Number): Numbers with a real and an imaginary part. Example: 3 + 5j. In data science, the complex type is rarely used. Common tasks such as statistical modeling, machine learning, and data visualization work almost entirely with real-valued numbers. Complex numbers mostly appear in specialized fields of science and engineering, such as signal processing, quantum mechanics, and electrical engineering, where imaginary numbers are essential for computation.

Text Type:
str (String): A sequence of characters. Strings are immutable, meaning they cannot be changed after creation. Example: "Hello, world!".

Boolean Type:
bool (Boolean): Represents one of two values, True or False. Used for logical operations.

Sequence Types:
list (List): An ordered, mutable collection of items. Lists can contain items of different data types. Example: my_store = [1, "apple", 3.14].
    fruits = ["apple", "banana", "cherry"]
    print(fruits[0])  # apple
tuple (Tuple): An ordered, immutable collection of items. Tuples are often used to group heterogeneous values into a single record. Example: my_store = (1, "apple", 3.14).
    point = (10, 20)
    print(point[1])  # 20

Set Types:
set (Set): An unordered collection of unique items. Sets are mutable. Example: my_store = {1, 2, 3}.
    unique_nums = {1, 2, 2, 3}
    print(unique_nums)  # {1, 2, 3}
frozenset: An immutable version of a set. Because it cannot change, a frozenset is hashable and can be used as a dictionary key or stored inside another set.
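The mutability rules above decide what you can modify in place and what can serve as a dictionary key. The following is a minimal sketch that exercises each rule; the variable names (fruits, point, unique_nums, frozen, labels) are illustrative only.

```python
# Contrast mutability and hashability across the built-in collection types.
fruits = ["apple", "banana", "cherry"]
fruits[0] = "apricot"  # lists are mutable: in-place item assignment works

point = (10, 20)
try:
    point[0] = 99  # tuples are immutable, so this raises TypeError
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

unique_nums = {1, 2, 2, 3}  # the duplicate 2 is dropped on creation
unique_nums.add(4)          # sets are mutable, so items can be added

frozen = frozenset(unique_nums)  # immutable snapshot of the set

# Hashable values such as a tuple of ints or a frozenset can be dict keys.
labels = {point: "a 2-D point", frozen: "first four integers"}

# A regular set is unhashable, so using it as a key raises TypeError.
try:
    labels[unique_nums] = "not allowed"
    set_can_be_key = True
except TypeError:
    set_can_be_key = False

print(fruits)                           # ['apricot', 'banana', 'cherry']
print(tuple_is_mutable)                 # False
print(sorted(unique_nums))              # [1, 2, 3, 4]
print(labels[frozenset({1, 2, 3, 4})])  # first four integers
print(set_can_be_key)                   # False
```

Note that the frozenset lookup works with a freshly built frozenset: equal frozensets hash equally, so any frozenset with the same elements finds the same entry.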
Mapping Type:
dict (Dictionary): A mutable collection of key-value pairs. Keys must be unique and hashable (in practice, immutable types such as strings, numbers, or tuples). Since Python 3.7, dictionaries preserve insertion order. Dictionaries are incredibly useful for mapping and lookups. Example:
    person = {"name": "Alice", "age": 30}
    print(person["name"])  # Alice

Data Types for Data Science

While the built-in types are the foundation, data science libraries introduce specialized, optimized data structures.

NumPy Data Types

NumPy is the cornerstone of numerical computing in Python. It introduces its own data types (dtypes) to handle numerical data efficiently, especially in large arrays. Each numeric dtype corresponds to a fixed-size C type, so an array's values are stored as raw numbers in one contiguous block of memory rather than as individual Python objects. This is what makes NumPy operations much faster than equivalent loops over standard Python types.

ndarray (N-dimensional Array): The primary data structure in NumPy. It's a grid of values, all of the same type, indexed by a tuple of non-negative integers. This homogeneity is what makes NumPy arrays so efficient. Common dtypes for an ndarray include:
int64, int32: For integers of different sizes.
float64, float32: For floating-point numbers.
bool: For boolean values.
datetime64: For date and time values.
object: A catch-all type for non-numeric or mixed data. It is less efficient because each element is a reference to a separate Python object.

Pandas Data Types

Pandas is built on top of NumPy and provides powerful data structures for data analysis. The data types of Pandas columns are referred to as dtypes.

Series: A one-dimensional labeled array. Think of it as a single column of a spreadsheet. A Series can hold data of any type, but it typically holds homogeneous data for efficiency. The dtype attribute of a Series tells you the type of data it holds.
DataFrame: A two-dimensional labeled data structure with columns of potentially different types. Think of it as a spreadsheet or a SQL table. Each column in a DataFrame is a Series and therefore has its own dtype.

Common Pandas dtypes:
object: The traditional default type for strings and mixed types. It's flexible but less memory-efficient, and Pandas has traditionally stored text columns as object dtype.
int64, float64: The most common numeric types, inherited from NumPy.
bool: For boolean data.
datetime64[ns]: For date and time data. Pandas provides a powerful Timestamp object, and a Series of Timestamps has the datetime64[ns] dtype.
category: A special Pandas type for categorical data. It's highly memory-efficient for columns with a limited number of unique values (e.g., "Male", "Female", "Unknown"). It stores each value as a small integer code plus a lookup table of the actual labels.

When you check the .dtype of a string column in a Pandas DataFrame, it shows up as object because that is the default data type Pandas uses for text or mixed-type data. Pandas doesn't automatically convert such a column to the category dtype, even when it has only a few unique values. Because object dtype is flexible but memory-inefficient, you have to convert the column to category explicitly, using the .astype() method, to leverage its benefits.

Example:
    df['colors'] = df['colors'].astype('category')
Validate:
    print(df['colors'].dtype)  # category

Comparison of Core Python Collection Types

| Type | Ordered | Mutable | Unique Elements |
|---|---|---|---|
| list | Yes | Yes | No |
| tuple | Yes | No | No |
| set | No | Yes | Yes |
| frozenset | No | No | Yes |
| dict (keys) | Yes (insertion order, Python 3.7+) | Yes | Yes (keys only) |

Data Type Conversion and Handling

A crucial part of data science is ensuring your data is in the correct format.

Type Casting: You can convert data from one type to another with df['column'].astype(new_type). For example, df['age'].astype('int64') converts a float column to an integer column, truncating any decimal part; the cast raises an error if the column contains missing values.

Missing Values: Pandas uses NaN (Not a Number) from NumPy to represent missing numeric data, NaT for missing datetimes, and None or NaN for missing values in object columns. Missing values can change the dtype of a column: for example, an integer column with a missing value is converted to float64 to accommodate the NaN, unless you use Pandas' nullable integer dtype ('Int64').
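The casting and missing-value rules above are easy to confirm on a small Series. Below is a minimal sketch assuming a recent pandas release. The variable names are illustrative, and the exact exception class for the failing cast (IntCastingNaNError, a ValueError subclass) may differ in older versions, so the sketch catches ValueError.

```python
import pandas as pd

# An integer column with no gaps keeps an integer dtype.
complete = pd.Series([1, 2, 3])
print(complete.dtype)  # int64

# One missing value forces NumPy-backed integers up to float64 so NaN can be stored.
with_gap = pd.Series([1, 2, None])
print(with_gap.dtype)         # float64
print(with_gap.isna().sum())  # 1

# The nullable integer dtype "Int64" (capital I) keeps integers and marks
# the gap as pd.NA instead of upcasting.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)  # Int64

# Casting a float column that still contains NaN to int64 fails...
try:
    with_gap.astype("int64")
    nan_cast_ok = True
except ValueError:  # recent pandas raises IntCastingNaNError, a ValueError subclass
    nan_cast_ok = False

# ...whereas a float column without NaN casts cleanly, truncating toward zero.
ages = pd.Series([30.0, 41.7, 25.2])
print(ages.astype("int64").tolist())  # [30, 41, 25]
```

If integer semantics matter and gaps are expected, converting to "Int64" before further processing avoids both the silent float upcast and the failing cast.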
A dictionary can be converted to a DataFrame. The sample below stores a price and a date as strings:
    import pandas as pd

    # Sample DataFrame with a numeric column and a date column, both stored as strings
    df = pd.DataFrame({
        'price': ['10.50', '25.75', '18.00'],
        'event_date': ['2023-01-15', '2023-02-20', '2023-03-25'],
    })

Date/Time Conversion: Use pd.to_datetime() to convert strings or numbers into datetime values. This is essential for time series analysis.
    # Convert the 'event_date' column to a datetime64 dtype
    df['event_date'] = pd.to_datetime(df['event_date'])

Convert to Numeric: Use pd.to_numeric() to convert a column of numeric strings into a numeric dtype. Values that cannot be parsed raise an error unless you pass errors='coerce', which turns them into NaN.
    # Convert the 'price' column to float64
    df['price'] = pd.to_numeric(df['price'])

Visual Flow of Data Types

Here's how Python data types relate to NumPy and Pandas:

Python built-in types → NumPy dtypes → Pandas dtypes
• int, float, bool → np.int64 (on most platforms), np.float64, np.bool_ → int64, float64, bool
• str → np.str_ (fixed-width) or np.object_ → object or category
• datetime (from the datetime module) → np.datetime64 → datetime64[ns]

This hierarchy shows how Pandas builds on top of NumPy, which in turn builds on Python.
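To see this flow in practice, the sketch below builds NumPy arrays and a Pandas DataFrame from plain Python values and prints the inferred dtypes. It is a minimal illustration, not a complete dtype reference. The column names are hypothetical, and the expected output in the comments assumes pandas 2.x on a 64-bit Linux or macOS machine; other versions or platforms may report, for example, a different integer width or datetime resolution.

```python
import datetime

import numpy as np
import pandas as pd

# Python values -> NumPy: each array gets one fixed-size dtype inferred from its contents.
ints = np.array([1, 2, 3])
floats = np.array([3.14, -0.5])
flags = np.array([True, False])
texts = np.array(["red", "green"])
print(ints.dtype)    # int64 on most 64-bit Linux/macOS builds
print(floats.dtype)  # float64
print(flags.dtype)   # bool
print(texts.dtype)   # <U5: fixed-width Unicode (np.str_), not object

# Python values -> Pandas: each column becomes a Series with its own dtype.
df = pd.DataFrame({
    "count": [1, 2, 3],
    "score": [3.14, -0.5, 2.0],
    "flag": [True, False, True],
    "color": ["red", "green", "red"],
    "when": [datetime.datetime(2023, 1, 15),
             datetime.datetime(2023, 2, 20),
             datetime.datetime(2023, 3, 25)],
})
df["color_cat"] = df["color"].astype("category")  # categorical is opt-in

# Under pandas 2.x this lists: int64, float64, bool, object, datetime64[ns], category
print(df.dtypes)
```

NumPy keeps short strings as fixed-width Unicode, while Pandas stores the same text in an object column unless you opt into category (or another string dtype).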