- Aggregation: SQL-style aggregation using GROUP BY with aggregate functions (e.g., SUM, COUNT, AVG, MIN, MAX)
- Sorting: Ordering data by one or more columns
- Joining/Merging: Combining datasets on keys (e.g., inner, outer, left, and right joins)
- Appending: Concatenating datasets vertically
- Dropping Columns/Rows: Removing unnecessary columns or rows
- Viewing Subsets of Data: Inspecting a few records (head, tail) or specific columns
- Data Type Conversion: Changing the format of fields (e.g., int to float, string to datetime)
- String Operations: Concatenation, splitting strings by a delimiter (text-to-columns), substring extraction, and finding the last n characters
- Mathematical Operations: Addition, subtraction, multiplication, and division of columns; calculating differences between data points
- Date and Time Extraction: Extracting year, month, quarter, day, etc., from date values
- Finding Extremes: Finding the maximum or minimum value across multiple fields for each row
- Binning: Creating bins from numeric variables (e.g., using quantiles or fixed intervals)
- Custom Grouping: Creating customized groupings from character variables
- Flooring and Capping: Setting minimum (floor) and maximum (cap) limits on values
- Missing Value Treatment: Handling missing data (e.g., imputation, removal)
- Dummy Variable Creation: Creating binary indicator variables for categorical data
- Target Encoding: Encoding categorical variables using the target variable's mean or other statistics
- Normalization and Standardization: Scaling features to a specific range, or to zero mean and unit variance
- Pivoting: Transforming data from long format to wide format (pivot tables)
- Unpivoting: Transforming data from wide format to long format (melt)
- Filtering: Subsetting data based on conditions
- Window Functions: Applying operations over a window of rows (e.g., rolling average, cumulative sum)
- Rank and Percentile Calculation: Ranking data and calculating percentiles
- Correlation and Covariance Calculation: Computing the correlation and covariance between variables
- Feature Engineering: Creating new features from existing data (e.g., polynomial features, interaction terms)
- Text Processing: Removing stopwords, stemming, lemmatization
- Outlier Detection and Treatment: Identifying and handling outliers
- Sampling: Drawing samples from the dataset (e.g., random sampling, stratified sampling)
- Merging Data from Different Sources: Integrating data from multiple sources (e.g., databases, APIs)
- Data Encryption and Decryption: Encrypting sensitive data fields for security
- Data Validation and Cleaning: Ensuring data quality by validating data types, consistency, and accuracy
- Visualization: Creating plots and charts to visualize data trends and distributions
- Pareto Analysis: Plotting sorted category bars with a cumulative-percentage line on a secondary y-axis

By using these operations, data analysts can clean, transform, and analyze datasets effectively, gaining insights and preparing the data for further modeling or reporting tasks. Can you believe that Extreme-ML does all of these things? Contact us to see a demo.

The short pandas sketches that follow illustrate a handful of the operations from this list.
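To make the first few items concrete, here is a minimal pandas sketch of aggregation, sorting, joining, appending, type conversion, and viewing subsets. The `sales` and `regions` frames and their column names are invented for illustration.

```python
# A minimal sketch of core table operations; the data is hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region_id": [1, 1, 2, 2, 3],
    "amount": ["100", "250", "75", "310", "90"],   # stored as strings
    "order_date": ["2024-01-05", "2024-02-11", "2024-01-20",
                   "2024-03-02", "2024-02-28"],
})
regions = pd.DataFrame({"region_id": [1, 2], "region": ["East", "West"]})

# Data type conversion: string -> int, string -> datetime
sales["amount"] = sales["amount"].astype(int)
sales["order_date"] = pd.to_datetime(sales["order_date"])

# Joining: a left join keeps all sales rows, even unmatched region 3
merged = sales.merge(regions, on="region_id", how="left")

# Aggregation: GROUP BY region with SUM / COUNT / AVG equivalents
summary = (merged.groupby("region", dropna=False)["amount"]
                 .agg(total="sum", orders="count", avg="mean"))

# Sorting by one column, descending
summary = summary.sort_values("total", ascending=False)

# Appending: concatenating two datasets vertically
all_sales = pd.concat([sales, sales.copy()], ignore_index=True)

print(summary)
print(all_sales.head(3))   # viewing a subset of records
```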
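String operations, date extraction, and row-wise extremes follow the same pattern; the customer frame below is hypothetical.

```python
# A sketch of string operations, date extraction, and row-wise extremes.
import pandas as pd

df = pd.DataFrame({
    "full_name": ["Ada Lovelace", "Alan Turing"],
    "signup": pd.to_datetime(["2023-11-02", "2024-03-15"]),
    "score_q1": [71, 88],
    "score_q2": [83, 79],
})

# Splitting strings by a delimiter (text-to-columns)
df[["first", "last"]] = df["full_name"].str.split(" ", n=1, expand=True)
df["initials"] = df["first"].str[0] + df["last"].str[0]   # concatenation
df["last3"] = df["full_name"].str[-3:]                    # last n characters

# Date and time extraction via the .dt accessor
df["year"] = df["signup"].dt.year
df["quarter"] = df["signup"].dt.quarter

# Finding extremes across multiple fields for each row
df["best_score"] = df[["score_q1", "score_q2"]].max(axis=1)

print(df)
```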
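Binning, flooring/capping, missing-value treatment, and dummy variables can be sketched in a few lines. The income data is made up, and the 5th/95th capping percentiles are an arbitrary choice.

```python
# A sketch of binning, capping, imputation, and dummy variables.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [12_000, 58_000, np.nan, 240_000, 75_000, 33_000],
    "segment": ["retail", "retail", "corporate", "corporate", "sme", "sme"],
})

# Missing value treatment: impute with the median
df["income"] = df["income"].fillna(df["income"].median())

# Flooring and capping at the 5th/95th percentiles (assumed thresholds)
lo, hi = df["income"].quantile([0.05, 0.95])
df["income_capped"] = df["income"].clip(lower=lo, upper=hi)

# Binning: fixed intervals and quantile-based bins
df["income_band"] = pd.cut(df["income"], bins=[0, 30_000, 100_000, np.inf],
                           labels=["low", "mid", "high"])
df["income_quartile"] = pd.qcut(df["income"], q=4, labels=False,
                                duplicates="drop")

# Dummy variable creation for a categorical column
df = pd.get_dummies(df, columns=["segment"], prefix="seg")

print(df)
```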
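Target encoding and scaling are likewise short. The column names (`city`, `churned`, `tenure`) are hypothetical, and the smoothing constant `k = 5` is an assumed hyperparameter, not a fixed rule.

```python
# A sketch of smoothed target encoding plus standardization/normalization.
import pandas as pd

df = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA", "SF", "SF"],
    "churned": [1, 0, 0, 0, 1, 1],
    "tenure": [3, 14, 7, 22, 1, 5],
})

# Target encoding: replace each category with the target mean,
# smoothed toward the global mean to stabilize rare categories
global_mean = df["churned"].mean()
stats = df.groupby("city")["churned"].agg(["mean", "count"])
k = 5  # assumed smoothing strength
smoothed = (stats["mean"] * stats["count"] + global_mean * k) / (stats["count"] + k)
df["city_te"] = df["city"].map(smoothed)

# Standardization: zero mean, unit variance
df["tenure_z"] = (df["tenure"] - df["tenure"].mean()) / df["tenure"].std()

# Normalization: min-max scaling to [0, 1]
rng = df["tenure"].max() - df["tenure"].min()
df["tenure_01"] = (df["tenure"] - df["tenure"].min()) / rng

print(df)
```

In practice the encoding statistics should be computed on training folds only and then applied to held-out data, otherwise target information leaks into the features.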
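Pivoting, unpivoting, window functions, and rank/percentile calculation can all be shown on one small frame; the monthly sales numbers are fabricated.

```python
# A sketch of pivot/melt, rolling windows, cumulative sums, and ranks.
import pandas as pd

long_df = pd.DataFrame({
    "month": [1, 1, 2, 2, 3, 3],
    "region": ["East", "West"] * 3,
    "amount": [100, 80, 120, 90, 110, 130],
})

# Pivoting: long -> wide (one column per region)
wide = long_df.pivot_table(index="month", columns="region",
                           values="amount", aggfunc="sum")

# Unpivoting: wide -> long again
back_to_long = wide.reset_index().melt(id_vars="month", value_name="amount")

# Window functions: rolling average and cumulative sum per region
long_df = long_df.sort_values(["region", "month"])
g = long_df.groupby("region")["amount"]
long_df["rolling_avg_2"] = g.transform(lambda s: s.rolling(2, min_periods=1).mean())
long_df["cum_amount"] = g.cumsum()

# Rank and percentile within each region
long_df["rank"] = g.rank(ascending=False)
long_df["pctile"] = g.rank(pct=True)

print(wide)
print(long_df)
```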
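Finally, a Pareto chart as described in the list: sorted category bars with a cumulative-percentage line on a secondary y-axis. The defect counts below are invented for illustration.

```python
# A sketch of Pareto analysis with matplotlib; the data is made up.
import matplotlib.pyplot as plt
import pandas as pd

defects = pd.Series({"scratch": 42, "dent": 23, "misalign": 11,
                     "paint": 8, "other": 4}).sort_values(ascending=False)
cum_pct = defects.cumsum() / defects.sum() * 100

fig, ax = plt.subplots()
ax.bar(defects.index, defects.values)
ax.set_ylabel("count")

ax2 = ax.twinx()                      # secondary axis for cumulative %
ax2.plot(defects.index, cum_pct, marker="o", color="tab:red")
ax2.set_ylabel("cumulative %")
ax2.set_ylim(0, 110)

plt.title("Pareto analysis of defect types")
plt.tight_layout()
plt.show()
```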