The world of data analysis is becoming increasingly accessible, thanks to the rise of powerful AI code generation tools like Gemini, Claude, and ChatGPT. Whether you’re a coding novice or a seasoned developer, these models can help you build data dashboards and turn raw data into meaningful insights.
Why Use AI?
When it comes to data analysis, AI models excel in a few key areas:
- Simplifying data transformation: To clean or transform your data, you can ask these models to write Python scripts for tasks like filtering, sorting, or aggregating. You can also upload multiple datasets and ask the model to create a unified dataset.
- Calculating statistics: AI models can help you generate code to perform correlation calculations and summarize the results for a report.
- Making data visualization easy: Quickly visualize your data by simply uploading your datasets and asking the model to generate an HTML dashboard.
Example: Sea Level Rise & Local Flooding Correlation Analysis
Let’s say I wanted to analyze sea level trends and local flooding events. At the end of this experiment, I’m looking to answer the question, “Is there a statistically significant correlation between long-term sea level rise and flood event frequency/severity?” and I’d also like to present the data as a simple HTML dashboard with a couple of figures and summary statistics.
The workflow for this data analysis experiment might look like this:
- Data Acquisition & Integration. Download sea level records and flood event data from multiple sources and combine them into a unified dataset.
- Data Cleaning & Preparation. Handle missing values in the time series data (see the sketch after this list).
- Statistical Analysis. Calculate correlation coefficients between sea level trends and flood metrics.
- Visualization & Reporting. Generate a dashboard with time series plots and produce summary statistics and confidence intervals for key correlations.
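To make step 2 concrete, here’s a minimal sketch of the kind of pandas code that handles gaps in a monthly time series. The file and column names (sea_level.csv, date, msl_mm) are placeholders I’m assuming for illustration, not the actual NOAA schema:

```python
import pandas as pd

# Hypothetical schema: a "date" column and a mean sea level column in mm.
sea_level = pd.read_csv("sea_level.csv", parse_dates=["date"])
sea_level = sea_level.set_index("date").sort_index()

# Reindex to a complete monthly range so any gaps become explicit NaNs,
# then interpolate linearly with respect to time.
full_range = pd.date_range(sea_level.index.min(), sea_level.index.max(), freq="MS")
sea_level = sea_level.reindex(full_range)
sea_level["msl_mm"] = sea_level["msl_mm"].interpolate(method="time")
```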
Before we move on, it’s worth knowing that I’m not a software developer. While I’m an experienced data analyst and regularly use Python for data analysis, writing software like this is not something I could do easily and quickly. I was interested to see how AI could help take my data analysis skills to the next level.
Let’s Try It Out
I asked 3 AI models to perform the data analysis experiment using NOAA relative sea level trend data and National Weather Service historical flood data from Philadelphia, giving them the following prompts:
- Can you take data from these csv files and then write some code to analyze if there is a statistically significant correlation between mean sea level and flood event frequency/severity?
- Can you write some code that creates a dashboard for this analysis, which includes data visualizations and summary statistics?
Gemini
Following the 1st prompt, Gemini delineated the steps the code performs, which are summarized below:
- Imports necessary libraries, using pandas for data manipulation and scipy.stats for statistical correlation analysis.
- Loads the datasets into pandas DataFrames.
- Processes the data, which involves formatting, extracting, and grouping.
- Merges the datasets.
- Handles missing values.
- Performs correlation analysis, calculating Pearson correlation coefficients (a minimal sketch of what such a script might look like follows this list).
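Here’s a hedged sketch of what a script along these lines might look like. The file and column names (noaa_sea_level.csv, nws_floods.csv, msl_mm, crest_ft) are my own placeholders for illustration, not Gemini’s actual output:

```python
import pandas as pd
from scipy import stats

# Load both datasets (hypothetical file and column names).
sea_level = pd.read_csv("noaa_sea_level.csv", parse_dates=["date"])
floods = pd.read_csv("nws_floods.csv", parse_dates=["date"])

# Aggregate both series to yearly values so they share a common index.
sea_level["year"] = sea_level["date"].dt.year
floods["year"] = floods["date"].dt.year
yearly_msl = sea_level.groupby("year")["msl_mm"].mean()
yearly_floods = floods.groupby("year").agg(
    flood_count=("crest_ft", "size"),   # frequency: events per year
    max_crest=("crest_ft", "max"),      # severity: highest crest per year
)

# Merge on year and drop years missing from either source.
merged = pd.concat([yearly_msl, yearly_floods], axis=1).dropna()

# Pearson correlation of mean sea level vs. flood frequency and severity.
r_freq, p_freq = stats.pearsonr(merged["msl_mm"], merged["flood_count"])
r_sev, p_sev = stats.pearsonr(merged["msl_mm"], merged["max_crest"])
print(f"Frequency: r={r_freq:.3f}, p={p_freq:.4f}")
print(f"Severity:  r={r_sev:.3f}, p={p_sev:.4f}")
```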
After I executed the script from this first step, it printed the correlation coefficients and p-values to the console.
Following the 2nd prompt, Gemini produced a script and explained that it would create the following (a sketch of this kind of dashboard script appears after the list):
- A line chart showing the trend of mean sea level over the years.
- Two scatter plots visualizing the relationship between mean sea level and flood events, and mean sea level and flood severity.
- Summary statistics displaying the correlation coefficients and p-values from the correlation analysis.
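Continuing the sketch above, a static HTML dashboard with these components could be assembled with plotly. Note that plotly is my assumption here, since the post doesn’t record which library Gemini actually chose, and merged, r_freq, and p_freq come from the previous snippet:

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# One line chart and two scatter plots, mirroring the components above.
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=("Mean Sea Level Trend",
                    "Sea Level vs. Flood Frequency",
                    "Sea Level vs. Flood Severity"),
)
fig.add_trace(go.Scatter(x=merged.index, y=merged["msl_mm"], mode="lines"),
              row=1, col=1)
fig.add_trace(go.Scatter(x=merged["msl_mm"], y=merged["flood_count"], mode="markers"),
              row=1, col=2)
fig.add_trace(go.Scatter(x=merged["msl_mm"], y=merged["max_crest"], mode="markers"),
              row=1, col=3)
fig.update_layout(
    title_text=f"Sea Level & Flooding (frequency: r={r_freq:.2f}, p={p_freq:.4f})",
    showlegend=False,
)
fig.write_html("dashboard.html")  # open this file in a browser
```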
The script required some minor debugging (one function had to be edited), but it printed an HTML link in the terminal that, once pasted into my browser, opened the HTML dashboard below.
Pretty neat, right? Let’s take a look at the next model.
Claude
Following the 1st prompt, Claude broke down the steps of analysis as the following:
- Data Preprocessing, which involved converting, grouping, and cleaning the data
- Correlation Analysis, using Pearson correlation to measure linear relationships and p-values to determine statistical significance (one way to also attach confidence intervals is sketched after this list)
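The p-value check is standard scipy fare, and since my plan also called for confidence intervals on the key correlations, here’s one hedged way to compute them via the Fisher z-transform (recent versions of scipy.stats.pearsonr can also return a confidence interval directly):

```python
import numpy as np
from scipy import stats

def pearson_ci(x, y, alpha=0.05):
    """Pearson r with p-value and a (1 - alpha) confidence interval."""
    r, p = stats.pearsonr(x, y)
    n = len(x)
    z = np.arctanh(r)                       # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)               # standard error of z
    z_crit = stats.norm.ppf(1 - alpha / 2)  # e.g. ~1.96 for a 95% CI
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, p, (lo, hi)
```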
After executing the script, I saw that Claude’s code printed the correlation coefficients and p-values to the console. I also found it had exported a combined dataset merging the 2 separate files I uploaded, even though I hadn’t prompted it to do so.
To create a Python dashboard for the 2nd prompt, Claude explained that it used pandas for data processing, Plotly for interactive visualizations, and Dash for assembling the dashboard. Claude summarized the key components of the dashboard (a minimal Dash sketch of the sea level section follows the list):
- Sea Level Analysis Section
  - Interactive time series plot of monthly mean sea level
  - Selectable rolling average periods (1, 5, and 10 years)
  - Detailed sea level statistics
- Flood Analysis Section
  - Flood statistics summary
  - Histogram of flood crest distributions
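To illustrate the structure, here’s a minimal Dash sketch of the sea level section with a selectable rolling average. The layout, component IDs, and column names are my own placeholders, not Claude’s actual code:

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

# Hypothetical schema: monthly observations with "date" and "msl_mm" columns.
sea_level = pd.read_csv("noaa_sea_level.csv", parse_dates=["date"])

app = Dash(__name__)
app.layout = html.Div([
    html.H2("Sea Level Analysis"),
    dcc.Dropdown(
        id="window-years",
        options=[{"label": f"{y}-year average", "value": y} for y in (1, 5, 10)],
        value=1,
    ),
    dcc.Graph(id="sea-level-plot"),
])

@app.callback(Output("sea-level-plot", "figure"), Input("window-years", "value"))
def update_plot(years):
    # Smooth over a window of `years * 12` monthly observations.
    smoothed = sea_level["msl_mm"].rolling(window=years * 12, min_periods=1).mean()
    return px.line(x=sea_level["date"], y=smoothed,
                   labels={"x": "Date", "y": "Mean sea level (mm)"})

if __name__ == "__main__":
    app.run(debug=True)
```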
Finally, it created a separate README to explain the project.
Like Gemini’s, Claude’s script required some minor debugging (the CSV file names were incorrect, and one function had to be edited) but printed an HTML link in the terminal, which opened the HTML dashboard below.
As you can see, there were a couple of callback errors, and the time series plot didn’t render correctly in the dashboard’s Sea Level Analysis section.
ChatGPT
With the free version of the model, I kept hitting my data analysis limit with my prompts and data uploads. It did manage to output a couple of snippets of code to check for statistical correlations in the datasets and to create a Python dashboard using Plotly Dash. I believe the code could run with some debugging, but it was clear it would take more effort than with Gemini or Claude.
Results
I found that while these models are indeed powerful, the generated code was not always perfect and required debugging and tweaking. ChatGPT’s code needed further debugging just to run, while Gemini’s and Claude’s needed only minor fixes to produce working HTML dashboards.
While both Gemini and Claude provided detailed explanations of the generated code to help me understand the underlying logic, Claude went a step further and produced both a combined dataset and step-by-step instructions in a README file without being prompted.
Conclusion
AI code generation tools are significantly accelerating advanced data analysis. They empower individuals with limited coding skills to create data visualization dashboards and extract meaningful insights from raw data. While I’ve written code to handle some data analysis and visualization tasks in the past, these new tools performed a detailed analysis and got a more advanced application up and running within a few minutes, with only minor debugging involved. Still, while AI models are excellent assistants, they are not replacements for coding expertise, and they require some human oversight.
Next Steps
If you’re looking to take your data analysis skills even further, and have more advanced coding knowledge in your arsenal, consider exploring Windsurf or Cursor. Windsurf provides a more advanced coding environment that can take data visualizations and dashboards to the next level. If you’re comfortable with VS Code, Cursor lets you use AI code generation directly within VS Code, making the process of generating, debugging, and modifying code more efficient.
Are you interested in exploring how AI can help take your data analysis to the next level? Get in touch with us – we’d love to hear about your project.
