Do you want to summarise your dataset with just ONE line of Python code instead of spending hours coding in a notebook??👀🤯🚀
It gives you descriptive statistics, correlation plots, and a missing-values analysis, all in one complete report
Explained with an example🔽
A 🧵↓
Let's take a small dataset, 'beers.csv'. It contains the names of beer brands, with their ingredients as features
Import pandas and read the CSV file as usual
See the dataframe below↓
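A minimal sketch of that first step, assuming 'beers.csv' sits in your working directory (the helper name load_beers is hypothetical):

```python
import pandas as pd


def load_beers(path: str = "beers.csv") -> pd.DataFrame:
    """Read the beers dataset into a DataFrame (helper name is hypothetical)."""
    # read_csv infers column names from the header row and dtypes from the values
    return pd.read_csv(path)
```

Calling print(load_beers().head()) shows the first five rows of the dataframe.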
Next, install the 'pandas-profiling' library with pip or conda, depending on your environment (e.g. 'pip install pandas-profiling' or 'conda install -c conda-forge pandas-profiling')
See the code below↓
⚠Important note:
Check your pandas-profiling version before going further: you need the latest version, and its dependencies must be installed too
I've listed them here with their versions↓
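One way to run that check from Python is importlib.metadata, a sketch assuming Python 3.8+. The package list is a hypothetical stand-in for the author's list (which isn't reproduced here); 'ydata-profiling' is included because newer releases of the project are published under that name.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: replace it with the packages from the thread's list.
PACKAGES = ["pandas-profiling", "ydata-profiling", "pandas", "Jinja2", "MarkupSafe"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```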
Finally, import 'pandas_profiling' as 'pp' and 'escape' from 'markupsafe'
Pass the dataframe to the 'ProfileReport' class and save the result with 'to_file()' as an '.html' or '.json' file. I saved mine as an '.html' file
(See the code below↓)
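A minimal sketch of that step, assuming the beers dataframe from earlier. The helper name save_profile_report and the report title are hypothetical, and the 'markupsafe' import is left out because nothing below uses it. If 'pandas_profiling' can't be imported, it falls back to 'ydata_profiling', on the assumption that the renamed package keeps the same ProfileReport API.

```python
import pandas as pd


def save_profile_report(df: pd.DataFrame, output_path: str = "beers_report.html") -> str:
    """Build a profiling report for df and write it to output_path (hypothetical helper)."""
    try:
        import pandas_profiling as pp  # the import used in the thread
    except ImportError:
        import ydata_profiling as pp  # assumption: renamed package, same ProfileReport API
    report = pp.ProfileReport(df, title="Beers Profiling Report")
    # to_file() writes JSON when the name ends in ".json", otherwise HTML
    report.to_file(output_path)
    return output_path
```

Without the helper, the one-liner is pp.ProfileReport(df).to_file('beers_report.html').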
To view the report, one option is to serve it on localhost with the 'Live Server' extension for VS Code
Install the 'Live Server' extension in your VS Code environment
Then right-click the saved '.html' file in the VS Code Explorer and select 'Open with Live Server'
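If you'd rather not install an extension, Python's standard library can also serve the report on localhost. This is a swapped-in technique using http.server, not the thread's method; the function name start_report_server and the default port 8000 are assumptions.

```python
import functools
import http.server
import threading


def start_report_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `directory` on 127.0.0.1 in a background thread (hypothetical helper)."""
    # Bind the handler to the folder that holds the saved report
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread, so it won't keep the interpreter alive on exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, server = start_report_server('.') then open http://127.0.0.1:8000/beers_report.html in your browser, and call server.shutdown() when you're done.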
Ta-da... your pandas-profiling report opens in your browser, built with just one line of code🚀
See what it looks like and what it covers ↓
End of the thread!
Now that you know the trick, try it yourself and cut down the hours you spend coding summary statistics
And don't forget to like, RT, and follow me @avikumart_ for more informative threads like this!👍📈