Python for Data Analysis

From NovaOrdis Knowledge Base

Latest revision as of 23:50, 14 May 2024

External

Internal

Overview

This article is loosely based on Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd Edition, by Wes McKinney.

The packages referenced in this article focus on structured data. This includes tabular or spreadsheet-like data in which each column may be a different type (relational database data, spreadsheets and CSV files), multidimensional arrays (matrices), multiple tables of related data joined by key columns, and evenly and unevenly spaced time series.
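As a rough illustration of the kinds of structured data listed above, the following minimal sketch builds one small example of each with NumPy and pandas. The table names, column names and values are made up for the example and do not come from the book.

```python
import numpy as np
import pandas as pd

# Tabular data: each column may hold a different type (hypothetical tables).
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 20, 10],
        "amount": [25.0, 40.5, 12.25],
    }
)
customers = pd.DataFrame(
    {"customer_id": [10, 20], "name": ["Ada", "Grace"]}
)

# Multidimensional array (matrix): all elements share a single type.
matrix = np.arange(6, dtype=np.float64).reshape(2, 3)

# Multiple tables related by a key column, joined on that key.
joined = orders.merge(customers, on="customer_id", how="left")

# Evenly spaced time series: one observation per day.
daily = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# Unevenly spaced time series: observations at arbitrary timestamps.
irregular = pd.Series(
    [5.0, 7.0],
    index=pd.to_datetime(["2024-01-01 09:30", "2024-01-04 16:05"]),
)
```

The left join keeps every row of `orders` and attaches the matching customer name, which is the typical way of combining related tables through a key column.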

Python is well suited to data analysis because many specialized data processing libraries are available for it:
* NumPy
* pandas
* scikit-learn
* SciPy
* statsmodels
* PyTorch

Visualization libraries are also available in Python:
* matplotlib
* plotly

Python programs can be executed from IPython, Jupyter Notebook and Jupyter Lab.

In addition, Python's strength as a general-purpose language for software engineering makes it a good glue language for data analysis applications.
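One way to picture the glue role is a short script that connects otherwise separate components. The sketch below uses only the standard library: it parses CSV text, loads it into an in-memory SQLite database, and summarizes the result. The data, the table name and the helper functions `load_readings` and `mean_by_sensor` are hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical CSV export, inlined so the sketch is self-contained.
CSV_TEXT = """sensor,reading
a,1.5
a,2.5
b,10.0
"""


def load_readings(csv_text: str) -> sqlite3.Connection:
    """Parse CSV text and load it into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, reading REAL)")
    rows = [
        (row["sensor"], float(row["reading"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def mean_by_sensor(conn: sqlite3.Connection) -> dict[str, float]:
    """Read the table back, group readings per sensor, and average them."""
    grouped: dict[str, list[float]] = {}
    query = "SELECT sensor, reading FROM readings ORDER BY sensor"
    for sensor, reading in conn.execute(query):
        grouped.setdefault(sensor, []).append(reading)
    return {sensor: statistics.mean(values) for sensor, values in grouped.items()}


conn = load_readings(CSV_TEXT)
means = mean_by_sensor(conn)
conn.close()
```

The same pattern extends to the libraries listed above, for example replacing the manual grouping with a pandas DataFrame once the data has been read.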