To start our Data Science journey, we need to select appropriate hardware and an Operating System (1) to manage it, along with the software on the computer. We then have two ways to issue commands to the computer: a Command-Line Interface (2) or a Graphical User Interface (3).
1. Operating Systems (OS)
First of all, we recommend a computer equipped with a multi-core processor, high-speed RAM, and a powerful graphics card to run your models and visualisations. Next, we need to select our operating system (OS). The OS manages both the hardware and the software on the computer. It performs basic tasks such as controlling peripheral devices (disk drives, printers), handling input and output, and managing files, memory and processes. Three operating systems are popular for Data Science: Apple macOS, Microsoft Windows and Linux. Each has its pros and cons.
Operating System | Pros | Cons |
---|---|---|
Apple macOS | Hardware is high-quality. Graphical User Interface (GUI) is user-friendly. Data science environment is easy to set up. | Only compatible with Apple hardware. GPU support for deep learning is more limited than on Linux. Some software packages and libraries may not perform as well as on Linux. |
Microsoft Windows | Widely used. Many OS-specific package managers, data science tools and software. Integrates well with Microsoft Azure for cloud computing. | May require more resources (CPU, RAM) than Linux to run certain data science tasks efficiently. Some Python packages and libraries may have issues or may not be as up-to-date as on Linux. |
Linux | Can configure an environment tailored to your data science needs. Robust package managers that simplify software installation and management. Offers better GPU support for deep learning frameworks. Mostly open source and free to use. Unix-like environment. | Learning curve is steeper. Driver support can be less reliable than on macOS or Windows for certain graphics cards and peripherals. Some proprietary software may not have Linux versions. |
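Before comparing the options above, it can help to check what the current machine is actually running. The following is a minimal sketch using only Python's standard library; the function name `describe_machine` is our own choice for illustration, and the values it reports depend entirely on the machine it runs on.

```python
import os
import platform


def describe_machine():
    """Return a small dict describing the OS and CPU of the current machine."""
    return {
        # 'Linux', 'Darwin' (macOS) or 'Windows' on the three systems discussed here
        "os": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
        # Number of logical CPU cores; can be None if it cannot be determined
        "cpu_cores": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in describe_machine().items():
        print(f"{key}: {value}")
```

Note that this only reports the OS and CPU; it does not detect the graphics card or the amount of RAM, which would need OS-specific tools or third-party packages.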
2. Command-Line Interfaces (CLI)
The Command-Line Interface is a text-based interface: we type commands as text, and the computer responds with text output. It requires the user to know specific commands and their syntax. Via a CLI we can navigate directories, execute scripts, or download software and libraries. Although we need to memorize command-line syntax to work with our files and repositories, learning it helps us become more proficient with the programming tools and languages that rely on this interface. As a result, the CLI is typically used by experienced users and system administrators to perform complex tasks and automate repetitive ones. Windows has two CLIs that might appear similar at first glance: Command Prompt and PowerShell.
Command Prompt. Command Prompt is the older and simpler of the two CLIs and has long been part of Windows. It is more limited than PowerShell and doesn’t offer the same level of flexibility and customization. It uses a more basic scripting language and lacks some of the advanced features that PowerShell provides.
PowerShell. PowerShell is a more modern and powerful CLI, designed to be a more robust and extensible shell than Command Prompt. It includes a full scripting language and allows for more complex and advanced operations. PowerShell is object-oriented: it is built on .NET, and its commands work with .NET objects rather than plain text, which makes it highly customizable and flexible.
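The text-in, text-out model described in this section can be sketched from Python with the standard `subprocess` module. This is only an illustration: the helper name `run_command` is hypothetical, and the command it runs is the current Python interpreter itself, chosen so that the same example works in Command Prompt, PowerShell, or a macOS/Linux terminal.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of strings and return its text output."""
    # capture_output collects what the command prints; text=True decodes it to str;
    # check=True raises CalledProcessError if the command exits with an error code
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Equivalent to typing: python -c "print('hello from the CLI')"
    print(run_command([sys.executable, "-c", "print('hello from the CLI')"]))
```

Passing the command as a list of strings, rather than one long string, avoids differences in quoting rules between shells.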
3. Graphical User Interfaces (GUI)
A Graphical User Interface, on the other hand, is a visual interface that lets users interact with computer programs through icons, menus, and other graphical elements. GUIs are widely used in modern operating systems. They are more intuitive and user-friendly, enabling users to perform tasks without having to remember complex commands or syntax.
Explore more
Check out my post that introduces the full stack: Baking Up The Ultimate Data Science Tech Stack