Scrapy is an application framework for crawling websites and extracting structured data from them. This post walks through installing Scrapy and its dependencies on both CentOS and Windows.
Scrapy Installation on CentOS 6.5
Scrapy needs Python 2.7 or above to run on CentOS. CentOS 6.5 ships with Python 2.6, so we need to install Python 2.7+ alongside it to run Scrapy code. The steps below install Python 2.7.11. Before building Python, install the build tools and the Scrapy dependencies:
yum -y update
yum groupinstall -y 'development tools'
yum install -y zlib-devel bzip2-devel openssl-devel xz-libs wget python-devel libxml2-devel libxslt-devel pyOpenSSL gcc libffi libffi-devel sqlite-devel sqlite
Installing Python 2.7.11
wget https://www.python.org/ftp/python/2.7.11/Python-2.7.11.tgz
tar -xvzf Python-2.7.11.tgz
# Enter the directory:
cd Python-2.7.11
# Run the configure:
./configure --prefix=/usr/local
# Compile and install it:
make
make altinstall
# Check the Python version:
[root@nicetry ~]# python2.7 -V
Python 2.7.11
# Environment variables: add the line below to the .bashrc file, then run "source .bashrc"
export PATH="/usr/local/bin:$PATH"
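Because CentOS 6.5 keeps its system Python 2.6 next to the new build, it is easy to start the wrong interpreter by accident. The following is a minimal sketch of a version check; the helper name meets_minimum and the MINIMUM constant are illustrative and not part of Python or Scrapy.

```python
from __future__ import print_function

import sys

# Minimum interpreter version this guide assumes for Scrapy.
MINIMUM = (2, 7)


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if version_info (e.g. sys.version_info) is >= minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = "%d.%d.%d" % tuple(sys.version_info[:3])
    if meets_minimum(sys.version_info):
        print("Python %s is new enough" % found)
    else:
        print("Python %s is too old; run this script with python2.7" % found)
```

Saved under a name of your choice (for example check_version.py, a hypothetical file name), it can be run as "python2.7 check_version.py" to confirm that the new interpreter is the one being picked up.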
Install setuptools
wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-1.4.2.tar.gz
# Extract the files:
tar -xvf setuptools-1.4.2.tar.gz
cd setuptools-1.4.2
# Install setuptools using Python 2.7.11:
python2.7 setup.py install
Install pip (a package management system used to install and manage software packages written in Python)
curl https://bootstrap.pypa.io/get-pip.py | python2.7 -
pip -V
pip 8.1.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
Install Scrapy
CFLAGS="-O0" pip install lxml
pip install scrapy
Once Scrapy is installed, verify it by running “scrapy shell <URL>” from the command line.
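Alongside the interactive “scrapy shell” check, the whole toolchain can be verified in one pass. The sketch below assumes that the pip and scrapy executables are on the PATH; it runs each tool's version command and reports any that cannot be started. The helper tool_version is an illustrative name, not a Scrapy API.

```python
from __future__ import print_function

import subprocess
import sys


def tool_version(command):
    """Run a version command and return its output, or None if it cannot run."""
    try:
        output = subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # OSError: executable not found; CalledProcessError: non-zero exit status.
        return None
    return output.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    # sys.executable is the interpreter running this script.
    commands = ([sys.executable, "-V"], ["pip", "-V"], ["scrapy", "version"])
    for command in commands:
        result = tool_version(command)
        print("%s -> %s" % (" ".join(command), result or "NOT FOUND"))
```

Standard error is merged into the captured output because Python 2.7 prints its version string there rather than on standard output.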
Scrapy Installation on Windows
Python installation
Install Python 2.7.11 from https://www.python.org/downloads/
Update the PATH: open a Command Prompt and run:
c:\python27\python.exe c:\python27\tools\scripts\win_add2path.py
Close the Command Prompt window so that the change takes effect. Then reopen it and run the following command to check the Python version:
python --version
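If “python --version” is not recognized after reopening the window, the PATH update probably did not take effect. As a rough sketch, the check below tests whether the directory of the running interpreter appears in the PATH variable; start it with the full interpreter path (for example c:\python27\python.exe). The function name is_on_path is illustrative only.

```python
from __future__ import print_function

import os
import sys


def is_on_path(directory, path_value, sep=os.pathsep):
    """Return True if directory is one of the entries in a PATH-style string."""
    def normalize(entry):
        # normcase lower-cases on Windows; normpath drops trailing separators.
        return os.path.normcase(os.path.normpath(entry))

    wanted = normalize(directory)
    # Skip empty entries and strip the quotes Windows allows around entries.
    entries = [e.strip('"') for e in path_value.split(sep) if e.strip('"')]
    return any(normalize(entry) == wanted for entry in entries)


if __name__ == "__main__":
    interpreter_dir = os.path.dirname(sys.executable)
    found = is_on_path(interpreter_dir, os.environ.get("PATH", ""))
    print("%s on PATH: %s" % (interpreter_dir, found))
```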
pywin32 installation
Install pywin32 from http://sourceforge.net/projects/pywin32/
Be sure to download the architecture (win32 or amd64) that matches your Python installation rather than the operating system. In our case the system is 64-bit, but we still needed the 32-bit installer (example: pywin32-219.win32-py2.7.exe).
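To find out which installer applies, it helps to ask the interpreter itself whether it is a 32-bit or 64-bit build. A minimal sketch using the standard struct module follows; the function names are illustrative, and the win32/amd64 labels are taken from the installer naming shown above.

```python
from __future__ import print_function

import struct


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


def installer_arch(bits):
    """Map a bit width to the label used in installer file names."""
    return {32: "win32", 64: "amd64"}[bits]


if __name__ == "__main__":
    bits = interpreter_bits()
    print("This Python is %d-bit; use the %s installer" % (bits, installer_arch(bits)))
```

A 32-bit Python on 64-bit Windows reports 32 here, which is consistent with needing the win32 build of pywin32 on a 64-bit machine.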
pip installation
Install pip from https://pip.pypa.io/en/latest/installing.html
Now open a Command Prompt to check that pip is installed correctly:
pip --version
lxml package installation
Install the lxml package from https://pypi.python.org/packages/2.7/l/lxml/lxml-3.5.0.win32-py2.7.exe#md5=3fb7a9fb71b7d0f53881291614bd323c
Install Visual C++ Compiler
Install “Microsoft Visual C++ Compiler for Python 2.7” from https://www.microsoft.com/en-in/download/details.aspx?id=44266
Scrapy installation
Make sure Python 2.7 and the pip package manager are working, then install Scrapy and the additional packages:
pip install Scrapy
pip install happybase
pip install requests
pip install OrderedDict
pip install html2text
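After the pip commands finish, a quick import test can confirm that each package is usable from Python. The sketch below assumes the import names match the package names (scrapy, happybase, requests, html2text); OrderedDict is left out because its import name is not stated here. The helper check_imports is illustrative only.

```python
from __future__ import print_function

import importlib

# Import names assumed for the packages installed above.
MODULES = ["scrapy", "happybase", "requests", "html2text"]


def check_imports(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Any failure during import is treated as a broken or missing package.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_imports(MODULES).items()):
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```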
Add the PyDev perspective
Install Eclipse Juno or STS, then add the PyDev perspective to write Python code.
Once Scrapy is installed, verify it by running “scrapy shell <URL>” from the command line.