Tutorial - How to Build Your Own LinkedIn Profile Scraper in 2022
Hello techies,
Now that we are halfway through 2022, I want to share a way I found to build a LinkedIn profile scraper for collecting user data, hoping it will be of benefit to you.
This tutorial walks you through building a tool that fetches data about the LinkedIn users who interest you.
Here is a breakdown of what the tutorial covers:
- What is LinkedIn?
- Have an active LinkedIn account.
- Set up your code workspace: code editor, Python, and Selenium WebDriver.
- Tutorial (coding).
What is LinkedIn?
LinkedIn is one of the largest professional networking platforms. You can use it to find job opportunities and to connect and build professional relationships with colleagues in your field.
Have an Active LinkedIn Account
This is the first and most important step. If you already have an account, you can skip it; otherwise, visit linkedin.com.
On the home page, click Join Now to open the sign-up page and create an account.
Fill in the form with your details, including your email and password, and submit it. LinkedIn will send you a verification email; follow its instructions to verify your account.
Set Up Your Code Workspace
First, you need a code editor. There are many to choose from, but to save time and stress I recommend Visual Studio Code (VS Code). Visit code.visualstudio.com and download the build for your operating system.
Second, as mentioned earlier, we will be using the Python programming language. Visit the official Python website, python.org, hover over Downloads, and download the build for your operating system.
Here are resources for installing Python on different operating systems:
- macOS
- Windows
- Linux (Ubuntu)
If you installed VS Code, here is also a resource on setting up the Python extension for the editor.
Lastly, we need Selenium WebDriver. In this tutorial, the scraper uses Selenium WebDriver to control the Chrome browser through ChromeDriver. This video walks you through the installation: youtube.com/watch?v=WnWQgUerR0c.
Tutorial (Coding)
This is the last part of the tutorial, where we start writing code.
First, install the Python package we'll be using, linkedin_scraper, with pip (the Package Installer for Python), which installs Python packages and libraries:
pip install --user linkedin_scraper
I mentioned Selenium WebDriver earlier; now it's time to set the path to the ChromeDriver executable:
export CHROMEDRIVER=~/chromedriver
This exports the path of the downloaded ChromeDriver executable into the CHROMEDRIVER environment variable (the export syntax is for macOS and Linux shells). To avoid path errors, it is advisable to create a folder for the project and keep both the driver and your Python file in it.
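If the scraper can't find ChromeDriver, a quick check that the variable is actually set and points at a file can save some debugging. Below is a minimal, stdlib-only sketch; the helper name resolve_chromedriver_path is hypothetical and is not part of linkedin_scraper or Selenium.

```python
import os
from pathlib import Path


def resolve_chromedriver_path(env_var: str = "CHROMEDRIVER") -> Path:
    """Return the ChromeDriver path stored in env_var (hypothetical helper).

    Raises KeyError if the variable is not set and FileNotFoundError if
    the path does not point to an existing file.
    """
    raw = os.environ[env_var]  # KeyError if the variable was never exported
    path = Path(raw).expanduser()  # expand a leading "~" if the shell left it unexpanded
    if not path.is_file():
        raise FileNotFoundError(f"{env_var}={raw!r} is not an existing file")
    return path
```

For example, resolve_chromedriver_path() returns the path when CHROMEDRIVER is exported, raises KeyError when it isn't, and raises FileNotFoundError when it points somewhere that doesn't exist.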
from linkedin_scraper import Person, actions
from selenium import webdriver
# Start a Chrome browser session controlled by Selenium
driver = webdriver.Chrome()
Before writing this code, make sure the linkedin_scraper package is installed.
To confirm, run pip freeze in your terminal. It lists the Python packages installed in your environment in alphabetical order, so you can look for linkedin_scraper there.
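As an alternative to scanning the pip freeze output, you can ask Python directly. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name, and it assumes the package is registered under the distribution name linkedin_scraper (if pip reports it as linkedin-scraper, try that spelling).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of dist_name, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

installed_version("linkedin_scraper") returns the version string if the package is installed and None otherwise.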
We import the Person class and the actions module from linkedin_scraper, and webdriver from Selenium.
The driver variable holds a Selenium WebDriver instance that opens and controls a Chrome browser. We will pass it to the scraper; if no driver is passed in, the library creates a Chrome driver by default.
# Replace these with your own LinkedIn account details
email = "some-email@email.address"
password = "password123"
Now it's time to use the LinkedIn account we created earlier in the program.
The email and password variables must hold the details of a verified LinkedIn account; otherwise, the login will fail.
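Hardcoding a password in a script makes it easy to leak by accident. One option, sketched below, is to load the credentials at runtime from two environment variables of your choosing; LINKEDIN_EMAIL and LINKEDIN_PASSWORD are hypothetical names here, not something the library reads on its own.

```python
import os
from typing import Optional, Tuple


def credentials_from_env(
    email_var: str = "LINKEDIN_EMAIL",  # hypothetical variable name
    password_var: str = "LINKEDIN_PASSWORD",  # hypothetical variable name
) -> Tuple[Optional[str], Optional[str]]:
    """Read LinkedIn credentials from environment variables.

    Returns None for any variable that is unset, so the caller can fall
    back to another source, such as an interactive prompt.
    """
    return os.environ.get(email_var), os.environ.get(password_var)
```

Usage: email, password = credentials_from_env(), followed by actions.login(driver, email, password). If either variable is unset, the function returns None for it; per the note further down, login() is expected to prompt in the terminal when the email and password aren't given.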
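# Log in to LinkedIn in the browser controlled by driver
actions.login(driver, email, password)
# scrape=False defers scraping until person.scrape() is called below
person = Person("https://www.linkedin.com/in/olanrewaju-alaba/", driver=driver, scrape=False)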
Recall that the actions module was imported from linkedin_scraper, not from Selenium. We use its login() function to sign in with our LinkedIn account details through the WebDriver.
The Person class represents a particular LinkedIn profile, identified by the profile's URL.
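Because Person works from a profile URL, a typo in that URL is an easy mistake to make. The sketch below is a hypothetical helper (is_profile_url is not part of linkedin_scraper) that checks a URL has the shape https://<subdomain>.linkedin.com/in/<profile-id>, assuming that is the format of the profile URLs you pass in, as in the example above.

```python
from urllib.parse import urlparse


def is_profile_url(url: str) -> bool:
    """Return True if url looks like https://<sub>.linkedin.com/in/<profile-id>.

    A trailing slash is allowed; other paths (e.g. /company/...) are rejected.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    segments = [s for s in parts.path.split("/") if s]  # drop empty pieces left by slashes
    return (
        parts.scheme == "https"
        and on_linkedin
        and len(segments) == 2
        and segments[0] == "in"
    )
```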
Notes:
- If email and password aren't given, login() prompts for them in your terminal.
- The account used to log in should have its language set to English to make sure everything works as expected.
You might also want to get data from a company's LinkedIn profile.
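from linkedin_scraper import Company
# Reuse the logged-in driver; scrape=False defers scraping until company.scrape() is called
company = Company("https://ca.linkedin.com/company/google", driver=driver, scrape=False)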
Instead of (or alongside) the Person class, import the Company class. Use it to define a company variable from the company's LinkedIn URL, following the pattern in the snippet above: linkedin.com/company/ followed by the company's name on LinkedIn.
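If you know a company's name on LinkedIn, you can build its page URL from that pattern. A minimal sketch; company_url is a hypothetical helper, and it assumes the https://<subdomain>.linkedin.com/company/<name> format used in the Google example, with "www" as the default subdomain.

```python
from urllib.parse import quote


def company_url(slug: str, subdomain: str = "www") -> str:
    """Build a LinkedIn company page URL from the company's LinkedIn name."""
    name = quote(slug.strip("/"), safe="")  # percent-encode anything unsafe, including "/"
    return f"https://{subdomain}.linkedin.com/company/{name}"
```

company_url("google", "ca") reproduces the URL from the snippet above.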
Here is the final piece of code that gets our LinkedIn profile scraper working:
person.scrape()
# or
company.scrape()
This scrapes the data from the specified LinkedIn profile of a person or company. Once scraping finishes, the browser controlled by the WebDriver closes. To keep it open so you can continue:
person.scrape(close_on_complete=False)
# or
company.scrape(close_on_complete=False)
By default, close_on_complete is set to True, so set it to False to keep the browser open.
Our LinkedIn profile scraper is now ready to fetch data, so go ahead and test the program.
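When testing, it helps to print a few fields instead of the whole object. The sketch below assumes the scraped object exposes attributes named after the Person parameters described later in this article (name, job_title, company, experiences); summarize_person is a hypothetical helper, and the getattr defaults keep it from crashing if an attribute is missing.

```python
from typing import Any, Dict


def summarize_person(person: Any) -> Dict[str, Any]:
    """Collect a few fields from a scraped Person-like object into a dict.

    A missing attribute yields None (or 0 for the experience count)
    instead of raising AttributeError.
    """
    return {
        "name": getattr(person, "name", None),
        "job_title": getattr(person, "job_title", None),
        "company": getattr(person, "company", None),
        "experience_count": len(getattr(person, "experiences", None) or []),
    }
```

For example, call print(summarize_person(person)) after person.scrape().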
For those who want to learn more about the linkedin_scraper package, let's explore it a little further.
Person
A Person object can be created with the following inputs:
Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)
- linkedin_url: The LinkedIn URL of the person's profile.
- name: The name of the person.
- about: The short paragraph about the person.
- experiences: The person's past experiences. A list of linkedin_scraper.scraper.Experience.
- educations: The person's past education. A list of linkedin_scraper.scraper.Education.
- interests: The person's interests. A list of linkedin_scraper.scraper.Interest.
- accomplishments: The person's accomplishments. A list of linkedin_scraper.scraper.Accomplishment.
- company: The most recent company or institution the person has worked at.
- job_title: The person's most recent job title.
- driver: The driver used to scrape the LinkedIn profile. A driver using Chrome is created by default; if a driver is passed in, that one is used instead. For example, the earlier call to Person with driver=driver passes in the driver created by webdriver.Chrome().
- scrape: When this is True, scraping happens automatically when the object is created. To scrape afterwards, call the object's scrape() method.
scrape(close_on_complete=True)
This method does the core work: calling it scrapes the profile. If close_on_complete is True (the default), the browser closes when scraping completes. If you want to scrape other profiles, set it to False so you can keep using the same driver.
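Following that advice, here is a sketch of scraping several profiles with one logged-in driver. It assumes the Person(url, driver=..., scrape=...) and scrape(close_on_complete=...) signatures described above; scrape_profiles is a hypothetical helper, and passing the class in as person_cls is a design choice that also lets you test the function without a browser. driver.quit() is Selenium's standard way to close the browser; it is called once in a finally block so it runs even if a scrape fails.

```python
from typing import Any, Iterable, List


def scrape_profiles(urls: Iterable[str], driver: Any, person_cls: Any) -> List[Any]:
    """Scrape several profiles with one logged-in driver, then quit it.

    person_cls is assumed to behave like linkedin_scraper.Person: it is
    built with scrape=False and then scraped with close_on_complete=False.
    """
    people = []
    try:
        for url in urls:
            person = person_cls(url, driver=driver, scrape=False)
            person.scrape(close_on_complete=False)  # keep the browser open for the next URL
            people.append(person)
    finally:
        driver.quit()  # close the browser once, even if a scrape raises
    return people
```

Usage: after actions.login(driver, email, password), call scrape_profiles(urls, driver, Person), where urls is a list of profile URLs.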
Company
A Company object can be created with the following inputs:
Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)
- linkedin_url: The LinkedIn URL of the company's page.
- name: The name of the company.
- about_us: The description of the company.
- website: The website of the company.
- headquarters: The headquarters location of the company.
- founded: When the company was founded.
- company_type: The type of the company.
- company_size: How many people are employed at the company.
- specialties: What the company specializes in.
- showcase_pages: Pages the company owns to showcase its products.
- affiliated_companies: Other companies affiliated with this one.
- get_employees: Whether to get all the employees of the company.
- driver: The driver used to scrape the LinkedIn page. A driver using Chrome is created by default; if a driver is passed in, that one is used instead.
This is where the tutorial ends. I hope you learned a lot and had fun.
Thanks for reading this article; I hope you found it useful. You can connect with me on:
Bye.