Loading a Dataset into Google Colab in Different Ways
Google Colab provides a ubiquitous platform for all of us, with a choice of hardware (CPU, GPU, and TPU) and 12 GB of RAM (25 GB if your session crashes; I sometimes deliberately crash Colab with a bit of sample code so that I get 25 GB of RAM for that particular session). For training any model, I personally prefer Google Colab. The one rough spot I found was loading files into Colab: uploaded files are available for a single session only. There are various ways to load your file (dataset). Three ways are described below.
# A. From GitHub (only files under 25 MB)
This is probably the easiest way to get a file from GitHub, which is handy when the dataset you are working with is hosted there. In that case, follow the steps below.
Let's see how to read files such as CSV, JSON, and HTML as a pandas DataFrame.
- First, go to the dataset in the GitHub repository and click the Raw button at the top of the file view.
- Then, copy the raw file link.
- From pandas 0.19.2 onward, you can pass the URL of a csv/json/html file directly to read_csv/read_json/read_html and read the dataset as a pandas DataFrame.
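The direct-URL route described above might look like the following minimal sketch. The helper `to_raw_url` is a hypothetical name (not part of pandas or any GitHub tooling); it rewrites a regular GitHub file-page link into its raw.githubusercontent.com form, which is where the Raw button leads. The repository and file names in the usage comment are placeholders, and the sketch assumes a public repository.

```python
from urllib.parse import urlparse


def to_raw_url(github_url):
    """Convert a GitHub file-page URL (.../blob/...) to its raw-content URL.

    Hypothetical helper for illustration; URLs that are not github.com
    blob links are returned unchanged.
    """
    parts = urlparse(github_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return github_url
    user, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


def read_github_csv(github_url):
    """Read a CSV hosted on GitHub straight into a DataFrame (pandas >= 0.19.2)."""
    import pandas as pd  # imported here so to_raw_url works without pandas

    return pd.read_csv(to_raw_url(github_url))


# Usage (placeholder repository and file names):
# df = read_github_csv("https://github.com/<user>/<repo>/blob/main/data.csv")
```

Copying the link from the Raw button by hand gives the same result; the helper only saves that step when you already have the regular file link.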
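For pandas versions older than 0.19.2, fetch the content with requests first and then hand it to pandas:
Reading CSV
import io
import requests
import pandas as pd
url="url"
s=requests.get(url).content  # raw bytes of the file
c=pd.read_csv(io.StringIO(s.decode('utf-8')))  # decode to text, then parse
Reading JSON
url="url_for_json"
s=requests.get(url).content
c=pd.read_json(io.StringIO(s.decode('utf-8')))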
An alternative is to download the file first and then read it from disk.
!wget <url>
Then read the file as required.
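If wget is not available, roughly the same download-then-read step can be sketched in Python with only the standard library. The function name `download_file` and the file names in the usage comment are illustrative assumptions, not part of any library.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Stream the content at `url` into the local file `dest` and return its path."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all at once
    return dest


# Usage (placeholder URL): download, then read from disk as usual.
# path = download_file("<url>", "dataset.csv")
# df = pd.read_csv(path)
```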
# B. From a local drive
To upload files from your local drive to the Colab session, use the following code.
from google.colab import files
uploaded = files.upload()
This code opens a Choose Files dialog where you can browse to the file you want to upload. After uploading, you can list the files in your Colab session with:
!ls
Now read the file as required.
If it is a CSV file, read it with pandas as described above.
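files.upload() also returns a dictionary that maps each uploaded file name to its content as bytes, so a CSV can be parsed straight from memory. Below is a minimal sketch, assuming the uploaded files are UTF-8 CSVs; `uploads_to_dataframes` is a hypothetical helper name, and the sample dictionary only stands in for a real upload.

```python
import io

import pandas as pd


def uploads_to_dataframes(uploaded):
    """Parse every .csv entry of a {filename: bytes} dict into a DataFrame."""
    return {
        name: pd.read_csv(io.BytesIO(content))
        for name, content in uploaded.items()
        if name.lower().endswith(".csv")
    }


# In Colab this dict would come from: uploaded = files.upload()
# Here a hand-made dict stands in for it so the sketch runs anywhere.
uploaded = {"scores.csv": b"name,score\nalice,90\nbob,85\n"}
frames = uploads_to_dataframes(uploaded)
print(frames["scores.csv"].shape)  # (2, 2)
```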
# C. From Google Drive (curl or PyDrive)
Method I
This method requires the shareable link of your file. First, go to your file in Drive and get the link, which will look like this:
https://drive.google.com/open?id=<id>
Now take the id from the URL above and put it in the following curl commands.
!curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=<id>" > /dev/null
!curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=<id>" -o <filename>
This will download the file from Drive to your Colab session.
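Pulling the id out of the link by hand is error-prone, so here is a small sketch that does it with the standard library. It assumes the link is either of the open?id=<id> form shown above or of the /file/d/<id>/ form that Drive also produces; `extract_drive_id` is a hypothetical helper name, and the id in the usage lines is made up.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_drive_id(link):
    """Return the file id from a Google Drive sharing link, or None if absent."""
    parts = urlparse(link)
    # Form 1: https://drive.google.com/open?id=<id>
    ids = parse_qs(parts.query).get("id")
    if ids:
        return ids[0]
    # Form 2: https://drive.google.com/file/d/<id>/view
    match = re.search(r"/file/d/([^/]+)", parts.path)
    return match.group(1) if match else None


print(extract_drive_id("https://drive.google.com/open?id=abc123"))  # abc123
print(extract_drive_id("https://drive.google.com/file/d/abc123/view?usp=sharing"))  # abc123
```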
Method II
This procedure is a bit longer.
- To read files from Drive into Colab, first install PyDrive.
!pip install -U -q PyDrive
PyDrive is a wrapper library of google-api-python-client that simplifies many common Google Drive API tasks.
- Next, import the required packages.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
- Now authenticate the user and create the Google Drive client.
auth.authenticate_user()
This code displays the link that Google uses to authenticate you for Drive access. It asks whether the Google Cloud SDK may access your Google account; accept, and a verification code is provided, which you should copy and paste into the input box shown in the prompt.
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
- Now, go to the file you want to load and get its shareable link. The link contains a parameter called id; copy its value.
- Finally, get the file from Drive.
downloaded = drive.CreateFile({'id': '<id>'})
downloaded.GetContentFile('Filename.csv')
- At this point, if you list the files in your Colab session with !ls, you can see your file.
- Now read the file according to its type; pandas can read JSON, HTML, CSV, and several other formats directly.
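As a rough sketch of reading a file according to its type, the helper below picks a pandas reader from the file extension. The names `reader_name` and `read_by_extension` and the extension table are assumptions made for illustration. Note that pandas' read_html returns a list of DataFrames (one per table found), unlike read_csv and read_json.

```python
from pathlib import Path

# Extension -> name of the pandas reader function (illustrative, not exhaustive).
READERS = {".csv": "read_csv", ".json": "read_json", ".html": "read_html"}


def reader_name(path):
    """Return the pandas reader name for `path`, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"No reader configured for '{suffix}' files")
    return READERS[suffix]


def read_by_extension(path):
    """Load `path` with the matching pandas reader."""
    import pandas as pd  # imported lazily so reader_name works without pandas

    return getattr(pd, reader_name(path))(path)


# Usage, with a file already present in the Colab session:
# df = read_by_extension("Filename.csv")
```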