How to obtain sheet names from XLS files without loading the whole file?


I'm currently using pandas to read an Excel file and present its sheet names to the user, so he can select which sheet he would like to use. The problem is that the files are really big (70 columns x 65k rows), taking up to 14s to load on a notebook (the same data in a CSV file is taking 3s).

My code in pandas goes like this:

xls = pandas.ExcelFile(path)
sheets = xls.sheet_names

I tried xlrd before, but obtained similar results. This was my code with xlrd:

xls = xlrd.open_workbook(path)
sheets = xls.sheet_names

So, can anybody suggest a faster way to retrieve the sheet names from an Excel file than reading the whole file?


You can use the xlrd library and open the workbook with the on_demand=True flag, so that the sheets won't be loaded automatically.

Then you can retrieve the sheet names in a similar way to pandas:

import xlrd
xls = xlrd.open_workbook(r'<path_to_your_excel_file>', on_demand=True)
print(xls.sheet_names())  # <- remember: in xlrd, sheet_names is a method, not a property



I have tried xlrd, pandas, openpyxl and other such libraries, and all of them seem to take much longer as the file size increases, because they read the entire file. The other solutions mentioned above that use 'on_demand' did not work for me. The following function works for xlsx files.

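import os
import shutil
import zipfile

import xmltodict  # third-party package

# Note: settings.MEDIA_ROOT below presumably comes from Django
# (from django.conf import settings); any writable directory would do.

def get_sheet_details(file_path):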
    sheets = []
    file_name = os.path.splitext(os.path.split(file_path)[-1])[0]
    # Make a temporary directory with the file name
    directory_to_extract_to = os.path.join(settings.MEDIA_ROOT, file_name)
    os.mkdir(directory_to_extract_to)

    # Extract the xlsx file as it is just a zip file
    zip_ref = zipfile.ZipFile(file_path, 'r')
    zip_ref.extractall(directory_to_extract_to)
    zip_ref.close()

    # Open the workbook.xml which is very light and only has meta data, get sheets from it
    path_to_workbook = os.path.join(directory_to_extract_to, 'xl', 'workbook.xml')
    with open(path_to_workbook, 'r') as f:
        xml = f.read()
        dictionary = xmltodict.parse(xml)
        for sheet in dictionary['workbook']['sheets']['sheet']:
            sheet_details = {
                # xmltodict prefixes XML attributes with '@' by default
                'id': sheet['@sheetId'],
                'name': sheet['@name']
            }
            sheets.append(sheet_details)

    # Delete the extracted files directory
    shutil.rmtree(directory_to_extract_to)
    return sheets

Since all xlsx files are basically zip archives, we extract the underlying XML and read the sheet names directly from workbook.xml, which takes a fraction of a second compared to the library functions.

Benchmarking (on a 6 MB xlsx file with 4 sheets):

pandas, xlrd: 12 seconds
openpyxl: 24 seconds
Proposed method: 0.4 seconds
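For anyone who wants to reproduce a comparison like this on their own files, here is a minimal timing sketch. The helper name time_call is hypothetical (it is not part of any library mentioned here), and a single run is only a rough indication, since disk caching can affect the result.

```python
import time

def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Example usage (path is a placeholder for your own file):
# sheets, seconds = time_call(get_sheet_details, path)
# print(f"{len(sheets)} sheets in {seconds:.2f}s")
```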



By combining @Dhwanil shah's answer with the answer here I wrote code that is also compatible with xlsx files that have only one sheet:

import zipfile

import xmltodict  # third-party package

def get_sheet_ids(file_path):
    sheet_names = []
    with zipfile.ZipFile(file_path, 'r') as zip_ref:
        # Read only workbook.xml from the archive; nothing is extracted to disk
        xml = zip_ref.open(r'xl/workbook.xml').read()
        dictionary = xmltodict.parse(xml)

        # With a single sheet, xmltodict returns a dict instead of a list
        if not isinstance(dictionary['workbook']['sheets']['sheet'], list):
            sheet_names.append(dictionary['workbook']['sheets']['sheet']['@name'])
        else:
            for sheet in dictionary['workbook']['sheets']['sheet']:
                sheet_names.append(sheet['@name'])
    return sheet_names
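The same idea can probably be done with the standard library alone, in case installing xmltodict is not an option. The sketch below is a variant, not the code from the answers above: the function name list_sheet_names is made up for illustration, and it assumes the workbook part lives at xl/workbook.xml and uses the usual (transitional) SpreadsheetML namespace. Files saved in "Strict Open XML" format use a different namespace and would return an empty list.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace: the standard (transitional) SpreadsheetML main schema
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

def list_sheet_names(file_path):
    """Return the sheet names of an .xlsx file by parsing only xl/workbook.xml."""
    with zipfile.ZipFile(file_path) as archive:
        # Open just the small metadata part; worksheet data is never read
        with archive.open("xl/workbook.xml") as workbook_xml:
            root = ET.parse(workbook_xml).getroot()
    # findall always returns a list, so one sheet and many sheets look the same
    return [sheet.attrib["name"] for sheet in root.findall("m:sheets/m:sheet", NS)]
```

Because findall returns a list in both cases, no special handling is needed for workbooks with a single sheet.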



You can also use:

data=pd.read_excel('demanddata.xlsx',sheet_name='oil&gas')
print(data)   

Here demanddata.xlsx is the name of your file and oil&gas is one of its sheet names. The workbook may contain any number of sheets; just pass the name of the sheet you want to fetch as sheet_name='name of your required sheet'.
