Foreign bank annual reports

There is a lot of interest in scraping the SEC website for data contained in filings, whether for trading or for general-purpose data analysis. A popular Kaggle notebook from a few years ago, "Scraping and Analysing SEC Filings - Part 1", walked through accessing, cleaning and analyzing data from the SEC website in order to run text analytics. I thought I would reimplement the notebook using edgartools, just to see how much easier it is today.

The notebook stated the problem as: "How can we quickly search through the SEC's EDGAR Database to extract insights about companies we are interested in, without the need for manual labour?"

The process listed in the notebook was:

  1. Start with an industry code (SIC code).
  2. Scrape the company codes (CIK codes) belonging to that SIC code.
  3. Using these CIK codes, scrape information relating to each company's filings (denoted by accession number) within a specified time period such as the company name, date of filing, the form type and the link taking us to the filing. Add all this information into a master dictionary.
  4. For each filing, follow its link and scrape the text. Normalise the scraped text so that it is human-readable and can therefore be analysed. The normalised text is then added to the master dictionary.
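To give a feel for what steps 3 and 4 involve when done by hand, here is a minimal sketch in plain Python. It is not the Kaggle notebook's code: the helper name normalise_text, the layout of the master dictionary, and every value in it are made up for illustration.

```python
import html
import re
import unicodedata


def normalise_text(raw: str) -> str:
    """Turn scraped, HTML-ish filing text into plain readable text."""
    # Replace anything that looks like a tag with a space
    text = re.sub(r"<[^>]+>", " ", raw)
    # Decode entities such as &amp; and &nbsp;
    text = html.unescape(text)
    # Fold compatibility characters (e.g. non-breaking spaces) into plain forms
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical master dictionary keyed by accession number (step 3).
# The accession number, company name and raw text are invented.
master = {
    "0000000000-24-000001": {
        "company": "Example Bank plc",
        "form": "20-F",
        "raw": "<p>Net&nbsp;interest income &amp; fees</p>",
    }
}

# Step 4: add the normalised text to each record
for record in master.values():
    record["text"] = normalise_text(record["raw"])
```

Even this toy version needs its own tag stripping, entity decoding and whitespace handling, which is the kind of work the rest of this article hands off to the library.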

The edgartools approach

First of all, we will not be scraping. edgartools uses the approved SEC data APIs to get filings, then automatically parses the data and presents it in the most convenient form.

Getting foreign bank filings

The Kaggle notebook focuses on Form 20-F, which is the equivalent of the 10-K for foreign private issuers. In our code we get the 20-F filings for 2024. Then we create a list of CIKs for banks, based on the SIC code 6029. Finally, we filter the filings to keep only the 20-F filings for foreign banks.

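from edgar import Company, get_filings, set_identity
from tqdm import tqdm

# The SEC asks API users to identify themselves; replace with your own details
set_identity("Your Name your.email@example.com")

# Get Form 20-F filings in 2024
filings = get_filings(form="20-F", year=2024)

# Get the CIKs for companies in banking (SIC 6029); one Company lookup per filing
bank_ciks = [f.cik for f in tqdm(filings)
             if Company(f.cik).sic == "6029"]

# Filter filings by CIK
bank_filings = filings.filter(cik=bank_ciks)
bank_filings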


To view a filing, simply select it by index. You can then view the text using .view(), or get the text for further processing using .text().
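As a rough sketch of the "further processing" path, the helper below pulls the text of the first few filings into a dictionary. The function name collect_texts and its limit parameter are hypothetical. The sketch also assumes that the filings object supports len() and integer indexing, and that each filing exposes an accession_no attribute alongside .text().

```python
def collect_texts(filings, limit=3):
    """Map accession number -> filing text for the first `limit` filings.

    Assumes `filings` supports len() and integer indexing, and that each
    filing has an `accession_no` attribute and a `text()` method.
    """
    texts = {}
    for i in range(min(limit, len(filings))):
        filing = filings[i]  # select a filing by index
        texts[filing.accession_no] = filing.text()
    return texts
```

With the objects from the code above, this would be called as collect_texts(bank_filings). The limit keeps the number of downloads small while experimenting.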

Conclusion

The edgartools approach is a lot simpler than the one in the Kaggle notebook. Scraping the SEC is a thing of the past, since the library does a lot of the work for you behind a very clean API.

There is a notebook version of this article here.

About edgartools

edgartools is the most powerful way to navigate SEC filings in Python. It is also the easiest. 

pip install edgartools

If you like it, please leave a star on GitHub.