SPPD: Spanish Public Procurement Data - First step
Written on March 17th, 2025 by Alvaro Carranza
Last year, before joining eDreams as an analyst, I started a personal project called SPPD (Spanish Public Procurement Data). I recognized the need for access to aggregated procurement data but found the available options limited—understandably so—particularly when it came to accessing data from specific companies, sectors, or contractors. So, I decided to build my own solution.
A previous entry of this blog (drier than I would have liked, though I did and learned a lot last year!) covers the first version of a script that parses XML/ATOM files, the format this data is currently published in, into Parquet files.
Long story short, last week I decided to start from scratch and work on a repository of Python tools that: 1) download and parse this data, 2) build a database with it, and 3) provide a Streamlit app to interact with the data.
I recently finished the first step! Inside this repository, there’s a module named dl_parser that allows users to download data from any available period (previous years, year-to-date, or specific months of the current year), and parse it into a Parquet file.
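To give a rough idea of what "parse ATOM into Parquet" involves, here is a minimal sketch. It is not the actual dl_parser code: the function names (parse_entries, write_parquet), the sample feed, and the three fields it extracts are all made up for illustration, and the real files will contain more than this. The Parquet step assumes pandas plus a Parquet engine such as pyarrow are installed.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; the real feeds likely use additional namespaces.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, heavily simplified feed used only for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>example-1</id>
    <title>Road maintenance</title>
    <updated>2025-01-15T10:00:00+01:00</updated>
  </entry>
  <entry>
    <id>example-2</id>
    <title>Office supplies</title>
    <updated>2025-02-01T09:30:00+01:00</updated>
  </entry>
</feed>
"""


def parse_entries(atom_xml: str) -> list[dict]:
    """Turn each <entry> of an Atom feed into a flat dict (one row per entry)."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.findall("atom:entry", ATOM_NS):
        rows.append(
            {
                "id": entry.findtext("atom:id", default=None, namespaces=ATOM_NS),
                "title": entry.findtext("atom:title", default=None, namespaces=ATOM_NS),
                "updated": entry.findtext("atom:updated", default=None, namespaces=ATOM_NS),
            }
        )
    return rows


def write_parquet(rows: list[dict], path: str) -> None:
    """Write the rows to a Parquet file (assumes pandas and pyarrow are available)."""
    import pandas as pd  # imported lazily so the parsing part works without pandas

    pd.DataFrame(rows).to_parquet(path, index=False)


rows = parse_entries(SAMPLE_FEED)
```

Calling write_parquet(rows, "sample.parquet") would then produce the Parquet file. Keeping parsing and writing in separate functions makes the parsing logic easy to test without touching the disk.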
This project has helped me revisit some of the fundamentals of programming: testing, project organization, CI/CD (with GitHub Actions), and code quality, among others. The current codebase has plenty of room for improvement (mainly optimizing parsing speed), but I’m excited to keep building on it.
The project’s README has more information about its current state. Of course, I’m happy to answer any questions anyone might have about it!
Hopefully, the next blog entry will be about the database.