Data Science in Programmatic Advertising

July 8, 2020

An overview exploring the world of programmatic advertising and how we use data science here at PubGenius. Written by our former intern, colleague, and friend Saloni Somaiya. You can check out her blog at

Sharing my first internship experience at pubGENIUS.Inc. Seattle, WA, USA

I am writing this blog to reflect on my days this summer. It has been a fruitful 4-month internship on the Ads team, where I took my first step into advertising from the publisher’s side and applied data science techniques to optimize programmatic advertising revenue. Applying data-driven skills in the advertising field was really challenging, and I got the opportunity to explore many new concepts.

It’s always exciting to share your work with people in your network, as it helps you grow, and that is why I decided to publish this blog covering every aspect of this journey right from the start. The struggle was real, as this was my first-ever job experience. I had mixed emotions (scared, anxious, and excited), since my university is in Boston and I got this internship in a different state. I got to explore Seattle, which is a hub of the IT industry!

About the company: 

pubGENIUS.Inc is a Seattle-based ad-tech company with one simple goal: to generate MORE revenue for publishers.

My role:

- Developing algorithmic and predictive solutions to business problems and working with other engineers to integrate them
- Developing data visualizations for various projects, for both internal and external-facing clients
- Developing, testing, and bringing into production algorithms for a variety of very high-impact projects

I had the coolest managers in the office: the CTO, CEO, and COO. I reported directly to the CTO on a daily basis, which was pretty huge for me. All three were very friendly, and I got to learn a lot from them. I was the first-ever intern working there, so naturally they had a lot of expectations of me. I had to put in a great effort to adapt to professional life and understand how the company was structured.

To begin with, I spent my first few weeks researching this field, reading articles and technical papers to understand the existing data science techniques used in the industry.

Below are some research topics to give you a head start:

1. Study about programmatic advertising

Programmatic ad buying is the automated buying and selling of ads, and it has changed the face of online advertising. This automation makes transactions more efficient and effective, streamlining the process and consolidating your digital advertising efforts in one technology platform.

Publishers get paid by running programmatic ads on their pages, and they expect digital marketing buyers to pay higher prices for their ad inventory.


Advertising metrics and KPIs

  1. Acquisition: Traffic, Users, Sessions, Pageviews

  2. Behavior: Bounce rate, Retention, Engagement, Exposure CPI

  3. Conversions: Goal conversion rate, value per visitor, cost per acquisition

  4. Contextual: Domain, content quality, ad-unit size, position, layout

  5. Key metrics: Revenue, impressions, cost per click (CPC), cost per mille (CPM), click-through rate (CTR), effective cost per mille (eCPM), revenue per thousand impressions (RPM), viewability, fill rate, ad requests, return on investment (ROI)
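
As a rough illustration of how several of the key metrics relate to raw counts, here is a minimal Python sketch. It is not code from the internship: the `AdStats` class, its field names, and the sample numbers are all made up for this example, and the formulas are the commonly used definitions (CTR as clicks per impression, eCPM and page RPM as revenue per thousand impressions or pageviews, fill rate as impressions per ad request).

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    """Raw counts for one ad unit over some period (hypothetical field names)."""
    ad_requests: int
    impressions: int
    clicks: int
    pageviews: int
    revenue: float  # publisher revenue in dollars


def ctr(s: AdStats) -> float:
    """Click-through rate: clicks per served impression."""
    return s.clicks / s.impressions if s.impressions else 0.0


def ecpm(s: AdStats) -> float:
    """Effective CPM: revenue earned per 1,000 impressions."""
    return 1000 * s.revenue / s.impressions if s.impressions else 0.0


def fill_rate(s: AdStats) -> float:
    """Share of ad requests that were filled with an impression."""
    return s.impressions / s.ad_requests if s.ad_requests else 0.0


def page_rpm(s: AdStats) -> float:
    """Page RPM: revenue earned per 1,000 pageviews."""
    return 1000 * s.revenue / s.pageviews if s.pageviews else 0.0


# Made-up sample numbers, only to exercise the formulas.
stats = AdStats(ad_requests=50_000, impressions=40_000, clicks=200,
                pageviews=20_000, revenue=80.0)

print(f"CTR:       {ctr(stats):.2%}")
print(f"eCPM:      ${ecpm(stats):.2f}")
print(f"Fill rate: {fill_rate(stats):.0%}")
print(f"Page RPM:  ${page_rpm(stats):.2f}")
```

Keeping each metric as a small function of the same raw counts makes it easy to recompute them per device, ad unit, or day when slicing the data.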

2. Header Bidding

It is an advanced programmatic technique similar to an auction wherein publishers offer inventory to multiple ad exchanges simultaneously before making calls to their ad servers. The idea is that by letting multiple demand sources bid on the same inventory at the same time, publishers increase their yield and make more money.
Key terms: Ad Exchange, Floor rate optimization, Prebid Revenue, Win rate

We, as a publisher, are on the seller’s side in the online advertising market. Ad slots on publishers’ pages are available for advertisers (buyers) to bid via a real-time “online auction” system.
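
To make the auction idea concrete, here is a toy Python sketch of a single ad slot being offered to several demand sources at once. It assumes a simple first-price rule (the highest bid at or above the floor wins and pays its own bid), which is only one possible auction design. The function name `run_auction`, the exchange names, and the bid values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def run_auction(bids: Dict[str, float], floor: float) -> Optional[Tuple[str, float]]:
    """Pick the winner of one ad slot.

    bids maps a demand source to its CPM bid in dollars.
    Returns (winner, winning_cpm), or None if no bid reaches the floor.
    Assumption: first-price rule, so the winner pays exactly what it bid.
    """
    # Drop every bid below the publisher's price floor.
    eligible = {bidder: cpm for bidder, cpm in bids.items() if cpm >= floor}
    if not eligible:
        return None  # the slot goes unfilled for this request
    winner = max(eligible, key=eligible.get)
    return winner, eligible[winner]


# All demand sources bid on the same impression at the same time.
bids = {"exchange_a": 1.20, "exchange_b": 1.85, "exchange_c": 0.90}

result = run_auction(bids, floor=1.00)
print(result)

# Raising the floor too far leaves the slot unfilled.
print(run_auction(bids, floor=2.00))
```

The second call shows why floor rate optimization matters: a floor that is too high lowers the fill rate, while one that is too low can leave revenue on the table.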

3. Advertising tools I used

  1. Google Ad Manager
  2. Google Analytics
  3. Assertive Yield

4. Search Engine Optimization (Keyword Search Volume)

Optimizing a website for search traffic is not a simple task. It usually takes a long time to see results from search engine optimization campaigns, and Google’s algorithm is very unpredictable. The more information we possess, the better results we’ll see from our SEO strategies. By using data science, we can get valuable insights into our website’s performance and answers to some of our SEO questions.


5. Exploring and scanning Online Ad Images

The more things we can detect, the more control we can give our clients over what should be blocked. Deep-learning image processing, machine learning, and graph theory can be used to investigate online advertising and to construct prediction models that estimate which image ads are most likely to be successful. There are AdTrace models to categorize ads in order to automatically block certain categories. The most important ones are NSFW classification and OCR text extraction.


The relevant ads are selected based on not only textual relevance but also visual similarity so that the ads yield contextual relevance to both the text in the Web page and the image content. The ad insertion positions are detected based on image saliency to minimize intrusiveness to the user.

6. Magic: The Gathering card game

- Finding combinations of cards that do really well together and creating strategies to win. Recommending which card the player should play and determining the probability of victory using the minimax algorithm, multi-armed bandits, and Monte Carlo search.
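
Of the techniques named above, the multi-armed bandit is the easiest to show in a few lines. The sketch below is a generic epsilon-greedy bandit in Python, not the project's actual code: the card names and their hidden win probabilities are invented, and a real system would replace the simulated coin flip with the outcome of an actual or simulated game.

```python
import random


def epsilon_greedy(win_prob, rounds=5000, epsilon=0.1, seed=0):
    """Estimate each option's win rate while mostly playing the current best one.

    win_prob maps an option name to its true, hidden probability of winning.
    Here it only drives the simulation; the algorithm never reads it directly
    when choosing what to play.
    """
    rng = random.Random(seed)
    plays = {name: 0 for name in win_prob}
    wins = {name: 0 for name in win_prob}

    def estimate(name):
        # Observed win rate so far (0.0 before the option has been tried).
        return wins[name] / plays[name] if plays[name] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.choice(list(win_prob))   # explore a random option
        else:
            choice = max(win_prob, key=estimate)  # exploit the best estimate
        plays[choice] += 1
        if rng.random() < win_prob[choice]:       # simulated game outcome
            wins[choice] += 1

    return {name: estimate(name) for name in win_prob}, plays


# Hypothetical cards with made-up win probabilities.
cards = {"card_a": 0.2, "card_b": 0.5, "card_c": 0.8}
estimates, plays = epsilon_greedy(cards)
best = max(estimates, key=estimates.get)
print("recommended:", best)
print("times played:", plays)
```

The `epsilon` parameter controls the trade-off: a higher value spends more plays exploring weaker options, while a lower value commits sooner to whatever currently looks best.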

Some of the business questions that were addressed:

  1. How can we use historical Amazon search data or an API to predict the search volume for the same or related keywords on Amazon?
  2. Identify a segment of the userbase who is overwhelmingly unlikely to click on an ad
  3. Determine the correlation of viewability to CTR, fill rate, CPM and other key metrics
  4. Does Google AdWords optimize the CPM based on CTR?
  5. Why do some ad units outperform others?
  6. What’s the difference in performance between having 5 bidders and 10 bidders (advertisers), in terms of ad load speed, auction speed, fill rate, and viewability?
  7. Graph out average viewability and how that relates to CPM
  8. Retrieving data from other bidders and investigating data from price buckets ($0.01 increments)
  9. Prediction models:
    - Moving from reactive to predictive. So instead of “price floor X did best last week, let’s use it this week”, it will be “we predict price floor Y will do best this week, even though X did best last week”
    - Predicting revenue by learning from historical data
  10. Revenue, ad requests, and impressions by traffic (PageViews)
  11. Figuring out the analytics architecture of the company since a lot of data is being collected on a daily basis at different platforms
  12. Perform A/B testing on ad size, device, location, audience, lazy-loaded units, ad refresh, and more. The variant that gives higher conversions is the winning one, and that variant can help you optimize your site for better results. We gather both qualitative and quantitative user insights and use them to understand ways to increase revenue.
  13. Focus on campaign-specific factors and goals to measure success (competitor analysis)
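
For the A/B testing question in the list above, one common way to decide whether a variant really "wins" is a two-proportion z-test on the conversion rates. The Python sketch below is an illustration under that assumption, not the procedure from the internship report. The function name and the click and impression counts are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates.

    conv_a / n_a: conversions and trials for variant A (same for B).
    Returns (z, p_value) where p_value is two-sided.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: clicks out of impressions for two ad sizes.
z, p_value = two_proportion_z_test(conv_a=200, n_a=40_000,
                                   conv_b=260, n_b=40_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is unlikely to be noise at the 5% level.")
```

With small differences or little traffic the p-value stays high, which is a sign to keep the test running rather than to pick a winner early.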

Technologies and tools I used to answer these questions:

I conducted a deep analysis to optimize ads and understand the key factors involved, helping the company identify interesting trends. I implemented various models and dashboards, but here I am only able to show you the gist:

1. Machine learning: ordinary least squares, feature selection, hypothesis tests, correlation plots, linear regression, lasso regression, random forest regressor, convolutional neural network

Figure: Comparison of actual and predicted values of my revenue prediction model
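
To give a feel for what comparing actual and predicted revenue involves, here is a minimal ordinary least squares sketch in plain Python with a single feature. It is a simplified stand-in, not the model described above: the choice of pageviews as the only predictor and every number in the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up daily data: pageviews (in thousands) and revenue (in dollars).
pageviews = [10, 12, 15, 18, 20, 25]
revenue = [41.0, 47.5, 61.0, 70.5, 81.0, 99.0]

intercept, slope = fit_line(pageviews, revenue)
predicted = [intercept + slope * x for x in pageviews]
r2 = r_squared(revenue, predicted)

for x, actual, pred in zip(pageviews, revenue, predicted):
    print(f"{x:>3}k pageviews: actual ${actual:6.2f}, predicted ${pred:6.2f}")
print(f"R^2 = {r2:.3f}")
```

In practice the fit would be evaluated on days the model has not seen, since a comparison on the training data alone tends to look better than the model really is.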

2. Data analytics and BI tools: Tableau, Google Data Studio, Salesforce Analytics, Trifacta, and Excel

Figure: CTR by device and ad prices (eCPM) by ad type


Research + Graphs + Optimization Recommendation = Proposal

By the end of my internship, I wrapped up all my findings and delivered a detailed report to the team. It contains more information on the content introduced in this blog, plus instructions on how to conduct A/B testing to optimize programmatic ad revenue, and other project ideas. Optimizing profit margins is an interesting problem for publishers to solve.

However, it was indeed challenging for me to apply my existing skills and explore this field within a short span of time, because it requires statisticians and data scientists to have sufficient domain knowledge in online advertising to propose reasonable hypotheses, and this can only be acquired from work experience. On the other hand, no matter how efficiently a model solves the problem, adequate data is a must. Hence, publishers should start early on building a data collection pipeline, especially for ad scanning models.

I really had a great time working here, as I got a lot of opportunities to experience all the behind-the-scenes aspects of advertising. I also learned how to communicate results to stakeholders efficiently.

Moving on, I am looking for full-time job opportunities in this field.

Thank you for reading this post!

Don’t hesitate to reach out to see how PubGenius can help out your own site!