Yandex denies hack, blames source code leak on former employee

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,763
Reputation
10,607
Daps
185,932


By Bill Toulas

  • January 26, 2023
  • 09:44 AM

Yandex office building

A Yandex source code repository allegedly stolen by a former employee of the Russian technology company has been leaked as a torrent on a popular hacking forum.
Yesterday, the leaker posted a magnet link to what they claim are 'Yandex git sources', consisting of 44.7 GB of files stolen from the company in July 2022. These code repositories allegedly contain all of the company's source code besides anti-spam rules.

Yandex repository leaked on hacker forums (BleepingComputer)
Software engineer Arseniy Shestakov analyzed the leaked Yandex Git repository and said it contains technical data and code about the following products:
  • Yandex search engine and indexing bot
  • Yandex Maps
  • Alice (AI assistant)
  • Yandex Taxi
  • Yandex Direct (ads service)
  • Yandex Mail
  • Yandex Disk (cloud storage service)
  • Yandex Market
  • Yandex Travel (travel booking platform)
  • Yandex360 (workspaces service)
  • Yandex Cloud
  • Yandex Pay (payment processing service)
  • Yandex Metrika (internet analytics)
Shestakov also shared a directory listing of the leaked files on GitHub for those who want to see what source code was stolen.

"There are at least some API keys, but they are likely only been used for testing deployment only," said Shestakov about the leaked data.

In a statement to BleepingComputer, Yandex said their systems were not hacked, and a former employee leaked the source code repository.
"Yandex was not hacked. Our security service found code fragments from an internal repository in the public domain, but the content differs from the current version of the repository used in Yandex services.
A repository is a tool for storing and working with code. Code is used in this way internally by most companies.
Repositories are needed to work with code and are not intended for the storage of personal user data. We are conducting an internal investigation into the reasons for the release of source code fragments to the public, but we do not see any threat to user data or platform performance." - Yandex.

Exposure to hackers

BleepingComputer also discussed the leak with Grigory Bakunov, a former senior systems administrator, deputy chief of development, and director of spreading technologies at Yandex, who is very familiar with the leaked code, having worked at the tech giant between 2002 and 2019.

Bakunov explained that the motive of the data leak was political, and the rogue Yandex employee responsible for the data leak had not tried to sell the code to competitors.

The former senior executive added that the leak does not contain any customer data, so it does not constitute a direct risk to the privacy or security of Yandex users, nor does it directly threaten to leak proprietary technology.


Yandex uses a monorepo structure called 'Arcadia,' but not all of the company's services use it. Also, even just to build a service, you need a lot of internal tools and special knowledge, as standard building procedures do not apply.
The leaked repository contains only code; the other important part is data. Key parts, like model weights for neural networks, etc., are absent, so it's almost useless.
Still, there are a lot of interesting files with names like "blacklist.txt" that could potentially expose working services.
However, Bakunov told BleepingComputer that the leaked code creates the potential for hackers to identify security gaps and create targeted exploits. Bakunov believes this is only a matter of time now.

The former executive also commented on Yandex's response, saying that the leaked code may not be identical to the current code used in the firm's working services but might be up to 90% similar.

Therefore, thoroughly examining the leaked code could yield possible weak points at Yandex for threat actors.
 




YANDEX SERVICES SOURCE CODE LEAK

SHORT OVERVIEW OF BREACH CONTENTS

PUBLISHED THU, JAN 26, 2023 BY ARSENIY SHESTAKOV

Just a few hours ago I found a mention on Twitter that proprietary source code of the Russian giant Yandex had been leaked on an online community called BreachForums. In this post I’ll share the results of my friend's digging into said archives.
Important details about the torrent:
  • It is just the contents of a repository, without anything else.
  • All files are dated 24 February 2022.
  • It does not contain git history, mostly just code.
  • There are no pre-built binaries for most of the software, with only a few exceptions.
  • There are no pre-trained ML models, with some exceptions.

Non-commercial announcement

Please consider donating to Helping Hand for Ukraine Relief. This is a small charity that a friend of mine, Alexander Kubrak, works for, and it helps civilians affected by Russian aggression. Any amount you donate will be a huge help for them.

Why is this big?

Yandex is one of the largest IT companies in Russia. Within the country it provides a wider range of services than Google. Imagine one company that replaces Google, Uber, Amazon, Netflix and Spotify.

Is this leak real?

I have personally never worked at Yandex, but I know several people who worked there at different times or still work there. I verified that at least some of the archives definitely contain modern source code for company services, as well as documentation pointing to real intranet URLs.

What’s inside

It looks like the source code for at least all major Yandex services has been leaked:
  • Search Engine and Indexing Bot
  • Maps - Like Google Maps and Street View
  • Alice - AI assistant like Siri / Alexa
  • Taxi - Uber-like taxi service
  • Direct - Ads service like Google Ads / Adwords
  • Mail - Mail service like Gmail
  • Disk - File storage service like Google Drive
  • Market - Marketplace like Amazon
  • Travel - Like Booking.com plus airplane, train and bus tickets
  • Yandex360 - Like Google Workspace for services on your own domain
  • Cloud - Probably not all infrastructure code was leaked.
  • Pay - Payment processing like Stripe, but with limited set of features
  • Metrika - Like Google Analytics
And at least the backend part of the majority of the company's other services is there. The largest archive, called “frontend”, is yet to be explored.

Full list of files:

If you don't want to download the torrent but are curious about what’s inside, you can get the list of files from the following gist:

You can also clone it like a normal git repository:
git clone



The list of all files can be obtained with the following commands.
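The author's original commands are not preserved in this copy of the post. As a minimal sketch of one way to list the contents of `.tar.bz2` archives without extracting them, the following uses Python's standard `tarfile` module. The function names `list_archive_members` and `list_directory_archives` are hypothetical and chosen for illustration; this is not the author's method.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the member paths stored in a .tar.bz2 archive.

    Nothing is written to disk; the stream is decompressed only to
    read the tar headers.
    """
    # "r:bz2" opens a bzip2-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:bz2") as archive:
        return archive.getnames()


def list_directory_archives(directory):
    """Map each *.tar.bz2 file in `directory` to its sorted member list."""
    listing = {}
    for path in sorted(Path(directory).glob("*.tar.bz2")):
        listing[path.name] = sorted(list_archive_members(path))
    return listing
```

Note that the `*.tar.bz2` pattern skips files ending in `.tar.bz2.part`; this sketch does not attempt to read those.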

Full list of files in torrent

aapi.tar.bz2 client_method.tar.bz2 gencfg.tar.bz2 mobile-WARNING-notfull.tar.bz2.part skynet.tar.bz2
admins.tar.bz2 cloud.tar.bz2.part groups.tar.bz2 nginx.tar.bz2 smart_devices.tar.bz2.part
ads.tar.bz2 commerce.tar.bz2.part helpdesk.tar.bz2 noc.tar.bz2.part smarttv.tar.bz2
alice.tar.bz2.part config.tar.bz2 infra.tar.bz2 partner.tar.bz2 solomon.tar.bz2.part
analytics.tar.bz2.part connect.tar.bz2.part intranet.tar.bz2 passport.tar.bz2.part stocks.tar.bz2
antiadblock.tar.bz2 crm.tar.bz2.part investors.tar.bz2 pay.tar.bz2 switch.tar.bz2
antirobot.tar.bz2 crypta.tar.bz2 it-office.tar.bz2 payplatform.tar.bz2.part tasklet.tar.bz2
autocheck.tar.bz2 customer_service.tar.bz2 jupytercloud.tar.bz2 paysys.tar.bz2 taxi.tar.bz2.part
balancer.tar.bz2 datacloud.tar.bz2 kernel.tar.bz2.part portal.tar.bz2.part tools.tar.bz2
billing.tar.bz2 delivery.tar.bz2.part library.tar.bz2.part privacy_office.tar.bz2 travel.tar.bz2.part
bindings.tar.bz2 direct.tar.bz2.part load.tar.bz2.part products.tar.bz2 wmconsole.tar.bz2
captcha.tar.bz2 disk.tar.bz2 mail.tar.bz2.part robot.tar.bz2 yandex360.tar.bz2.part
cdn.tar.bz2 docs.tar.bz2 maps.tar.bz2.part rt-research.tar.bz2 yandex_io.tar.bz2.part
certs.tar.bz2 drive.tar.bz2.part maps_2.tar.bz2.part saas.tar.bz2 yaphone.tar.bz2
ci.tar.bz2.part extsearch.tar.bz2.part maps_adv.tar.bz2 sandbox.tar.bz2 yawe.tar.bz2
classifieds.tar.bz2.part frontend.tar.bz2.part market.tar.bz2.part search.tar.bz2
client_analytics.tar.bz2.part fuzzing.tar.bz2 metrika.tar.bz2.part security.tar.bz2

Security implications

Since this leak only contains the contents of git repositories, there is no personal data. There are at least some API keys, but they were likely used only for test deployments.


https://gist.github.com/ArseniyShestakov/53a80e3214601aa20d1075872a1ea989


 