Azure SDK is over 500 MB and growing on each release
The Azure SDK is ridiculously large, for reasons that I have a hard time understanding. We pip install it for our CI pipelines, and the vast majority of our container's size comes from the Azure SDK. Within the SDK, the network directory takes almost half of the space because it contains 39 versions of the SDK. I have never seen anyone take such a strange approach to versioning their API clients. I fail to understand why anyone would even want to use the client from 2015 on a cloud product like Azure.
Can the default release only provide the latest version of the client libraries, or at least provide a 'lean' version of the SDK? This release model is certainly not sustainable and is causing useless grief to your users.
Apr 5, 2021
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @aznetsuppgithub.
Issue Details
root@1bba10bd1500:~/.pyenv/versions/3.9.2/lib/python3.9/site-packages/azure/mgmt/network# du -shc * | grep M | sort -n
1.2M aio
2.4M v2015_06_15
3.3M v2016_09_01
3.5M v2016_12_01
3.7M v2017_03_01
4.4M v2017_06_01
4.4M v2017_08_01
4.9M v2017_09_01
5.1M v2017_10_01
5.1M v2017_11_01
5.1M v2018_01_01
5.7M v2018_02_01
6.5M v2018_04_01
6.6M v2018_06_01
6.9M v2018_07_01
8.3M v2018_08_01
8.4M v2018_10_01
8.6M v2018_11_01
8.8M v2018_12_01
9.0M v2019_02_01
9.5M v2019_04_01
10M v2019_06_01
11M v2019_07_01
11M v2019_08_01
11M v2019_09_01
11M v2019_11_01
11M v2019_12_01
12M v2020_03_01
12M v2020_04_01
13M v2020_05_01
13M v2020_06_01
13M v2020_07_01
13M v2020_08_01
259M total
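The per-version breakdown above can also be reproduced without a shell. The following is a minimal Python sketch, not part of the original report: the function names are hypothetical, and the folder pattern (`vYYYY_MM_DD`, optionally ending in `_preview`) is an assumption based on the listing. It sums file sizes, so the numbers can differ slightly from `du`, which reports disk blocks.

```python
import os
import re
from pathlib import Path

# Assumed naming scheme for API-version folders, e.g. v2015_06_15
# or v2020_08_01_preview.
VERSION_DIR = re.compile(r"^v\d{4}_\d{2}_\d{2}(_preview)?$")


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def version_sizes(package_dir: Path) -> dict[str, int]:
    """Map each API-version folder name to its size in bytes."""
    return {
        child.name: dir_size(child)
        for child in sorted(package_dir.iterdir())
        if child.is_dir() and VERSION_DIR.match(child.name)
    }
```

It would be called on the installed package folder, for example `version_sizes(Path(site_packages) / "azure" / "mgmt" / "network")`, where `site_packages` stands for wherever pip installed the SDK.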
| Author: | sodul |
|---|---|
| Assignees: | - |
| Labels: | |
| Milestone: | - |
Hi @sodul, thanks for the feedback, we'll investigate asap.
Previously reported in #11149.
To clarify, #11149 is only about azure-mgmt-network, which is the largest directory, but the problem is present across the entire Azure SDK.
I understand the reasoning behind keeping everything for backward compatibility, but customers who depend on the old versions should pin their requirements to the older pypi.org releases of the Azure SDK rather than forcing everyone to keep a copy of everything around. How about providing two versions of the SDK: a large one with everything, and a small one with just the latest version?
I wrote a script that we run after pip install. It detects the unused versions, and this shrank our azure folder from ~680MB to ~280MB. It cannot go any lower because, for some reason, some of the object model definitions from multiple versions are merged together to make the final list that is then used. The script detects the versions that are used internally by the SDK and preserves them, which makes it very safe to use.
If there is interest I can open source the script.
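To make the idea concrete, here is a minimal Python sketch of the kind of detection described above. It is not the author's script, and every name in it is hypothetical. It assumes version folders are named `vYYYY_MM_DD` (optionally with a `_preview` suffix), keeps the newest folder by name, and then keeps any older folder that a kept folder mentions in its source, repeating until nothing new is found. It only reports candidates and deletes nothing.

```python
import re
from pathlib import Path

# Assumed naming scheme for API-version folders.
VERSION_NAME = re.compile(r"\bv\d{4}_\d{2}_\d{2}(?:_preview)?\b")


def referenced_versions(version_dir: Path) -> set[str]:
    """Version names mentioned in the .py files of one version folder."""
    found: set[str] = set()
    for source in version_dir.rglob("*.py"):
        text = source.read_text(encoding="utf-8", errors="ignore")
        found.update(VERSION_NAME.findall(text))
    found.discard(version_dir.name)  # a folder naming itself is not a dependency
    return found


def removable_versions(package_dir: Path) -> list[str]:
    """Version folders not needed by the newest version (dry run only)."""
    versions = sorted(
        child.name
        for child in package_dir.iterdir()
        if child.is_dir() and VERSION_NAME.fullmatch(child.name)
    )
    if not versions:
        return []
    keep: set[str] = set()
    pending = [versions[-1]]  # newest by name; assumed to be the default
    while pending:
        current = pending.pop()
        if current in keep:
            continue
        keep.add(current)
        # Follow cross-version references transitively.
        for name in referenced_versions(package_dir / current):
            if name in versions and name not in keep:
                pending.append(name)
    return [name for name in versions if name not in keep]
```

References from files outside the version folders are deliberately ignored here, so any caller that explicitly asks for a trimmed API version would fail to import it. A real tool would need to handle that case.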
We have released our script on GitHub. It deletes a good chunk of the API folders, but not all of them. With the script the Azure directory is now just under 300MB instead of over 700MB. It is compatible with most, but not all, third-party packages, as long as they do not point to a version that is trimmed.
@kristapratico Following up to see if there is any update on this issue? - Thank you
@KranthiPakala-MSFT we are working on this. There is an ongoing discussion on the issue to make sure we consider every possible impact of any decision and that nobody is broken by it.
@lmazuel I think one old proposal that won't break anything is to release a separate azure-sdk-slim with only the latest APIs (the ones used by default), and possibly do something about comments (IIRC, removing comments reduces the size by 30%).
Removing the non-latest APIs would free about 60% of the disk space needed. A further design issue is that some of the API definitions import prior APIs in order to have a complete set of objects. I have no idea why these API definitions were designed this way, but it is definitely not very good. I had not thought of stripping comments, which means that we could probably extend azure-sdk-trim to remove comments and other useless whitespace. There is probably a tool that 'compresses' Python that we could run. Of course we would not want to remove docstrings; they do help.
@sodul Yeah, agreed. So far I saw only keyvault being broken by your tool (which should be fixed soon I guess #21623).
I think there are actually two scenarios we're talking about. For development, I agree: comments and docstrings are useful.
However, when building a production image, docstrings are unnecessary. The only trick there is that each docstring needs to be replaced by the same number of empty lines, so that line numbers in exception tracebacks stay the same.
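The line-preserving replacement described here can be sketched with Python's standard `ast` module. This is a minimal illustration, not code from azure-sdk-trim or the Azure SDK, and `strip_docstrings` is a hypothetical name. It blanks each docstring in place, and inserts `pass` where the docstring was the only statement in a class or function body, since an empty body would not parse. Docstrings that share a line with other code are left untouched.

```python
import ast

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings(source: str) -> str:
    """Blank out docstrings while keeping every other line number unchanged."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    for node in ast.walk(tree):
        if not isinstance(node, DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if not (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            continue
        start, end = first.lineno - 1, first.end_lineno - 1
        # ast column offsets are UTF-8 byte offsets, so slice the encoded line.
        prefix = lines[start].encode("utf-8")[: first.col_offset]
        suffix = lines[end].encode("utf-8")[first.end_col_offset:]
        if prefix.strip() or suffix.strip():
            continue  # docstring shares a line with other code; leave it alone
        for index in range(start, end + 1):
            stripped = lines[index].rstrip("\r\n")
            lines[index] = lines[index][len(stripped):]  # keep only the newline
        if len(node.body) == 1 and not isinstance(node, ast.Module):
            # The body would otherwise be empty, which is a syntax error.
            lines[start] = prefix.decode("utf-8") + "pass" + lines[start]
    return "".join(lines)
```

Python's own `-OO` option also discards docstrings when compiling bytecode, but it does not shrink the `.py` files on disk, which is what matters for image size here.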
Hi @sodul. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.
For others: we have now published the azure-sdk-trim tool to pypi.org to make it installable with a simple pip install azure-sdk-trim.
https://pypi.org/project/azure-sdk-trim/
This tool is NOT affiliated with Microsoft or the Azure SDK maintainers.
With azure-cli==2.59.0 the trimming still helps a lot:
> azure-sdk-trim
/home/user/.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is using 1.2 GB.
Detected az cli with 39 SDKs to keep.
/home/user/.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is now using 607.5 MB.
Hi @sodul, we deeply appreciate your input into this project. Regrettably, this issue has remained unresolved for over 2 years and inactive for 30 days, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.
I've noticed a significant improvement with the more recent releases of the SDK. The space used has been pretty much halved from 1.2GB to 600MB and with azure-sdk-trim we went from 600MB to 300MB.
+ azure-sdk-trim
.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is using 606.9 MB.
Detected az cli with 39 SDKs to keep.
.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is now using 305.7 MB.
Saved 301.2 MB.
This was with azure-cli==2.60.0.
Amazing, it's still an ongoing process but the sdk and the cli team have both been working on reducing the package size, glad that you're able to see the difference!
The latest SDK releases are back to 1.2GB somehow.
Output from running https://github.com/clumio-code/azure-sdk-trim:
/Users/stephane/.pyenv/versions/3.12.6/lib/python3.12/site-packages/azure is using 1.2 GB.
Detected az cli with 39 SDKs to keep.
/Users/stephane/.pyenv/versions/3.12.6/lib/python3.12/site-packages/azure is now using 603.3 MB.
Saved 622.8 MB.
@msyyc do you know why the size would bump x 2?
I think it is still related with some multiapi packages (e.g azure-mgmt-network/web/containerservice). These packages are updated frequently with more new api-version so the size increases more.
@msyyc @iscai-msft is there a hard limit on the size where it will be deemed unacceptable and be made a blocker for new releases? Is it 1.5GB, 2GB, 5GB, 10GB? Unless there are some drastic changes to the current SDK model, these sizes will be reached.
I can't see this path to be sustainable, especially in the modern container based world.
Version 2.64.0 installed on Debian takes 1.9G including all __pycache__ dirs. Removing them cuts the size roughly in half, to 980M.
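To check how much of an install is bytecode cache, a small Python sketch like the one below can help. It is an illustration only, and `pycache_usage` is a hypothetical name. It walks a directory and reports the bytes found inside `__pycache__` folders next to the total.

```python
import os
from pathlib import Path


def pycache_usage(root: Path) -> tuple[int, int]:
    """Return (bytes inside __pycache__ directories, total bytes) under root."""
    cached = 0
    total = 0
    for current, _dirs, files in os.walk(root):
        in_cache = "__pycache__" in Path(current).relative_to(root).parts
        for name in files:
            # lstat so that symlinks are counted as links, not as their targets.
            size = (Path(current) / name).lstat().st_size
            total += size
            if in_cache:
                cached += size
    return cached, total
```

If the cache is not wanted in an image, `pip install --no-compile` skips bytecode compilation at install time, and setting `PYTHONDONTWRITEBYTECODE=1` stops Python from writing `.pyc` files on import. The cost is slower first imports, so whether it pays off depends on the workload.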