Playwright #69

Merged
merged 3 commits into from
Dec 12, 2024
3 changes: 3 additions & 0 deletions .github/workflows/cicd.yml
@@ -202,6 +202,9 @@ jobs:
       - name: Install Pytest
         run: pip install pytest pytest-mock

+      - name: Setup PlayWright
+        run: playwright install && playwright install-deps
+
       - name: Run Pytest
         run: pytest --no-header -vv

3 changes: 3 additions & 0 deletions .github/workflows/generate-release.yml
@@ -76,6 +76,9 @@ jobs:
       - name: Install Pytest
         run: pip install pytest pytest-mock

+      - name: Setup PlayWright
+        run: playwright install && playwright install-deps
+
       - name: Run Pytest
         run: pytest --no-header -vv

3 changes: 3 additions & 0 deletions .github/workflows/generate-test-release.yml
@@ -75,6 +75,9 @@ jobs:
       - name: Install Pytest
         run: pip install pytest pytest-mock

+      - name: Setup PlayWright
+        run: playwright install && playwright install-deps
+
       - name: Run Pytest
         run: pytest --no-header -vv

4 changes: 2 additions & 2 deletions CITATION.cff
@@ -3,8 +3,8 @@ message: If you use this software, please cite it using these metadata.
 title: PyPi Extractor
 abstract: Extract package information for a given user in PyPi.
 type: software
-version: 0.1.2
-date-released: 2024-06-26
+version: 0.1.3
+date-released: 2024-12-12
 repository-code: https://github.com/DevelopersToolbox/pypi-extractor-package
 keywords:
   - "Wolf Software"
28 changes: 27 additions & 1 deletion README.md
@@ -44,6 +44,22 @@ PyPI Extractor is a Python package designed to fetch and process detailed inform
 Python Package Index (PyPI). This package is particularly useful for users who want to retrieve and analyze metadata for packages
 maintained by a specific PyPI user.

+## Significant Update From 0.1.3
+
+pypi.org no longer allows you to scrape details using the requests package, or any other package that does not support JavaScript. To resolve this we have
+updated this package to use [Playwright](https://pypi.org/project/playwright/) when retrieving the list of packages for a given user. While we have
+attempted to automate as much as possible, you might want to do some of the work manually.
+
+Playwright needs two commands to be run before it will function correctly:
+
+```
+playwright install
+playwright install-deps
+```
+
+We have added an `auto_install` option to the main class so that you can instruct the package to run the install for you. This helps when installing the
+package in a fully automated way, e.g. with Puppet or similar.
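
How the package implements `auto_install` is not shown in this diff. As a rough illustration only, an automated install step could shell out to the two commands listed in the README. The names `install_playwright`, `PLAYWRIGHT_COMMANDS` and the injectable `run` parameter are hypothetical and are not part of the package's API.

```python
import subprocess
from typing import Callable, List

# The two commands from the README, in the order given there.
PLAYWRIGHT_COMMANDS: List[List[str]] = [
    ["playwright", "install"],
    ["playwright", "install-deps"],
]


def install_playwright(run: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Run each Playwright setup command in turn and return the ones that ran.

    `run` defaults to subprocess.run and is injectable (an assumption made for
    this sketch) so the function can be exercised without touching the system.
    """
    executed: List[List[str]] = []
    for command in PLAYWRIGHT_COMMANDS:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so a failed first step stops the second.
        run(command, check=True)
        executed.append(command)
    return executed
```

Calling `install_playwright()` with no arguments would run the real commands, so a provisioning tool would call it once before the first scrape.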

 ## Features

 - Retrieve a list of packages maintained by a specific PyPI user.
@@ -116,11 +132,13 @@ print(package_details)

 A class to fetch and process package details for a given PyPI user.

-##### `__init__(self, username: str)`
+##### `__init__(self, username: str, verbose: bool, auto_install: bool)`

 - Initializes the `PyPiExtractor` with a username.
 - Parameters:
   - `username` (str): The PyPI username.
+  - `verbose` (bool): Verbose output (Default: False)
+  - `auto_install` (bool): Auto install Playwright dependencies (Default: False)
 - Raises:
   - `PyPiExtractorError`: If the username is not provided.

@@ -132,6 +150,14 @@ A class to fetch and process package details for a given PyPI user.
 - Raises:
   - `PyPiExtractorError`: If the username is not provided.

+##### `enable_verbose(self)`
+
+- Enable verbose mode.
+
+##### `enable_auto_install(self)`
+
+- Enable auto install.
+
 ##### `get_user_packages(self) -> list`

 - Fetches the list of packages for the given PyPI user.
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,2 +1,3 @@
 requests==2.32.3
 beautifulsoup4==4.12.3
+playwright==1.49.1
2 changes: 1 addition & 1 deletion setup.py
@@ -12,7 +12,7 @@

 setup(
     name='wolfsoftware.pypi-extractor',
-    version='0.1.2',
+    version='0.1.3',
     author='Wolf Software',
     author_email='[email protected]',
     description='Extract package information for a given user in PyPi.',
149 changes: 72 additions & 77 deletions tests/testconf.py → tests/conftest.py
@@ -16,32 +16,56 @@
 import requests


+def raise_error(*args, **kwargs):
+    """Raise an error if the real playwright gets used."""
+    raise RuntimeError("Real Playwright should not be invoked!")
+
+
 @pytest.fixture
-def mock_get_user_packages_success() -> Generator[Union[MagicMock, AsyncMock], Any, None]:
-    """Fixture to mock requests.get for get_user_packages success case."""
-    with patch('requests.get') as mock_get:
-        mock_response = Mock()
-        mock_response.raise_for_status.return_value = None
-        mock_response.text = '''
-        <a class="package-snippet">
-            <h3 class="package-snippet__title">Package1</h3>
-            <p class="package-snippet__description">Description1</p>
-        </a>
-        <a class="package-snippet">
-            <h3 class="package-snippet__title">Package2</h3>
-            <p class="package-snippet__description">Description2</p>
-        </a>
-        '''
-        mock_get.return_value = mock_response
-        yield mock_get
+def mock_playwright() -> Generator[MagicMock, None, None]:
+    """Mock the Playwright sync API."""
+    with patch('wolfsoftware.pypi_extractor.pypi.sync_playwright') as mock_sync_playwright:
+        mock_playwright_instance = MagicMock()
+        mock_browser = MagicMock()
+        mock_context = MagicMock()
+        mock_page = MagicMock()
+
+        # Mock page.goto() and page.wait_for_selector()
+        mock_page.goto.return_value = None
+        mock_page.wait_for_selector.return_value = None
+
+        # Mock page.query_selector_all() to return simulated package elements
+        def mock_query_selector_all(selector):
+            """Handle mocking the right data."""
+            if selector == 'a.package-snippet':
+                return [
+                    MagicMock(query_selector=MagicMock(side_effect=[
+                        MagicMock(inner_text=MagicMock(return_value="Package1")),
+                        MagicMock(inner_text=MagicMock(return_value="Description1")),
+                    ])),
+                    MagicMock(query_selector=MagicMock(side_effect=[
+                        MagicMock(inner_text=MagicMock(return_value="Package2")),
+                        MagicMock(inner_text=MagicMock(return_value="Description2")),
+                    ])),
+                ]
+            return []
+        mock_page.query_selector_all.side_effect = mock_query_selector_all
+
+        mock_context.new_page.return_value = mock_page
+        mock_browser.new_context.return_value = mock_context
+        mock_playwright_instance.chromium.launch.return_value = mock_browser
+        mock_sync_playwright.return_value.__enter__.return_value = mock_playwright_instance
+        yield mock_sync_playwright


 @pytest.fixture
-def mock_get_user_packages_error() -> Generator[Union[MagicMock, AsyncMock], Any, None]:
-    """Fixture to mock requests.get for get_user_packages error case."""
-    with patch('requests.get') as mock_get:
-        mock_get.side_effect = requests.RequestException("Request error")
-        yield mock_get
+def mock_playwright_error() -> Generator[MagicMock, None, None]:
+    """Fixture to mock Playwright with an error scenario."""
+    with patch('wolfsoftware.pypi_extractor.pypi.sync_playwright') as mock_sync_playwright:
+        mock_playwright_instance = MagicMock()
+        mock_playwright_instance.chromium.launch.side_effect = Exception("Playwright error")
+        mock_sync_playwright.return_value.__enter__.return_value = mock_playwright_instance
+        yield mock_sync_playwright


 @pytest.fixture
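
The `mock_playwright` fixture relies on a chain of `return_value` assignments plus ordered `side_effect` lists. The sketch below rebuilds that chain with `unittest.mock` alone and walks it the way a consumer might. The `scrape` function, the URL, the CSS selectors passed to `query_selector`, and the `name`/`summary` keys are assumptions for illustration, because the real `get_user_packages` implementation is not part of this diff.

```python
from unittest.mock import MagicMock


def make_package_element(name: str, description: str) -> MagicMock:
    """Build a fake element whose query_selector() yields title, then description."""
    return MagicMock(query_selector=MagicMock(side_effect=[
        MagicMock(inner_text=MagicMock(return_value=name)),
        MagicMock(inner_text=MagicMock(return_value=description)),
    ]))


# Stand-in for the patched sync_playwright, wired like the fixture.
mock_sync_playwright = MagicMock()
mock_page = MagicMock()
mock_page.query_selector_all.side_effect = lambda selector: (
    [make_package_element("Package1", "Description1"),
     make_package_element("Package2", "Description2")]
    if selector == "a.package-snippet" else []
)
playwright_instance = mock_sync_playwright.return_value.__enter__.return_value
(playwright_instance.chromium.launch.return_value
 .new_context.return_value
 .new_page.return_value) = mock_page


def scrape(sync_playwright) -> list:
    """Hypothetical consumer of the sync API, written only for this sketch."""
    packages = []
    with sync_playwright() as p:
        page = p.chromium.launch().new_context().new_page()
        page.goto("https://pypi.org/user/example/")  # placeholder URL
        for element in page.query_selector_all("a.package-snippet"):
            # Order matters: side_effect hands out the title mock first,
            # then the description mock, regardless of the selector passed.
            title = element.query_selector("h3.package-snippet__title").inner_text()
            summary = element.query_selector("p.package-snippet__description").inner_text()
            packages.append({"name": title, "summary": summary})
    return packages


packages = scrape(mock_sync_playwright)
```

Because each element's `query_selector` is backed by a two-item `side_effect` list, a third lookup on the same element would raise `StopIteration`, so a fixture built this way is tied to the order in which the code under test queries the element.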
@@ -155,24 +179,13 @@ def mock_get_package_details_error() -> Generator[Union[MagicMock, AsyncMock], A


 @pytest.fixture
-def mock_get_all_packages_details_success() -> Generator[Union[MagicMock, AsyncMock], Any, None]:
-    """Fixture to mock requests.get for get_all_packages_details success case."""
+def mock_get_all_packages_details_success() -> Generator[MagicMock, None, None]:
+    """Mock requests.get for get_all_packages_details success case."""
     with patch('requests.get') as mock_get:
-        mock_response_user = Mock()
+        # Mock response for the user packages API
+        mock_response_user = MagicMock()
         mock_response_user.raise_for_status.return_value = None
-        mock_response_user.text = '''
-        <a class="package-snippet">
-            <h3 class="package-snippet__title">Package1</h3>
-            <p class="package-snippet__description">Description1</p>
-        </a>
-        <a class="package-snippet">
-            <h3 class="package-snippet__title">Package2</h3>
-            <p class="package-snippet__description">Description2</p>
-        </a>
-        '''
-        mock_response_package1 = Mock()
-        mock_response_package1.raise_for_status.return_value = None
-        mock_response_package1.json.return_value = {
+        mock_response_user.json.return_value = {
             'info': {
                 'name': 'Package1',
                 'version': '1.0.0',
@@ -186,37 +199,30 @@ def mock_get_all_packages_details_success() -> Generator[Union[MagicMock, AsyncM
                 'requires_python': '>=3.6',
             },
             'releases': {
-                '0.9.0': [
-                    {
-                        'upload_time': '2021-01-01T00:00:00',
-                        'upload_time_iso_8601': '2021-01-01T00:00:00Z',
-                        'python_version': 'py3',
-                        'url': 'https://example.com',
-                        'filename': 'package-0.9.0.tar.gz',
-                        'packagetype': 'sdist',
-                        'md5_digest': 'abc123',
-                        'digests': {'sha256': 'def456'},
-                        'size': 12345
-                    }
-                ],
                 '1.0.0': [
                     {
                         'upload_time': '2021-06-01T00:00:00',
                         'upload_time_iso_8601': '2021-06-01T00:00:00Z',
                         'python_version': 'py3',
-                        'url': 'https://example.com',
+                        'url': 'https://example.com/package-1.0.0.tar.gz',
                         'filename': 'package-1.0.0.tar.gz',
                         'packagetype': 'sdist',
-                        'md5_digest': 'ghi789',
-                        'digests': {'sha256': 'jkl012'},
-                        'size': 23456
+                        'md5_digest': 'abc123',
+                        'digests': {'sha256': 'def456'},
+                        'size': 12345
                     }
-                ],
+                ]
             },
             'requires_dist': ['requests', 'beautifulsoup4'],
             'urls': [{'url': 'https://example.com/package-1.0.0.tar.gz'}],
         }
-        mock_response_package2 = Mock()
+
+        # Simulate two different package details responses
+        mock_response_package1 = MagicMock()
+        mock_response_package1.raise_for_status.return_value = None
+        mock_response_package1.json.return_value = mock_response_user.json.return_value
+
+        mock_response_package2 = MagicMock()
         mock_response_package2.raise_for_status.return_value = None
         mock_response_package2.json.return_value = {
             'info': {
@@ -226,41 +232,30 @@ def mock_get_all_packages_details_success() -> Generator[Union[MagicMock, AsyncM
                 'author': 'Author2',
                 'author_email': '[email protected]',
                 'license': 'MIT',
-                'home_page': 'https://example.com',
-                'keywords': 'example, package',
+                'home_page': 'https://example.com/package2',
+                'keywords': 'example, package2',
                 'classifiers': ['Development Status :: 5 - Production/Stable'],
                 'requires_python': '>=3.6',
             },
             'releases': {
-                '1.0.0': [
-                    {
-                        'upload_time': '2021-01-01T00:00:00',
-                        'upload_time_iso_8601': '2021-01-01T00:00:00Z',
-                        'python_version': 'py3',
-                        'url': 'https://example.com',
-                        'filename': 'package-1.0.0.tar.gz',
-                        'packagetype': 'sdist',
-                        'md5_digest': 'abc123',
-                        'digests': {'sha256': 'def456'},
-                        'size': 12345
-                    }
-                ],
                 '2.0.0': [
                     {
-                        'upload_time': '2021-06-01T00:00:00',
-                        'upload_time_iso_8601': '2021-06-01T00:00:00Z',
+                        'upload_time': '2022-06-01T00:00:00',
+                        'upload_time_iso_8601': '2022-06-01T00:00:00Z',
                         'python_version': 'py3',
-                        'url': 'https://example.com',
+                        'url': 'https://example.com/package-2.0.0.tar.gz',
                         'filename': 'package-2.0.0.tar.gz',
                         'packagetype': 'sdist',
                         'md5_digest': 'ghi789',
                         'digests': {'sha256': 'jkl012'},
                         'size': 23456
                     }
-                ],
+                ]
             },
             'requires_dist': ['requests', 'beautifulsoup4'],
             'urls': [{'url': 'https://example.com/package-2.0.0.tar.gz'}],
         }
-        mock_get.side_effect = [mock_response_user, mock_response_package1, mock_response_package2]
+
+        # Simulate the sequence of requests
+        mock_get.side_effect = [mock_response_package1, mock_response_package2]
         yield mock_get
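
The updated fixture drops `mock_response_user` from `side_effect` because the package list now comes from Playwright, leaving one `requests.get` response per package. A small sketch of how a list-valued `side_effect` behaves is shown here. `make_response` is a hypothetical helper, the payloads are trimmed to a single field, and the URLs are illustrative only since the mock ignores its arguments.

```python
from unittest.mock import MagicMock


def make_response(name: str) -> MagicMock:
    """Hypothetical helper: a fake requests.Response exposing just what is used here."""
    response = MagicMock()
    response.raise_for_status.return_value = None
    response.json.return_value = {"info": {"name": name}}
    return response


mock_get = MagicMock()
# Each call to mock_get() consumes the next item of the list, in order.
mock_get.side_effect = [make_response("Package1"), make_response("Package2")]

names = [
    mock_get(f"https://pypi.org/pypi/{pkg}/json").json()["info"]["name"]
    for pkg in ("Package1", "Package2")
]

# A call beyond the end of the list raises StopIteration, which is why the
# list length has to match the number of requests the code under test makes.
try:
    mock_get("https://pypi.org/pypi/Package3/json")
    exhausted = False
except StopIteration:
    exhausted = True
```

If the code under test still issued a third `requests.get` call, the test would fail with `StopIteration` rather than silently reusing a response, which makes the sequence a cheap check on the number of HTTP requests.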