Compare commits

..

41 Commits

Author SHA1 Message Date
5c3dace937 tag page download #40 2019-01-15 21:12:20 +08:00
b2d622f11a fix tag download issue #40 2019-01-15 21:09:24 +08:00
0c8264bcc6 fix download issues 2019-01-15 20:43:00 +08:00
a6074242fb nhentai suspended api #40 2019-01-15 20:29:10 +08:00
eb6df28fba 0.2.19 2018-12-30 14:13:27 +08:00
1091ea3e0a remove debug 2018-12-30 14:12:38 +08:00
0df51c83e5 change output filename 2018-12-30 14:06:15 +08:00
c5fa98ebd1 Update .travis.yml 2018-11-04 21:44:59 +08:00
3154a94c3d 0.2.18 2018-10-24 22:21:29 +08:00
c47018251f fix #27 2018-10-24 22:20:33 +08:00
74d0499092 add test 2018-10-24 22:07:43 +08:00
7e56d9b901 fix #33 2018-10-24 22:06:49 +08:00
8cbb334d36 fix #31 2018-10-24 21:56:21 +08:00
db6d45efe0 fix bug #34 2018-10-19 10:55:21 +08:00
d412794bce Merge pull request #32 from violetdarkness/patch-1
requirement.txt missing new line
2018-10-08 23:36:38 +08:00
8eedbf077b requirement.txt missing new line
I got error when installing and find this requirement.txt missing newline
2018-10-08 21:13:52 +07:00
c95ecdded4 remove gdb 2018-10-01 15:04:32 +08:00
489e8bf0f4 fix #29 0.2.16 2018-10-01 15:02:04 +08:00
86c31f9b5e Merge pull request #28 from tbinavsl/master
Max retries + misc. language fixes
2018-09-28 13:28:44 +08:00
6f20405f47 adding gif support and fixing yet another english typo 2018-09-09 23:38:30 +02:00
c0143548d1 reverted partially by mistake the max_page commit; also added retries on other features 2018-09-09 22:24:34 +02:00
114c364f03 oops 2018-09-09 21:42:03 +02:00
af26482b6d Max retries + misc. language fixes 2018-09-09 21:33:50 +02:00
b8ea917db2 max page #26 2018-08-24 23:55:34 +08:00
963f4d9ddf fix 2018-08-12 23:22:30 +08:00
ef36e012ce fix unicode error on windows / python2 2018-08-12 23:11:01 +08:00
16e8ce6f45 0.2.15 2018-08-12 22:48:26 +08:00
0632826827 download by tagname #15 2018-08-12 22:43:36 +08:00
8d2cd1974b fix unicodeerror on python3 2018-08-12 18:04:36 +08:00
8c176cd2ad Update README.md 2018-08-11 09:47:32 +08:00
f2c88e8ade Update README.md 2018-08-11 09:46:46 +08:00
2300744c5c Update README.md 2018-08-11 09:46:04 +08:00
7f30c84eff Update README.md 2018-08-11 09:45:04 +08:00
dda849b770 remove python3.7 2018-08-11 09:32:35 +08:00
14b3c82248 remove \r 2018-08-11 09:28:39 +08:00
4577e9df9a fix 2018-08-11 09:24:16 +08:00
de157ccb7f Merge branch 'master' of github.com:RicterZ/nhentai 2018-08-11 09:19:31 +08:00
126bbe8d49 add a test 2018-08-11 09:18:00 +08:00
8546b9e759 fix bug #24 2018-08-11 09:17:05 +08:00
6ff9751c30 fix 2018-07-01 12:50:37 +08:00
f316c3243b 0.2.12 2018-04-19 17:29:23 +08:00
16 changed files with 545 additions and 318 deletions

View File

@ -4,10 +4,9 @@ os:
language: python language: python
python: python:
- 2.7 - 2.7
- 2.6 - 3.6
- 3.3 - 3.5
- 3.4 - 3.4
- 3.5.2
install: install:
- python setup.py install - python setup.py install
@ -15,3 +14,6 @@ install:
script: script:
- NHENTAI=https://nhentai.net nhentai --search umaru - NHENTAI=https://nhentai.net nhentai --search umaru
- NHENTAI=https://nhentai.net nhentai --id=152503,146134 -t 10 --output=/tmp/ - NHENTAI=https://nhentai.net nhentai --id=152503,146134 -t 10 --output=/tmp/
- NHENTAI=https://nhentai.net nhentai -l nhentai_test:nhentai --download --output=/tmp/
- NHENTAI=https://nhentai.net nhentai --tag lolicon
- NHENTAI=https://nhentai.net nhentai --id 92066 --output=/tmp/ --cbz

View File

@ -7,11 +7,10 @@ nhentai
|_| |_|_| |_|\___|_| |_|\__\__,_|_| |_| |_|_| |_|\___|_| |_|\__\__,_|_|
あなたも変態。 いいね? あなたも変態。 いいね?
[![Build Status](https://travis-ci.org/RicterZ/nhentai.svg?branch=master)](https://travis-ci.org/RicterZ/nhentai) [![Build Status](https://travis-ci.org/RicterZ/nhentai.svg?branch=master)](https://travis-ci.org/RicterZ/nhentai) ![nhentai PyPI Downloads](https://pypistats.com/badge/nhentai.svg)
🎉🎉 nhentai 现在支持 Windows 啦!
由于 [http://nhentai.net](http://nhentai.net) 下载下来的种子速度很慢,而且官方也提供在线观看本子的功能,所以可以利用本脚本下载本子。 nHentai is a CLI tool for downloading doujinshi from [nhentai.net](http://nhentai.net).
### Installation ### Installation
@ -25,36 +24,45 @@ nhentai
sudo emerge net-misc/nhentai sudo emerge net-misc/nhentai
### Usage ### Usage
下载指定 id 列表的本子: Download specified doujinshi:
```bash ```bash
nhentai --id=123855,123866 nhentai --id=123855,123866
``` ```
下载某关键词第一页的本子: Search a keyword and download the first page:
```bash ```bash
nhentai --search="tomori" --page=1 --download nhentai --search="tomori" --page=1 --download
``` ```
下载用户 favorites 内容: Download your favourite doujinshi (login required):
```bash ```bash
nhentai --login "username:password" --download nhentai --login "username:password" --download
``` ```
Download by tag name:
```bash
nhentai --tag lolicon --download
```
### Options ### Options
`-t, --thread`:指定下载的线程数,最多为 10 线程。 + `-t, --thread`: Download threads, max: 10
`--path`:指定下载文件的输出路径,默认为当前目录。 + `--output`:Output dir of saving doujinshi
`--timeout`:指定下载图片的超时时间,默认为 30 秒。 + `--tag`:Download by tag name
`--proxy`:指定下载的代理,例如: http://127.0.0.1:8080/ + `--timeout`: Timeout of downloading each image
`--login`nhentai 账号的“用户名:密码”组合 + `--proxy`: Use proxy, example: http://127.0.0.1:8080/
`--nohtml`nhentai Don't generate HTML + `--login`: username:password pair of your nhentai account
`--cbz`nhentai Generate Comic Book CBZ file + `--nohtml`: Do not generate HTML
+ `--cbz`: Generate Comic Book CBZ File
### nHentai Mirror ### nHentai Mirror
如果想用自建镜像下载 nhentai 的本子,需要搭建 nhentai.neti.nhentai.net 的反向代理。 If you want to use a mirror, you should set up a reverse proxy of `nhentai.net` and `i.nhentai.net`.
例如用 h.loli.club 来做反向代理的话,需要 h.loli.club 反代 nhentai.neti.h.loli.club 反带 i.nhentai.net。 For example:
然后利用环境变量来下载:
i.h.loli.club -> i.nhentai.net
h.loli.club -> nhentai.net
Set `NHENTAI` env var to your nhentai mirror.
```bash ```bash
NHENTAI=http://h.loli.club nhentai --id 123456 NHENTAI=http://h.loli.club nhentai --id 123456
``` ```

View File

@ -1,3 +1,3 @@
__version__ = '0.2.12' __version__ = '0.2.19'
__author__ = 'RicterZ' __author__ = 'RicterZ'
__email__ = 'ricterzheng@gmail.com' __email__ = 'ricterzheng@gmail.com'

View File

@ -13,8 +13,12 @@ from nhentai.utils import urlparse, generate_html
from nhentai.logger import logger from nhentai.logger import logger
try: try:
reload(sys) if sys.version_info < (3, 0, 0):
sys.setdefaultencoding(sys.stdin.encoding) import codecs
import locale
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
sys.stderr = codecs.getwriter(locale.getpreferredencoding())(sys.stderr)
except NameError: except NameError:
# python3 # python3
pass pass
@ -36,21 +40,23 @@ def cmd_parser():
'\n\nEnvironment Variable:\n' '\n\nEnvironment Variable:\n'
' NHENTAI nhentai mirror url') ' NHENTAI nhentai mirror url')
parser.add_option('--download', dest='is_download', action='store_true', parser.add_option('--download', dest='is_download', action='store_true',
help='download doujinshi (for search result)') help='download doujinshi (for search results)')
parser.add_option('--show-info', dest='is_show', action='store_true', help='just show the doujinshi information') parser.add_option('--show-info', dest='is_show', action='store_true', help='just show the doujinshi information')
parser.add_option('--id', type='string', dest='id', action='store', help='doujinshi ids set, e.g. 1,2,3') parser.add_option('--id', type='string', dest='id', action='store', help='doujinshi ids set, e.g. 1,2,3')
parser.add_option('--search', type='string', dest='keyword', action='store', help='search doujinshi by keyword') parser.add_option('--search', type='string', dest='keyword', action='store', help='search doujinshi by keyword')
parser.add_option('--page', type='int', dest='page', action='store', default=1, parser.add_option('--page', type='int', dest='page', action='store', default=1,
help='page number of search result') help='page number of search results')
parser.add_option('--tags', type='string', dest='tags', action='store', help='download doujinshi by tags') parser.add_option('--tag', type='string', dest='tag', action='store', help='download doujinshi by tag')
parser.add_option('--max-page', type='int', dest='max_page', action='store', default=1,
help='The max page when recursive download tagged doujinshi')
parser.add_option('--output', type='string', dest='output_dir', action='store', default='', parser.add_option('--output', type='string', dest='output_dir', action='store', default='',
help='output dir') help='output dir')
parser.add_option('--threads', '-t', type='int', dest='threads', action='store', default=5, parser.add_option('--threads', '-t', type='int', dest='threads', action='store', default=5,
help='thread count of download doujinshi') help='thread count for downloading doujinshi')
parser.add_option('--timeout', type='int', dest='timeout', action='store', default=30, parser.add_option('--timeout', type='int', dest='timeout', action='store', default=30,
help='timeout of download doujinshi') help='timeout for downloading doujinshi')
parser.add_option('--proxy', type='string', dest='proxy', action='store', default='', parser.add_option('--proxy', type='string', dest='proxy', action='store', default='',
help='use proxy, example: http://127.0.0.1:1080') help='uses a proxy, for example: http://127.0.0.1:1080')
parser.add_option('--html', dest='html_viewer', action='store_true', parser.add_option('--html', dest='html_viewer', action='store_true',
help='generate a html viewer at current directory') help='generate a html viewer at current directory')
@ -62,6 +68,8 @@ def cmd_parser():
parser.add_option('--cbz', dest='is_cbz', action='store_true', parser.add_option('--cbz', dest='is_cbz', action='store_true',
help='Generate Comic Book CBZ File') help='Generate Comic Book CBZ File')
parser.add_option('--rm-origin-dir', dest='rm_origin_dir', action='store_true', default=False,
help='Remove downloaded doujinshi dir when generated CBZ file.')
try: try:
sys.argv = list(map(lambda x: unicode(x.decode(sys.stdin.encoding)), sys.argv)) sys.argv = list(map(lambda x: unicode(x.decode(sys.stdin.encoding)), sys.argv))
@ -86,20 +94,17 @@ def cmd_parser():
if not args.is_download: if not args.is_download:
logger.warning('YOU DO NOT SPECIFY `--download` OPTION !!!') logger.warning('YOU DO NOT SPECIFY `--download` OPTION !!!')
if args.tags:
logger.warning('`--tags` is under construction')
exit(1)
if args.id: if args.id:
_ = map(lambda id: id.strip(), args.id.split(',')) _ = map(lambda id: id.strip(), args.id.split(','))
args.id = set(map(int, filter(lambda id_: id_.isdigit(), _))) args.id = set(map(int, filter(lambda id_: id_.isdigit(), _)))
if (args.is_download or args.is_show) and not args.id and not args.keyword and not args.login: if (args.is_download or args.is_show) and not args.id and not args.keyword and \
not args.login and not args.tag:
logger.critical('Doujinshi id(s) are required for downloading') logger.critical('Doujinshi id(s) are required for downloading')
parser.print_help() parser.print_help()
exit(1) exit(1)
if not args.keyword and not args.id and not args.login: if not args.keyword and not args.id and not args.login and not args.tag:
parser.print_help() parser.print_help()
exit(1) exit(1)

View File

@ -5,7 +5,7 @@ import signal
import platform import platform
from nhentai.cmdline import cmd_parser, banner from nhentai.cmdline import cmd_parser, banner
from nhentai.parser import doujinshi_parser, search_parser, print_doujinshi, login_parser from nhentai.parser import doujinshi_parser, search_parser, print_doujinshi, login_parser, tag_parser
from nhentai.doujinshi import Doujinshi from nhentai.doujinshi import Doujinshi
from nhentai.downloader import Downloader from nhentai.downloader import Downloader
from nhentai.logger import logger from nhentai.logger import logger
@ -23,16 +23,23 @@ def main():
if options.login: if options.login:
username, password = options.login.split(':', 1) username, password = options.login.split(':', 1)
logger.info('Login to nhentai use credential \'%s:%s\'' % (username, '*' * len(password))) logger.info('Logging in to nhentai using credential pair \'%s:%s\'' % (username, '*' * len(password)))
for doujinshi_info in login_parser(username=username, password=password): for doujinshi_info in login_parser(username=username, password=password):
doujinshi_list.append(Doujinshi(**doujinshi_info)) doujinshi_list.append(Doujinshi(**doujinshi_info))
if options.tag:
doujinshis = tag_parser(options.tag, max_page=options.max_page)
print_doujinshi(doujinshis)
if options.is_download:
doujinshi_ids = map(lambda d: d['id'], doujinshis)
if options.keyword: if options.keyword:
doujinshis = search_parser(options.keyword, options.page) doujinshis = search_parser(options.keyword, options.page)
print_doujinshi(doujinshis) print_doujinshi(doujinshis)
if options.is_download: if options.is_download:
doujinshi_ids = map(lambda d: d['id'], doujinshis) doujinshi_ids = map(lambda d: d['id'], doujinshis)
else:
if not doujinshi_ids:
doujinshi_ids = options.id doujinshi_ids = options.id
if doujinshi_ids: if doujinshi_ids:
@ -50,7 +57,7 @@ def main():
if not options.is_nohtml and not options.is_cbz: if not options.is_nohtml and not options.is_cbz:
generate_html(options.output_dir, doujinshi) generate_html(options.output_dir, doujinshi)
elif options.is_cbz: elif options.is_cbz:
generate_cbz(options.output_dir, doujinshi) generate_cbz(options.output_dir, doujinshi, options.rm_origin_dir)
if not platform.system() == 'Windows': if not platform.system() == 'Windows':
logger.log(15, '🍻 All done.') logger.log(15, '🍻 All done.')
@ -62,7 +69,7 @@ def main():
def signal_handler(signal, frame): def signal_handler(signal, frame):
logger.error('Ctrl-C signal received. Quit.') logger.error('Ctrl-C signal received. Stopping...')
exit(1) exit(1)

View File

@ -5,8 +5,14 @@ from nhentai.utils import urlparse
BASE_URL = os.getenv('NHENTAI', 'https://nhentai.net') BASE_URL = os.getenv('NHENTAI', 'https://nhentai.net')
DETAIL_URL = '%s/api/gallery' % BASE_URL __api_suspended_DETAIL_URL = '%s/api/gallery' % BASE_URL
SEARCH_URL = '%s/api/galleries/search' % BASE_URL __api_suspended_SEARCH_URL = '%s/api/galleries/search' % BASE_URL
DETAIL_URL = '%s/g' % BASE_URL
SEARCH_URL = '%s/search/' % BASE_URL
TAG_URL = '%s/tag' % BASE_URL
TAG_API_URL = '%s/api/galleries/tagged' % BASE_URL
LOGIN_URL = '%s/login/' % BASE_URL LOGIN_URL = '%s/login/' % BASE_URL
FAV_URL = '%s/favorites/' % BASE_URL FAV_URL = '%s/favorites/' % BASE_URL

View File

@ -11,6 +11,7 @@ from nhentai.utils import format_filename
EXT_MAP = { EXT_MAP = {
'j': 'jpg', 'j': 'jpg',
'p': 'png', 'p': 'png',
'g': 'gif',
} }
@ -35,6 +36,7 @@ class Doujinshi(object):
self.downloader = None self.downloader = None
self.url = '%s/%d' % (DETAIL_URL, self.id) self.url = '%s/%d' % (DETAIL_URL, self.id)
self.info = DoujinshiInfo(**kwargs) self.info = DoujinshiInfo(**kwargs)
self.filename = format_filename('[%s][%s][%s]' % (self.id, self.info.artist, self.name))
def __repr__(self): def __repr__(self):
return '<Doujinshi: {0}>'.format(self.name) return '<Doujinshi: {0}>'.format(self.name)
@ -43,25 +45,32 @@ class Doujinshi(object):
table = [ table = [
["Doujinshi", self.name], ["Doujinshi", self.name],
["Subtitle", self.info.subtitle], ["Subtitle", self.info.subtitle],
["Characters", self.info.characters], ["Characters", self.info.character],
["Authors", self.info.artists], ["Authors", self.info.artist],
["Language", self.info.language], ["Language", self.info.language],
["Tags", self.info.tags], ["Tags", ', '.join(self.info.tag.keys())],
["URL", self.url], ["URL", self.url],
["Pages", self.pages], ["Pages", self.pages],
] ]
logger.info(u'Print doujinshi information of {0}\n{1}'.format(self.id, tabulate(table))) logger.info(u'Print doujinshi information of {0}\n{1}'.format(self.id, tabulate(table)))
def download(self): def download(self):
logger.info('Start download doujinshi: %s' % self.name) logger.info('Starting to download doujinshi: %s' % self.name)
if self.downloader: if self.downloader:
download_queue = [] download_queue = []
for i in range(1, self.pages + 1):
download_queue.append('%s/%d/%d.%s' % (IMAGE_URL, int(self.img_id), i, self.ext))
self.downloader.download(download_queue, self.filename)
'''
for i in range(len(self.ext)): for i in range(len(self.ext)):
download_queue.append('%s/%d/%d.%s' % (IMAGE_URL, int(self.img_id), i+1, EXT_MAP[self.ext[i]])) download_queue.append('%s/%d/%d.%s' % (IMAGE_URL, int(self.img_id), i+1, EXT_MAP[self.ext[i]]))
'''
self.downloader.download(download_queue, format_filename('%s-%s' % (self.id, self.name[:200])))
else: else:
logger.critical('Downloader has not be loaded') logger.critical('Downloader has not been loaded')
if __name__ == '__main__': if __name__ == '__main__':

View File

@ -29,22 +29,40 @@ class Downloader(Singleton):
self.path = str(path) self.path = str(path)
self.thread_count = thread self.thread_count = thread
self.threads = [] self.threads = []
self.thread_pool = None
self.timeout = timeout self.timeout = timeout
def _download(self, url, folder='', filename='', retried=0): def _download(self, url, folder='', filename='', retried=0):
logger.info('Start downloading: {0} ...'.format(url)) logger.info('Starting to download {0} ...'.format(url))
filename = filename if filename else os.path.basename(urlparse(url).path) filename = filename if filename else os.path.basename(urlparse(url).path)
base_filename, extension = os.path.splitext(filename) base_filename, extension = os.path.splitext(filename)
try: try:
if os.path.exists(os.path.join(folder, base_filename.zfill(3) + extension)): if os.path.exists(os.path.join(folder, base_filename.zfill(3) + extension)):
logger.warning('File: {0} existed, ignore.'.format(os.path.join(folder, base_filename.zfill(3) + logger.warning('File: {0} exists, ignoring'.format(os.path.join(folder, base_filename.zfill(3) +
extension))) extension)))
return 1, url return 1, url
response = None
with open(os.path.join(folder, base_filename.zfill(3) + extension), "wb") as f: with open(os.path.join(folder, base_filename.zfill(3) + extension), "wb") as f:
i = 0
while i < 10:
try:
response = request('get', url, stream=True, timeout=self.timeout) response = request('get', url, stream=True, timeout=self.timeout)
if response.status_code != 200: if response.status_code != 200:
raise NhentaiImageNotExistException raise NhentaiImageNotExistException
except NhentaiImageNotExistException as e:
raise e
except Exception as e:
i += 1
if not i < 10:
logger.critical(str(e))
return 0, None
continue
break
length = response.headers.get('content-length') length = response.headers.get('content-length')
if length is None: if length is None:
f.write(response.content) f.write(response.content)
@ -77,7 +95,7 @@ class Downloader(Singleton):
elif result == -1: elif result == -1:
logger.warning('url {} return status code 404'.format(data)) logger.warning('url {} return status code 404'.format(data))
else: else:
logger.log(15, '{0} download successfully'.format(data)) logger.log(15, '{0} downloaded successfully'.format(data))
def download(self, queue, folder=''): def download(self, queue, folder=''):
if not isinstance(folder, text): if not isinstance(folder, text):
@ -87,7 +105,7 @@ class Downloader(Singleton):
folder = os.path.join(self.path, folder) folder = os.path.join(self.path, folder)
if not os.path.exists(folder): if not os.path.exists(folder):
logger.warn('Path \'{0}\' not exist.'.format(folder)) logger.warn('Path \'{0}\' does not exist, creating.'.format(folder))
try: try:
os.makedirs(folder) os.makedirs(folder)
except EnvironmentError as e: except EnvironmentError as e:

View File

@ -104,6 +104,9 @@ class ColorizingStreamHandler(logging.StreamHandler):
text = parts.pop(0) text = parts.pop(0)
if text: if text:
if sys.version_info < (3, 0, 0):
write(text.encode('utf-8'))
else:
write(text) write(text)
if parts: if parts:

View File

@ -5,6 +5,7 @@ import os
import re import re
import threadpool import threadpool
import requests import requests
import time
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
from tabulate import tabulate from tabulate import tabulate
@ -27,7 +28,7 @@ def login_parser(username, password):
s.get(constant.LOGIN_URL) s.get(constant.LOGIN_URL)
content = s.get(constant.LOGIN_URL).content content = s.get(constant.LOGIN_URL).content
html = BeautifulSoup(content, 'html.parser').encode("ascii") html = BeautifulSoup(content, 'html.parser')
csrf_token_elem = html.find('input', attrs={'name': 'csrfmiddlewaretoken'}) csrf_token_elem = html.find('input', attrs={'name': 'csrfmiddlewaretoken'})
if not csrf_token_elem: if not csrf_token_elem:
@ -40,19 +41,27 @@ def login_parser(username, password):
'password': password, 'password': password,
} }
resp = s.post(constant.LOGIN_URL, data=login_dict) resp = s.post(constant.LOGIN_URL, data=login_dict)
if 'Invalid username (or email) or password' in resp.text: if 'Invalid username/email or password' in resp.text:
logger.error('Login failed, please check your username and password') logger.error('Login failed, please check your username and password')
exit(1) exit(1)
html = BeautifulSoup(s.get(constant.FAV_URL).content, 'html.parser').encode("ascii") html = BeautifulSoup(s.get(constant.FAV_URL).content, 'html.parser')
count = html.find('span', attrs={'class': 'count'}) count = html.find('span', attrs={'class': 'count'})
if not count: if not count:
logger.error('Cannot get count of your favorites, maybe login failed.') logger.error("Can't get your number of favorited doujins. Did the login failed?")
count = int(count.text.strip('(').strip(')')) count = int(count.text.strip('(').strip(')').replace(',', ''))
pages = count / 25 if count == 0:
logger.warning('No favorites found')
return []
pages = int(count / 25)
if pages:
pages += 1 if count % (25 * pages) else 0 pages += 1 if count % (25 * pages) else 0
logger.info('Your have %d favorites in %d pages.' % (count, pages)) else:
pages = 1
logger.info('You have %d favorites in %d pages.' % (count, pages))
if os.getenv('DEBUG'): if os.getenv('DEBUG'):
pages = 1 pages = 1
@ -67,8 +76,8 @@ def login_parser(username, password):
for page in range(1, pages + 1): for page in range(1, pages + 1):
try: try:
logger.info('Getting doujinshi id of page %d' % page) logger.info('Getting doujinshi ids of page %d' % page)
resp = s.get(constant.FAV_URL + '?page=%d' % page).content resp = s.get(constant.FAV_URL + '?page=%d' % page).text
ids = doujinshi_id.findall(resp) ids = doujinshi_id.findall(resp)
requests_ = threadpool.makeRequests(doujinshi_parser, ids, _callback) requests_ = threadpool.makeRequests(doujinshi_parser, ids, _callback)
[thread_pool.putRequest(req) for req in requests_] [thread_pool.putRequest(req) for req in requests_]
@ -87,29 +96,49 @@ def doujinshi_parser(id_):
logger.log(15, 'Fetching doujinshi information of id {0}'.format(id_)) logger.log(15, 'Fetching doujinshi information of id {0}'.format(id_))
doujinshi = dict() doujinshi = dict()
doujinshi['id'] = id_ doujinshi['id'] = id_
url = '{0}/{1}'.format(constant.DETAIL_URL, id_) url = '{0}/{1}/'.format(constant.DETAIL_URL, id_)
try: try:
response = request('get', url).json() response = request('get', url).content
except Exception as e: except Exception as e:
logger.critical(str(e)) logger.critical(str(e))
raise SystemExit
html = BeautifulSoup(response, 'html.parser')
doujinshi_info = html.find('div', attrs={'id': 'info'})
title = doujinshi_info.find('h1').text
subtitle = doujinshi_info.find('h2')
doujinshi['name'] = title
doujinshi['subtitle'] = subtitle.text if subtitle else ''
doujinshi_cover = html.find('div', attrs={'id': 'cover'})
img_id = re.search('/galleries/([\d]+)/cover\.(jpg|png)$', doujinshi_cover.a.img.attrs['data-src'])
if not img_id:
logger.critical('Tried yo get image id failed')
exit(1) exit(1)
doujinshi['name'] = response['title']['english'] doujinshi['img_id'] = img_id.group(1)
doujinshi['subtitle'] = response['title']['japanese'] doujinshi['ext'] = img_id.group(2)
doujinshi['img_id'] = response['media_id']
doujinshi['ext'] = ''.join(map(lambda s: s['t'], response['images']['pages'])) pages = 0
doujinshi['pages'] = len(response['images']['pages']) for _ in doujinshi_info.find_all('div', class_=''):
pages = re.search('([\d]+) pages', _.text)
if pages:
pages = pages.group(1)
break
doujinshi['pages'] = int(pages)
# gain information of the doujinshi # gain information of the doujinshi
needed_fields = ['character', 'artist', 'language'] information_fields = doujinshi_info.find_all('div', attrs={'class': 'field-name'})
for tag in response['tags']: needed_fields = ['Characters', 'Artists', 'Language', 'Tags']
tag_type = tag['type'] for field in information_fields:
if tag_type in needed_fields: field_name = field.contents[0].strip().strip(':')
if tag_type not in doujinshi: if field_name in needed_fields:
doujinshi[tag_type] = tag['name'] data = [sub_field.contents[0].strip() for sub_field in
else: field.find_all('a', attrs={'class': 'tag'})]
doujinshi[tag_type] += tag['name'] doujinshi[field_name.lower()] = ', '.join(data)
return doujinshi return doujinshi
@ -118,13 +147,91 @@ def search_parser(keyword, page):
logger.debug('Searching doujinshis of keyword {0}'.format(keyword)) logger.debug('Searching doujinshis of keyword {0}'.format(keyword))
result = [] result = []
try: try:
response = request('get', url=constant.SEARCH_URL, params={'query': keyword, 'page': page}).json() response = request('get', url=constant.SEARCH_URL, params={'q': keyword, 'page': page}).content
if 'result' not in response:
raise Exception('No result in response')
except requests.ConnectionError as e: except requests.ConnectionError as e:
logger.critical(e) logger.critical(e)
logger.warn('If you are in China, please configure the proxy to fu*k GFW.') logger.warn('If you are in China, please configure the proxy to fu*k GFW.')
raise SystemExit
html = BeautifulSoup(response, 'html.parser')
doujinshi_search_result = html.find_all('div', attrs={'class': 'gallery'})
for doujinshi in doujinshi_search_result:
doujinshi_container = doujinshi.find('div', attrs={'class': 'caption'})
title = doujinshi_container.text.strip()
title = title if len(title) < 85 else title[:82] + '...'
id_ = re.search('/g/(\d+)/', doujinshi.a['href']).group(1)
result.append({'id': id_, 'title': title})
if not result:
logger.warn('Not found anything of keyword {}'.format(keyword))
return result
def __api_suspended_doujinshi_parser(id_):
if not isinstance(id_, (int,)) and (isinstance(id_, (str,)) and not id_.isdigit()):
raise Exception('Doujinshi id({0}) is not valid'.format(id_))
id_ = int(id_)
logger.log(15, 'Fetching information of doujinshi id {0}'.format(id_))
doujinshi = dict()
doujinshi['id'] = id_
url = '{0}/{1}'.format(constant.DETAIL_URL, id_)
i = 0
while 5 > i:
try:
response = request('get', url).json()
except Exception as e:
i += 1
if not i < 5:
logger.critical(str(e))
exit(1) exit(1)
continue
break
doujinshi['name'] = response['title']['english']
doujinshi['subtitle'] = response['title']['japanese']
doujinshi['img_id'] = response['media_id']
doujinshi['ext'] = ''.join(map(lambda s: s['t'], response['images']['pages']))
doujinshi['pages'] = len(response['images']['pages'])
# gain information of the doujinshi
needed_fields = ['character', 'artist', 'language', 'tag']
for tag in response['tags']:
tag_type = tag['type']
if tag_type in needed_fields:
if tag_type == 'tag':
if tag_type not in doujinshi:
doujinshi[tag_type] = {}
tag['name'] = tag['name'].replace(' ', '-')
tag['name'] = tag['name'].lower()
doujinshi[tag_type][tag['name']] = tag['id']
elif tag_type not in doujinshi:
doujinshi[tag_type] = tag['name']
else:
doujinshi[tag_type] += ', ' + tag['name']
return doujinshi
def __api_suspended_search_parser(keyword, page):
logger.debug('Searching doujinshis using keywords {0}'.format(keyword))
result = []
i = 0
while i < 5:
try:
response = request('get', url=constant.SEARCH_URL, params={'query': keyword, 'page': page}).json()
except Exception as e:
i += 1
if not i < 5:
logger.critical(str(e))
logger.warn('If you are in China, please configure the proxy to fu*k GFW.')
exit(1)
continue
break
if 'result' not in response:
raise Exception('No result in response')
for row in response['result']: for row in response['result']:
title = row['title']['english'] title = row['title']['english']
@ -132,7 +239,7 @@ def search_parser(keyword, page):
result.append({'id': row['id'], 'title': title}) result.append({'id': row['id'], 'title': title})
if not result: if not result:
logger.warn('Not found anything of keyword {}'.format(keyword)) logger.warn('No results for keywords {}'.format(keyword))
return result return result
@ -146,5 +253,55 @@ def print_doujinshi(doujinshi_list):
tabulate(tabular_data=doujinshi_list, headers=headers, tablefmt='rst')) tabulate(tabular_data=doujinshi_list, headers=headers, tablefmt='rst'))
def __api_suspended_tag_parser(tag_id, max_page=1):
logger.info('Searching for doujinshi with tag id {0}'.format(tag_id))
result = []
response = request('get', url=constant.TAG_API_URL, params={'sort': 'popular', 'tag_id': tag_id}).json()
page = max_page if max_page <= response['num_pages'] else int(response['num_pages'])
for i in range(1, page + 1):
logger.info('Getting page {} ...'.format(i))
if page != 1:
response = request('get', url=constant.TAG_API_URL,
params={'sort': 'popular', 'tag_id': tag_id}).json()
for row in response['result']:
title = row['title']['english']
title = title[:85] + '..' if len(title) > 85 else title
result.append({'id': row['id'], 'title': title})
if not result:
logger.warn('No results for tag id {}'.format(tag_id))
return result
def tag_parser(tag_name, max_page=1):
result = []
tag_name = tag_name.lower()
tag_name = tag_name.replace(' ', '-')
for p in range(1, max_page + 1):
logger.debug('Fetching page {0} for doujinshi with tag \'{1}\''.format(p, tag_name))
response = request('get', url='%s/%s?page=%d' % (constant.TAG_URL, tag_name, p)).content
html = BeautifulSoup(response, 'html.parser')
doujinshi_items = html.find_all('div', attrs={'class': 'gallery'})
if not doujinshi_items:
logger.error('Cannot find doujinshi id of tag \'{0}\''.format(tag_name))
return
for i in doujinshi_items:
doujinshi_id = i.a.attrs['href'].strip('/g')
doujinshi_title = i.a.text.strip()
doujinshi_title = doujinshi_title if len(doujinshi_title) < 85 else doujinshi_title[:82] + '...'
result.append({'title': doujinshi_title, 'id': doujinshi_id})
if not result:
logger.warn('No results for tag \'{}\''.format(tag_name))
return result
if __name__ == '__main__': if __name__ == '__main__':
print(doujinshi_parser("32271")) print(doujinshi_parser("32271"))

View File

@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals, print_function from __future__ import unicode_literals, print_function
import sys
import os import os
import string import string
import zipfile import zipfile
@ -30,18 +31,19 @@ def urlparse(url):
return urlparse(url) return urlparse(url)
def readfile(path): def readfile(path):
loc = os.path.dirname(__file__) loc = os.path.dirname(__file__)
with open(os.path.join(loc, path), 'r') as file: with open(os.path.join(loc, path), 'r') as file:
return file.read() return file.read()
def generate_html(output_dir='.', doujinshi_obj=None): def generate_html(output_dir='.', doujinshi_obj=None):
image_html = '' image_html = ''
if doujinshi_obj is not None: if doujinshi_obj is not None:
doujinshi_dir = os.path.join(output_dir, format_filename('%s-%s' % (doujinshi_obj.id, doujinshi_dir = os.path.join(output_dir, doujinshi_obj.filename)
str(doujinshi_obj.name[:200]))))
else: else:
doujinshi_dir = '.' doujinshi_dir = '.'
@ -61,22 +63,29 @@ def generate_html(output_dir='.', doujinshi_obj=None):
if doujinshi_obj is not None: if doujinshi_obj is not None:
title = doujinshi_obj.name title = doujinshi_obj.name
if sys.version_info < (3, 0):
title = title.encode('utf-8')
else: else:
title = 'nHentai HTML Viewer' title = 'nHentai HTML Viewer'
data = html.format(TITLE=title, IMAGES=image_html, SCRIPTS=js, STYLES=css) data = html.format(TITLE=title, IMAGES=image_html, SCRIPTS=js, STYLES=css)
try:
if sys.version_info < (3, 0):
with open(os.path.join(doujinshi_dir, 'index.html'), 'w') as f: with open(os.path.join(doujinshi_dir, 'index.html'), 'w') as f:
f.write(data) f.write(data)
else:
with open(os.path.join(doujinshi_dir, 'index.html'), 'wb') as f:
f.write(data.encode('utf-8'))
logger.log(15, 'HTML Viewer has been write to \'{0}\''.format(os.path.join(doujinshi_dir, 'index.html'))) logger.log(15, 'HTML Viewer has been write to \'{0}\''.format(os.path.join(doujinshi_dir, 'index.html')))
except Exception as e:
logger.warning('Writen HTML Viewer failed ({})'.format(str(e)))
def generate_cbz(output_dir='.', doujinshi_obj=None): def generate_cbz(output_dir='.', doujinshi_obj=None, rm_origin_dir=False):
if doujinshi_obj is not None: if doujinshi_obj is not None:
doujinshi_dir = os.path.join(output_dir, format_filename('%s-%s' % (doujinshi_obj.id, doujinshi_dir = os.path.join(output_dir, doujinshi_obj.filename)
str(doujinshi_obj.name[:200])))) cbz_filename = os.path.join(os.path.join(doujinshi_dir, '..'), '%s.cbz' % doujinshi_obj.id)
cbz_filename = os.path.join(output_dir, format_filename('%s-%s.cbz' % (doujinshi_obj.id,
str(doujinshi_obj.name[:200]))))
else: else:
cbz_filename = './doujinshi.cbz' cbz_filename = './doujinshi.cbz'
doujinshi_dir = '.' doujinshi_dir = '.'
@ -84,20 +93,18 @@ def generate_cbz(output_dir='.', doujinshi_obj=None):
file_list = os.listdir(doujinshi_dir) file_list = os.listdir(doujinshi_dir)
file_list.sort() file_list.sort()
logger.info('Writing CBZ file to path: {}'.format(cbz_filename))
with zipfile.ZipFile(cbz_filename, 'w') as cbz_pf: with zipfile.ZipFile(cbz_filename, 'w') as cbz_pf:
for image in file_list: for image in file_list:
image_path = os.path.join(doujinshi_dir, image) image_path = os.path.join(doujinshi_dir, image)
cbz_pf.write(image_path, image) cbz_pf.write(image_path, image)
if rm_origin_dir:
shutil.rmtree(doujinshi_dir, ignore_errors=True) shutil.rmtree(doujinshi_dir, ignore_errors=True)
logger.log(15, 'Comic Book CBZ file has been write to \'{0}\''.format(doujinshi_dir)) logger.log(15, 'Comic Book CBZ file has been write to \'{0}\''.format(doujinshi_dir))
def format_filename(s): def format_filename(s):
"""Take a string and return a valid filename constructed from the string. """Take a string and return a valid filename constructed from the string.
Uses a whitelist approach: any characters not present in valid_chars are Uses a whitelist approach: any characters not present in valid_chars are
@ -109,7 +116,12 @@ and append a file extension like '.txt', so I avoid the potential of using
an invalid filename. an invalid filename.
""" """
valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits) valid_chars = "-_.()[] %s%s" % (string.ascii_letters, string.digits)
filename = ''.join(c for c in s if c in valid_chars) filename = ''.join(c for c in s if c in valid_chars)
filename = filename.replace(' ', '_') # I don't like spaces in filenames. filename = filename.replace(' ', '_') # I don't like spaces in filenames.
if len(filename) > 100:
filename = filename[:100] + '...]'
# Remove [] from filename
filename = filename.replace('[]', '')
return filename return filename